Why LOB data makes shrink run slooooowly (T-SQL Tuesday #006)

[Edit January 2015: Everything in this post applies to SQL Server 2012 and 2014 still.]

This blog post is part of the monthly T-SQL Tuesday series that fellow-MVP Adam Machanic (twitter|blog) organizes. This month’s T-SQL Tuesday is being run by Michael Coles (twitter|blog) and is on the subject of reporting – see this blog post for details.

The theme of this month’s T-SQL Tuesday is LOB data so I’m going to stretch things a little and explain why LOB data makes shrink performance really suck (that’s a technical term :-)

You all know that I really don’t like data file shrinking (although it can sometimes be absolutely necessary. See these posts for my soap-box diatribes:

The way that data file shrink works is that it picks up pages from the end of the data file and tries to move them as far towards the beginning of the data file as it can. As part of doing this, it has to fix up any structures that page is part of.

For example, if the page is part of an index, there will be other pages in the index that have physical links to the page being moved – these linkages have to be fixed up. The left- and right-hand neighboring pages in the same level of the index are easy to fix, as the page being moved points to them.

However, the page in the next level up towards the root page in the index is more tricky, as there’s no back-link pointing up to it – a scan of the next level in the index is required to find the page that points down to the page being moved. This should be pretty fast as the index levels get much smaller pretty quickly as you move up towards the root page (the magnitude change in size depends on the fanout of the index – I explain this in the post Inside sys.dm_db_index_physical_stats).

Important point: Whenever a page is being moved and there’s some portion of a structure on the page that doesn’t have a backlink to what is pointing to it, a scan is involved.

LOB values can be stored in-row or off-row (i.e. as part of the data or index record, or in a tree of text records on different pages altogether). Legacy LOB types (text, ntext, image) are stored off-row by default. New LOB types (varchar(max), nvarchar(max), XML, varbinary(max)) are stored in-row by default, up to a maximum size of 8000 bytes as long as there is space in the record.

When a LOB value is stored off-row, there is a complex pointer stored in the data or index row (called a blob root) that contains a pointer to the physical location of the top of the tree of text records that make up the LOB value, the size of the first record being pointed at, and a timestamp (not a time, a timestamp data-type value). The text record being pointed to also contains the same timestamp.

Let’s take a look:

CREATE TABLE TextTest (c1 INT, c2 TEXT);

GO

INSERT INTO TextTest VALUES (1, ‘a’);

GO

Now figure out the first page in the table using DBCC IND or my sp_AllocationMetadata script, or whatever, and run DBCC PAGE on it:

DBCC TRACEON (3604);

DBCC PAGE (TestDB, 1, 737, 3);

GO

<snip>

Slot 0 Column 0 Offset 0x4 Length 4

c1 = 1

c2 = [Textpointer] Slot 0 Column 1 Offset 0xf Length 16

TextTimeStamp = 131137536 RowId = (1:152:0)

If we follow the pointer to page (1:152) and run DBCC PAGE on it, we see:

DBCC TRACEON (3604);

DBCC PAGE (foo, 1, 152, 3);

GO

<snip>

Blob row at: Page (1:152) Slot 0 Length: 84 Type: 0 (SMALL_ROOT)

Small Blob Id: 131137536 Size:1

6855C074: 61 a

Notice that the two numbers in bold match. Notice also that there’s no backlink from the off-row LOB value back to the ‘owning’ data record.

When a text page is moved by data file shrink, a table scan must be performed for each LOB value to find the ‘owning’ data record so the pointer can be updated. (If the LOB value is from an INCLUDEd column in a nonclustered index, then a similar scan must be performed of the nonclustered index. If the LOB value is lower down in the text tree for a >8000 byte LOB value, all text pages for that table or index must be scanned.)

Very slow. Very, very slow.

There’s a similar issue if DBCC CHECKDB finds an orphaned text record – it can’t tell who is supposed to own it so it has to go back and rescan all tables in its current batch to figure it out – which can make DBCC CHECKDB take a lot longer than usual too.

During SQL Server 2005 development when the new LOB types were implemented, I pushed for a backlink to alleviate these problems, but the engineering effort wasn’t worth it. So we’re stuck with it.

Yet one more reason that shrink is nasty!

20 thoughts on “Why LOB data makes shrink run slooooowly (T-SQL Tuesday #006)”

Soren Agerbo says:

January 29, 2015 at 2:48 am

Hi Paul,
Thanks for a fantastic post about DBCC Shrinkfile.

Do you know is this still is an issue/behavior in SQL 2012 and 2014?

Reply
1. Paul Randal says:
  
  January 29, 2015 at 7:25 am
  
  Yes, it’s an issue in all versions.
  
  Reply
John Heimiller says:

September 28, 2015 at 10:58 am

Thanks…good explanation…its a problem I’m having right now with a big varchar(max) off-row LOB based table. wish that backlink was there right now….ugh.

Reply
Jeff says:

January 26, 2016 at 11:31 am

https://support.microsoft.com/en-us/kb/2967240

This KB predates your January 2015 comment, but I just wanted to double-check since they both come up when searching on this issue. The KB references the complete inability to shrink tables with LOB data, while this blog post details why shrinking takes forever. Is that correct?

Reply
1. Paul Randal says:
  
  February 5, 2016 at 2:20 pm
  
  Not quite – it’s saying that the LOB compaction portion of the shrink/reorganize commands isn’t working. LOB compaction aside, all a shrink does is move pages to the start of the file so the space at the end can be removed. So the shrink has always worked, albeit slowly, but the LOB compaction doesn’t. Hope that makes sense.
  
  Reply
Faceofit says:

July 26, 2016 at 6:16 am

Thanks for the wonderful post. I hope I am not too late. Now that SQL 2016 is out, Do you know if this is improved in SQL 2016?

Reply
1. Paul Randal says:
  
  July 31, 2016 at 1:31 pm
  
  Nope – no change in 2016
  
  Reply
John says:

May 10, 2017 at 7:30 pm

Paul,
So if I do an EMPTYFILE, will the process not move those orphaned LOB pages (albeit slowly). I would think it would inspect them like a shrink would…right?

I am hopeful because the table is now just a few hundred records and scanning it a hundred thousand times for all those orphaned LOB values might not be so bad.

Reply
1. Paul Randal says:
  
  May 11, 2017 at 9:28 am
  
  You mean from my CHECKDB comment? If there are orphaned LOB values in the database (i.e. corruption), shrink will stop IIRC.
  
  Reply
HUI SHI says:

December 29, 2017 at 2:27 pm

Thanks Paul for explaining this . Do you have any solution or workaround to shrink database files filled with LOB datatypes?

Reply
1. Paul Randal says:
  
  December 29, 2017 at 4:27 pm
  
  You can do alter index reorganize on the indexes/tables with LOB columns, and then do a shrink with truncateonly. Apart from that, no.
  
  Reply
  1. Hui Shi says:
    
    April 26, 2019 at 2:37 pm
    
    Is that possible to use the following solution to shrink database files filled with LOB datatypes:
    Step 1: create an extra FileGroup and name it as FileGroup_LOB
    Step 2: find those tables with LOB datatype and move these tables into the newly created FG via indexes recreating
    Step 3: once all LOB tables are moved to the new FG, shrink existing FG which does not have those LOB tables
    
    will this work?
    
    thanks
    
    Reply
    1. Paul Randal says:
      
      April 26, 2019 at 2:39 pm
      
      Unfortunately not as index recreation does’t move off-row LOB data.
      
      Reply
      1. LukeM says:
        
        June 28, 2021 at 7:12 am
        
        I am wondering if you tried this. We have a bunch of tables using the old type, ntext and text and the DB will not shrink, at least within a reasonable enough time. I tried recreating and repopulating the tables with TEXT_IMAGE on a new file group and also updated text to varchar(max). However left the indexes on primary, but no change in shrink performance. Deleting the tables, I was able to shrink at a faster rate. Will try moving indexes over to new file group next to see if that works but was just wondering if you had any luck.
      2. Paul Randal says:
        
        June 28, 2021 at 7:49 am
        
        It doesn’t matter what you do – if the table has LOB data of any kind in the file being shrunk, it will run slowly.
Stephen Morris says:

October 12, 2018 at 6:18 am

A colleague was attempting to shrink a db with clustered columnstore indexes and wasn’t making any progress, this article plus the 1 from Joe – https://www.sqlskills.com/blogs/joe/which-lob-pages-are-associated-with-a-segment/ makes it fairly clear why. It seems the impact of this by design limitation is getting bigger.

Reply
Philip Mattson says:

March 20, 2019 at 9:44 am

I added a request to Microsoft to address this a couple of years ago. Please upvote if you think it’s important:

https://feedback.azure.com/forums/908035-sql-server/suggestions/32898637-can-t-shrink-data-files-due-to-off-row-lob-in-larg

Reply
1. Paul Randal says:
  
  March 21, 2019 at 2:22 pm
  
  I upvoted, but don’t hold your breath as that would be a major rewrite of the text-handling code.
  
  Reply
Romain S. says:

July 2, 2019 at 6:14 am

Thanks Paul,

Will the behavior be the same with the LOB pages on a table that will have been dropped ?

Reply
1. Paul Randal says:
  
  July 2, 2019 at 10:29 am
  
  No – shrink only moves pages that are allocated.
  
  Reply

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Why LOB data makes shrink run slooooowly (T-SQL Tuesday #006)

20 thoughts on “Why LOB data makes shrink run slooooowly (T-SQL Tuesday #006)”

Leave a Reply Cancel reply

Other articles

2024: the year in books

2023 (not a typo!): the year in books

Black Friday Super Sale!

Summer School Savings SALE is live!

The Curious Case of… missing log information

The Curious Case of… missing 832 errors during memory corruption

Imagine feeling confident enough to handle whatever your database throws at you.