Inside the Storage Engine: Ghost cleanup in depth

Over the years I was in the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions (see these KB articles – 932115 and 815594) and there's very little info available on it. For some reason I didn't get around to posting about it on my old blog but today I want to go into some depth on it.

So what is ghost cleanup? It's a background process that cleans up ghost records – usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, actually it gets more complicated if snapshot isolation of some form is enabled but for now, a record in an index is a good start). Such a delete operation never physically removes records from pages – it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to rollback more quickly because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed – the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.

The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.

When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps – the PFS page (post coming soon!) – and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to cleanup – somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on – yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.

The ghost cleanup task doesn't just start up when it's told to – it starts up in the background every 5 seconds and looks for ghost records to cleanup. Remember that it won't be told to go cleanup a specific page by a delete operation – it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see if its been told to cleanup a page – if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to cleanup. It will check through or cleanup a limited number of pages each time it wakes up – I remember the limit is 10 pages – to ensure it doesn't swamp the system. So – the ghost records will eventually be removed – either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.

How can you tell its running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:

SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%'

SELECT @a = COUNT (*) FROM myexecrequests

END;

GO

SELECT * FROM myexecrequests;

GO

And on SQL Server 2000 you need to use sysprocesses (well, on SQL Server 2005 this works as well but its fake view derived from the DMVs):

SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'

SELECT @a = COUNT (*) FROM mysysprocesses

END;

GO

SELECT * FROM mysysprocesses;

GO

The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):

session_id request_id start_time status command
———- ———– ———————– ———— —————-
15 0 2007-10-05 16:34:49.653 background GHOST CLEANUP

So how can you tell if a record is ghosted? Let's engineer some and look at it with DBCC PAGE – I've stripped out the uninteresting bits of the output and highlighted the interesting ghost parts:

CREATE TABLE t1 (c1 CHAR(10))

CREATE CLUSTERED INDEX t1c1 on t1 (c1)

GO

BEGIN TRAN

INSERT INTO t1 VALUES ('PAUL')

INSERT INTO t1 VALUES ('KIMBERLY')

DELETE FROM t1 WHERE c1='KIMBERLY';

GO

DBCC IND ('ghostrecordtest', 't1', 1);

GO

DBCC TRACEON (3604);

GO

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130    m_reservedCnt = 0   m_lsn = (20:88:20)
m_xactReserved = 0  m_xdesId = (0:518)  m_ghostRecCnt = 1
m_tornBits = 0

<snip>

Slot 0 Offset 0x71 Length 17

Record Type = GHOST_DATA_RECORD      Record Attributes = NULL_BITMAP
Memory Dump @0x6256C071

00000000:   1c000e00 4b494d42 45524c59 20200200 †….KIMBERLY ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = KIMBERLY

Slot 1 Offset 0x60 Length 17

Record Type = PRIMARY_RECORD         Record Attributes = NULL_BITMAP
Memory Dump @0x6256C060

00000000:   10000e00 5041554c 20202020 20200200 †….PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 1 Column 1 Offset 0x4 Length 10

c1 = PAUL

Let's see what goes on the transaction log during this process (remember this is undocumented and unsupported – do it on a test database) – I've stripped off a bunch of the columns in the output:

DECLARE @a CHAR (20)

SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name]='PaulsTran'

SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;

GO

Current LSN              Operation         Context             Transaction ID
———————— —————– ——————- ————–
00000014:00000054:0011   LOP_BEGIN_XACT    LCX_NULL            0000:00000206
00000014:0000005a:0012   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0013   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0014   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206
00000014:0000005a:0016   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206

So there are the two inserts followed by the two deletes – with the rows being marked as ghost records. But where's the update to the PFS page? Well, changing the ghost bit in a PFS page is not done as part of a transaction. We'll need to look for it another way (apart from just dumping everything in the transaction log and searching manually):

SELECT Description, * FROM fn_dblog (null, null) WHERE Context like '%PFS%' AND AllocUnitName like '%t1%';

GO

Description               Current LSN              Operation        Context   Transaction ID
————————- ———————— —————- ——— —————-
Allocated 0001:0000008f   00000014:00000054:0014   LOP_MODIFY_ROW   LCX_PFS   0000:00000208
                          00000014:0000005a:0015   LOP_SET_BITS     LCX_PFS   0000:00000000

The first one is just allocating a page but the second one is the one we're looking for – it's changed the bit for the page to say it has ghost records on. Let's commit the transaction and see what happens, filtering out all the previous transaction log:

SELECT MAX ([Current LSN]) FROM fn_dblog (null, null);

GO

— 00000014:0000005e:0001

COMMIT TRAN

GO

SELECT [Page ID], * FROM fn_dblog (null, null) WHERE [Current LSN] > '00000014:0000005e:0001';

GO

Page ID         Current LSN              Operation          Context         Transaction ID
————— ———————— —————— ————— ————–
NULL            00000014:0000005f:0001   LOP_COMMIT_XACT    LCX_NULL        0000:00000206
0001:0000008f   00000014:00000060:0001   LOP_EXPUNGE_ROWS   LCX_CLUSTERED   0000:00000000

We see that almost as soon as the transaction has commited, the ghost cleanup task goes in and process the page. Let's check a page dump to make sure the record is gone, and show that the contents of the record are still on the page (again, with non-relevant bits snipped out):

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130         m_reservedCnt = 0        m_lsn = (20:94:1)
m_xactReserved = 0       m_xdesId = (0:518)       m_ghostRecCnt = 0
m_tornBits = 0

<snip>

Record Type = PRIMARY_RECORD         Record Attributes = NULL_BITMAP
Memory Dump @0x6212C060

00000000:   10000e00 5041554c 20202020 20200200 †….PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = PAUL

DBCC PAGE ('ghostrecordtest', 1, 143, 2);

GO

<snip>

6212C040:   01000000 00000000 00000000 00000000 †…………….
6212C050:   00000000 00000000 00000000 00000000 †…………….
6212C060:   10000e00 5041554c 20202020 20200200 †….PAUL      ..
6212C070:   fc1c000e 004b494d 4245524c 59202002 †…..KIMBERLY .
6212C080:   00fc0000 00000000 00000000 01000000 †…………….
6212C090:   00000000 13000000 01000000 00000000 †…………….

<snip>

So even though the record no longer exists, all that happened was that the slot was removed from the slot array at the end of the page – the record contents will remain on the page until the space is reused.

In the next post I'll go into details of the PFS and other allocation maps. Btw – please let me know if this stuff is interesting – I'd like to know where to spend blogging time. Thanks!

Posted Under:

Example Scripts, Inside the Storage Engine, On-Disk Structures, Transaction Log, Undocumented commands

55 thoughts on “Inside the Storage Engine: Ghost cleanup in depth”

GarbageCollector says:

January 4, 2008 at 2:33 pm

Yes, this stuff is interesting. Please keep posting!!

Reply
Praveen Voleti says:

October 7, 2008 at 6:18 pm

Absolutely Awesome explanation. Thank you very much.

Reply
Vaibhav Bapat says:

January 28, 2009 at 5:14 am

Hi all,
The above article extremely excellent, it clear all doubt in ghost cleanup process without doing any question. This article focus on different angle to solve and understand different question raise against it

Thank a lot
Vaibhav Bapat

Reply
ladnar luap says:

March 25, 2009 at 4:00 pm

"The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages." – this is not true in SQL Server 2005 or later version. Unless this is the first page maintained in system table, a page could be de-allocated by ghost cleanup.

Reply
paul says:

March 25, 2009 at 4:07 pm

Yes, I think you’re right. That was certainly the case in 2000. But why post the comment with the obvious fake name?

Reply
Amy says:

October 12, 2009 at 3:05 pm

Very well articulated!!! (I know you must get that a lot )
But thoroughly enjoyed reading this

Reply
ERIC says:

April 23, 2013 at 12:29 am

Paul ,may I know why heap doesn’t support ghost records ?

Reply
1. Paul Randal says:
  
  April 24, 2013 at 10:44 am
  
  It doesn’t need them as there’s no way that a duplicate record being inserted before the deleting transaction rolls back could cause the rollback to fail a uniqueness test, or cause a page to split.
  
  Reply
  1. ERIC says:
    
    May 10, 2013 at 3:48 am
    
    thank you Paul , and sorry for my late reply.
    But I cannot understand for your explanation, can you kind of helping to explain it more detail or give a example?
    
    Reply
Gurvinder says:

June 9, 2014 at 1:09 pm

We are in a situation where we have very busy OLTP system which is doing lot of deletes and then inserts,
the ghost cleanup task cannot keep up, its always lagging behind and we are facing lot of performance problem.

Is there a way to solve performance problem. We dont want to rebuid or reindex table.

ALso if Ghost cleanup task is executed every 10 seconds.
What is the possiblity that previous Ghost clean up task is still running and new one is started, overlapping each other.

Any help is appreciated.

Thanks.

Reply
1. Paul Randal says:
  
  June 10, 2014 at 8:31 am
  
  No possibility that the other one is still running as it only runs on one thread.
  
  No way to solve your perf issue except to manually force ghost record removal by doing something like DBCC FORCEGHOSTCLEANUP or changing your schema so the new inserts are not going into the same place in the index as the deleted records that haven’t been ‘removed’ yet by ghost cleanup.
  
  Reply
Mithalesh Gupta says:

January 19, 2015 at 7:12 am

Beautifully and elaborately explained.

Thank you
Mithalesh

Reply
Altaf says:

September 21, 2015 at 12:13 pm

Hi Paul,

First of all thanks for this wonderful that helps us to server of job better.

I have a questions which not exactly on this topic but very much related to this. If i reduce table rows from 1 million to assume 200 for a table A any new record into this table will reuse the space and how other table will it reuse the space released from table A?

Reply
1. Paul Randal says:
  
  October 1, 2015 at 3:05 pm
  
  When the ghosted records are removed, if a page becomes empty it is deallocated from the table. Any deallocated pages in extents allocated to the table can then be reused by that table. If the deallocated table is the only one allocated in an extent, the whole extent is deallocated and can then be allocated to a different table to be used.
  
  Reply
Arun says:

September 16, 2016 at 3:33 am

Hi Paul is LOP_EXPUNGE_ROWS the ghost clean-up taking place ( after the LOP_COMMIT_XACT ) ?

Reply
1. Paul Randal says:
  
  September 16, 2016 at 8:43 am
  
  Yes
  
  Reply
  1. Arun says:
    
    September 19, 2016 at 12:21 am
    
    ok thanks
    
    Reply
sateesh says:

December 9, 2016 at 8:24 am

Thank you Paul

Reply
Dave says:

March 13, 2018 at 10:52 am

Hi Paul

Great article as always. I have a question. We’re seeing an issue with Ghost Cleanup on our 2014 Enterprise server. It seems to be on one table where a lot of deletes are occurring. using fn_dblog I can see millions of entries for ghost cleanup against the two indexes on this table and the transaction log keeps growing to hundreds of gigabytes in a short period. The main issue seems to be on the clustered index (where were seeing most of the ghost deletes) which is a wide clustered index with 9 columns and data types of bigint, bigint, nvarchar(3), datetime, nvarchar(20), dec(32,16), dec(32,16),nvarchar(20), datetime

There is a separate non clustered primary index against a bigint field.

I’ve been working with SQL for 20 years now and this is the first time I’ve come across this issue. Could this simply be down to the clustered index being so wide?

This is a third party table written by an outside company used for data extracts from our AX system.

Cheers
Dave

Reply
1. Paul Randal says:
  
  March 13, 2018 at 4:07 pm
  
  Is it the deletes themselves or the ghost cleanup that’s causing the log growth? If the latter, it could be that the ghost cleanup is being triggered in one big batch (various reasons for this) in which case you might want to disable it with trace flag 661 (https://support.microsoft.com/en-us/help/920093/tuning-options-for-sql-server-when-running-in-high-performance-workloa) and rely on regular index maintenance to reclaim the space from the deleted records. If the former, the deletes could be done in small batches to allow the log to not grow so much. Also, make sure that nothing is preventing the log from being cleared and that log backups are being done at a frequency that prevents the log growing so large.
  
  Reply
tong says:

April 28, 2018 at 2:50 am

Hi Paul:
Thank you very much for your article.i am bothered by this problem.
In 10 minutes the sqlserver log flush becomed higher, and write/sec was close to 30000 and write bytes/sec was 100M+/sec ,the pcie disk io is full.
I’m sure that the server query was normal and there was no abnormal data file growthf at that moment，but the log file was doubled than usual.so i used dump_log() functions to do statistics on logs，i found lots of ‘ShrinkD’ 、’ghostcleanuptask’、 ‘BTree Split/Shrink’ in that moment.

the results:
select distinct [Transaction Name] ,count(*) from [t1].[dbo].split0426 where [Begin Time]>’2018/04/26 09:36:00′ and [Begin Time]<'2018/04/26 09:55:00' group by [Transaction Name] order by count(*) desc
ShrinkD 957691
INSERT 161702
BTree Split/Shrink 110077
UPDATE 107921
SplitPage 18663
implicit_transaction 16592
GhostCleanupTask 14072
user_transaction 245
MERGE 219
AllocHeapPageSysXactDML 143
DELETE 43
AllocPages 22

select convert(varchar(16),[Begin Time]) as min, count(*) as counts from [t1].[dbo].split0426 where [Transaction Name]='GhostCleanupTask'
group by convert(varchar(16),[Begin Time]) order by convert(varchar(16),[Begin Time])
2018/04/26 09:31 4
2018/04/26 09:32 4
2018/04/26 09:33 4
2018/04/26 09:34 4
2018/04/26 09:35 4
2018/04/26 09:36 946
2018/04/26 09:37 525
2018/04/26 09:38 576
2018/04/26 09:39 534
2018/04/26 09:40 654
2018/04/26 09:41 717
2018/04/26 09:42 1028
2018/04/26 09:43 539
2018/04/26 09:44 1317
2018/04/26 09:45 1229
2018/04/26 09:46 1088
2018/04/26 09:47 1837
2018/04/26 09:48 1212
2018/04/26 09:49 329
2018/04/26 09:50 454
2018/04/26 09:51 592
2018/04/26 09:52 480
2018/04/26 09:53 7
2018/04/26 09:54 8
2018/04/26 09:55 7
select convert(varchar(16),[Begin Time]) as min, count(*) as counts from [t1].[dbo].split0426 where [Transaction Name]='ShrinkD'
group by convert(varchar(16),[Begin Time]) order by convert(varchar(16),[Begin Time])
2018/04/26 09:36 75247
2018/04/26 09:37 22963
2018/04/26 09:38 20069
2018/04/26 09:39 17273
2018/04/26 09:40 37594
2018/04/26 09:41 29922
2018/04/26 09:42 33271
2018/04/26 09:43 17801
2018/04/26 09:44 34105
2018/04/26 09:45 13176
2018/04/26 09:46 173689
2018/04/26 09:47 113353
2018/04/26 09:48 241987
2018/04/26 09:49 5350
2018/04/26 09:50 5058
2018/04/26 09:51 55132
2018/04/26 09:52 61694
2018/04/26 09:53 3
2018/04/26 09:54 4
2018/04/26 09:55 3
2018/04/26 09:56 5

some configurations:
windows server2008 sp2
sqlserver 2014
database target_recovery_time = 60 seconds
database auto shrink and index auto shrink false
backup log per hour
physical disk:pcie ssd 3.0T used 70%；

other operations:
Recently clean up the useless index, organize index fragmentation,
increase the automatic archiving data operation, about 500w daily backup and delete；

my question：
Is GhostCleanupTask the cause of the problem?
what's the shrinkd in logs? shrink database?
how was it triggered?
why did the database engine use so high io without any protection?

thanks very much,all the best.

Reply
1. Paul Randal says:
  
  April 28, 2018 at 8:51 am
  
  Looks like a) you’ve got auto-shrink enabled for the database, or there’s an Agent job doing shrink b) you’ve had a lot of deletes, and the shrink ‘touched’ a lot of the pages with deleted records on so they got queued up for the ghost cleanup task to take care of. You need to fix a).
  
  Reply
  1. tong says:
    
    May 1, 2018 at 2:23 am
    
    Hello, Paul.
    I checked the database. The database and index AUTO_SHRINK set OFF, and there was no shrinking job at that time.
    Why would there be so many ghostcleanup tasks suddenly, this is what I do not understand. I will continue to observe.
    Thank you very much.
    
    Reply
    1. Paul Randal says:
      
      May 1, 2018 at 5:05 am
      
      Well someone or something started a shrink, as shrinkd transactions only come from a manual or automatic shrink.
      
      Reply
      1. tong says:
        
        May 1, 2018 at 3:39 pm
        
        Thank you for your article and reply, which made us understand a lot of knowledge that the official website did not have. It’s a great thing. I’ll check it again. Thank you again.
      2. tong says:
        
        May 4, 2018 at 12:13 am
        
        Hello, Paul.
        I seem to have found the answer.When it happened again,i set my server dbcc traceon(661,-1),the server resumed immediately.
        And why would there be so many ghostcleanup tasks suddenly without any shrink, it seems:if the number of hot ghost pages is above the specified (hard-coded) limit,the task runs non-stop till the count comes down below the threshold value.https://zh.scribd.com/document/323838917/Ghost-Records-in-SQL-Server
        Now i resumed ghostcleanup tast auto run,and i will add a “dbcc forceghostcleanup” job after the deleted job,and keep looking at it
        Thanks very much.
Jon S says:

May 2, 2018 at 12:27 pm

Very informative blog, thank you.
Quick question , does every deleted page must go through “Ghost Cleanup” process before it’s re-allocated?

Reply
1. Paul Randal says:
  
  May 2, 2018 at 12:38 pm
  
  The following have to go through the ghost cleanup process when they’re deleted: all index records, all data records in a clustered index, all data records in a heap when snapshot isolation is enabled.
  
  Reply
Cody says:

May 2, 2018 at 11:12 pm

What happens when the ghost clean-up can’t complete (for example if the records are locked on a readable secondary and are being queried). Does it exit immediately and try again after 5s when it gets spun up again or does it sit there waiting for the locks to be released?

Reply
1. Paul Randal says:
  
  May 3, 2018 at 8:20 am
  
  That scenario isn’t possible. Read activity on a readable-secondary doesn’t cause any locking/blocking on the primary (or have any effect at all on the primary). But if the ghost cleanup task can’t get the page lock it needs, it’ll skip cleaning that page.
  
  Reply
Paul Randal says:

May 4, 2018 at 7:58 am

Tong – something is reading the pages that have had a large number of deletes on them, and that is triggering ghost cleanup. There isn’t a threshold, there’s a queue of pages with known ghost records, that is added to when a page with ghost records is encountered. If there are a lot of such pages, the ghost cleanup task will clean up everything on it’s queue next time it wakes up, rather than just searching for ghost records to clean up.

Reply
1. tong says:
  
  May 4, 2018 at 2:41 pm
  
  If so,when the ghost cleanup task clean up everything on it’s queue,it seems that there is not the limit of 10 pages?…
  
  Reply
  1. Paul Randal says:
    
    May 4, 2018 at 2:43 pm
    
    Correct
    
    Reply
    1. tong says:
      
      May 4, 2018 at 4:03 pm
      
      Ok,but why not…If there is a limit,the system will be better…
      
      Reply
      1. Paul Randal says:
        
        May 4, 2018 at 6:00 pm
        
        Because that’s just the way it was written.
tong says:

May 4, 2018 at 11:34 pm

Ok,i see.Really appreciate all your help:)

Reply
Sam Greene says:

July 5, 2018 at 1:58 pm

Paul,

I’ve got a table with one row that is updated frequently by a vendor application, table is called QRTZ_FIRED_TRIGGERS and used by Java Quartz module as part of a vendor supported application. I am seeing a high version_ghost_record_count. If I don’t rebuild the index on this table often, our transactions per second and CPU eventually climb and the SQL Instance performance becomes poor. Read committed snapshot is set to on. Have you written about ghost records and snapshot isolation since this article? I see the ghost record cleanup in this database and have also tried to force a full scan, but have no luck getting this table to play nice.

Thanks,
Sam

Reply
1. Paul Randal says:
  
  July 5, 2018 at 2:11 pm
  
  Hi Sam,
  
  Several options here:
  1) disable snapshot isolation
  2) rebuild frequently
  3) identify the long-running versioning query that’s preventing version store cleanup and ghost removal
  
  My guess is you’re going to be stuck with #2.
  
  Thanks
  
  Reply
  1. Sam says:
    
    July 11, 2018 at 4:59 pm
    
    Paul,
    
    Thanks for your advice. I’m unable to identify any query running. My tempdb version store is also massive (at 155GB). Need to figure out how to get that issue under control as well. Hoping an app restart will let the cleanup happen.
    
    One thing I don’t understand is why having those version ghost records causes transactions (tps perfmon counter) to ramp up over time. Is the engine having to scan all pages and check to see if they are marked as deleted? If I stop rebuilds, the TPS gets up to 30-40k/s and we start to see system-wide impact.
    
    Thanks Again,
    
    Sam
    
    Reply
    1. Paul Randal says:
      
      July 12, 2018 at 3:36 pm
      
      Hi Sam – I’m afraid it’s impossible to say without digging in deeper to see what’s going on on the system.
      
      Reply
  2. Vladimir says:
    
    October 29, 2018 at 10:35 am
    
    I’ve also encountered some version_ghost_record_count issues before, but they were all caused by a very long-running transactions (for hours and even days sometimes), and everything would come in order as soon as those transactions were gone. But today I faced something new and I’m at a loss: sys.dm_tran_active_snapshot_database_transactions is empty, and there are no long-living (more than a minute or so) transactions active on the server at all, but i’ve got millions of version ghost records in a frequently rewritten table (a single user process routinely inserts about 1000 short rows, processes them, and then deletes them all) , and I could deduce that such beahior lasts for hours already. What else, besides long-running queries could be the reason for this?
    
    Reply
    1. Paul Randal says:
      
      October 29, 2018 at 10:49 am
      
      It could be that nothing else has read those pages (so they’re not entered into the to-do list) and the ghost cleanup task is working through another database or portion of this one.
      
      Reply
Max Vernon says:

October 15, 2018 at 7:57 am

Does a checkpoint need to occur before ghost cleanup runs? Specifically, If I delete rows for a clustered index, then run DBCC CLEANPAGE against pages in the index, is it *required* to run a checkpoint before the DBCC CLEANPAGE in order for it to be effective?

Reply
1. Paul Randal says:
  
  October 16, 2018 at 12:32 pm
  
  No – checkpoints have nothing to do with the ghost cleanup task’s ability to remove deleted records.
  
  Reply
Bertrand says:

June 4, 2019 at 2:41 am

Hello Paul,

nice explanation, I really appreciate it, thank you very much.

Is it possible ghost cleanup process would be triggered by change tracking auto cleanup ?

At this moment, we have an issue that seems corresponding to all issues described above.

Thanks a lot.

Bertrand.

Reply
1. Paul Randal says:
  
  June 5, 2019 at 4:40 pm
  
  Only on the change tracking tables themselves.
  
  Reply
Brian Beuning says:

July 27, 2021 at 12:05 pm

Does an update that changes indexed columns cause ghost records?

Reply
1. Paul Randal says:
  
  July 27, 2021 at 12:25 pm
  
  Yes, because an update to an indexed column means a delete+insert instead of an update.
  
  Reply
Guru says:

December 4, 2024 at 5:35 am

What is the impact of the ghost cleanup process on table fragmentation? If none, are there any processes that automatically reduce table fragmentation in SQL Server?

Reply
1. Paul Randal says:
  
  December 10, 2024 at 2:30 pm
  
  None – apart from making from the deleted row actually reusable. No – there’s nothing to automatically reduce index fragmentation.
  
  Reply
Jorge says:

April 29, 2025 at 5:17 am

Hi. Having a issue with an extreamly busy Service Broker Queue, probably due to the Ghost Cleanup not being able to keep up to mantain the queue empty.
As you know SB is continually inserting rows in a queue which at the end uses an ordinal table under the covers.
Every receive issued is actually an “delete …. output” over the base table.
So we have reached a point (rougly 2000 events/second of an average of 2 kb each) where the table backing up the queue cannot be cleared an begins to growth in size, in spite of keeping a relative low number of records.
I think this space comes from the pages not recovered by the Ghost Cleanup and with every consecuent receive, the storage engine has to read more physical pages (fragmentation) to get the events.
So we reach a point where we can only rebuild the queue but this causes lots of locks.
Reading deeply your explanation about the Ghost Cleanup is clear that splitting the queue even in different databases will not help and probably we have topped the maximun insert/delete thoughput that SQL can keep in a single table concurrently?.

Reply
1. Paul Randal says:
  
  April 29, 2025 at 11:03 am
  
  Add a job that once a minute does a select count(*) with nolock of the queue. That will read every page and enter them in the ghost cleanup task queue so they’ll be cleaned up very quickly. Same issue as for a very busy system with an asp.net session state tracking table we’ve seen in the past. Hope that helps.
  
  Reply
  1. Jorge says:
    
    April 30, 2025 at 12:02 am
    
    Thank you so much for your response, Paul. I’ve been following you for almost 20 years, and I’m amazed at how you continue to give your time to the community.
    
    I’ve done what you said, and I don’t think it worked. The table size, as reported by sys.dm_exec_partition_stats, keeps growing.
    There’s one nuance I haven’t mentioned. There’s another database in the same instance that has an identical event handling system with Service Broker ant the same rate of events.
    Well, I’ve tried to take a trace to monitor Ghost Cleanup activity, and I see that it spends much more time in that other database. Furthermore, since it has the same number of events, there’s no problem with the queue size there, and it remains much more stable.
    
    Could it be that Ghost Cleanup starts its work in that database and spends a more time, “stucked” in that first database?
    
    Reply
    1. Paul Randal says:
      
      April 30, 2025 at 9:23 am
      
      Make sure the select is forcing the clustered index using WITH (INDEX(1)). If that doesn’t work, we’d need to look at the system to see what’s going on.
      
      Reply
      1. Jorge says:
        
        May 14, 2025 at 4:51 am
        
        Sorry for the delay, I have been out of the office for a few days.
        I tried running the query with the hint and same result. Size of the queue keeps growing.
        
        I will inform my manager and contact you via email to see if you or someone on your team can help.
        
        Thanks Paul.

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Inside the Storage Engine: Ghost cleanup in depth

Posted Under:

55 thoughts on “Inside the Storage Engine: Ghost cleanup in depth”

Leave a Reply Cancel reply

Other articles

Presenting a workshop at PASS Summit West in Seattle!

Summer Sale: $699 Blackbelt Bundle!

The SQL Server Transaction Log, Part 4: Log Records

SQL101: Application data consistency checking

SQL101: Phishing attacks

SQL101: Top Ten SQL Server Performance Tuning Best Practices

Imagine feeling confident enough to handle whatever your database throws at you.

Inside the Storage Engine: Ghost cleanup in depth

Check It Out!

SQLskills Blackbelt Bundle and 2022 Updates - 1 Year Access

Posted Under:

55 thoughts on “Inside the Storage Engine: Ghost cleanup in depth”

Leave a Reply Cancel reply

Other articles

Presenting a workshop at PASS Summit West in Seattle!

Summer Sale: $699 Blackbelt Bundle!

The SQL Server Transaction Log, Part 4: Log Records

SQL101: Application data consistency checking

SQL101: Phishing attacks

SQL101: Top Ten SQL Server Performance Tuning Best Practices

Imagine feeling confident enough to handle whatever your database throws at you.