Earlier this morning I noticed a discussion on the SQL MCM distribution list (that all the original MCM instructors are part of) that was trying to make sense of a huge disparity between tempdb data file usage and log file usage. I explained the answer and thought I'd share it with you all too.

The situation was a tempdb where the data files grew from 30GB to 120GB, running out of space on the disk, but the tempdb log file did not grow at all from its initial size of 1GB! How could that be?

One of the things to consider about tempdb is that logging in tempdb is ultra-efficient. Log records for updates in tempdb, for instance, only log the before image of the data instead of logging both before and after images. There is no need to log the after image – as that is only used for the REDO portion of crash recovery. As tempdb never gets crash-recovered, REDO never occurs. The before image *is* necessary, however, because transactions can be rolled back in tempdb, just like other databases, and so the before image of an update must be available to be able to successfully roll back the update.

Getting to the question though, I can easily explain the observed behavior by considering how a sort spill happens with tempdb.

I can simulate this using a gnarly query on the SalesDB database you can download from our resources page (see the top of the page for the sample databases to download).

I'm going to do a join of my Sales and Products tables and then sort the multi-million row result set by product name:

SELECT S.*, P.* from Sales S
JOIN Products P ON P.ProductID = S.ProductID
ORDER BY P.Name;
GO 

The query plan for this is (using Plan Explorer):

tempdbquery2 Understanding data vs log usage for spills in tempdb

I know that the sort is going to spill out of memory into tempdb in this case. First I checkpoint tempdb (to clear out the log) and then after running the query, I can analyze the transaction log for tempdb.

Looking at the operation in the log:

SELECT
     [Current LSN],
     [Operation],
     [Context],
     [Transaction ID],
     [Log Record Length],
     [Description]
FROM fn_dblog (null, null);
GO

Current LSN            Operation       Context  Transaction ID Len Description
———————- ————— ——– ————– — ———————————————————-
000000c0:00000077:0001 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4d  120 sort_init;<snip>
000000c0:00000077:0002 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4e  132 FirstPage Alloc;<snip>
000000c0:00000077:0003 LOP_SET_BITS    LCX_GAM  0000:00005e4e  60  Allocated 1 extent(s) starting at page 0001:0000aa48
000000c0:00000077:0004 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4e  88  Allocated 0001:0000aa48;Allocated 0001:0000aa49;
000000c0:00000077:0005 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4d  80  Allocated 0001:00000123
000000c0:00000077:0006 LOP_FORMAT_PAGE LCX_IAM  0000:00005e4d  84               
000000c0:00000077:0007 LOP_SET_BITS    LCX_IAM  0000:00005e4e  60               
000000c0:00000077:0009 LOP_COMMIT_XACT LCX_NULL 0000:00005e4e  52               
000000c0:00000077:000a LOP_BEGIN_XACT  LCX_NULL 0000:00005e4f  128 soAllocExtents;<snip>
000000c0:00000077:000b LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa50
000000c0:00000077:000c LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa50;Allocated 0001:0000aa51;<snip>
000000c0:00000077:000d LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:000e LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa58
000000c0:00000077:000f LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa58;Allocated 0001:0000aa59;<snip>
000000c0:00000077:0010 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0011 LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa60
000000c0:00000077:0012 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa60;Allocated 0001:0000aa61;<snip>
000000c0:00000077:0013 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0014 LOP_COMMIT_XACT LCX_NULL 0000:00005e4f  52               
000000c0:00000077:0015 LOP_BEGIN_XACT  LCX_NULL 0000:00005e50  128 soAllocExtents;<snip>
000000c0:00000077:0016 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa68
000000c0:00000077:0017 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa68;Allocated 0001:0000aa69;<snip>
000000c0:00000077:0018 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0019 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa70
000000c0:00000077:001a LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa70;Allocated 0001:0000aa71;<snip>
000000c0:00000077:001b LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001c LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa78
000000c0:00000077:001d LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa78;Allocated 0001:0000aa79;<snip>
000000c0:00000077:001e LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001f LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa80
000000c0:00000077:0020 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa80;Allocated 0001:0000aa81;<snip>
000000c0:00000077:0021 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0022 LOP_COMMIT_XACT LCX_NULL 0000:00005e50  52               
000000c0:00000077:0023 LOP_BEGIN_XACT  LCX_NULL 0000:00005e51  128 soAllocExtents;<snip>

<snip>

000000cd:00000088:01d3 LOP_SET_BITS    LCX_GAM  0000:000078fc  60  Deallocated 1 extent(s) starting at page 0001:00010e50
000000cd:00000088:01d4 LOP_COMMIT_XACT LCX_NULL 0000:000078fc  52               
000000cd:00000088:01d5 LOP_BEGIN_XACT  LCX_NULL 0000:000078fd  140 ExtentDeallocForSort;<snip>
000000cd:00000088:01d6 LOP_SET_BITS    LCX_IAM  0000:000078fd  60               
000000cd:00000088:01d7 LOP_MODIFY_ROW  LCX_PFS  0000:000078fd  88  Deallocated 0001:00010e68;Deallocated 0001:00010e69;<snip>
000000cd:00000088:01d8 LOP_SET_BITS    LCX_GAM  0000:000078fd  60  Deallocated 1 extent(s) starting at page 0001:00010e68
000000cd:00000088:01d9 LOP_COMMIT_XACT LCX_NULL 0000:000078fd  52               
000000cd:00000088:01da LOP_MODIFY_ROW  LCX_PFS  0000:00005fac  80  Deallocated 0001:00000109
000000cd:00000088:01db LOP_SET_BITS    LCX_SGAM 0000:00005fac  60  ClearBit 0001:00000108
000000cd:00000088:01dc LOP_SET_BITS    LCX_GAM  0000:00005fac  60  Deallocated 1 extent(s) starting at page 0001:00000108
000000cd:00000088:01dd LOP_COMMIT_XACT LCX_NULL 0000:00005fac  52               

(I snipped out a few extraneous log records plus the 6 extra 'Allocated' and 'Deallocated' for each of the PFS row modifications.) 

One of the things I notice is that the sort spill space is allocated in extents, and almost the entire sort – from initialization, through allocating all the extents, to deallocating them – is contained in a few very large transactions. But the transactions aren't actually that large.

Look at the soAllocExtents transaction with Transaction ID 00005e50. It's allocating 4 extents – i.e. 256KB – in a single system transction (4 x mark an extent as unavailable in the GAM, 4 x bulk set the 8 PFS bytes for the 8 pages in the extent, 4 x mark an extent allocated in the IAM). The total size of the log records for this transaction is 1012 bytes. (The first soAllocExtents system transaction only allocates 3 extents, all the others allocate 4 extents.)

When the sort ends, the extents are deallocated one-at-a-time in system transactions called ExtentDeallocForSort. An example is the transaction with Transaction ID 000078fd. It generates log records totalling 400 bytes. This means each 256KB takes 4 x 400 = 1600 bytes to deallocate.

Combining the allocation and deallocation operations, each 256KB of the sort that spills into tempdb generates 2612 bytes of log records.

Now let's consider the original behavior that I explained. If the 90GB was all sort space:

  • 90GB is 90 x 1024 x 1024 = 94371840KB, which is 94371840 / 256 = 368640 x 256KB chunks.
  • Each 256KB chunk takes 2612 bytes to allocate and deallocate, so our 90GB would take 368640 x 2612 = 962887680 bytes of log, which is 962887680 / 1024 / 1024 = ~918MB of log.

And this would explain the observed behavior. 90GB of tempdb space can be allocated and used for a sort spill with roughly 918MB of transaction log, give or take a bit from my rough calculations.

Tempdb logs things very efficiently – especially things that spill out of memory. The next stop in debugging such a problem would be regularly capturing the output of sys.dm_db_task_space_usage to figure out who is using all the space and then digging in from there.

Hope this helps explain things!