Using fn_dblog, fn_dump_dblog, and restoring with STOPBEFOREMARK to an LSN

(Check out my Pluralsight online training course: SQL Server: Logging, Recovery, and the Transaction Log.)

I’ve blogged a bunch about using the undocumented fn_dblog function I helped write (and I’ve got a lot more to come :-) but here’s one I haven’t mentioned on my blog before: fn_dump_dblog (although I did talk about it at conferences last year).

Here’s a scenario: someone dropped a table and you want to find out when it happened and maybe who did it. The default trace has also wrapped so you can’t get the DDL trace from there anymore.

If the log records for the DROP haven’t yet been cleared from the active portion of the log, you’d be able to use fn_dblog to search through the log for the information you need. You might even be able to look in the inactive portion of the log by using trace flag 2536, which instructs the log reader to ignore the log truncation point and dump all possible log records from the log.
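As a quick illustration, here’s roughly what that looks like (a minimal sketch, assuming the DROP’s log records are still physically in the log file, just beyond the truncation point):

-- Allow the log reader to go past the log truncation point
DBCC TRACEON (2536);
GO

-- Search the whole log for the DROP transaction
SELECT [Current LSN], [Operation], [Transaction ID], [Transaction Name]
FROM fn_dblog (NULL, NULL)
WHERE [Transaction Name] LIKE '%DROPOBJ%';
GO

DBCC TRACEOFF (2536);
GO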

But what do you do if the pertinent log records just don’t exist in the log anymore? They’re only in your log backups. You could tediously inch your way through restoring the log backups a few seconds at a time until you find the point at which the DROP took place, and then restore to just before that point so you can get the data back.

Or you could save a whole ton of time and use fn_dump_dblog which allows you to dump and search log records from a log backup file, without having to restore the database!

Edit 8/15/13: Beware – Jonathan just found out from a customer system that uses this extensively that every time fn_dump_dblog is called, it creates a new hidden SQLOS scheduler and up to three threads, which will not go away (and will not be reused) until a server restart. It’s a bug that the SQL team is going to fix now that we’ve alerted them to it. Use with caution.

Edit 5/15/15: It’s fixed in SQL Server 2012 SP2+ and SQL Server 2014. The fix won’t be backported any earlier.

Finding a DROP in the log

Here’s an example – I’m going to create a table, populate it, back it up, then drop it.

USE [master];
GO

CREATE DATABASE [FNDBLogTest];
GO
USE [FNDBLogTest];
GO
SET NOCOUNT ON;
GO

-- Create tables to play with
CREATE TABLE [ProdTable] (
    [c1] INT IDENTITY,
    [c2] DATETIME DEFAULT GETDATE (),
    [c3] CHAR (25) DEFAULT 'a');

CREATE TABLE [ProdTable2] (
    [c1] INT IDENTITY,
    [c2] DATETIME DEFAULT GETDATE (),
    [c3] CHAR (25) DEFAULT 'a');
GO

INSERT INTO [ProdTable] DEFAULT VALUES;
GO 1000

-- Take initial backups
BACKUP DATABASE [FNDBLogTest] TO DISK = N'D:\SQLskills\FNDBLogTest_Full.bak' WITH INIT;
GO
BACKUP LOG [FNDBLogTest] TO DISK = N'D:\SQLskills\FNDBLogTest_Log1.bak' WITH INIT;
GO

INSERT INTO [ProdTable2] DEFAULT VALUES;
GO 1000

Now I’ll drop the table and add some more log records:

DROP TABLE [ProdTable];
GO

INSERT INTO [ProdTable2] DEFAULT VALUES;
GO 1000

Now how can I find the point at which the table was dropped?

SELECT
    [Current LSN],
    [Operation],
    [Context],
    [Transaction ID],
    [Description]
FROM
    fn_dblog (NULL, NULL),
    (SELECT
        [Transaction ID] AS [tid]
    FROM
        fn_dblog (NULL, NULL)
    WHERE
        [Transaction Name] LIKE '%DROPOBJ%') [fd]
WHERE
    [Transaction ID] = [fd].[tid];
GO


Current LSN            Operation       Context           Transaction ID Description
---------------------- --------------- ----------------- -------------  --------------------------------
0000009d:0000021e:0001 LOP_BEGIN_XACT  LCX_NULL          0000:00001ff7  DROPOBJ; <snip>
0000009d:0000021e:0002 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7
0000009d:0000021e:0003 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7
0000009d:0000021e:0008 LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7
0000009d:0000021e:0009 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009b
0000009d:0000021e:000a LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7
0000009d:0000021e:000b LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009c
0000009d:0000021e:000c LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7
0000009d:0000021e:000d LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009d
0000009d:0000021e:000e LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7
0000009d:0000021e:000f LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009e
0000009d:0000021e:0010 LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7
0000009d:0000021e:0011 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009f
0000009d:0000021e:0012 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009a
0000009d:0000021e:0013 LOP_HOBT_DDL    LCX_NULL          0000:00001ff7  Action 3 on HoBt 0xd:100 <snip>
0000009d:0000021e:0014 LOP_DELETE_ROWS LCX_MARK_AS_GHOST 0000:00001ff7
0000009d:0000021e:0032 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7
...snip...

Cool eh?

Now I’ll take another log backup, which clears the log and contains the log records I just looked at.

BACKUP LOG [FNDBLogTest] TO DISK = N'D:\SQLskills\FNDBLogTest_Log2.bak' WITH INIT;
GO

Who Did the DROP?

If you want to figure out who ran the DROP command, look at the Transaction SID field for the DROP transaction’s LOP_BEGIN_XACT log record and then pass that value into the SUSER_SNAME () function. Simple!
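Here’s a minimal sketch of that (I’m assuming the DROP’s log records are still available to fn_dblog; the same approach works against fn_dump_dblog output):

SELECT SUSER_SNAME ([Transaction SID]) AS [Who Did The DROP]
FROM fn_dblog (NULL, NULL)
WHERE [Transaction Name] LIKE '%DROPOBJ%'
    AND [Operation] = 'LOP_BEGIN_XACT';
GO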

Restoring using STOPBEFOREMARK

The LSN of the DROP transaction’s LOP_BEGIN_XACT log record is the point I need to restore to just before.

To do that you can just plug the LSN into the STOPBEFOREMARK option for RESTORE. The option is documented but the format is not – how helpful!!


The LSN needs to be specified with ‘0x’ in front of it, and then the format is exactly as returned by fn_dblog.

The restore sequence to restore to just before the DROP is therefore:

RESTORE DATABASE [FNDBLogTest2]
    FROM DISK = N'D:\SQLskills\FNDBLogTest_Full.bak'
WITH
    MOVE N'FNDBLogTest' TO N'C:\SQLskills\FNDBLogTest2.mdf',
    MOVE N'FNDBLogTest_log' TO N'C:\SQLskills\FNDBLogTest2_log.ldf',
    REPLACE, NORECOVERY;
GO

RESTORE LOG [FNDBLogTest2]
    FROM DISK = N'D:\SQLskills\FNDBLogTest_Log1.bak'
WITH
    NORECOVERY;
GO

RESTORE LOG [FNDBLogTest2]
FROM
    DISK = N'D:\SQLskills\FNDBLogTest_Log2.bak'
WITH
    STOPBEFOREMARK = 'lsn:0x0000009d:0000021e:0001',
    NORECOVERY;
GO

RESTORE DATABASE [FNDBLogTest2] WITH RECOVERY;
GO

And the table is there again, right before the point it was dropped. You can see where I used the constructed LSN string in the final log restore.
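As a quick sanity check (purely illustrative), counting the rows in the restored copy should show the 1000 rows that were inserted before the DROP:

USE [FNDBLogTest2];
GO
SELECT COUNT (*) AS [RecoveredRows] FROM [ProdTable];
GO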

Using fn_dump_dblog

So what if the log records are no longer in the log? I can use the fn_dump_dblog function.

Edit 8/15/13: Beware – Jonathan just found out from a customer system that uses this extensively that every time fn_dump_dblog is called, it creates a new hidden SQLOS scheduler and up to three threads, which will not go away (and will not be reused) until a server restart. It’s a bug that the SQL team is going to fix now that we’ve alerted them to it. Use with caution.

Edit 5/15/15: It’s fixed in SQL Server 2012 SP2+ and SQL Server 2014. The fix won’t be backported any earlier.

For instance, here is how I can use it to look in the FNDBLogTest_Log2.bak backup:

SELECT
    COUNT (*)
FROM
    fn_dump_dblog (
        NULL, NULL, N'DISK', 1, N'D:\SQLskills\FNDBLogTest_Log2.bak',
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO

I have to specify all the DEFAULT parameters (63 of them!) or it won’t work. The other parameters are:

  • Starting LSN (usually just NULL)
  • Ending LSN (again, usually just NULL)
  • Type of file (DISK or TAPE)
  • Backup number within the backup file (for multi-backup media sets)
  • File name

So I could do the same query as I did above:

SELECT
    [Current LSN],
    [Operation],
    [Context],
    [Transaction ID],
    [Description]
FROM
    fn_dump_dblog (
        NULL, NULL, N'DISK', 1, N'D:\SQLskills\FNDBLogTest_Log2.bak',
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT),
    (SELECT
        [Transaction ID] AS [tid]
    FROM
        fn_dump_dblog (
            NULL, NULL, N'DISK', 1, N'D:\SQLskills\FNDBLogTest_Log2.bak',
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
    WHERE
        [Transaction Name] LIKE '%DROPOBJ%') [fd]
WHERE [Transaction ID] = [fd].[tid];
GO

Which works perfectly, but takes much longer to run.

So maybe you’re wondering what all the other parameters to fn_dump_dblog are for? They are for specifying the media families of a media set that has more than one media family.

Here’s an example using a log backup striped across two files:

BACKUP LOG [FNDBLogTest] TO
    DISK = N'D:\SQLskills\FNDBLogTest_Log3_1.bak',
    DISK = N'D:\SQLskills\FNDBLogTest_Log3_2.bak'
WITH INIT;
GO

If I try to use fn_dump_dblog and only specify a single file, I get an error:

SELECT
    COUNT (*)
FROM
    fn_dump_dblog (
        NULL, NULL, N'DISK', 1, N'D:\SQLskills\FNDBLogTest_Log3_1.bak',
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO
Msg 3132, Level 16, State 1, Line 1
The media set has 2 media families but only 1 are provided. All members must be provided.

So I have to specify both media families:

SELECT
    COUNT (*)
FROM
    fn_dump_dblog (
        NULL, NULL, N'DISK', 1, N'D:\SQLskills\FNDBLogTest_Log3_1.bak',
        N'D:\SQLskills\FNDBLogTest_Log3_2.bak', DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO

Summary

So there you go – some more powerful tools to add to your disaster recovery arsenal.

Enjoy!

Understanding data vs log usage for spills in tempdb

Earlier this morning I noticed a discussion on the SQL MCM distribution list (that all the original MCM instructors are part of) that was trying to make sense of a huge disparity between tempdb data file usage and log file usage. I explained the answer and thought I'd share it with you all too.

The situation was a tempdb where the data files grew from 30GB to 120GB, running out of space on the disk, but the tempdb log file did not grow at all from its initial size of 1GB! How could that be?

One of the things to consider about tempdb is that logging in tempdb is ultra-efficient. Log records for updates in tempdb, for instance, only log the before image of the data instead of logging both before and after images. There is no need to log the after image – as that is only used for the REDO portion of crash recovery. As tempdb never gets crash-recovered, REDO never occurs. The before image *is* necessary, however, because transactions can be rolled back in tempdb, just like other databases, and so the before image of an update must be available to be able to successfully roll back the update.
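If you want to see that reduced logging for yourself, here’s a minimal sketch (the temp table, column names, and values are made up purely for illustration): run the same update against a user database and compare the [Log Record Length] values, which should be larger there because both before and after images are logged.

USE [tempdb];
GO
CREATE TABLE [##LoggingTest] ([c1] INT, [c2] CHAR (100));
INSERT INTO [##LoggingTest] VALUES (1, REPLICATE ('a', 100));
GO
CHECKPOINT; -- clear tempdb's log so only the update's log records remain
GO
UPDATE [##LoggingTest] SET [c2] = REPLICATE ('b', 100) WHERE [c1] = 1;
GO
SELECT [Operation], [Context], [Log Record Length]
FROM fn_dblog (NULL, NULL)
WHERE [Operation] = N'LOP_MODIFY_ROW';
GO
DROP TABLE [##LoggingTest];
GO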

Getting to the question though, I can easily explain the observed behavior by considering how a sort spill happens with tempdb.

I can simulate this using a gnarly query on the SalesDB database you can download from our resources page (see the top of the page for the sample databases to download).

I'm going to do a join of my Sales and Products tables and then sort the multi-million row result set by product name:

SELECT S.*, P.* FROM Sales S
JOIN Products P ON P.ProductID = S.ProductID
ORDER BY P.Name;
GO

The query plan for this (viewed in Plan Explorer) shows the join output feeding into a Sort operator.

I know that the sort is going to spill out of memory into tempdb in this case. First I checkpoint tempdb (to clear out the log) and then after running the query, I can analyze the transaction log for tempdb.
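Spelled out, the sequence I’m assuming is simply the following (tempdb uses the SIMPLE recovery model, so a checkpoint clears its log):

USE [tempdb];
GO
CHECKPOINT;
GO
-- ...now run the Sales/Products query above, then examine tempdb's log with fn_dblog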

Looking at the operation in the log:

SELECT
     [Current LSN],
     [Operation],
     [Context],
     [Transaction ID],
     [Log Record Length],
     [Description]
FROM fn_dblog (null, null);
GO

Current LSN            Operation       Context  Transaction ID Len Description
---------------------- --------------- -------- -------------- --- ----------------------------------------------------------
000000c0:00000077:0001 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4d  120 sort_init;<snip>
000000c0:00000077:0002 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4e  132 FirstPage Alloc;<snip>
000000c0:00000077:0003 LOP_SET_BITS    LCX_GAM  0000:00005e4e  60  Allocated 1 extent(s) starting at page 0001:0000aa48
000000c0:00000077:0004 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4e  88  Allocated 0001:0000aa48;Allocated 0001:0000aa49;
000000c0:00000077:0005 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4d  80  Allocated 0001:00000123
000000c0:00000077:0006 LOP_FORMAT_PAGE LCX_IAM  0000:00005e4d  84               
000000c0:00000077:0007 LOP_SET_BITS    LCX_IAM  0000:00005e4e  60               
000000c0:00000077:0009 LOP_COMMIT_XACT LCX_NULL 0000:00005e4e  52               
000000c0:00000077:000a LOP_BEGIN_XACT  LCX_NULL 0000:00005e4f  128 soAllocExtents;<snip>
000000c0:00000077:000b LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa50
000000c0:00000077:000c LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa50;Allocated 0001:0000aa51;<snip>
000000c0:00000077:000d LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:000e LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa58
000000c0:00000077:000f LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa58;Allocated 0001:0000aa59;<snip>
000000c0:00000077:0010 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0011 LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa60
000000c0:00000077:0012 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa60;Allocated 0001:0000aa61;<snip>
000000c0:00000077:0013 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0014 LOP_COMMIT_XACT LCX_NULL 0000:00005e4f  52               
000000c0:00000077:0015 LOP_BEGIN_XACT  LCX_NULL 0000:00005e50  128 soAllocExtents;<snip>
000000c0:00000077:0016 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa68
000000c0:00000077:0017 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa68;Allocated 0001:0000aa69;<snip>
000000c0:00000077:0018 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0019 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa70
000000c0:00000077:001a LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa70;Allocated 0001:0000aa71;<snip>
000000c0:00000077:001b LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001c LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa78
000000c0:00000077:001d LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa78;Allocated 0001:0000aa79;<snip>
000000c0:00000077:001e LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001f LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa80
000000c0:00000077:0020 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa80;Allocated 0001:0000aa81;<snip>
000000c0:00000077:0021 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0022 LOP_COMMIT_XACT LCX_NULL 0000:00005e50  52               
000000c0:00000077:0023 LOP_BEGIN_XACT  LCX_NULL 0000:00005e51  128 soAllocExtents;<snip>

<snip>

000000cd:00000088:01d3 LOP_SET_BITS    LCX_GAM  0000:000078fc  60  Deallocated 1 extent(s) starting at page 0001:00010e50
000000cd:00000088:01d4 LOP_COMMIT_XACT LCX_NULL 0000:000078fc  52               
000000cd:00000088:01d5 LOP_BEGIN_XACT  LCX_NULL 0000:000078fd  140 ExtentDeallocForSort;<snip>
000000cd:00000088:01d6 LOP_SET_BITS    LCX_IAM  0000:000078fd  60               
000000cd:00000088:01d7 LOP_MODIFY_ROW  LCX_PFS  0000:000078fd  88  Deallocated 0001:00010e68;Deallocated 0001:00010e69;<snip>
000000cd:00000088:01d8 LOP_SET_BITS    LCX_GAM  0000:000078fd  60  Deallocated 1 extent(s) starting at page 0001:00010e68
000000cd:00000088:01d9 LOP_COMMIT_XACT LCX_NULL 0000:000078fd  52               
000000cd:00000088:01da LOP_MODIFY_ROW  LCX_PFS  0000:00005fac  80  Deallocated 0001:00000109
000000cd:00000088:01db LOP_SET_BITS    LCX_SGAM 0000:00005fac  60  ClearBit 0001:00000108
000000cd:00000088:01dc LOP_SET_BITS    LCX_GAM  0000:00005fac  60  Deallocated 1 extent(s) starting at page 0001:00000108
000000cd:00000088:01dd LOP_COMMIT_XACT LCX_NULL 0000:00005fac  52               

(I snipped out a few extraneous log records plus the 6 extra 'Allocated' and 'Deallocated' for each of the PFS row modifications.) 

One of the things I notice is that the sort spill space is allocated in whole extents, and almost the entire sort (from initialization, through allocating all the extents, to deallocating them again) is contained in a series of system transactions that between them cover a huge amount of space. But in terms of the log they generate, the transactions aren't actually that large.

Look at the soAllocExtents transaction with Transaction ID 00005e50. It's allocating 4 extents (i.e. 256KB) in a single system transaction (4 x mark an extent as unavailable in the GAM, 4 x bulk set the 8 PFS bytes for the 8 pages in the extent, 4 x mark an extent allocated in the IAM). The total size of the log records for this transaction is 1012 bytes. (The first soAllocExtents system transaction only allocates 3 extents; all the others allocate 4 extents.)

When the sort ends, the extents are deallocated one-at-a-time in system transactions called ExtentDeallocForSort. An example is the transaction with Transaction ID 000078fd. It generates log records totalling 400 bytes. This means each 256KB takes 4 x 400 = 1600 bytes to deallocate.

Combining the allocation and deallocation operations, each 256KB of the sort that spills into tempdb generates 2612 bytes of log records.

Now let's consider the original behavior that I explained. If the 90GB was all sort space:

  • 90GB is 90 x 1024 x 1024 = 94371840KB, which is 94371840 / 256 = 368640 x 256KB chunks.
  • Each 256KB chunk takes 2612 bytes to allocate and deallocate, so our 90GB would take 368640 x 2612 = 962887680 bytes of log, which is 962887680 / 1024 / 1024 = ~918MB of log.

And this would explain the observed behavior. 90GB of tempdb space can be allocated and used for a sort spill with roughly 918MB of transaction log, give or take a bit from my rough calculations.
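For anyone who wants to play with the numbers, here's the same back-of-the-envelope calculation expressed in T-SQL (the 2612 bytes-per-256KB figure comes from the log records above; the variable names are just illustrative):

DECLARE @SpillKB BIGINT = 90 * 1024 * 1024; -- 90GB of sort spill, expressed in KB
DECLARE @Chunks  BIGINT = @SpillKB / 256;   -- number of 256KB chunks = 368640
SELECT
    @Chunks AS [Chunks256KB],
    @Chunks * 2612 AS [LogBytes],
    CAST (@Chunks * 2612 / 1024.0 / 1024.0 AS DECIMAL (10, 1)) AS [LogMB]; -- ~918.3
GO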

Tempdb logs things very efficiently – especially things that spill out of memory. The next stop in debugging such a problem would be regularly capturing the output of sys.dm_db_task_space_usage to figure out who is using all the space and then digging in from there.
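As a starting point for that, here's a minimal sketch (you'd want to capture this regularly, and probably join to sys.dm_exec_requests to see the offending statements) showing which sessions are currently allocating the most internal-object space (sorts, hashes, spools) in tempdb. The DMV reports 8KB pages, hence the multiplication by 8 to get KB:

SELECT
    [session_id],
    SUM ([internal_objects_alloc_page_count]) * 8 AS [InternalObjectKB],
    SUM ([user_objects_alloc_page_count]) * 8 AS [UserObjectKB]
FROM sys.dm_db_task_space_usage
GROUP BY [session_id]
ORDER BY [InternalObjectKB] DESC;
GO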

Hope this helps explain things!

Survey: most prevalent latch waits (code to run)

I first started blogging about latches and some of the deeper parts of SQL Server internals last year (see Advanced performance troubleshooting: waits, latches, spinlocks) and now I’d like to pick up that thread (no scheduling pun intended :-)) and blog some more about some of the common latches that could be a performance bottleneck.

To that end, I’ve got some code below (plus example output) that will show the most common latch waits that have occurred on your system.

WITH Latches AS
    (SELECT
        latch_class,
        wait_time_ms / 1000.0 AS WaitS,
        waiting_requests_count AS WaitCount,
        100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
        ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
    FROM sys.dm_os_latch_stats
    WHERE latch_class NOT IN ('BUFFER')
        AND wait_time_ms > 0
    )
SELECT
    W1.latch_class AS LatchClass,
    CAST (W1.WaitS AS DECIMAL (14, 2)) AS Wait_S,
    W1.WaitCount AS WaitCount,
    CAST (W1.Percentage AS DECIMAL (14, 2)) AS Percentage,
    CAST ((W1.WaitS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgWait_S
FROM Latches AS W1
INNER JOIN Latches AS W2
    ON W2.RowNum <= W1.RowNum
WHERE W1.WaitCount > 0
GROUP BY W1.RowNum, W1.latch_class, W1.WaitS, W1.WaitCount, W1.Percentage
HAVING SUM (W2.Percentage) - W1.Percentage < 95; -- percentage threshold
GO

LatchClass                        Wait_S  WaitCount  Percentage  AvgWait_S
--------------------------------- ------- ---------- ----------- ----------
LOG_MANAGER                       221.43  4659       45.81       0.0475
ACCESS_METHODS_HOBT_VIRTUAL_ROOT  199.56  7017       41.28       0.0284
FGCB_ADD_REMOVE                   35.17   1047       7.27        0.0336
DBCC_OBJECT_METADATA              26.85   256490     5.55        0.0001

I’d like you to run the code and send me the output (either as a comment or in email). I’ll collate all your output and do some blogging for your enjoyment. [Update: I don’t need any more data – thanks!]

Thanks!