What can cause log reads and other transaction log questions

Earlier today there was a question on SQL Server Central where someone wanted to know what could be causing so many reads on their transaction log. I was asked to chime in by fellow MVP Jonathan Kehayias (who also sent me some questions that I've answered in this post – thanks Jon!), so I did, with a list of everything I could think of. I thought it would make for a good post, so here it is, with a few more things I remembered while writing the post.

Before I start, if you're not comfortable talking log records and transaction log architecture, see my TechNet Magazine article on Understanding Logging and Recovery, which explains everything clearly, including how having too many VLFs can affect operations on the log that have to scan VLFs.
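
If you want to see how many VLFs a log has right now, the long-standing (undocumented, but widely used) DBCC LOGINFO command returns one row per VLF. A quick sketch:

```sql
-- One row per VLF in the current database's log
-- (Status = 2 means the VLF is active and can't be reused yet)
DBCC LOGINFO;
GO
```

A log that returns thousands of rows here is likely to make the log-scanning operations described below noticeably slower.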

Each of these things can cause reads of the log:

  • Transaction rollback: when a transaction has to roll back (either because you issue ROLLBACK TRAN or because something goes wrong and SQL Server aborts the transaction), the log records describing what happened in the transaction have to be read so that their effects can be removed from the database. This is explained in the TechNet Magazine article. Note that it doesn't matter whether or not you use explicit transactions (i.e. BEGIN TRAN); SQL Server always starts a transaction for you (called an implicit transaction) so that it can put a boundary on what needs to be rolled back in case of a failure.
  • Crash recovery: crash recovery must read the transaction log to figure out what to do with all the log records in the active portion of the log (all the way back to the earlier of the most recent checkpoint or the start of the oldest active transaction). The log is read twice – once going forward from that oldest point (called the REDO phase) and then going backwards (called the UNDO phase). Again, this is explained in great depth in the article.
  • Creating a database snapshot: a database snapshot is a point-in-time view of a database. What's more, it's a transactionally consistent point-in-time view of a database – which means that, essentially, crash recovery must be run on the real database to create the transactionally consistent view. The crash recovery is run into the database snapshot; the real database isn't affected – apart from having all the active transaction log read so that crash recovery can run.
  • Running DBCC CHECKDB: creates a database snapshot by default on 2005 onwards, and runs the consistency checks on the snapshot. See above. There's a much more detailed description, including how this worked in 2000, in the first part of the 10-page blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages.
  • Transaction log backups: this one's kind of obvious. A transaction log backup contains all the transaction log records generated since the last log backup finished (or since the log backup chain was established). To back up the log it has to read it. What's not so obvious is that a log backup will also scan through all the VLFs in the log to see if any active ones can be made inactive (called clearing or truncating the log – both misnomers as nothing is cleared and nothing is truncated). See my TechNet Magazine article on Understanding SQL Server Backups and my blog post Importance of proper transaction log size management.
  • Any kind of data backup: (full/differential backup of a file/filegroup/database). Yup – data backups always include transaction log – so the backup can be restored and give you a transactionally consistent view of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes for details if you don't believe me.
  • Transactional replication: transactional replication works by harvesting committed transactions from the transaction log of the publication database (and then sending them to the subscriber(s) via the distribution database – beyond the scope of this post). This is done by the Log Reader Agent job, running from the Distributor. It needs to read all the log records generated in the publication database, even if they're nothing to do with the publications. More log equals more reads. My whitepaper on combining database mirroring and transactional replication in 2008 has more details on this stuff, as does Books Online.
  • Change data capture (in 2008): CDC uses the transactional replication log reader agent to harvest changes from the transaction log. See above. This means that CDC can cause the log to not be able to clear properly, just like transactional replication or database mirroring – see my blog post Search Engine Q&A #1: Running out of transaction log space for more details. Note that I didn't say Change Tracking – it uses a totally different mechanism – see my TechNet Magazine article on Tracking Changes in Your Enterprise Database for more details.
  • Database mirroring: DBM works by sending physical log records from the principal to the mirror database. If the mirroring session drops out of the SYNCHRONIZED state, then the log records won't be able to be read from memory and the mirroring subsystem will have to get them from disk – causing log reads. This can happen if you're running asynchronous mirroring (where you're specifically allowing for this), or if something went wrong while running synchronous mirroring (e.g. the network link between the principal and mirror dropped out, and a witness wasn't configured or the principal could still see the witness – again, beyond the scope of this post). Regardless, this is called having a SEND queue on the principal.
  • Restoring a backup: whenever backups are restored, even if you've said WITH NORECOVERY, the REDO portion of recovery is run for each restore, which reads the log.
  • Restoring a log backup using WITH STANDBY: in this case, you've essentially said you'd like recovery to run, but not to affect the transaction log itself. Running recovery has to read the log. For more info on using WITH RECOVERY, NORECOVERY, or STANDBY, see my latest TechNet Magazine article on Recovering from Disasters Using Backups, which explains how restores work.
  • A checkpoint, in the SIMPLE recovery model only: see my blog post How do checkpoints work and what gets logged for a description of what checkpoints are and what they do. In the SIMPLE recovery model, checkpoints are responsible for clearing the log (described with links above) so must read through all the VLFs to see which can be marked inactive.
  • When processing a DML trigger (on 2000): (thanks to Clay Lenhart for the comment that reminded me of this). In SQL Server 2000, the before and after tables that you can process in a DML trigger body are actually found from looking at the log records generated by the operation that caused the trigger to fire. My dev team changed this in 2005 to store the before and after tables using the version store, giving a big perf boost to DML trigger processing. 
  • Manually looking in the log (with DBCC LOG or the table-valued function fn_dblog): this one's pretty obvious.
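
If you want to quantify the log reads on your system, sys.dm_io_virtual_file_stats (available from 2005 onward) breaks IO out per file since the instance started – file_id 2 is the transaction log. A minimal sketch (MyDatabase is a placeholder name):

```sql
-- Cumulative IO against the log file (file_id = 2) since instance startup
SELECT
    DB_NAME (database_id) AS DatabaseName,
    num_of_reads,
    num_of_bytes_read,
    num_of_writes,
    num_of_bytes_written
FROM sys.dm_io_virtual_file_stats (DB_ID (N'MyDatabase'), 2);
GO
```

Snapshot the numbers before and after a suspect operation (a log backup, say) to see how much of the log reading it's responsible for.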

Phew – a lot of things can cause log reads, the trick is knowing which one it is!

As you can see, there could be a lot of activity reading from your log as well as writing to it, which could cause an IO bottleneck. Make sure that the IO subsystem on which you place the log file (note: you don't get ANY performance benefit from having multiple log files) can handle the read and write workload the log demands. That usually means RAID 1 or RAID 10 with a bunch of spindles to spread the IOs out (note/warning/achtung: that's a big generalization – don't reply with a comment saying it's wrong because you've seen something different – different scenarios have different demands), plus a proper RAID configuration (a stripe size that's a multiple of 64KB, a matching NTFS allocation unit size, and correct volume partition alignment).

What does checkpoint do for tempdb?

Last week I posted in-depth about how checkpoints work and what exactly goes on (see How do checkpoints work and what gets logged). About a year ago I posted about why the buffer pool on a busy system may seem to have an inordinate amount of dirty tempdb pages in it, and now I want to clarify a bit more why this is the case and how checkpoints work for tempdb. To see the buffer pool contents, see my post Inside the Storage Engine: What’s in the buffer pool?.

A checkpoint is only done for tempdb when the tempdb log file reaches 70% full – this is to prevent the tempdb log from growing if at all possible (note that a long-running transaction can still essentially hold the log hostage and prevent it from clearing, just like in a user database).
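
You can watch how close the tempdb log is getting to that 70% threshold with DBCC SQLPERF:

```sql
-- Log size and percentage of log space used, one row per database
-- (look at the tempdb row)
DBCC SQLPERF (LOGSPACE);
GO
```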

If you read the first post I reference above, you’ll see that when a checkpoint occurs for a user database, all dirty pages for that database are flushed to disk (as well as other operations). This does not happen for tempdb. Tempdb is not recovered in the event of a crash, and so there is no need to force dirty tempdb pages to disk, except in the case where the lazywriter process (part of the buffer pool) has to make space for pages from other databases. Of course, when you issue a *manual* CHECKPOINT, all the dirty pages are flushed, but for automatic checkpoints they’re not. You can easily prove this to yourself: run a large transaction inserting into a user table in tempdb and watch the Databases/Log File(s) Used Size (KB) and Databases/Log Truncations performance counters for tempdb. When you see them go up, check whether the number of dirty pages in tempdb has reduced – it won’t have.
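
To sketch that experiment’s monitoring side (the counter names are as listed above; the dirty-page count comes from sys.dm_os_buffer_descriptors, available from 2005 onward):

```sql
-- The two tempdb counters to watch while the large transaction runs
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE instance_name = N'tempdb'
    AND counter_name IN (N'Log File(s) Used Size (KB)', N'Log Truncations');
GO

-- Dirty tempdb pages in the buffer pool - this count won't drop
-- when the automatic tempdb checkpoint fires
SELECT COUNT (*) AS DirtyTempdbPages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID (N'tempdb')
    AND is_modified = 1;
GO
```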

The other operation that occurs during a checkpoint of databases in the SIMPLE recovery model is that the VLFs in the log are examined to see if they can be made inactive (see TechNet Magazine: feature article on understanding logging and recovery and Inside the Storage Engine: More on the circular nature of the log). This process allows the log to wrap around and overwrite itself without having to grow – and this process *IS* required by tempdb, for proper management of its log size.

This is all that happens for a checkpoint of tempdb – no data pages are flushed to disk except for a manual CHECKPOINT.

How do checkpoints work and what gets logged

(Check out my Pluralsight online training course: SQL Server: Logging, Recovery, and the Transaction Log.)

This is a blog post I’ve been meaning to do for a while, and I’ve recently noticed some info on the web about checkpoints which is a little misleading, so I want to do a quick post to explain how checkpoints work as far as log records are concerned.

When a checkpoint operation occurs, no matter how it’s triggered (for instance through a manual CHECKPOINT, from a database or differential backup, or automatically) the same set of operations occurs:

  • All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.
  • Before a page is written to disk, all log records up to and including the most recent log record describing a change to that page are written to disk (yes, log records can be cached in memory too). This guarantees recovery can work and is called write-ahead logging. Log records are written to the log sequentially, and log records from multiple transactions will be interspersed in the log. The log cannot be selectively written to disk, so writing a dirty page to disk that only has a single log record affecting it may mean writing many more previous log records to disk as well.
  • Log records describing the checkpoint are generated.
  • The LSN of the checkpoint is recorded in the database boot page in the dbi_checkptLSN field (see Search Engine Q&A #20: Boot pages, and boot page corruption).
  • If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log – both of which are terrible misnomers, as nothing is either physically cleared or truncated).

I’m using some terms you may not have come across – for a background primer on logging, recovery, and transaction log architecture see my article in the February 2009 TechNet Magazine – Understanding Logging and Recovery in SQL Server (or come to my Wednesday November 4th PASS Spotlight session with the same title).

Checkpoints are not really tracked in the transaction log – it just serves as a useful repository for information about which transactions are active at the time of the checkpoint. The LSN of the last checkpoint is recorded in the database boot page. This is where recovery starts, and if this page is inaccessible, the database cannot be attached, opened, or processed in any way – partly because it’s the boot page that knows whether the database was cleanly shut down or not, and partly because it’s the only place that records the LSN of the last checkpoint record. You may say, well it’s recorded in the transaction log too, but what if the log is corrupt in some way?

One area of confusion I’ve seen is that the checkpoint log records are overwritten by subsequent checkpoints. Absolutely not – once written, a log record is NEVER updated or overwritten – it will only be overwritten when the log wraps and the VLFs are re-used (see Inside the Storage Engine: More on the circular nature of the log). This has led to further confusion about when checkpoint information is retrievable from the log, using commands such as fn_dblog.

For the rest of this post, I want to show you what’s going on in the transaction log when checkpoints occur under different circumstances.

[Edit: note that from 2012 onward, there’s an additional log record about updating the boot page.]

Consider the following example:

CREATE DATABASE CheckpointTest;
GO
USE CheckpointTest;
GO
CREATE TABLE t1 (c1 INT);
GO
INSERT INTO t1 VALUES (1);
GO
CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000051:009b  LOP_BEGIN_CKPT
00000047:00000091:0001  LOP_END_CKPT

We see the log records for the checkpoint. In this case the checkpoint is very simple and so there are only two records – beginning and ending the checkpoint.

If we run another checkpoint, what do we see?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000092:0001  LOP_BEGIN_CKPT
00000047:00000093:0001  LOP_END_CKPT

Only information about one checkpoint, but with different LSNs for the log records. It’s not that the previous checkpoint was overwritten, it’s that we did a checkpoint – so the active portion of the log has moved forward and the log records for the previous checkpoint aren’t considered active any more as they’re not required (for instance, for database mirroring, an active transaction, a log backup, transactional replication). They’re still there in the log though, but just aren’t part of the required portion of the log and so aren’t dumped by fn_dblog.
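
Incidentally, if you ever need to know what (if anything) is currently holding the active portion of a log, sys.databases exposes it directly (2005 onward):

```sql
-- What is currently preventing log clearing for each database
-- (NOTHING means the log can clear)
SELECT name, log_reuse_wait_desc
FROM sys.databases;
GO
```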

Now, what if I create an active transaction? In another connection, I’ll do:

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (1);
GO

And what if we do a checkpoint and look at the log now?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT

We see the start of the open transaction, the insert of the record, the update of row counts in metadata, and the checkpoint.

You may notice that there’s another log record being generated for the checkpoint – LOP_XACT_CKPT. This is only generated when there are active (uncommitted) transactions, and it lists information about all active transactions at the time the checkpoint begins. This is used during crash recovery to work out how far back in the transaction log to go to start REDO and UNDO operations (well, technically only UNDO will need to go this far back). Focusing in on this log record, we can see:

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
----------------------- ------------------------------- ----------------
00000047:00000096:0001  LOP_XACT_CKPT                   1

Log Record
-----------------------------------------------------------------------------------------------------------
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 6621000000000000

This log record contains information about each active (uncommitted) transaction at the time of the checkpoint. Without going into all the details about what’s in the payload for this log record, you can see two things in there:

  • The LSN of the LOP_BEGIN_XACT log record for the oldest active transaction (the byte sequence 47000000940000000100 in the payload above – match it against the LOP_BEGIN_XACT in the log dump a little higher up)
  • The LSN of the first log record making a database change for that transaction (the byte sequence 47000000940000000200 that follows it – match it against the LOP_INSERT_ROWS in the log dump a little higher up)

You’ll notice that these LSNs are stored byte-reversed. 

How about if we do another checkpoint?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT

This time we see the log records for the current checkpoint and the previous one – as the active log stretches all the way back to the start of the oldest active transaction – no matter how many checkpoints we do while there’s an active transaction.

Now what if we start another active transaction in a third connection?

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (2);
GO

And then go back to the original connection and do another checkpoint and dump the log again:

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT
00000047:00000099:0001  LOP_BEGIN_XACT
00000047:00000099:0002  LOP_INSERT_ROWS
00000047:00000099:0003  LOP_COUNT_DELTA
00000047:00000099:0004  LOP_COUNT_DELTA
00000047:00000099:0005  LOP_COUNT_DELTA
00000047:00000099:0006  LOP_BEGIN_CKPT
00000047:0000009b:0001  LOP_XACT_CKPT
00000047:0000009b:0002  LOP_END_CKPT

You can see that we now have three sets of checkpoint log records, and two active transactions. Only one of these sets of checkpoint log records is the pertinent one – the previous two have been superseded, but the log records haven’t been overwritten or removed, as you can clearly see.

Looking inside all the LOP_XACT_CKPT records, we can see (with a bit of creative formatting of the output):

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
----------------------- ------------------------------- ----------------
00000047:00000096:0001  LOP_XACT_CKPT                   1
00000047:00000098:0001  LOP_XACT_CKPT                   1
00000047:0000009b:0001  LOP_XACT_CKPT                   2

Log Record
-------------------------------------------------------------------------------------------------------------
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000 …
… 79050000000000004700000099000000010004014700000099000000020000000100000002000000DC000000

The first two checkpoints only list one active transaction and the most recent one lists two, as we’d expect. The log record payload for the first two lists the same oldest active transaction. The payload for the last checkpoint lists the same oldest active transaction (as that hasn’t changed), but now lists an additional transaction (match the byte sequence 47000000990000000100 with the LSN of the second LOP_BEGIN_XACT in the log dump above), and has a transaction count of two.

Just to finish things off, let’s look at the boot page of the database either using DBCC PAGE or DBCC DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('CheckpointTest');
GO

DBINFO STRUCTURE:
DBINFO @0x6711EF64

dbi_dbid = 18                        dbi_status = 65536                   dbi_nextid = 2089058478
dbi_dbname = CheckpointTest          dbi_maxDbTimestamp = 2000            dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-09-28 07:06:35.873
dbi_filegeneration = 0
dbi_checkptLSN

m_fSeqNo = 71                        m_blockOffset = 153                  m_slotId = 6
dbi_RebuildLogs = 0                  dbi_dbccFlags = 2
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
<snip>

The dbi_checkptLSN is dumped out in decimal – converting to hex (71 = 0x47, 153 = 0x99, 6 = 0x6) gives us (47:99:6), which exactly matches the LSN of the most recent LOP_BEGIN_CKPT record in the log dump above.

Hopefully this explains things clearly!