This is a blog post I've been meaning to do for a while, and I've recently noticed some info on the web about checkpoints which is a little misleading, so I want to do a quick post to explain how checkpoints work as far as log records are concerned.

When a checkpoint operation occurs, no matter how it's triggered (for instance through a manual CHECKPOINT, from a database or differential backup, or automatically) the same set of operations occurs:

  • All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.
  • Before a page is written to disk, all log records up to and including the most recent log record describing a change to that page are written to disk (yes, log records can be cached in memory too). This guarantees recovery can work and is called write-ahead logging. Log records are written to the log sequentially, and log records from multiple transactions will be interspersed in the log. The log cannot be selectively written to disk, so writing a dirty page to disk that only has a single log record affecting it may mean writing many more previous log records to disk as well.
  • Log records describing the checkpoint are generated.
  • The LSN of the checkpoint is recorded in the database boot page in the dbi_checkptLSN field (see Search Engine Q&A #20: Boot pages, and boot page corruption).
  • If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log – both of which are terrible misnomers, as nothing is either physically cleared or truncated).

I'm using some terms you may not have come across – for a background primer on logging, recovery, and transaction log architecture see my article in the February 2009 TechNet Magazine – Understanding Logging and Recovery in SQL Server (or come to my Wednesday November 4th PASS Spotlight session with the same title).

Checkpoints are not really tracked in the transaction log – it just serves as a useful repository for information about which transactions are active at the time of the checkpoint. The LSN of the last checkpoint is recorded in the database boot page. This is where recovery starts, and if this page is inaccessible, the database cannot be attached, opened, or processed in any way – partly because it's the boot page that knows whether the database was cleanly shut down or not, and partly because it's the only place that records the LSN of the last checkpoint record. You may say, well it's recorded in the transaction log too, but what if the log is corrupt in some way?

One area of confusion I've seen is that the checkpoint log records are overwritten by subsequent checkpoints. Absolutely not – once written, a log record is NEVER updated or overwritten – it will only be overwritten when the log wraps and the VLFs are re-used (see Inside the Storage Engine: More on the circular nature of the log). This has led to further confusion about when checkpoint information is retrievable from the log, using commands such as fn_dblog.

For the rest of this post, I want to show you what's going on in the transaction log when checkpoints occur under different circumstances.

Consider the following example:

CREATE DATABASE CheckpointTest;
GO
USE CheckpointTest;
GO
CREATE TABLE t1 (c1 INT);
GO
INSERT INTO t1 VALUES (1);
GO
CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000051:009b  LOP_BEGIN_CKPT
00000047:00000091:0001  LOP_END_CKPT

We see the log records for the checkpoint. In this case the checkpoint is very simple and so there are only two records – beginning and ending the checkpoint.

If we run another checkpoint, what do we see?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000092:0001  LOP_BEGIN_CKPT
00000047:00000093:0001  LOP_END_CKPT

Only information about one checkpoint, but with different LSNs for the log records. It's not that the previous checkpoint was overwritten, it's that we did a checkpoint - so the active portion of the log has moved forward and the log records for the previous checkpoint aren't considered active any more as they're not required (for instance, for database mirroring, an active transaction, a log backup, transactional replication). They're still there in the log though, but just aren't part of the required portion of the log and so aren't dumped by fn_dblog.

Now, what if I create an active transaction? In another connection, I'll do:

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (1);
GO

And what if we do a checkpoint and look at the log now?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT

We see the start of the open transaction, the insert of the record, the update of row counts in metadata, and the checkpoint.

You may notice that there's another log record being generated for the checkpoint – LOP_XACT_CKPT. This is only generated when there are active (uncommitted) transactions and it lists information about all actvie transactions at the time the checkpoint begins. This is used during crash recovery to work out how far back in the transaction log to go to start REDO and UNDO operations (well, technically only UNDO will need to go this far back). Focusing in on this log record, we can see:

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
———————– ——————————- —————-
00000047:00000096:0001  LOP_XACT_CKPT                   1

Log Record
———————————————————————————————————–
0×000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 6621000000000000

This log record contains information about each active (uncommitted) transaction at the time of the checkpoint. Without going into all the details about what's in the payload for this log record, you can see two things in there:

  • The LSN of the LOP_BEGIN_XACT log record for the oldest active transaction (the first bold number above – match it against the LOP_BEGIN_XACT in the log dump a little higher up)
  • The LSN of the first log record making a database change for that transaction (the second bold number above – match it against the LOP_INSERT_ROWS in the log dump a little higher up)

You'll notice that these LSNs are stored byte-reversed. 

How about if we do another checkpoint?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT

This time we see the log records for the current checkpoint and the previous one – as the active log stretches all the way back to the start of the oldest active transaction – no matter how many checkpoints we do while there's an active transaction.

Now what if we start another active transaction in a third connection?

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (2);
GO

And then go back to the original connection and do another checkpoint and dump the log again:

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT
00000047:00000099:0001  LOP_BEGIN_XACT
00000047:00000099:0002  LOP_INSERT_ROWS
00000047:00000099:0003  LOP_COUNT_DELTA
00000047:00000099:0004  LOP_COUNT_DELTA
00000047:00000099:0005  LOP_COUNT_DELTA
00000047:00000099:0006  LOP_BEGIN_CKPT
00000047:0000009b:0001  LOP_XACT_CKPT
00000047:0000009b:0002  LOP_END_CKPT

You can see that we now have three sets of checkpoint log records, and two active transactions. Only one of these sets of checkpoint log records is the pertinent one – the previous two have been superceded, but the log records haven't been overwritten or removed, as you can clearly see.

Looking inside all the LOP_XACT_CKPT records, we can see (with a bit of creative formatting of the output):

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
———————– ——————————- —————-
00000047:00000096:0001  LOP_XACT_CKPT                   1
00000047:00000098:0001  LOP_XACT_CKPT                   1
00000047:0000009b:0001  LOP_XACT_CKPT                   2

Log Record
————————————————————————————————————-
0×000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0×000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0×000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000 …
    … 79050000000000004700000099000000010004014700000099000000020000000100000002000000DC000000

The first two checkpoints only list one active transaction and the most recent one lists two, as we'd expect. The log record payload for the first two lists the same oldest active transaction (highlighted in bold). The payload for the last checkpoint lists the same oldset active transaction (as that hasn't changed), but now lists an additional transaction (match the 470000009900000001000 with the LSN of the second LOP_BEGIN_XACT in the log dump above), and has a transaction count of two.

Just to finish things off, let's look at the boot page of the database either using DBCC PAGE or DBCC DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('CheckpointTest');
GO

DBINFO STRUCTURE:


DBINFO @0x6711EF64

dbi_dbid = 18                        dbi_status = 65536                   dbi_nextid = 2089058478
dbi_dbname = CheckpointTest          dbi_maxDbTimestamp = 2000            dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-09-28 07:06:35.873
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 71                        m_blockOffset = 153                  m_slotId = 6
dbi_RebuildLogs = 0                  dbi_dbccFlags = 2                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
<snip>

The dbi_checkptLSN is dumped out in decimal – converting to hex gives us  (47:99:6), which exactly matches the LSN of the most recent LOP_BEGIN_CKPT record in the log dump above.

Hopefully this explains things clearly!