How do checkpoints work and what gets logged

(Check out my Pluralsight online training course: SQL Server: Logging, Recovery, and the Transaction Log.)

This is a blog post I’ve been meaning to do for a while, and I’ve recently noticed some info on the web about checkpoints which is a little misleading, so I want to do a quick post to explain how checkpoints work as far as log records are concerned.

When a checkpoint operation occurs, no matter how it’s triggered (for instance through a manual CHECKPOINT, from a database or differential backup, or automatically) the same set of operations occurs:

  • All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.
  • Before a page is written to disk, all log records up to and including the most recent log record describing a change to that page are written to disk (yes, log records can be cached in memory too). This guarantees recovery can work and is called write-ahead logging. Log records are written to the log sequentially, and log records from multiple transactions will be interspersed in the log. The log cannot be selectively written to disk, so writing a dirty page to disk that only has a single log record affecting it may mean writing many more previous log records to disk as well.
  • Log records describing the checkpoint are generated.
  • The LSN of the checkpoint is recorded in the database boot page in the dbi_checkptLSN field (see Search Engine Q&A #20: Boot pages, and boot page corruption).
  • If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log – both of which are terrible misnomers, as nothing is either physically cleared or truncated).

I’m using some terms you may not have come across – for a background primer on logging, recovery, and transaction log architecture see my article in the February 2009 TechNet Magazine – Understanding Logging and Recovery in SQL Server (or come to my Wednesday November 4th PASS Spotlight session with the same title).

Checkpoints are not really tracked in the transaction log – it just serves as a useful repository for information about which transactions are active at the time of the checkpoint. The LSN of the last checkpoint is recorded in the database boot page. This is where recovery starts, and if this page is inaccessible, the database cannot be attached, opened, or processed in any way – partly because it’s the boot page that knows whether the database was cleanly shut down or not, and partly because it’s the only place that records the LSN of the last checkpoint record. You may say, well it’s recorded in the transaction log too, but what if the log is corrupt in some way?

One area of confusion I’ve seen is that the checkpoint log records are overwritten by subsequent checkpoints. Absolutely not – once written, a log record is NEVER updated or overwritten – it will only be overwritten when the log wraps and the VLFs are re-used (see Inside the Storage Engine: More on the circular nature of the log). This has led to further confusion about when checkpoint information is retrievable from the log, using commands such as fn_dblog.

For the rest of this post, I want to show you what’s going on in the transaction log when checkpoints occur under different circumstances.

[Edit: note that from 2012 onward, there’s an additional log record about updating the boot page.]

Consider the following example:

CREATE DATABASE CheckpointTest;
GO
USE CheckpointTest;
GO
CREATE TABLE t1 (c1 INT);
GO
INSERT INTO t1 VALUES (1);
GO
CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000051:009b  LOP_BEGIN_CKPT
00000047:00000091:0001  LOP_END_CKPT

We see the log records for the checkpoint. In this case the checkpoint is very simple and so there are only two records – beginning and ending the checkpoint.

If we run another checkpoint, what do we see?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000092:0001  LOP_BEGIN_CKPT
00000047:00000093:0001  LOP_END_CKPT

Only information about one checkpoint, but with different LSNs for the log records. It’s not that the previous checkpoint was overwritten, it’s that we did a checkpoint – so the active portion of the log has moved forward and the log records for the previous checkpoint aren’t considered active any more as they’re not required (for instance, for database mirroring, an active transaction, a log backup, transactional replication). They’re still there in the log though, but just aren’t part of the required portion of the log and so aren’t dumped by fn_dblog.

Now, what if I create an active transaction? In another connection, I’ll do:

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (1);
GO

And what if we do a checkpoint and look at the log now?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT

We see the start of the open transaction, the insert of the record, the update of row counts in metadata, and the checkpoint.

You may notice that there’s another log record being generated for the checkpoint – LOP_XACT_CKPT. This is only generated when there are active (uncommitted) transactions and it lists information about all actvie transactions at the time the checkpoint begins. This is used during crash recovery to work out how far back in the transaction log to go to start REDO and UNDO operations (well, technically only UNDO will need to go this far back). Focusing in on this log record, we can see:

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = ‘LOP_XACT_CKPT’;
GO

Current LSN             Operation                       Num Transactions
———————– ——————————- —————-
00000047:00000096:0001  LOP_XACT_CKPT                   1

Log Record
———————————————————————————————————–
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 6621000000000000

This log record contains information about each active (uncommitted) transaction at the time of the checkpoint. Without going into all the details about what’s in the payload for this log record, you can see two things in there:

  • The LSN of the LOP_BEGIN_XACT log record for the oldest active transaction (the first bold number above – match it against the LOP_BEGIN_XACT in the log dump a little higher up)
  • The LSN of the first log record making a database change for that transaction (the second bold number above – match it against the LOP_INSERT_ROWS in the log dump a little higher up)

You’ll notice that these LSNs are stored byte-reversed. 

How about if we do another checkpoint?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT

This time we see the log records for the current checkpoint and the previous one – as the active log stretches all the way back to the start of the oldest active transaction – no matter how many checkpoints we do while there’s an active transaction.

Now what if we start another active transaction in a third connection?

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (2);
GO

And then go back to the original connection and do another checkpoint and dump the log again:

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
———————– ——————————-
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT
00000047:00000099:0001  LOP_BEGIN_XACT
00000047:00000099:0002  LOP_INSERT_ROWS
00000047:00000099:0003  LOP_COUNT_DELTA
00000047:00000099:0004  LOP_COUNT_DELTA
00000047:00000099:0005  LOP_COUNT_DELTA
00000047:00000099:0006  LOP_BEGIN_CKPT
00000047:0000009b:0001  LOP_XACT_CKPT
00000047:0000009b:0002  LOP_END_CKPT

You can see that we now have three sets of checkpoint log records, and two active transactions. Only one of these sets of checkpoint log records is the pertinent one – the previous two have been superceded, but the log records haven’t been overwritten or removed, as you can clearly see.

Looking inside all the LOP_XACT_CKPT records, we can see (with a bit of creative formatting of the output):

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = ‘LOP_XACT_CKPT’;
GO

Current LSN             Operation                       Num Transactions
———————– ——————————- —————-
00000047:00000096:0001  LOP_XACT_CKPT                   1
00000047:00000098:0001  LOP_XACT_CKPT                   1
00000047:0000009b:0001  LOP_XACT_CKPT                   2

Log Record
————————————————————————————————————-
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000 …
… 79050000000000004700000099000000010004014700000099000000020000000100000002000000DC000000

The first two checkpoints only list one active transaction and the most recent one lists two, as we’d expect. The log record payload for the first two lists the same oldest active transaction (highlighted in bold). The payload for the last checkpoint lists the same oldset active transaction (as that hasn’t changed), but now lists an additional transaction (match the 470000009900000001000 with the LSN of the second LOP_BEGIN_XACT in the log dump above), and has a transaction count of two.

Just to finish things off, let’s look at the boot page of the database either using DBCC PAGE or DBCC DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO (‘CheckpointTest’);
GO

DBINFO STRUCTURE:
DBINFO @0x6711EF64

dbi_dbid = 18                        dbi_status = 65536                   dbi_nextid = 2089058478
dbi_dbname = CheckpointTest          dbi_maxDbTimestamp = 2000            dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-09-28 07:06:35.873
dbi_filegeneration = 0
dbi_checkptLSN

m_fSeqNo = 71                        m_blockOffset = 153                  m_slotId = 6
dbi_RebuildLogs = 0                  dbi_dbccFlags = 2
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
<snip>

The dbi_checkptLSN is dumped out in decimal – converting to hex gives us  (47:99:6), which exactly matches the LSN of the most recent LOP_BEGIN_CKPT record in the log dump above.

Hopefully this explains things clearly!

 

 

 

 

 

 

 

 

 

41 thoughts on “How do checkpoints work and what gets logged

  1. Paul,

    Thanks for clarifying about this. My poor choice of words and not accounting for the active transactions led to this confusion. Thanks again.

  2. Hi Paul,

    I really enjoyed the article becuase of the art of writing and simplnees about the topics which is explained fantastically.

  3. Paul,

    Thanks for this pretty insightful post.

    I’ve a couple of clarification to seek, though…

    1. I believe this checkpoint behavior does not change from 2000 through 2008 R2. Is that fair to assume?

    2. During a checkpoint, is all other DB activity quiesced? I mean are updates (after reading in new data pages to the buffer) allowed?

    Thanks,
    DHF.

  4. Hello!
    Not surprising that I end up at Paul Randal’s blog whenever I’m investigating SQL Server txlog behavior. :-)
    Regarding this portion of the post:
    ***
    If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log – both of which are terrible misnomers, as nothing is either physically cleared or truncated).
    ***
    It seems that too many vlfs (eg 10000 at 10 mb in size) in simple recovery mode can lead to a protracted time of log_reuse_wait_desc of ‘checkpoint’, and thus potentially contribute to a txlog reaching max size and terminating/rolling back transactions.

    If so, on a system with 10000 vlfs at 10 mb in size, it seems like shrinking the file to its first two vlfs and manually growing to expected size in 8000 mb chunks (for 500 mb vlfs as you and Kimberly have mentioned elsewhere) would yield expected benefits in transaction rollback, database recovery, and preventing txlog full conditions.

    Seem correct?

  5. Hi Paul,
    Very basic question here please. Just working on trying to understand logging architecture (Thanks for your pluralsight course Logging and Recovery as well. It is invaluable.)
    When you say “disk” here:
    “All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.”
    is it .ldf or .mdf (.ndf)?”

    Thanks a lot.

    1. Also you say:
      “Random I/O to the data files, sequential I/O to the log”
      Is there multiple checkpoints?
      Thank you.

  6. Hi Paul,

    Your articles really helped me understand the logging mechanism. Presently I am working on a stored procedure to delete high volume of records from the database as part of the archival activity.

    In order to avoid the filling the transaction log file, I am ensuring that the delete is done in small batches,transaction is commited, checkpoint issued,the recovery mode is SIMPLE.

    But I have searched all the SP in the all the Databases in my application but could’nt find any of them issuing a checkpoint statement.I believe issuing checkpoint is mandatory for the VLF to be marked as truncated but if none of them is issuing the checkpoint statement then how the transaction log is managed. Please advice if it is compulsory to issue checkpoint to avoid the space full error of transaction log in simple recovery mode. We are using SQL server 2008.

    1. If you’re using SIMPLE, a checkpoint occurs when the log is 70% full automatically. Are you sure you want to use SIMPLE? You can do the same thing using FULL and doing log backups after each committed transaction. Alternatively, you could use partitioning and simple drop an entire partition of records with hardly any logging.

  7. Dear Paul,

    The introduction says “All dirty data file pages for the database are written to disk, regardless of the state of the transaction that made the change”. Are these pages written into the database data file(s)? If so, I presume the page exists twice in the datafile : the page as it will be if the transaction is committed and the data as it will be if the transaction is rolled back. I wonder what will happen with those 2 versions? If the transaction has been completed (committed or rolled back) at the next checkpoint, I presume one of both pages will be removed and the other one is read into memory without being marked “dirty”???

    Thanks
    Peter.

    1. Nope – it exists once, at the current state of the transaction. If the transaction commits, with no further changes to that page, nothing else needs to happen. If the transaction rolls back, the page is changed in memory again and will be flushed out to disk on the next checkpoint, overwriting the page on disk.

  8. Paul
    I very much appreciate this article.

    You might be surprised but the line that meant
    the most to me was

    “If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log – both of which are terrible misnomers, as nothing is either physically cleared or truncated).”

    I have a SIMPLE recovery model database, and I have a huge update to do.
    I wrote a loop to break the update into batches of 100,000 rows each.

    But I wanted to make sure that I did not fill the TLog.
    so I added a checkpoint to the loop and it keep the TLog from growing.

    thanks
    -gary h

  9. Hi Paul, thank you for the post, it’s really helpful. I have a question about how checkpoints interact with user writes. If a checkpoint is in the middle of writing a data page to disk (IO request pending) and a user thread is trying to obtain an exclusive latch on that page in order to update it, what happens? (I suspect the thread will wait with a PAGEIOLATCH_EX, but not really sure). Thanks!

  10. Hi Paul, thank you for the post, as always great :)

    I have one question about checkpoint and reusing of log space. I know that in simple recovery model checkpoint (direct or indirect) is the only way to reuse space but what is going on in full recovery model – in script below – why after checkpoint “Log Space Used” is going low?

    –SQL 2016 DEV + SP1
    USE [master]
    GO

    IF DATABASEPROPERTYEX (N’DUMMY’, N’Version’) > 0
    BEGIN
    ALTER DATABASE DUMMY SET SINGLE_USER
    WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DUMMY;
    END
    GO

    CREATE DATABASE [DUMMY]
    ON PRIMARY
    ( NAME = N’DUMMY’
    , FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\DUMMY.mdf’ ,
    SIZE = 2024KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
    LOG ON
    ( NAME = N’DUMMY_log’
    , FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\DUMMY_log.ldf’
    , SIZE = 2024KB , MAXSIZE = 2048GB , FILEGROWTH = 1024 KB)
    GO

    USE [master]
    GO

    ALTER DATABASE [DUMMY]

    SET RECOVERY FULL
    WITH NO_WAIT
    GO

    USE [DUMMY]
    GO

    DROP TABLE IF EXISTS dbo.TEST

    SELECT TOP 50 cast(‘A’ AS CHAR(8000)) char_8000
    INTO dbo.TEST
    FROM master..spt_values

    DBCC SQLPERF(logspace)
    –Log Size (MB) Log Space Used (%)
    –1,992188 38,48039

    INSERT INTO dbo.TEST
    SELECT TOP 50 cast(‘A’ AS CHAR(8000)) char_8000
    FROM master..spt_values

    DBCC SQLPERF(logspace)
    –Log Size (MB) Log Space Used (%)
    –1,992188 58,84804

    CHECKPOINT

    DBCC SQLPERF(logspace)

    –Log Size (MB) Log Space Used (%)
    –1,992188 19,28922

  11. Hi Paul, your post helped answering my own question on whether or not setting a database to read_only issues an internal checkpoint and yeah it seems like it does!

    CREATE DATABASE CheckpointTest;
    GO
    USE CheckpointTest;
    GO
    CREATE TABLE t1 (c1 INT);
    GO
    INSERT INTO t1 VALUES (1);
    GO
    Alter database CheckpointTest SET READ_ONLY WITH NO_WAIT
    GO
    SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
    GO

  12. Hi Paul,

    how does checkpoint operation get a list of dirty pages? Does it query PFS page? we noticed PFS page latch contention on user database during checkpoint process, followed by stack dump.

    Thanks.

    1. You hit a bug – PFS pages aren’t used during checkpoints – it’s a coincidence. Old-style checkpoint runs through the BUF structures in the hash list for the database looking for ones marked dirty. In new-style checkpoints there’s a dirty page list.

  13. Hi Paul,

    Good Morning

    Let me start my question with example

    let’s say check point occurred at 10AM and 10.12AM my server got crashed { there is no checkpoint after 10AM}
    In between 10 AM an 10.12 AM there was transaction occurred and it is completed
    suddenly my server got crashed, in this case SQL Engine will do the redo or undo the completed transaction ?
    Could you please explain about recovery

    Thanks,
    Mahendra

  14. Dear Paul,

    Have query on inserting few rows in table , after insert statement it sayd rows affected but while selecting or counting rows the data is not being changed. I did not see anh trigger to stop being committed, implicit transaction is also off, no error in logs. Not sure why records are not showing in table.

  15. Hi Paul,
    Does checkpoint run in bulk logged recovery model also? What does Eager Write process do then? I am confused.

    1. Yes, in all recovery modes. In bulk logged or simple, eager writing ensures that all data file pages changed by a minimally-logged operation have been written to disk before the commit of that operation occurs.

      1. Why cant Checkpoint flush dirty pages changed by minimally-logged operations to disk?

        1. Of course it could, but for correctness of crash recovery, data pages changed by minimally-logged ops must be on disk before the op commits. But checkpoint won’t ever encounter such a page because of eager writing.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.