(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
My DBA Mythbusters session went *really* well at SQL Connections yesterday, and now I’m half way through my debunking month!
Myth #15: checkpoint only writes pages from committed transactions.
This myth has persisted for *ages* and is linked to a misunderstanding of how the overall logging and recovery system works. A checkpoint always writes out all pages that have changed (known as being marked dirty) since the last checkpoint, or since the page was read in from disk. It doesn’t matter whether the transaction that changed a page has committed or not – the page is written to disk regardless. The only exception is for tempdb, where data pages are not written to disk as part of a checkpoint. Here are some blog post links with deeper information:
- TechNet Magazine article on Understanding Logging and Recovery in SQL Server
- Blog post: How do checkpoints work and what gets logged
- Blog post: What does checkpoint do for tempdb?
You can watch what a checkpoint operation is doing using a few trace flags:
- 3502: writes to the error log when a checkpoint starts and finishes
- 3504: writes to the error log information about what is written to disk
To use these trace flags, you must enable them for all threads using DBCC TRACEON (3502, 3504, -1) otherwise you won’t see any output.
Here’s a quick script that proves that dirty pages from uncommitted transactions are written out during a checkpoint. First the setup:
CREATE DATABASE [CheckpointTest]; GO USE [CheckpointTest]; GO CREATE TABLE [t1] ([c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'a'); CREATE CLUSTERED INDEX [t1c1] on [t1] ([c1]); GO SET NOCOUNT ON; GO CHECKPOINT; GO DBCC TRACEON (3502, 3504, -1); GO
And now an uncommitted transaction that causes 10MB of pages to be dirtied, followed by a checkpoint:
BEGIN TRAN; GO INSERT INTO [t1] DEFAULT VALUES; GO 1280 CHECKPOINT; GO
And in the error log I see:
2010-04-15 13:31:25.09 spid52 DBCC TRACEON 3502, server process ID (SPID) 52. This is an informational message only; no user action is required. 2010-04-15 13:31:25.09 spid52 DBCC TRACEON 3504, server process ID (SPID) 52. This is an informational message only; no user action is required. 2010-04-15 13:31:25.09 spid52 Ckpt dbid 8 started (0) 2010-04-15 13:31:25.09 spid52 About to log Checkpoint begin. 2010-04-15 13:31:25.09 spid52 Ckpt dbid 8 phase 1 ended (0) 2010-04-15 13:31:25.71 spid52 FlushCache: cleaned up 1297 bufs with 50 writes in 625 ms (avoided 0 new dirty bufs) 2010-04-15 13:31:25.71 spid52 average throughput: 16.21 MB/sec, I/O saturation: 70 2010-04-15 13:31:25.71 spid52 last target outstanding: 2 2010-04-15 13:31:25.71 spid52 About to log Checkpoint end. 2010-04-15 13:31:25.71 spid52 Ckpt dbid 8 complete
Clearly all the pages were written out, even though the transaction had not committed.
5 thoughts on “A SQL Server DBA myth a day: (15/30) checkpoint only writes pages from committed transactions”
Thank you. Ammunition for the next time someone tries to tell me that rollbacks don’t complete after a restart because the data chances would never have been persisted.
My proof consisted of reading the data file with a hex editor, am happy to have a simpler way.
So, the following line is “writing” every page? So, all 1280 batch executions took 50 writes?
2010-04-15 13:31:25.71 spid52 FlushCache: cleaned up 1297 bufs with 50 writes in 625 ms (avoided 0 new dirty bufs)
2010-04-15 13:31:25.71 spid52 average throughput: 16.21 MB/sec, I/O saturation: 70
2010-04-15 13:31:25.71 spid52 last target outstanding: 2
Why does it say 1297 instead of 1280 in the log? Is this due to overhead?
Yes, it’s writing all dirty pages to disk. You can’t equate number of writes to number of pages – one checkpoint write could write up to 32 pages at one time contiguously. 1297 because there are 1280 leaf level pages in the clustered index, plus IAM page, plus index pages, plus all the system table pages that changed to make note of the new table and index, plus a PFS, GAM, and SGAM page. You can’t just consider the actual data pages.
Just Wow.. Luved the proof.. Seems like I am begining to understand a miniscule in a plethora of knowlege in SQL Server.
It indeed makes sense that this is a misconception that Checkpoint writes dirty pages only from committed transaction.
Imagine that, there is a dirty page in buffer cache. It has 10 records. 2 records were already modified by a previous transaction which was committed successfully and it modified those two rows. The other 3 rows on the same page are currently affected by another active transaction which has not yet committed. If SQL Server would just flush the dirty pages from only committed transactions and if Checkpoint occurs at this point before that transaction is committed, then SQL Server cannot determine whether that dirty page should be flushed on the disk or not because that page is affected by a committed transaction and an uncommitted transaction as well.
Because SQL Server performs I/O operations at page level and it has to write whole page, it also includes the changes to be written to the disk from uncommitted transaction. I guess, this should be the reason why Checkpoint writes pages from both committed and uncommitted transactions.
Paul, please let me know if I am missing something or above was not accurate.
Thanks again for the nice post.