Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.
The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?
The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)
The purpose of a checkpoint is to bring the pages in the data files up-to-date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.
When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then if in the SIMPLE recovery mode, any log clearing/truncating can occur).
So if a checkpoint started but didn’t end before a crash, it’s LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there’s no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn’t be able to work correctly from only starting at the beginning of the incomplete checkpoint.
A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?
The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_CKPT log record cannot happen until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur since the last log clearing/truncation before the next one can happen. If a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this occur, but you might see it if there’s a very long running checkpoint (e.g. a very large update on a system with a slow I/O subsystem so the flushing of data file pages takes a long time) and very frequent log backups, so two log backups occur over the time taken by a single checkpoint operation. It could also happen if you’ve messed with the sp_configure recovery interval and set it higher than the default.