Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.
The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?
The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)
The purpose of a checkpoint is to bring the pages in the data files up-to-date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.
When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then if in the SIMPLE recovery mode, any log clearing/truncating can occur).
So if a checkpoint started but didn’t end before a crash, it’s LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there’s no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn’t be able to work correctly from only starting at the beginning of the incomplete checkpoint.
A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?
The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_CKPT log record cannot happen until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur since the last log clearing/truncation before the next one can happen. If a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this occur, but you might see it if there’s a very long running checkpoint (e.g. a very large update on a system with a slow I/O subsystem so the flushing of data file pages takes a long time) and very frequent log backups, so two log backups occur over the time taken by a single checkpoint operation. It could also happen if you’ve messed with the sp_configure recovery interval and set it higher than the default.
Interesting, eh?
15 thoughts on “Incomplete checkpoints and recovery”
“The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_XACT …”
Didn’t you mean LOP_BEGIN_CKPT? LOP_BEGIN_XACT is a BEGIN TRAN, I believe, and I can’t see how it fits in the context.
Yup – typo – thanks!
Does Delayed Durability have anything to do with this?
Nope – completely orthogonal.
Paul, could you explain “so two log backups occur concurrently with a single checkpoint operation”? My understanding is that only one tlog back up can occur at same time. and each time backup log will force checkpoint…
Thanks.
Turn of phrase. I reworded to ‘occur over the same time as a’ to be more clear.
Thanks for the great blog post Paul.
hi paul,
thank u for excellent post,
if this is the first time checkpoint occurs and it doesn’t complete before crash,
can we say that we have nothing in boot page about last checkpoint? how crash recovery behaves in this situation?
Crash recovery will start at the point the database was created and proceed as usual. Crash recovery is always infallible, except when the transaction log itself or boot page is corrupt.
If I issue a checkpoint from management studio and see that it is taking too long, can I just stop it? If I do, will it rollback all the work it did before I stopped it? I am thinking that if it does rollback all the work it did, then, it will consume more resources and so it should not be stopped midway.
Yes, you should be able to cancel it. No, nothing rolls back as it’s not a logged transaction.
Hello,Paul! I have a question about checkpoint below:
1. I know when a checkpoint have been finished, it can flushed the dirty data which in the memory to the disk, these dirty data contains about committed and uncommitted transaction data, and all the data has protected by the transaction log.
so my question is: at the time when the checkpoint just have been finished, the database maybe inconsistent because there are some uncommitted data in the database? so if database crashed at this time, it needs a crash recovery which undo the uncommitted data? if my understand is right, I confused with a sentence :”the crash recovery is begin at the last checkpoint because all dirty data has been flushed to the disk “, but there are some uncommitted data? if the database ignore the data before the last checkpoint, how can it make undo to let the database become consistent ?
It doesn’t ignore them. In the first paragraph, it says ‘In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.’
That is to say,if the database crashed at the time which checkpoint just finished and there are some uncommitted transaction data in the disk, then the database needs a crash recovery which will use the some LSN before the checkpoint time’s LSN to undo some uncommitted data to let the database become consistent and online?
If the uncommitted transactions began before the LSN of the checkpoint, yes.