In a previous post (see here) I debunked a myth about how much transaction log a full backup would include. I had a question in the blog post comments that asked (paraphrasing):
The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all the transaction log between the checkpoint and the begin LSN? What is it used for?
I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline – so I got all enthusiastic and created a picture in Powerpoint to help explain better.
Consider the timeline in the picture above for a full backup (the red numbers match the list below):
The backup operation take a checkpoint to force all dirty pages in the buffer pool to disk – both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
The read operation reads page X
Transaction A starts
Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again – it’s already passed that point in the database.
Transaction B starts. It won’t complete before the data read operation completes so it’s begin LSN is the oldest active transaction begin LSN.
Transaction A commits. This commits the changes to page X.
The backup data read operation completes and transaction log reading starts.
Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.
If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).
So, (ignoring replication) the minimum LSN of the transaction log that’s included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.
Much easier to explain with aid of a picture than without! :-)
30 thoughts on “More on how much transaction log a full backup includes”
Great explaination Paul, it was indeed helpful
Thanks for this answer. Can we simplify the whole sketch by saying: when a full backup begins, there are only two situations:
– Either there is at least one active transaction at the time the backup begins: in this case, the start LSN will be the LSN of the oldest active xact.
– Either the database is quiesced (in your example above, the B transaction start after the backup checkpoint): in this case, the start LSN will be the LSN of the checkpoint triggered by the backup.
Come in Paris, I can get you the whiteboard !! ;-)
Basically, yes. There are some rare twists but they’d just confuse things for no good reason.
Thanks for sharing very useful Information about the backups.. :)
Going to your point 1, so you say when i take full backup it will issue checkpoint?
Just want to know whether trans A which made changes in page x will be in full backup?
Yes, because it finished before the data-reading portion of the backup finished, so will be included in the portion of transaction log read by the backup.
Thank you Paul,the last statement cleared my doubts.
o, (ignoring replication) the minimum LSN of the transaction log that’s included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction)
very good explanation you cleared all my doubts, the example was very good. The only doubt I have is what if a transaction started before that backup started and was still uncommitted after the backup was completed. will the logs be recorded into transaction logs? I think it should not reading your last paragraph but still I want to clarify.
The log records generated by the transaction will be in the log contained in the backup – but when the backup is restored, that uncommitted transaction will be rolled back.
I have been reading through your blogs trying to find a solution to my runaway log file. mdf is 6MB, ldf is 92GB. Don’t have enough storage space on the SQL 2005 server to backup the log file. What are my other options ?
Your only other option is to switch to SIMPLE (which breaks your log backup chain), CHECKPOINT, switch back to full and then re-establish the log backup chain using a differential or full backup.
Is there any co-relation between Full database backup & VLF?
I have database of 200 GB in which transaction log .ldf file size is 100 GB & it’s 99.99% free[unused].
If I shrink the log file will it reduce full backup time?
Log file size and number of VLFs has nothing to do with full backup duration. Only the *amount* of log that needs to be backed up by the full backup affects the full backup duration. So the answer to your second question is no.
I’m trying to understand something I observed recently and was hoping you might be able to help. I have a database that’s about 250GB and the transaction log had grown to 150GB; the full backups were about 53GB, and after shrinking the transaction log to 24GB I’ve noticed that the full backups are down by about 9GB.
This is a much larger reduction than I would have expected to see, as there are log backups every 15 minutes, so despite the large size of the log, for the most part it was pretty empty. There is not a huge amount of log generated in the time it takes for the backup to run, (around 10MB approximately). Is there anything that would explain this, or is it likely that there’s something else going on that I’m missing?
I can’t think of anything off the top of my head that would account for this, except a reduction in the amount of data in the database. Barring a long-running transaction, if the log generation rate is fairly constant, the amount of log in the backup shouldn’t vary much.
Is there a way to know the exact date and time of the last committed transaction contained within a full database backup?
Either restore the database and use fn_dblog or look at the log in the backup using fn_dump_dblog. That info isn’t persisted anywhere except in the log.
I have a question, what if tran A ended before tran B started, I mean in your example step 6 before step 5, then the changes made by tran A will be in Full Backup or in Log? Tran A ended before oldest active transaction but read data is still running, as I understand, changes will be in Full Backup without information about tran A.
I got it, probably not read carefully. In your example Log will include all transaction from checkpoint and later, because checkpoint LSN < LSN tran B, so information about tran A will be in Full Backup and changes made by tran A will be REDO during restore.
What’s happen Without transaction B?
When transaction A makes a change to page X, is the copy in the backup out-of-date?
Nothing changes, as transaction B starts after the checkpoint. The text after the image explains this.
The restore operation
will be included rollback and roll forward Operations.
that is true?
Roll forward is done for each restored backup in the restore sequence, roll back is only done when the restore sequence completes.
Just want to know, bring page X up-to-date, is a roll forward action?
Full backup includes transaction log ‘until the LSN at which the data read portion ends’. Does it mean, after read portion ends,the backup process still need more time to copy transaction logs? And, when restoring the full backup, it is restored to the time when read portion ended, but not the end of whole backup process? Is it correct?
Yes. Think of it this way: LSN1
LSN2 . The restore image will be as for LSN2, not for .
I created three tables, Tab1, Tab2 & Tab3 and i inserted values into the tables then i took full backup. again i inserted some records into the tables and i took differential backup. now i dropped Tab1 & Tab3 and i took log1 backup. and again inserted some records into tab2 then i took log2 backup. in this scenario i want to know my dropped table names before the point-in-time recovery? i used fn_dblog and fn_dump_dblog but i didn’t get table names.
If you’re connected to the database and the table is dropped, you won’t see its name in fn_dblog, as the metadata isn’t there. Same thing with fn_dump_dblog. You’d need to restore the database to some point in time when the tables are there, and then use fn_dump_dblog to look in log backups for the object ID of the table you want being dropped.