Debunking a couple of myths around full database backups

I’ve been involved in a few conversations today that have highlighted some big misconceptions about how backups work. I’d like to use this blog post to debunk them. I checked everything I say here with my friend Steve Schmidt, the developer on the Storage Engine team responsible for BACKUP/RESTORE for the last ten years.

Myth 1: A full database backup only contains the transaction log from the start of the backup to the end of the backup


When you restore a full database backup, you get a transactionally consistent database. Consider the case where there’s an active transaction that doesn’t commit until after the backup completes. If the backup only contained the log that occured while the database was being backed up, how would it roll back the active transaction. It *has* to include enough transaction log to roll back the active transaction. The start LSN of the log included in a database backup is the minimum of:



  • LSN of the last checkpoint
  • LSN of the start of the oldest active transaction
  • LSN of the last replicated transaction

Let me prove it to you. I’m going to create a database, put it into FULL recovery mode, start a transaction, checkpoint, and then take a backup. The checkpoint ensures the page I’ve altered is flushed to disk.



CREATE DATABASE stopattest;



GO


ALTER DATABASE stopattest SET RECOVERY FULL;


GO


BACKUP DATABASE stopattest TO DISK = ‘c:\sqlskills\stopattest.bck’ WITH INIT;


GO


USE stopattest;


GO


CREATE TABLE t1 (c1 INT);


GO


BEGIN TRAN;


INSERT INTO t1 VALUES (1);


GO


Now in another connection I’ll take another full database backup.



BACKUP DATABASE stopattest TO DISK = ‘c:\sqlskills\stopattest.bck’ WITH INIT;


GO


The msdb.dbo.backupmedia table will tell us the relevant LSNs in the backup (I added some spaces to delineate the prts of the LSN for clarity):



SELECT last_lsn, checkpoint_lsn, database_backup_lsn FROM msdb.dbo.backupset



WHERE database_name = ‘stopattest’;


GO


last_lsn              checkpoint_lsn        database_backup_lsn
——————— ——————— ———————
21 0000000190 00001   21 0000000174 00037    21 0000000058 00037


[Edit: After swapping some email with Kalen Delaney, I realized that when I originally put this together I had more log records in the post and when I removed them I messed up the description of the (21:174:37) LSN – its now corrected below)


So you can see the checkpoint that begins the backup was at (21:174:37). The LSN of the first log record that the backup contains is (21:58:37), which is before the start of the backup. And the backup contains all the log from then until (21:190:1). Now let’s look at the actual transaction log to see what these LSNs correspond to.



SELECT [Current LSN], Operation, [Transaction Name] FROM fn_dblog (null,null);


GO


Here’s some selected output:



Current LSN              Operation        Transaction Name
———————— —————- ——————
00000015:0000003a:0025   LOP_BEGIN_CKPT   NULL
                (this is the calculated minimum LSN the backup must contain (21:58:37) – which is (15:3a:25) in hex)
.
.
00000015:00000061:0001   LOP_BEGIN_XACT   user_transaction
                (here’s my transaction starting – before the backup started but within the LSN range contained in the backup)
.
.
00000015:000000ab:0004   LOP_BEGIN_XACT   Backup:InvalidateDiffMaps
                (this is the backup clearing the differential bitmaps)
.
.
00000015:000000ae:0025  LOP_BEGIN_CKPT    NULL
                (this is the checkpoint that BACKUP does – matching the checkpoint LSN above)
.
.


So – this clearly shows that the backup contains more than just the log from the time the backup was running.


Myth 2: It’s possible to do a STOPAT with only a full database backup


This myth is that its possible to use STOPAT with a full database backup to stop during the time the backup was being taken. The argument FOR this myth is that the backup contains the log for all the changes that happened while the backup was being taken, so it must be possible to stop at any point in time. Technically, that’s correct, but in practice it’s wrong – you cannot stop at a point while the backup was running, using only the database backup.


This one’s more complicated to explain. Doing a STOPAT operation means getting the database to a state where operations later than the time or LSN specified in the STOPAT clause haven’t affected the database yet. A database backup reads pages that may or may not have been changed while the backup was running. If they are changed, it could be at any point while the backup is running.


Consider the case where page X is changed at LSN (10:45:12), *just* before the backup completes and is read by the backup at the time equivalent to LSN (10:45:13). The backup will contain the changed page image, plus the log record for the change. What if I want to stop at a point while the backup was running but *before* the change to page X, say at LSN (10:44:00). The backup only contains the image of page X at LSN (10:45:12) – how can it be put back to the image at the time we want to stop at? The argument is that we have the log record for the change – can’t SQL Server just undo it?


No. It won’t even see it. STOPAT works by recovering the database up to the point that the STOPAT specified. If we ask to stop at LSN (10:44:00), then the log will only be read and recovered up to that point. However, because the database backup didn’t read page X until LSN (10:45:13), it only has the image of it from when it was altered at (10:45:12). This clearly won’t give a database image as of (10:44:00).


The only way to stop at a particular time/LSN, is to have images of *all* database pages from before that time/LSN (i.e. from the *previous* database backup) and then restore all the transaction logs up to and including the time/LSN to stop at.


Hopefully that makes sense.

16 thoughts on “Debunking a couple of myths around full database backups

  1. I am wondering how things work for VLDB which is in simple recovery mode and contiously changing while DB backup happens?
    Thanks

    Farhan

  2. It works in exactly the same way as for a DB in FULL recovery mode – the database backup contains all the log necessary to make the restored database transactionally consistent as of the end of the backup operation.

  3. Hi Again Paul,

    I have a question on this one: you say the database_backup_lsn corresponds to the first log record that the (second) backup contains. I understand this value of database_backup_lsn refers to the checkpoint generated by the first full backup in your example, because when you take multiple backups you notice that current database_backup_lsn always matches the checkpoint_lsn of the previous backup.

    When you say the first LSN is the minimum of the last checkpoint and the oldest opened tran, and because the first backup checkpoint happens before the begin tran, the start LSN would equals to database_backup_lsn, then. What I don’t understand is why do we have to start from this point, and not directly from the LSN of the oldest active tran (LSN 21:97:1 in your case) ? What needs to be recovered between last checkpoint and oldest active tran ? Every transaction committed in between will be checkpointed at the beginning of the next backup, so there will be no need to recover them ?

    Thanks,

    David.

  4. "It works in exactly the same way as for a DB in FULL recovery mode – the database backup contains all the log necessary to make the restored database transactionally consistent as of the end of the backup operation."

    Come again?

    There are three recovery models, but lets focus on Full and Simple.

    Simple – in simple recovery mode, the transaction log is not backed up so you can only recover to the most recent full or differential backup.

    Full – in full recovery mode you backup the database and the transaction log so that you can recover the database to any point in time.

    I do not understand why you mention that they both make the restored databases transactionally consistent. They do not have to be necessarily.

  5. Hi Aaron,

    "Come again?" – ok.

    Unfortunately, you’ve missed the whole point about full database backups. A full database backup, when restored either explicitly using WITH RECOVERY, or when a recovery option is not specified *ALWAYS* results in a transactionally-consistent database. The point-in-time at which the database is restored to is the point at which the data-portion-reading part of the backup operation completed. All database backups include transaction log, otherwise there would be no way to rollback transactions that were active at the time the data-reading portion of the backup ended.

    This behavior is *EXACTLY* the same in any of the three recovery models – so I’m afraid your assertion above is incorrect. Only if you specify WITH NORECOVERY does a restored backup NOT result in a transactionally consistent database.

    Thanks

  6. Hi Dear

    A full backup contains all the data in a specific database at given time, either you commit the transaction or not, i.e if you begin tran, execute some statement, in other session you take backup, all uncommited data will be there, you can check it by restoring database,while in first db where you make transaction(unsuccessfull) no data will be available.

  7. Would you discussion explain the following:

    In my application, which has a SQL 2005 database and classic web/asp application…
    I take a full backup of a source database.
    I restore the database to a 2nd server (target database, which I expect to be identical). Website code is identical.
    I run a report on both servers.
    Report on source server shows different data than the same report on the target server.

    Differences are minor, just a few records.

    I am wondering if very recent transaction data was cached and not written to disk. Would this explain why the reports are different?

    1. Assuming that there are no changes on system 1 after the backup, it must be that there are transactions being rolled back during the restore, and the report is using NOLOCK to scan the uncommitted data on system 1.

      Rolling back uncommitted transactions is the only thing that could be different after the restore.

  8. Suppose ,we started taking the ful backup. it will take 2 hours.
    in between some of the records deleted from table and also some of the records added into the table.
    now i want to restore the backup.
    Q1.Can i get deleted records?
    Q2.Can i get new inserted records?

    Kindly assist me.

    Thanks in advance……..

    1. The point-in-time that you get when you restore a full backup is at the end of the data-reading portion of the backup, and before it reads transaction log. If the transactions finished before that point in time, you won’t have the deleted records but you will have the inserted records.

      You need to take log backups also, so you can restore to any point in time, which will allow you to recover the deleted records.

  9. A couple of question on this?
    – I have more than 1 TB database took me 5 hours to backup, so if I start back up at 12 pm and goes on until 5pm, is that backup will consist of data till 5 pm when I restore it?
    – When I took differential backup at midnight at 12 it again took 4 hours to backup and almost same size of a full backup, why is that? Isn’t that differential data will be from 5 pm to midnight?
    – If I restore full backup Production, from the weekend (Sunday) on Monday morning, Monday night differential on Tuesday morning and restore every day like this does it take less time to restore since the full backup took me 8 hours to restore?

    1. No – it will have all the data for transactions that committed up to and including the time when the data-reading portion of the backup ended (let’s call it time E.) Time E may have been well before 5pm, if there was a lot of log needing to be backed up too.

      The subsequent diff backup had all the extents changed between time E and the time the data-reading portion of the diff backup ended. There may also have been a bunch of log that needed to be backed up too, making the backup take longer.

      No, if I’m understanding your question correctly. You only need to restore the most recent full plus the most recent diff (plus any required log backups). Diff backups are cumulative, not incremental. Tuesday’s diff has everything in Monday’s diff, plus whatever changed since Monday’s diff. A diff backup contains *all* changes since the most recent full backup.

      Hope that helps.

  10. This is an old post but I’m curious about what ways LSNs can change.. I’m new to being a DBA and I created a script that takes log back ups every 30 minutes and then at midnight takes a diff, with Saturday fulls.. I have a script that I wrote that uses powershell to verify the LSNs doing a restore header only, adds the results into a table and compares them. This ran fine Friday and Saturday night, but on Sunday at 2:30 in the morning the LSNs broke on all databases running log backups.. not just one. Upon taking a fresh full, this goes back to working… and again the full and diffs happen at midnight, so the diff at midnight was fine and the logs were fine for 2 hours after that. There is no other back up job running at 2 am, and nobody was connected to the machine. I also checked event logs on the machine and didn’t see anything particularly nefarious around this time.. Any ideas?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Some thoughts on courage

(This is also the Ponderings – editorial – in today’s SQLskills newsletter.) I want to start out this post by sincerely thanking everyone who emailed

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.