More on how much transaction log a full backup includes


In a previous post (see here) I debunked a myth about how much transaction log a full backup would include. I had a question in the blog post comments that asked (paraphrasing):

The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all the transaction log between the checkpoint and the begin LSN? What is it used for?

I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline – so I got all enthusiastic and created a picture in Powerpoint to help explain better.

Consider the timeline in the picture above for a full backup (the red numbers match the list below):

  1. The backup operation take a checkpoint to force all dirty pages in the buffer pool to disk – both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
  2. The read operation reads page X
  3. Transaction A starts
  4. Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again – it’s already passed that point in the database.
  5. Transaction B starts. It won’t complete before the data read operation completes so it’s begin LSN is the oldest active transaction begin LSN.
  6. Transaction A commits. This commits the changes to page X.
  7. The backup data read operation completes and transaction log reading starts.

Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.

If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).

So, (ignoring replication) the minimum LSN of the transaction log that’s included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.

Much easier to explain with aid of a picture than without! :-)


Tales from the women’s bathroom in the Microsoft office in Beijing

Well, I bet that title grabbed a bunch of people’s attention :-)

In the Beijing Advanced Research Centre offices of Microsoft where we’re teaching this week, the bathroom stalls have print-outs of jokes on them. The ones in the women’s bathroom are pretty good – here are a couple of them that made us laugh (thanks to Kimberly for doing the research!)

#1: A man has a girlfriend, Lorraine, who’s he’s been dating for a while. He’s pretty happy. Into his life suddenly comes a new woman, called Clearly, who is better than Lorraine in many ways – she knows the best place in town to visit, to eat, and to dance. Unfortunately he doesn’t want to give up on Lorraine though. One day they’re walking along the banks of a river when Lorraine falls in and is swept away. The man starts singing ‘I can see Clearly now Lorraine has gone…’

#2: A man and his wife are lying in bed, just about to fall asleep. The conversation goes like this:

Wife: ‘If I were to die, would you re-marry?’

Husband: ‘Of course not!’

Wife: ‘Why? Don’t you like being married?’

Husband: ‘Well, yes – I suppose I would re-marry’

Wife: ‘Would you take down all the pictures of me and replace them with her?’

Husband: ‘Well yes, that would be the proper thing to do’

Wife: ‘Would you let her use my golf-clubs?’

Husband: ‘No, she’s left handed… oh shit!’


Database snapshots – when things go wrong

This is a post I’ve been meaning to do for a while – detailing some of the problems you can run into when using database snapshots.

Reverting to a snapshot

Many people use snapshots as a way of protecting against mistakes during a complicated set of changes to a database – you can just revert to the snapshot if something goes wrong. However, reverting to a snapshot has a little-known (but documented) problem. Let’s see. I’m going to create a database plus a snapshot of it and then start taking backups.

BACKUP DATABASE [SnapshotTest] TO DISK = N'C:\SQLskills\SnapshotTest.bak' WITH INIT;
CREATE TABLE [SnapshotTest].[dbo].[MyTable] ([c1] INT);

BACKUP LOG [SnapshotTest] TO DISK = N'C:\SQLskills\SnapshotTest_log.bak' WITH INIT;

-- Imagine a bunch of things happen here
-- Create the snapshot database, by first checking which files exist
SELECT * FROM [SnapshotTest].[sys].[database_files];
(NAME = N'SnapshotTest', FILENAME = N'C:\SQLskills\SnapshotTest_snap.snp')
AS SNAPSHOT OF [SnapshotTest];

Now I’m going to do some stuff that’s a mistake:

DROP TABLE [SnapshotTest].[dbo].[MyTable];

Not a problem as I can revert to my database snapshot:


Oops! I forgot to take a log backup to capture everything that happened since the last log backup…

BACKUP LOG [SnapshotTest] TO DISK = N'C:\SQLskills\SnapshotTest_log.bak';
Msg 4214, Level 16, State 1, Line 1
BACKUP LOG cannot be performed because there is no current database backup.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.

Reverting to a snapshot rebuilds the transaction log and breaks the log backup chain. The only thing I can do is take a full or differential backup and then start taking log backups again. So – I lost the ability to do point-in-time restores in the period from the last log backup to the time when I reverted to the snapshot.

So how can I tell when the database was reverted? Books Online documents that the restorehistory table in msdb should have an entry with restore_type = ‘R’. Let’s try:

SELECT * FROM [msdb].[dbo].[restorehistory] WHERE [destination_database_name] = N'SnapshotTest';

Nope – nothing. That functionality seems to be broken. The only way I could find to trace reverting was in the error log:

2008-01-30 11:09:21.73 spid53      Reverting database 'SnapshotTest' to the point in time of database snapshot 'ST_Snap' with split point LSN 26000000013800001 (0x0000001a:0000008a:0001). This is an informational message only. No user action is required.
2008-01-30 11:09:21.74 spid53      Starting up database 'SnapshotTest'.
2008-01-30 11:09:21.74 spid53      The database 'SnapshotTest' is marked RESTORING and is in a state that does not allow recovery to be run.
2008-01-30 11:09:21.85 spid53      Starting up database 'SnapshotTest'.
2008-01-30 11:09:21.87 spid53      Starting up database 'ST_Snap'.

Note that the snapshot database is still there and continues to work.

Running out of space in a snapshot

The next problem that can happen is if you create a database snapshot on a volume that doesn’t have much disk space. In that case you can actually run out of space and the snapshot will go suspect and be unusable until it’s dropped and recreated. Let’s see what happens:

-- Create snapshot of large database on a volume with not much space
SELECT * FROM [SalesDB].[sys].[database_files];
(NAME = N'SalesDBData', FILENAME = N'D:\sqlskills\SalesDB_snap.snp')
Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'D:\sqlskills\SalesDB_snap.snp' failed due to lack of disk space.

The query that causes the sparse file to run out of space doesn’t fail, but the user running the query will see this error. Note that it doesn’t say the snapshot has gone suspect. If you try to use it then it will tell you, plus there info written to the error log:

2008-01-30 11:50:29.14 spid53      Error: 17053, Severity: 16, State: 1.
2008-01-30 11:50:29.14 spid53      D:\sqlskills\SalesDB_snap.snp: Operating system error 112(There is not enough space on the disk.) encountered.
2008-01-30 11:50:29.15 spid53      Error: 3420, Severity: 21, State: 1.
2008-01-30 11:50:29.15 spid53      Database snapshot 'SalesDB_Snap' has failed an IO operation and is marked suspect.  It must be dropped and recreated.

Running out of space while running DBCC CHECKDB

One thing that isn’t common knowledge is that DBCC CHECKDB uses database snapshots as it’s mechanism for running online in SQL Server 2005. You can’t control where the snapshot is created (it’s actually created in the same location as the files comprising the database being checked – see my previous post on the DBCC CHECKDB stages here for more info) but you can create your own snapshot and check that. Anyway – it’s possible to run out of space in the snapshot while DBCC CHECKDB is running. I setup a situation with a large database on a drive with only 1MB of space left, started a large index rebuild in the database and then tried to run DBCC CHECKDB. Let’s see what happened:

The index rebuild statement:

ALTER INDEX ALL ON [SalesDB2].[dbo].[Sales] REBUILD;
Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'd:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20' failed due to lack of disk space.

So some poor unsuspecting user gets this error and has no idea why. For DBAs its a bit obscure too. The syntax is describing an alternate stream on the existing data file and the snapshot in question can’t be accessed at all.

The DBCC CHECKDB statement:

Msg 1823, Level 16, State 1, Line 1
A database snapshot cannot be created because it failed to start.
Msg 1823, Level 16, State 2, Line 1
A database snapshot cannot be created because it failed to start.
Msg 7928, Level 16, State 1, Line 1
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'd:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20' failed due to lack of disk space.
Msg 3313, Level 21, State 2, Line 1
During redoing of a logged operation in database 'salesdb2', an error occurred at log record ID (1628:252:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command.  The results, if any, should be discarded.

Wow – major amounts of errors – and it all boils down to running out of space.