Here’s a really interesting question that was in my search engine logs yesterday - if I have a transaction that runs and completes while a backup is running, will the complete transaction be in the backup? The answer is…. it depends!


In terms of what gets backed up, the way a full backup works is:



  1. Note the transaction log’s LSN (Log Sequence Number)
  2. Read all allocated extents in the various data files
  3. Note the LSN again
  4. Read all the transaction log between the starting LSN and the ending LSN

Any transaction that commits before or on the LSN read in step 3 will be fully reflected when the database is restored. If not, the transaction will be undone. So you can’t just go by the completion time of the backup and the completion time of the transaction. The transaction may well commit before the backup operation completes, but it may complete during step 4, and so it will get rolled back during a restore. In this case, it’s necessary to take a log backup as well and restore that too to make the transaction be fully reflected after a restore.


As you know I always like to prove things  – so here’s my proof of what I just said. I’m going to use the AdventureWorks database to do this. First thing is to set it to full recovery mode (and take the first full backup to start full recovery mode logging):



ALTER DATABASE AdventureWorks SET RECOVERY FULL;


BACKUP DATABASE AdventureWorks TO DISK=‘C:\SQLskills\AdventureWorks.bck’ WITH INIT;


GO


Now I’m going to flush out the backup history tables in MSDB:



USE msdb;


GO


EXEC sp_delete_backuphistory ’10/6/07′;


GO


I’ve got a really contrived example that I’m going to use to show whether my transaction is wholely contained in the full backup. Using the HumanResources.Employee table, there’s a column VacationHours which I’m going to set to 0 and then force all the changes pages to disk. This is my base state:



UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = 0;


GO


CHECKPOINT;


GO


My contrived transaction is going to be in a tight loop updating all the rows in the table, which will generate lots of transaction log, and timed to complete just before the backup completes (i.e. in stage 4). In one connection I start the backup:



BACKUP DATABASE AdventureWorks TO DISK=‘C:\SQLskills\AdventureWorks.bck’ WITH INIT;


SELECT GETDATE ();


GO


and in another I start my contrived transaction, after starting the backup:



BEGIN TRAN


DECLARE @a INT


DECLARE @b INT


SELECT @a = 1


WHILE (@a < 6)


BEGIN



SELECT @b = 1


WHILE (@b < 201)


BEGIN



UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = @b


SELECT @b=@b+1


END


SELECT @a=@a+1


END


COMMIT TRAN;


SELECT GETDATE ();


GO


It’s not pretty but it does the job. The backup finishes at 2007-10-05 17:42:38.983 and the transaction finishes at 2007-10-05 17:42:38.107 - before the backup finishes. Remember I set the VacationHours all to zero before running my transaction – let’s check the transaction did actually change them:



SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;


GO


This returns 200 – which is what I’d expect. But is the transaction wholely contained in the backup? Let’s look at the backup history to find out the last LSN that was captured in the full backup:



SELECT Backup_Start_Date, Backup_Finish_Date, First_LSN, Last_LSN


FROM msdb.dbo.backupset WHERE database_name = ‘AdventureWorks’;


GO


And the output we get is:



Backup_Start_Date       Backup_Finish_Date      First_LSN           Last_LSN
———————– ———————– ——————- ——————-
2007-10-05 17:42:22.000 2007-10-05 17:42:38.000 86000000001600029   91000000625600001


The LSNs are in decimal, so we need to convert the three numbers to hex so we can compare them to what’s in the log – giving: 5B:1870:1. This is the LSN of the last log record that was backed up in the full backup. Now let’s take a look at the transaction log for AdventureWorks using the undocumented fn_dblog function. This is undocumented but very well known. It’s a fully composable alternative to using the old DBCC LOG command.



USE AdventureWorks;


GO


SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName FROM fn_dblog (NULL, NULL);


GO


The log record at that LSN is:



00000058:00001870:0001  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID


This is obviously in the middle of my contrived transaction – showing that it isn’t all in the full backup. The end of the transaction isn’t until way later in the log:



0000005e:00000628:01b1  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID
0000005e:00000628:01b2  LOP_COMMIT_XACT    0000:00001338  NULL


Before I do anything else, I want to take a log backup to preserve my transaction:



BACKUP LOG AdventureWorks TO DISK=‘C:\SQLskills\AdventureWorks_Log.bck’ WITH INIT;


GO


Now I want to restore the full backup and really show that my transaction isn’t all in there:



USE master;


GO


RESTORE DATABASE AdventureWorks FROM DISK=‘C:\SQLskills\AdventureWorks.bck’ WITH REPLACE, RECOVERY;


GO



SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;


GO


This returns 0. Clearly my transaction isn’t all in there, and the parts that are were rolled back during the restore. Now let’s do the same thing but using NORECOVERY for the restore of the full backup and also applying the log backup I took:



RESTORE DATABASE AdventureWorks FROM DISK=‘C:\SQLskills\AdventureWorks.bck’ WITH REPLACE, NORECOVERY;


GO


RESTORE LOG AdventureWorks FROM DISK=‘C:\SQLskills\AdventureWorks_Log.bck’ WITH RECOVERY;


GO


SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;


GO


This time the SELECT returns 200. And now you can start playng around with fn_dblog if you didn’t know about it before. I’ll be posting more about Storage Engine internals that you can figure out from the transaction log in future.