It’s been a bit light on technical posts here over the last few months, but now that summer’s over I’ll be ramping up again, with lots of cool stuff planned.

First up is a question that came up on the MCM distribution list this morning. There was a discussion of fast recovery, which I explained in detail back in 2009 in the post Lock Logging and Fast Recovery. In a nutshell, fast recovery is the ability of Enterprise Edition to allow access to a database after the REDO phase of crash recovery (rolling forward committed transactions) has completed but before the UNDO phase (rolling back uncommitted transactions) has completed. The idea is that UNDO can take much longer than REDO, so early access to the database is a good thing, hence it being an Enterprise Edition feature (from SQL Server 2005 onward).

The question essentially became: when is fast recovery used?

The answer is that it’s used whenever a database is started up and needs to have recovery run on it. This means fast recovery will be used:

  • When SQL Server starts up after a crash, or after a shutdown in which a database was not cleanly shut down
  • After a cluster failover
  • After a database mirroring failover
  • After an availability group failover
  • When a database state is changed to ONLINE and crash recovery needs to be run

Note that I did not include:

  • When restoring a database from backups
  • When bringing a log shipping secondary database online (this is restoring from backups)

Fast recovery is NOT used during a restore operation. You’ll read in some places online that it is, but those places are incorrect.
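As a rough summary of the two lists above, here’s a small Python sketch encoding the rule: fast recovery needs Enterprise Edition and a path where recovery can open the database at the end of REDO, and a restore sequence is never such a path. The `Startup` names and `uses_fast_recovery` function are hypothetical labels for illustration, not anything SQL Server exposes.

```python
from enum import Enum, auto


class Startup(Enum):
    """Hypothetical labels for ways a database comes online (not a SQL Server API)."""
    RESTART_AFTER_UNCLEAN_SHUTDOWN = auto()
    CLUSTER_FAILOVER = auto()
    MIRRORING_FAILOVER = auto()
    AG_FAILOVER = auto()
    SET_ONLINE_WITH_CRASH_RECOVERY = auto()
    RESTORE_FROM_BACKUPS = auto()
    LOG_SHIPPING_SECONDARY_ONLINE = auto()


# Paths on which recovery can open the database at the end of REDO.
FAST_RECOVERY_PATHS = frozenset({
    Startup.RESTART_AFTER_UNCLEAN_SHUTDOWN,
    Startup.CLUSTER_FAILOVER,
    Startup.MIRRORING_FAILOVER,
    Startup.AG_FAILOVER,
    Startup.SET_ONLINE_WITH_CRASH_RECOVERY,
})


def uses_fast_recovery(path: Startup, enterprise_edition: bool) -> bool:
    """Fast recovery is Enterprise Edition only, and never used by a restore sequence."""
    return enterprise_edition and path in FAST_RECOVERY_PATHS
```

`LOG_SHIPPING_SECONDARY_ONLINE` is deliberately left out of the set, because bringing a log shipping secondary online is just the end of a restore sequence.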

So why isn’t it used during a restore sequence?

It’s to do with the underlying mechanism that allows fast recovery. Operations that make changes to a database are logged, and each log record includes a bitmap of the locks held at the time (examples of this are in the blog post I referenced above). When crash recovery runs, the REDO phase also acquires all the locks necessary for the UNDO phase, as the REDO phase knows which transactions in the log being recovered need to be rolled back. At the end of the REDO phase, access can be given to the database: because the UNDO locks are already held, recovery can guarantee that no user can block the UNDO phase.
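To make that concrete, here’s a toy Python model of crash recovery with fast-recovery-style locking. It’s a sketch under heavy simplifying assumptions, not SQL Server’s implementation: a single log with no checkpoints, a simplified analysis pass that treats any transaction without a commit record as needing rollback, and one `resource` per log record standing in for the real lock bitmap. All names (`LogRecord`, `analysis_and_redo`, `user_can_lock`, `undo`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    txn: str            # transaction ID
    kind: str           # "modify" or "commit"
    resource: str = ""  # resource locked by the change (stands in for the lock bitmap)


def analysis_and_redo(log):
    """Find loser transactions, then REDO the log while acquiring their UNDO locks."""
    committed = {r.txn for r in log if r.kind == "commit"}
    losers = {r.txn for r in log if r.txn not in committed}
    locks = {}  # resource -> loser transaction holding it for UNDO
    for rec in log:
        if rec.kind != "modify":
            continue
        # Re-applying the change to the data page is omitted in this toy model.
        if rec.txn in losers:
            locks[rec.resource] = rec.txn
    return losers, locks


def user_can_lock(locks, resource):
    """After REDO the database is open; a user blocks only on resources UNDO holds."""
    return resource not in locks


def undo(log, losers, locks):
    """Roll back loser changes newest-first, then release the UNDO locks."""
    rolled_back = [r for r in reversed(log) if r.kind == "modify" and r.txn in losers]
    for resource in [res for res, owner in locks.items() if owner in losers]:
        del locks[resource]
    return rolled_back


log = [
    LogRecord("T1", "modify", "row:1"),
    LogRecord("T2", "modify", "row:2"),
    LogRecord("T1", "commit"),
    LogRecord("T2", "modify", "row:3"),
]
losers, locks = analysis_and_redo(log)  # losers == {"T2"}
# The database opens here: row:1 is free, row:2 waits until UNDO finishes.
print(user_can_lock(locks, "row:1"), user_can_lock(locks, "row:2"))  # True False
undo(log, losers, locks)
```

The key point the model shows is ordering: the UNDO locks exist before any user gets in, so a user can wait on UNDO but never the other way round.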

So why doesn’t that mechanism work for restores? Well, restore doesn’t do one REDO and one UNDO like crash recovery does. Instead, the REDO phase of recovery is performed for each backup that is restored in the restore sequence. This avoids having a really long REDO phase at the end of the restore sequence (which could be, say, a week’s worth of transactions spread over tens or hundreds of backups), and avoids needing a huge transaction log to hold all those log records.

At the end of the restore sequence, all necessary REDO has already been performed, but the REDO operations have NOT been acquiring UNDO locks. The UNDO locks aren’t acquired because UNDO isn’t likely to be the next phase during a restore sequence; it’s more likely to be another restore operation. In that case, it’s likely that some of the transactions that were uncommitted at the end of one restore will become committed during the next restore, so if UNDO locks had been acquired, they would have to be released again. This would involve either rescanning the log records involved or keeping track of which in-restore transactions had acquired which locks. Either approach would be complicated and time-consuming, so the benefit hasn’t been deemed worth the engineering effort involved.
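Here’s a minimal Python sketch of why speculative UNDO locks would be wasted during a restore sequence. It assumes each backup is just a list of hypothetical `(txn_id, kind)` tuples, REDOes each backup in turn without taking UNDO locks, and records which transactions are still uncommitted after each restore. The function names are illustrative only.

```python
def in_flight_after_each_restore(backups):
    """REDO each backup in turn and record which transactions are still uncommitted.

    Each backup is a list of (txn_id, kind) tuples, kind being "modify" or "commit".
    No UNDO locks are taken during these per-backup REDO passes.
    """
    started, committed = set(), set()
    snapshots = []
    for backup in backups:
        for txn, kind in backup:  # REDO for this backup's log records only
            if kind == "commit":
                committed.add(txn)
            else:
                started.add(txn)
        snapshots.append(started - committed)
    return snapshots


def wasted_undo_locks(snapshots):
    """Txns that looked like UNDO candidates after an earlier restore but later committed.

    Had UNDO locks been taken speculatively for these, they'd have to be released again.
    """
    earlier = set().union(*snapshots[:-1])
    return earlier - snapshots[-1]


backups = [
    [("T1", "modify"), ("T2", "modify")],  # first backup restored
    [("T1", "commit"), ("T3", "modify")],  # next backup restored
]
snapshots = in_flight_after_each_restore(backups)  # [{"T1", "T2"}, {"T2", "T3"}]
wasted = wasted_undo_locks(snapshots)              # {"T1"}
```

T1 looks like it needs rolling back after the first restore but commits in the second, which is exactly the lock-then-release churn that restore avoids by not acquiring UNDO locks at all.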

So no fast recovery during restores.

But hold on, I hear you say, database mirroring is just a constant REDO process, so how come fast recovery works for that? Back in SQL Server 2005, when a database mirroring failover occurred, the database was momentarily set offline so that full crash recovery would be run when the database came back online, hence allowing fast recovery to work. From SQL Server 2008 onward that no longer happens; instead, there is a mechanism that figures out which UNDO locks are necessary when a mirroring failover occurs, allowing fast recovery behavior. I guess technically that same mechanism could be ported over to the restore code base, but I think it would be difficult to do, and I don’t think there’s enough demand to make the engineering effort and possible destabilization of the restore code worthwhile.

Hope this helps explain things – let me know if you have any questions.