(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)
Last week I was asked to explain an unstoppable DBCC CHECKDB process. The client regularly runs DBCC CHECKDB at night during off-peak load times and has no problems with it but last week they ran it during the day because of a corruption indication, then decided to stop it and couldn’t. They killed the spid and nothing happened except the spid showed as being in the KILLED/ROLLBACK state.
Let’s back up a bit and explain what DBCC CHECKDB is doing under the covers.
It needs to see a transactionally-consistent, unchanging view of the database. Prior to SQL Server 2005 it did this by essentially running its own version of crash recovery inside itself, by analyzing the database’s transaction log. This was pretty tortuous code that I helped write for SQL Server 2000, there were some cases where it caused false positives, and I had great fun one week in 2001 or 2002 removing all that code forever. The replacement for that code was to instead use a private database snapshot, because a database snapshot gives a transactionally-consistent, unchanging view of a database.
So, the first thing DBCC CHECKDB does in SQL Server 2005 and later is create a database snapshot (unless you specified WITH TABLOCK, or the target database is read-only, single-user, or already a database snapshot).
A database snapshot runs crash recovery of the target database *into* the database snapshot, and herein lies the problem: crash recovery cannot be interrupted, and there’s no check in the crash recovery code to tell whether it’s *real* crash recovery, or crash recovery into a database snapshot (or into the DBCC CHECKDB private snapshot).
If there’s a lot of transaction log to be recovered as part of the initial crash recovery into the database snapshot, that could take a long time. And if someone tries to kill the DBCC CHECKDB while the database snapshot is still running crash recovery, nothing will happen until the crash recovery finishes. It’s not a DBCC shortcoming, it’s a database snapshot shortcoming, and you just have to let it finish.
So there you go – mystery explained!