The Curious Case of… unstoppable DBCC CHECKDB

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

Last week I was asked to explain an unstoppable DBCC CHECKDB process. The client regularly runs DBCC CHECKDB at night during off-peak load times and has no problems with it but last week they ran it during the day because of a corruption indication, then decided to stop it and couldn’t. They killed the spid and nothing happened except the spid showed as being in the KILLED/ROLLBACK state.

Let’s back up a bit and explain what DBCC CHECKDB is doing under the covers.

It needs to see a transactionally-consistent, unchanging view of the database. Prior to SQL Server 2005 it did this by essentially running its own version of crash recovery inside itself, by analyzing the database’s transaction log. This was pretty tortuous code that I helped write for SQL Server 2000, there were some cases where it caused false positives, and I had great fun one week in 2001 or 2002 removing all that code forever. The replacement for that code was to instead use a private database snapshot, because a database snapshot gives a transactionally-consistent, unchanging view of a database.

So, the first thing DBCC CHECKDB does in SQL Server 2005 and later is create a database snapshot (unless you specified WITH TABLOCK, or the target database is read-only, single-user, or already a database snapshot).

A database snapshot runs crash recovery of the target database *into* the database snapshot, and herein lies the problem: crash recovery cannot be interrupted, and there’s no check in the crash recovery code to tell whether it’s *real* crash recovery, or crash recovery into a database snapshot (or into the DBCC CHECKDB private snapshot).

If there’s a lot of transaction log to be recovered as part of the initial crash recovery into the database snapshot, that could take a long time. And if someone tries to kill the DBCC CHECKDB while the database snapshot is still running crash recovery, nothing will happen until the crash recovery finishes. It’s not a DBCC shortcoming, it’s a database snapshot shortcoming, and you just have to let it finish.

So there you go – mystery explained!

SQLskills SQL101: Why DBCC CHECKDB can miss memory corruption

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

An interesting situation was discussed online recently which prompted me to write this post. A fellow MVP was seeing periodic corruption messages in his error log, but DBCC CHECKDB on all databases didn’t find any corruptions. A subsequent restart of the instance caused the problem to go away.

My diagnosis? Memory corruption. Something had corrupted a page in memory – maybe it was bad memory chips or a memory scribbler (something that writes into SQL Server’s buffer pool, like a poorly-written extended stored procedure), or maybe a SQL Server bug. Whatever it was, restarting the instance wiped the buffer pool clean, removing the corrupt page.

So why didn’t DBCC CHECKDB encounter the corrupt page?

The answer is to do with DBCC CHECKDB‘s use of database snapshots (and all other DBCC CHECK* commands). It creates a database snapshot and then runs the consistency-checking algorithms on the database snapshot. The database snapshot is a transactionally-consistent, unchanging view of the database, which is what DBCC CHECKDB requires.

More info on DBCC CHECKDB’s use of snapshots, and potential problems can be found at:

A database snapshot is a separate database as far as the buffer pool is concerned, with its own database ID. A page in the buffer pool is owned by exactly one database ID, and cannot be shared by any other databases. So when DBCC CHECKDB reads a page in the context of the database snapshot, that page must be read from the source database on disk; it cannot use the page from the source database if it’s already in memory, as that page has the wrong database ID.

This means that DBCC CHECKDB reads the entire source database from disk when it uses a database snapshot. This is not a bad thing.

This also means that if there’s a page in the source database that’s corrupt in memory but not corrupt on disk, DBCC CHECKDB will not encounter it if it uses a database snapshot (the default).

If you suspect that a database has some corruption in memory, the only way to have DBCC CHECKDB use the in-memory pages, is to use the WITH TABLOCK option, which skips using a database snapshot and instead uses locks to quiesce changes in the database.

Hope this helps clear up any confusion!

Why PFS pages cannot be repaired

Last week there was a short discussion on Twitter about why PFS pages (damaged header, not individual PFS bytes) can’t be repaired (prompted by a #sqlhelp question about why they can’t be single-page restored, like other per-database allocation bitmaps). Just for the record, they can’t be fixed by automatic page repair in a mirror or AG either.

PFS pages occur every 8088 pages in every data file and store a byte of information about itself and the following 8087 pages. The most important piece of information it stores is whether a page is allocated (in use) or not. You can read more about PFS pages and the other per-database allocation bitmaps in this blog post.

So why can’t they be rebuilt by DBCC CHECKDB, when all the other per-database allocation bitmaps can?

The answer is that the is-this-page-allocated-or-not information is not duplicated anywhere else in the database, and it’s impossible to reconstruct it in all cases.

You might think that DBCC CHECKDB could work out which pages are allocated by inferring that state if a page is linked to in some way from another page that’s known to be allocated, and it could do that, except for the case of a heap with no nonclustered indexes.

If a heap has no nonclustered indexes, there is no other structure in the database that links to any of the data pages in the heap. Therefore, without the information in a PFS page, there’s no way to tell which of the pages contained in extents allocated to the heap are actually allocated or not. This is because SQL Server does not touch a page when it is deallocated, so there’s nothing on a page that indicates whether a page is currently allocated or has been deallocated.

So what if DBCC CHECKDB can tell that there are no such cases in the database?

The answer is that the algorithm to rebuild a PFS page given the links from other pages to pages covered by that PFS range is extremely complicated and would involve searching through the entire database, reading and processing all pages a second time *after* they’ve been repaired, looking for linkages to pages in the broken range. While it sounds technically possible, when scoping out writing such an algorithm back in 2001-2002, I quickly ran into run-time and complexity challenges that made the work entirely infeasible.

It’s also not possible to just mark all the pages allocated – because then allocation-order scans would come across potentially unformatted pages and fail. It would also break backups that use WITH CHECKSUM. Such an algorithm could be made to work (in the absence of heaps with no nonclustered indexes), but has the same problems as the algorithm above.

So – PFS pages can’t be repaired, and unless the database structure changes to mirror that allocation information in some way, I don’t see that changing at any point in the future.

Hope you found this interesting!