[Edit 2016: The team ‘fixed’ the problem in SQL Server 2016 by skipping consistency checking these indexes unless WITH EXTENDED_LOGICAL_CHECKS is used.]
It’s no secret that DBCC CHECKDB has some performance quirks based on the schema of the database being checked and various kinds of corruptions. I was recently doing some scalability testing of DBCC CHECKDB for a blog post and discovered quite a nasty performance issue that exists in all versions of SQL Server back to SQL Server 2005. This isn’t a bug, it’s just the way things work.
The issue occurs when a nonclustered index has a computed column in the index key or as one of the INCLUDEd columns, and it affects DBCC CHECKDB, DBCC CHECKFILEGROUP, and DBCC CHECKTABLE.
As a bit of background, here’s an excerpt from the SQL Server 2012 Internals book where I describe one of the ways nonclustered indexes are checked for consistency:
The nonclustered index cross-checks verify that:
- Every record in a nonclustered index (whether filtered or nonfiltered) maps to a valid record in the base table (that is, the heap or clustered index).
- Every record in the base table maps to exactly one record in each nonfiltered, nonclustered index, and to one record in each filtered index where the filter allows.
The mechanism to carry out these checks efficiently has changed in every release since SQL Server 7.0—becoming progressively more and more efficient. In SQL Server 2012, two hash tables are created for each partition of each nonclustered index—one hash table is for the actual records in that partition of the nonclustered index, and the other is for the records that should exist in that partition of the nonclustered index (as calculated from the existing data records in the table).
When a nonclustered index record is processed, all columns in the record are hashed together into a BIGINT value. This includes
- The physical or logical link back to the base table (known as the base table RID)
- All included columns (even LOB and FILESTREAM values)
The resulting value is added to the master hash value for actual records for the nonclustered index partition of which the record is part.
DBCC CHECKDB knows which nonclustered indexes exist for the table and what the complete nonclustered index record composition should be for each. When a data record is processed, the following algorithm is run for each matching nonclustered index record that should exist for the data record (taking into account any filter predicates for filtered nonclustered indexes):
- Create the nonclustered index record in memory (again, including the base table RID, plus included columns).
- Hash all columns in the index record together into a BIGINT value.
- Add the resulting value to the “should exist” master hash value for the relevant nonclustered index partition of which the index record is part.
The premise that this algorithm works on is that if no corruptions exist, the master hash values for the actual records and “should exist” records for each nonclustered index partition should match exactly at the end of the DBCC CHECKDB batch.
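The order-independent aggregation idea can be sketched in T-SQL. This is only an analogy: DBCC's internal hash is not CHECKSUM-based, and the table and index names below are hypothetical.

```sql
-- Analogy only: compute a per-row hash from the nonclustered index side
-- and recompute it from the base-table side. If nothing is corrupt, the
-- two order-independent aggregates should match.
-- [dbo].[SalesOrders] and [IX_SalesOrders_CustomerID] are hypothetical.

-- 'Actual' records: read via the nonclustered index.
SELECT CHECKSUM_AGG(CHECKSUM([CustomerID], [SalesOrderID])) AS [ActualAgg]
FROM [dbo].[SalesOrders] WITH (INDEX ([IX_SalesOrders_CustomerID]));

-- 'Should exist' records: recomputed from the base table (index 1 = clustered index).
SELECT CHECKSUM_AGG(CHECKSUM([CustomerID], [SalesOrderID])) AS [ShouldExistAgg]
FROM [dbo].[SalesOrders] WITH (INDEX (1));
```

CHECKSUM_AGG is used here because, like DBCC's master hash values, it produces the same result regardless of the order in which rows are processed.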
When a nonclustered index uses a computed column, the value of the computed column has to be computed based on the column definition. To do that, an internal mechanism called an ‘expression evaluator’ is created. The expression evaluator is provided by the Query Processor code and its behavior is entirely outside the control of DBCC CHECKDB. The drawback of the expression evaluator is that every time it is used, an exclusive latch must be held by the thread using it. This creates an incredible bottleneck and drastically affects performance.
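For reference, this is the kind of schema pattern that forces DBCC to use the expression evaluator (the table and index names are hypothetical):

```sql
-- Hypothetical schema: any nonclustered index that contains a computed
-- column (in the key or as an INCLUDEd column) forces DBCC to invoke
-- the expression evaluator for every matching row it checks.
CREATE TABLE [dbo].[Sales] (
    [SalesID]  INT IDENTITY PRIMARY KEY,
    [Quantity] INT NOT NULL,
    [Price]    MONEY NOT NULL,
    [Total]    AS ([Quantity] * [Price])  -- computed column
);

CREATE NONCLUSTERED INDEX [IX_Sales_Total]
ON [dbo].[Sales] ([Total]);
```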
My test system is the Dell R720 we bought last year, with 64GB of memory and two E5-2670 CPUs (8 physical cores each, with hyperthreading enabled). It’s running SQL Server 2012 SP1 CU3.
The test database is AdventureWorks that Jonathan expanded to 500GB for me using his cool scripts. The database is split over 8 data files stored on two 320GB Fusion-io drives, with tempdb and its log placed on two more 320GB Fusion-io drives. I set things up this way to remove I/O waits from the test.
The first few runs of my testing, with unlimited parallelism, produced DBCC CHECKDB elapsed times of around 340 minutes. This seemed incredibly slow to me, so I ran another test and looked at the output of sys.dm_os_waiting_tasks. I discovered that half the threads at any time were waiting for exclusive access to the DBCC_OBJECT_METADATA latch. Adding in some diagnostics to capture waits and latches gave the data below:
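If you want to check for this on your own system while DBCC CHECKDB is running, the latch in question shows up in sys.dm_os_latch_stats. This is a simple point-in-time query; my actual diagnostics snapshot and diff these counters around the test run.

```sql
-- Cumulative waits on the expression evaluator's latch since the last
-- restart (or since the latch stats were last cleared).
SELECT [latch_class], [waiting_requests_count], [wait_time_ms]
FROM sys.dm_os_latch_stats
WHERE [latch_class] = N'DBCC_OBJECT_METADATA';
```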
```
WaitType        Wait_S     Resource_S  Signal_S  WaitCount   Percentage  AvgWait_S  AvgRes_S  AvgSig_S
--------------- ---------- ----------- --------- ----------- ----------- ---------- --------- ---------
CXPACKET        684482.45  682667.13   1815.32   60294236    34.90       0.0114     0.0113    0.0000
OLEDB           659325.08  659325.08   0.00      207661462   33.62       0.0032     0.0032    0.0000
LATCH_EX        615168.39  605357.28   9811.11   798224634   31.37       0.0008     0.0008    0.0000

LatchClass               Wait_S     WaitCount   Percentage  AvgWait_S
------------------------ ---------- ----------- ----------- ----------
DBCC_OBJECT_METADATA     611768.00  764845636   99.16       0.0008
```
This didn’t make any sense to me as I couldn’t believe that such an incredible bottleneck existed – almost 1ms per latch wait and 800 million latch waits! – surely we’d have heard about it by now? I pinged my friend Ryan Stonecipher who inherited the DBCC CHECKDB code from me and he reminded me about the expression evaluator behavior.
I went back to my test system and looked for nonclustered indexes using computed columns using the code below:
```sql
SELECT [s].[name], [o].[name], [i].[name], [co].[name], [ic].*
FROM sys.columns [co]
JOIN sys.index_columns [ic]
    ON [ic].[object_id] = [co].[object_id]
    AND [ic].[column_id] = [co].[column_id]
JOIN sys.indexes [i]
    ON [i].[object_id] = [ic].[object_id]
    AND [i].[index_id] = [ic].[index_id]
JOIN sys.objects [o]
    ON [i].[object_id] = [o].[object_id]
JOIN sys.schemas [s]
    ON [o].[schema_id] = [s].[schema_id]
WHERE [co].[is_computed] = 1;
GO
```
And I found 6 nonclustered indexes with computed columns, on some of the largest tables in the database. I then disabled those nonclustered indexes and re-ran the DBCC CHECKDB tests.
17-18 minutes per run.
Wow! The bottleneck from the expression evaluator was causing DBCC CHECKDB to run about 20 times slower.
The wait and latch details for the test runs with the indexes disabled are as below, which is what I expected to see from running DBCC CHECKDB:
```
WaitType        Wait_S     Resource_S  Signal_S  WaitCount   Percentage  AvgWait_S  AvgRes_S  AvgSig_S
--------------- ---------- ----------- --------- ----------- ----------- ---------- --------- ---------
CXPACKET        33064.56   29331.74    3732.83   42034533    48.14       0.0008     0.0007    0.0001
OLEDB           28883.78   28883.78    0.00      173940154   42.05       0.0002     0.0002    0.0000
LATCH_EX        5021.20    4605.50     415.70    30340659    7.31        0.0002     0.0002    0.0000

LatchClass               Wait_S     WaitCount   Percentage  AvgWait_S
------------------------ ---------- ----------- ----------- ----------
DBCC_CHECK_AGGREGATE     5039.36    30267451    98.82       0.0002
```
There’s nothing we can do about this bottleneck except to disable nonclustered indexes on computed columns while DBCC CHECKDB is running and then rebuild them afterwards, which is not a very palatable solution. I wanted to blog about this just to make the issue known and help any of you who are beating your head against a wall trying to figure out poor DBCC CHECKDB performance.
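If you do decide that trade-off is worth it, the workaround looks like this (the index and table names are hypothetical, and remember that the rebuild afterwards is a potentially expensive operation on a large table):

```sql
-- Disable the offending nonclustered index: its metadata remains, but
-- the index is no longer maintained, used, or consistency checked.
ALTER INDEX [IX_Sales_Total] ON [dbo].[Sales] DISABLE;

-- Run the consistency checks without the expression evaluator bottleneck.
DBCC CHECKDB (N'AdventureWorks') WITH NO_INFOMSGS;

-- Rebuild to bring the index back into use.
ALTER INDEX [IX_Sales_Total] ON [dbo].[Sales] REBUILD;
```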
[Edit 12/2/2014: Here’s another set of performance tests using a kCura database schema of various sizes.]