CHECKDB internals: what is the BlobEater?

Several times over the last month, I've been asked about the query that drives DBCC CHECKDB and other consistency checking commands, which has a variable called BlobEater in it. In this post I'd like to explain what the query is doing.

In my previous post in the CHECKDB From Every Angle series, How does DBCC CHECKDB WITH ESTIMATEONLY work?, I explained how DBCC CHECKDB uses 'facts', little bits of information that describe something that DBCC CHECKDB has noted about a data file page. Once all the pages for an entire table (or set of tables, if batching is enabled – see that same blog post I mentioned above), all the facts are aggregated together and they should all cancel each other out. When there are extra facts (e.g. two pages in an index B-tree point to the same page at a lower level), or missing facts (e.g. a LOB fragment doesn't have any other LOB fragments or data/index record pointing to it), then DBCC CHECKDB can tell there's a corruption.

As DBCC CHECKDB is generating all these factors from essentially random pages in the database (it reads the pages in a table in physical order, not logical order), there has to be some sorting of the facts before aggregation can take place. This is all driven using the query processor. Each thread in DBCC CHECKDB reads pages, generates facts, and gives them to the query processor to sort and aggregate. Once all reading has finished, the facts are then given back to parallel threads inside DBCC CHECKDB to figure out whether corruptions are present.

A picture to show this mechanism looks as follows:

 

If you're doing any tracing or profiling while DBCC CHECKDB is running, you'll see the query below:

DECLARE @BlobEater VARBINARY(8000);
SELECT @BlobEater = CheckIndex(ROWSET_COLUMN_FACT_BLOB)
FROM <memory address of fact rowset>
GROUP BY ROWSET_COLUMN_FACT_KEY
>> WITH ORDER BY
     ROWSET_COLUMN_FACT_KEY,
     ROWSET_COLUMN_SLOT_ID,
     ROWSET_COLUMN_COMBINED_ID,
     ROWSET_COLUMN_FACT_BLOB
OPTION(ORDER GROUP);

The explanation of the parts of this query is documented in the Inside SQL Server 2008 and forthcoming Inside SQL Server 2012 books, and I've quoted it below from my DBCC Internals chapter:

This query brings the query processor and the DBCC CHECKDB code together to perform the fact-generation, fact-sorting, fact-storing, and fact-aggregation algorithms. The parts of the query are as follows:

  • @BlobEater This is a dummy variable with no purpose other than to consume any output from the CheckIndex function (there should never be any, but the syntax requires it).
  • CheckIndex (ROWSET_COLUMN_FACT_BLOB) This is the custom aggregation function inside DBCC CHECKDB that the query processor calls with sorted and grouped facts as part of the overall fact aggregation algorithm.
  • <memory address of fact rowset> This is the memory address of the OLEDB rowset that DBCC CHECKDB provides to the query processor. The query processor queries this rowset for rows (containing the generated facts) as part of the overall fact generation algorithm.
  • GROUP BY ROWSET_COLUMN_FACT_KEYThis triggers the aggregation in the query processor.
  • >> WITH ORDER BY <column list> This is internal-only syntax that provides ordered aggregation to the aggregation step. As I explained earlier, the DBCC CHECKDB aggregation code is based on the assumption that the order of the aggregated stream of facts from the query processor is forced (that is, it requires that the sort order of the keys within each group is the order of the four keys in the query).
  • OPTION(ORDER GROUP) This is a Query Optimizer hint that forces stream aggregation. It forces the Query Optimizer to sort on the grouping columns and avoid hash aggregation.

And there you have it. There are quite a few pieces of query processor syntax that are only callable from inside the Engine itself, and several pieces that are only callable from DBCC, for instance to force an index rebuild to read the heap/clustered index rather than reading from the existing index.

Hope this is interesting!

6 thoughts on “CHECKDB internals: what is the BlobEater?

  1. Hi Paul,
    I have a database integrity job running everyday on Sql 2008 r2 databases. Temp db data file is not clearing or deallocating temp db space after DBCC CheckDB “myDB” process completed successful. I used SP_WhoIsActive script to diagnose and found out DBCC CheckDB process is making temp db to be 45 GB but these is only for one database which is about 16 GB that DBCC CheckDB is using temp db heavily. Here is below results of few columns from sp_whoisactive for MyDB:

    Note: When I restart SQL Server tempdb will release allocated space to OS but will grown to 45 GB after DBCC Check DB job runs.

    tran_log_writes CPU tempdb_ tempdb_ reads Writes physical_reads
    allocation Current
    MyDB: 0 (0 kB),MyDB: 0 (0 kB) 20046 0 0 62755 9 61187
    MyDB: 0 (0 kB),MyDB: 0 (0 kB) 47018 264576 264576 255461 9 154598
    MyDB: 0 (0 kB),MyDB: 0 (0 kB) 74958 372480 372480 391181 9 249835
    MyDB: 0 (0 kB),MyDB: 0 (0 kB) 97110 744960 744960 590429 11 309388

    1. What is your question? If your tempdb needs to be at 45GB for the CHECKDB every day, leave it at 45GB. You’re not going to gain anything from shrinking it down every day.

  2. Hi Paul,

    Thanks for details of how DBCC checkdb run..

    I am facing high CPU issue during checkdb run. Also, getting Stack dump errors during checkdb run. There are no consistency or allocation errors found on database.

    Process 350:0:1 (0x758) Worker 0x0000000430D2E1A0 appears to be non-yielding on Scheduler 6. Thread creation time: 13117835005354. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 1%. System Idle 49%. Interval: 312287 ms.

    2016-09-09 00:51:07.290 Server * BEGIN STACK DUMP:
    2016-09-09 00:51:07.290 Server * 09/09/16 00:51:07 spid 3576
    2016-09-09 00:51:07.290 Server *
    2016-09-09 00:51:07.290 Server * Non-yielding Scheduler
    2016-09-09 00:51:07.290 Server *
    2016-09-09 00:51:07.290 Server * *******************************************************************************
    2016-09-09 00:51:07.290 Server Stack Signature for the dump is 0x0000000000000293
    2016-09-09 00:51:12.010 Server External dump process return code 0x20000001.

    Kindly guide me on this.

    Regards,
    Suyog

  3. Hello Paul:

    I want to know what is fix when you see BlobEater code?

    In my case no traces or profiler is running.

    Any help is appreciate.

    Thank you
    Jay

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.