CHECKDB From Every Angle: Can CHECKDB repair everything?

In my previous post on interpreting DBCC CHECKDB output, plus in my DBCC Internals session at TechEd IT Forum in Barcelona yesterday, I mentioned there are some things that DBCC CHECKDB can’t repair. In this post I want to go into a bit more detail – based on a post from my old Storage Engine blog.

Before anyone takes this the wrong way – what do I mean by “can’t be repaired”? Remember that that purpose of repair is to make the database structurally consistent, and that to do this usually means deleting the corrupt data/structure (that’s why the option to do this was aptly named REPAIR_ALLOW_DATA_LOSS – see this post for more explanation on why repair can be bad). A corruption is deemed unrepairable when it doesn’t make sense to repair it given the damage the repair would cause, or the corruption is so rare and so complicated to repair correctly that it’s not worth the engineering effort to provide a repair. Remember also that recovery from corruptions should be based on a sound backup strategy, not on running repair, so making this trade-off in functionality makes sense.

Here’s a few of the more common unrepairable corruptions that people run into along with the reasons they can’t be repaired by DBCC CHECKDB.

PFS page header corruption

An example of this is:

Msg 8946, Level 16, State 12, Line 1
Table error: Allocation page (1:13280496) has invalid PFS_PAGE page header values.
Type is 0. Check type, alloc unit ID and page ID on the page.

DBCC CHECKDB uses the PFS pages to determine which pages are allocated – and so which pages to read to drive the various consistency checks. The only repair for a PFS page is to reconstruct it – they can’t simply be deleted as they’re a fixed part of the fabric of the database. PFS pages cannot be rebuilt because there is no infallible way to determine which pages are allocated or not. There are various algorithms I’ve experimented with to rebuild them, with optimistic or pessimistic setting of page allocation statuses and then re-running the various consistency checks to try to sort out the incorrect choices, but they all require very long run-times and won’t work 100% of the time. Given the frequency with which these corruptions are seen, and the engineering effort required to come up with an (imperfect) solution, I made the choice to leave this as unrepairable, and I don’t think that will change in future. I wrote another article about this here.

Critical system table clustered-index leaf-page corruption

An example of this is:

Msg 7985, Level 16, State 2, Server SUNART, Line 1
System table pre-checks: Object ID 4. Could not read and latch page (1:51) with
latch type SH. Check statement terminated due to unrepairable error.

In a previous post in the series I described why how and why there are special checks of the clustered indexes of the critical system tables. If any of the pages at the leaf-level of these indexes are corrupt, they cannot be repaired. Repairing would mean deallocating the page, wiping out the most important metadata for potentially hundreds of user tables and so effectively deleting all of these tables. That’s obviously an unpalatable repair for anyone to allow and so DBCC CHECKDB doesn’t do it.

Column value corruption

Here’s an example of this:

Msg 2570, Level 16, State 3, Line 1
Page (1:152), slot 0 in object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type "In-row data"). Column "c1" value is out of range for data type "datetime".  Update column to a legal value.

This is where a column has a stored value that is outside the valid range for the column type. There are a couple of repairs that could be done for this:

  1. Delete the entire record
  2. Insert a dummy value

#1 isn’t very palatable because then data is lost and it’s not a structural problem in the database so doesn’t have to be repaired. #2 is dangerous – what value should be chosen as the dummy value? Any value put in may adversely affect business logic, or fire a trigger, or have some unwelcome meaning in the context of the table – even a NULL. Given these problems, I chose to allow people to fix the corrupt values themselves.

Metadata corruption

Here’s an example of this:

Msg 3854, Level 16, State 1, Line 2
Attribute (referenced_major_id=2089058478) of row (class=0,object_id=2105058535,column_id=0,referenced_major_id=2089058478,referenced_minor_id=0) in sys.sql_dependencies has a matching row (object_id=2089058478) in sys.objects (type=SN) that is invalid.

This example is relatively benign. There are other examples that will cause DBCC CHECKDB to terminate – not as bad as the critical system table corruption example above, but enough that DBCC CHECKDB doesn’t trust the metadata enough to use it to drive consistency checks. Repairing metadata corruption has the same problems as repairing critical system table corruption – any repair means deleting metadata about one or more tables, and hence deleting the tables themselves. It’s far better to leave the corruption unrepaired so that as much data as possible can be extracted from the remaining tables.

Summary

Repair can’t fix everything. You may end up having to perform manual and time-consuming data extraction from the corrupt database and losing lots of data because of, say, a critical system table corruption. Bottom line (as usual) – make sure you have valid backups so you don’t get into this state!

27 thoughts on “CHECKDB From Every Angle: Can CHECKDB repair everything?

  1. Hi Paul, i was reading your post, I have this problem in SQL server 2008 right now:

    Msg 8966, Level 16, State 2, Line 3
    Unable to read and latch page (1:926966) with latch type SH. l(failed to retrieve text for this error. Reason: 15105) failed.

    Some posts say that was a hardware HDD problem and the only way to repair is restoring the database, but my last restore has 12 days old, because that problem stop the backup!

    Any idea hoy to repair this?

  2. If m_pageId = (0:0) in the page header can repair fix this issue from the value in bpageno? That value is accurate (1:412168). We have 8 pages that are in this state. This is the summary for the affected table.

    There are 4657925 rows in 2838409 pages for object “ePrice_WQBill”.
    CHECKTABLE found 0 allocation errors and 1672 consistency errors in table ‘ePrice_WQBill’ (object ID 1605580758).
    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (WorkQueue_Prod.dbo.ePrice_WQBill).

    Thanks.

  3. DBCC CHECKDB on my database in simple recovery model failed. Please find the error message below. I copied the backup and restored on a test server and ran DBCC CHECKDB with repair_allow_data_loss. Below bold lines was the one got repaired. Does ths caused any loss of user data. Kindly suggest.

    Repair: The Clustered index successfully rebuilt for the object “sys.sysobjvalues” in database “****”.
    Repair: The page (1:629823) has been deallocated from object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data).

    Executing the query “DBCC CHECKDB(N’****’) WITH NO_INFOMSGS

    ” failed with the following error: “Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -3212797581117423616 (type Unknown), page (40634:1181196049). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.

    Incorrect PFS free space information for page (1:160) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 95_PCT_FULL, actual value 50_PCT_FULL.

    Incorrect PFS free space information for page (1:3427) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 80_PCT_FULL, actual value 95_PCT_FULL.

    Incorrect PFS free space information for page (1:22426) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 100_PCT_FULL, actual value 50_PCT_FULL.

    Incorrect PFS free space information for page (1:26439) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 50_PCT_FULL, actual value 100_PCT_FULL.

    Incorrect PFS free space information for page (1:27700) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 95_PCT_FULL, actual value 100_PCT_FULL.

    Incorrect PFS free space information for page (1:27701) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 80_PCT_FULL, actual value 95_PCT_FULL.

    Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data): Page (1:629823)could not be processed. See other errors for details.

    CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.

    CHECKDB found 0 allocation errors and 7 consistency errors in table ‘sys.sysobjvalues’ (object ID 60).

    CHECKDB found 0 allocation errors and 8 consistency errors in database ‘****’.

    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (****).

    Thanks Lakshmana Y

    1. Unknown – it deleted a page from a non-critical system table (sysobjvalues) so at some point in the future, an operation around a random table will fail. You didn’t lose any user data.

  4. We are getting below error. Can we rebuild clustered index to fix the corruption
    Msg 2511, Level 16, State 2, Line 1
    Table error: Object ID 27826728, index ID 1, partition ID 7823872782634, alloc unit ID 8738838383836 (type In-row data). Keys out of order on page (1:78362343), slots 163 and 164

  5. Paul

    I am getting this following DBCCC CHECKDB error on the master database, i checked regrading the error msg 8981 on your chapter 11 of “SQL Server 2012 Internals book” where i didn’t find any solution.

    my questions is : do we have any possible way to fix this error Msg 8981 by doing some sort of repair or it’s better to simply restore from backup ?

    Msg 8981, Level 16, State 1, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The next pointer of (1:113) refers to page (1:115). Neither (1:115) nor its parent were encountered. Possible bad chain linkage.
    Msg 8935, Level 16, State 1, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The previous link (1:120) on page (1:124) does not match the previous page (1:113) that the parent (1:275), slot 7 expects for this page.
    Msg 8934, Level 16, State 2, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The high key value on page (1:120) (level 0) is not less than the low key value in the parent (0:1), slot 0 of the next page (1:124).
    Msg 8934, Level 16, State 3, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The high key value on page (1:120) (level 0) is not less than the low key value in the parent (0:1), slot 0 of the next page (1:124).
    Msg 8935, Level 16, State 1, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The previous link (1:115) on page (1:264) does not match the previous page (1:120) that the parent (1:275), slot 14 expects for this page.
    Msg 8978, Level 16, State 1, Line 1
    Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). Page (1:264) is missing a reference from previous page (1:115). Possible chain linkage problem.

  6. Yes, I got below errors, after dbcc checkdb the error still exists:

    Msg 8935, Level 16, State 1, Line 1
    Table error: Object ID 325576198, index ID 2, partition ID 72057594438418432, alloc unit ID 72057594439794688 (type In-row data). The previous link (1:7128580) on page (1:6687422) does not match the previous page (1:7112104) that the parent (1:5287451), slot 115 expects for this page.
    Msg 8935, Level 16, State 1, Line 1
    Table error: Object ID 325576198, index ID 2, partition ID 72057594438418432, alloc unit ID 72057594439794688 (type In-row data). The previous link (1:5678739) on page (1:6691197) does not match the previous page (1:7111939) that the parent (1:6973298), slot 233 expects for this page.
    Msg 8935, Level 16, State 1, Line 1

    How to figure out it?

    Best Regards,
    Sue.

  7. Hi Paul,

    We have a customer whose SQL Server runs on VM. The customer has several databases and some of them (randomly) gets corrupted very often. We try to repair with allowing data loss every time. After a while the same databases and sometimes others get corrupted again and again. We had the customer change its disk from SSD to classical HDD. Also we had them upgrade SQL Server from 2012 to 2014. However, the problem persists, and we cannot explain the issue to the customer. Is there any possibility that fixing the database leaves something that may cause corruption again in the future? Or, where should we look for the real cause of the corruptions? As far as we know, the server is at a good data center where no power-off may happen unexpectedly to cause disk failure etc. Thanks in advance for your comments.

  8. Hi Paul,

    I am getting this following DBCCC CHECKDB error for one of our sharepoint user database (size 1TB); (sql server 2008R2)

    we already resolved it by identifying the 11 corrupted rows out of 3 lakhs rows by manual insert into new table.but again after 3 days the database got corrupted with the below error.

    Msg 8928, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594486784000 (type LOB data): Page (25:7050120) could not be processed. See other errors for details.
    Msg 8965, Level 16, State 1, Line 1
    Table error: Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594486784000 (type LOB data). The off-row data node at page (25:7050120), slot 0, text ID 686882816 is referenced by page (25:7047653), slot 0, but was not seen in the scan.
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10474436:22)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10474937:8)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10475411:87)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10475796:38)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10476808:54)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10477312:4)
    Msg 8929, Level 16, State 1, Line 1
    Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10477466:38)
    There are 311517 rows in 3621 pages for object “AllDocStreams”.
    CHECKDB found 0 allocation errors and 9 consistency errors in table ‘AllDocStreams’ (object ID 373576369).

    Kindly suggest the reason for it.

    Thank you in advance.

  9. Hi Paul:
    I am getting error like below : ( just a few key lines)
    DBCC results for ‘DWH_LANDING’.
    Msg 8905, Level 16, State 1, Line 3
    Extent (1:6648336) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
    Msg 8905, Level 16, State 1, Line 3
    Extent (1:6656424) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
    Msg 8905, Level 16, State 1, Line 3
    Extent (1:6664512) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.

    CHECKDB found 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.
    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DWH_LANDING).
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    would DBCC CHECKDB (‘xyz’, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS fix this error?
    Second question, is allocation error less severe than conssitency errors?

    Thanks
    Hui

      1. Thanks Paul. I did end up using DBCC CHECKDB (‘xyz’, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS
        And result is below:
        CHECKDB found 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.
        CHECKDB fixed 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.

        Can I say there is no data loss in this case?

        thanks

  10. what resides in Extent (1:6648336) ? some say No object in the database has assigned that extent (1 allocation error … not associated with any single object), is this true?

    thanks

      1. Thanks Paul. I am really curious about those number , for example Extent (1:6648336) , I want to know if pageID 6648336 contains any user data. Therefore, I managed to get this script below:
        Is so.Object_Id referrring to the PageID for user tables? My rationale is that if I could list all PageID for objects and I could
        easily tell if those errors are related to database objects or not. This would give me much more confidence when reporting to manager. Can you check if query below is able to give me the info I am looking for?

        Thanks

        select so.name, so.object_id, sp.index_id, sp.partition_id, sp.hobt_id, sa.container_id, internals.total_pages, internals.used_pages, internals.data_pages, first_page, root_page, first_iam_page

        from sys.objects so

        inner join sys.partitions sp on so.object_id = sp.object_id

        inner join sys.allocation_units sa on sa.container_id = sp.hobt_id

        inner join sys.system_internals_allocation_units internals on internals.container_id = sa.container_id

        where so.name not like ‘sys%’

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.