Data recovery: investigating weird SELECT failures around corruption

An interesting corruption problem cropped up on the MCM distribution list yesterday, and after I figured it out I thought it would make a good blog post in case anyone hits a similar problem.

In a nutshell, the problem was corruption such that a simple SELECT * query failed, but a SELECT * query with an ORDER BY clause worked.

Let’s investigate!

Creating the scenario

First I’ll create the specific corruption. I’m going to create a simple table with a clustered index, sizing the rows so there’s only one row per page.


USE [Company];
GO

CREATE TABLE [test] (
	[c1] INT IDENTITY,
	[c2] INT DEFAULT 0,
	[c3] CHAR (4100) DEFAULT 'a');
CREATE CLUSTERED INDEX [test_cl] ON [test] ([c1], [c2]);
GO

INSERT INTO [test] DEFAULT VALUES;
GO 10000
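A quick arithmetic sketch (plain Python, not SQL Server code) shows why the 4100-byte CHAR column forces one row per page. An 8 KB page is 8,192 bytes, of which 96 are the page header, leaving roughly 8,096 bytes for rows and their slot-array entries; the per-row overhead figure below is an approximation.

```python
# Why the CHAR (4100) column gives one row per page: two rows can't fit
# in the space left after the 96-byte page header.
PAGE_SIZE = 8192
HEADER = 96
USABLE = PAGE_SIZE - HEADER      # ~8096 bytes for rows + slot array

row_body = 4100                  # the CHAR (4100) column alone
per_row = row_body + 7 + 2       # + approximate record overhead + 2-byte slot entry

print(USABLE // per_row)         # 1 -> only one row fits per page
print(2 * per_row > USABLE)      # True -> a second row won't fit
```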

Now I’ll delete one of the rows, creating a page with a single ghost record on it, which I can see using DBCC PAGE on the first PFS page in the database.

DELETE FROM [test] WHERE [c1] = 150;

DBCC PAGE ([Company], 1, 1, 3);
<snip output for brevity>
(1:289)      - (1:295)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext
(1:296)      - (1:437)      =     ALLOCATED   0_PCT_FULL                              
(1:438)      -              =     ALLOCATED   0_PCT_FULL Has Ghost                    
(1:439)      - (1:8087)     =     ALLOCATED   0_PCT_FULL                              

So page (1:438) is the one that had the row with key value 150 on it. It’s still allocated and linked into the clustered index structure though, so I’ll force the Access Methods code to ‘see’ it by doing a scan that’ll include it, and that will queue the page up for cleaning by the Ghost Cleanup Task.

SELECT COUNT (*) FROM [test] WHERE [c1] < 200;

And now if I wait 10 seconds and look at the PFS page again, I can see it’s been cleaned and deallocated – it’s no longer part of the clustered index. (You’ll notice that the PFS byte still says that the page has a ghost record; that’s because when a page is deallocated, the only PFS bits that are changed are the allocation status.)

DBCC PAGE ([Company], 1, 1, 3);
<snip output for brevity>
(1:289)      - (1:295)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext
(1:296)      - (1:437)      =     ALLOCATED   0_PCT_FULL                              
(1:438)      -              = NOT ALLOCATED   0_PCT_FULL Has Ghost                    
(1:439)      - (1:8087)     =     ALLOCATED   0_PCT_FULL                              

Nothing’s corrupt at this point, so let’s cause some problems.

Creating the corruption

First off I’m going to zero out page (1:438) using DBCC WRITEPAGE:


DECLARE @offset INT;
SELECT @offset = 0;

WHILE (@offset < 8185)
BEGIN
	DBCC WRITEPAGE (N'Company', 1, 438, @offset, 8, 0x0000000000000000, 1);
	SELECT @offset = @offset + 8;
END;

And there’s still no corruption here, because page (1:438) is a deallocated page.

So now I’ll corrupt it by forcing it to be allocated again. For this I need to find the offset of the PFS byte for page (1:438), by taking a hex dump of the PFS page and looking for a byte whose PFS bits match the DBCC PAGE output for page (1:438) above. The page only has the ‘Has Ghost’ bit set, which is 0x08.

DBCC PAGE ([Company], 1, 1, 2);
Memory Dump @0x00000000185EA000

00000000185EA000:   010b0000 00000000 00000000 00000000 00000000  ....................
00000000185EA014:   00000100 63000000 0200fc1f 01000000 01000000  ....c.....ü.........
00000000185EA028:   12010000 fd000000 01000000 00000000 00000000  ....ý...............
00000000185EA03C:   7944876a 01000000 00000000 00000000 00000000  yD‡j................
00000000185EA050:   00000000 00000000 00000000 00000000 00009c1f  ..................œ.
00000000185EA064:   44444444 00004444 60647060 74706070 60607060  DDDD..DD`dp`tp`p``p`
00000000185EA078:   60707060 40404040 40404040 61706070 60606070  `pp`@@@@@@@@ap`p```p
00000000185EA08C:   60706060 60706060 60706060 60606070 40404040  `p```p```p`````p@@@@
00000000185EA0A0:   40404040 40404040 40404030 60706060 70607060  @@@@@@@@@@@0`p``p`p`
00000000185EA0B4:   70706070 70606060 70607060 70607060 70607060  pp`pp```p`p`p`p`p`p`
00000000185EA0C8:   70607060 70607060 70606060 60607060 60706070  p`p`p`p`p`````p``p`p
00000000185EA0DC:   60706070 60706070 60707070 60607060 60706060  `p`p`p`p`ppp``p``p``
00000000185EA0F0:   60706060 70606060 60606060 70607060 60706060  `p``p```````p`p``p``
00000000185EA104:   60606060 60606060 40000000 00000000 60606060  ````````@.......````
00000000185EA118:   60606060 60606060 60606060 60606060 60606060  ````````````````````
00000000185EA12C:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA140:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA154:   60606060 64646260 40404040 40404040 40404040  ````ddb`@@@@@@@@@@@@
00000000185EA168:   40404040 40400000 00000000 40400000 00000000  @@@@@@......@@......
00000000185EA17C:   40404040 00000000 70606060 60606060 40404040  @@@@....p```````@@@@
00000000185EA190:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA1A4:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA1B8:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA1CC:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA1E0:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA1F4:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA208:   40404040 40404040 40404040 40404040 40400840  @@@@@@@@@@@@@@@@@@.@
00000000185EA21C:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@
00000000185EA230:   40404040 40404040 40404040 40404040 40404040  @@@@@@@@@@@@@@@@@@@@

Can you spot the 0x08 byte? It’s at offset 0x21a on the page.
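To make the PFS bits concrete, here’s a toy decoder in Python (not SQL Server code) that turns a PFS byte into the flags DBCC PAGE prints. The bit assignments (0x40 = allocated, 0x20 = IAM page, 0x10 = mixed extent page, 0x08 = has ghost records, low bits = fullness) match the output above; the exact fullness labels are my reading of the DBCC output format. It also checks the offset arithmetic: in this dump the PFS byte array starts at offset 100 (96-byte page header plus 4 bytes of record overhead), so page 438’s byte sits at 100 + 438 = 538 = 0x21a.

```python
# Toy PFS-byte decoder, mirroring the flags shown in the DBCC PAGE output above.
FULLNESS = {0: '0_PCT_FULL', 1: '50_PCT_FULL', 2: '80_PCT_FULL',
            3: '95_PCT_FULL', 4: '100_PCT_FULL'}

def decode_pfs(b):
    flags = ['ALLOCATED' if b & 0x40 else 'NOT ALLOCATED',
             FULLNESS[b & 0x07]]           # low 3 bits encode page fullness
    if b & 0x10:
        flags.append('Mixed Ext')          # page is in a mixed extent
    if b & 0x08:
        flags.append('Has Ghost')          # page has ghost records
    if b & 0x20:
        flags.append('IAM Page')           # page is an IAM page
    return ' '.join(flags)

print(decode_pfs(0x08))   # NOT ALLOCATED 0_PCT_FULL Has Ghost  (page 1:438 now)
print(decode_pfs(0x40))   # ALLOCATED 0_PCT_FULL
print(hex(96 + 4 + 438))  # 0x21a -- PFS byte offset for page 438 in this dump
```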

I can force page (1:438) to become allocated again by setting that byte offset in the PFS page to 0x40, again using DBCC WRITEPAGE.

DBCC WRITEPAGE (N'Company', 1, 1, 538, 1, 0x40);

And now if I run DBCC CHECKDB, I can see some corruption:

Msg 8909, Level 16, State 1, Line 68
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:438) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.
Msg 8928, Level 16, State 1, Line 68
Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data): Page (1:438) could not be processed.  See other errors for details.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'test' (object ID 245575913).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Company'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Company).

And the final step is to make the database read-only:

ALTER DATABASE [Company] SET READ_ONLY;

Investigating the corruption

In the case described on the distribution list, there was no backup, so the client wanted to extract as much data as possible.

Running a simple SELECT * didn’t work:

SELECT * FROM [test];

The query starts to return results and then fails with:

Msg 824, Level 24, State 2, Line 74
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:438; actual 0:0). It occurred during a read of page (1:438) in database ID 10 at offset 0x0000000036c000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\Company.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

But if I run a SELECT * that has ordering, it works fine:

SELECT * FROM [test] ORDER BY [c1];

What’s going on?


The explanation has to do with how the scans for the two SELECT statements work.

The first scan is doing what’s called an allocation order scan. This is where the Access Methods decides not to use the index structure to give back the records. An allocation order scan has three requirements:

  • The query plan must allow for an unordered scan of the index
  • The index must be bigger than 64 pages
  • The data in the index must be guaranteed not to change

The allocation order scan uses the IAM pages to load a scanning object and then zip through the extents in allocation order, using the PFS pages to determine which pages in the extents are allocated and should be read and processed by the scan.

The second scan is doing a normal ordered scan, which navigates down to the left-hand side of the leaf level in the index and then follows the leaf-level page linkages to scan through the index.

So where does the corruption come in?

The page that I corrupted and then forced to be allocated again isn’t linked into the leaf level of the index, and so the ordered scan doesn’t attempt to read it. However, because it’s allocated, the allocation order scan thinks it’s a valid part of the extent that contains it and so tries to read it, resulting in the 824 error.
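The difference between the two scans can be sketched with a toy model in Python (this is an illustration, not SQL Server internals): each page carries an ‘allocated’ flag standing in for its PFS bit and a ‘next’ pointer standing in for the leaf-level linkage, and page (1:438) is allocated but corrupt and unlinked.

```python
# Toy model of the two scan types. Page 438 mirrors the corrupted page:
# marked allocated in the PFS, but zeroed and not linked into the leaf level.
pages = {
    437: {'allocated': True, 'ok': True,  'next': 439},
    438: {'allocated': True, 'ok': False, 'next': None},  # corrupt, unlinked
    439: {'allocated': True, 'ok': True,  'next': None},
}

def allocation_order_scan(pages):
    # Drive the scan off the allocation map, in page-number order.
    for pid in sorted(pages):
        if pages[pid]['allocated']:
            if not pages[pid]['ok']:
                raise IOError(f'824-style failure reading page (1:{pid})')
            yield pid

def ordered_scan(pages, first):
    # Follow the leaf-level linked list; the unlinked page is never visited.
    pid = first
    while pid is not None:
        yield pid
        pid = pages[pid]['next']

print(list(ordered_scan(pages, 437)))      # [437, 439] -- succeeds
try:
    list(allocation_order_scan(pages))
except IOError as e:
    print(e)                               # fails at page (1:438)
```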

The key to this scenario is that the database is set to read-only, which satisfies the third requirement for an allocation order scan. If you set the database back to read-write and then run the first SELECT statement, it will work perfectly, because the allocation order scan requirements are no longer met.

You can read more about allocation order scans in this great post from Paul White.


A lot of the time when dealing with database corruption and trying to effect comprehensive data recovery without backups, you’ll run into weird situations like this. When you do, step back, look at the query plan for what you’re doing, and think about what the Access Methods is doing under the covers to implement the query plan. And then think about how to work around that so you can continue getting more data back.

Hope this helps!
