An interesting corruption problem cropped up on the MCM distribution list yesterday and after I figured it out, I thought it would make a good blog post in case anyone hits a similar problem.
In a nutshell, the problem was corruption such that a simple SELECT * query failed, but a SELECT * query with an ORDER BY clause worked.
Creating the scenario
First I’ll create the specific corruption. I’m going to create a simple table with a clustered index, and sizing the rows so there’s only one per page.
CREATE DATABASE [Company]; GO USE [Company]; GO CREATE TABLE [test] ( [c1] INT IDENTITY, [c2] UNIQUEIDENTIFIER DEFAULT NEWID (), [c3] CHAR (4100) DEFAULT 'a'); GO CREATE CLUSTERED INDEX [test_cl] ON [test] ([c1], [c2]); GO SET NOCOUNT ON; GO INSERT INTO [test] DEFAULT VALUES; GO 10000
Now I’ll delete one of the rows, creating a page with a single ghost record on it, which I can see using DBCC PAGE on the first PFS page in the database.
DELETE FROM [test] WHERE [c1] = 150; GO DBCC TRACEON (3604); DBCC PAGE ([Company], 1, 1, 3); GO
<snip output for brevity> (1:289) - (1:295) = ALLOCATED 0_PCT_FULL Mixed Ext (1:296) - (1:437) = ALLOCATED 0_PCT_FULL (1:438) - = ALLOCATED 0_PCT_FULL Has Ghost (1:439) - (1:8087) = ALLOCATED 0_PCT_FULL
So page (1:438) is the one that had the row with key value 150 on it. It’s still allocated and linked into the clustered index structure though, so I’ll force the Access Methods code to ‘see’ it by doing a scan that’ll include it, and that will queue the page up for cleaning by the Ghost Cleanup Task.
SELECT COUNT (*) FROM [test] WHERE [c1] &lt; 200; GO
And now if I wait 10 seconds and look at the PFS page again, I can see it’s been cleaned and deallocated – it’s no longer part of the clustered index. (You’ll notice that the PFS byte still says that the page has a ghost record; that’s because when a page is deallocated, the only PFS bits that are changed are the allocation status.)
DBCC PAGE ([Company], 1, 1, 3); GO
<snip output for brevity> (1:289) - (1:295) = ALLOCATED 0_PCT_FULL Mixed Ext (1:296) - (1:437) = ALLOCATED 0_PCT_FULL (1:438) - = NOT ALLOCATED 0_PCT_FULL Has Ghost (1:439) - (1:8087) = ALLOCATED 0_PCT_FULL
Nothing’s corrupt at this point, so let’s cause some problems.
Creating the corruption
First off I’m going to zero out page (1:438) using DBCC WRITEPAGE:
ALTER DATABASE [Company] SET SINGLE_USER; GO DECLARE @offset INT; SELECT @offset = 0; WHILE (@offset < 8185) BEGIN DBCC WRITEPAGE (N'Company', 1, 438, @offset, 8, 0x0000000000000000, 1); SELECT @offset = @offset + 8; END; GO ALTER DATABASE [Company] SET MULTI_USER; GO
And there’s still no corruption here, because page (1:438) is a deallocated page.
So now I’ll corrupt it by forcing it to be allocated again. For this I need to find the offset of the PFS byte for page (1:438) using a hex dump of the PFS page and looking for a page that has the PFS bits matching the PFS output for page (1:438) above. The page only has the ‘Has Ghost’ bit set, which is 0x08.
DBCC PAGE ([Company], 1, 1, 2); GO
<snip> Memory Dump @0x00000000185EA000 00000000185EA000: 010b0000 00000000 00000000 00000000 00000000 .................... 00000000185EA014: 00000100 63000000 0200fc1f 01000000 01000000 ....c.....ü......... 00000000185EA028: 12010000 fd000000 01000000 00000000 00000000 ....ý............... 00000000185EA03C: 7944876a 01000000 00000000 00000000 00000000 yDj................ 00000000185EA050: 00000000 00000000 00000000 00000000 00009c1f ................... 00000000185EA064: 44444444 00004444 60647060 74706070 60607060 DDDD..DD`dp`tp`p``p` 00000000185EA078: 60707060 40404040 40404040 61706070 60606070 `pp`@@@@@@@@ap`p```p 00000000185EA08C: 60706060 60706060 60706060 60606070 40404040 `p```p```p`````p@@@@ 00000000185EA0A0: 40404040 40404040 40404030 60706060 70607060 @@@@@@@@@@@0`p``p`p` 00000000185EA0B4: 70706070 70606060 70607060 70607060 70607060 pp`pp```p`p`p`p`p`p` 00000000185EA0C8: 70607060 70607060 70606060 60607060 60706070 p`p`p`p`p`````p``p`p 00000000185EA0DC: 60706070 60706070 60707070 60607060 60706060 `p`p`p`p`ppp``p``p`` 00000000185EA0F0: 60706060 70606060 60606060 70607060 60706060 `p``p```````p`p``p`` 00000000185EA104: 60606060 60606060 40000000 00000000 60606060 ````````@.......```` 00000000185EA118: 60606060 60606060 60606060 60606060 60606060 ```````````````````` 00000000185EA12C: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA140: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA154: 60606060 64646260 40404040 40404040 40404040 ````ddb`@@@@@@@@@@@@ 00000000185EA168: 40404040 40400000 00000000 40400000 00000000 @@@@@@......@@...... 00000000185EA17C: 40404040 00000000 70606060 60606060 40404040 @@@@....p```````@@@@ 00000000185EA190: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA1A4: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA1B8: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA1CC: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA1E0: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA1F4: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA208: 40404040 40404040 40404040 40404040 40400840 @@@@@@@@@@@@@@@@@@.@ 00000000185EA21C: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ 00000000185EA230: 40404040 40404040 40404040 40404040 40404040 @@@@@@@@@@@@@@@@@@@@ <snip>
Can you spot the 0x08 byte? It’s at offset 0x21a on the page.
I can force page (1:438) to become allocated again by setting that byte offset in the PFS page to 0x40, again using DBCC WRITEPAGE.
DBCC WRITEPAGE (N'Company', 1, 1, 538, 1, 0x40); GO
And now if I run DBCC CHECKDB, I can see some corruption:
DBCC CHECKDB (N'Company') WITH NO_INFOMSGS; GO
Msg 8909, Level 16, State 1, Line 68 Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:438) contains an incorrect page ID in its page header. The PageId in the page header = (0:0). CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object. Msg 8928, Level 16, State 1, Line 68 Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data): Page (1:438) could not be processed. See other errors for details. CHECKDB found 0 allocation errors and 1 consistency errors in table 'test' (object ID 245575913). CHECKDB found 0 allocation errors and 2 consistency errors in database 'Company'. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Company).
And the final step is to make the database read-only:
ALTER DATABASE [Company] SET READ_ONLY; GO
Investigating the corruption
In the case described in the DL, there was no backup and so the client wanted to extract as much data as possible.
Running a simple SELECT * didn’t work, like so:
SELECT * FROM [test]; GO
The query will start to give results and then fail with:
Msg 824, Level 24, State 2, Line 74 SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:438; actual 0:0). It occurred during a read of page (1:438) in database ID 10 at offset 0x0000000036c000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\Company.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
But if I run a SELECT * that has ordering, it works fine:
SELECT * FROM [test] ORDER BY [c1]; GO
What’s going on?
The explanation is to do with how the two scans work for the SELECT statements.
The first scan is doing what’s called an allocation order scan. This is where the Access Methods decides not to use the index structure to give back the records. An allocation order scan has three requirements:
- The query plan must allow for an unordered scan of the index
- The index must be bigger than 64 pages
- The data in the index must be guaranteed not to change
The allocation order scan uses the IAM pages to load a scanning object and then zip through the extents in allocation order, using the PFS pages to determine which pages in the extents are allocated and should be read and processed by the scan.
The second scan is doing a normal ordered scan, which navigates down to the left-hand side of the leaf level in the index and then follows the leaf-level page linkages to scan through the index.
So where does the corruption come in?
The page that I corrupted and then forced to be allocated again isn’t linked in to the leaf-level of the index, and so the ordered scan doesn’t attempt to read it. However, because it’s allocated, the allocation order scan thinks it’s a valid part of the extent that contains it and so tried to read it, resulting in the 823 error.
The key to having this scenario is that the database is set to read-only, which satisfies the third requirement for an allocation order scan. If you set the database back to read-write, and then run the first SELECT statement, it will work perfectly because the allocation order scan requirements aren’t being met any longer.
You can read more about allocation order scans in this great post from Paul White.
A lot of the time when dealing with database corruption and trying to effect comprehensive data recovery without backups, you’ll run into weird situations like this. When you do, step back, look at the query plan for what you’re doing, and think about what the Access Methods is doing under the covers to implement the query plan. And then think about how to work around that so you can continue getting more data back.
Hope this helps!