Search Engine Q&A #21: File header pages, and file header corruption


Following on from my previous post on boot pages and boot page corruption, I’ve been asked about file header pages – and I was already planning this post as the next in the series.


So what’s a file header page? Every data file in a database has the very first 8kb page (i.e. page 0 in the file) set aside as the place to store all the metadata info about the file. As with the boot page, you can look at the contents with DBCC PAGE and it will interpret all the fields for you, or you can use the DBCC FILEHEADER command, which does a better job. This is undocumented and unsupported, just like DBCC DBINFO for looking at the database boot page, but it’s been discussed and posted about on the Internet before so it’s existence is no secret.


The command take a database name or database ID plus the file ID to dump. Here I’ve created a database called FileHeaderTest and used SSMS with results-to-text, plus a bunch of editing of the results to make it blog-able:



DBCC FILEHEADER (‘FileHeaderTest’, 1);
GO


FileId                : 1
LogicalName           : FileHeaderTest
BindingId             : D30AE3EF-14A6-47D5-B267-96F38238D882
FileGroup             : 1
Size                  : 152
MaxSize               : -1
MinSize               : 152
UserShrinkSize        : -1
Growth                : 128
BackupLSN             : 0
RedoStartLSN          : 0
FirstLSN              : 0
MaxLSN                : 0
FirstUpdateLSN        : 0
CreateLSN             : 0
SectorSize            : 512
RecoveryForkGUID      : 00000000-0000-0000-0000-000000000000
RecoveryForkLSN       : 0
DifferentialBaseLsn   : 19000000048800037
DifferentialBaseGuid  : 279A8EF4-4431-4CA5-8939-F613E5BC3033
Status                : 2
RestoreStatus         : 0
ReadOnlyLsn           : 0
ReadWriteLsn          : 0
MaxLsnBranchId        : 00000000-0000-0000-0000-000000000000
RedoTargetPointLsn    : 0
RedoTargetPointGuid   : 00000000-0000-0000-0000-000000000000
RestoreDiffBaseLsn    : 0
RestoreDiffBaseGuid   : 00000000-0000-0000-0000-000000000000
RestorePathOriginLsn  : 0
RestorePathOriginGuid : 00000000-0000-0000-0000-000000000000
OldestRestoredLsn     : 0


Lots of interesting stuff in here, such as:




  • BindingId: used to make sure a file is really part of this database


  • SectorSize: the disk sector size


  • Status: what kind of file and what state is it in (e.g. 2 = regular disk file)


  • Various sizes in number-of-8kb-pages (e.g. MaxSize of -1 means file growth is unlimited)


  • Growth: the number of pages to grow the file by if the 0x100000 bit is NOT set in the Status field. If it is set, the Growth is in percent.

And you can watch things change. For instance, if I change the file growth to 10%:



ALTER DATABASE FileHeaderTest MODIFY FILE (NAME = FileHeaderTest, FILEGROWTH = 10%);
GO


And then dump the file header page contents again, the Status and Growth fields have changed to:



.
.
Growth                : 10
.
.
Status                : 1048578
.
.


So what if a file header page is corrupt? I corrupted the file header page of my database and then started up SQL Server.



USE FileHeaderTest;
GO


Msg 945, Level 14, State 2, Line 1
Database ‘FileHeaderTest’ cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.


Let’s try EMERGENCY mode:



ALTER DATABASE FileHeaderTest SET EMERGENCY;
GO


Msg 5172, Level 16, State 15, Line 1
The header for file ‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\FileHeaderTest.mdf’ is not a valid database file header. The PageAudit property is incorrect.


In this case, there’s nothing to do except restore from backups, and the database is inaccessible because the PRIMARY filegroup could not be brought online. If the corruption were in a file in a secondary filegroup, things are a little bit different. Now I’ve added a secondary filegroup with a single file (called CorruptFile) and corrupted it’s file header page. After starting up SQL Server we get the same behavior – but this time we can set the file to be offline and access the rest of the database. This is called partial database availability and works in Enterprise (and Developer) Edition only.



ALTER DATABASE FileHeaderTest MODIFY FILE (NAME = CorruptFile, OFFLINE);
GO


Note that the only way to bring an offline file back online is to restore it from a backup – see this post from my old Storage Engine blog for more details. Another corruption that can only be repaired using backups…


 

TechEd Online Panel Video: Building a Solid High-Availability Strategy


While we were at TechEd in June, Kimberly and I participated in an hour-long discussion panel (Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy) that was video-taped by the TechEd Online folks. It’s now been edited and is available for download/viewing. We cover everything from requirements analysis to technology details in SQL Server 2008. The other panel members were Satya Jayanty, Allan Hirt, Kevin Farlee, and Amit Bansal.


You can browse the various online videos at http://technet.microsoft.com/en-us/events/teched/cc561184.aspx or go straight to the panel discussion video here.


Enjoy!

Search Engine Q&A #20: Boot pages, and boot page corruption

 

Now that I’ve done all the business-related blog posts, back to the good stuff to stop people complaining!

Something that’s cropped up a few times over the summer so far is people trying to repair boot page corruptions.

First off, what’s a boot page? Every database has a single page that stores critical information about the database itself. It’s always page 9 in file 1 (the first file in the PRIMARY filegroup). You can examine the page using DBCC PAGE and it will interpret all the fields for you, but there’s another command, DBCC DBINFO, that also dumps all this info (in fact the DBCC PAGE code calls the same underlying dumping code). This command is undocumented and unsupported but widely known and ‘documented’ in lots of places on the web – given that it uses the same code as DBCC PAGE, it’s just as safe to use IMHO.

Note that the output of DBCC DBINFO and DBCC PAGE of the boot page may change from release to release!

So what’s on the boot page?

 

DBCC DBINFO (‘BootPageTest’);

GO

 

DBINFO STRUCTURE:

DBINFO @0x5BF6EF84

dbi_dbid = 19                        dbi_status = 65536                   dbi_nextid = 2073058421

dbi_dbname = BootPageTest            dbi_maxDbTimestamp = 2000            dbi_version = 611

dbi_createVersion = 611              dbi_ESVersion = 0

dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2008-07-10 15:53:18.843

dbi_filegeneration = 0

dbi_checkptLSN

m_fSeqNo = 41                        m_blockOffset = 29                   m_slotId = 55

dbi_RebuildLogs = 0                  dbi_dbccFlags = 2

dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000

dbi_dbbackupLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_oldestBackupXactLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_LastLogBackupTime = 1900-01-01 00:00:00.000

dbi_differentialBaseLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_createIndexLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_versionChangeLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_familyGUID = a4e88c13-b4cf-4320-834e-92b237244d4b

dbi_recoveryForkNameStack

entry 0

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

m_guid = a4e88c13-b4cf-4320-834e-92b237244d4b                            

entry 1

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

m_guid = 00000000-0000-0000-0000-000000000000

dbi_differentialBaseGuid = 00000000-0000-0000-0000-000000000000           dbi_firstSysIndexes = 0001:00000014

dbi_collation = 872468488            dbi_category = 0                     dbi_maxLogSpaceUsed = 231936

dbi_localState = 0                   dbi_roleSequence = 0

dbi_failoverLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_dbmRedoLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_dbmOldestXactLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_dbMirrorId = 00000000-0000-0000-0000-000000000000

dbi_pageUndoLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_disabledSequence = 0

dbi_dvSplitPoint

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_CloneCpuCount = 0                dbi_CloneMemorySize = 0

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

There’s all kinds on interesting things in there, for instance:

  • dbi_version and dbi_createversion: the physical version number of the database (and when it was created). See question 1 in the August 2008 SQL Q&A column in TechNet Magazine for an explanation (see here).
  • dbi_RebuildLogs: a count of the number of times the transaction log has been rebuilt for the database. PSS can use this to tell whether corruption problems could have been caused by DBAs rebuilding the log
  • dbi_dbccLastKnownGood: the completion time of the last ‘clean’ run of DBCC CHECKDB
  • a bunch of different LSNs related to checkpoint, backups, database mirroring
  • dbi_LastLogBackupTime: self-explanatory
  • dbi_differentialBaseGuid: the GUID generated by the last full database backup. Differential backups can only be restored on top of a matching full backup – so an out-of-band full backup could screw-up your disaster recovery – see this blog post for more info.

Now, what about if this page is corrupt in some way? I corrupted the BootPageTest database to have a corrupt boot page. Let’s see what happens:

USE BootPagetest;

GO

Msg 913, Level 16, State 4, Line 1

Could not find database ID 19. Database may not be activated yet or may be in transition. Reissue the query once the database is available. If you do not think this error is due to a database that is transitioning its state and this error continues to occur, contact your primary support provider. Please have available for review the Microsoft SQL Server error log and any additional information relevant to the circumstances when the error occurred.

 

Okay – let’s try setting the database into EMERGENCY mode:

 

ALTER DATABASE BootPageTest SET EMERGENCY;

GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xcdee22fa; actual: 0xcb6ea2fa). It occurred during a read of page (1:9) in database ID 19 at offset 0x00000000012000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\BootPageTest.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

 

Hmm. What about running DBCC CHECKDB?

 

DBCC CHECKDB (‘BootPageTest’) WITH NO_INFOMSGS, ALL_ERRORMSGS;

GO

 

Msg 922, Level 14, State 1, Line 1

Database ‘BootPageTest’ is being recovered. Waiting until recovery is finished.

 

It’s not looking good. Obviously the change to EMERGENCY mode couldn’t complete properly. What’s the database status?

 

SELECT [state_desc] FROM sys.databases WHERE [name] = ‘BootPageTest’;

GO

state_desc

————————————————————

RECOVERY_PENDING

 

The boot page is inaccessible so in effect the database is inaccessible too, this is what the database state means in this case.

So what does this mean? If the boot page is corrupt, you can’t run DBCC CHECKDB so you can’t possibly run repair, and you can’t put the database into EMERGENCY mode so you can’t extract data into a new database. It means that there’s NO WAY to recover from a corrupt boot page EXCEPT to restore from backups. One more reason to have backups… [Edit 2015: You can swap out the entire boot page using a hex editor with one from a restored backup.]