Why PFS pages cannot be repaired

Last week there was a short discussion on Twitter about why PFS pages (damaged header, not individual PFS bytes) can’t be repaired (prompted by a #sqlhelp question about why they can’t be single-page restored, like other per-database allocation bitmaps). Just for the record, they can’t be fixed by automatic page repair in a mirror or AG either.

PFS pages occur every 8088 pages in every data file, and each stores a byte of information about itself and each of the following 8087 pages. The most important piece of information stored for each page is whether that page is allocated (in use) or not. You can read more about PFS pages and the other per-database allocation bitmaps in this blog post.
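As a quick illustrative sketch (my own Python, not SQL Server code), here's the arithmetic implied by that 8088-page interval: which PFS page covers a given page, and which page IDs are themselves PFS pages.

```python
# Illustrative sketch only: the arithmetic of the 8088-page PFS interval.
# Page 1 of a data file is the first PFS page; thereafter a PFS page occurs
# at every page ID exactly divisible by 8088, each covering itself and the
# following 8087 pages.

PFS_INTERVAL = 8088

def pfs_page_for(page_id: int) -> int:
    """Page ID of the PFS page that covers the given page in the same file."""
    pfs = (page_id // PFS_INTERVAL) * PFS_INTERVAL
    return pfs if pfs != 0 else 1  # the first PFS page is page 1, not page 0

def is_pfs_page(page_id: int) -> bool:
    """True if this page ID is itself a PFS page."""
    return page_id == 1 or (page_id > 0 and page_id % PFS_INTERVAL == 0)

print(pfs_page_for(303905))  # 299256 - page (1:303905) falls in PFS page (1:299256)'s interval
print(is_pfs_page(16176))    # True - 16176 = 2 x 8088
```

The printed values line up with the DBCC messages quoted in the comments on this post, e.g. page (1:303905) being covered by PFS page (1:299256).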

So why can’t they be rebuilt by DBCC CHECKDB, when all the other per-database allocation bitmaps can?

The answer is that the is-this-page-allocated-or-not information is not duplicated anywhere else in the database, and it’s impossible to reconstruct it in all cases.

You might think that DBCC CHECKDB could work out which pages are allocated by inferring that state whenever a page is linked to in some way from another page that’s known to be allocated. It could do that, except for one case: a heap with no nonclustered indexes.

If a heap has no nonclustered indexes, there is no other structure in the database that links to any of the data pages in the heap. Therefore, without the information in a PFS page, there’s no way to tell which of the pages contained in extents allocated to the heap are actually allocated or not. This is because SQL Server does not touch a page when it is deallocated, so there’s nothing on a page that indicates whether a page is currently allocated or has been deallocated.
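To make that dependency concrete, here's a minimal, hypothetical sketch (simplified Python, not SQL Server internals) of an allocation-order heap scan: the IAM supplies the extents belonging to the heap, but only the PFS says which pages within those extents are actually allocated.

```python
# Hypothetical sketch of an allocation-order heap scan. The IAM chain maps
# which 8-page extents belong to the heap; the PFS bytes record which pages
# inside those extents are currently allocated. Lose the PFS and the
# allocated/deallocated distinction is gone - nothing on the pages themselves
# records it, because deallocation doesn't touch the page.

EXTENT_SIZE = 8  # pages per extent

def heap_scan(iam_extents, pfs_allocated):
    """iam_extents: first-page IDs of extents the IAM maps to the heap.
    pfs_allocated: set of page IDs the PFS marks as allocated."""
    for extent_start in iam_extents:
        for page_id in range(extent_start, extent_start + EXTENT_SIZE):
            if page_id in pfs_allocated:  # this test is impossible without the PFS
                yield page_id

# One extent starting at page 16, with only pages 16 and 18 still allocated:
print(list(heap_scan([16], {16, 18})))  # [16, 18]
```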

So what if DBCC CHECKDB can tell that there are no such cases in the database?

The answer is that the algorithm to rebuild a PFS page given the links from other pages to pages covered by that PFS range is extremely complicated and would involve searching through the entire database, reading and processing all pages a second time *after* they’ve been repaired, looking for linkages to pages in the broken range. While it sounds technically possible, when scoping out writing such an algorithm back in 2001-2002, I quickly ran into run-time and complexity challenges that made the work entirely infeasible.

It’s also not possible to just mark all the pages allocated – because then allocation-order scans would come across potentially unformatted pages and fail. It would also break backups that use WITH CHECKSUM. Such an algorithm could be made to work (in the absence of heaps with no nonclustered indexes), but has the same problems as the algorithm above.

So – PFS pages can’t be repaired, and unless the database structure changes to mirror that allocation information in some way, I don’t see that changing at any point in the future.

Hope you found this interesting!

34 thoughts on “Why PFS pages cannot be repaired”

  1. What would happen if the PFS page was written as if all pages were allocated?

    Then between 0 and 8087 pages would be unusable, as SQL Server could not overwrite pages that were actually not in use.
    But the database would be up and running.

      1. Then what would happen if the PFS page was written as if all FORMATTED pages were allocated, and UNformatted pages not allocated?

        1. Firstly it’s difficult to tell what’s formatted vs. unformatted. A page could be from a previous use of that portion of the disk, with an old SQL Server data file page that happens to have the same allocation unit ID as an allocation unit in the current database – so that would break things. Secondly, it could be a deallocated page, so effectively reallocating it again would add data to the table/index, causing wrong results from a query – a much worse sin than having a query fail from encountering corruption.

    1. Yes – a full restore of at least the file containing the broken PFS, plus log backups to roll the file forward to the same point as the rest of the database.

  2. To be clear: we are talking about a single page (PFS) that got corrupted somehow (hardware failure?) and can’t be restored by itself.

    Therefore, considering the cause was a hardware failure and would not be replicated to an AG copy, I could:
    — switch the primary replica to one of the secondary replicas
    — backup the database from one of the healthy replicas
    — restore the database to the original server after correcting the physical issue
    — switch the primary replica back to the original server
    — Life will be good again. :)

    Is this right?

    1. You could – however a much faster way would be to just correct the issue, restore the affected file from your existing backups and roll forward log backups. No need to take another full backup, as you should already have the necessary backups available.

      1. So I restored from backup, rolled forward log backups, and the corruption is still there. I also checked my replicas and they have the corruption as well. What else can I try?

        1. That says the corruption was in the database before the replicas were initialized, or something was corrupt in the log, which was then backed up and sent to the replicas.

          What corruption are you describing here?

          1. PFS corruption. The database had been functioning well for over two weeks and started having issues today, and I find it strange that the replicas are also corrupt.

            DBCC results for ‘MyDatabase’.
            Msg 8946, Level 16, State 2, Line 1
            Table error: Allocation page (9:16176) has invalid PFS_PAGE page header values. Type is 2. Check type, alloc unit ID and page ID on the page.
            Msg 8998, Level 16, State 2, Line 1
            Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 9 pages from (9:16176) to (9:24263). See other errors for cause.

  3. Hi Paul,

    If Instant File Initialization is enabled, is there a possibility of data corruption because PFS pages are not zero-initialized? Or are allocation pages always zero-initialized, whether IFI is enabled or not, to avoid corruption?

    Regards

    1. Allocation bitmaps are formatted in memory with the correct bitmap bits set and written to disk as an 8KB unbuffered write; there’s no zero initialization of that space in the data file. Instant file initialization is orthogonal to PFS page corruption.

  4. SQL 2016 – PFS error in myDB

    First run:
    DBCC CHECKALLOC (myDB,REPAIR_ALLOW_DATA_LOSS)
    DBCC results for ‘myDB’.
    Database error: Page (1:303905) is marked with the wrong type in PFS page (1:299256). PFS status 0x40 expected 0x60.
    The error has been repaired.
    CHECKALLOC found 1 allocation errors and 0 consistency errors in database ‘myDB’.

    Second run:
    DBCC CHECKALLOC (myDB,REPAIR_ALLOW_DATA_LOSS)
    Repair: The page (1:303905) has been deallocated from object ID 0, index ID -1, partition ID 0, alloc unit ID 72058150719717376 (type Unknown).
    Page (1:303905) in database ID 10 is allocated in the SGAM (1:3) and PFS (1:299256), but was not allocated in any IAM. PFS flags ‘MIXED_EXT ALLOCATED 0_PCT_FULL’.
    The error has been repaired.
    CHECKALLOC found 1 allocation errors and 0 consistency errors in database ‘myDB’.

    Third run:
    DBCC CHECKALLOC (myDB)
    CHECKALLOC found 0 allocation errors and 0 consistency errors in database ‘myDB’.

    So, some PFS errors can be repaired?
    In our case the repair deleted one unused page that didn’t belong to any object. The error was caused by a disk-full error. There were no heap tables in myDB.

    Regards
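As an aside on the status values quoted in these messages (0x40, 0x60, and flag strings like ‘MIXED_EXT ALLOCATED 0_PCT_FULL’), here's a small sketch of the commonly documented PFS byte layout. This is my own illustrative decoding under an assumed bit layout, not SQL Server code, and the flag names are the strings DBCC prints.

```python
# Illustrative decoding of a PFS status byte (assumed layout: low 3 bits =
# page fullness, 0x08 = ghost records, 0x10 = IAM page, 0x20 = mixed extent,
# 0x40 = allocated). Not SQL Server code - just a sketch matching the flag
# strings quoted in the DBCC output above.

FULLNESS = {0: "0_PCT_FULL", 1: "50_PCT_FULL", 2: "80_PCT_FULL",
            3: "95_PCT_FULL", 4: "100_PCT_FULL"}

def decode_pfs_byte(status: int) -> str:
    flags = []
    if status & 0x20:
        flags.append("MIXED_EXT")
    if status & 0x40:
        flags.append("ALLOCATED")
    if status & 0x10:
        flags.append("IAM_PG")
    if status & 0x08:
        flags.append("HAS_GHOST")
    flags.append(FULLNESS.get(status & 0x07, "UNKNOWN_FULLNESS"))
    return " ".join(flags)

print(decode_pfs_byte(0x60))  # MIXED_EXT ALLOCATED 0_PCT_FULL
print(decode_pfs_byte(0x40))  # ALLOCATED 0_PCT_FULL
```

Under this layout, “PFS status 0x40 expected 0x60” means the byte recorded an allocated page in a dedicated extent where an allocated mixed-extent page was expected.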

  5. Looks like this is what I am facing with my tempdb (SQL 2016 SP1 CU3).

    I get :

    Message
    SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 9:63214; actual 0:0). It occurred during a read of page (9:63214) in database ID 2 at offset 0x0000001eddc000 in file ‘T:\MSSQL\Temp\temp8.ndf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    I run DBCC CHECKDB and there are no errors.

    I drop the file then I get:

    Message
    Could not open File Control Block (FCB) for invalid file ID 9 in database ‘tempdb’. Verify the file location. Execute DBCC CHECKDB.

    Weirdest thing I have seen so far in SQL Server.

    1. DBCC CHECKDB can’t run allocation checks on tempdb (and hasn’t been able to since 2000), so it won’t ever find PFS corruption errors there. It can’t do this because a database snapshot can’t be created on tempdb; when a snapshot can’t be created, an exclusive (X) database lock is the only way to run allocation checks, and tempdb can’t be exclusively locked.

      However, that page ID isn’t a PFS page (the page ID has to be exactly divisible by 8088 for it to be a PFS page).

      Did you just delete the file or did you do an ALTER DATABASE … REMOVE FILE? You shouldn’t get that error message if you dropped the file properly.

  6. Hello, Paul!
    I have some questions about PFS pages.
    1) Page 16 is the first page of the sys.sysallocunits table, but in the PFS page its state is 0x60 (meaning MIXED_EXT ALLOCATED 0_PCT_FULL). I don’t know why it is 0_PCT_FULL – does that mean the page is empty? But page 16 actually contains 49 slots.
    2) If my database’s PFS page is corrupt (say it has become 8KB of zeroes), can I copy a good PFS page from another MDF or somewhere else, make its page ID and file ID correct, and then fill the status bytes with 0x44 (meaning every page is allocated)? I know that will work for index pages and not for heap pages. What will happen if I then run DBCC CHECKDB? I think maybe it will fix the problem for the heap.

    1. 1) Because it’s a clustered index data page, and page free space isn’t tracked in the PFS for pages in an index.
      2) No, you can’t do that. There is no algorithm that fixes PFS bytes correctly, otherwise I would have implemented it in the repair code for 2005.

      1. That means that for all clustered index pages, the low bits of the PFS status byte are always 0_PCT_FULL, and only the high bits are meaningful?

      2. you said “If a heap has no nonclustered indexes, there is no other structure in the database that links to any of the data pages in the heap. Therefore, without the information in a PFS page, there’s no way to tell which of the pages contained in extents allocated to the heap are actually allocated or not. This is because SQL Server does not touch a page when it is deallocated, so there’s nothing on a page that indicates whether a page is currently allocated or has been deallocated.“

        for the above sentence, I have some questions:
        1. Can the IAM page of the heap tell which extents belong to the heap? Because for a heap, SQL Server scans its data pages through the IAM page.
        2. If I mark everything in the PFS as allocated, can the IAM page still tell which extents belong to the heap, so that SQL Server will not read pages that are not in the IAM bitmap?

        1. 1) Yes, but the scan still needs the PFS page to tell which pages in an extent are actually allocated.
          2) No – heap scan uses the IAM to identify allocated extents, and then PFS to identify allocated pages.

          There is no algorithm which can correctly fix a PFS page that covers heap data pages.

          1. OK – thank you!
            One more question:
            What about the case where every table is a clustered index table? I ask because, in one case, I marked everything in a corrupt PFS page as allocated and then ran DBCC CHECKTABLE against the corrupted table, and the table became OK! Can you tell me why?

  7. Because that’s a pathological case where it’s possible to infer what’s allocated if the entire PFS interval is taken up by pages from indexes. DBCC CHECKDB doesn’t handle that case, as it’s not always possible to deterministically infer that everything in a PFS interval is from just indexes.

    1. Thank you for your reply!
      Is there any article about DBCC? I mean the details of DBCC, such as how it checks metadata and data.

  8. Paul,

    You are always great

    If a heap has no nonclustered indexes, there is no other structure in the database that links to any of the data pages in the heap.

    Then how are heap pages linked? I believed it was the IAM pages that link together all heaps (and indexes), and that from those we could construe whether a page is allocated or not.

    Regards

  9. We’re getting PFS errors when restoring a SQL 2014 backup to SQL 2016. Is there a step we are missing in the backup restore that causes this or is this a known issue when migrating a 2014 database to 2016?

    So after receiving “Page x is marked with the wrong type in PFS page x” when restoring a vendor-provided SQL 2014 database to our SQL 2016 server, we decided to see whether the errors would go away if we restored it to 2014. They did.

    We restored that vendor provided SQL 2014 database to a SQL 2014 server without issue. We ran the dbcc checks and it came out clean. No errors.

    DBCC results for ‘DBNameRemoved_0806’.
    CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DBNameRemoved_0806’.
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    We then made a backup of that same clean database, restored it on SQL 2016, and got the errors again. Something about the conversion to SQL 2016 seems to be the issue.

    DBCC results for ‘DBNameRemoved_0806’.
    Msg 8948, Level 16, State 6, Line 1
    Database error: Page (1:22289) is marked with the wrong type in PFS page (1:16176). PFS status 0x40 expected 0x60.
    Msg 8948, Level 16, State 6, Line 1
    Database error: Page (1:22290) is marked with the wrong type in PFS page (1:16176). PFS status 0x40 expected 0x60.
    Msg 8948, Level 16, State 6, Line 1
    Database error: Page (1:22291) is marked with the wrong type in PFS page (1:16176). PFS status 0x40 expected 0x60.

    One thing I did notice is when I restored the original copy it was in SQL 2008 compatibility mode. Not sure that is part of the issue but I thought it odd and worth mentioning. Any ideas?

    1. Nothing to do with compatibility mode, but I’ve heard of this scenario happening, so it must be a problem with the 2016 upgrade. I hesitate to say bug, as I don’t know what the cause is.
