PFS corruption after upgrading from SQL Server 2014

I’m seeing reports from a few people of DBCC CHECKDB reporting PFS corruption after an upgrade from SQL Server 2014 to SQL Server 2016 or later. The symptoms are that you run DBCC CHECKDB after the upgrade and get output similar to this:

Msg 8948, Level 16, State 6, Line 5
Database error: Page (3:3863) is marked with the wrong type in PFS page (1:1). PFS status 0x40 expected 0x60.
Msg 8948, Level 16, State 6, Line 5
Database error: Page (3:3864) is marked with the wrong type in PFS page (1:1). PFS status 0x40 expected 0x60.
CHECKDB found 2 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB found 2 allocation errors and 0 consistency errors in database 'MyProdDB'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (MyProdDB).

I’ve discussed with the SQL Server team and this is a known bug in SQL Server 2014.

The problem can occur if an ALTER INDEX … REORGANIZE is performed in a transaction and then rolled back, one of the affected extents can have some of its pages marked with the wrong PFS status. This state is valid in SQL Server 2014, but if one of the upgrade steps happens to move one of these pages, DBCC CHECKDB on the new version will complain with the errors above.

Note: this is not a bug in DBCC CHECKDB :-)

The fix for this issue is to run DBCC CHECKDB (yourdb, REPAIR_ALLOW_DATA_LOSS) and that will fix the PFS state. From anecdotal evidence, you might need to run repair twice. Repair will simply fix the PFS status, not deallocate/delete anything.

If you experience this issue, the SQL Server team requests that you contact CSS so they know how many people are hitting the issue and they may ask for access to the database to aid with developing a fix.

I’ll update this post when I get more information – at present (9/26/18) there is no fix available apart from running repair.

 

New VLF status value

At least since I started working on the SQL Server team (just after 7.0 shipped) and since then there have only been two VLF status codes:

  • 0 = the VLF is not active (i.e. it can be (re)activated and overwritten)
  • (1 = not used and no-one seems to remember what it used to mean)
  • 2 = the VLF is active because at least one log record in it is ‘required’ by SQL Server for some reason (e.g. hasn’t been backed up by a log backup or scanned by replication)

A few weeks ago I learned about a new VLF status code that was added back in SQL Server 2012 but hasn’t come to light until recently (at least I’ve never encountered it in the wild). I went back-and-forth with a friend from Microsoft (Sean Gallardy, a PFE and MCM down in Tampa) who was able to dig around in the code to figure out when it’s used.

It can show up on an Availability Group secondary replica (only, not on the primary, and isn’t used in Database Mirroring):

  • 4 = the VLF exists on the primary replica but doesn’t really exist yet on this secondary replica

The main case where this status can happen is when a log file growth (or creation of an extra log file) has occurred on the primary replica but it hasn’t yet been replayed on the secondary replica. More log records are generated on the primary, written to a newly-created VLF, and sent across to the secondary replica to be hardened (written to the replica’s log file). If the secondary hasn’t yet replayed the log growth, the VLFs that should contain the just-received log records don’t exist yet, so they’re kind of temporarily created with a status of 4 so that the log records from the primary can be hardened successfully on the secondary. Eventually the log growth is replayed on the secondary and the temporary VLFs are fixed up correctly and change to a status of 2.

It makes perfect sense that a scheme like this exists, but I’d never really thought about this case or experimented to see what happens.

Anyway, now you know what status 4 means if you ever see it (and thanks Sean!)

New live online training class in October: Fixing Slow Queries, Inefficient Code, and Caching/Statistics Problems

Continuing our series of live, online classes, Erin, Jonathan, and Kimberly will be delivering their new IEQUERY: Immersion Event on Fixing Slow Queries, Inefficient Code, and Caching/Statistics Problems in October! The class will be delivered live via WebEx on October 23-25 (roughly 12-13 hours of content including Q&As; about the same as two full workshop days!) and the attendees will have lifetime access to the recordings following the end of the class.

Rather than have people try to watch a full day of training at their computer for one of more days, the class will run from 10am to 3pm PST each day, with two 90-minute teaching sessions, each followed by Q&A, and a lunch break. We chose to do this, and to spread the class over a few days, so the times work pretty well for those in the Americas, Africa, and Europe. We also realize that this is complex content, so want to give attendees time to digest each day’s material, plus extensive Q&A.

Here are some select quotes from prior attendees of Erin’s/Jon’s/Kimberly’s online classes:

  • “Extremely pleased with the course. FAR exceeded my expectations.”
  • “Well worth the time and expense to attend. Would highly recommend this to others.”
  • “Great course – very informative – very great instructors – I am sure to be back!”
  • “Great course. Good new info for me, plus refresher on other info. Thanks!”
  • “Both Erin and Jon have a vast knowledge of not only SQL Server & tools, but also effective presentation.”
  • “Thanks for taking the time to better my knowledge of SQL and allow me to better my career.”
  • “Kimberly is incredibly knowledgeable and was able to adapt the techniques to all the different scenarios presented to her.”
  • “Great course. I could hear clearly, the content was relevant to current day problems, and provided clear instruction.”
  • “Loved the online aspect. It felt like I was there with the question ability and having the questions just answered right there. I felt I had a voice and could ask anything and the ability to watch it later made it totally worth the registration.”
  • “I really enjoyed the ability to ask questions as the course went along so that I didn’t forget what I wanted to ask while you were teaching. This allowed for questions to come through and class to continue until a good stopping point to answer the questions. Plus having the questions written from other attendees was nice for future reference instead of trying to remember from an in-person class discussion.”

The class is split into three parts, with each part taught by a different instructor:

  • Part 1/Day 1: Capturing Query Information and Analyzing Plans (presented by Erin Stellato)
    • Baselining options and considerations
    • Sources of query performance data (e.g. DMVs, Extended Events or Trace)
    • Capturing and comparing execution plans
    • Finding essential information in a plan
    • Misleading information in a plan
    • Common operators
    • Operators and memory use
    • Predicates and filters
    • Parallelism in plans
  • Part 2/Day 2: Removing Anti-Patterns in Transact-SQL (presented by Jonathan Kehayias)
    • Set based concepts for developers
    • Design considerations that affect performance
    • Reducing/eliminating row-by-row processing
      • CURSORs and WHILE Loops, scalar UDFs, TVFs
    • Understanding Sargability and eliminating index scans in code
    • Profiling during development and testing properly
  • Part 3/Day 3: How to Differentiate Caching / Statistics problems and SOLVE THEM! (presented by Kimberly L. Tripp)
    • Troubleshooting Statement Execution and Caching
      • Different ways to execute statements
      • Some statements can be cached for reuse
      • Statement auto-parameterization
      • Dynamic string execution
      • sp_executesql
      • Stored procedures
      • Literals, variables, and parameters
      • The life of a plan in cache
      • Plan cache limits
      • Bringing it all together
    • Troubleshooting Plan Problems Related to Statistics (not Caching)
      • Statement selectivity
      • What kinds of statistics exist
      • How does SQL Server use statistics
      • Creating additional statistics
      • Updating statistics

The price of the class is US$699 (or US$599 for prior attendees of any SQL Server class).

You can get all the details here.

We decided to start teaching some live, online classes as we recognize that not everyone can travel to our in-person classes, or take that time away from work or family, or simply have travel budget as well as training budget. People also have different ways they learn, some preferring in-person training, some preferring recorded, online training, and some preferring live, online training.

We’ll be doing more of these so stay tuned for updates (and discounts through the newsletter).

We hope you can join us!