Disaster recovery 101: fixing a broken system table page

This post is about a disaster-recovery scenario I described in our bi-weekly newsletter a couple of weeks ago, and I wanted to make sure it’s out on the web too for people to find and use.

I was helping someone who’d posted a question on an online forum try to recover data from a corrupt database. They didn’t have any up-to-date backups that were free of the corruption, so several people advised them to fix their backup strategy.

The output from DBCC CHECKDB on the database was:

Msg 8921, Level 16, State 1, Line 1
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.
Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x0; actual signature: 0x5555300). It occurred during a read of page (1:58) in database ID 10 at offset 0x00000000074000 in file ‘D:\dbname.mdf:MSSQL_DBCC10’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
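
As a quick sanity check on that message, the byte offset it reports is just the page number multiplied by the 8 KB data page size. Here’s a minimal Python sketch of the arithmetic (the function names are made up for illustration and aren’t part of any SQL Server tooling):

```python
# SQL Server data file pages are 8 KB (8192 bytes) each.
PAGE_SIZE = 8192


def page_offset(page_id: int) -> int:
    """Byte offset of a page within its data file (hypothetical helper)."""
    return page_id * PAGE_SIZE


def page_from_offset(offset: int) -> int:
    """Page number that starts at the given byte offset (hypothetical helper)."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("offset is not on a page boundary")
    return offset // PAGE_SIZE


# Page (1:58) is page 58 in file 1; the 824 error reports offset 0x74000.
print(hex(page_offset(58)))
print(page_from_offset(0x74000))
```

So the offset in the 824 error and the page ID agree with each other, which is what you’d expect.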

They’d tried running repair, but of course if DBCC CHECKDB says that it has to stop (i.e. error message 8921), then it can’t run repair.

I explained this, and how page 1:58 is a system table page and unrepairable, and so they’d have to script out as much of the database schema as possible, create a new database, and extract as much data as possible from the broken database.

I also explained that the page is part of the sys.syscolpars table, which is the equivalent of the old syscolumns system table, so that approach might not work if the corruption was such that it stopped the Query Processor from being able to use the table metadata.

Unfortunately my suspicions were correct, and the script/extract approach did indeed fail.

On a whim, I suggested trying something radical. A few years ago I blogged about a way to ‘fix’ broken boot pages using a hex editor to overwrite a broken boot page with one from an older copy of the database (see here) and demonstrated it at various conferences. I’d never tried it on a system table page before, but I figured that the page ID was low enough that the page likely hadn’t changed for a while.

What do I mean by that? Well, the sys.syscolpars clustered index is ordered by object ID, so the first few pages in the clustered index (of which page 1:58 is one) hold the columns of the system tables, which have very low object IDs. There’s never going to be a case where creating a new user table causes an insert into one of these low pages.

This means that an older backup of the database would have the current state of page 1:58 in it. So I suggested using the boot page hack on page 1:58 from the person’s older backup.
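
The actual fix was done by hand with a hex editor, but the same page copy could be scripted. What follows is only a sketch under some loud assumptions: both files are offline copies (never the live database files), the older copy is a restored data file with the same file ID, and the page really hasn’t changed between the two. The copy_page function and the file names in the usage note are hypothetical.

```python
import os

# SQL Server data file pages are 8 KB (8192 bytes) each.
PAGE_SIZE = 8192


def copy_page(good_file: str, damaged_copy: str, page_id: int) -> bytes:
    """Overwrite one page in damaged_copy with the same page from good_file.

    Hypothetical helper: run it only against offline COPIES of the files.
    Returns the page bytes that were written.
    """
    offset = page_id * PAGE_SIZE

    # Refuse to write past the end of the damaged copy; that would grow
    # the file instead of replacing an existing page.
    if os.path.getsize(damaged_copy) < offset + PAGE_SIZE:
        raise ValueError("damaged copy does not contain that page")

    # Read the whole 8 KB page from the older, undamaged copy.
    with open(good_file, "rb") as src:
        src.seek(offset)
        page = src.read(PAGE_SIZE)
    if len(page) != PAGE_SIZE:
        raise ValueError("older copy does not contain that page")

    # Write it over the same offset in the damaged copy, leaving every
    # other byte of the file untouched.
    with open(damaged_copy, "r+b") as dst:
        dst.seek(offset)
        dst.write(page)
    return page
```

For the case here, the call would look something like copy_page("restored_old_copy.mdf", "copy_of_broken.mdf", 58), with both file names standing in for wherever your copies live. After that you’d attach the patched copy and run DBCC CHECKDB again to see whether it can now complete.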

And it worked!

Luckily there wasn’t any other corruption in the database, so all the person had to do was root-cause analysis and remediation, and fixing the backup strategy so the situation wouldn’t arise in future.

Summary: In a disaster situation, when backups aren’t available, don’t be afraid to try something radical. As long as you try it on a copy of the database, it’s not as if you can make the situation any worse. And if you’re lucky, you’ll be able to make the situation a lot better.

Summer 2018 classes in London open for registration

Due to popular demand, we’re coming back to London in 2018 and I’ve just released our classes for registration!

All classes have discounts for registering before the end of 2017! (details on the individual class web pages…)

Our classes in September in London will be:

  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • September 10-14
  • IEAzure: Immersion Event on Azure SQL Database and Azure VMs
    • September 10-11
  • IECAG: Immersion Event on Clustering and Availability Groups
    • September 12-13
  • IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2
    • September 17-21

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

TSQL Tuesday #96: Folks Who Have Made a Difference

It’s been almost three years since I wrote a T-SQL Tuesday post (shame on me!), but this is one I definitely want to contribute to. It’s hosted by Ewald Cress and is about “the opportunity to give a shout-out to people (well-known or otherwise) who have made a meaningful contribution to your life in the world of data.”

There are three people I want to call out, in the order that they came into my life and helped me out.

Firstly, Dave Campbell, who left Microsoft as a Technical Fellow last year after 22 years in Microsoft’s world of data. When I joined the SQL Server team from DEC in 1999, Dave had already been there 5 years and was the Development Lead of the Access Methods team in the Storage Engine. Dave has always been a brilliant engineer, a calm and insightful manager, and a willing mentor. He taught me a lot about engineering, managing crises, and being a manager. I was amazed in late 2003 to find myself becoming the Dev Lead of the Access Methods team and stepping into his shoes.

I’m sad to say that over the years I’ve lost touch with Dave, but I’m forever grateful for the influence he had on my professional career.

Secondly, my great, great friend Bob Ward. I first met Bob a few months into my tenure at Microsoft and continued to meet him and swap emails about Product Support matters, but I didn’t start working closely with him until a few years later. Bob was the inspiration for me to want to help customers: to help them find why SQL Server was broken for them, to fix bugs, and to make sure that people in Product Support were saying and doing the right thing for customers. He inspired me because that was his passion, and his entire job. We’d spend many hours each week on the phone and in email discussing things and sorting stuff out. This led me to champion adding an entire pillar to the new engineering process that came in two-thirds of the way through SQL Server 2005 development: supportability, making sure all facets of the SQL Server box could be understood and debugged by Product Support. This involved driving and coordinating all development teams to build and deliver training materials on how SQL Server worked, how to debug it, and how Product Support should approach it, AND to build into each area the tools, messages, and hooks to allow such investigations to be done.

Bob and I (and Bob’s lovely wife Ginger, plus Kimberly) continue to be close friends and we get together whenever we can (which is a lot more frequently now that Bob’s in the product group and up in Redmond regularly). Of all the people I met at Microsoft, Bob made the greatest contribution to who I am today by inspiring me to help people.

Thirdly, my wonderful wife Kimberly, who helped me develop my speaking skills and made me ‘less Paul’, as she puts it (learning humility, presenting with empathy, and removing a lot of the arrogance with which I left Microsoft). I’d just started presenting when I met Kimberly at TechEd 2006 in Boston and I had a *lot* to learn. I quickly adopted her style of presenting, which works for me. This involves going against one of the central things people are taught about presenting – few bullets with few words. We both (and all of SQLskills) have dense slides with lots of bullets. This is so that people can read the deck and tell what we’re talking about, rather than having pictures of kittens, trees, race-cars, whatever, which tell you nothing several months later. Some of you will disagree – each to their own. The central theme though is making sure that people have learned and understand how and why things are, not just what the answer is.

The other thing (among so many in my life since meeting her) that I want to thank Kimberly for here is SQLskills. Kimberly’s been a successful business owner since the early 1990s, and she started SQLskills.com in 1995. It was incredibly cool that I could leave Microsoft in 2007 and walk straight into a thriving business with a stellar reputation and start teaching and consulting.

You’ll notice that I didn’t say ‘lastly’ above – I said ‘thirdly’. There are two more groups of people I want to give a shout out to.

Firstly, the incredibly-talented group that work with us at SQLskills (Erin, Glenn, Jon, Tim, and previously Joe Sack – another great friend). I continually learn new things from them and I’m sincerely thankful that they chose to work at SQLskills for so long (Jon for 6+ years, Erin and Glenn for 5+ years, and Tim for almost 3 years). They’re all experts in their specialties and immensely capable people, who keep me on my toes and who are all wonderful people and friends.

Lastly, and most importantly, the people who’ve had the most influence in my data world are the SQL Server community: my fellow MVPs, all the MCM community, and everyone who’s come to a class, attended a session, read a blog post or article, watched a Pluralsight course, posted a question, or tweeted on #sqlhelp. A huge part of my personality is helping people understand SQL Server. It’s what drives me to blog, to answer random email questions, to put together the waits library, to teach, and more.

You’ve all helped shape me into the person I am today in the data world, and I thank you sincerely for it.