Corruption survival techniques – useless?

I came across an interesting blog post from someone who attended PASS, describing my Corruption Survival Techniques session as really interesting and fun, but basically useless. The advice was that there are only a handful of people in the world who can run things like single-page restore and emergency mode repair, and as soon as corruption is suspected, the DBA should just call Product Support for help.

Now, I’m very thick-skinned and I know there are always some people in a conference session who don’t agree with everything I say (that’s human nature, and I’m totally cool with that). But this one I just couldn’t pass up mentioning here on the blog, as I *utterly* disagree with the advice in that post and suspect that the poster didn’t “get” what I was trying to explain in the session.

The point of my session is to explain two things – that you should proactively be looking for corruption, and that you should know what to do when corruption occurs. Both of these enable your business to experience less down-time and data-loss when corruption does occur. The first part is easy: turn on page checksums and run DBCC CHECKDB regularly. So is planning a decent backup strategy (based on what you want to be able to restore – see my previous post on this – Planning a backup strategy – where to start?).
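
The proactive side can be sketched in a few lines of T-SQL. This is a minimal sketch rather than a complete maintenance plan: the database name SalesDB is a made-up placeholder, and how often you run the consistency check depends on your own maintenance window.

```sql
-- SalesDB is a hypothetical database name; substitute your own.

-- Turn on page checksums so that damaged pages are detected when they are read.
ALTER DATABASE [SalesDB] SET PAGE_VERIFY CHECKSUM;

-- Confirm the setting took effect.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'SalesDB';

-- Run a full consistency check, reporting every error and
-- suppressing the informational messages.
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

Bear in mind that enabling page checksums does not protect existing pages retroactively; a page only gets a checksum the next time it is written to disk.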

The trickier part is knowing what to do when corruption does occur. That’s why I discuss some of the output of DBCC CHECKDB, in terms of high-level tips and tricks rather than what each and every error means (see my previous post on this – Tips and tricks for interpreting CHECKDB output). I also recommend backups as the best way to limit data-loss, but not necessarily down-time – depending on the backups you have available. The last part of the session shows some tricks for getting around worst-case scenarios, like someone detaching a suspect database or needing to run emergency mode repair. I don’t expect everyone to run off and start hacking the 2005 system tables with a single-user booted server and using the DAC (but if you do, see this post). Having some of this knowledge, though, can make DBAs more confident to tackle problems themselves and increase their skills.
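
For reference, the emergency mode repair mentioned above looks roughly like the following on SQL Server 2005 and later. Treat it as a sketch of a last resort for when no usable backups exist: SalesDB is a made-up database name, and the repair may delete data in order to make the database structurally consistent.

```sql
-- SalesDB is a hypothetical database name; substitute your own.

-- Put the damaged database into emergency mode so it can be accessed at all.
ALTER DATABASE [SalesDB] SET EMERGENCY;

-- Repair requires single-user mode.
ALTER DATABASE [SalesDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Last-resort repair: this can lose data, and it cannot be rolled back
-- when run in emergency mode.
DBCC CHECKDB (N'SalesDB', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Let other users back in once the repair has finished.
ALTER DATABASE [SalesDB] SET MULTI_USER;
```

After a repair like this, it is worth running DBCC CHECKDB again and checking what data was lost before handing the database back to the application.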

Since I’ve been blogging about this stuff and presenting it at conferences, I’ve heard from *countless* people who’ve used these techniques themselves to recover from disasters, and learned a ton of information and good practices in the process. Any production DBA with half a brain (a great Scottish expression :-) should be able to use restore, single-page restore, or run a repair – otherwise, with all due respect, they shouldn’t be running a production system. Now, for “involuntary” DBAs, who (through no fault of their own) may not know anything about backups, restores, or repairs – it’s a totally different story, and help should be sought through Product Support or forums.
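
To show that single-page restore is within reach of a production DBA, here is a minimal sketch of the sequence. The database name, backup file paths, and page ID are all made up for illustration, and the sketch assumes the full recovery model with an unbroken chain of log backups.

```sql
-- All names, paths, and the page ID below are hypothetical.

-- Damaged pages that SQL Server has encountered are recorded here.
SELECT database_id, file_id, page_id, event_type, last_update_date
FROM msdb.dbo.suspect_pages;

-- Restore just the damaged page (file 1, page 57) from the last full backup.
RESTORE DATABASE [SalesDB] PAGE = '1:57'
FROM DISK = N'D:\Backups\SalesDB_full.bak'
WITH NORECOVERY;

-- Apply each existing log backup taken since that full backup, in order.
RESTORE LOG [SalesDB]
FROM DISK = N'D:\Backups\SalesDB_log1.trn'
WITH NORECOVERY;

-- Back up the log once more to capture everything up to now...
BACKUP LOG [SalesDB]
TO DISK = N'D:\Backups\SalesDB_log_tail.trn';

-- ...and restore it to bring the page up to date with the rest of the database.
RESTORE LOG [SalesDB]
FROM DISK = N'D:\Backups\SalesDB_log_tail.trn'
WITH RECOVERY;
```

The page ID to restore comes from the DBCC CHECKDB output or from the suspect_pages query at the top of the sketch.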

But to come out with a blanket statement that knowing how to run restores, repairs and do first-level interpretation of DBCC CHECKDB output is useless? And that potentially wasting time and money with front-line Product Support is the best course of action when corruption occurs, when you can work out most of it for yourself? That’s *bad advice* as far as I’m concerned.

Maybe I’m just cranky as I’m sitting here with a very sore mouth after getting a filling at the dentist this morning :-(

What do you think? Comments please!

(PS I’m not fishing for praise – I want to know what you think of the argument)

Paul and Kimberly TechNet Radio interview from PASS

While we were at PASS we hooked up with Eric Ostrowski from TechNet Radio to do an interview. Eric's compiled a bunch of interviews from PASS into a 36-minute broadcast which is now live. Our section starts about 26 minutes in and runs to the end (but if you have time it's worth listening to the whole broadcast). We don't touch on anything technical but topics covered include:

  • Kimberly's sheep and Highland cow fetish from our Scotland trip
  • The SQL Server 2008 Internals book
  • The "MVP book" that's coming up
  • Our new partner company in Australia
  • My famous "Naked Tour" of Australia in 2007 (which leaves Eric speechless)

We're both going to be recording more in-depth interviews with Eric over the next month or so too.

The links for this interview are:


Life on SQL Server 2008 – PASS interview with a large-scale user

During the PASS Community Summit, Rick Heiges did an interview with Thomas Grohser, the Senior Database Engineer at bwin Interactive Entertainment AG, the world's largest online gambling company. This is the private client Kimberly and I were onsite with for a week last month in Vienna before heading to Barcelona for TechEd. In the interview, Thomas reveals some of the cool throughput statistics of their workload and his early impressions of SQL Server 2008.

Check it out on the PASS site (you'll need to register to view it) here.

(And here are my previous posts about their backup timings, and my wanderings around Vienna (long, with lots of pictures).)