As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.
At the weekend Kimberly and I attended our first ever SQLSaturday (in Dublin) and as well as a workshop on wait stats, I presented a session on Advanced Data Recovery Techniques. The contents of that session are way too advanced for a 101-level post (you can watch a video of it from the PASS Summit 2014 here if you’re interested) but one of the things I stressed at the start was that practicing disaster recovery techniques is crucial for success when a disaster happens for real.
It doesn’t matter how experienced you are with SQL Server, if you’re responsible for a SQL Server instance, you have to know the basics of how to recover when a disaster strikes (I touched on that earlier in the series in the post SQLskills SQL101: Dealing with SQL Server corruption) and you have to have practiced.
In this post I want to pose a short (by no means exhaustive) series of questions to you about what practicing you have (or haven’t) done, and make you think about an honest answer to each one.
Part 1: Information
- Do you know where the latest copy of the disaster recovery handbook/run book is? Does everyone else know? (Here’s an example template.)
- Do you know where the scripts are for automating restores of your backups?
- Do you know where the installation media for Windows and SQL Server are kept in your environment?
- Do you know where the Windows and SQL Server product keys are?
- Do you know how you’ll be able to get new servers if your data center is destroyed?
- And do you know where they will be installed? What about network? Power? HVAC?
- Do you know who to call when/if you get stuck during the disaster recovery process?
- Do you know the priority order for restoring databases/instances in your environment?
- Do you know where the various SQL Server passwords and encryption keys are stored?
Part 2: Techniques
- When was the last time you performed a full restore sequence, including tail-of-the-log backups?
- When was the last time you performed a failover to your secondary servers/data center?
- When was the last time you performed a bare-metal install?
- When was the last time you rebuilt or restored master on a server?
- When was the last time you practiced a recovery as if your main server was completely dead? (And did you successfully recover?)
- When was the last time you practiced a recovery as if your main SAN was completely dead? (And did you successfully recover?)
- When was the last time you practiced a recovery as if you didn’t have onsite backups and the SAN was dead? (And did you successfully recover?)
Think through the answers to the questions above and consider whether you’re comfortable with your responses. Now,think whether you’d be comfortable if someone responsible for some of your data (e.g. your bank, 401-k/retirement account holder, doctor’s office, favorite airline, credit-card companies) made those same answers about their disaster-recovery preparedness. My feeling is that you should be able to answer ‘yes’ for all the Part 1 questions, and answer ‘within the last 3 months’ for all the Part 2 questions, to feel fully comfortable that you’re practicing enough.