A few weeks ago I kicked off a survey asking what kind of disaster recovery guide/run-book/plan (I'll just call it a plan from now on) you have (see here for the survey). Here are the results as of 9/15/09:
Out of all these answers, IMHO the last answer is the only acceptable one for a production DBA responsible for recovering critical databases to a defined RTO and RPO (recovery time objective and recovery point objective, respectively).
The problem, though, is that getting a wonderful and comprehensive disaster recovery plan together is waaay easier said than done. For a start, disaster recovery isn't a sexy topic with management UNTIL a disaster actually happens and the RTO and RPO are completely blown, so it's hard to justify the time and effort needed to put together a good plan. It's especially hard to put a plan together if you're an involuntary DBA, with no idea about what disasters could occur and what you'd do to recover from them.
The reasons to have a plan worked out in advance are pretty much common sense: in a disaster situation, where time is often of the essence, adrenaline and stress levels run high and it can be hard to remain cool, calm, and collected. For an unprepared DBA, this can very easily lead to costly mistakes. No one wants to be the one who overwrote the only existing copy of a database with a corrupt backup and caused the business to be offline for several days. Good-bye job.
A pre-defined disaster recovery plan allows the DBA (or responsible person) to follow a set of tested steps to resolve problems that can occur. A really comprehensive plan covers more than just how to restore the database: it also contains troubleshooting information for determining what needs to be done (rather than just immediately taking everything offline and running a restore), and guidance for coping with the twists and kinks that could crop up while performing the recovery operation. What makes a plan wonderful is that it gets tested regularly to make sure everything in it a) is still appropriate, b) still works, and c) still works within the defined RTO and RPO. Kimberly and I always like to say that a disaster recovery plan should be written by the most experienced DBA you have and tested by everyone else, down to the most junior DBA, because everyone needs to be able to work with it.
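To make the "still works within the defined RTO and RPO" check concrete, here is a minimal sketch of how the timings from a test run of the plan could be compared against those targets. Everything in it is an assumption for illustration: the `RecoveryDrill` structure, the `check_drill` function, and the sample times and targets are hypothetical and are not taken from any particular tool or from the survey.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryDrill:
    """Timings recorded during one test run of the plan (hypothetical structure)."""
    failure_time: datetime            # when the simulated disaster occurred
    last_recoverable_point: datetime  # latest point in time the restore recovered to
    back_online_time: datetime        # when the database was usable again


def check_drill(drill: RecoveryDrill, rto: timedelta, rpo: timedelta) -> dict:
    """Compare a drill's measured downtime and data loss against the RTO and RPO."""
    downtime = drill.back_online_time - drill.failure_time
    data_loss = drill.failure_time - drill.last_recoverable_point
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= rto,   # recovered quickly enough?
        "rpo_met": data_loss <= rpo,  # lost no more data than allowed?
    }


# Illustrative numbers only: a 2-hour RTO and a 5-minute RPO.
drill = RecoveryDrill(
    failure_time=datetime(2009, 9, 15, 10, 0),
    last_recoverable_point=datetime(2009, 9, 15, 9, 50),
    back_online_time=datetime(2009, 9, 15, 11, 30),
)
result = check_drill(drill, rto=timedelta(hours=2), rpo=timedelta(minutes=5))
print(result["rto_met"], result["rpo_met"])  # True False
```

In this made-up drill the database is back within the RTO, but ten minutes of data are lost against a five-minute RPO, which is exactly the kind of gap you want a regular test to reveal before a real disaster does.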
Bottom line (not a very long editorial this time): you can't expect to be able to recover within any defined limits if you're just going to wing it when a disaster strikes. It doesn't matter how experienced you are: crazy stuff happens that can trip you up.
Next up – the next survey!