A few weeks ago I kicked off a survey about disaster recovery testing and how the plan copes with human factors. You can see the original survey here.
This Dilbert cartoon is a pretty accurate portrayal of most clients' disaster recovery plans when I first start working with them: http://dilbert.com/strips/comic/2000-08-15/.
Here are the survey results:
The "other" responses are:
6 x As a company we haven't done it in decades + but I do test my backups every day.
2 x We have a dr strategy but know it's broken.
1 x 4x per year, but no fixed schedule.
I'm pretty disheartened by these results – only 41% of respondents test their DR plan at least once a year, even though at least 80% of respondents actually *have* a DR plan.
Apart from the obvious reason to test a DR plan initially (so you know it works), it's very important to test it regularly, because assumptions made when the DR plan was written are very often no longer valid. For instance, if the database has grown then it's going to take longer to restore, which may break the RTO you agreed with your management. What if someone made a change to the backup procedures and now your restore sequence is broken? What if you're not monitoring database mirroring correctly and the REDO queue on the mirror is so large that a failover takes longer than the RTO? What if the backup generator is broken? The list goes on and on.
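To make the database-growth point concrete, here's a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the 200 MB/s restore throughput, the 10 minutes of fixed overhead and the 60-minute RTO are made-up numbers, not measurements. The only way to get real numbers is to time an actual restore on your DR hardware.

```python
def estimated_restore_minutes(db_size_gb, restore_rate_mb_per_sec,
                              fixed_overhead_minutes=0.0):
    """Rough restore-time estimate: data volume / throughput + fixed overhead.

    All inputs are assumptions you supply; a timed test restore is the
    only reliable source for them.
    """
    if restore_rate_mb_per_sec <= 0:
        raise ValueError("restore rate must be positive")
    transfer_seconds = db_size_gb * 1024 / restore_rate_mb_per_sec
    return transfer_seconds / 60 + fixed_overhead_minutes


def meets_rto(db_size_gb, restore_rate_mb_per_sec, rto_minutes,
              fixed_overhead_minutes=0.0):
    """True if the estimated restore time fits inside the RTO."""
    estimate = estimated_restore_minutes(
        db_size_gb, restore_rate_mb_per_sec, fixed_overhead_minutes)
    return estimate <= rto_minutes


# Hypothetical numbers: the plan was written when the database was 500 GB.
print(meets_rto(500, 200, rto_minutes=60, fixed_overhead_minutes=10))
# The same plan after the database has grown to 800 GB.
print(meets_rto(800, 200, rto_minutes=60, fixed_overhead_minutes=10))
```

With these made-up figures the 500 GB database restores in roughly 53 minutes and fits the 60-minute RTO, while the 800 GB database takes roughly 78 minutes and doesn't. Nothing about the plan changed except the data size, which is exactly the kind of stale assumption that regular testing catches.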
I really didn't expect anyone to pick answer #4 – I'm shocked. How can sane management prevent the technologists from testing the plan that's potentially going to save the company if a disaster occurs?
One line I like to use as a consultant when talking to senior executives is: would you rather find out that your disaster recovery plan is broken through a controlled test, when all the senior folks are standing by to put things right, or when an actual disaster happens in the middle of the night on a public holiday, when only the most junior folk are on duty and the chances of monetary losses are significant?
(Don't get me wrong – junior does not equal incompetent in any way in my book, but that's the kind of reasoning I've found to work with senior executives in corporations who are far removed from the technological coal-face.)
So how does human nature factor in here? Well, it's human nature to not be worried about disaster recovery – until a disaster happens. It's kind of the "out-of-sight, out-of-mind" mentality. There's also the possibility that people know the DR plan sucks, but no-one wants to confront that fact and have to go fix it – this is sheer irresponsibility on someone's part (maybe not the DBA if they're not given the time to go fix it). There's also the "it won't ever happen to me" mentality. How many of you reading this post have walked around your house with a video camera making note of your belongings in case your house is destroyed? I know I haven't gotten around to it yet – it keeps getting pushed down the to-do list. It takes an effort of willpower to make these things bubble to the top of the to-do list and stop procrastinating.
Go test your disaster recovery plan – you'll be amazed at what you'll find is broken. I wrote a blog post about this back in 2009 after conducting a survey of what people discovered when testing their DR plan – see here.
The "other" responses are:
1 x I am so not coming to work on that day.
1 x Our DR site is 1200 miles away, but assumes compliance by the DR site folks. A nationwide disaster would be tough to overcome.
These results are not surprising at all. The majority of companies do not consider human nature during a disaster. Having said that, I think a distinction should be made between countries that are highly disaster-prepared and disaster-conscious, like Japan, and countries that in general aren't, like the US (go read this blog post that discusses Japanese disaster preparedness if you think I'm wrong here).
I think it comes down to what I said above: "it won't happen to us". Most DR plans that I've seen assume that the disaster being recovered from is one that's only affecting that company and isn't affecting the personal lives of those responsible for doing the disaster recovery. But in a widespread disaster most people are going to be focusing on themselves and their family, not thinking about whether the production database is still available. Does your company realize that?
Time to rethink your disaster recovery plan? No-one else is going to do it for you… that's human nature.