{"id":852,"date":"2009-05-25T13:00:00","date_gmt":"2009-05-25T13:00:00","guid":{"rendered":"\/blogs\/paul\/post\/Importance-of-testing-your-disaster-recovery-plan.aspx"},"modified":"2013-04-02T18:50:52","modified_gmt":"2013-04-03T01:50:52","slug":"importance-of-testing-your-disaster-recovery-plan","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/","title":{"rendered":"Importance of testing your disaster recovery plan"},"content":{"rendered":"<p><span style=\"font-family: verdana, geneva; font-size: small;\">In last week&#8217;s survey I asked whether you&#8217;ve ever tested your disaster recovery plan, and if so, what happened? (See <\/span><a href=\"https:\/\/www.sqlskills.com\/blogs\/paul\/weekly-survey-have-you-ever-tested-your-disaster-recovery-plan\/\"><span style=\"font-family: verdana, geneva; font-size: small;\">here<\/span><\/a><span style=\"font-family: verdana, geneva; font-size: small;\"> for the survey). Here are the results as of 5\/25\/09: <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\"><img fetchpriority=\"high\" decoding=\"async\" alt=\"\" src=\"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg\" width=\"576\" height=\"378\" \/> <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">The &#8216;other&#8217; responses are: <\/span><\/p>\n<ul>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">2 x &#8220;restored to test env regularly. don&#8217;t know if sla would be met.&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">2 x &#8220;test it regularly, most goes according to plan&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">2 x &#8220;Test it regularly, people screw up. 
Was a great win when I obtained the budget for this activity.&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">2 x &#8220;test it regulary and learn new things every time but overall it works&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">2 x &#8220;we have annual drp test company-wide&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">1 x &#8220;We have lots of DR plans, some tested, some not.&#8221; <\/span><\/div>\n<\/li>\n<li>\n<div><span style=\"font-family: verdana, geneva; font-size: small;\">1 x &#8220;We test the dr failover with mirror.however chain replications (subscriber becomes pulisher)we can&#8217;t&#8221; <\/span><\/div>\n<\/li>\n<\/ul>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">A good mixture of results, but only around 25% of respondents test their DR plan regularly. Rather depressingly, 35% of respondents either don&#8217;t have a DR plan or have one but have never tested it. Given the stories I see almost every day on the various forums, this doesn&#8217;t surprise me &#8211; but it&#8217;s depressing nevertheless. <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">The term &#8216;disaster recovery&#8217; means different things to different people. The word &#8216;disaster&#8217; in my mind spans everything that could go wrong and affect whether your system and data are online, available, and performing to spec. The word &#8216;recovery&#8217; means any process that allows you to bring your system and data back online, available, and performing to spec. Your disaster recovery plan could be as simple as restoring from the last full database backup, or as complicated as failing over all processing to a remote data center and engaging a 3rd-party company to distribute new DNS routing entries across the Internet. 
It, of course, depends (I have to work that into every editorial :-). Lots of people use DR and high-availability (HA) interchangeably, but HA is really a set of technologies that you implement to help protect against disasters causing problems. For instance, you might implement database mirroring so that part of the DR plan for a database is to fail over to the mirror, keeping the database highly available, while DR happens on the old principal. Or you might implement auto-grow on the transaction log file so that if it runs out of space the database doesn&#8217;t become unusable. Neither of these is DR; they&#8217;re preventing a disaster from affecting availability. DR in the second case would be what you do to provision more space for the log so the database can come online again. <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">Now, this editorial&#8217;s not going to be about putting together your disaster recovery plan &#8211; that&#8217;s an entire book in itself, as there are many techniques depending on the disaster and the resources you have available to facilitate recovery. If you don&#8217;t have a disaster recovery plan (which I&#8217;m going to start calling DR plan), then you should *know* that come disaster time, you&#8217;re risking whoever&#8217;s on duty floundering around, making mistakes, and potentially causing more downtime and data loss than if you had a plan to follow. People panic in times of high stress and crisis, and without a set of steps to follow, bad things happen. Enough said. You know who you are &#8211; go get a DR plan before a disaster happens and you lose time, data, your job, or all of the above. I see them all happen regularly. <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">Once you have a DR plan in place, the ONLY way to know whether you&#8217;re going to be able to recover from a variety of disasters is to simulate some, in production. 
Yes, this is far easier said than done &#8211; persuading your business owners to take planned downtime (and possibly lose a bit of revenue) can be hard, but unless you do, you can&#8217;t know that your DR plan will work. One argument I&#8217;ve found effective is: wouldn&#8217;t you rather have all the various admins and DBAs on-site, expecting the test and ready for things to go wrong, than wait until a real disaster occurs in the middle of the night and THEN find out that the DR plan doesn&#8217;t work and everyone has to scramble when they&#8217;re least expecting it? Of course, business owners often aren&#8217;t interested in low-probability potential problems. DR and HA aren&#8217;t sexy topics UNTIL the company experiences a disaster. Then it&#8217;s likely to be the top thing on the CEO&#8217;s mind and you have to have a DR plan in production by Tuesday. <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">Seriously though, if you&#8217;re responsible for your system meeting certain SLAs (downtime and data loss &#8211; a.k.a. RTO and RPO), then your DR plan actually has to work, no matter how carefully you&#8217;ve designed it. This means you have to try restoring from your backups and see if you can do it within your downtime SLA. What if you have to set up a new server first? What if there&#8217;s no power in your building? What if your off-site backups are 200 miles off-site and the network link is down? What if none of your backups work? What if your differential backups are bad &#8211; do you still have all the log backups? How does that affect your recovery time? And so on and so on. You might think I&#8217;m just making stuff up and these things don&#8217;t happen, but they do, and everything I&#8217;m citing as an example has happened to a customer I&#8217;ve personally been involved with while at Microsoft or since then. And they keep happening over and over again to different people. 
<\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">The most common reasons I see for disaster recovery failing are: <\/span><\/p>\n<ul>\n<li><span style=\"font-family: verdana, geneva; font-size: small;\">There are no backups, meaning recovery = data loss<\/span><\/li>\n<li><span style=\"font-family: verdana, geneva; font-size: small;\">Backups don&#8217;t work or all contain the corruption, meaning recovery = data loss<\/span><\/li>\n<li><span style=\"font-family: verdana, geneva; font-size: small;\">The data volume has increased since the last DR test, meaning recovery time exceeds the downtime SLA<\/span><\/li>\n<li><span style=\"font-family: verdana, geneva; font-size: small;\">The initial failover when the disaster happens doesn&#8217;t work because the failover site only has part of the application ecosystem, meaning recovery involves getting the application working on the failover site AND then recovering from the disaster on the main site<\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">Doing an initial test when the DR plan is first produced is great, because at least you know that it works, or that there are some things you&#8217;ve missed (which is almost invariably the case). The DR (and HA) plans should be written by the most experienced DBAs, as they&#8217;re the ones who&#8217;ve &#8220;seen it all&#8221; and have a good idea of what could go wrong at any point during the recovery. And the plans should be tested by the most junior DBAs, as you can bet that if a disaster occurs at 2am on Thanksgiving morning, it won&#8217;t be the most senior DBA who&#8217;s on duty. <\/span><\/p>\n<p><span style=\"font-family: verdana, geneva; font-size: small;\">Doing a regular test is critical because things change. Data volume increases. Databases get added into the mix. Personnel change. SLAs change. 
And after a change, if you don&#8217;t test regularly, then you won&#8217;t know if your DR plan still works until you have a real disaster. If you push for a DR plan test and everything works, everyone has increased peace of mind. If you push for a DR plan test and things go south, you&#8217;ll be praised for having exposed the problems. But if you wait and things go south in a real disaster, you&#8217;ll be the one responsible for unnecessary downtime or data loss &#8211; and that doesn&#8217;t look good on a resume.<\/span><\/p>\n<p><span style=\"font-size: small;\">Next post &#8211; this week&#8217;s survey!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In last week&#8217;s survey I asked whether you&#8217;ve ever tested your disaster recovery plan, and if so, what happened? (See here for the survey). Here are the results as of 5\/25\/09: The &#8216;other&#8217; responses are: 2 x &#8220;restored to test env regularly. don&#8217;t know if sla would be met.&#8221; 2 x &#8220;test it regularly, most [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35,52,91],"tags":[],"class_list":["post-852","post","type-post","status-publish","format-standard","hentry","category-disaster-recovery","category-involuntary-dba","category-surveys"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Importance of testing your disaster recovery plan - Paul S. 
Randal<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Importance of testing your disaster recovery plan - Paul S. Randal\" \/>\n<meta property=\"og:description\" content=\"In last week&#8217;s survey I asked whether you&#8217;re ever tested your disaster recovery plan, and if so, what happened? (See here for the survey). Here are the results as of 5\/25\/09: The &#8216;other&#8217; responses are: 2 x &#8220;restored to test env regularly. don&#8217;t know if sla would be met.&#8221; 2 x &#8220;test it regularly, most [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/\" \/>\n<meta property=\"og:site_name\" content=\"Paul S. Randal\" \/>\n<meta property=\"article:published_time\" content=\"2009-05-25T13:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2013-04-03T01:50:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg\" \/>\n<meta name=\"author\" content=\"Paul Randal\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Paul Randal\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/\",\"url\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/\",\"name\":\"Importance of testing your disaster recovery plan - Paul S. Randal\",\"isPartOf\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg\",\"datePublished\":\"2009-05-25T13:00:00+00:00\",\"dateModified\":\"2013-04-03T01:50:52+00:00\",\"author\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/ffcec826c18782e1e0adf173826a7fce\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage\",\"url\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg\",\"contentUrl\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-tes
ting-your-disaster-recovery-plan\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Importance of testing your disaster recovery plan\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/#website\",\"url\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/\",\"name\":\"Paul S. Randal\",\"description\":\"In Recovery...\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/ffcec826c18782e1e0adf173826a7fce\",\"name\":\"Paul Randal\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/0b6a266bba2f088f2551ef529293001bd73bf026bc1908b9866728c062beeeb6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/0b6a266bba2f088f2551ef529293001bd73bf026bc1908b9866728c062beeeb6?s=96&d=mm&r=g\",\"caption\":\"Paul Randal\"},\"sameAs\":[\"http:\/\/3.209.169.194\/blogs\/paul\"],\"url\":\"https:\/\/www.sqlskills.com\/blogs\/paul\/author\/paul\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Importance of testing your disaster recovery plan - Paul S. 
Randal","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/","og_locale":"en_US","og_type":"article","og_title":"Importance of testing your disaster recovery plan - Paul S. Randal","og_description":"In last week&#8217;s survey I asked whether you&#8217;re ever tested your disaster recovery plan, and if so, what happened? (See here for the survey). Here are the results as of 5\/25\/09: The &#8216;other&#8217; responses are: 2 x &#8220;restored to test env regularly. don&#8217;t know if sla would be met.&#8221; 2 x &#8220;test it regularly, most [&hellip;]","og_url":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/","og_site_name":"Paul S. Randal","article_published_time":"2009-05-25T13:00:00+00:00","article_modified_time":"2013-04-03T01:50:52+00:00","og_image":[{"url":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg","type":"","width":"","height":""}],"author":"Paul Randal","twitter_misc":{"Written by":"Paul Randal","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/","url":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/","name":"Importance of testing your disaster recovery plan - Paul S. 
Randal","isPartOf":{"@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage"},"image":{"@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage"},"thumbnailUrl":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg","datePublished":"2009-05-25T13:00:00+00:00","dateModified":"2013-04-03T01:50:52+00:00","author":{"@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/ffcec826c18782e1e0adf173826a7fce"},"breadcrumb":{"@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#primaryimage","url":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg","contentUrl":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-content\/uploads\/2009\/5\/disasterrecovery.jpg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/importance-of-testing-your-disaster-recovery-plan\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.sqlskills.com\/blogs\/paul\/"},{"@type":"ListItem","position":2,"name":"Importance of testing your disaster recovery plan"}]},{"@type":"WebSite","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/#website","url":"https:\/\/www.sqlskills.com\/blogs\/paul\/","name":"Paul S. 
Randal","description":"In Recovery...","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.sqlskills.com\/blogs\/paul\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/ffcec826c18782e1e0adf173826a7fce","name":"Paul Randal","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.sqlskills.com\/blogs\/paul\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/0b6a266bba2f088f2551ef529293001bd73bf026bc1908b9866728c062beeeb6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0b6a266bba2f088f2551ef529293001bd73bf026bc1908b9866728c062beeeb6?s=96&d=mm&r=g","caption":"Paul Randal"},"sameAs":["http:\/\/3.209.169.194\/blogs\/paul"],"url":"https:\/\/www.sqlskills.com\/blogs\/paul\/author\/paul\/"}]}},"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/posts\/852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/comments?post=852"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/posts\/852\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/media?parent=852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/wp-json\/wp\/v2\/categories?post=852"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/paul\/w
p-json\/wp\/v2\/tags?post=852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}