Target and actual SQL Server uptime survey results

Exactly five years ago I published survey results showing target uptime SLAs and actual uptime measurements. I re-ran the survey a few weeks ago to see what’s changed, if anything, in the space of five years, and here are the results.

24×7 Systems

[Chart: 24×7 systems, target uptime]

[Chart: 24×7 systems, actual uptime]

Other responses:

  • 1 x 99.95%

Non 24×7 Systems

[Chart: non-24×7 systems, target uptime]

Other responses:

  • 7 x “No target or target unknown”
  • 1 x “0830 – 1730 M-Sat”

[Chart: non-24×7 systems, actual uptime]

Other values:

  • 1 x “n/a”

Summary

Well, the good thing is that this survey had almost twice the number of respondents as the 2009 survey, but that could just be that a lot more people read my blog now than five years ago.

My takeaway from the data is that nothing has really changed over the last five years. Given the really low response rate to the survey (I usually get more than 200-300 responses for a typical survey), my inference is that the majority of you out there don’t have well-defined uptime targets (or recovery time objective service level agreements, RTO SLAs, or whatever you want to call them) and so didn’t respond to the survey. The same thing happens when surveying something like backup testing frequency – where you *know* you’re supposed to do it, but don’t do it enough, so you feel guilty and don’t respond to the survey.

For those of you that responded, or didn’t respond and do have targets, well done! For those of you that don’t have targets, I don’t blame you, I blame the environment you’re in. From talking to a bunch of you, most DBAs I know who *want* to do something about HA/DR are prevented from doing so by management that doesn’t place enough importance on the subject. This is also shown by the demand for our various in-person training classes: IE2 on Performance Tuning is usually over-subscribed even though it runs 3-4 times per year, but IE3 on HA/DR has only sold out once even though we generally run it only once per year.

Performance is the number one thing on the collective minds of most I.T. management, not HA/DR planning, and that’s just wrong. Business continuity is so crucial, especially in this day and age of close competition where being down can cause fickle customers to move to a different store/service provider.

If you’re reading this and you know you don’t have well-defined uptime targets then I strongly encourage you to raise the issue with your management, as it’s likely that your entire HA/DR strategy is lacking too. For more information, you can read the results post from the survey five years ago (Importance of defining and measuring SLAs).
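When raising the issue with management, it can help to translate an uptime percentage into the downtime it actually permits. The following is a minimal sketch of that arithmetic only; the function name `allowed_downtime_minutes` and the default of a 365-day, 24×7 year are assumptions made for illustration, not something taken from the survey.

```python
def allowed_downtime_minutes(uptime_percent: float,
                             period_hours: float = 365 * 24) -> float:
    """Return the downtime budget, in minutes, for an uptime target.

    uptime_percent: target such as 99.95 (hypothetical input).
    period_hours:   hours the system is expected to be available in the
                    measurement period; defaults to a 24x7, 365-day year.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return period_hours * 60 * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Common targets, shown as minutes of permitted downtime per year.
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes down per year")
```

For a non-24×7 system, pass the scheduled hours instead: a "0830 – 1730 M-Sat" system is scheduled for 54 hours per week, so `allowed_downtime_minutes(99.95, period_hours=54 * 52)` gives the yearly budget measured against business hours only.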

Don’t wait until disaster strikes to make HA/DR a priority.

TechNet Magazine article: data protection and the corporate jigsaw puzzle

My latest feature article for TechNet Magazine has just been published in the April edition.

It focuses on planning an HA/DR strategy within the confines of a larger corporate IT strategy, from multiple perspectives but really focusing on the IT manager role and how to interact both up (to business managers) and down (to DBAs). Although many of you reading my blog are in the latter category, this would be a great article to have your managers read – especially if you're having a hard time convincing them to take HA/DR seriously and/or conduct appropriate testing of an HA/DR strategy.

You can get to the article at: http://technet.microsoft.com/en-us/magazine/gg981678.aspx.

Enjoy!

Importance of network latency when using database mirroring

Last week I kicked off a survey about network latencies and database mirroring. See here for the original post.

Here are the results of the survey:

 

I was really interested to see whether the proportion of people doing asynchronous mirroring became higher as the network latency increased. Although this isn't a statistically valid sample by any means, it does show that the answer is no. However, we're missing some data that would help explain what we see here: how long are the average transactions and is there a response time SLA?

The latency between the principal and the mirror is a big deal for synchronous mirroring, because a transaction on the principal cannot be acknowledged to the user/app as having committed until all of its log records have been written to the mirror database's log drive.

NOTE: the transaction does NOT have to be replayed/committed on the mirror, simply the log records have to be durable to guarantee the transaction is durable if the principal has a disaster. This is a very common misconception.

If the average transaction length is quite long, say 20 seconds, then the addition of another 500ms of latency when the commit is issued is not a big deal. But if the average transaction length is 100ms then an extra 500ms is a *very* big deal. This is when asynchronous mirroring starts to be considered: transactions on the principal do NOT have to wait, but at the expense of potential data loss if the principal experiences a disaster. However, if there is no response time SLA, then the company may be fine with the extra delay with synchronous mirroring to guarantee zero data loss (as long as the mirror session stays SYNCHRONIZED).
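To put numbers on the two cases above, here is a rough sketch that expresses the extra commit wait as a fraction of the original transaction time. It is a deliberately simplified model: it assumes the whole 500ms is added once, at commit, and the function name `commit_overhead` is made up for this illustration.

```python
def commit_overhead(avg_txn_ms: float, added_latency_ms: float) -> float:
    """Return the added commit wait as a fraction of the transaction time.

    Simplified model (an assumption for illustration): the extra latency
    is paid once, when the commit waits for the mirror to harden the log.
    """
    if avg_txn_ms <= 0:
        raise ValueError("avg_txn_ms must be positive")
    if added_latency_ms < 0:
        raise ValueError("added_latency_ms must not be negative")
    return added_latency_ms / avg_txn_ms


if __name__ == "__main__":
    added_ms = 500  # extra latency from the example in the text
    for avg_ms in (20_000, 100):  # 20-second and 100ms transactions
        overhead = commit_overhead(avg_ms, added_ms)
        print(f"{avg_ms} ms transaction: {overhead:.1%} longer with synchronous mirroring")
```

Under this model the 20-second transaction gets 2.5% longer, while the 100ms transaction takes six times as long as before (a 500% increase), which is why a response time SLA matters so much to the choice.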

As always, the choice of HA and DR technology comes down to analyzing requirements and limitations before choosing a technology. I go into this in more detail in the whitepaper I wrote in 2009 for Microsoft: High Availability with SQL Server 2008. There is also an excellent whitepaper on database mirroring: Database Mirroring Best Practices and Performance Considerations.

If you're one of the people who responded that you don't know your network latency even though you're using mirroring, check out the post I wrote last week: Importance of monitoring a database mirroring session.

Thanks!