It's really scary how quickly time flies - seems like it was just last week when I last blogged about TechNet Magazine (actually it was a month ago when I blogged about my 2008 Change Tracking article - see here). Anyway, a new issue of TechNet Magazine has just come out and this one has the latest installment of my bi-monthly SQL Q&A column.

This month's topics are:

  • How row-overflow columns can lead to poor range scan performance, even on completely defragmented indexes.
  • Combining database mirroring and failover clustering without undesired failover behavior.
  • Adding differential backups to a full+log backup strategy to lower recovery time.
  • Memory settings for multi-instance failover clustering

You can get to the column online at http://technet.microsoft.com/en-us/magazine/dd228989.aspx.

Enjoy!

PS If you have any ideas for what would make a good SQL Q&A topic, please drop me a line - paul@sqlskills.com

It's been almost exactly a week since the last post - an unusually long time for me. Kimberly and I were teaching the first week of the inaugural Microsoft Certifed Masters SQL course last week in Redmond (a little intense at 8 hours of *teaching* per 10-hour day - so no energy for blogging afterwards). Now we're in Vienna, where we're on-site with one of our favorite customers, then we head to Barcelona next week for TechEd EMEA, back to Vegas for SQL Connections the week after, and finally back to Seattle for PASS the following week. Phew! Look for a photo-stuffed Where In The World Are Paul and Kimberly post from Vienna next week.

I'll try to post some interesting stuff that comes up while we're at the conferences - always a good crop of questions.

In this quick post I want to touch on something that's becoming more important to explain as more and more people start managing systems who haven't got a lot of DBA experience (what I like to call involuntary DBAs). How do you plan a backup strategy? As you'll see if you look at my Backup/Restore category, I've got lots of info about specific types of backups but nothing on putting a plan together in the first place.

It's very simple to decide on using the SIMPLE recovery model and regular database backups - the backup schedule is easy. But, what happens when you come to recovering from a disaster? Which backups to you need to restore and how long does it take you? If you take weekly full backups, say, then you stand the chance of losing a lot of data if the disaster occurs just before your next full backup. So, if you switch to the FULL recovery model and add log backups, you can recover right up to the point of the crash.

But again, what backups do you have to restore and how long does it take you? Are you able to restore within the maximum allowable downtime for your business? If you have a 300-GB database, and the downtime allowance is 15 minutes (as with one DBA I've known), the answer is no. Do you need to move to a partitioned schema that makes use of multiple filegroups so that you can use partial database availability to bring your application online faster with only the critical filegroups?

The key point when planning a backup strategy is not to think about what backups you want to take - think about what restores you have to be able to perform, then work backwards from that.

Cheers

One of our customers (and friends!) sent us this last week - a stark reminder why hardware redundancy is a good idea!

 

The controller card in one of his servers literally blew up at 7.15am Tuesday morning. Although they didn't have redundant hardware, they were back online by 9am - pretty good going. Funnily enough they've been considering clustering  for a while now...

(Posted with permission.)

I just heard today that the first whitepaper I've written for Microsoft has been published!

The abstract is:

SQL Server Replication: Providing High-Availability using Database Mirroring

This white paper describes how to use database mirroring to increase the availability of the replication stream in a transactional environment. It covers setting up replication in a mirrored environment, the effect of mirroring partnership state changes, and the effect of mirroring failovers on replication. In addition, it describes how to use LSN-based initialization to recover from the failover of a mirrored subscriber database.

Although brief overviews are given of both replication and database mirroring, it is easier to understand this white paper if the reader has some experience with one or both of these technologies, and has at least a rudimentary knowledge of database concepts such as transactions.

You can download it from http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx and I'll put a link in our whitepapers page.

Enjoy!

As you may already know, instant initialization is a way to prevent data (not log) file create and grow operations having to zero-initialize the new space before allowing it to be used. This can vastly speed up these operations as zero-initialization can take a *long* time for large files. This is especially useful in disaster recovery operations as the first phase of a restore is always to create the requisite files, if they don't already exist. Cutting minutes or even hours from this phase can significantly reduce downtime. It's available on XP SP2 and Windows Server 2003 and above. You can get more details from a blog post of Kimberly's from March 2007.

The way to enable it is to give the SQL Server service account the 'Perform volume maintenance tasks' privilege and then restart the service. There's no way to enable or disable it from within SQL Server, and until now, no way I've known of to tell whether it's enabled from within SQL Server. I was teaching a Microsoft-internal class on Database Maintenance last week and one of the students came up with a neat way to tell - using xp_cmdshell to execute the whoami /priv command, which lists all the privileges that SQL Server service account has.

The whoami command is available on Windows Server machines but for XP you need to download the support tools from Microsoft to get it to work, as I did on my laptop. You can get them here (5MB download) and you need to stop/start SQL Server on XP after installing them so it picks up the new tools path. Now, most people will be running with xp_cmdshell turned off, because of the security risks involved with enabling it, so here's a script that turns it on, checks the privileges, and turns it off again.

EXEC sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

CREATE TABLE #xp_cmdshell_output (Output VARCHAR (8000));
GO

INSERT INTO #xp_cmdshell_output EXEC ('xp_cmdshell ''whoami /priv''');
GO

IF EXISTS (SELECT * FROM #xp_cmdshell_output WHERE Output LIKE '%SeManageVolumePrivilege%')
   
PRINT 'Instant Initialization enabled'
ELSE
   
PRINT 'Instant Initialization disabled';
GO

DROP TABLE #xp_cmdshell_output;
GO

EXEC sp_configure 'xp_cmdshell', 0;
GO
RECONFIGURE WITH OVERRIDE;
GO

Enjoy!

While we were at TechEd in June, Kimberly and I participated in an hour-long discussion panel (Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy) that was video-taped by the TechEd Online folks. It's now been edited and is available for download/viewing. We cover everything from requirements analysis to technology details in SQL Server 2008. The other panel members were Satya Jayanty, Allan Hirt, Kevin Farlee, and Amit Bansal.

You can browse the various online videos at http://technet.microsoft.com/en-us/events/teched/cc561184.aspx or go straight to the panel discussion video here.

Enjoy!

It seems that today is going to be one of those days where I get lost in forums and blogging - I can live with that :-)

One of the questions that came up on a forum today was about choosing an HA solution - based solely on the hardware that was running the database! Given that single piece of info, it's impossible to come up with any kind of sensible answer. The other thing I see a lot is someone saying 'just use a cluster' - well, if you're trying to protect against damage to the data, just using a cluster won't do it because of the single-point-of-failure in a failover cluster - the shared disks.

So where do you start? The key to choosing an HA solution is to work out your requirements first and then choose a technology that allows you to meet as many of them as you can, within your available budget. Here are some of the questions I like to ask (not an exhaustive list):

  • What is the maximum application downtime SLA (service-level agreement)? In other words, if a disaster happens, how long can the application be off-line while failover occurs or the disaster is fixed?
  • What is the maximum acceptable data-loss SLA? If a disaster happens, how much can you afford to lose in terms of data or work? You might require up-to-the minute recovery for instance, or you might be able to cope with losing the last day's worth of transactions.
  • What are you trying to protect? Site, server, instance, database, filegroup, partition, table, group of tables?
  • What is the transaction log generation rate of your workload? If it's very high, that means you're going to have problems with backup up the log and with getting transaction log over to your redundant server/site.
  • What recovery model are you running your database(s) in? If you're in SIMPLE, then you can't get point-in-time recovery and so you're looking at losing all the work since your last full backup, and it also means you can't use any of the HA technologies which rely on the transaction log.
  • What’s your current backup strategy? If the answer is 'what backup strategy?' then you've got bigger problems than just getting an HA solution in place...
  • Are you trying to achieve site-level redundancy? If so, do you have a second site? Where is it? Does it have the same protection as the main site (in terms of security, HVAC, power, etc)
  • What’s the network bandwidth and latency to the second site? If your transction log generation rate is MBs/second, but your second site is 2000 miles away through a 720KB/second link, you're not going to be doing any kind of HA solution involving the second site that comes close to your downtime and data-loss requirements...
  • What’s the hardware at the second site?
  • Can you alter the application at all? If you can't alter the application then you may have a hard time getting it to gracefully failover to a redundant server. You also won't be able to use explicit redirection with database mirroring.
  • What's the application eco-system? In other words, what all has to failover so the application can run properly.

All of these figure into the choice of HA solution. Work these out, prioritize them, and then evaluate HA technologies (or combinations of technologies) to see which requirements you can meet. Don't just jump at failover clustering first!

Over the next few months I'll be posting more on designing for high-availability - let me know if there's anything in particular you want to see.

Theme design by Nukeation based on Jelle Druyts