Recently there was lengthy discussion on the #sqlhelp hash tag on Twitter about clustering SQL Server on VMs and whether or not that was a good idea or not. Two years ago I first blogged about this same topic on my blog post, Some Thoughts on Clustering SQL Server Virtual Machines. If you haven’t read that post, I recommend reading it before continuing with this one because it gives a lot of background that I won’t be rehashing as a part of this post. However, a lot has changed in VMware and Hyper-V since I wrote that original post and those changes really affect the recommendations that I would make today.
As I stated in the twitter discussion, we have a number of clients at SQLskills that run WSFC clusters across VMs for SQL Server HA and few have problems with the clusters related to them being VMs. There are some additional considerations that have to be made when working with VMs for the cluster nodes, a big one is that you should plan for those nodes to be on different hosts so that a hardware failure of the host doesn’t take out both of your cluster nodes, which defeats the purpose of having a SQL cluster entirely. There is also the additional layer of the hypervisor and resource management that plays into having a cluster on VMs but with proper planning and management of the VM infrastructure this won’t be a problem, it’s just another layer you have to consider if you do happen to have a problem.
In response to the discussion, Chuck Boyce Jr (Blog|Twitter) wrote a blog post that provided his opinion, which was not to do it, that started a separate discussion later on Twitter. The biggest problem Chuck points out is rooted in problems with inter-team communications within an IT shop. To be honest, Chuck’s point is not an incorrect one, I see this issue all the time, but it’s not specific to VMs. If you work in a shop that has problems with communication between DBAs, Windows administrators, VM administrators, the networking team, and any other IT resource in the business, the simple fact is those problems can be just as bad for a physical implementation of a SQL cluster as they might be for a VM implementation. The solution is to fix the communication problem and find ways to make the “team” cooperate better when problems arise, not avoid merging technologies in the hopes of preventing problems that will still occur in a physical implementation as well.
Am I saying that clustering VMs for SQL Server is for every shop? No, certainly not. There are plenty of places where clustering isn’t the best solution overall. However, with virtualization, depending on the infrastructure, the other SQL HA options that exist might not be a better decision as they would in a physical world either. One of the biggest things to think about is where are the VMs ultimately going to be stored? If the answer is a shared SAN then options like Database Mirroring and Log Shipping don’t really provide you with the same advantages that they do in physical implementation, the big one being that you have a second copy of the database on completely different storage generally. Yes I know that you could have two SQL Servers connected to the same SAN physically that use Database Mirroring, and my response to that would be that a cluster probably would make more sense because the SAN is your single point of failure in either configuration.
If you are new to clustering SQL Server, I wouldn’t recommend that you start out with VMs for your failover cluster. The odds are that you also don’t have a lot of VM experience and if there is a problem, you aren’t going to be able to troubleshoot it as effectively because you have two new technologies that you have to try and dig through. If you are comfortable with clustering SQL Server, adding virtualization as a new technology to the mix is really not that big of a deal, you just need to read the configuration guides and whitepapers for how to setup the VMs, usually your VM administrator is going to have to do this so it’s a good area to break the ice with them and work together to start the open lines of communication, to allow for a supported WSFC implementation and then finally install SQL Server and manage it like you would any other SQL Server failover cluster.
Where else would I recommend not implementing a cluster on VMs? iSCSI SANs that only offer 1Gb/s connectivity over Ethernet, simply because you are likely to run into I/O limitations quickly, and to build the cluster you have to use the software initiator for iSCSI so there is a CPU overhead associated with the configuration. Generally the host has limited ports so you end up sharing the networking between normal traffic and the iSCSI traffic which can be problematic as well. Does that mean it’s not possible? No – I have a number of clients that have these setups and they work fine for their workloads, but it’s not a configuration I would recommend if we were planning a new setup from the ground up.
The big thing that I work through with clients when they are considering whether to cluster VMs for SQL Server is the business requirements for availability and whether or not those can be met without having to leverage one of the SQL HA options or not. With the changes in VMware ESX 5 and Hyper-V 2012, you can scale VMs considerably, and both platforms allow for virtualized NUMA configurations inside of the guest VM for scalability, so the performance and sizing considerations I had two years ago are no longer primary concerns for the implementation to me. If we need 16 vCPUs and 64GB RAM for the nodes, with the correct host configuration, we can easily do that, and we can do it without performance problems while using Standard Edition licensing if we plan the infrastructure correctly.
In my previous post on this topic I linked to a number of VMware papers, and in the post prior to that one I linked to even more papers that include best practice considerations for configuration and sizing of the VMs, how to configure the VMs for clustering, and many other topics. Newer versions of these documents exist for ESX 5 and a number also exist for Hyper-V as well. I recommend that anyone looking at running SQL in a VM, whether as a clustered instance or not, spend some time reading through the papers about the hypervisor you want to run the VM on so you understand how it works, the best practices for running SQL Server on that hypervisor, and what to look for while troubleshooting problems should they occur.
In the end, Microsoft supports SQL Server failover clustering on SVVP certified hypervisors, so there isn’t a hard reason to not consider using VMs objectively to evaluate whether they might be an appropriate fit your business requirements. When I teach about virtualization in our IE3: High Availability & Disaster Recovery class, most of the perceptions at the start of the virtualization module are negative towards SQL Server on VMs, often from past experiences of failed implementations. By the end of the demos for the module, most of the opinions have changed, and in a lot of cases attendees have found and been able to communicate correctly with their VM administrator to get the problem fixed while I am performing demos of specific problems and their causes. In the last year I’ve setup a number of SQL Server clusters on VMs for clients where it was the best fit for their needs. If you would like assistance with reviewing the infrastructure, business requirements, and determining the best configuration for your needs, I’d be happy to work with you as well.