This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!
Virtualization has been popular for many years, and more and more businesses are moving low-latency line-of-business applications like SQL Server into virtual machines every day. One of the common reasons that I’ve heard over the years for moving SQL Server to a virtual machine is that high availability is built-in. Usually what this translates into is, “We don’t need to use SQL Server availability options because the VM already has HA.” This may be the case for some scenarios but as the saying goes “there’s no such thing as a free lunch.” In this post we’ll look at the high availability provided to virtual machines and the considerations that need to be taken into account when determining whether or not to implement SQL Server high availability while using virtual machines.
Basic Virtual Machine HA
The high availability provided through virtualization depends on the configuration of the host environment on which the VMs are running. Typically for a high-availability configuration for virtualization, multiple host servers are clustered together using a shared-storage solution on a SAN, NFS, or NAS for the virtual machine hard disks. This provides resilience against failure of one of the host servers by allowing the virtual machines to restart on one of the other hosts. Both Hyper-V and VMware provide automated detection of guest failures in the event of a problem and will restart the VMs automatically on another host, provided that sufficient resources exist to meet any reservations configured for the individual VMs.
VMs also gain better availability over physical servers through features like Live Migration/vMotion and the ability to perform online storage migrations to move the virtual hard disks from one storage array to another one available to the host(s). This can be very useful for planned maintenance windows, SAN upgrades, or for balancing load across the host servers to maximize performance in response to performance problems. The VM tools that are installed in the guest, to improve performance and integration with the host server, can also monitor availability of the guest through regular ‘heart-beats’ allowing the host to determine that a VM has crashed, for example a blue screen of death (BSOD), and automatically restart the guest VM in response.
VM Specific HA Features
Addition to the basic high availability provided by virtualization, there are VM-specific HA features that are offered by both VMware and Hyper-V for improving availability of individual VMs. VMware introduced a feature for VM guests called Fault Tolerance in vSphere 4 that creates a synchronized secondary virtual machine on another host in the high-availability cluster that is lock stepped with the primary. In the event of a host failure, guests that have Fault Tolerance enabled immediately failover to their secondary in a manner that is similar to a vMotion operation, preventing application downtime from occurring. At the same time, a new secondary VM is created on another host inside of the cluster and synchronized with the new primary maintaining the fault tolerance of the guest inside of the environment. Unfortunately this is limited to a single virtual CPU, even in ESX 5.1 so it’s not likely to be used with SQL Server VMs.
Hyper-V does not currently provide an equivalent feature to VMware Fault Tolerance, even in Server 2012. Hyper-V 2012 introduced Replica’s which are provide disaster recovery through replication to a remote data center with manual failover, but it doesn’t provide automated failover in a similar manner to Fault Tolerance.
SQL Server Considerations
The primary consideration I ask about when it comes to SQL Server high availability on virtualization is whether or not it is acceptable to incur planned down times associated with routine maintenance tasks like Windows Server OS patching, and SQL Server patching with Service Packs or Cumulative Updates. If a planned down time is possible to allow for patching then the high availability provided by virtualization may meet your business requirements. However, I would always recommend testing a host failure to determine the amount of time required to detect the failure, and then restart the VM on another host, including the time required for Windows to boot, and SQL Server to perform crash recovery to make the databases available again. This may take 3-5 minutes, or even longer depending on the environment, which may not fit within your downtime SLAs.
If planned down time for applying server patches is not possible, you will need to pick a SQL Server availability option using the same considerations as you would for a physical server implementation. Support for Failover Clustering of SQL Server on SVVP-certified platforms was introduced in 2008, and Database Mirroring and Availability Groups are also supported under server virtualization. However, none of the SQL Server high availability options are supported in conjunction with Hyper-V Replicas, so there are additional limitations that need to be considered whenever you combine features on top of server virtualization. One of the limitations that should always be factored into the decision to virtualize SQL Server and use SQL native high availability options is the added complexity that exists by adding the virtualization layer to the configuration.