This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!
Two of the most important responsibilities for any DBA are protecting the data in a database and keeping that data available. As such, a DBA may be responsible for creating and testing a disaster recovery plan, and creating and supporting a high availability solution. Before you create either, you have to know your RPO and RTO, as Paul talked about a couple weeks ago. Paul also discussed what you need to consider when developing a recovery strategy, and yesterday Jon covered considerations for implementing a high availability solution.
In today’s post, I want to provide some basic information about disaster recovery and high availability solutions used most often. This overview will give you an idea of what options might be a fit for your database(s), but you’ll want to understand each technology in more detail before you make a final decision.
Backup and Restore
No matter what type of implementation you support, you need a disaster recovery plan. Your database may not need to be highly available, and you may not have the budget to create an HA solution even if the business wants one. But you must have a method to recover from a disaster. Every version, and every edition, of SQL Server supports backup and restore. A bare-bones DR plan requires a restore of the most recent database backups available – this is where backup retention comes into play. Ideally you have a location to which you can restore. You may have a server and storage ready to go, 500 miles away, just waiting for you to restore the files. Or you may have to purchase that server, install it from the ground up, and then restore the backups. While the details of the plan are important, what matters most is that you have a plan.
Log Shipping
Log shipping is configured at the per-user-database level and requires the database to use either the full or bulk-logged recovery model (see Paul’s post for a primer on the differences). Log shipping is easy to understand – it’s backup from one server and restore on another – but the process is automated through jobs. Log shipping is fairly straightforward to configure and you can use the UI or script it out (prior to SQL Server 2000 there was no UI). Log shipping is available in all currently supported versions of SQL Server, and all editions.
You can log ship to multiple locations, creating additional redundancy, and you can configure a database for log shipping if it’s the primary database in a database mirroring or availability group configuration. You can also use log shipping when replication is in use.
With log shipping you can allow limited read-only access on secondary databases for reporting purposes (make sure you understand the licensing impact), and you can take advantage of backup compression to reduce the size of the log backups and therefore decrease the amount of data sent between locations. Note: backup compression was first available only in SQL Server 2008 Enterprise, but starting in SQL Server 2008 R2 it was available in Standard Edition.
While log shipping is often used for disaster recovery, you can use it as a high availability solution, as long as you can accept some amount of data loss and some amount of downtime. Alternatively, in a DR scenario, if you implement a longer delay between backup and restore, then if data is changed or removed from the primary database – either purposefully or accidentally – you may be able to recover it from the secondary.
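To see how the backup schedule and a deliberate restore delay translate into exposure, here is a rough Python sketch. It is a simplified model, not anything SQL Server provides: the function name and its parameters are hypothetical, and it ignores job schedules and the time the backups and restores themselves take.

```python
from datetime import timedelta


def log_shipping_windows(backup_every, copy_time, restore_delay):
    """Rough worst-case estimates for one log shipping secondary.

    Hypothetical helper: assumes log backups run every `backup_every`,
    each takes `copy_time` to reach the secondary, and the secondary
    waits `restore_delay` after a backup is taken before restoring it.
    """
    # If the primary is lost just before a copy completes, everything
    # since the previously copied backup is gone.
    worst_case_data_loss = backup_every + copy_time
    # The secondary cannot restore a backup before it has arrived, nor
    # before the configured delay has passed.
    max_secondary_lag = backup_every + max(copy_time, restore_delay)
    return worst_case_data_loss, max_secondary_lag


loss, lag = log_shipping_windows(
    backup_every=timedelta(minutes=15),
    copy_time=timedelta(minutes=2),
    restore_delay=timedelta(hours=4),
)
print("Worst-case data loss:", loss)
print("Secondary may trail the primary by up to:", lag)
```

In this model a longer restore delay does not change how much data you could lose, but it does give you longer to catch an unwanted change before it is replayed on the secondary, at the cost of more log to restore before the secondary can be brought online.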
Failover Cluster Instance
A Failover Cluster Instance (also referred to as FCI or SQL FCI) exists at the instance level and can seem scary to newer DBAs because it requires a Windows Server Failover Cluster (WSFC). A SQL FCI usually requires more coordination with other teams (e.g. server, storage) than other configurations. But clustering is not incredibly difficult once you understand the different parts involved. A Cluster Validation Tool was made available in Windows Server 2008, and you should ensure the supporting hardware successfully passes its configuration tests before you install SQL Server, otherwise you may not be able to get your instance up and running.
SQL FCIs are available in all currently supported versions of SQL Server, and can be used with Standard Edition (2 nodes only), Business Intelligence Edition in SQL Server 2012 (2 nodes only), and Enterprise Edition. The nodes in the cluster share the same storage, so there is only one copy of the data. If a failure occurs for a node, SQL Server fails over to another available node.
If you have a two-node WSFC with only one instance of SQL Server, one of the nodes is always unused, basically sitting idle. Management may view this as a waste of resources, but understand that it is there as insurance (that second node is there to keep SQL Server available if the first node fails). You can install a second SQL Server instance and use log shipping or mirroring with snapshots to create a secondary copy of the database for reporting (again, pay attention to licensing costs). Or, those two instances can both support production databases, making better use of the hardware. However, be aware of resource utilization when a node fails and both instances run on the same node.
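One way to sanity-check the case where both instances end up on one node is to add up what each instance is allowed to use. The sketch below is only an illustration in Python: the function, the memory figures, and the 4 GB operating system reserve are all assumptions, and memory is just one of the resources you would want to check (CPU and I/O matter too).

```python
def fits_on_one_node(node_ram_gb, max_server_memory_gb, os_reserve_gb=4):
    """Return True if all instances could run on a single node.

    Hypothetical check: sums each instance's configured max server
    memory, adds an assumed reserve for the operating system, and
    compares the total with the RAM in one node.
    """
    return sum(max_server_memory_gb) + os_reserve_gb <= node_ram_gb


# Two instances capped at 48 GB each, on nodes with 64 GB of RAM:
# fine while they run on separate nodes, over-committed after a failover.
print(fits_on_one_node(64, [48, 48]))

# The same nodes with each instance capped at 28 GB.
print(fits_on_one_node(64, [28, 28]))
```

If the first case describes your cluster, you either accept degraded performance after a failover or size the instances so that both fit on a single node.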
Finally, a SQL FCI can provide intra-data center high availability, but because it uses shared storage, you do have a single point of failure. A SQL FCI can be used for cross-data center disaster recovery if you use multi-site SQL FCIs in conjunction with storage replication. This does require a bit more work and configuration, because you have more moving parts, and it can become quite costly.
Database Mirroring
Database mirroring is configured on a per-user-database basis and the database must use the Full recovery model. Database mirroring was introduced in SQL Server 2005 SP1 and is available in Standard Edition (synchronous only) and Enterprise Edition (synchronous and asynchronous). A database can be mirrored to only one secondary server, unlike log shipping.
Database mirroring is extremely easy to configure using the UI or scripting. A third instance of SQL Server, configured as a witness, can detect the availability of the primary and mirror servers. In synchronous mode with automatic failover, if the primary server becomes unavailable and the witness can still see the mirror, failover will occur automatically if the database is synchronized.
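The conditions for automatic failover can be written out as a small truth check. This Python sketch only encodes the conditions listed above. It is a hypothetical model with made-up parameter names, not a call into SQL Server.

```python
def automatic_failover_occurs(*, synchronous, witness_configured,
                              principal_available, witness_sees_mirror,
                              database_synchronized):
    """Model of when a mirrored database fails over automatically.

    Hypothetical function: every argument is a boolean describing the
    state of the mirroring session at the moment of the failure.
    """
    if principal_available:
        return False  # nothing to fail over from
    return (synchronous and witness_configured
            and witness_sees_mirror and database_synchronized)


# Principal lost, witness still sees the mirror, database synchronized.
print(automatic_failover_occurs(
    synchronous=True, witness_configured=True, principal_available=False,
    witness_sees_mirror=True, database_synchronized=True))

# Same failure, but the session is asynchronous: failover is manual.
print(automatic_failover_occurs(
    synchronous=False, witness_configured=True, principal_available=False,
    witness_sees_mirror=True, database_synchronized=False))
```

The point of the model is that every condition has to hold at once: remove the witness, or let the mirror fall out of sync, and you are back to a manual failover.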
Note that you cannot mirror a database that contains FILESTREAM data, and mirroring is not appropriate if you need multiple databases to fail over simultaneously, or if you use cross-database transactions or distributed transactions. Database mirroring is considered a high availability solution, but it can also be used for disaster recovery, assuming the lag between the primary and mirror sites is not so great that the mirror database is too far behind the primary for RPO to be met. If you’re running Enterprise Edition, snapshots can be used on the mirror server for point-in-time reporting, but there’s a licensing cost that comes with reading off the mirror server (as opposed to if it’s used only when a failover occurs).
Availability Groups
Availability groups (AGs) were introduced in SQL Server 2012 and require Enterprise Edition. AGs are configured for one or more databases, and if a failover occurs, the databases in a group fail over together. An AG allows up to three synchronous replicas (the primary and two secondaries), whereas database mirroring allowed only one synchronous secondary, and up to four secondary replicas in total, all of which can be asynchronous. Failover in an Availability Group can be automatic or manual. Availability Groups do require a Windows Server Failover Cluster (WSFC), but do not require a SQL FCI. An AG can be hosted on SQL FCIs, or on standalone servers within the WSFC.
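When you sketch out a design, it can help to check it against the replica limits before you get attached to it. The following Python snippet is a hypothetical helper, not a SQL Server API. It assumes the SQL Server 2012 figures of at most four secondary replicas, of which at most two can be synchronous, so check the limits for the version you actually run.

```python
MAX_SECONDARIES = 4        # assumed SQL Server 2012 limit
MAX_SYNC_SECONDARIES = 2   # primary + two secondaries = three synchronous replicas


def validate_ag_topology(secondary_modes):
    """Check a proposed list of secondary replica commit modes.

    `secondary_modes` holds one "sync" or "async" string per secondary
    replica. Returns a list of problems, which is empty if none are found.
    """
    problems = []
    unknown = [m for m in secondary_modes if m not in ("sync", "async")]
    if unknown:
        problems.append(f"unknown commit mode(s): {unknown}")
    if len(secondary_modes) > MAX_SECONDARIES:
        problems.append("too many secondary replicas")
    if secondary_modes.count("sync") > MAX_SYNC_SECONDARIES:
        problems.append("too many synchronous secondaries")
    return problems


# Two local synchronous secondaries plus two asynchronous DR secondaries.
print(validate_ag_topology(["sync", "sync", "async", "async"]))

# Three synchronous secondaries is one too many under these limits.
print(validate_ag_topology(["sync", "sync", "sync"]))
```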
Availability Groups allow read-only secondary replicas that receive low-latency streaming updates, so you can offload reporting to another server and have it be near real-time. Availability Groups offer some fantastic functionality, but just as with a SQL FCI, there are many moving parts and the DBA cannot work in a vacuum for this solution; it requires a group effort. Make friends with the server team, the storage team, the network folks, and the application team.
Transactional Replication
Transactional Replication gets a shout-out here, even though it is not always considered a high availability solution, as Paul discusses in his post, In defense of transaction replication as an HA technology. But it can work as a high availability solution provided you can accept its limitations. For example, there is no easy way to fail back to the primary site. However, I would argue this is true for log shipping as well, because log shipping requires you to back up and restore (easy but time consuming). In addition, with transactional replication you don’t have a byte-for-byte copy of the publisher database, as you do with log shipping, database mirroring or availability groups. This may be a deal-breaker for some, but it may be quite acceptable for your database(s).
Transactional Replication is available in all currently supported versions and in Standard and Enterprise Editions, and may also be a viable option for you for disaster recovery. It’s important that you clearly understand what it can do, and what it cannot, before you decide to use it. Finally, replication in general isn’t for the faint of heart. It has many moving parts and can be overwhelming for an Accidental DBA. Joe has a great article on SQL Server Pro that covers how to get started with transactional replication.
Summary
As we’ve seen, there are many options available that a DBA can use to create a highly available solution and/or a system that can be recovered in the event of a disaster. It all starts with understanding how much data you can lose (RPO) and how long the system can be unavailable (RTO), and you work from there. Remember that the business needs to provide RPO and RTO to you, and then you create the solution based on that information. When you present the solution back to the business, or to management, make sure it is a solution that YOU can support. As an Accidental DBA, whatever technology you choose must be one with which you’re comfortable, because when a problem occurs, you will be the one to respond and that’s not a responsibility to ignore. For more information on HA and DR solutions I recommend the following: