This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

A lot of times when I talk to people about high availability for SQL Server the first thing that they think of is a failover cluster. While failover clustering is one of the high-availability features provided by SQL Server, it’s not the only option available. Selecting the correct high-availability strategy should be a part of the initial planning of a SQL Server installation, but commonly I find that high-availability considerations only become important after a problem has occurred that resulted in downtime for SQL Server. When high availability is considered as an after thought, the costs for implementing the chosen strategy may be higher than if it was implemented initially. In this post we’ll take a look at the necessary considerations for choosing a high-availability strategy

SQL Server offers many high-availability features such as database mirroring, failover clustered instances, availability groups, and log shipping. Even transactional replication can be used as a high-availability option.

Gather Requirements

The first step in the process of determining a high availability strategy for SQL Server is gathering business requirements to establish SLAs.  Paul already covered RTO and RPO in his blog post (Day 6 of 30): Backups: Understanding RTO and RPO, and these are the key requirements that need to be understood as a part of requirements gathering.  It is important to ensure that realistic expectations are set for the availability requirements for the solution.  Otherwise, when we get to the next step and begin evaluating our limitations we may need to come back to this first step again to reevaluate what is actually going to be possible.

Evaluate Limitations

Once we know the expected SLAs for the solution we can then begin to evaluate the limitations that exist to determine if we will be able to meet those SLAs or not. Limitations can generally be categorized as technical or non-technical.  For example, the budget is a non-technical limitation that is going to be important in picking technologies and determining if you can meet the SLA requirements.  There is no point in designing an entire solution around multiple data centers and SAN-based replication if the budget for high availability is only $10K.  Other important non-technical limitations to consider are the skill-set requirements for maintaining an implementation, space availability in the data center, power and cooling requirements, and even the time requirements for the implementation.  Any or all of these can be overcome if the budget supports expanding data-center space, additional training for employees, or even hiring experienced consultants to perform/help with the initial implementation when time constraints exist.

Technical limitations are limitations around SQL Server that can affect the ability of specific technologies to be used.  For example, if you can only use SQL Server Standard Edition in SQL Server 2012, the hardware will be limited to 16 cores, 64GB RAM and you won’t be able to leverage newer features like availability groups.  Other technical factors such as the volume of transaction log generation, average transaction size, database recovery model, and whether or not you can modify the application will also limit the options available when you begin selecting technologies based on the limitations and requirements.

Selecting Technologies

Once you have the requirements and limitations sorted out it is time to begin reviewing the available technologies to determine if it’s actually going to be possible to meet the SLAs within the existing limitations or not.  It is not uncommon to find that the existing limitations will not support the business requirements and SLAs, and when this occurs it is important to present the limitations and explain the SLAs won’t be achievable within the current constraints.  One of two things can happen at this point: the business requirements may be revised to work within the available limitations, or the requirements can be prioritized to determine the order of importance for the design.  The specifics the availability options in SQL Server will be covered in the next few posts in this series, but it is critical to ensure all of the features required by the application are going to be met by the selected technology.  For example, if the database uses FILESTREAM, database mirroring won’t be a viable high availability option, likewise for databases that don’t use the FULL recovery model, which is required for database mirroring and availability groups.

Testing, Validation, and Documentation

Testing and validation are generally a part of the initial implementation of a new high-availability strategy, but they should also become routine tasks to continuously validate that the current implementation continues to meet the business requirements and SLAs.  Documentation of the solution configuration, as well as the failover and recovery plan is an important part of implementing any high-availability solution.  The documentation should ideally be written by a senior team member, but then tested by the most junior person to ensure that all of the necessary steps for performing a failover or recovering from a problem have been appropriately addressed in the documentation.  During ongoing validation of the configuration, documentation updates should be made for any new limitations that are discovered or as the configuration changes to continue to meet the business SLAs.

Summary

Care must be taken when formulating a high-availability strategy to ensure that it meets  the requirements while also working within the limitations.

You can read more about these whitepapers that Paul wrote: