Since SQL Server 2012, Microsoft has put a lot of efforts into marketing AlwaysOn Availability Groups as a feature and solution for both high availability/disaster recovery for SQL Server, and scalability for read-only workloads. Over the last six years, I have talked more clients out of using Availability Groups than I have actually implemented for SQL Server for a lot of different reasons. As easy as Microsoft has made it to build an Availability Group, restore and synchronize the databases, join secondary servers, etc. the simple fact is that once they are in production usage, you had better know what not to do that could break things, the impact having a readable secondary, and how to troubleshoot problems when they occur, because they will.
A lot of the implementation questions I get for Availability Groups are generally looking at leveraging a secondary replica for High Availability or Disaster Recovery and often times also for offloading reporting or readable workloads. Readable Secondary copies of the database are really simple at the surface conceptually, but there are a lot of additional considerations I make that could lead you to another technology entirely like Transactional Replication or Log Shipping with Standby. Let’s look at how these three stack up against each other for different data access requirements.
First things first, for real-time data access, you only have one option, the primary writeable copy of the database. In replication this is the publisher database, log shipping calls it the log shipped primary, and Availability Groups refers to it as the primary replica. None of the secondary copies of the data, even in a synchronous Availability Group replica, is ever guaranteed to actually be real-time data. A synchronous Availability Group only guarantees that the log records for an operation are hardened in the synchronous secondary replicas transaction log allowing them to later be redone on the secondary database when a transaction commits. Depending on the volume of activity that is occurring, and whether the secondary replica is being used for read-only workloads or has sufficient hardware, the redo thread can typically be anywhere from a few hundred milliseconds to a few seconds behind the actual primary replica. This means that if you insert data into a table, scope the identity value and return it back to the application, then attempt to read that record immediately from the read-only secondary, it could actually not be there (Yes, this actually does happen).
Near Real-time Data
For near real time data access, both Replication and the Availability Group can meet the requirement for near real-time data. Replication typically will have slightly more latency than an Availability Group readable secondary in either commit mode by a few seconds due to the way replication functions. The log reader agent has to read the transaction log, write replication changes to the distribution database, and then the distribution agent has to pickup those changes up and apply them to the subscriber. This typically has a latency of 3-7 seconds under normal conditions, where an Availability Group readable secondary could have less latency depending on the volume of activity. The big difference between the two is entirely in the implementation and features supported.
If you need a non-clustered index to support the read-only workload, that index must be created on the primary database of an Availability Group for it to be available on the readable secondary. This means you have to backup and maintain that index and the maintenance will affect all servers in the Availability Group as a logged operation requiring redo on secondary replicas. This is not the case with transactional replication, where the subscriber database is a writeable copy of the database and can have a completely different indexing design from the publisher database that is based on just the read-only workload. Since the subscriber database is writeable, that also means that you can use features like Query Store, plan forcing, and automatic plan correction for the read-only workload; all of these are unavailable for a readable secondary replica in an Availability Group because the database is an exact duplicate of the primary (which can be enabled for Query Store and use these features).
Another consideration is the impact of the 14-byte row-version tag generation to support Read Committed Snapshot Isolation (RCSI) for the read-only replica in an Availability Group. This can only be generated on the primary replica database, but is not required on the publisher database for transactional replication. Why does this matter? It can lead to page splits that cause logical fragmentation of the indexes. Moving to a readable secondary may require monitoring index fragmentation and introducing fill factor changes where you wouldn’t have needed them previously. If you have read the incredibly bad information online telling you that you can ignore index fragmentation entirely with SSDs, or that you don’t need to rebuild indexes with an Availability Group, consider this: logical scan fragmentation may not matter much from a performance standpoint on fast storage, but a low page density caused by page splits and fragmentation affects every single replica copy and wastes space on disk and in memory in the buffer pool, and affects plan select due to query costing by the optimizer. With replication, all of this is avoided entirely unless you use RCSI for your publisher workload anyway, but that does not impact the subscribers in anyway since they are only replicated copies of the data modification operations, not the pages themselves.
For point in time data access, I often go low tech and suggest using Log Shipping with Standby and delayed restore to create a warm copy of the database that is restored to a point in time. This can be useful for accounting reporting on the previous days sales transactions, performing ETL into a data mart or data warehouse, and it also provides the benefit of having a second copy of the database at a known point in time for DR purposes. You can also do this type of reporting off of the near real-time data provided by Replication and Availability Groups typically as well. The table below breaks down the features and requirements of each of these three technologies for providing a readable copy of the database.
|Real Time Data||Secondary Delay||Readable Workload Indexes||Row-version Tag Overhead||Query Store Support||Automatic Plan Correction||Partial Data Only|
|Availability Groups (Sync)||Primary Only||Depends on Redo typically within a few seconds||Must be created on Primary||Primary database with Readable Secondary||Primary Only||Primary Only|
|Availability Groups (Async)||Primary Only||Depends on Log Send Queue and Redo typically within a few seconds||Must be created on Primary||Primary database with Readable Secondary||Primary Only||Primary Only|
|Replication||Publisher Only||Depends on log reader and distribution agent typically within a few seconds||May be created on subscriber copy only||Only if RCSI enabled||Publisher and Subscribers||Publisher and Subscribers|
|Log Shipping||Primary Only||Depends on restore job configuration and standby – Point in time only, updates require disconnecting users||Must be created on Primary||Only if RCSI enabled||Primary Only||Primary Only|
Complexity and Manageability
So what about complexity and manageability of these different options?
In the opening paragraph of this post I pointed out how easy Microsoft has made it to build an Availability Group to handle read-only copies of a database or group of databases. Setting up an Availability Group is really quite simple but that doesn’t mean it’s not lacking in complexity. First you need to know what kind of Availability Group are you going to implement. What started as a single feature has morphed over each release into:
- Basic Availability Groups – Standard Edition (single database)
- Enterprise Availability Groups – Enterprise Edition (one or many databases)
- Distributed Availability Groups – Support for up to 17 readable secondaries without daisy-chaining
- Domain Independent Availability Groups – Environments without Active Directory
- Read-scale Availability Groups – No clustering requirement or HA
Add Linux into the mix of options as well as whether or not you need failover clustering as a part of that configuration, and this isn’t as simple as it seems. One of the common areas of failure with Availability Groups that I’ve seen is with choosing the correct quorum configuration for the Windows Failover Cluster configuration, though Microsoft has made this easier in Windows Server 2012 R2 and higher. An error in quorum configuration can result in your High Availability solution failing and taking the databases offline as a result.
Replication tends to get a bad rap when it comes to complexity and manageability. The reality is that the technology just isn’t well understood by most SQL Server professionals, and it is really easy to configure (there’s a great UI in SSMS for it also), and as long as you don’t play with it or modify data on the subscriber databases, it usually just works without a hitch. Change management is something that has to be planned out with Replication, where it wouldn’t have to be with Availability Groups. New tables have to be added to replication manually as an article, and certain operations like dropping and recreating a table or truncating a table are not allowed when the table is being replicated (sorry, no more using SSMS designer to insert a column in the middle of a table). Compared to Availability Groups, just for change management, Replication is much more complex and requires more work to manage over time, but the types of changes that require manual intervention in Replication typically are not occurring at a frequent pace for established applications and databases.
Log Shipping is just stupid simple low-tech and easy to manage. I mean for real how difficult is taking a backup and restoring it on another server? You might be surprised, but the configuration wizard and Agent Jobs automate this so that you typically don’t have to do anything but let it do its thing. Even third party backup tools offer their own flavor of implementation of log shipping as a part of their backup products for SQL Server typically. If something ever gets out of sync for log shipping, it usually means that someone took a manual log backup of the database, or switched recovery models and broke the log chain, and the standby database will have to be reinitialized from a new FULL backup of the primary to resume the restores (or you can manually apply the missing backup if a manual log backup was performed to close the gap).
While Availability Groups have become the Go To technology for providing a readable copy of data for SQL Server, they are not the only feature available to meet this requirement. It is important to understand the requirements and the limitations of any feature that you are considering using, and to also evaluate the other available features that might be a better fit for the needs of the environment. I have used transactional replication for nearly 10 years to solve read-only workload and reporting requirements for SQL Server and continue to use it over Availability Groups where as a technology, it is just a better fit for the requirements. Being able to reduce indexing on the primary writeable copy of the database and only having readable workload non-clustered indexes on the subscribers can be a huge benefit for some workloads. For one client it was the difference between a 800GB publisher database and a 1.7TB primary AG replica which affected their backup/restore times and RTO SLAs. Collect the requirements, evaluate the technologies and limitations, then choose the appropriate way to solve a problem.