Availability Group Readable Secondaries – Just Say No

Since SQL Server 2012, Microsoft has put a lot of efforts into marketing AlwaysOn Availability Groups as a feature and solution for both high availability/disaster recovery for SQL Server, and scalability for read-only workloads.  Over the last six years, I have talked more clients out of using Availability Groups than I have actually implemented for SQL Server for a lot of different reasons.  As easy as Microsoft has made it to build an Availability Group, restore and synchronize the databases, join secondary servers, etc. the simple fact is that once they are in production usage, you had better know what not to do that could break things, the impact having a readable secondary, and how to troubleshoot problems when they occur, because they will.

A lot of the implementation questions I get for Availability Groups are generally looking at leveraging a secondary replica for High Availability or Disaster Recovery and often times also for offloading reporting or readable workloads.  Readable Secondary copies of the database are really simple at the surface conceptually, but there are a lot of additional considerations I make that could lead you to another technology entirely like Transactional Replication or Log Shipping with Standby. Let’s look at how these three stack up against each other for different data access requirements.

Real-time Data

First things first, for real-time data access, you only have one option, the primary writeable copy of the database. In replication this is the publisher database, log shipping calls it the log shipped primary, and Availability Groups refers to it as the primary replica.  None of the secondary copies of the data, even in a synchronous Availability Group replica, is ever guaranteed to actually be real-time data.  A synchronous Availability Group only guarantees that the log records for an operation are hardened in the synchronous secondary replicas transaction log allowing them to later be redone on the secondary database when a transaction commits.  Depending on the volume of activity that is occurring, and whether the secondary replica is being used for read-only workloads or has sufficient hardware, the redo thread can typically be anywhere from a few hundred milliseconds to a few seconds behind the actual primary replica.  This means that if you insert data into a table, scope the identity value and return it back to the application, then attempt to read that record immediately from the read-only secondary, it could actually not be there (Yes, this actually does happen).

Near Real-time Data

For near real time data access, both Replication and the Availability Group can meet the requirement for near real-time data. Replication typically will have slightly more latency than an Availability Group readable secondary in either commit mode by a few seconds due to the way replication functions.  The log reader agent has to read the transaction log, write replication changes to the distribution database, and then the distribution agent has to pickup those changes up and apply them to the subscriber.  This typically has a latency of 3-7 seconds under normal conditions, where an Availability Group readable secondary could have less latency depending on the volume of activity. The big difference between the two is entirely in the implementation and features supported.

If you need a non-clustered index to support the read-only workload, that index must be created on the primary database of an Availability Group for it to be available on the readable secondary.  This means you have to backup and maintain that index and the maintenance will affect all servers in the Availability Group as a logged operation requiring redo on secondary replicas.  This is not the case with transactional replication, where the subscriber database is a writeable copy of the database and can have a completely different indexing design from the publisher database that is based on just the read-only workload.  Since the subscriber database is writeable, that also means that you can use features like Query Store, plan forcing, and automatic plan correction for the read-only workload; all of these are unavailable for a readable secondary replica in an Availability Group because the database is an exact duplicate of the primary (which can be enabled for Query Store and use these features).

Another consideration is the impact of the 14-byte row-version tag generation to support Read Committed Snapshot Isolation (RCSI) for the read-only replica in an Availability Group.  This can only be generated on the primary replica database, but is not required on the publisher database for transactional replication. Why does this matter? It can lead to page splits that cause logical fragmentation of the indexes. Moving to a readable secondary may require monitoring index fragmentation and introducing fill factor changes where you wouldn’t have needed them previously. If you have read the incredibly bad information online telling you that you can ignore index fragmentation entirely with SSDs, or that you don’t need to rebuild indexes with an Availability Group, consider this: logical scan fragmentation may not matter much from a performance standpoint on fast storage, but a low page density caused by page splits and fragmentation affects every single replica copy and wastes space on disk and in memory in the buffer pool, and affects plan select due to query costing by the optimizer. With replication, all of this is avoided entirely unless you use RCSI for your publisher workload anyway, but that does not impact the subscribers in anyway since they are only replicated copies of the data modification operations, not the pages themselves.

Point-in-time Data

For point in time data access, I often go low tech and suggest using Log Shipping with Standby and delayed restore to create a warm copy of the database that is restored to a point in time.  This can be useful for accounting reporting on the previous days sales transactions, performing ETL into a data mart or data warehouse, and it also provides the benefit of having a second copy of the database at a known point in time for DR purposes.  You can also do this type of reporting off of the near real-time data provided by Replication and Availability Groups typically as well. The table below breaks down the features and requirements of each of these three technologies for providing a readable copy of the database.

Real Time DataSecondary DelayReadable Workload IndexesRow-version Tag OverheadQuery Store SupportAutomatic Plan CorrectionPartial Data Only
Availability Groups (Sync)Primary OnlyDepends on Redo typically within a few secondsMust be created on PrimaryPrimary database with Readable SecondaryPrimary OnlyPrimary Only


Availability Groups (Async)Primary OnlyDepends on Log Send Queue and Redo typically within a few secondsMust be created on PrimaryPrimary database with Readable SecondaryPrimary OnlyPrimary Only


ReplicationPublisher OnlyDepends on log reader and distribution agent typically within a few secondsMay be created on subscriber copy onlyOnly if RCSI enabledPublisher and SubscribersPublisher and Subscribers


Log ShippingPrimary OnlyDepends on restore job configuration and standby – Point in time only, updates require disconnecting usersMust be created on PrimaryOnly if RCSI enabledPrimary OnlyPrimary Only


Complexity and Manageability

So what about complexity and manageability of these different options?

In the opening paragraph of this post I pointed out how easy Microsoft has made it to build an Availability Group to handle read-only copies of a database or group of databases. Setting up an Availability Group is really quite simple but that doesn’t mean it’s not lacking in complexity.  First you need to know what kind of Availability Group are you going to implement.  What started as a single feature has morphed over each release into:

  • Basic Availability Groups – Standard Edition (single database)
  • Enterprise Availability Groups – Enterprise Edition (one or many databases)
  • Distributed Availability Groups – Support for up to 17 readable secondaries without daisy-chaining
  • Domain Independent Availability Groups – Environments without Active Directory
  • Read-scale Availability Groups – No clustering requirement or HA

Add Linux into the mix of options as well as whether or not you need failover clustering as a part of that configuration, and this isn’t as simple as it seems.  One of the common areas of failure with Availability Groups that I’ve seen is with choosing the correct quorum configuration for the Windows Failover Cluster configuration, though Microsoft has made this easier in Windows Server 2012 R2 and higher. An error in quorum configuration can result in your High Availability solution failing and taking the databases offline as a result.

Replication tends to get a bad rap when it comes to complexity and manageability. The reality is that the technology just isn’t well understood by most SQL Server professionals, and it is really easy to configure (there’s a great UI in SSMS for it also), and as long as you don’t play with it or modify data on the subscriber databases, it usually just works without a hitch. Change management is something that has to be planned out with Replication, where it wouldn’t have to be with Availability Groups.  New tables have to be added to replication manually as an article, and certain operations like dropping and recreating a table or truncating a table are not allowed when the table is being replicated (sorry, no more using SSMS designer to insert a column in the middle of a table). Compared to Availability Groups, just for change management, Replication is much more complex and requires more work to manage over time, but the types of changes that require manual intervention in Replication typically are not occurring at a frequent pace for established applications and databases.

Log Shipping is just stupid simple low-tech and easy to manage. I mean for real how difficult is taking a backup and restoring it on another server? You might be surprised, but the configuration wizard and Agent Jobs automate this so that you typically don’t have to do anything but let it do its thing. Even third party backup tools offer their own flavor of implementation of log shipping as a part of their backup products for SQL Server typically. If something ever gets out of sync for log shipping, it usually means that someone took a manual log backup of the database, or switched recovery models and broke the log chain, and the standby database will have to be reinitialized from a new FULL backup of the primary to resume the restores (or you can manually apply the missing backup if a manual log backup was performed to close the gap).


While Availability Groups have become the Go To technology for providing a readable copy of data for SQL Server, they are not the only feature available to meet this requirement.  It is important to understand the requirements and the limitations of any feature that you are considering using, and to also evaluate the other available features that might be a better fit for the needs of the environment.  I have used transactional replication for nearly 10 years to solve read-only workload and reporting requirements for SQL Server and continue to use it over Availability Groups where as a technology, it is just a better fit for the requirements.  Being able to reduce indexing on the primary writeable copy of the database and only having readable workload non-clustered indexes on the subscribers can be a huge benefit for some workloads. For one client it was the difference between a 800GB publisher database and a 1.7TB primary AG replica which affected their backup/restore times and RTO SLAs.  Collect the requirements, evaluate the technologies and limitations, then choose the appropriate way to solve a problem.

Where Can I Download a Specific SQL Server Cumulative Update?

Microsoft has made a lot of changes to their servicing model for SQL Server over the years, specifically moving more towards Cumulative Updates and away from Service Packs as a means to deliver updates and fixes to SQL Server. Today I needed to replace a server in a Failover Clustered Instance (FCI) of SQL Server 2014 that was on Service Pack 1 + CU7. So I went to the KB article for CU7 (https://support.microsoft.com/en-us/kb/3162659) and clicked what I thought was the download link to CU7 for SP1:

Download Latest CU Link

However, to my dismay that link took me to the download link for SQL Server 2014 SP1 CU8 and not to CU7 as I was expecting. So I tried other SP1 CU links and they all redirect to the CU8 download page. Slightly irritated I decided to send a Skype IM to Glenn (@GlennAlanBerry|Blog) what was going on because he keeps up with every update Microsoft publishes for SQL Server. Glenn told me that Microsoft’s recommendation and preference is for people to install the latest CU so the KB articles now link to the latest CU download page. That doesn’t really help me with adding a failed FCI node back into a FCI that is on CU7, so Glenn offered to share the CU7 file with me by Dropbox.

It turns out that I didn’t need Glenn to share the file with me, I needed to read the information on the KB article closer and pay attention.

Download CU7 Link

The Note at the bottom of the information specifically says that after future CUs are released this specific CU can be downloaded from the Microsoft Windows Update Catalog.  So if you need a specific CU for SQL Server, either for testing purposes, or to fix a failed FCI node, head over to the Microsoft Windows Update Catalog, and you can easily find it there.

One additional note:

You have to use Internet Explorer to access the Microsoft Windows Update Catalog, the site won’t allow you to access it using Google Chrome or other non-IE browsers it seems.

New Performance Tuning using Wait Statistics Whitepaper

Last week a new whitepaper titled SQL Server Performance Tuning Using Wait Statistics: A Beginners Guide was made available through SimpleTalk. This paper is a combination of information from Chapter 1 of my Troubleshooting SQL Server: A Guide for the Accidental DBA book (Amazon|eBook download), also published by SimpleTalk, and a series of articles created by Erin on baseline data collection techniques.

Understanding the time a session spends waiting inside of SQL Server is an incredibly important part of performance tuning and diagnosing problems. As consultants we use the information contained in the wait statistics during health checks of servers for bottleneck identification, and as a part of root cause analysis while troubleshooting problems. This whitepaper will provide you an introduction into the world of performance tuning using wait statistics in SQL Server and will explain the common wait types and what they do and do not mean in the context of performance tuning and troubleshooting. You will also find code for collecting and aggregating the information available in SQL Server DMVs about the waits that have occurred in the past and what sessions are currently waiting for in the paper to simplify getting started with understanding the activity on you SQL Servers.

Table of Contents

  1. Introduction
  2. The SQLOS scheduler and thread scheduling
  3. Using wait statistics for performance tuning
  4. Investigating active-but-blocked requests using sys.dm_os_waiting_tasks
  5. Analyzing historical wait statistics using sys.dm_os_wait_stats
  6. Common wait types
  7. Wait Statistics baselines
  8. Summary
  9. Further reading
  10. About the authors
  11. About the technical editor

A big thanks goes to Tony Davis at Redgate for editing all of the content and merging the two sets of information into a single concise introduction to the world of Performance Tuning SQL Server using Wait Statistics. The whitepaper can be downloaded from the SimpleTalk website, or from the help section of our site through the link below.