As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.
Yesterday I blogged about log shipping performance issues and mentioned a performance problem that can be caused by using availability group readable secondaries, and then realized I hadn’t blogged about the problem, only described it in our Insider newsletter. So here’s a post about it!
Availability groups (AGs) are pretty cool, and one of their most useful features is the ability to read directly from one of the secondary replicas. Before, with database mirroring, the only way to access the mirror database was to create a database snapshot, which gave only a single, static view of the data. Readable secondaries are constantly updated from the primary, so they are far more versatile as a reporting or non-production querying platform.
But I bet you didn’t know that using this feature can cause performance problems on your primary replica.
As with most things in life, you don’t get anything for free. Readable secondaries are really useful, but there is a performance trade-off you need to be aware of. All queries that are executed against a readable secondary are automatically run using snapshot isolation, regardless of the isolation level they ask for. This means they do not require shared locks and so will not block any database changes being replayed from the primary replica (i.e. the constant redo of log records on the secondary replica that have been sent from the primary replica).
To do this requires the use of the versioning system, where (simplistically) pre-change versions of records are copied into the version store in tempdb and queries work out which version of the record is the correct one for them to process, based on the query’s starting time. All records that change get a 14-byte tag added on the end of the record that allows a query to see if this is the correct record, and if not to follow a pointer to the previous version of the record in the version store. This has been the mechanism since snapshot isolation and read-committed snapshot isolation were introduced in SQL Server 2005.
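To make the idea of following a pointer back to an older version concrete, here is a toy sketch in Python. It is not how SQL Server lays out the tag or the version store: the `RowVersion` class, the integer timestamps, and the `read_as_of` function are all hypothetical names invented for illustration, and the visibility rule is the simplistic "based on the query's starting time" one described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One version of a record (illustrative only, not SQL Server's format)."""
    value: str
    changed_at: int                          # hypothetical logical time of the change
    previous: Optional["RowVersion"] = None  # stands in for the version-store pointer


def read_as_of(current: RowVersion, query_start: int) -> Optional[str]:
    """Walk back through older versions until one is visible to the query."""
    version: Optional[RowVersion] = current
    while version is not None and version.changed_at > query_start:
        version = version.previous
    # None means the record did not exist yet when the query started.
    return version.value if version is not None else None


# A record written at time 10, then changed at times 20 and 30.
v1 = RowVersion("original", changed_at=10)
v2 = RowVersion("first update", changed_at=20, previous=v1)
v3 = RowVersion("second update", changed_at=30, previous=v2)

print(read_as_of(v3, query_start=25))  # a query that started between the two updates
print(read_as_of(v3, query_start=35))  # a query that started after both updates
```

The query that started at time 25 skips the newest version and lands on the one from time 20, which is the "follow a pointer to the previous version" step in miniature.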
Now consider this: all AG replicas are exact copies of the primary replica. So how can versioning work on the readable secondary, adding 14-byte tags to some records? That must break the ‘exact copy’ rule, right?
Well, yes, it would… if the primary replica didn’t also change.
When a readable secondary is configured in an AG environment, all changing records on the primary replica start getting empty 14-byte versioning tags added to them. This is so that the 14 bytes of extra space on the record is noted in the transaction log and replayed on the secondary replicas, allowing the readable secondary to make use of the empty 14-byte space to store the versioning tag it needs.
This doesn’t break the ‘exact copy’ rule because the 14 bytes aren’t used for anything to do with recovery; there just has to be 14 bytes there.
So versioning tags start getting added to changing records on the primary (to be clear, this doesn’t turn on versioning on the primary), and table and index records start to get 14 bytes longer. And what happens when records get longer on pages where there isn’t enough space? Page splits in your indexes (and forwarded records in heaps, but I’ll concentrate on indexes here), leading to low page densities (wasted disk space and buffer pool memory), logical fragmentation (poor scan performance), and a bunch of extra, expensive log record generation from the page splits themselves.
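A little arithmetic shows how little headroom a full page has. The sketch below is a simplification under stated assumptions: fixed-size records, a completely full page (as you would get with the default fill factor), and a hypothetical 200-byte record size chosen purely for illustration. The 8192-byte page, 96-byte page header, 2-byte slot array entry, and 14-byte tag are the real sizes; the function names are made up.

```python
PAGE_BYTES = 8192    # size of a data or index page
HEADER_BYTES = 96    # fixed page header
SLOT_BYTES = 2       # slot array entry for each record
TAG_BYTES = 14       # versioning tag added to a changing record

USABLE_BYTES = PAGE_BYTES - HEADER_BYTES


def rows_on_full_page(row_bytes: int) -> int:
    """How many fixed-size records fit on one page."""
    return USABLE_BYTES // (row_bytes + SLOT_BYTES)


def free_bytes_on_full_page(row_bytes: int) -> int:
    """Space left over once the page holds as many records as will fit."""
    return USABLE_BYTES - rows_on_full_page(row_bytes) * (row_bytes + SLOT_BYTES)


def tags_that_fit(row_bytes: int) -> int:
    """How many records can grow by a 14-byte tag before the page overflows."""
    rows = rows_on_full_page(row_bytes)
    return min(rows, free_bytes_on_full_page(row_bytes) // TAG_BYTES)


row_bytes = 200  # hypothetical record size
print("records per full page:", rows_on_full_page(row_bytes))
print("free bytes on that page:", free_bytes_on_full_page(row_bytes))
print("records that can take a tag in place:", tags_that_fit(row_bytes))
```

In this example a full page holds 40 records with only 16 bytes to spare, so one record can take its tag in place and the next changed record on that page no longer fits. Real tables have variable-length records and pages that are rarely exactly full, so treat this as a way to reason about the problem rather than a prediction.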
To counteract this, you’ll need to implement (and/or possibly lower existing) fill factors on your indexes and even potentially start doing index maintenance on indexes that may not have required it previously. Quite an insidious problem that can be hard to figure out unless you know what’s going on under the covers!
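As a starting point for picking a fill factor, you can estimate how much space the tags alone will need. This is a rough sketch, not a recommendation: `suggested_fill_factor` is a hypothetical helper, it assumes fixed-size records, and it only accounts for the 14-byte tag growth, not for inserts or other updates that make records longer. The fraction of records that change between index rebuilds is something you would have to estimate for your own workload.

```python
import math

SLOT_BYTES = 2   # slot array entry for each record
TAG_BYTES = 14   # versioning tag added to a changing record


def suggested_fill_factor(row_bytes: int, fraction_changed: float) -> int:
    """Estimate a fill factor (1-100) that leaves room for versioning tags.

    fraction_changed is the assumed share of records on a page that will be
    modified (and so gain a tag) between index rebuilds.
    """
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    if not 0.0 <= fraction_changed <= 1.0:
        raise ValueError("fraction_changed must be between 0 and 1")

    footprint = row_bytes + SLOT_BYTES              # space a record uses today
    grown = footprint + fraction_changed * TAG_BYTES  # average space after tags
    # Round down so the estimate errs on the side of leaving more free space.
    return max(1, min(100, math.floor(100 * footprint / grown)))


# Hypothetical 200-byte records with different amounts of churn.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, suggested_fill_factor(200, fraction))
```

Smaller records need a lower fill factor for the same churn, because 14 bytes is a bigger share of each record. Whatever number you start with, check it against actual page split and fragmentation behavior before settling on it.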
See the following blog posts for more info:
- Performance issues from wasted buffer pool memory
- How expensive are page splits in terms of transaction log?
- How to choose a good index fillfactor?
- Easy automation of SQL Server database maintenance
This MSDN page has more general information and this whitepaper from Microsoft explains in more depth the various performance impacts from using readable secondaries: AlwaysOn Solution Guide: Offloading Read-Only Workloads to Secondary Replicas.
If you’re implementing readable secondaries in your AG configuration, make sure that you also investigate and implement index fill factors in the database so that the versioning tags that are added under the covers don’t start causing page splits and fragmentation.
7 thoughts on “SQLskills SQL101: Readable secondary performance problems”
I have seen this at every single client I have come across that enabled readable secondaries. It is SERIOUSLY EXACERBATED because ALL OF THEM had the default ZERO FILL FACTOR for EVERY INDEX!! The massive page splits (and umpteen negative consequences) caused by this are STUNNINGLY BAD!!
Great point that’s often missed! And yes, this causes fragmentation, but have you come across a case where this specific issue was the real cause of the customer’s performance problems.
(there was meant to be a question mark at the end there)
Yes – I’ve seen this problem contribute to performance issues (otherwise I wouldn’t have blogged it :-)
Hi, I’m trying to find information about fragmentation on asynchronous read replicas in SQL Server 2016, and this post is about the closest I’ve come to getting somewhere, but we still haven’t found any resources discussing this. We have an AO cluster with 2 synchronous readable secondaries and 1 asynchronous readable secondary. The latter is displaying pretty heavy fragmentation, but we didn’t expect that to be possible on a read replica, as the index maintenance happens on the primary ahead of REDO. Can anyone shed any light on why or how an asynchronous read replica could become fragmented? We’ve never seen it on our synchronous replicas. Thanks in advance.
You can’t have fragmentation on a secondary without it also being on the primary, as they’re physical copies (otherwise log record replay wouldn’t work). Possibly the defrag work hasn’t been replayed on the async secondary yet when you check?