Transactional Replication Use Cases

This blog post is a part of a series on SQL Server Transactional Replication.  If you want to find the other posts in the series, check out the main page for the series here.

Why Transactional Replication?

Why in the world would you want to use Transactional Replication?  Isn’t that the thing that is impossible to manage, difficult to configure, and always prone to problems? As a consultant, I see a lot of scenarios where every problem becomes a nail because all you have is a hammer. Sometimes a lesser-known technology can solve a problem that the familiar tools can’t, and in my experience Transactional Replication tends to fall into this category. In this post we are going to take a look at some of the more common Transactional Replication use cases, as well as some scenarios where it can be used to solve a business problem in a different way. As with anything, Transactional Replication is just another tool to have in your toolbox.

Offloading Read-only Workloads

Yes, today we have Availability Groups and readable secondary capabilities, and I’ve already written a blog post, Availability Group Readable Secondaries – Just Say No, that outlines the challenges you might have with that approach. If you don’t need the entire database for read-only queries, you can minimize the data set size by publishing only those articles/tables that are needed, keeping the readable copies as small as possible. You can also index those tables specifically for the readable workload and don’t have to maintain excess indexes on the writable copy in the publisher. The best thing about Transactional Replication is that it doesn’t require Enterprise Edition licenses for the subscribers; they can run Standard Edition as long as the feature set you need is available and the hardware limitations of Standard Edition aren’t a problem.
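To make that concrete, here is a minimal sketch of creating a publication containing only the tables the read-only workload needs. The database, publication, and table names are placeholders, and a real setup would also configure the Snapshot Agent and the subscriptions.

    -- Enable the database for transactional publishing (names are placeholders).
    EXEC sp_replicationdboption
        @dbname  = N'SalesDB',
        @optname = N'publish',
        @value   = N'true';

    -- Create a publication that will hold only the articles the readers need.
    EXEC sp_addpublication
        @publication = N'Sales_ReadOnly',
        @status      = N'active';

    -- Add just the tables required by the read-only queries as articles.
    EXEC sp_addarticle
        @publication   = N'Sales_ReadOnly',
        @article       = N'Orders',
        @source_owner  = N'dbo',
        @source_object = N'Orders';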

Reporting and ETL operations

Reporting and ETL operations can be taxing on source systems due to the volume of data being read. By moving these operations off to a transactional subscriber, it is possible to remove that load entirely from the production publisher. In addition, the subscriber side can be customized with auditing/logging triggers that mark records that have changed, allowing targeted incremental ETL operations that only handle the most recently changed data. In some cases Indexed Views can be created on the subscriber database to pre-aggregate the data and reduce the size of the data set that has to be processed, improving reporting times without blowing up the size of the publisher database or impacting its performance.  However, you need to be careful with this and test changes to the subscriber for impacts to the distribution agent’s ability to replicate changes with low latency.  More than once I have worked with a client having replication performance issues where the root of the problem was Indexed Views, Full-Text Indexing, or other additions that were made to the subscriber database without testing their impact.
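As a rough sketch of that subscriber-side customization, a simple trigger on a replicated table can log the keys that changed so the incremental ETL only touches recent rows. The table and column names here are hypothetical.

    -- Hypothetical change-tracking table and trigger, created on the subscriber only.
    CREATE TABLE dbo.Orders_ChangeLog
    (
        OrderID      int          NOT NULL,
        ChangedAtUtc datetime2(3) NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO
    CREATE TRIGGER dbo.trg_Orders_TrackChanges
    ON dbo.Orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Record every key touched, whether by the distribution agent or anything else.
        INSERT INTO dbo.Orders_ChangeLog (OrderID)
        SELECT OrderID FROM inserted
        UNION
        SELECT OrderID FROM deleted;
    END;

The ETL process can then read and clear dbo.Orders_ChangeLog instead of scanning the full table for changes.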

Isolating client data in a multi-tenant design

Multi-tenant databases are fairly common in today’s IT infrastructure, especially with many vendors providing software-as-a-service solutions for clients, or supporting clients that run multiple locations against a single database. When a client needs a copy of all of their data for purposes outside of normal application use, splitting that data off can be a time-consuming process. It is usually accomplished with ETL operations that build the same database schema and then load 100% of that single client’s data into it to provide a backup copy of just that client’s information. If this is a routine operation, moving 100% of the data each time is inefficient and creates impacts to the production database that can be removed with a filtered transactional publication. With a subscriber database for each filtered client, the time to provide a new backup copy of the data is just the time it takes to back up the subscriber database, simplifying the process and making it much faster.
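A filtered publication for a single tenant can be sketched with the @filter_clause parameter of sp_addarticle; the publication, table, and ClientID values below are made up for illustration, and a real configuration (or the New Publication Wizard) also generates the supporting filter objects.

    -- Horizontally filtered article: only one client's rows are replicated.
    EXEC sp_addarticle
        @publication   = N'Client42_Data',
        @article       = N'Invoices',
        @source_owner  = N'dbo',
        @source_object = N'Invoices',
        @filter_clause = N'ClientID = 42';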

Generating Testing Databases

Many large production databases are commonly backed up and restored to test environments for development purposes, but in some cases there are auditing and other tables that are not needed in the development environment and that take a significant amount of storage on disk. I recently worked with a client database that was 10TB in size, where internal auditing tables held nearly 8TB of the data and were immediately truncated on restore to a development server. However, this still required 10TB of storage to be available just to get the backup restored. To solve this, a transactional publication of every table except those audit tables was created to a subscriber that then became the source for development restores. Once restored, the database only took 2TB of space and allowed 30TB of storage to be recovered from the test environment for other uses, since that space had only been provisioned to allow the restores to happen before the truncation.  Now, before you say those tables should be in a separate database, that could be the case, but it’s not how things are currently deployed or implemented for this client, and you can’t have referential integrity across databases for the auditing tables.  They just don’t need the audit data to exist in a testing environment during testing, and this is an alternative means of saving a lot of storage given their current implementation.
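One way to build that kind of publication without hand-typing hundreds of articles is to generate the sp_addarticle calls from the system catalog, skipping the audit tables. This sketch assumes the audit tables follow a naming convention and uses a placeholder publication name.

    -- Generate article definitions for every user table except the audit tables.
    SELECT 'EXEC sp_addarticle @publication = N''Dev_Source'', '
         + '@article = N''' + t.name + ''', '
         + '@source_owner = N''' + s.name + ''', '
         + '@source_object = N''' + t.name + ''';'
    FROM sys.tables AS t
    JOIN sys.schemas AS s
      ON s.schema_id = t.schema_id
    WHERE t.is_ms_shipped = 0
      AND t.name NOT LIKE 'Audit%';   -- assumed naming convention for the audit tables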

Transactional Replication as an HA Strategy?

This one can certainly generate some debate, and I’m not the first person to blog about using Replication as part of an HA strategy with SQL Server; Paul has actually blogged about it here.  However, depending on the business requirements, the design of the application, and the SLAs, replication can absolutely be used as part of an HA strategy for a database.  My favorite deployment of replication for HA purposes is for read-intensive workloads using multiple subscribers behind a network load balancer like an F5 BIG-IP or Citrix NetScaler.  You can even leverage features like DataStream to do content switching and route connections to specific SQL Servers based on the data contained in the TDS stream.  While transactional replication alone is not a complete HA solution, it can absolutely be part of an HA strategy for SQL Server.

Understanding SQL Server Replication

This blog post is a part of a series on SQL Server Transactional Replication.  If you want to find the other posts in the series, check out the main page for the series here.

SQL Server Replication is one of the components of SQL Server that often has a bad reputation for being difficult to manage and prone to errors and problems.  However, when configured and managed properly, replication often just works and requires very little effort to maintain.  Replication is really easy to use and configure once you understand the basics behind how it works and the various components involved.  It has been a part of SQL Server for over 20 years with very minimal changes, and many other features piggyback on the foundation provided by replication.

There are at least three databases that participate in any replicated topology:

  1. Publisher – the main copy of the database that reads and writes can be issued against
  2. Distributor – the central database behind replication where log operations are written from the publisher and read for distribution to subscribers
  3. Subscriber – the database that receives changes made on the articles configured for replication on the publisher

The terminology for replication is based on publishing industry terminology for print media like a magazine or newspaper.

  1. Publisher can produce one or many publications
  2. Publications contain articles
  3. Articles represent a specific object being published
  4. Subscribers subscribe to a publication to receive the changes to all of the articles within that publication

There are a few more advanced features of replication that go outside of these basics, but for the majority of use cases, just understanding these simple terms is all that is necessary to plan out and build a configuration that meets most business requirements.

The primary functionality provided by replication is handled by three specific agents that run on the different servers to perform specific tasks:

  1. The Log Reader Agent reads the transaction log records from the publisher database’s transaction log that are marked for replication and writes those commands into the distribution database.  The great thing about replicated databases is that they do not require the FULL recovery model to be able to publish changes.  The changes are marked for replication, and until the log reader agent has successfully processed those commands into the distribution database they have to remain in the transaction log.
  2. The Snapshot Agent generates a snapshot of the articles as an initialization point for replicating the articles to new subscribers.  This is accomplished through BCP operations that write the contents of the tables out to files on disk that can then be used to create and load the data on new subscribers to initialize the data for further application of commands.
  3. The Distribution Agent applies any snapshots required to initialize the subscriber and then executes commands stored in the distribution database against the subscriber database to replicate the data changes happening at the publisher.

Since replication is based off of reading the transaction log of the publisher, it is an entirely asynchronous process and has minimal impact to transactional throughput of the source system.  However, this also means that there is latency between when a change is made, and when that change becomes available on a subscriber, but this is typically only a few seconds end-to-end.
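If you want to see that latency for yourself, tracer tokens give a quick end-to-end measurement. A minimal sketch, assuming a publication named Sales_ReadOnly and run at the publisher on the published database:

    -- Post a tracer token into the publication's log stream.
    EXEC sys.sp_posttracertoken @publication = N'Sales_ReadOnly';

    -- A little later, list the tokens and then pull the latency detail
    -- (use a tracer id returned by sp_helptracertokens).
    EXEC sys.sp_helptracertokens @publication = N'Sales_ReadOnly';
    EXEC sys.sp_helptracertokenhistory
        @publication = N'Sales_ReadOnly',
        @tracer_id   = 1;   -- placeholder; substitute a real id from the previous call

The history shows the time spent between the publisher and distributor and between the distributor and each subscriber, which makes it easy to see which hop is responsible for any latency.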

Availability Group Readable Secondaries – Just Say No

Since SQL Server 2012, Microsoft has put a lot of effort into marketing AlwaysOn Availability Groups as a feature and solution for both high availability/disaster recovery for SQL Server and scalability for read-only workloads.  Over the last six years, I have talked more clients out of using Availability Groups than I have actually implemented, for a lot of different reasons.  As easy as Microsoft has made it to build an Availability Group, restore and synchronize the databases, join secondary servers, etc., the simple fact is that once they are in production use, you had better know what not to do that could break things, the impact of having a readable secondary, and how to troubleshoot problems when they occur, because they will.

A lot of the implementation questions I get for Availability Groups are generally looking at leveraging a secondary replica for High Availability or Disaster Recovery, and often also for offloading reporting or readable workloads.  Readable secondary copies of the database are really simple at the surface conceptually, but there are a lot of additional considerations I make that could lead you to another technology entirely, like Transactional Replication or Log Shipping with Standby. Let’s look at how these three stack up against each other for different data access requirements.

Real-time Data

First things first: for real-time data access, you only have one option, the primary writeable copy of the database. In replication this is the publisher database, log shipping calls it the log shipped primary, and Availability Groups refer to it as the primary replica.  None of the secondary copies of the data, even in a synchronous Availability Group replica, is ever guaranteed to actually be real-time data.  A synchronous Availability Group only guarantees that the log records for an operation are hardened in the synchronous secondary replica’s transaction log when a transaction commits, allowing them to later be redone on the secondary database.  Depending on the volume of activity that is occurring, and whether the secondary replica is being used for read-only workloads or has sufficient hardware, the redo thread can typically be anywhere from a few hundred milliseconds to a few seconds behind the actual primary replica.  This means that if you insert data into a table, scope the identity value and return it back to the application, then attempt to read that record immediately from the read-only secondary, it could actually not be there (yes, this actually does happen).
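If you want to see how far behind a readable secondary actually is, the replica-state DMV exposes the send and redo queues. A quick sketch, run on the secondary replica:

    -- Log received but not yet redone is what a read-only query cannot see yet.
    SELECT  DB_NAME(database_id)      AS database_name,
            synchronization_state_desc,
            log_send_queue_size,      -- KB of log not yet received from the primary
            redo_queue_size,          -- KB of log received but not yet redone
            redo_rate                 -- KB per second currently being redone
    FROM    sys.dm_hadr_database_replica_states
    WHERE   is_local = 1;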

Near Real-time Data

For near real-time data access, both Replication and an Availability Group readable secondary can meet the requirement. Replication will typically have slightly more latency than an Availability Group readable secondary in either commit mode, by a few seconds, due to the way replication functions.  The log reader agent has to read the transaction log and write replicated changes to the distribution database, and then the distribution agent has to pick those changes up and apply them to the subscriber.  This typically has a latency of 3-7 seconds under normal conditions, where an Availability Group readable secondary could have less latency depending on the volume of activity. The big difference between the two is entirely in the implementation and features supported.

If you need a non-clustered index to support the read-only workload, that index must be created on the primary database of an Availability Group for it to be available on the readable secondary.  This means you have to back up and maintain that index, and the maintenance will affect all servers in the Availability Group as a logged operation requiring redo on the secondary replicas.  This is not the case with transactional replication, where the subscriber database is a writeable copy of the database and can have a completely different indexing design from the publisher database, based solely on the read-only workload.  Since the subscriber database is writeable, you can also use features like Query Store, plan forcing, and automatic plan correction for the read-only workload; all of these are unavailable for a readable secondary replica in an Availability Group because the database is an exact duplicate of the primary (which can be enabled for Query Store and use these features).
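For example, on a subscriber you can do the kind of tuning that is impossible on a readable secondary: a reporting-only index and Query Store enabled directly on the readable copy. The database, table, and column names below are placeholders.

    -- Index that exists only on the subscriber, shaped for the reporting workload.
    CREATE NONCLUSTERED INDEX IX_Orders_Reporting
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerID, TotalDue);

    -- Query Store (and with it plan forcing) can be turned on for the subscriber itself.
    ALTER DATABASE SalesDB_Reporting SET QUERY_STORE = ON;
    ALTER DATABASE SalesDB_Reporting SET QUERY_STORE (OPERATION_MODE = READ_WRITE);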

Another consideration is the impact of the 14-byte row-version tag generation required to support Read Committed Snapshot Isolation (RCSI) on the read-only replica in an Availability Group.  This can only be generated on the primary replica database, but is not required on the publisher database for transactional replication. Why does this matter? It can lead to page splits that cause logical fragmentation of the indexes, so moving to a readable secondary may require monitoring index fragmentation and introducing fill factor changes where you wouldn’t have needed them previously. If you have read the incredibly bad information online telling you that you can ignore index fragmentation entirely with SSDs, or that you don’t need to rebuild indexes with an Availability Group, consider this: logical scan fragmentation may not matter much from a performance standpoint on fast storage, but the low page density caused by page splits and fragmentation affects every single replica copy, wastes space on disk and in memory in the buffer pool, and affects plan selection due to query costing by the optimizer. With replication, all of this is avoided entirely unless you use RCSI for your publisher workload anyway, and even then it does not impact the subscribers in any way, since they receive only the replicated data modification operations, not the pages themselves.
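Page density is easy to measure; here is a sketch of checking both logical fragmentation and page fullness for the current database (SAMPLED mode is needed to populate the page-density column).

    -- Low avg_page_space_used_in_percent is the cost every replica copy pays.
    SELECT  OBJECT_NAME(ips.object_id)        AS table_name,
            i.name                            AS index_name,
            ips.avg_fragmentation_in_percent,
            ips.avg_page_space_used_in_percent
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
    JOIN    sys.indexes AS i
      ON    i.object_id = ips.object_id
     AND    i.index_id  = ips.index_id
    ORDER BY ips.avg_page_space_used_in_percent;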

Point-in-time Data

For point-in-time data access, I often go low tech and suggest using Log Shipping with Standby and a delayed restore to create a warm copy of the database that is restored to a known point in time.  This can be useful for accounting reports on the previous day’s sales transactions or for performing ETL into a data mart or data warehouse, and it also provides the benefit of having a second copy of the database at a known point in time for DR purposes.  You can typically also do this type of reporting off of the near real-time data provided by Replication and Availability Groups. The table below breaks down the features and requirements of each of these technologies for providing a readable copy of the database.

| Technology | Real-time Data | Secondary Delay | Readable Workload Indexes | Row-version Tag Overhead | Query Store Support | Automatic Plan Correction | Partial Data Only |
|---|---|---|---|---|---|---|---|
| Availability Groups (Sync) | Primary only | Depends on redo; typically within a few seconds | Must be created on primary | Primary database with readable secondary | Primary only | Primary only | No |
| Availability Groups (Async) | Primary only | Depends on log send queue and redo; typically within a few seconds | Must be created on primary | Primary database with readable secondary | Primary only | Primary only | No |
| Replication | Publisher only | Depends on log reader and distribution agents; typically within a few seconds | May be created on subscriber copy only | Only if RCSI enabled | Publisher and subscribers | Publisher and subscribers | Yes |
| Log Shipping | Primary only | Depends on restore job configuration and standby; point in time only, updates require disconnecting users | Must be created on primary | Only if RCSI enabled | Primary only | Primary only | No |
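Going back to the Log Shipping with Standby option, the restore side is about as low-tech as it gets. A sketch with placeholder paths and database name:

    -- Restore a log backup WITH STANDBY so the warm copy stays readable
    -- between restores.
    RESTORE LOG SalesDB
        FROM DISK = N'\\backupserver\logs\SalesDB_20230101_0100.trn'
        WITH STANDBY = N'D:\SQLStandby\SalesDB_undo.tuf';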

Complexity and Manageability

So what about complexity and manageability of these different options?

In the opening paragraph of this post I pointed out how easy Microsoft has made it to build an Availability Group to provide read-only copies of a database or group of databases. Setting up an Availability Group is really quite simple, but that doesn’t mean it lacks complexity.  First you need to know what kind of Availability Group you are going to implement.  What started as a single feature has morphed over each release into:

  • Basic Availability Groups – Standard Edition (single database)
  • Enterprise Availability Groups – Enterprise Edition (one or many databases)
  • Distributed Availability Groups – Support for up to 17 readable secondaries without daisy-chaining
  • Domain Independent Availability Groups – Environments without Active Directory
  • Read-scale Availability Groups – No clustering requirement or HA

Add Linux into the mix of options, as well as whether or not you need failover clustering as part of the configuration, and this isn’t as simple as it seems.  One of the common areas of failure with Availability Groups that I’ve seen is choosing the correct quorum configuration for the Windows Server Failover Cluster, though Microsoft has made this easier in Windows Server 2012 R2 and higher. An error in quorum configuration can result in your high availability solution failing and taking the databases offline as a result.

Replication tends to get a bad rap when it comes to complexity and manageability. The reality is that the technology just isn’t well understood by most SQL Server professionals. It is really easy to configure (there’s a great UI in SSMS for it also), and as long as you don’t play with it or modify data on the subscriber databases, it usually just works without a hitch. Change management is something that has to be planned out with Replication, where it wouldn’t have to be with Availability Groups.  New tables have to be added to replication manually as articles, and certain operations like dropping and recreating a table or truncating a table are not allowed while the table is being replicated (sorry, no more using the SSMS designer to insert a column in the middle of a table). Compared to Availability Groups, just for change management, Replication is more complex and requires more work to manage over time, but the types of changes that require manual intervention typically do not occur at a frequent pace for established applications and databases.

Log Shipping is just stupid-simple, low-tech, and easy to manage. I mean, for real, how difficult is taking a backup and restoring it on another server? You might be surprised, but the configuration wizard and Agent jobs automate this so that you typically don’t have to do anything but let it do its thing. Even third-party backup tools typically offer their own flavor of log shipping as part of their SQL Server backup products. If something ever gets out of sync for log shipping, it usually means that someone took a manual log backup of the database, or switched recovery models and broke the log chain; the standby database will then have to be reinitialized from a new FULL backup of the primary to resume the restores (or, if a manual log backup was taken, you can apply the missing backup manually to close the gap).
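When the chain does break, msdb on the primary tells you exactly which backup is missing. A sketch of hunting down the out-of-band log backup (the database name is a placeholder):

    -- first_lsn/last_lsn show where the gap is and which file closes it.
    SELECT  bs.backup_start_date,
            bs.first_lsn,
            bs.last_lsn,
            bmf.physical_device_name
    FROM    msdb.dbo.backupset AS bs
    JOIN    msdb.dbo.backupmediafamily AS bmf
      ON    bmf.media_set_id = bs.media_set_id
    WHERE   bs.database_name = N'SalesDB'
      AND   bs.type = 'L'                   -- log backups only
    ORDER BY bs.backup_start_date DESC;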

Summary

While Availability Groups have become the go-to technology for providing a readable copy of data for SQL Server, they are not the only feature available to meet this requirement.  It is important to understand the requirements and the limitations of any feature that you are considering using, and to also evaluate the other available features that might be a better fit for the needs of the environment.  I have used transactional replication for nearly 10 years to solve read-only workload and reporting requirements for SQL Server, and I continue to use it over Availability Groups where, as a technology, it is just a better fit for the requirements.  Being able to reduce indexing on the primary writeable copy of the database and only have the readable workload’s non-clustered indexes on the subscribers can be a huge benefit for some workloads. For one client it was the difference between an 800GB publisher database and a 1.7TB primary AG replica, which affected their backup/restore times and RTO SLAs.  Collect the requirements, evaluate the technologies and limitations, then choose the appropriate way to solve the problem.