This blog post is a part of a series on SQL Server Transactional Replication. If you want to find the other posts in the series, check out the main page for the series here.
SQL Server Replication is one of the components of SQL Server that often has a bad reputation for being difficult to manage and prone to errors and problems. However, when configured and managed properly, replication often just works and requires very little effort to maintain. Replication is really easy to use and configure once you understand the basics behind how it works and the various components involved, and it’s been a part of SQL Server for over 20 years with very minimal changes and many other features piggy back on the foundation provided by replication.
There are at least three databases that participate in any replicated topology:
- Publisher – the main copy of the database that reads and writes can be issued against
- Distributor – the central database behind replication where log operations are written from the publisher and read for distribution to subscribers
- Subscriber – the database that receives changes made on the articles configured for replication on the publisher
The terminology for replication is based on a publishing industry terminology for print media like a magazine or newspaper.
- Publisher can produce one or many publications
- Publications contain articles
- Articles represent a specific object being published
- Subscribers subscribe to a publication to receive the changes to all of the articles within that publication
There are a few more advanced features of replication that go outside of these basics, but for a majority of use cases for replication, just understanding these simple terms is all that is necessary to plan out and build a configuration that meets a majority of business requirements.
The primary functionality provided by replication is handled by three specific agents, that run on the different servers to perform specific tasks:
- The Log Reader Agent reads the transaction log records from the publisher databases transaction log that are marked for replication and writes those commands into the distribution database. The great thing about replicated databases is that they do not require FULL recovery model to be able to publish the changes. The changes are marked for replication and until the log reader agent has successfully processed those commands to the distribution database they have to remain in the transaction log.
- The Snapshot Agent generates a snapshot of the articles as an initialization point for replicating the articles to new subscribers. This is accomplished through BCP operations that write the contents of the tables out to files on disk that can then be used to create and load the data on new subscribers to initialize the data for further application of commands.
- The Distribution Agent applies any snapshots required to initialize the subscriber and then executes commands stored in the distribution database to apply changes to the subscriber database to replicate the data changes happening at the publisher
Since replication is based off of reading the transaction log of the publisher, it is an entirely asynchronous process and has minimal impact to transactional throughput of the source system. However, this also means that there is latency between when a change is made, and when that change becomes available on a subscriber, but this is typically only a few seconds end-to-end.