Yesterday on Twitter a few people expressed disgust at a conference slide listing transactional replication as a high-availability (HA) technology. I took exception to that and argued. I think the discussion merits a blog post so here it is. This point of view is controversial and I'm expecting some dissenting comments – bring them on!

I teach high-availability inside Microsoft and at the MCM level and I absolutely always teach that transactional replication, and it's more advanced brother - peer-to-peer replication, have their place as a high-availability technology. The two whitepapers on high availability I've written for Microsoft over the last two years also discuss transactional replication as a high-availability technology (links later).

To start with, let's define a high-availability technology as one which makes data more highly available in the event of a disaster. That definition does not imply any limits on the amount of data loss or downtime experienced – and to do so would be incorrect.

As some background, many people approach HA planning the wrong way – either by trying to shoe-horn a new strategy onto an incumbent and unsuitable technology or just picking the technology they've heard of – which is most commonly failover clustering. Planning an HA strategy means carefully laying out the business requirements, considering technical and non-technical limitations, reaching a compromise between the technologists and the business managers, and THEN starting to evaluate technologies. Once you start evaluating technologies you'll find that no single HA technology that SQL Server provides is a one-size-fits-all solution – they all have pros and cons. The resulting strategy will likely utilize multiple technologies to meet the strategy's needs.

You can read in more depth about my HA planning methodology in the whitepaper I wrote for Microsoft last year: High Availability with SQL Server 2008. This also has a good overview of transactional and peer-to-peer replication.

There are seven main arguments I hear against transactional replication as an HA technology – I'll take them on one at a time.

Argument 1: It's a scale-out technology, not an HA technology

So where's the actual argument here? Sure, transactional replication and it's more advanced brother, peer-to-peer replication, are primarily query scale-out technologies. But both provide extra copies of some data, so both can be used to provide higher availability of that data in the event of a disaster.

Argument 2: There's too much latency involved – how can that be HA?

High availability does not mean zero data loss and zero downtime. High availability means the data is more available in the event of a disaster than if you had no redundant copy. Sure, there is some latency involved with transaction replication but there's also latency involved with log shipping, and with asynchronous database mirroring.

The key is making sure that the availability provided by the technology fits within your desired downtime and data loss SLAs (service level agreements) – commonly known as RTO (recovery time objective) and RPO (recovery point objective), respectively. If you're fine with potentially losing 1/2 hour of data, and the transactional replication latency between the Publisher and Subscriber is only 10 minutes, then transactional replication can meet your data loss requirements.

Argument 3: It's really hard to know what you're missing if the Publisher crashes

Correct. But it's just the same as the other asynchronous technologies – log shipping and asynchronous database mirroring – so this is not a valid argument against transactional replication.

Argument 4: Replication is too complicated to set up

This argument is asinine IMHO. A technology may be complicated to use or troubleshoot, but that does not make it unsuitable – it just means you have to have someone who knows what they're doing. I've seen people mess up configuring log shipping – does that mean it's not an HA technology? No. Failover clustering can be fiendishly difficult to set up at the Windows level – does that mean it's not an HA technology? No. This argument is also invalid.

Argument 5: It only replicates some of the data

Correct. This means it's unsuitable if you'd like to protect an entire database or several databases – but if its restrictions work for you, no problem.

Argument 6: You can't make certain schema changes on a published table

Correct. But that doesn't mean it's not suitable for providing HA – as long as you can live with its restrictions.

Argument 7: There's no automatic failure detection or automatic failover

Correct. But neither does log shipping. This just means you have to provide the mechanisms for this. Transactional replication still means your data is available if the Publisher crashes.

[Edit: Argument 8: There's no easy way to fail back again

Correct - for transactional replication - that can be a PITA. With peer-to-peer replication that problem disappears because the changes that happened while one peer node was down are automatically pushed to it when it comes back online.]

Arguments FOR using transactional replication

So these arguments are really saying that transactional replication has some restrictions. Sure it does. As I said above, no single HA technology is a one-size-fits-all. Now, what about all the good things about transactional replication?

  • It only replicates the data you're interested in. Unlike log shipping or database mirroring, transactional replication only ships changes to the data you're interested in, rather than the entire database. Yes, there are applications where this is good. And it can be especially useful if you have to perform regular index maintenance on large indexes and you don't have the network bandwidth to ship transaction log for these.
  • It works in the SIMPLE recovery model, unlike log shipping or database mirroring, where you MUST be in the FULL recovery model (BULK_LOGGED is also permissible with log shipping). You don't have to perfom log management with log backups just because you're using an HA technology. Yes, there are strategies who don't want to be able to restore the entire database to a point in time.
  • You can have as many Subscribers as you want. Well, to a point. You need to be careful of network bandwidth from the Distributor, the Distribution Agent location (especially if they're all push subscriptions), and some other things. Log shipping also allows as many secondaries as you want but database mirroring can only have one mirror.
  • You can make use of republishing (having a Subscriber that is also a Publisher to down-stream Subscribers), which allows some interesting topologies and balancing of network traffic. No other SQL Server HA technology allows this.
  • The replicated data is available for reading and updating on both sides (and changes can then be published again using updateable subscriptions, or more easily with peer-to-peer replication). No other SQL Server HA technology allows this and this is one of the primary reasons why transactional replication is sometimes the necessary choice.
  • There's no wait time when a failover is necessary – the data is just there. With log shipping, recovery has to complete – which could take a while for any uncommitted transactions to roll back. With database mirroring, the REDO queue on the mirror must be recovered before the mirror database can come online as the principal. 

Summary

It all comes down to making rational choices of technologies based on the HA strategy requirements and limitations. Transactional replication will work for some of you, and not for others – it has its limitations just like the other SQL Server HA technologies. But it should never be dismissed out-of-hand as not being a high availability technology.

Take a look at the whitepaper I wrote for Microsoft this year – Proven SQL Server Architectures for High Availability and Disaster Recovery – where one of the five real-world customer examples (with customer names, case studies, etc) is about using transactional and peer-to-peer replication. That example links to a very comprehensive document the SQL CAT team wrote about the decision process that led to using transactional replication and all the setup and tuning that was done.

[Edit: one more whitepaper I wrote in 2008 that you might be interested in: SQL Server Replication: Providing High Availability using Database Mirroring.]

Thanks for reading!

PS If you post a comment and it doesn't appear it's because I'm moderating comments against spam – not to filter out dissention. Be patient and it will appear!