SQLskills SQL101: Why does repair invalidate replication subscriptions?

SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Whenever I’m teaching and recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:

  • Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB‘s access to a portion of the database)
  • Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
  • Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
  • Reevaluate your HA/DR strategy so you don’t have to use repair again in future

One question I’m often asked is about why replication can be broken by running repair.

There are two ways that replication can be affected: repairs on replication metadata tables, and repairs on anything else to do with a subscription.

Repairs on replication metadata tables

This is the simplest case to explain. If the repair operation affects any of the replication metadata tables (i.e. deleted some data from them), the entire replication publication will be in an inconsistent state and you should remove replication completely from the database an reinitialize it. This isn’t limited to a single subscription – all replication should be reconfigured.

Repairs on anything else

Transaction replication captures changes to the publication database by analyzing the transaction log, looking for transactions that change data in any of the publications, and converting those operations into logical operations that can be applied to the subscribers. Merge replication captures changes to the publication database using DML triggers and converting those operations into logical operations that can be applied to the subscribers.

Neither of these mechanisms can capture of operations performed by repair. Repair operations are always direct physical changes to the database structures to fix inconsistencies in the structures (e.g. a database page, table record, or a linkage between two pages), as opposed to physical changes because of queries performing inserts, updates, or deletes on tables.

These repair operations cannot translated into logical operations that can be applied to replication subscribers because there are no logical operations than can be expressed using T-SQL for the equivalent of the direct structural changes that repair is performing. Replication does not preserve the exact physical location of a particular record between the publication and subscription databases, so a direct change to record Y on page X in the publication database would not be able to be replayed on the subscription database (remember, replication ships logical changes, not physical changes). This means that if any table that is part of a replication publication is changed by a repair operation, the replication subscription is no longer valid and must be reinitialized.

As an example, imagine a repair operation is forced to remove a data page from a unique clustered index (essentially deleting some table records), and the subscription is NOT reinitialized. Those records would still exist on the replicated copy of the table. If a subsequent insert operation inserted records with cluster key values corresponding to the records deleted by the repair operation, the Distribution Agent would fail when trying to apply the inserts to the replicated copy – because the repair operation was not applied to the subscription database and a duplicate key violation error will occur when attempting to apply the insert to the replicated table.

A replication subscription must always be reinitialized if any table in the publication is affected by a repair operation, or the replication metadata tables are repaired.

Thanks

SQL Server Magazine: feature article on using database repair

The September SQL Server Magazine articles are now available on the web and include my latest feature article on Using Database Repair for Disaster Recovery.

It includes a detailed walk-through of a disaster scenario where all backups include the corruption – showing you how to run repair and then try to recover some of the damaged data from an older backup.

You can get to the article HERE (no login required).

Enjoy!

PS Note that the posting date on the article says 2010 – the online editors at SQLMag will fix it shortly.

A SQL Server DBA myth a day: (16/30) corruptions and repairs

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

Short and quick today as in the dictionary under ‘busy’ it says ‘See Paul Randal today’ – but I can’t neglect my readers so have to get a blog post in… :-)

Myth #16: variety of myths around corruptions and repairs…

All of them are FALSE

There are a bunch of things I hear over and over around what repair can and cannot do, what can cause corruptions, and whether corruptions can disappear. A bunch of these I’ve already written up in blog posts over the last few years so rather than regurgitate the same stuff, this myth-buster post is some interesting links to keep you happy.

Firstly, around what repair can and cannot do. I wrote a blog post Misconceptions around database repair that covers 13 separate myths and misconceptions – from whether you can run repair separately from DBCC CHECKDB (no!) to whether REPAIR_ALLOW_DATA_LOSS will cause data loss (I’m confused as to why the name is confusing :-).

Secondly, I’ve heard many people complaining the DBCC CHECKDB shows corruptions which then ‘disappear’ when they run DBCC CHECKDB again. There’s a very good reason for this – the database pages that were exhibiting corruptions are no longer part of the allocated set of pages in the database by the time DBCC CHECKDB is run a second time – so they don’t show as corruptions. I explain this in great detail in the blog post Misconceptions around corruptions: can they disappear?.

Lastly, there’s a pervasive myth that interrupting a long-running operation (like shrink, index rebuild,  bulk load) can cause corruption. No. Unless there’s a corruption bug in SQL Server (which happens sometimes, but rarely), nothing you can do from T-SQL can cause corruption. I wrote a detailed blog post on this a couple of years ago – see Search Engine Q&A #26: Myths around causing corruption.

Hope these are useful to you – got some good posts coming up next week in the series!