(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
Following on from yesterday’s myth about database mirroring failure detection being instantaneous…
(All this week we’re at SQL Connections – follow what’s happening using the #sqlskills Twitter hash tag)
Myth #11: Database mirroring failover is instantaneous
A mirroring failover can occur automatically or can be manually initiated.
An automatic failover is performed by the mirror server (yes, the witness does NOT make the decision) if the mirror and witness agree they cannot contact the principal server (this process is called forming quorum) and the mirroring partnership state is SYNCHRONIZED (i.e. there were no unsent log records on the principal).
A manual failover is performed by you – either because a witness server wasn’t present (and so the mirror cannot ever form the quorum required for an automatic failover) or because the mirroring partnership state was not SYNCHRONIZED at the time the principal server died.
Once the failure is initiated, the mirror database will not come online as the principal until the REDO queue has been processed. The REDO queue is the set of log records that have been received on the mirror from the principal, but have not yet been replayed in the mirror database. Even when using synchronous database mirroring, a transaction can commit on the principal once its log records are written to the mirror’s log drive – it doesn’t have to wait for the log records to be actually replayed in the mirror database. During a failover, the roll-forward of committed transactions must complete before the mirror database comes online, but rolling-back uncommitted transactions happens after the database comes online (using the same mechanism as fast recovery in Enterprise Edition – see my blog post Lock logging and fast recovery).
The roll-forward is single threaded on Standard Edition, and on Enterprise Edition where the server has less than 5 processor cores. On Enterprise Edition where the server has more than 5 processor cores, there’s a redo thread for every 4 processor cores. So you can see how the failover time is really entirely dependent on how much log there is to process in the REDO queue, the processing power of the mirror server, and also what other workloads may be running on the mirror server that are competing for resources.
Because of the fact that mirroring is thought of as always performing a fast failover, many people do not monitor the REDO queue on the mirror. It’s very important to do this as the amount of REDO queue correlates to the amount of downtime you’ll experience during a mirroring failover.
For a bit more detail on all of this, see the Books Online entry Estimating the Interruption of Service During Role Switching.