A SQL Server DBA myth a day: (1/30) in-flight transactions continue after a failover

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

I’ve got a session coming up at SQL Connections in a couple of weeks called DBA Mythbusters. And I’ve got a whole stack of DBA myths and misconceptions around SQL Server to choose from (I came up with 69 just in an hour on the plane last weekend) and I won’t be able to cover them all in the session, so I thought it would be cool to do a Debunk-A-Myth-A-Day blog post series in April. And so here we are.

Every day (hopefully!) I’ll pick a myth and lay it to rest. They won’t be long posts and they may be links to previous posts, but hopefully they’ll be useful.

Myth #1: After a failover, any in-flight transactions are continued.

FALSE

Any time a failover occurs, some form of crash recovery has to occur. If the transaction hadn’t committed on the server that the connection was to when the server/database went down, there’s no way for SQL Server to automatically recreate all the transaction context and resurrect the in-flight transaction on the failover server – whether the failover used clustering, mirroring, log-shipping, or SAN replication.

For failover clustering, when a failover occurs, the SQL instance is restarted on another cluster node. All the database go through crash recovery – which will roll-back all uncommitted transactions. (See my TechNet Magazine article from February 2009 for an explanation of crash recovery and how it works: Understanding Logging and Recovery in SQL Server.)

For database mirroring, the transaction log on the mirror database is constantly being used to perform redo operations (of the log records coming from the principal). When the mirror switches to being the principal, the transaction log switches into crash-recovery mode and recovers the database as if a crash had occured – and then lets connections into the database.

For log shipping, transaction log backups are being replayed on the log shipping secondary database(s) periodically. When a failure on the primary occurs, the DBA brings the log shipping secondary online by completing the restore sequence (either bringing the database immediately online, or replaying all possible transaction log backups to bring the database as up-to-date as possible). The final part of any restore sequence is to run the undo portion of crash recovery – rolling back any uncommitted transactions.

With SAN replication, I/Os to the local SAN are shipped to the remote SAN to be replayed. When a failure occurs, the system connected to the remote SAN comes online and the databases go through crash recovery, in just the same way as for failover clustering (this is an overly simplistic explanation – but you get the idea).

The *only* technology that allows unbroken connections to a database when a failure occurs, is using virtualization with a live migration feature, where the VM comes up and the connections don’t know they’re talking to a different physical host server of the VM.

But whatever mechanism you’re using – if a *connection* is broken, the in-flight transaction is lost. So your application needs to be coded to gracefully cope with broken connections and some form of transaction retry.

1 down – 29 to go!

13 thoughts on “A SQL Server DBA myth a day: (1/30) in-flight transactions continue after a failover

  1. The entire post all I could think to my self I hope he talks about how developers should develop to handle broken connection so I can send this article to my developers and say, "See someone way smarter than myself said it too". Thanks for the ammo.

  2. Hi Paul,

    I know SQL Server itself can’t preserve transaction state, but it’s not particularly difficult to catch an exception client side, wait until the SQL Server is available again, then replay the transaction. Isn’t this why MSDTC gets enrolled in the SQL Server resource group? i.e. To support the clients that do implement this persist and retry pattern?

    Cheers

    JH

  3. Sure, yoy can use a virtualized SQLServer in production and rely only in Vmotion….But what happen when you need to update the OS or the SQLServer instance? Do you offer to your clients a cup of tea, a big one :-)

  4. If a connection is reset by peer or the connection socket is closed by the server (in our case a random Winsock reset which seems to be a known issue) how exactly does one recover? I was hoping there was a socket level solution but apparently not. This is not a typical OLTP where get pool connection, one row is updated, commit, put connection back to pool. These are processes that accumulate 1000’s of transactions easily in seconds but randomnly although infrequently we can get socket closed on the connection. Do you know of any way to recover those uncommitted CUD operations on a closed socket connection?

    Thanks

  5. Thanks Paul, is there anyway to failover an ag manually in order to perform maintenance on one of the nodes, but wait until uncommitted transactions have completed?

  6. Thank you. Paul – for all this

    Will in-flight transactions continue after a AAG failover ?
    Same DB backup jobs are scheduled on both Primary & Secondary ( where backup preference is Primary only )
    DB backup job is running on the Primary Instance, and after few minutes AG failover to Secondary will that Backup job continue to the new Primary and it is not then you to handle this ….

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.