Why could restoring a log-shipping log backup be slow?

A short post to start the day (in India) and then at lunchtime I'll do how it works: FILESTREAM garbage collection.

This is a question that came up recently on an MVP discussion list: why does a 200MB log backup take 40 minutes to restore on my log-shipping secondary?

First answer I thought of was that the I/O subsystem on the log-shipping secondary was very slow, or broken in some way, so I/Os were taking ages to complete. Nope – I/O subsystem was performing well.

Next thing I suggested was the the log-shipping secondary was restoring the log backups using WITH STANDBY and there was something like an index rebuild that was being rolled back. As an aside, when you restore log backups on the log-shipping secondary, you have a choice how they recovery: either WITH NORECOVERY or WITH STANDBY. The first option doesn't allow any access to the database, as the database is still "in recovery". The second option runs the REDO part of recovery, then runs the UNDO part of recovery, but saves the details of what it had to do for UNDO into a file (who's name and location you specify). It then allows read-only access to the database, for queries/reporting/whatever. Obviously if there's a lot of transaction log that has to be undone (i.e. rolling back transactions that weren't committed at the time the log backup completed), then this could take some time. But 40 minutes? No. That wasn't it.

I was partly right on my second guess. The *previous* log backup that was restored WITH STANDBY contained a long-running index operation, and so the undo file that the restore created was *huge*. The next log backup that's restored after a RESTORE … WITH STANDBY, must first undo everything in the undo file (i.e. put the database back to the exact state it was in as if the WITH STANDBY part of the restore never happened) before it can restore more transaction log. In this case, no-one had noticed that the previous log restore *also* took a lot of time and created the huge undo file. It just looked like the 200MB log backup was causing the problem.

My advice: in any kind of there's-suddenly-a-corruption-or-performance-problem situation, don't just look at what's happening now. Always look at what happened leading up to the problem, as the answer is often there.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.