The Curious Case of… faster disks causing more WRITELOG waits

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or was asked in a random question from the community.)

I was contacted last week by someone who was asking how faster disks could be causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a RAID-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to a RAID-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.

They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain.

Let me start by saying this is not unusual, and I’ve seen this sequence of events many times with clients and students.

A WRITELOG wait occurs while a log block is being flushed to disk, and the two most common causes of a flush are that the log block is full (60KB) or a transaction committed. If a workload comprises lots of very small transactions, then a reduction in the wait time for each log block flush from 18ms down to less than 1ms is very likely to result in higher throughput. That means more transactions happening, and hence more log block flushes when those transactions commit, so *more* WRITELOG waits.
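To see the effect, here’s a minimal sketch (using the sys.dm_os_wait_stats DMV) of how you could check the number and average duration of WRITELOG waits before and after a change like this:

  -- Count and average duration of WRITELOG waits since the server
  -- last restarted (or the wait stats were last cleared)
  SELECT
      [wait_type],
      [waiting_tasks_count],
      [wait_time_ms],
      1.0 * [wait_time_ms] / NULLIF ([waiting_tasks_count], 0) AS [avg_wait_ms]
  FROM sys.dm_os_wait_stats
  WHERE [wait_type] = N'WRITELOG';

In the scenario above, you’d expect [avg_wait_ms] to drop from around 18 to under 1 after the move to SSDs, while [waiting_tasks_count] climbs because of the higher transaction throughput.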

Bottom line: what confused them is that performance tuning is usually about reducing the incidence of waits and/or the individual wait times. This is about the only example I can think of where you do something that increases performance and the wait type you were concerned about becomes more prevalent.

Lazy log truncation, or why VLFs might stay at status 2 after log clearing

Earlier this year I was sent an interesting question from someone asking why they were seeing lots of VLFs in the log with status = 2 (which means ‘active’) after clearing (also known as ‘truncating’) the log, even though log_reuse_wait_desc showed NOTHING.
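As a quick way to reproduce what they were seeing, here’s a sketch (using a hypothetical database name, and sys.dm_db_log_info, which is available from SQL Server 2016 SP2 onward) of checking log_reuse_wait_desc and the VLF statuses together:

  -- Is anything preventing log clearing? (NOTHING means no)
  SELECT [name], [log_reuse_wait_desc]
  FROM sys.databases
  WHERE [name] = N'MyDatabase';  -- hypothetical name
  GO

  -- How many VLFs are in each status? (0 = inactive, 2 = active)
  SELECT [vlf_status], COUNT (*) AS [vlf_count]
  FROM sys.dm_db_log_info (DB_ID (N'MyDatabase'))
  GROUP BY [vlf_status];
  GO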

I did some digging around and all I could find was an old blog post from 2013 that shows the behavior and mentions that this happens with mirroring and Availability Groups. I hadn’t heard of this behavior before but I guessed at the reason, and confirmed with the SQL Server team.

When a database is about to be added as an AG secondary, the maximum LSN (the Log Sequence Number of the most recent log record) present in the restored copy of the database that will become the secondary must be part of the ‘active’ log on the AG primary (i.e. that LSN must be in a VLF on the primary that has status = 2). If that’s not the case, you need to restore another log backup on what will be the new secondary, and try the AG-joining process again, repeating until it works (a sketch of this loop is below). You can see how, for a very busy system generating lots of log records and with frequent log backups (which clear the log on the primary), catching the secondary up enough to allow it to join the AG might be difficult, or might necessitate temporarily stopping log backups on the primary (possibly opening up a window for increased data loss in a disaster).
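For illustration only (with hypothetical database, path, and AG names), each iteration of that loop on the new secondary looks something like this:

  -- Restore the next log backup, keeping the database restoring
  RESTORE LOG [MyDatabase]
  FROM DISK = N'C:\Backups\MyDatabase_log.trn'
  WITH NORECOVERY;
  GO

  -- Try to join the availability group again; if the database's
  -- maximum LSN is no longer in the 'active' log on the primary,
  -- this fails and another log backup has to be restored
  ALTER DATABASE [MyDatabase] SET HADR AVAILABILITY GROUP = [MyAG];
  GO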

To make this whole process easier, when log clearing occurs on a database that is an AG primary, the VLFs don’t go to status = 0; they remain ‘active’ with status = 2. So how does this help? Well, the fact that lots more VLFs are ‘active’ on the AG primary means it’s more likely that the maximum LSN of a new secondary is still part of the ‘active’ log on the primary, and the AG-joining succeeds without having to repeat the restore-then-retry-the-join loop over and over.
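You can see this for yourself with a sketch like the following (assuming a database that’s an AG primary, and a hypothetical backup path): take a log backup and then count the VLF statuses again; on a non-AG database many VLFs would flip to status 0, but on the AG primary they stay at status 2.

  -- Take a log backup, which clears the log...
  BACKUP LOG [MyDatabase]
  TO DISK = N'C:\Backups\MyDatabase_log.trn';
  GO

  -- ...but on an AG primary the VLFs remain 'active' (status = 2)
  SELECT [vlf_status], COUNT (*) AS [vlf_count]
  FROM sys.dm_db_log_info (DB_ID (N'MyDatabase'))
  GROUP BY [vlf_status];
  GO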

(Note: the log manager knows that these VLFs are really ‘fake active’ and can reuse them as if they were ‘inactive’ if the log wraps around (see this post for an explanation) so there’s no interruption to regular operations on the AG primary.)

It’s a clever little mechanism that someone thought of to make a DBA’s life a bit easier and AGs less problematic to set up.

And now you know – log clearing won’t *always* set VLF statuses to zero.

New VLF status value

Since I started working on the SQL Server team (just after 7.0 shipped) there have only been two VLF status codes in use (I’ll show how to view them right after this list):

  • 0 = the VLF is not active (i.e. it can be (re)activated and overwritten)
  • (1 = not used and no-one seems to remember what it used to mean)
  • 2 = the VLF is active because at least one log record in it is ‘required’ by SQL Server for some reason (e.g. hasn’t been backed up by a log backup or scanned by replication)
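The simplest way to look at the per-VLF status values is the long-standing (undocumented, so use at your own risk) DBCC LOGINFO command; the Status column holds the codes above:

  -- Per-VLF information for the current database; the Status
  -- column holds the VLF status codes described above
  DBCC LOGINFO;
  GO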

A few weeks ago I learned about a new VLF status code that was added back in SQL Server 2012 but didn’t come to light until recently (at least I’d never encountered it in the wild). I went back and forth with a friend from Microsoft (Sean Gallardy, a PFE and MCM down in Tampa) who was able to dig around in the code to figure out when it’s used.

It can show up only on an Availability Group secondary replica (not on the primary, and it isn’t used in Database Mirroring):

  • 4 = the VLF exists on the primary replica but doesn’t really exist yet on this secondary replica

The main case where this status can happen is when a log file growth (or creation of an extra log file) has occurred on the primary replica but it hasn’t yet been replayed on the secondary replica. More log records are generated on the primary, written to a newly-created VLF, and sent across to the secondary replica to be hardened (written to the replica’s log file). If the secondary hasn’t yet replayed the log growth, the VLFs that should contain the just-received log records don’t exist yet, so they’re kind of temporarily created with a status of 4 so that the log records from the primary can be hardened successfully on the secondary. Eventually the log growth is replayed on the secondary and the temporary VLFs are fixed up correctly and change to a status of 2.
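If you have a readable secondary, here’s a sketch (again with a hypothetical database name) of how you might catch these transient VLFs, although I’d expect them to be rare and short-lived:

  -- Run on a readable AG secondary: look for VLFs that don't
  -- really exist yet on this replica (status = 4)
  SELECT [vlf_sequence_number], [vlf_begin_offset], [vlf_status]
  FROM sys.dm_db_log_info (DB_ID (N'MyDatabase'))
  WHERE [vlf_status] = 4;
  GO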

It makes perfect sense that a scheme like this exists, but I’d never really thought about this case or experimented to see what happens.

Anyway, now you know what status 4 means if you ever see it (and thanks, Sean!)