(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)
I was contacted last week by someone who was asking how are faster disks causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a Raid-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to Raid-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.
They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain.
Let me start by saying this is not unusual, and I’ve seen this sequence of events many times with clients and students.
A WRITELOG wait occurs when a log block is being flushed to disk, and the two most common causes of this are that the log block is full (60KB) or a transaction committed. If it’s the case that a workload is comprised of lots of very small transactions, then a reduction in wait time for the log block flush from 18ms down to less than 1ms is very likely to result in a higher throughput. This means more transactions happening and hence more log block flushes when the transactions commit, so *more* WRITELOG waits.
Bottom line: What was confusing to them was that performance tuning is usually about reducing the incidence of waits and/or the individual wait times. This is about the only example I can think of where you do something that increases performance and the wait type you were concerned about becomes more prevalent.