The Curious Case of… what is the wait resource (0:0:0)?

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

On Twitter today under the #sqlhelp hash tag, a former student asked what is the wait resource (0:0:0)?

This can show up in a couple of places:

  • In the output from sys.dm_exec_requests in the wait_resource column when the last_wait_type is PAGELATCH_EX or PAGELATCH_SH
  • In the output from sys.sysprocesses in the waitresource column when the lastwaittype is PAGELATCH_EX or PAGELATCH_SH

(as sys.sysprocesses is basically the same code as the DMV under the covers)

The simple answer is that in some cases of PAGELATCH_SH or PAGELATCH_EX waits in tempdb, the wait resource simply isn’t preserved by the code and so the wait resource for the last wait type is set to (0:0:0) in the output of those queries. It doesn’t mean anything special, and I’ve only ever seen it in tempdb. It’s been that way since SQL Server 2005 and is still there in all the latest versions today.

Simple answer: ignore it and troubleshoot the other waits.

The Curious Case of… faster disks causing more WRITELOG waits

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I was contacted last week by someone who was asking how are faster disks causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a Raid-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to Raid-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.

They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain.

Let me start by saying this is not unusual, and I’ve seen this sequence of events many times with clients and students.

WRITELOG wait occurs when a log block is being flushed to disk, and the two most common causes of this are that the log block is full (60KB) or a transaction committed. If it’s the case that a workload is comprised of lots of very small transactions, then a reduction in wait time for the log block flush from 18ms down to less than 1ms is very likely to result in a higher throughput. This means more transactions happening and hence more log block flushes when the transactions commit, so *more* WRITELOG waits.

Bottom line: What was confusing to them was that performance tuning is usually about reducing the incidence of waits and/or the individual wait times. This is about the only example I can think of where you do something that increases performance and the wait type you were concerned about becomes more prevalent.

PREEMPTIVE_OS_FLUSHFILEBUFFERS waits on Linux

Last week I blogged about Using the Wait Statistics Report in Azure Data Studio, using a Linux VM in Azure as an example host. As that was my first time running wait stats analysis on a Linux host, I found that the PREEMPTIVE_OS_FLUSHFILEBUFFERS waits on Linux will show up as the top wait on SQL Server 2017 CU6 and higher with default options.

The simple reason for this is that on some Linux servers where the underling I/O subsystem cannot guarantee durable writes when the power fails (i.e. it’s using a volatile write cache), a power failure could lead to data loss within SQL Server. To work around this, 2017 CU6 introduced forced-flush behavior for log and data writing, including things like backups and writes to FILESTREAM files. This means that the writes will be done as normal, and then a call is made to the Windows FlushFileBuffers function, which is translated via the SQLPAL/HostExtension wrapper on Linux into an fdatasync call. There’s a very deep explanation of all the file system internals in Bob Dorr’s excellent post SQL Server On Linux: Forced Unit Access (Fua) Internals.

If you know your I/O subsystem is resilient, you can disable this behavior, as described in KB 4131496.

However, as this will be by far the highest wait on many Linux systems, and as it’s benign, I’ve added it to the list of waits I ignore and updated the Server Reports extension for Azure Data Studio to filter it out as well (this is now live as v1.4).

Enjoy!

PS I also just added the VDI_CLIENT_OTHER wait to the ignore list as well (common from Availability Group seeding), and submitted GitHub changes to Server Reports yesterday, which will be v1.5.