PREEMPTIVE_OS_FLUSHFILEBUFFERS waits on Linux

Last week I blogged about Using the Wait Statistics Report in Azure Data Studio, using a Linux VM in Azure as an example host. As that was my first time running wait stats analysis on a Linux host, I found that the PREEMPTIVE_OS_FLUSHFILEBUFFERS waits on Linux will show up as the top wait on SQL Server 2017 CU6 and higher with default options.

The simple reason for this is that on some Linux servers where the underling I/O subsystem cannot guarantee durable writes when the power fails (i.e. it’s using a volatile write cache), a power failure could lead to data loss within SQL Server. To work around this, 2017 CU6 introduced forced-flush behavior for log and data writing, including things like backups and writes to FILESTREAM files. This means that the writes will be done as normal, and then a call is made to the Windows FlushFileBuffers function, which is translated via the SQLPAL/HostExtension wrapper on Linux into an fdatasync call. There’s a very deep explanation of all the file system internals in Bob Dorr’s excellent post SQL Server On Linux: Forced Unit Access (Fua) Internals.

If you know your I/O subsystem is resilient, you can disable this behavior, as described in KB 4131496.

However, as this will be by far the highest wait on many Linux systems, and as it’s benign, I’ve added it to the list of waits I ignore and updated the Server Reports extension for Azure Data Studio to filter it out as well (this is now live as v1.4).

Enjoy!

PS I also just added the VDI_CLIENT_OTHER wait to the ignore list as well (common from Availability Group seeding), and submitted GitHub changes to Server Reports yesterday, which will be v1.5.

Using the Wait Statistics Report in Azure Data Studio

Azure Data Studio, which used to be called SQL Operations Studio, has a bunch of extensions available, including one called Server Reports from Microsoft. Last year they took my wait stats query (from here) and made it into a report as part of Server Reports extension. In this quick post I’ll show you how to install that extension and look at the report.

I’ve been using Azure Data Studio a lot as part of working with SQL Server 2017 on Linux in Azure and I like it a lot. It’s obviously not as comprehensive as SSMS, but for simple stuff it does the job.

You can see the pretty big list of available extensions by hitting the Extensions icon:

In the list that appears, find the Server Reports extension and click the Install button:

Once an extension is installed, you need to activate it. Without reading any documentation, the first time I used Azure Data Studio I couldn’t figure out why the extension wasn’t there once I’d installed it. Click the Reload button to active the extension:

The extension will now show up in the list of enabled extensions:

If the Server Dashboard screen isn’t showing, right-click on one of the servers you’re connected to and select Manage. All the active extensions will show as tabs at the top of the screen, so select Server Reports and then click the heart-shaped icon and you’ll see the current aggregate wait stats for the instance:

Note that the x-axis is percentage of all waits, not wait count. You’ll see that PREEMPTIVE_OS_FLUSHFILEBUFFERS is the top wait on my Linux instance – that’s by design and I’ll blog about that next. I’ve also submitted a GitHub change to add that wait to the list of waits filtered out by script the extension uses.

Anyway, you can drill in to the details by clicking the ellipsis at the top-right of the graph and selecting ‘Show Details’. That’ll give all the waits and by selecting each one you can see the usual output from my waits script. To get more information on what each wait means, select the bottom cell, right-click on the URL to copy it, and paste into your favorite browser to go to my waits library. And of course, you can refresh the results via the ellipsis as well.

There are many more useful extensions that you should check out too – enjoy!

CXCONSUMER wait type – history and what you need to know

I’m a little bit overdue to blog about this new wait type, but I wanted to wait until SQL Server 2016 SP2 was released for a bit and people started seeing this new wait type.

History

Back in September 2016 I created a Connect item (3102145) to split the CXPACKET wait into benign and actionable waits, so that when a CXPACKET wait happens, there’s really something to investigate. Unfortunately the Connect website has been shut down by Microsoft so I can’t point you to the original request I made (I also tried in the Internet Archive but couldn’t find it).

The reason I wanted this change is that CXPACKET waits have always been registered by both producer and consumer threads for query plan operators where some threads produce data (i.e. producer threads) and some threads consume the produced data (i.e. consumer threads). However, the waits for the consumer threads are not actionable, because it’s the *producer* threads that are the cause of the consumer thread waits, and so it’s the producer thread waits that are actionable. By splitting the consumer waits out, the number of CXPACKET waits should be reduced, and those that are left should be investigated.

During the PASS Summit in 2017, my friend Pedro Lopes (b|t) on the Tiger Team announced that they’d made the change. The new wait type, CXCONSUMER, was initially added to SQL Server 2017 RTM CU3 and was added to SQL Server 2016 SP2 in April. You can read Pedro’s blog post here. Note that there was initially a bug with the implementation which was fixed in SQL Server 2017 RTM CU4.

What You Need To Know

CXCONSUMER waits can *generally* be ignored, but not always, just like most of the waits we generally ignore. I’ve seen control threads (thread ID = 0) show CXPACKET or CXCONSUMER, depending on what’s happening in the query plan, and I’ve seen some weird cases of skewed parallelism where everything shows as CXCONSUMER instead of CXPACKET. 

I’ve added a CXCONSUMER page to my waits library, and added it to the ‘ignore’ list in all published locations of my wait stats script. However, if you see a bunch of parallel threads on a long-running query accruing a lot of time waiting for CXCONSUMER, I’d troubleshoot it just like CXPACKET – look for skewed work distribution problems.

PS And if you do have CXPACKET waits, start here. Don’t just blindly follow random internet advice to set your sp_configure ‘max degree of parallelism’ or ‘cost threshold for parallelism’ options so a set value without testing how it affects your workload and whether a higher or lower number is better for your environment.