Capturing wait stats for a single operation

This is a performance tuning post that's been on my to-do list for quite a while. Wait stats analysis is a great way of looking at the symptoms of performance problems (see my Wait Stats category for more posts on this) but using the sys.dm_os_wait_stats DMV shows everything that's happening on a server. If you want to see what waits are occuring because of a single query or operation on a production system (e.g. the effect of MAXDOP hints on number and length of CXPACKET waits for a query) then using the DMV isn't usually feasible – you'd need to clear the wait stats and then have nothing else running on the system except the query/operation you want to investigate. The way to do it is with extended events.

Whenever a thread state changes to SUSPENDED because of a wait, the wait_info event from the sqlos event package fires to mark the beginning of the wait. When the thread gets back on the CPU again, the wait_info event fires again, noting the total wait time and the signal wait time (spent on the RUNNABLE queue). Just like with the DMV, we need to calculate the resource wait time (spent being SUSPENDED) ourselves.

Here's a simple example for you – imagine we want to investigate the waits that occur during an index rebuild. First thing is to create the event session:

— Drop the session if it exists.
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'MonitorWaits')
    DROP EVENT SESSION MonitorWaits ON SERVER
GO

CREATE EVENT SESSION MonitorWaits ON SERVER
ADD EVENT sqlos.wait_info
    (WHERE sqlserver.session_id = 53 /* session_id of connection to monitor */)
ADD TARGET package0.asynchronous_file_target
    (SET FILENAME = N'C:\SQLskills\EE_WaitStats.xel',
    METADATAFILE = N'C:\SQLskills\EE_WaitStats.xem')
WITH (max_dispatch_latency = 1 seconds);
GO

You'll need to plug in the session ID of the SQL Server connection where you'll run the operation under investigation. I created a dummy production database with a table and some data so I can monitor the waits from rebuilding its clustered index.

You can also filter out any wait types you're not interested in by adding another predicate to the session that filters on the sqlos.wait_type. You'll need to specify numbers, and you can list the mapping between numbers and human-understandable wait types using this code:

SELECT xmv.map_key, xmv.map_value
FROM sys.dm_xe_map_values xmv
JOIN sys.dm_xe_packages xp
    ON xmv.object_package_guid = xp.guid
WHERE xmv.name = 'wait_types'
    AND xp.name = 'sqlos';
GO

Now I turn on the session and run my rebuild.

— Start the session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = START;
GO

— Go do the query

— Stop the event session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = STOP;
GO

And then I can see how many events fired:

SELECT COUNT (*)
FROM sys.fn_xe_file_target_read_file
    ('C:\SQLskills\EE_WaitStats*.xel',
    'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

13324

And then pull them into a temporary table and aggregate them (various ways of doing this, I prefer this one):

— Create intermediate temp table for raw event data
CREATE TABLE #RawEventData (
    Rowid  INT IDENTITY PRIMARY KEY,
    event_data XML);
 
GO

— Read the file data into intermediate temp table
INSERT INTO #RawEventData (event_data)
SELECT
    CAST (event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file (
    'C:\SQLskills\EE_WaitStats*.xel',
    'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

SELECT
    waits.[Wait Type],
    COUNT (*) AS [Wait Count],
    SUM (waits.[Duration]) AS [Total Wait Time (ms)],
    SUM (waits.[Duration]) – SUM (waits.[Signal Duration]) AS [Total Resource Wait Time (ms)],
    SUM (waits.[Signal Duration]) AS [Total Signal Wait Time (ms)]
FROM
    (SELECT
        event_data.value ('(
/event/@timestamp)[1]', 'DATETIME') AS [Time],
        event_data.value ('(/event/data[@name=''wait_type'']/text)[1]', 'VARCHAR(100)') AS [Wait Type],
        event_data.value ('(/event/data[@name=''opcode'']/text)[1]', 'VARCHAR(100)') AS [Op],
        event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS [Duration],
        event_data.value ('(/event/data[@name=''signal_duration'']/value)[1]', 'BIGINT') AS [Signal Duration]
     FROM #RawEventData
    ) AS waits
WHERE waits.[op] = 'End'
GROUP BY waits.[Wait Type]
ORDER BY [Total Wait Time (ms)] DESC;
GO

— Cleanup
DROP TABLE #RawEventData;
GO

Wait Type             Wait Count  Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
——————— ———– ——————– —————————– —————————
LATCH_EX              4919        15783                15693                         90
LATCH_SH              18          4399                 4398                          1
CXPACKET              16          3039                 3038                          1
PAGEIOLATCH_SH        813         468                  457                           11
PAGELATCH_SH          246         113                  109                           4
WRITELOG              99          71                   66                            5
PAGELATCH_UP          203         65                   63                            2
LOGBUFFER             123         58                   57                            1
WRITE_COMPLETION      82          55                   55                            0
PAGEIOLATCH_UP        35          27                   27                            0
IO_COMPLETION         80          25                   25                            0
SQLTRACE_LOCK         1           6                    6                             0
PAGEIOLATCH_EX        4           3                    3                             0
CMEMTHREAD            17          1                    0                             1
SOS_SCHEDULER_YIELD   6           0                    0                             0

You can easily change this to give average times and add filters to remove uninteresting waits.

If you're interested in getting more involved with extended events, check out Jonathan's excellent series from last December.

Happy waiting!

21 thoughts on “Capturing wait stats for a single operation

  1. Excellent content as always! Just started utilizing Event Notification and extended events for a number of things. This will fit very well in my troubleshooting/performance script library!

  2. Very nice!
    This is what I’ve been missing in SQL Server that’s easily available in Oracle.
    Now I just need to push harder to upgrade to 2008…

  3. Very good article. I was going to ask if you can use events to capture plan information, but I see from http://bit.ly/k9Pk6Z that you can (in Denali CPT1 onwards) and that Jonathan has requested some enhancements to this.

  4. To capture activity for a single database, will this work?

    CREATE EVENT SESSION MonitorWaits ON SERVER
    ADD EVENT sqlos.wait_info
    (WHERE sqlserver.database_id = 38)
    ADD TARGET package0.asynchronous_file_target
    (SET FILENAME = N’T:\ExtendedEvents\EE_WaitStats.xel’,
    METADATAFILE = N’T:\ExtendedEvents\EE_WaitStats.xem’)
    WITH (max_dispatch_latency = 1 seconds);
    GO

  5. Paul – For the past few weeks we have been trying to diagnose poor performance on one of our SQL Servers and we have been watching profiler and noticing some queries with ZERO CPU, negligible reads, but fairly high duration. We have been executing these queries again, but have failed to reproduce the high duration times… 2 Questions: 1. Why are we not able to reproduce these high duration times. (maybe pages are in memory) 2. Is there a way to “passively” capture detailed wait stats for queries based on a set condition? (duration>=5000ms)

    1. 1) Would have been based on what was happening on the box at that time – could have been blocking, waiting for latches, waiting for query execution memory and a host of other things
      2) Not without some expense. You’d have to capture them for all queries and then filter just on the queries you want. A better approach would be to have general perf monitoring in place all the time and then be able to look back to see what was happening. See Erin’s Pluralsight course http://pluralsight.com/training/Courses/TableOfContents/sqlserver-benchmarking

      1. Thanks, Paul for the quick reply… anyways, it was a VM issue, I should of mentioned that it was a virtualized machine… so many variables indeed!

  6. This is really excellent and I tried one of our prticularly poor queries on dev VM but I’m confused by the results. the query took 20 seconds to run but I got total wait time (ms) of 73915 for CXPACKET. I was expecting none of the waits to exceed the query duration, have I missed something? Thanks, Rob.

      1. Thanks for the Reply.

        The server is a 2 vCPU vmware box;

        Logical CPU Count: 2
        Hyperthread Ratio: 1
        Physical CPU Count:1

        The server’s MAXDOP settings are the default of 0 and 5 for cost threshold for parallelism. Which suggests the query ran with DOP 2.

        Query had CPU time = 27440 ms and elapsed time = 18071 ms.

        The CXPACKET wait stats were:

        Wait Count: 15346
        Total Wait Time (ms): 61514
        Total Resource Wait Time (ms): 43127
        Total Signal Wait Time (ms): 18387

        To be able to look at the wait stats per query is just so useful and I’m trying to learn and understand the relationship between the wait times and query execution times.

        1. Along with parallelism could this be rounding error? Perhaps SQL Server has a habit of reporting at least 1ms if there is a wait greater than zero.

  7. Hi Paul

    I am reading the blog, still trying to understand the “duration” field on the extended event when i capture the event sp_statement_Completed and sql_batch_completed.

    Is the duration includes, the lock wait time and any other wait type. If so, there is any way i can add the wait time and wait type for the same Xevent and filter by the duration > X amount of time?

    Thank you in advance.It may be a basic stuff but trying to get clear my doubts from SQL Server Experts.

    Thanks.

    1. For those events, that is the total elapsed time for the SP statement and batch request to complete, including all waits and CPU time.

      To your other question, no, as wait time does not include the time spent executing on the processor.

  8. Fantastic, thanks Paul. We are running 2016 but it seems the ‘Standard Edition’ does not capture the wait stats in the execution plan like the Enterprise Edition’. This is a great help as I don’t have to reinvent the wheel :-) Thanks again!!

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Some thoughts on courage

(This is also the Ponderings – editorial – in today’s SQLskills newsletter.) I want to start out this post by sincerely thanking everyone who emailed

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.