Capturing wait stats for a single operation

This is a performance tuning post that's been on my to-do list for quite a while. Wait stats analysis is a great way of looking at the symptoms of performance problems (see my Wait Stats category for more posts on this) but using the sys.dm_os_wait_stats DMV shows everything that's happening on a server. If you want to see what waits are occuring because of a single query or operation on a production system (e.g. the effect of MAXDOP hints on number and length of CXPACKET waits for a query) then using the DMV isn't usually feasible – you'd need to clear the wait stats and then have nothing else running on the system except the query/operation you want to investigate. The way to do it is with extended events.

Whenever a thread state changes to SUSPENDED because of a wait, the wait_info event from the sqlos event package fires to mark the beginning of the wait. When the thread gets back on the CPU again, the wait_info event fires again, noting the total wait time and the signal wait time (spent on the RUNNABLE queue). Just like with the DMV, we need to calculate the resource wait time (spent being SUSPENDED) ourselves.

Here's a simple example for you – imagine we want to investigate the waits that occur during an index rebuild. First thing is to create the event session:

— Drop the session if it exists.
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'MonitorWaits')
    DROP EVENT SESSION MonitorWaits ON SERVER
GO

CREATE EVENT SESSION MonitorWaits ON SERVER
ADD EVENT sqlos.wait_info
    (WHERE sqlserver.session_id = 53 /* session_id of connection to monitor */)
ADD TARGET package0.asynchronous_file_target
    (SET FILENAME = N'C:\SQLskills\EE_WaitStats.xel',
    METADATAFILE = N'C:\SQLskills\EE_WaitStats.xem')
WITH (max_dispatch_latency = 1 seconds);
GO

You'll need to plug in the session ID of the SQL Server connection where you'll run the operation under investigation. I created a dummy production database with a table and some data so I can monitor the waits from rebuilding its clustered index.

You can also filter out any wait types you're not interested in by adding another predicate to the session that filters on the sqlos.wait_type. You'll need to specify numbers, and you can list the mapping between numbers and human-understandable wait types using this code:

SELECT xmv.map_key, xmv.map_value
FROM sys.dm_xe_map_values xmv
JOIN sys.dm_xe_packages xp
ON xmv.object_package_guid = xp.guid
WHERE xmv.name = 'wait_types'
AND xp.name = 'sqlos';
GO

Now I turn on the session and run my rebuild.

— Start the session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = START;
GO

— Go do the query

— Stop the event session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = STOP;
GO

And then I can see how many events fired:

SELECT COUNT (*)
FROM sys.fn_xe_file_target_read_file
('C:\SQLskills\EE_WaitStats*.xel',
'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

13324

And then pull them into a temporary table and aggregate them (various ways of doing this, I prefer this one):

— Create intermediate temp table for raw event data
CREATE TABLE #RawEventData (
    Rowid  INT IDENTITY PRIMARY KEY,
    event_data XML);

GO

— Read the file data into intermediate temp table
INSERT INTO #RawEventData (event_data)
SELECT
    CAST (event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file (
    'C:\SQLskills\EE_WaitStats*.xel',
    'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

SELECT
    waits.[Wait Type],
    COUNT (*) AS [Wait Count],
    SUM (waits.[Duration]) AS [Total Wait Time (ms)],
    SUM (waits.[Duration]) – SUM (waits.[Signal Duration]) AS [Total Resource Wait Time (ms)],
    SUM (waits.[Signal Duration]) AS [Total Signal Wait Time (ms)]
FROM
    (SELECT
        event_data.value ('(/event/@timestamp)[1]', 'DATETIME') AS [Time],
        event_data.value ('(/event/data[@name=''wait_type'']/text)[1]', 'VARCHAR(100)') AS [Wait Type],
        event_data.value ('(/event/data[@name=''opcode'']/text)[1]', 'VARCHAR(100)') AS [Op],
        event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS [Duration],
        event_data.value ('(/event/data[@name=''signal_duration'']/value)[1]', 'BIGINT') AS [Signal Duration]
     FROM #RawEventData
    ) AS waits
WHERE waits.[op] = 'End'
GROUP BY waits.[Wait Type]
ORDER BY [Total Wait Time (ms)] DESC;
GO

— Cleanup
DROP TABLE #RawEventData;
GO

Wait Type             Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
——————— ———– ——————– —————————– —————————
LATCH_EX              4919        15783                15693                         90
LATCH_SH              18          4399                 4398                          1
CXPACKET              16          3039                 3038                          1
PAGEIOLATCH_SH        813         468                  457                           11
PAGELATCH_SH          246         113                  109                           4
WRITELOG              99          71                   66                            5
PAGELATCH_UP          203         65                   63                            2
LOGBUFFER             123         58                   57                            1
WRITE_COMPLETION      82          55                   55                            0
PAGEIOLATCH_UP        35          27                   27                            0
IO_COMPLETION         80          25                   25                            0
SQLTRACE_LOCK         1           6                    6                             0
PAGEIOLATCH_EX        4           3                    3                             0
CMEMTHREAD            17          1                    0                             1
SOS_SCHEDULER_YIELD   6           0                    0                             0

You can easily change this to give average times and add filters to remove uninteresting waits.

If you're interested in getting more involved with extended events, check out Jonathan's excellent series from last December.

Happy waiting!

21 thoughts on “Capturing wait stats for a single operation”

Paul Timmerman says:

July 11, 2011 at 10:12 am

Excellent content as always! Just started utilizing Event Notification and extended events for a number of things. This will fit very well in my troubleshooting/performance script library!

Reply
Alex says:

July 11, 2011 at 9:32 pm

Very nice!
This is what I’ve been missing in SQL Server that’s easily available in Oracle.
Now I just need to push harder to upgrade to 2008…

Reply
Alfred Songy says:

July 11, 2011 at 3:20 pm

Nice! I’ll give it a try.

Reply
Chris Adkin says:

July 14, 2011 at 3:09 am

Very good article. I was going to ask if you can use events to capture plan information, but I see from http://bit.ly/k9Pk6Z that you can (in Denali CPT1 onwards) and that Jonathan has requested some enhancements to this.

Reply
dmmaxwell says:

July 15, 2011 at 4:38 am

Not only was this an excellent idea, and a great post, but I was able to use it immediately to troubleshoot an issue. Thanks, Paul!

Reply
Sam says:

July 10, 2013 at 9:53 am

To capture activity for a single database, will this work?

CREATE EVENT SESSION MonitorWaits ON SERVER
ADD EVENT sqlos.wait_info
(WHERE sqlserver.database_id = 38)
ADD TARGET package0.asynchronous_file_target
(SET FILENAME = N’T:\ExtendedEvents\EE_WaitStats.xel’,
METADATAFILE = N’T:\ExtendedEvents\EE_WaitStats.xem’)
WITH (max_dispatch_latency = 1 seconds);
GO

Reply
1. Paul Randal says:
  
  August 13, 2013 at 1:59 pm
  
  Yes, should do.
  
  Reply
Eric Swiggum (@SQLLayman) says:

November 26, 2013 at 12:53 pm

Paul – For the past few weeks we have been trying to diagnose poor performance on one of our SQL Servers and we have been watching profiler and noticing some queries with ZERO CPU, negligible reads, but fairly high duration. We have been executing these queries again, but have failed to reproduce the high duration times… 2 Questions: 1. Why are we not able to reproduce these high duration times. (maybe pages are in memory) 2. Is there a way to “passively” capture detailed wait stats for queries based on a set condition? (duration>=5000ms)

Reply
1. Paul Randal says:
  
  November 26, 2013 at 1:23 pm
  
  1) Would have been based on what was happening on the box at that time – could have been blocking, waiting for latches, waiting for query execution memory and a host of other things
  2) Not without some expense. You’d have to capture them for all queries and then filter just on the queries you want. A better approach would be to have general perf monitoring in place all the time and then be able to look back to see what was happening. See Erin’s Pluralsight course http://pluralsight.com/training/Courses/TableOfContents/sqlserver-benchmarking
  
  Reply
  1. Eric Swiggum (@SQLLayman) says:
    
    December 18, 2013 at 9:36 am
    
    Thanks, Paul for the quick reply… anyways, it was a VM issue, I should of mentioned that it was a virtualized machine… so many variables indeed!
    
    Reply
Rob W says:

February 11, 2014 at 7:06 am

This is really excellent and I tried one of our prticularly poor queries on dev VM but I’m confused by the results. the query took 20 seconds to run but I got total wait time (ms) of 73915 for CXPACKET. I was expecting none of the waits to exceed the query duration, have I missed something? Thanks, Rob.

Reply
1. Paul Randal says:
  
  February 11, 2014 at 7:55 am
  
  What’s the wait count for CXPACKET? The total wait time adds up all the waits for that type. My guess is that your query ran with DOP 8.
  
  Reply
  1. Rob W says:
    
    February 13, 2014 at 12:54 am
    
    Thanks for the Reply.
    
    The server is a 2 vCPU vmware box;
    
    Logical CPU Count: 2
    Hyperthread Ratio: 1
    Physical CPU Count:1
    
    The server’s MAXDOP settings are the default of 0 and 5 for cost threshold for parallelism. Which suggests the query ran with DOP 2.
    
    Query had CPU time = 27440 ms and elapsed time = 18071 ms.
    
    The CXPACKET wait stats were:
    
    Wait Count: 15346
    Total Wait Time (ms): 61514
    Total Resource Wait Time (ms): 43127
    Total Signal Wait Time (ms): 18387
    
    To be able to look at the wait stats per query is just so useful and I’m trying to learn and understand the relationship between the wait times and query execution times.
    
    Reply
    1. JRStern says:
      
      September 17, 2014 at 9:04 am
      
      Along with parallelism could this be rounding error? Perhaps SQL Server has a habit of reporting at least 1ms if there is a wait greater than zero.
      
      Reply
      1. Paul Randal says:
        
        September 17, 2014 at 4:02 pm
        
        Which number are you referring to?
Rani says:

February 14, 2017 at 10:22 am

Hi Paul

I am reading the blog, still trying to understand the “duration” field on the extended event when i capture the event sp_statement_Completed and sql_batch_completed.

Is the duration includes, the lock wait time and any other wait type. If so, there is any way i can add the wait time and wait type for the same Xevent and filter by the duration > X amount of time?

Thank you in advance.It may be a basic stuff but trying to get clear my doubts from SQL Server Experts.

Thanks.

Reply
1. Paul Randal says:
  
  February 14, 2017 at 10:35 am
  
  For those events, that is the total elapsed time for the SP statement and batch request to complete, including all waits and CPU time.
  
  To your other question, no, as wait time does not include the time spent executing on the processor.
  
  Reply
SQLDBA says:

June 18, 2017 at 6:04 am

Paul if i have get all wait types for a given query, is there a way to do that.

Reply
1. Paul Randal says:
  
  June 19, 2017 at 10:24 am
  
  Not until SQL Server 2016, except using the method in this post.
  
  Reply
Bernt Jensen says:

June 22, 2018 at 8:07 am

Great stuff, this helped me a lot today. Thanks Paul/SQLSkills

Reply
Peter Kupsovsky says:

October 13, 2020 at 2:32 am

Fantastic, thanks Paul. We are running 2016 but it seems the ‘Standard Edition’ does not capture the wait stats in the execution plan like the Enterprise Edition’. This is a great help as I don’t have to reinvent the wheel :-) Thanks again!!

Reply

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Capturing wait stats for a single operation

21 thoughts on “Capturing wait stats for a single operation”

Leave a Reply Cancel reply

Other articles

2024: the year in books

2023 (not a typo!): the year in books

Black Friday Super Sale!

Summer School Savings SALE is live!

The Curious Case of… missing log information

The Curious Case of… missing 832 errors during memory corruption

Imagine feeling confident enough to handle whatever your database throws at you.