This is a performance tuning post that's been on my to-do list for quite a while. Wait stats analysis is a great way of looking at the symptoms of performance problems (see my Wait Stats category for more posts on this) but using the sys.dm_os_wait_stats DMV shows everything that's happening on a server. If you want to see what waits are occuring because of a single query or operation on a production system (e.g. the effect of MAXDOP hints on number and length of CXPACKET waits for a query) then using the DMV isn't usually feasible – you'd need to clear the wait stats and then have nothing else running on the system except the query/operation you want to investigate. The way to do it is with extended events.
Whenever a thread state changes to SUSPENDED because of a wait, the wait_info event from the sqlos event package fires to mark the beginning of the wait. When the thread gets back on the CPU again, the wait_info event fires again, noting the total wait time and the signal wait time (spent on the RUNNABLE queue). Just like with the DMV, we need to calculate the resource wait time (spent being SUSPENDED) ourselves.
Here's a simple example for you – imagine we want to investigate the waits that occur during an index rebuild. First thing is to create the event session:
– Drop the session if it exists.
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'MonitorWaits')
DROP EVENT SESSION MonitorWaits ON SERVER
GOCREATE EVENT SESSION MonitorWaits ON SERVER
ADD EVENT sqlos.wait_info
(WHERE sqlserver.session_id = 53 /* session_id of connection to monitor */)
ADD TARGET package0.asynchronous_file_target
(SET FILENAME = N'C:\SQLskills\EE_WaitStats.xel',
METADATAFILE = N'C:\SQLskills\EE_WaitStats.xem')
WITH (max_dispatch_latency = 1 seconds);
GO
You'll need to plug in the session ID of the SQL Server connection where you'll run the operation under investigation. I created a dummy production database with a table and some data so I can monitor the waits from rebuilding its clustered index.
You can also filter out any wait types you're not interested in by adding another predicate to the session that filters on the sqlos.wait_type. You'll need to specify numbers, and you can list the mapping between numbers and human-understandable wait types using this code:
SELECT xmv.map_key, xmv.map_value
FROM sys.dm_xe_map_values xmv
JOIN sys.dm_xe_packages xp
ON xmv.object_package_guid = xp.guid
WHERE xmv.name = 'wait_types'
AND xp.name = 'sqlos';
GO
Now I turn on the session and run my rebuild.
– Start the session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = START;
GO– Go do the query
– Stop the event session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = STOP;
GO
And then I can see how many events fired:
SELECT COUNT (*)
FROM sys.fn_xe_file_target_read_file
('C:\SQLskills\EE_WaitStats*.xel',
'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO13324
And then pull them into a temporary table and aggregate them (various ways of doing this, I prefer this one):
– Create intermediate temp table for raw event data
CREATE TABLE #RawEventData (
Rowid INT IDENTITY PRIMARY KEY,
event_data XML);
GO– Read the file data into intermediate temp table
INSERT INTO #RawEventData (event_data)
SELECT
CAST (event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file (
'C:\SQLskills\EE_WaitStats*.xel',
'C:\SQLskills\EE_WaitStats*.xem', null, null);
GOSELECT
waits.[Wait Type],
COUNT (*) AS [Wait Count],
SUM (waits.[Duration]) AS [Total Wait Time (ms)],
SUM (waits.[Duration]) – SUM (waits.[Signal Duration]) AS [Total Resource Wait Time (ms)],
SUM (waits.[Signal Duration]) AS [Total Signal Wait Time (ms)]
FROM
(SELECT
event_data.value ('(/event/@timestamp)[1]', 'DATETIME') AS [Time],
event_data.value ('(/event/data[@name=''wait_type'']/text)[1]', 'VARCHAR(100)') AS [Wait Type],
event_data.value ('(/event/data[@name=''opcode'']/text)[1]', 'VARCHAR(100)') AS [Op],
event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS [Duration],
event_data.value ('(/event/data[@name=''signal_duration'']/value)[1]', 'BIGINT') AS [Signal Duration]
FROM #RawEventData
) AS waits
WHERE waits.[op] = 'End'
GROUP BY waits.[Wait Type]
ORDER BY [Total Wait Time (ms)] DESC;
GO– Cleanup
DROP TABLE #RawEventData;
GOWait Type Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
——————— ———– ——————– —————————– —————————
LATCH_EX 4919 15783 15693 90
LATCH_SH 18 4399 4398 1
CXPACKET 16 3039 3038 1
PAGEIOLATCH_SH 813 468 457 11
PAGELATCH_SH 246 113 109 4
WRITELOG 99 71 66 5
PAGELATCH_UP 203 65 63 2
LOGBUFFER 123 58 57 1
WRITE_COMPLETION 82 55 55 0
PAGEIOLATCH_UP 35 27 27 0
IO_COMPLETION 80 25 25 0
SQLTRACE_LOCK 1 6 6 0
PAGEIOLATCH_EX 4 3 3 0
CMEMTHREAD 17 1 0 1
SOS_SCHEDULER_YIELD 6 0 0 0
You can easily change this to give average times and add filters to remove uninteresting waits.
If you're interested in getting more involved with extended events, check out Jonathan's excellent series from last December.
Happy waiting!
9 Responses to Capturing wait stats for a single operation
Excellent content as always! Just started utilizing Event Notification and extended events for a number of things. This will fit very well in my troubleshooting/performance script library!
Very nice!
This is what I’ve been missing in SQL Server that’s easily available in Oracle.
Now I just need to push harder to upgrade to 2008…
Nice! I’ll give it a try.
Very good article. I was going to ask if you can use events to capture plan information, but I see from http://bit.ly/k9Pk6Z that you can (in Denali CPT1 onwards) and that Jonathan has requested some enhancements to this.
Not only was this an excellent idea, and a great post, but I was able to use it immediately to troubleshoot an issue. Thanks, Paul!
[...] Now that I have my session ids, I’m going to create an extended events session that I can run during the “slow performing” period to help illuminate where to investigate next (and for more general discussion on this technique, see Paul Randal’s post “Capturing wait stats for a single operation”): [...]
[...] Now that I have my session ids, I’m going to create an extended events session that I can run during the “slow performing” period to help illuminate where to investigate next (and for more general discussion on this technique, see Paul Randal’s post “Capturing wait stats for a single operation”): [...]
[...] confirm that query is running slow just because of RBAR factor, I have used extended events for single session waits analysis, a well defined method by Paul Randal. Output was as [...]
[...] there but I had no idea what until I found this great post by Paul S. Randal about wait stats (http://www.sqlskills.com/blogs/paul/capturing-wait-stats-for-a-single-operation). That gave me hope of actually finding where the problem sits. Unfortunately his approach is [...]