It is time again for another TSQL Tuesday, this time hosted by my good friend and fellow MVP, Sankar Reddy(Blog|Twitter).   This month’s topic is Misconceptions about SQL Server, and as Sankar points out in this months, there are so many misconceptions out there that almost anyone can blog about this topic, in fact I might even blog more than just one post on this topic since I have a few half finished blog posts that would fit into this topic.

For this post I am going to look at a recent misconception from the MSDN Forums, that physical reads of a data file, correlate to wait stats inside the database engine.  The background on this is rooted in two Dynamic Management Objects inside the database engine; the Dynamic Management Function (DMF) sys.dm_io_virtual_file_stats() which returns I/O statistics for data and log files and the Dynamic Management View (DMV) sys.dm_os_wait_stats which returns information about all the waits encountered by threads that have executed.  These objects reflect cumulative totals for their information, generally since the server was last restarted.  While it is possible to reset the wait statistic information by using DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR), the I/O statistics can only be cleared by restarting the server.  When using either of these objects for analysis of a specific time, it is important to take a snapshot of the data held in them at the start of the analysis, generally by INSERTing the information contained by the object into a temporary table and then performing a differential analysis of the data against a later SELECT from the objects (more on this in a minute).

The misconception on the forum thread was that I/O stalls as reflected in the sys.dm_io_virtual_file_stats() DMF, would correlate to I/O related wait counts, and wait times in the sys.dm_os_wait_stats DMV.  Prior to SQL Server 2008, this kind of thing would be impossible to prove without having indepth knowledge of the Database Engine code.  However, one of the new features in SQL Server 2008, Extended Events, actually allows us to define a Event Session to take a look at this in depth, and not only disprove the misconception, but potentially explain the what and why behind it.

To start off lets look at the snapshot information that lead to the misconception for a simple query against AdventureWorks:

USE [AdventureWorks]
GO
-- Clear the Buffer Cache to force reads from Disk
DBCC DROPCLEANBUFFERS
GO
-- Drop temporary tables if they exist
IF OBJECT_ID('tempdb..#io_stats') IS NOT NULL
DROP TABLE #io_stats
GO
IF OBJECT_ID('tempdb..#wait_stats') IS NOT NULL
DROP TABLE #wait_stats
GO
-- Create the Snapshots of the Dynamic Management Objects
SELECT *
INTO #io_stats
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GO
SELECT *
INTO #wait_stats
FROM sys.dm_os_wait_stats
GO
-- Turn on Statistics IO so we can see the IO ops
SET STATISTICS IO ON
GO
-- Run the Simple SELECT against AdventureWorks
SELECT SUM(TotalDue), SalesPersonID
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID
GO
-- Turn Off Statistics IO
SET STATISTICS IO OFF
GO
-- Get the Differential Wait Statistic Information
SELECT
  
wait_type               = t1.wait_type,
  
waiting_tasks_count     = t1.waiting_tasks_count - t2.waiting_tasks_count,
  
wait_time_ms            = t1.wait_time_ms - t2.wait_time_ms,
  
max_wait_time_ms        = t1.max_wait_time_ms - t2.max_wait_time_ms,
  
signal_wait_time_ms     = t1.signal_wait_time_ms - t2.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS t1
JOIN #wait_stats AS t2
  
ON t1.wait_type = t2.wait_type
WHERE t1.wait_time_ms - t2.wait_time_ms > 0
GO
-- Get the Differential IO Statistic Information
SELECT
  
database_name           = DB_NAME(t1.database_id),
  
logical_name            = mf.name,
  
physical_name           = mf.physical_name,
  
num_of_reads            = t1.num_of_reads - t2.num_of_reads,
  
num_of_bytes_read       = t1.num_of_bytes_read - t2.num_of_bytes_read,
  
io_stall_read_ms        = t1.io_stall_read_ms - t2.io_stall_read_ms,
  
num_of_writes           = t1.num_of_writes - t2.num_of_writes,
  
num_of_bytes_written    = t1.num_of_bytes_written - t2.num_of_bytes_written,
  
io_stall_write_ms       = t1.io_stall_write_ms - t2.io_stall_write_ms,
  
io_stall                = t1.io_stall - t2.io_stall,
  
size_on_disk_bytes      = t1.size_on_disk_bytes,
  
file_handle             = t1.file_handle
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS t1
JOIN #io_stats AS t2
  
ON t1.database_id = t2.database_id
      
AND t1.FILE_ID = t2.FILE_ID
JOIN sys.master_files AS mf
  
ON mf.database_id = t1.database_id
      
AND mf.FILE_ID = t1.FILE_ID
WHERE (t1.num_of_reads - t2.num_of_reads > 0)
  OR (
t1.num_of_writes - t2.num_of_writes > 0)
GO

The expectation in the forums thread was that the num_of_reads, and the io_stall_read_ms values from sys.dm_io_virtual_file_stats() would have an equivalent amount of waiting_tasks_count and wait_time_ms in sys.dm_os_wait_stats, but as can be seen from the below output, this is not the case.

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

Before I continue with this post, let me first provide some information about the hardware basis for these tests.  The information contained in this blog post comes from my VMware Workstation SQL Server 2008 VM running on my laptop which is a a Dell Inspiron 1400 which has 4GB RAM, and a 120GB OCZ Summit SSD.  The VM has a single vCPU and 1GB RAM allocated to it.  Now I would be remiss in thinking that this configuration wouldn’t raise some eyebrows, but rest assured that I have validated these findings on multiple enterprise class VM’s as well as physical servers including a dual quad core Dell R710 with 32GB RAM and a 8 Disk DAS that I have used in other tests for blog posts in the past.  The results have been consistent across all of my tests and the output from my laptop VM allows me to completely isolate the workload being performed in SQL to a single execution easily.

So why don’t these numbers line up?  To explain this further we can use an Extended Event Session to look at the internal workings when the example workload query is executed:

IF EXISTS(SELECT *
        
FROM sys.server_event_sessions
        
WHERE name='test_session')
   
DROP EVENT SESSION [test_session] ON SERVER;
CREATE EVENT SESSION [test_session]
ON SERVER
ADD EVENT sqlserver.file_read(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlserver.file_written(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlserver.file_read_completed(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlserver.file_write_completed(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlos.async_io_requested(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlos.async_io_completed(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlos.wait_info(
    
ACTION (sqlserver.database_id, sqlserver.session_id)),
ADD EVENT sqlserver.sql_statement_starting(
    
ACTION (sqlserver.database_id, sqlserver.plan_handle,
           
sqlserver.session_id, sqlserver.sql_text)),
ADD EVENT sqlserver.sql_statement_completed(
    
ACTION (sqlserver.database_id, sqlserver.plan_handle,
           
sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096KB,
    
EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    
MAX_DISPATCH_LATENCY = 5 SECONDS,
    
MAX_EVENT_SIZE = 4096KB,
    
MEMORY_PARTITION_MODE = PER_CPU,
    
TRACK_CAUSALITY = ON,
    
STARTUP_STATE = OFF)

Some items about this Event Session that should be noted:

  1. The Event Session uses the TRACK_CAUSALITY session option which appends a sequential GUID to each event allowing the event sequences to be followed.
  2. The Event Session captures the sql_statement_starting and sql_statement_completed events allowing isolation of the associated events between these two events by the attach_activity_id action, which is collected by causality tracking.
  3. The Event Session captures the following IO related Events:
    • async_io_requested
    • async_io_completed
    • file_read
    • file_read_completed
    • file_written
    • file_write_completed
    • wait_info
  4. The database_id and session_id actions are captured for each event, and the sql_text and plan_handle actions are captured for the sql_statement_% events.
  5. The event data is captured to a ring_buffer target which could result in event loss on a busy system, in which case a asynchronous_file_target would be more appropriate.
  6. The Event Session is configured with 4MB of buffer space that is partitioned per CPU, and a MAX_DISPATCH_LATENCY of 5 seconds.
  7. In the situation where the Session Buffers fill up faster than they can be dispatched single events can be lost to allow collection of incoming events.

To use the Event Session to further investigate what is going on here, we can change our original workload script to start the Event Session before running the test query, and then to drop the Events from the Event Session to disable Event collection while leaving the captured Events available in the ring_buffer target:

USE [AdventureWorks]
GO
-- Clear the Buffer Cache to force reads from Disk
DBCC DROPCLEANBUFFERS
GO
-- Drop temporary tables if they exist
IF OBJECT_ID('tempdb..#io_stats') IS NOT NULL
DROP TABLE #io_stats
GO
IF OBJECT_ID('tempdb..#wait_stats') IS NOT NULL
DROP TABLE #wait_stats
GO
-- Create the Snapshots of the Dynamic Management Objects
SELECT *
INTO #io_stats
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GO
SELECT *
INTO #wait_stats
FROM sys.dm_os_wait_stats
GO
-- Turn on Statistics IO so we can see the IO ops
SET STATISTICS IO ON
GO
-- Start the Event Session so we capture the Events caused by running the test
ALTER EVENT SESSION test_session
ON SERVER
STATE
=START
GO
-- Run the Simple SELECT against AdventureWorks
SELECT SUM(TotalDue), SalesPersonID
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID
GO
-- Disable Event collection by dropping the Events from the Event Session
ALTER EVENT SESSION test_session
ON SERVER
   
DROP EVENT sqlos.async_io_requested,
   
DROP EVENT sqlos.async_io_completed,
   
DROP EVENT sqlos.wait_info,
  
DROP EVENT sqlserver.file_read,
  
DROP EVENT sqlserver.file_written,
  
DROP EVENT sqlserver.file_read_completed,
  
DROP EVENT sqlserver.file_write_completed,
  
DROP EVENT sqlserver.sql_statement_starting,
  
DROP EVENT sqlserver.sql_statement_completed
GO
-- Turn Off Statistics IO
SET STATISTICS IO OFF
GO
-- Get the Differential Wait Statistic Information
SELECT
  
wait_type               = t1.wait_type,
  
waiting_tasks_count     = t1.waiting_tasks_count - t2.waiting_tasks_count,
  
wait_time_ms            = t1.wait_time_ms - t2.wait_time_ms,
  
max_wait_time_ms        = t1.max_wait_time_ms - t2.max_wait_time_ms,
  
signal_wait_time_ms     = t1.signal_wait_time_ms - t2.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS t1
JOIN #wait_stats AS t2
  
ON t1.wait_type = t2.wait_type
WHERE t1.wait_time_ms - t2.wait_time_ms > 0
GO
-- Get the Differential IO Statistic Information
SELECT
  
database_name           = DB_NAME(t1.database_id),
  
logical_name            = mf.name,
  
physical_name           = mf.physical_name,
  
num_of_reads            = t1.num_of_reads - t2.num_of_reads,
  
num_of_bytes_read       = t1.num_of_bytes_read - t2.num_of_bytes_read,
  
io_stall_read_ms        = t1.io_stall_read_ms - t2.io_stall_read_ms,
  
num_of_writes           = t1.num_of_writes - t2.num_of_writes,
  
num_of_bytes_written    = t1.num_of_bytes_written - t2.num_of_bytes_written,
  
io_stall_write_ms       = t1.io_stall_write_ms - t2.io_stall_write_ms,
  
io_stall                = t1.io_stall - t2.io_stall,
  
size_on_disk_bytes      = t1.size_on_disk_bytes,
  
file_handle             = t1.file_handle
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS t1
JOIN #io_stats AS t2
  
ON t1.database_id = t2.database_id
      
AND t1.FILE_ID = t2.FILE_ID
JOIN sys.master_files AS mf
  
ON mf.database_id = t1.database_id
      
AND mf.FILE_ID = t1.FILE_ID
WHERE (t1.num_of_reads - t2.num_of_reads > 0)
  OR (
t1.num_of_writes - t2.num_of_writes > 0)
GO

Looking at the raw output from this second test, we can see that it has identical num_of_reads, and num_of_bytes_read as the original query (there may be a slight variance here depending on the number of tests executed, but over 10 repeated tests, 7 of the tests returned identical results, with 2 of the remaining 3 showing 14 reads with just over

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

To perform analysis of the Events that fired while the Event Session was capturing data, we will use a PIVOT query to rotate the information captured into a common format for analysis using the following query:

DROP TABLE #Results

-- Extract the Event information from the Event Session
SELECT
  
-- Add a RowID column to provide guaranteed Event ordering for analysis
  
IDENTITY(INT,1,1) AS RowID,
  
table2.*
-- INSERT the immediate results into a temp table for further analysis
INTO #Results
FROM
SELECT XEvent.query('.') AS XEvent
  
FROM
  
(   -- Cast the target_data to XML
      
SELECT CAST(target_data AS XML) AS TargetData
      
FROM sys.dm_xe_session_targets st
      
JOIN sys.dm_xe_sessions s
          
ON s.address = st.event_session_address
      
WHERE name = 'test_session'
        
AND target_name = 'ring_buffer'    ) AS Data
  
-- Split out the Event Nodes
  
CROSS APPLY TargetData.nodes ('RingBufferTarget/event') AS XEventData (XEvent)    ) AS table1
-- Use a TVF CROSS APPLY to split out the final result columns for PIVOT
CROSS APPLY
SELECT *
  
FROM
  
(   -- Get the Event Name
      
SELECT
          
'event_name' AS 'name',
          
RTRIM(LTRIM(XEvent.value('(event/@name)[1]', 'nvarchar(max)'))) AS value
      
UNION ALL
      
-- Get the Event Timestamp
      
SELECT
          
'timestamp' AS 'name',
          
RTRIM(LTRIM(XEvent.value('(event/@timestamp)[1]', 'nvarchar(40)'))) AS value
      
UNION ALL
      
-- Get the Event data columns
      
SELECT
          
RTRIM(LTRIM(XData.value('@name[1]','nvarchar(max)'))) AS 'name',
          
RTRIM(LTRIM(COALESCE(NULLIF(XData.value('(text)[1]','nvarchar(max)'), ''),
                    
XData.value('(value)[1]','nvarchar(max)')))) AS 'value'
      
FROM XEvent.nodes('event/data') AS Data (XData)
      
UNION ALL
      
-- Get the Event action columns
      
SELECT
          
XData.value('@name[1]','nvarchar(max)') AS 'name'
          
,COALESCE(NULLIF(XData.value('(text)[1]','nvarchar(max)'), ''),
                    
XData.value('(value)[1]','nvarchar(max)')) AS 'value'
      
FROM XEvent.nodes('event/action') AS Data (XData)   ) AS tab
   PIVOT
  
(   MAX(value)
      
-- Specify columns for PIVOT Operation
      
FOR name IN
          
(   [event_name], [timestamp], [database_id], [session_id],
              
[file_handle], [wait_type], [opcode], [duration],
              
[signal_duration], [mode], [file_id], [succeeded],
              
[page_id], [extent_id], [sql_text], [plan_handle], [cpu],
              
[reads], [writes], [state], [attach_activity_id]
          
)
   )
AS pvt
) AS table2

With the data PIVOT’d into a temporary table we can do some basic analysis of what was captured by the Event Session.

DECLARE @attach_id VARCHAR(100)
DECLARE @start_event_id INT
DECLARE
@end_event_id INT

-- Set the variables for the start_event_id, and attach_id
SELECT
  
@start_event_id = RowID,
  
@attach_id = SUBSTRING(attach_activity_id, 1,
               (
LEN(attach_activity_id)-CHARINDEX('-', REVERSE(attach_activity_id))+1))
FROM #Results
WHERE event_name = 'sql_statement_starting'
AND sql_text LIKE '%SELECT SUM(TotalDue)%'

-- Set the variable for the end_event_id
SELECT
  
@end_event_id = RowID
  
FROM #Results
WHERE event_name = 'sql_statement_completed'
AND sql_text LIKE '%SELECT SUM(TotalDue)%'

-- SELECT the rows from the captured Events that are between the start and
-- end Event with the attach_activity_id for the start Event
SELECT
  
RowID,
  
event_name,
  
[attach_activity_id],
  
database_id,
  
FILE_ID,
  
file_handle,
  
mode,
  
wait_type,
  
opcode,
  
duration,
  
signal_duration,
  
[sql_text],
  
[plan_handle],
  
[cpu],
  
[reads],
  
[writes],
  
[state]
FROM #Results AS r
WHERE attach_activity_id LIKE @attach_id+'%'
 
AND RowID >= @start_event_id
 
AND RowID <= @end_event_id

With this output, we can look at the order of engine events that fired, and from a quick glance it would seem that IO operations do have PAGEIOLATCH_SH waits associated with them as highlighted below:

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

What is quickly evident is that while the wait type might get set, it has a zero duration, yet this alone is not enough information to support the analysis here, we really have to work through the events to determine what exactly might be going on here.  One thing that immediately stands out to me is that the PAGEIOLATCH_SH waits may not have long durations associated with them, in fact the screenshots for this post show zero durations for the events, but the file_read_completed events have durations associated with them.

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

So in the interests of trying to better explain the output from sys.dm_io_virtual_file_stats() lets aggregate the totals for all the wait types captured in association with this statement executing, as well as aggregate the file_read_completed durations with the wait_info durations to see how they compare to the output from sys.dm_io_virtual_file_stats().

DECLARE @attach_id VARCHAR(100)
DECLARE @start_event_id INT
DECLARE
@end_event_id INT

-- Set the variables for the start_event_id, end_event_id, and attach_id
SELECT
  
@start_event_id = RowID,
  
@attach_id = SUBSTRING(attach_activity_id, 1,
               (
LEN(attach_activity_id)-CHARINDEX('-', REVERSE(attach_activity_id))+1))
FROM #Results
WHERE event_name = 'sql_statement_starting'
AND sql_text LIKE '%SELECT SUM(TotalDue)%'

SELECT
  
@end_event_id = RowID
  
FROM #Results
WHERE event_name = 'sql_statement_completed'
AND sql_text LIKE '%SELECT SUM(TotalDue)%'

-- Aggregate the wait types associated with the Event exectuion
SELECT
  
wait_type,
  
COUNT(*) AS waiting_task_count,
  
SUM(CAST(duration AS INT)) AS duration
FROM #results
WHERE attach_activity_id LIKE @attach_id+'%'
 
AND RowID >= @start_event_id
 
AND RowID <= @end_event_id
 
AND opcode = 'end'
GROUP BY wait_type

-- Aggregate the IO durations with the waits for the Event execution
SELECT SUM(CAST(duration AS INT)) AS IO_and_wait_duration
FROM #Results AS r
WHERE attach_activity_id LIKE @attach_id+'%'
 
AND RowID >= @start_event_id
 
AND RowID <= @end_event_id
 
AND event_name IN ('file_read_completed', 'wait_info')

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

When we compare this to the original output from sys.dm_io_virtual_file_stats(), we get closer to the total io_stall_read_ms values, though we are still not precisely there.

TSQL Tuesday #11   Physical IO’s Don’t Always Accumulate Wait Times   image thumb

So what accounts for the difference here?  Well, I could speculate, but that isn’t really the purpose of this post, and it would only feed into further misconceptions about what the correlation of the values presented here might actually be.  What is really clear here is that the information contained in sys.dm_io_virtual_file_stats() can not be directly correlated to the output from sys.dm_os_wait_stats.   As shown in this simple, ok maybe its a bit complex because Extended Events has yet to be made simple to use, there is duration associated with file read operations that is not directly attributable to any wait type, which should be expected.