Capturing IO latencies for a period of time

In both my wait statistics pre-conference workshops at the PASS Summit and SQLintersection I promised to do a bunch of blog posts. The second one on the list is a simple script to allow you to capture all the reads, writes, and I/O latencies that occurred over a period of time.

The script does the following:

  • Creates two temporary tables
  • Captures the output from sys.dm_io_virtual_file_stats into the first table
  • Waits for a configurable delay (line 41 in the script – I made it 30 minutes in the example)
  • Captures the output from sys.dm_io_virtual_file_stats into the second table
  • Provides my usual virtual file stats output on the results

The original code and explanation about using the DMV are here.

Enjoy!

/*============================================================================
  File:     ShortPeriodIOLatencies.sql

  Summary:  Short snapshot of I/O latencies

  SQL Server Versions: 2005 onwards
------------------------------------------------------------------------------
  Written by Paul S. Randal, SQLskills.com

  (c) 2014, SQLskills.com. All rights reserved.

  For more scripts and sample code, check out http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes (e.g. in a
  for-sale commercial tool). Use in your own environment is encouraged.
  You may republish altered code as long as you include this copyright and
  give due credit, but you must obtain prior permission before blogging
  this code.

  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/
 

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats1')
    DROP TABLE [##SQLskillsStats1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats2')
    DROP TABLE [##SQLskillsStats2];
GO

SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
	   [num_of_writes], [io_stall_write_ms], [io_stall],
	   [num_of_bytes_read], [num_of_bytes_written], [file_handle]
INTO ##SQLskillsStats1
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

WAITFOR DELAY '00:30:00';
GO

SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
	   [num_of_writes], [io_stall_write_ms], [io_stall],
	   [num_of_bytes_read], [num_of_bytes_written], [file_handle]
INTO ##SQLskillsStats2
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

WITH [DiffLatencies] AS
(SELECT
-- Files that weren't in the first snapshot
        [ts2].[database_id],
        [ts2].[file_id],
        [ts2].[num_of_reads],
        [ts2].[io_stall_read_ms],
		[ts2].[num_of_writes],
		[ts2].[io_stall_write_ms],
		[ts2].[io_stall],
		[ts2].[num_of_bytes_read],
		[ts2].[num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
        ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NULL
UNION
SELECT
-- Diff of latencies in both snapshots
        [ts2].[database_id],
        [ts2].[file_id],
        [ts2].[num_of_reads] - [ts1].[num_of_reads] AS [num_of_reads],
        [ts2].[io_stall_read_ms] - [ts1].[io_stall_read_ms] AS [io_stall_read_ms],
		[ts2].[num_of_writes] - [ts1].[num_of_writes] AS [num_of_writes],
		[ts2].[io_stall_write_ms] - [ts1].[io_stall_write_ms] AS [io_stall_write_ms],
		[ts2].[io_stall] - [ts1].[io_stall] AS [io_stall],
		[ts2].[num_of_bytes_read] - [ts1].[num_of_bytes_read] AS [num_of_bytes_read],
		[ts2].[num_of_bytes_written] - [ts1].[num_of_bytes_written] AS [num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
        ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NOT NULL)
SELECT
	DB_NAME ([vfs].[database_id]) AS [DB],
	LEFT ([mf].[physical_name], 2) AS [Drive],
	[mf].[type_desc],
	[num_of_reads] AS [Reads],
	[num_of_writes] AS [Writes],
	[ReadLatency(ms)] =
		CASE WHEN [num_of_reads] = 0
			THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END,
	[WriteLatency(ms)] =
		CASE WHEN [num_of_writes] = 0
			THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,
	-- [Latency] =
		-- CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
			-- THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,
	[AvgBPerRead] =
		CASE WHEN [num_of_reads] = 0
			THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END,
	[AvgBPerWrite] =
		CASE WHEN [num_of_writes] = 0
			THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END,
	-- [AvgBPerTransfer] =
		-- CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
			-- THEN 0 ELSE
				-- (([num_of_bytes_read] + [num_of_bytes_written]) /
				-- ([num_of_reads] + [num_of_writes])) END,
	[mf].[physical_name]
FROM [DiffLatencies] AS [vfs]
JOIN sys.master_files AS [mf]
	ON [vfs].[database_id] = [mf].[database_id]
	AND [vfs].[file_id] = [mf].[file_id]
-- ORDER BY [ReadLatency(ms)] DESC
ORDER BY [WriteLatency(ms)] DESC;
GO

-- Cleanup
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats1')
    DROP TABLE [##SQLskillsStats1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##SQLskillsStats2')
    DROP TABLE [##SQLskillsStats2];
GO

22 thoughts on “Capturing IO latencies for a period of time

  1. Thanks, Paul, this and the waits stats script will be very useful. Just a quick simplication that I think works. In that WITH clause you can replace this

    SELECT
    — Files that weren’t in the first snapshot
    [ts2].[database_id],
    [ts2].[file_id],
    [ts2].[num_of_reads],
    [ts2].[io_stall_read_ms],
    [ts2].[num_of_writes],
    [ts2].[io_stall_write_ms],
    [ts2].[io_stall],
    [ts2].[num_of_bytes_read],
    [ts2].[num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
    ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NULL
    UNION
    SELECT
    — Diff of latencies in both snapshots
    [ts2].[database_id],
    [ts2].[file_id],
    [ts2].[num_of_reads] – [ts1].[num_of_reads] AS [num_of_reads],
    [ts2].[io_stall_read_ms] – [ts1].[io_stall_read_ms] AS [io_stall_read_ms],
    [ts2].[num_of_writes] – [ts1].[num_of_writes] AS [num_of_writes],
    [ts2].[io_stall_write_ms] – [ts1].[io_stall_write_ms] AS [io_stall_write_ms],
    [ts2].[io_stall] – [ts1].[io_stall] AS [io_stall],
    [ts2].[num_of_bytes_read] – [ts1].[num_of_bytes_read] AS [num_of_bytes_read],
    [ts2].[num_of_bytes_written] – [ts1].[num_of_bytes_written] AS [num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
    ON [ts2].[file_handle] = [ts1].[file_handle]
    WHERE [ts1].[file_handle] IS NOT NULL

    with this:

    SELECT
    — Diff of latencies in both snapshots
    [ts2].[database_id],
    [ts2].[file_id],
    [ts2].[num_of_reads] – COALESCE([ts1].[num_of_reads],0) AS [num_of_reads],
    [ts2].[io_stall_read_ms] – COALESCE([ts1].[io_stall_read_ms],0) AS [io_stall_read_ms],
    [ts2].[num_of_writes] – COALESCE([ts1].[num_of_writes],0) AS [num_of_writes],
    [ts2].[io_stall_write_ms] – COALESCE([ts1].[io_stall_write_ms],0) AS [io_stall_write_ms],
    [ts2].[io_stall] – COALESCE([ts1].[io_stall],0) AS [io_stall],
    [ts2].[num_of_bytes_read] – COALESCE([ts1].[num_of_bytes_read],0) AS [num_of_bytes_read],
    [ts2].[num_of_bytes_written] – COALESCE([ts1].[num_of_bytes_written],0) AS [num_of_bytes_written]
    FROM [##SQLskillsStats2] AS [ts2]
    LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
    ON [ts2].[file_handle] = [ts1].[file_handle]

  2. very cool, just this morning I was thinking about setting up something like this. You saved me the work (again) and coming from sqlskills I can also safely assume it’s working as expected :) Thx again sir!

  3. thanks Paul for the script. I ran it for 5 hours on one of my PROD instances to track an application job. Read latencies comes up extremely very much higher than what is reported from Perfmon counters of the same LUN and during the same time period. (i.e. Sec/Transfer, Sec/Write & Sec/Read). Is there a way to explain that .. or is there other factors that I can consider for more deep look at this?

    Thanks and best regards ..

  4. Hi Paul,

    I was wondering if we could replace the delay statement by a t-sql statement which we wanted to know the i/o latency for (assuming I’m on an idle instance, no users, jobs… but me). Or is that a bit short-sighted?

    Chrz
    Thierry

    1. You could, but only if you empty the buffer pool first using DBCC DROPCLEANBUFFERS. Note that this will not likely be a valid, run-time test as in a real workload some of the pages may already be in memory.

      1. Thx Paul for the fast reply.
        The reason I asked is I wanted a quick way to see the difference between emc xtremeio cache enabled/disabled.
        I executed dropcleanbuffers in between each run as you mention (and checked using dm_os_buffer_descriptors)
        Regards
        Thierry

  5. Great script, and is a useful tool we can use to be a “Canary in the coalmine” script we can use to anticipate severe I/O issues before they actually impact our online environment. Question, I’m noticing your script does not do a division across 30 minutes to scale it. Does that mean if I ran this in 5 minute increments, the data would still be the same? Sorry about the question and hopefully not missing anything here.

    1. Yup – no scaling division necessary. It’s giving you the aggregate write latencies over (say) 30 minutes divided by the number of writes over that time.

  6. Thanks very much. And also, to scale this into a trend, I was thinking about aggregating it by drive letter (e.g. use AVG on the values) but I’m thinking in OLTP, this may not be a good idea, and take the time to develop a good reporting mechanism to take that into account. Your thoughts on this?

    1. Totally depends what metrics you want. Be aware that if you’re doing AVG over all your collected results, you’re really just doing the same as looking at the output from the DMV without a time constraint.

  7. Hi Paul,

    I cannot explain how good that script is!
    I am SharePoint admin and I have used it with many customers.

    I like powershell and I would like to make a PowerShell version of this script, so it will be easier for other SharePoint folks. I imagine it something like this .

    I will publish the script in TechNet Gallery. Will include the copyright in the script help section and in the script description and will give credits and link to your blog when I publish it.

    Can I have your permission to do so?

    Thank you in advance!

    Ivan

    1. Hi Ivan, I took a quick look at your PowerShell script, you had pointed out that it captures data for the duration of the specified “Minutes” variable.

      But it seems that it is fed to a “for” command that simply counts backwards, waits for the specified duration then runs only once.

      The CSV result file is equally large no matter the number of minutes specified. (in my environment approximately 4KB)
      Am I correct or am I missing something????

      Either which way, good job.

      Best Regards

  8. I’m curious why you use the IO stall divided by the number of reads and writes as opposed to the number of bytes? Ozar does the same thing and when I asked him, the response I got was essentially, “because I know more about it than you.”

    It’s not a criticism, just a lack of understanding on my part.

    1. Yeah, that’s not a very useful/nice response. The io_stall is the total aggregate stall time for all reads and writes together. To get the average per IO stall time, divide total stall time by number of IOs. Using number of bytes would give the stall time per byte, which is meaningless. Hope that helps.

      1. I’m pulling the differential data into a spreadsheet and calculating MB/Sec based on bytes read/write and read/write stall. Am I calculating that incorrectly? Is it even a valid metric?
        At the moment, I’m getting stellar read performance but my write rate is abysmal, so I’m trying to dig in and find out why.

  9. Nice script! Thanks!

    Slightly simplified a few things, and corrected handling when divisor is zero to display Null rather than Zero. (Strictly NaN – but that is splitting sticks).

    If object_id(N’Tempdb..##SQLskillsStats1′) is not null
    DROP TABLE [##SQLskillsStats1];

    SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
    [num_of_writes], [io_stall_write_ms], [io_stall],
    [num_of_bytes_read], [num_of_bytes_written], [file_handle]
    INTO ##SQLskillsStats1
    FROM sys.dm_io_virtual_file_stats (NULL, NULL);
    GO

    WAITFOR DELAY ’00:30:00′;
    GO

    SELECT
    DB_NAME ([vfs].[database_id]) AS [DB],
    LEFT ([mf].[physical_name], 2) AS [Drive],
    [mf].[type_desc],
    [num_of_reads] AS [Reads],
    [num_of_writes] AS [Writes],
    [ReadLatency(ms)] = [io_stall_read_ms] / NullIf([num_of_reads],0),
    [WriteLatency(ms)] = [io_stall_write_ms] / NullIf([num_of_writes],0),
    [Latency] = [io_stall] / NullIf([num_of_reads] + [num_of_writes],0),
    [AvgBPerRead] = [num_of_bytes_read] / NullIf([num_of_reads],0),
    [AvgBPerWrite] = [num_of_bytes_written] / NullIf([num_of_writes],0),
    [AvgBPerTransfer] = ([num_of_bytes_read] + [num_of_bytes_written]) / NullIf([num_of_reads] + [num_of_writes],0),
    [mf].[physical_name]
    FROM (SELECT
    — Diff of latencies
    ts2.[database_id],
    ts2.[file_id],
    ts2.[num_of_reads] – Isnull(ts1.[num_of_reads],0) AS [num_of_reads],
    ts2.[io_stall_read_ms] – Isnull(ts1.[io_stall_read_ms],0) AS [io_stall_read_ms],
    ts2.[num_of_writes] – Isnull(ts1.[num_of_writes],0) AS [num_of_writes],
    ts2.[io_stall_write_ms] – Isnull(ts1.[io_stall_write_ms],0) AS [io_stall_write_ms],
    ts2.[io_stall] – Isnull(ts1.[io_stall],0) AS [io_stall],
    ts2.[num_of_bytes_read] – Isnull(ts1.[num_of_bytes_read],0) AS [num_of_bytes_read],
    ts2.[num_of_bytes_written] – Isnull(ts1.[num_of_bytes_written],0) AS [num_of_bytes_written]
    FROM (
    SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
    [num_of_writes], [io_stall_write_ms], [io_stall],
    [num_of_bytes_read], [num_of_bytes_written], [file_handle]
    FROM sys.dm_io_virtual_file_stats (NULL, NULL)
    ) AS ts2
    LEFT OUTER JOIN [##SQLskillsStats1] AS ts1
    ON ts2.[file_handle] = ts1.[file_handle]
    ) AS [vfs]
    JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
    — ORDER BY [ReadLatency(ms)] DESC
    ORDER BY [WriteLatency(ms)] DESC, [ReadLatency(ms)] DESC, DB;
    GO

    — Cleanup
    If object_id(N’Tempdb..##SQLskillsStats1′) is not null
    DROP TABLE [##SQLskillsStats1];

  10. Hi,

    Great Job!!!
    I created a JOB that insert’s in the table log each 1 minute the information using your approach:

    1) Create the table log:

    SELECT getdate() as Collection_time, [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
    [num_of_writes], [io_stall_write_ms], [io_stall],
    [num_of_bytes_read], [num_of_bytes_written], [file_handle]
    INTO tb_io_lat
    FROM sys.dm_io_virtual_file_stats (NULL, NULL)
    where 1=2

    2) Create the procedure or t-sql in the sqlserver JOB with this command to execute each 1 minute:

    INSERT INTO TB_LOG_LAT
    SELECT getdate(),[database_id], [file_id], [num_of_reads], [io_stall_read_ms],
    [num_of_writes], [io_stall_write_ms], [io_stall],
    [num_of_bytes_read], [num_of_bytes_written], [file_handle]
    FROM sys.dm_io_virtual_file_stats (NULL, NULL);

    When I want to see delta information, i execute the below query. This work fine for me.: ( Before someone asks, I put only one AND to exclude some initial non useful data.)

    SELECT [Collection_time],
    DB_NAME ([vfs].[database_id]) AS [DB],
    LEFT ([mf].[physical_name], 2) AS [Drive],
    [mf].[type_desc],
    — [num_of_reads_T],
    [num_of_reads] AS [Reads],
    — [num_of_writes_T],
    [num_of_writes] AS [Writes],
    — [io_stall_read_ms_T],
    [ReadLatency(ms)] =
    ROUND(CASE WHEN [num_of_reads] = 0
    THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END ,2,1),
    — [io_stall_write_ms_T],
    [WriteLatency(ms)] =
    ROUND(CASE WHEN [num_of_writes] = 0
    THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END ,2,1),
    /*[Latency] =
    CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
    THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,*/
    [AvgBPerRead] =
    ROUND(CASE WHEN [num_of_reads] = 0
    THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END ,2,1),
    [AvgBPerWrite] =
    ROUND(CASE WHEN [num_of_writes] = 0
    THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END,2,1),
    [mf].[physical_name]
    FROM (
    SELECT [Collection_time],
    database_id,
    file_id ,
    num_of_reads as num_of_reads_T,
    case when num_of_reads = num_of_reads – LAG(num_of_reads, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    num_of_reads – LAG(num_of_reads, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] )END AS [num_of_reads],
    num_of_writes as num_of_writes_T,
    case when num_of_writes = num_of_writes – LAG(num_of_writes, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    num_of_writes – LAG(num_of_writes, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS num_of_writes,
    io_stall_read_ms as io_stall_read_ms_T,
    case when io_stall_read_ms = io_stall_read_ms – LAG(io_stall_read_ms, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    io_stall_read_ms – LAG(io_stall_read_ms, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS io_stall_read_ms,
    io_stall_write_ms as io_stall_write_ms_T,
    case when io_stall_write_ms = io_stall_write_ms – LAG(io_stall_write_ms, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    io_stall_write_ms – LAG(io_stall_write_ms, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS io_stall_write_ms,
    case when io_stall = io_stall – LAG(io_stall, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    io_stall – LAG(io_stall, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS io_stall,
    case when num_of_bytes_read = num_of_bytes_read – LAG(num_of_bytes_read, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    num_of_bytes_read – LAG(num_of_bytes_read, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS num_of_bytes_read,
    case when num_of_bytes_written = num_of_bytes_written – LAG(num_of_bytes_written, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) then -9999999999 else
    num_of_bytes_written – LAG(num_of_bytes_written, 1,0) OVER (PARTITION BY [database_id],[file_id] ORDER BY [collection_time],[database_id],[file_id] ) END AS num_of_bytes_written
    FROM [dbserv].[dbo].[tb_io_lat] ) AS [vfs]
    JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
    WHERE DB_NAME ([vfs].[database_id]) NOT IN (‘dbserv’,’model’,’FIMANAGERBD_CAOA_HOMOLOG’,’FIMANAGERBD_CAOA_TREINAMENTO’,’Caoa_Homologacao2′)
    AND [num_of_reads] -9999999999
    — ORDER BY [ReadLatency(ms)] DESC
    ORDER BY [Collection_time],[WriteLatency(ms)] DESC;

    Thank you..

    Wellington Prado

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.