sqlskills-logo-2015-white.png

Avoid false negatives when comparing sys.dm_io_virtual_file_stats data to perfmon counter data

 

A quick and somewhat nuanced tip today… When evaluating sys.dm_io_virtual_file_stats against collected perfmon counter data, be careful not to erroneously invalidate the latency findings by measuring and comparing against the averages of already-averaged collected performance monitor data (e.g. Avg. Disk sec/Write, Avg. Disk sec/Read ).  I’ve seen scenarios where there is true I/O latency that has been identified in dm_io_virtual_file_stats, but when the DBA goes to other teams to discuss, they lose the argument due to bad comparisons.

To illustrate the point, I created a new database on one of my older, non-RAID USB drives:

CREATE DATABASE [SampleIOHammering]
CONTAINMENT = NONE
ON  PRIMARY
( NAME = N'SampleIOHammering',
FILENAME = N'F:\SampleIOHammering.mdf' ,
SIZE = 8388608KB , FILEGROWTH = 1048576KB )
LOG ON
( NAME = N'SampleIOHammering_log',
FILENAME = N'F:\SampleIOHammering_log.ldf' ,
SIZE = 1048576KB ,
FILEGROWTH = 1048576KB );
GO

ALTER DATABASE [SampleIOHammering] SET RECOVERY SIMPLE;
GO

I’ve intentionally used slow disk and the database creation operation took 7 minutes and 5 seconds (no IFI being used).

Immediately after creating the database, I executed the following query (concentrating on write activity):

SELECT  file_id,
io_stall_write_ms,
num_of_writes
FROM    sys.dm_io_virtual_file_stats(DB_ID('SampleIOHammering'), NULL);

This shows the following results – which don’t truly represent the full lifecycle of the file creation and the associated latency:

file_id io_stall_write_ms num_of_writes
1 76 10
2 25 15

How can I tell that the accumulated stats are representative of the actual database creation?  I did so through associated perfmon counters which were being collected via a data collector set concurrently… Here were the averages for the drive over the database creation period (and this is the average over the averages):

image

So we’re definitely seeing latency that is beyond the recommended number represented in average disk seconds per write, and if anything, that latency isn’t yet fully reflected in the virtual file stats DMV (yet).

Next I created a new table and loaded it with 1.6 million rows from another database:

USE [SampleIOHammering];
GO

CREATE TABLE [dbo].[charge]
(
[charge_no] INT NOT NULL,
[member_no] INT NOT NULL,
[provider_no] INT NOT NULL,
[category_no] INT NOT NULL,
[charge_dt] [datetime] NOT NULL,
[charge_amt] [money] NOT NULL,
[statement_no] INT NOT NULL,
[charge_code] CHAR(2) NOT NULL
);

GO

INSERT  [dbo].[charge]
SELECT  [c].[charge_no],
[c].[member_no],
[c].[provider_no],
[c].[category_no],
[c].[charge_dt],
[c].[charge_amt],
[c].[statement_no],
[c].[charge_code]
FROM    [Credit].[dbo].[charge] AS c;
GO

This shows the following virtual stats results:

file_id io_stall_write_ms num_of_writes
1 8917 79
2 575633 3783

Unlike the database creation, we’re now seeing some real stat accumulations. If we divide the IO stall write ms by the number of writes, we see about 152 ms as the average latency for the transaction log file (file 2).

And now if we look at perfmon counters, we see the following associated latency data averages across the collection time period for my INSERT:

image

Focusing on Avg. Disk sec/Write – we see 52 milliseconds certainly isn’t ideal – but at this point you may have the storage or Windows admin say something like “This doesn’t match your average of 152 milliseconds…  Your DMV must be wrong.”

But what if you switched from using an average of  the perfmon data to the maximum?

image

Making this change shifts the numbers as follows:

image

Notice now that the maximum average disk seconds per write is 156 milliseconds at peak – which is more in the ballpark of our 152 millisecond average we saw from a virtual file stats perspective.  Granted it is the maximum measurement – but it brings us back into alignment with the two different data sources. Again, this is a nuanced point – but I’ve noticed that sometimes non-SQL folks are doubtful about the SQL perspective of things, and so it helps to make sure you are comparing data in a fair way.

3 thoughts on “Avoid false negatives when comparing sys.dm_io_virtual_file_stats data to perfmon counter data

  1. Joe, so what’s the best tool to gather io stats then? From this article it seems perfmon is off compared to what dm_io_virtual_stats is telling us?

    1. Hi Eric,

      I like to use a combination of sources – still including Perfmon counters, Virtual File Stats and wait time stats. Three different angles – but all useful. I don’t like to rely on one pointer when going down the I/O troubleshooting path.

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.