A quick and somewhat nuanced tip today… When evaluating sys.dm_io_virtual_file_stats against collected perfmon counter data, be careful not to erroneously invalidate the latency findings by measuring and comparing against the averages of already-averaged collected performance monitor data (e.g. Avg. Disk sec/Write, Avg. Disk sec/Read ). I’ve seen scenarios where there is true I/O latency that has been identified in dm_io_virtual_file_stats, but when the DBA goes to other teams to discuss, they lose the argument due to bad comparisons.
To illustrate the point, I created a new database on one of my older, non-RAID USB drives:
CREATE DATABASE [SampleIOHammering] CONTAINMENT = NONE ON PRIMARY ( NAME = N'SampleIOHammering', FILENAME = N'F:\SampleIOHammering.mdf' , SIZE = 8388608KB , FILEGROWTH = 1048576KB ) LOG ON ( NAME = N'SampleIOHammering_log', FILENAME = N'F:\SampleIOHammering_log.ldf' , SIZE = 1048576KB , FILEGROWTH = 1048576KB ); GO ALTER DATABASE [SampleIOHammering] SET RECOVERY SIMPLE; GO
I’ve intentionally used slow disk and the database creation operation took 7 minutes and 5 seconds (no IFI being used).
Immediately after creating the database, I executed the following query (concentrating on write activity):
SELECT file_id, io_stall_write_ms, num_of_writes FROM sys.dm_io_virtual_file_stats(DB_ID('SampleIOHammering'), NULL);
This shows the following results – which don’t truly represent the full lifecycle of the file creation and the associated latency:
How can I tell that the accumulated stats are representative of the actual database creation? I did so through associated perfmon counters which were being collected via a data collector set concurrently… Here were the averages for the drive over the database creation period (and this is the average over the averages):
So we’re definitely seeing latency that is beyond the recommended number represented in average disk seconds per write, and if anything, that latency isn’t yet fully reflected in the virtual file stats DMV (yet).
Next I created a new table and loaded it with 1.6 million rows from another database:
USE [SampleIOHammering]; GO CREATE TABLE [dbo].[charge] ( [charge_no] INT NOT NULL, [member_no] INT NOT NULL, [provider_no] INT NOT NULL, [category_no] INT NOT NULL, [charge_dt]2021-09-26, 30NOT NULL, [charge_amt] [money] NOT NULL, [statement_no] INT NOT NULL, [charge_code] CHAR(2) NOT NULL ); GO INSERT [dbo].[charge] SELECT .[charge_no], .[member_no], .[provider_no], .[category_no], .[charge_dt], .[charge_amt], .[statement_no], .[charge_code] FROM [Credit].[dbo].[charge] AS c; GO
This shows the following virtual stats results:
Unlike the database creation, we’re now seeing some real stat accumulations. If we divide the IO stall write ms by the number of writes, we see about 152 ms as the average latency for the transaction log file (file 2).
And now if we look at perfmon counters, we see the following associated latency data averages across the collection time period for my INSERT:
Focusing on Avg. Disk sec/Write – we see 52 milliseconds certainly isn’t ideal – but at this point you may have the storage or Windows admin say something like “This doesn’t match your average of 152 milliseconds… Your DMV must be wrong.”
But what if you switched from using an average of the perfmon data to the maximum?
Making this change shifts the numbers as follows:
Notice now that the maximum average disk seconds per write is 156 milliseconds at peak – which is more in the ballpark of our 152 millisecond average we saw from a virtual file stats perspective. Granted it is the maximum measurement – but it brings us back into alignment with the two different data sources. Again, this is a nuanced point – but I’ve noticed that sometimes non-SQL folks are doubtful about the SQL perspective of things, and so it helps to make sure you are comparing data in a fair way.