Over the last few months I’ve been lecturing at classes and conferences about getting SQL Server’s view of the I/O subsystem and the latencies it is experiencing, so it’s time for a blog post to help everyone else.

Most SQL Servers today are I/O bound – that’s a generally agreed-on statement by DBAs and consultants in the field. This means that the major factor in server performance is the ability to perform I/O operations quickly. If the I/O subsystem cannot keep up with the demand being placed on it, then the SQL Server workload will suffer performance problems.

Now, having said that, one trap that many people fall into is equating increased I/O subsystem latencies with poor I/O subsystem performance. This is often not the case at all. It’s usually the case that the I/O subsystem performs fine when the designed-for I/O workload is happening, but becomes the performance bottleneck when the I/O workload increases past the I/O subsystem design point. The I/O workload increase is what’s causing the problem, not the I/O subsystem – if you design an I/O subsystem to support 1000 IOPS (I/O operations per second) and SQL Server is trying to push 2000 IOPS, performance is going to suffer.

If you find that I/O latency has increased, look to a change in SQL Server behavior before blaming the I/O subsystem. For instance:

  • Query plan changes (from out-of-date statistics, code changes, implicit conversions, or poor indexing) that cause table scans rather than index seeks
  • Additional indexes being created that cause increased index maintenance workload and logging
  • Access pattern/key changes that cause page splits and hence extra page reads/writes and logging
  • Adding change data capture or change tracking that causes extra writes and logging
  • Enabling snapshot isolation that causes tempdb I/Os, plus potentially page splits
  • Decreased server memory leading to a smaller buffer pool and increased lazy writer and read activity

And a whole host of other reasons can lead to an increased I/O workload where it’s not the I/O subsystem’s fault.
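
A few of these causes are easy to sanity-check from SQL Server itself before pointing fingers at the storage. As a minimal sketch (not part of my health-check script), the query below pulls some of the relevant counters from sys.dm_os_performance_counters – note that the “/sec” counters here are cumulative values since the instance started, so to get a rate you’d sample twice and take the difference:

-- Hedged sketch: counters that often explain an increased I/O workload.
-- The "/sec" counters are cumulative; sample twice and diff to get a per-second rate.
SELECT
    [object_name],
    [counter_name],
    [cntr_value]
FROM sys.dm_os_performance_counters
WHERE ([object_name] LIKE N'%Access Methods%'
        AND [counter_name] IN (N'Page Splits/sec', N'Full Scans/sec'))
    OR ([object_name] LIKE N'%Buffer Manager%'
        AND [counter_name] IN (N'Lazy writes/sec', N'Page reads/sec', N'Page writes/sec'));
GO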

On the other hand, however, it may very well be the I/O subsystem that has an issue if the SQL Server workload hasn’t changed. For instance, a SAN administrator may have given some of the capacity underlying one of the SQL Server LUNs to another server, so the same physical drives are now servicing two workloads and can become overloaded.

Using performance counters, you can see latencies at the physical disk level (the Avg. Disk sec/Read and Avg. Disk sec/Write counters), but if your databases are spread across a few LUNs, that doesn’t help you pinpoint which database files are experiencing latency issues and driving the most I/Os to the I/O subsystem.

This is where sys.dm_io_virtual_file_stats comes in. It was introduced in SQL Server 2005 as a beefed-up replacement for fn_virtualfilestats and shows you how many I/Os have occurred, with latencies, for all files. You can give it a database ID and a file ID, but I find it most useful to look at all the files on the server and order by one of the statistics.
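
For example, a call for a single database and file (file ID 2 being the log file for a database with the default file layout – master here is just an example) looks like this:

SELECT * FROM sys.dm_io_virtual_file_stats (DB_ID (N'master'), 2);
GO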

Here’s the default output (with some column names changed slightly to make things fit nicely):

SELECT * FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

 

db_id file_id sample_ms reads bytes_read io_stall_read_ms writes bytes_written io_stall_write_ms io_stall size_on_disk_bytes file_handle
----- ------- --------- ----- ---------- ---------------- ------ ------------- ----------------- -------- ------------------ ------------------
1     1       14433009  52    3227648    58               10     90112         12                70       515964928          0x000000000000013C
1     2       14433009  55    1241088    36               28     122880        115               151      12648448           0x0000000000000474
2     1       14433009  22    1261568    22               4      147456        1                 23       8388608            0x000000000000065C
2     2       14433009  6     385024     2                7      154112        3                 5        524288             0x0000000000000660
3     1       14433009  31    1859584    18               1      8192          0                 18       1310720            0x0000000000000524
3     2       14433009  6     385024     5                3      16384         1                 6        524288             0x000000000000056C
4     1       14433009  139   8175616    369              13     106496        45                414      260767744          0x0000000000000958
4     2       14433009  136   712704     485              7      53248         25                510      104595456          0x00000000000009F0
5     1       14433009  20    1138688    204              0      0             0                 204      419430400          0x0000000000000980
5     2       14433009  13    411136     466              1      2048          13                479      104857600          0x0000000000000A44
6     1       14433009  15    811008     81               1      8192          8                 89       1310720            0x0000000000000954
6     2       14433009  7     386560     89               4      11264         80                169      516096             0x00000000000009EC
7     1       14433009  15    811008     80               1      8192          9                 89       1310720            0x0000000000000970
7     2       14433009  7     386560     130              3      9216          17                147      516096             0x00000000000009B4
8     1       14433009  16    876544     37               1      8192          103               140      136577024          0x0000000000000988
8     2       14433009  9     229888     11               4      10752         59                70       5242880            0x00000000000009E4
.
.

This isn’t that useful because a) I don’t have database IDs memorized and there are no file paths shown, and b) it gives total stall times (io_stall_read_ms and io_stall_write_ms) rather than average latencies – you have to divide by the number of I/Os yourself (e.g. database ID 4, file ID 2 above has 485 ms of total read stall over 136 reads, which is roughly 3.5 ms per read).

What I usually do is use the script below – it’s part of my standard set of scripts I use when doing a server health check for a client. It’s based in part on code from my good friend Jimmy May (blog|twitter), with a bunch of tweaks. It allows me to filter on read or write latencies and it joins with sys.master_files to get database names and file paths.

SELECT
    --virtual file latency
    [ReadLatency] =
        CASE WHEN [num_of_reads] = 0
            THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END,
    [WriteLatency] =
        CASE WHEN [num_of_writes] = 0
            THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,
    [Latency] =
        CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
            THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,
    --avg bytes per IOP
    [AvgBPerRead] =
        CASE WHEN [num_of_reads] = 0
            THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END,
    [AvgBPerWrite] =
        CASE WHEN [num_of_writes] = 0
            THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END,
    [AvgBPerTransfer] =
        CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
            THEN 0 ELSE
                (([num_of_bytes_read] + [num_of_bytes_written]) /
                ([num_of_reads] + [num_of_writes])) END,
    LEFT ([mf].[physical_name], 2) AS [Drive],
    DB_NAME ([vfs].[database_id]) AS [DB],
    --[vfs].*,
    [mf].[physical_name]
FROM
    sys.dm_io_virtual_file_stats (NULL,NULL) AS [vfs]
JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
-- WHERE [vfs].[file_id] = 2 -- log files
-- ORDER BY [Latency] DESC
-- ORDER BY [ReadLatency] DESC
ORDER BY [WriteLatency] DESC;
GO

 

ReadLatency WriteLatency Latency AvgBPerRead AvgBPerWrite AvgBPerTransfer Drive DB                      physical_name
----------- ------------ ------- ----------- ------------ --------------- ----- ----------------------- ------------------------------------------
2           103          8       54784       8192         52043           D:    HotSpot                 D:\SQLskills\HotSpot_data.mdf
7           48           13      63061       5632         54857           C:    DemoCorruptSystemTables C:\SQLskills\corrupt_log.LDF
32          41           33      43861       2048         39680           D:    DemoSystemIndex3        D:\SQLskills\DemoSystemIndex3_LOG.ldf
27          40           28      43861       1024         39577           D:    DemoSystemIndex5        D:\SQLskills\DemoSystemIndex5_LOG.ldf
23          39           24      43861       7168         40192           D:    DemoSystemIndex4        D:\SQLskills\DemoSystemIndex4_LOG.ldf
7           38           10      43861       4608         39936           D:    DemoSystemIndex1        D:\SQLskills\DemoSystemIndex1_LOG.ldf
11          25           12      43861       7168         40192           C:    FSWaits                 C:\SQLskills\FSWaits_log.ldf
12          20           13      55222       6656         49152           C:    newdatabase             C:\SQLskills\newdatabase_log.LDF
12          20           15      55222       2816         36165           C:    LockResDemo             C:\SQLskills\LockResDemo_log.LDF
14          17           15      45933       2816         30254           C:    TempdbTest              C:\SQLskills\TempdbTest_log.LDF
1           14           5       25543       2688         18510           C:    HotSpot                 C:\SQLskills\HotSpot_log.ldf
40          14           39      31625       7680         29915           D:    RecompileTest           D:\SQLskills\RecompileTest_log.ldf
9           13           10      51858       7680         46336           D:    DemoFatalCorruption1    D:\SQLskills\DemoFatalCorruption1_log.ldf
35          13           34      31625       2048         29513           D:    GUIDTest                D:\SQLskills\GUIDTest_log.ldf
12          13           12      16861       1024         15872           C:    FNDBLogTest             C:\SQLskills\FNDBLogTest_log.LDF
.
.

This is much more useful as it allows me to quickly see where the read and write hot spots are, then drill into a database to see what’s going on, and if nothing out of the ordinary is happening, ask the SAN admin to move those hot-spot files to dedicated and/or faster storage.
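
One thing to bear in mind is that the DMV counters are cumulative since the instance last started, so a file that had one bad period weeks ago can still show a high average latency. To measure latencies for a specific interval, you can snapshot the DMV, wait, and then diff against a second read. Here’s a minimal sketch of that idea – the temp table name and the 5-minute interval are just examples:

-- Hedged sketch: capture per-file read/write latency for an interval
-- rather than the averages since the instance started.
IF OBJECT_ID (N'tempdb..#Snapshot') IS NOT NULL
    DROP TABLE [#Snapshot];
GO

SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
       [num_of_writes], [io_stall_write_ms]
INTO [#Snapshot]
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO

WAITFOR DELAY N'00:05:00'; -- the measurement interval (example only)
GO

SELECT
    DB_NAME ([vfs].[database_id]) AS [DB],
    [mf].[physical_name],
    [ReadLatency] =
        CASE WHEN ([vfs].[num_of_reads] - [s].[num_of_reads]) = 0 THEN 0
            ELSE ([vfs].[io_stall_read_ms] - [s].[io_stall_read_ms]) /
                 ([vfs].[num_of_reads] - [s].[num_of_reads]) END,
    [WriteLatency] =
        CASE WHEN ([vfs].[num_of_writes] - [s].[num_of_writes]) = 0 THEN 0
            ELSE ([vfs].[io_stall_write_ms] - [s].[io_stall_write_ms]) /
                 ([vfs].[num_of_writes] - [s].[num_of_writes]) END
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs]
JOIN [#Snapshot] AS [s]
    ON [vfs].[database_id] = [s].[database_id]
    AND [vfs].[file_id] = [s].[file_id]
JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
ORDER BY [ReadLatency] DESC;
GO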

One question I’m sure you’re going to ask is “what is an acceptable read or write latency?” and the answer is a big “it depends!”. It depends on the I/O subsystem and how it’s been configured. The key is producing a performance baseline for when things are running acceptably well and then seeing where the results from the DMV deviate from your baseline. On well-configured storage which isn’t being overloaded I’d expect to see single-digit ms for either read or write latency, but this will vary based on the rotation speed and technology of the drives (SCSI vs SATA vs SSD etc).
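
A simple way to start building that baseline is to capture the raw DMV output on a schedule (an Agent job, for instance) into a table and compare later results against it. A minimal sketch, where the table name and the idea of keeping the raw columns are just assumptions:

-- Hedged sketch: persist the raw DMV output with a timestamp so today's
-- latencies can be compared against a known-good baseline later.
IF OBJECT_ID (N'dbo.FileStatsBaseline') IS NULL
    SELECT GETDATE () AS [CaptureTime], *
    INTO [dbo].[FileStatsBaseline]
    FROM sys.dm_io_virtual_file_stats (NULL, NULL)
    WHERE 1 = 0; -- create an empty table with the right schema
GO

INSERT INTO [dbo].[FileStatsBaseline]
SELECT GETDATE (), *
FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO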

Enjoy!