Today, we will go through four queries that are related to storage. We start out with Query #23, which is Volume Info. This query retrieves information from the sys.master_files catalog view and the sys.dm_os_volume_stats dynamic management function about any operating system volumes or mount points on your server where you have any SQL Server database files. Query #23 is shown in Figure 1.
1: -- Volume info for all LUNS that have database files on the current instance (Query 23) (Volume Info)
2: SELECT DISTINCT vs.volume_mount_point, vs.file_system_type,
3: vs.logical_volume_name, CONVERT(DECIMAL(18,2),vs.total_bytes/1073741824.0) AS [Total Size (GB)],
4: CONVERT(DECIMAL(18,2),vs.available_bytes/1073741824.0) AS [Available Size (GB)],
5: CAST(CAST(vs.available_bytes AS FLOAT)/ CAST(vs.total_bytes AS FLOAT) AS DECIMAL(18,2)) * 100 AS [Space Free %]
6: FROM sys.master_files AS f WITH (NOLOCK)
7: CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.[file_id]) AS vs
8: ORDER BY vs.volume_mount_point OPTION (RECOMPILE);
10: -- Shows you the total and free space on the LUNs where you have database files
11: -- Being low on free space can negatively affect performance
Figure 1: Query #23 Volume Info
This query shows you how large each operating system volume or mount point is, and how much space is available. You don’t ever want to run out of space on a volume where SQL Server database files are located (for obvious reasons), but many people are not aware that being low on space can affect the performance of the volume, with both magnetic storage and with flash storage.
With magnetic storage, being low on disk space means that you may be no longer getting the increased disk performance possible from short-stroking. This can affect your disk performance with magnetic drives because the arm that moves the drive heads back and forth to read the data has to move further as an individual disk becomes more and more full, plus the disk platters are moving slower on the inside portion compared to the outside portion of the disk.
The reason why I say that this only “may” be affecting your disk performance is because with many SANs, whatever space has been allocated for a volume may be spread across all of the available drives in the SAN, so the fact that your volume is low on space may have very little to do with what is happening with the overall SAN.
With flash storage, as the drive gets lower on space, it becomes more difficult for garbage collection and TRIM operations to complete without affecting performance. Generally speaking, your write performance will tend to decrease and become more inconsistent as the drive is closer to being full of data.
Query #24 is Drive Level Latency. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #24 is shown in Figure 2.
1: -- Drive level latency information (Query 24) (Drive Level Latency)
2: -- Based on code from Jimmy May
3: SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
5: WHEN num_of_reads = 0 THEN 0
6: ELSE (io_stall_read_ms/num_of_reads)
7: END AS [Read Latency],
9: WHEN io_stall_write_ms = 0 THEN 0
10: ELSE (io_stall_write_ms/num_of_writes)
11: END AS [Write Latency],
13: WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
14: ELSE (io_stall/(num_of_reads + num_of_writes))
15: END AS [Overall Latency],
17: WHEN num_of_reads = 0 THEN 0
18: ELSE (num_of_bytes_read/num_of_reads)
19: END AS [Avg Bytes/Read],
21: WHEN io_stall_write_ms = 0 THEN 0
22: ELSE (num_of_bytes_written/num_of_writes)
23: END AS [Avg Bytes/Write],
25: WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
26: ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
27: END AS [Avg Bytes/Transfer]
28: FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
29: SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
30: SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
31: SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
32: FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
33: INNER JOIN sys.master_files AS mf WITH (NOLOCK)
34: ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
35: CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
36: GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
37: ORDER BY [Overall Latency] OPTION (RECOMPILE);
39: -- Shows you the drive-level latency for reads and writes, in milliseconds
40: -- Latency above 20-25ms is usually a problem
Figure 2: Query #24 Drive Level Latency
This particular query aggregates all of the file activity for your database data and log files up to the volume level. This activity includes your normal workload, plus anything else (such as index maintenance, database backups, log reader activity for things like replication, database mirroring, and AlwaysOn AGs) that touches your database data and log files.
These numbers are cumulative since SQL Server was last started. This means that the latency numbers you see here will usually tend to be a little higher than what you might see in Performance Monitor, and also higher than what shows up at the SAN-level. This query helps you understand your disk performance for SQL Server at the volume level. Once you understand this, then you are ready to drill down to the individual database file level with the next query.
Query #25 is IO Stalls by File. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #25 is shown in Figure 3.
1: -- Calculates average stalls per read, per write, and per total input/output for each database file (Query 25) (IO Stalls by File)
2: SELECT DB_NAME(fs.database_id) AS [Database Name], CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
3: CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
4: CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)/(1.0 + fs.num_of_reads + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms],
5: CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)], mf.physical_name, mf.type_desc, fs.io_stall_read_ms, fs.num_of_reads,
6: fs.io_stall_write_ms, fs.num_of_writes, fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls], fs.num_of_reads + fs.num_of_writes AS [total_io],
7: io_stall_queued_read_ms AS [Resource Governor Total Read IO Latency (ms)], io_stall_queued_write_ms AS [Resource Governor Total Write IO Latency (ms)]
8: FROM sys.dm_io_virtual_file_stats(null,null) AS fs
9: INNER JOIN sys.master_files AS mf WITH (NOLOCK)
10: ON fs.database_id = mf.database_id
11: AND fs.[file_id] = mf.[file_id]
12: ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);
14: -- Helps determine which database files on the entire instance have the most I/O bottlenecks
15: -- This can help you decide whether certain LUNs are overloaded and whether you might
16: -- want to move some files to a different location or perhaps improve your I/O performance
Figure 3: Query #25 IO Stalls by File
This query is similar to the previous query, with all of the same caveats. This query goes down to the individual database file level rather than rolling up the data to the volume level. Once you see high read or write latency for a particular volume, you will want to then determine which database file(s) are seeing the highest latency on that volume. This will help you decide what you might be able to do to improve the situation.
Query #26 is IO Warnings. This query reads the five most recent SQL Server Error logs to look for 15 second I/O warnings. Query #26 is shown in Figure 4.
1: -- Look for I/O requests taking longer than 15 seconds in the five most recent SQL Server Error Logs (Query 26) (IO Warnings)
2: CREATE TABLE #IOWarningResults(LogDate datetime, ProcessInfo sysname, LogText nvarchar(1000));
4: INSERT INTO #IOWarningResults
5: EXEC xp_readerrorlog 0, 1, N'taking longer than 15 seconds';
7: INSERT INTO #IOWarningResults
8: EXEC xp_readerrorlog 1, 1, N'taking longer than 15 seconds';
10: INSERT INTO #IOWarningResults
11: EXEC xp_readerrorlog 2, 1, N'taking longer than 15 seconds';
13: INSERT INTO #IOWarningResults
14: EXEC xp_readerrorlog 3, 1, N'taking longer than 15 seconds';
16: INSERT INTO #IOWarningResults
17: EXEC xp_readerrorlog 4, 1, N'taking longer than 15 seconds';
19: SELECT LogDate, ProcessInfo, LogText
20: FROM #IOWarningResults
21: ORDER BY LogDate DESC;
23: DROP TABLE #IOWarningResults;
25: -- Finding 15 second I/O warnings in the SQL Server Error Log is useful evidence of
26: -- poor I/O performance (which might have many different causes)
28: -- Diagnostics in SQL Server help detect stalled and stuck I/O operations
29: -- https://support.microsoft.com/en-us/kb/897284
Figure 4: Query #26 IO Warnings
This query may take some time to complete if your SQL Server Error log files are very large. Finding any 15 second I/O warnings is very strong evidence that you were seeing very poor storage subsystem performance when they were recorded. The warnings go down to the database file level, so you can see which drive and which file was affected.
What you want to look for is whether there is any sort of recognizable pattern to when you are getting these warnings. For example, perhaps you often see a series of 15 second warnings for the data files for one database at around 3AM each morning. Perhaps this is when you are doing index maintenance, plus running DBCC CHECKDB at the same time, or some other repeating pattern of scheduled disk intensive activity.
On the other hand, it may be the case that you see many 15 second I/O warnings at random times, spread across many drive volumes and disk files. This would be a pretty good indicator of generally poor storage subsystem performance!
The results of these four queries are very useful to have available when you talk to your storage administrator about disk performance. The more evidence and metrics you have available about what SQL Server is seeing from a storage perspective, the stronger your case will be.