SQL Server Diagnostic Information Queries Detailed, Day 18

For Day 18 of this series, we start out with Query #41, which is Memory Clerk Usage. This query retrieves information from the sys.dm_os_memory_clerks dynamic management view about total memory usage by your active memory clerks. Query #41 is shown in Figure 1.

-- Memory Clerk Usage for instance  (Query 41) (Memory Clerk Usage)
-- Look for high value for CACHESTORE_SQLCP (Ad-hoc query plans)
SELECT TOP(10) mc.[type] AS [Memory Clerk Type],
       CAST((SUM(mc.pages_kb)/1024.0) AS DECIMAL (15,2)) AS [Memory Usage (MB)]
FROM sys.dm_os_memory_clerks AS mc WITH (NOLOCK)
GROUP BY mc.[type]
ORDER BY SUM(mc.pages_kb) DESC OPTION (RECOMPILE);

-- MEMORYCLERK_SQLBUFFERPOOL was new for SQL Server 2012. It should be your highest consumer of memory

-- CACHESTORE_SQLCP  SQL Plans
-- These are cached SQL statements or batches that aren't in stored procedures, functions and triggers
-- Watch out for high values for CACHESTORE_SQLCP

-- CACHESTORE_OBJCP  Object Plans
-- These are compiled plans for stored procedures, functions and triggers

Figure 1: Query #41 Memory Clerk Usage

This query shows you which memory clerks are using the most memory on your instance. With SQL Server 2012 or newer, your top memory clerk by memory usage should be MEMORYCLERK_SQLBUFFERPOOL, meaning memory usage by the SQL Server Buffer Pool. It is very common to see a high value for the CACHESTORE_SQLCP memory clerk, indicating that you have multiple GB of cached ad hoc or prepared query plans in the plan cache. If you see that, you should look more closely at the next query, which suggests several things you can do to help mitigate this issue.

 

Query #42 is Ad hoc Queries. This query retrieves information from the sys.dm_exec_cached_plans dynamic management view and the sys.dm_exec_sql_text dynamic management function about the single-use ad hoc and prepared query plans. Query #42 is shown in Figure 2.

-- Find single-use, ad-hoc and prepared queries that are bloating the plan cache  (Query 42) (Ad hoc Queries)
SELECT TOP(50) [text] AS [QueryText], cp.cacheobjtype, cp.objtype, cp.size_in_bytes/1024 AS [Plan Size in KB]
FROM sys.dm_exec_cached_plans AS cp WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE cp.cacheobjtype = N'Compiled Plan'
AND cp.objtype IN (N'Adhoc', N'Prepared')
AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC OPTION (RECOMPILE);

-- Gives you the text, type and size of single-use ad-hoc and prepared queries that waste space in the plan cache
-- Enabling 'optimize for ad hoc workloads' for the instance can help (SQL Server 2008 and above only)
-- Running DBCC FREESYSTEMCACHE ('SQL Plans') periodically may be required to better control this
-- Enabling forced parameterization for the database can help, but test first!

-- Plan cache, adhoc workloads and clearing the single-use plan cache bloat
-- http://www.sqlskills.com/blogs/kimberly/plan-cache-adhoc-workloads-and-clearing-the-single-use-plan-cache-bloat/

Figure 2: Query #42 Ad hoc Queries

This query will show you which single-use ad hoc or prepared query plans are using the most space in the plan cache. Once you know who the culprits are, you can start investigating them more closely. Perhaps these queries can be converted to stored procedures or parameterized SQL. At the very least, I think you should enable “optimize for ad hoc workloads” at the instance level pretty much as a default setting. On top of this, it is usually a good idea to periodically flush that particular cache, using the DBCC FREESYSTEMCACHE (‘SQL Plans’); command.
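If you want to try both of those mitigations, here is a minimal sketch of the commands involved. The sp_configure change is instance-wide, so test it somewhere non-critical first:

-- Enable 'optimize for ad hoc workloads' at the instance level
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Periodically flush just the cached SQL plans (not the entire plan cache)
DBCC FREESYSTEMCACHE ('SQL Plans');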

 

Query #43 is Top Logical Reads Queries. This query retrieves information from the sys.dm_exec_query_stats dynamic management view, the sys.dm_exec_sql_text dynamic management function and the sys.dm_exec_query_plan dynamic management function about the cached query plans that have the highest total logical reads. Query #43 is shown in Figure 3.

-- Get top total logical reads queries for entire instance (Query 43) (Top Logical Reads Queries)
SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], LEFT(t.[text], 50) AS [Short Query Text],
qs.total_logical_reads AS [Total Logical Reads],
qs.min_logical_reads AS [Min Logical Reads],
qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
qs.max_logical_reads AS [Max Logical Reads],
qs.min_worker_time AS [Min Worker Time],
qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
qs.max_worker_time AS [Max Worker Time],
qs.min_elapsed_time AS [Min Elapsed Time],
qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
qs.max_elapsed_time AS [Max Elapsed Time],
qs.execution_count AS [Execution Count], qs.creation_time AS [Creation Time]
--,t.[text] AS [Complete Query Text], qp.query_plan AS [Query Plan] -- uncomment these columns if not copying results to Excel
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC OPTION (RECOMPILE);

-- Helps you find the most expensive queries from a memory perspective across the entire instance
-- Can also help track down parameter sniffing issues

Figure 3: Query #43 Top Logical Reads Queries

Having logical reads means that you are finding the data you need in the SQL Server Buffer Pool, rather than having to go out to the storage subsystem, which is a good thing. Even so, queries with high numbers of logical reads create extra internal memory pressure on your system. They also indirectly create read I/O pressure, since the data in the buffer pool initially has to be read from the storage subsystem. If you are seeing signs of memory pressure, then knowing which cached queries (across the entire instance) have the highest number of total logical reads can help you understand which queries are causing the most memory pressure.

Once you understand this, then you can start looking at individual queries in more detail. Perhaps there is a missing index that is causing a clustered index scan that is causing high numbers of logical reads in a query. Perhaps there is an implicit conversion in a JOIN or in a WHERE clause that is causing SQL Server to ignore a useful index. Maybe someone is pulling back more columns than they need for a query. There are lots of possibilities here.
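As a simple illustration of the implicit conversion problem, consider a hypothetical dbo.Orders table with a nonclustered index on a varchar(20) CustomerCode column. Comparing that column to an nvarchar parameter forces SQL Server to convert the column, which can turn an index seek into a scan:

-- Hypothetical table: dbo.Orders, with an index on CustomerCode varchar(20)
DECLARE @code nvarchar(20) = N'ABC123';
SELECT OrderID
FROM dbo.Orders
WHERE CustomerCode = @code;   -- implicit CONVERT on the column, likely a scan

DECLARE @code2 varchar(20) = 'ABC123';
SELECT OrderID
FROM dbo.Orders
WHERE CustomerCode = @code2;  -- matching data type, so an index seek is possible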

These three Pluralsight Courses go into even more detail about how to run these queries and interpret the results:

SQL Server 2014 DMV Diagnostic Queries – Part 1

SQL Server 2014 DMV Diagnostic Queries – Part 2

SQL Server 2014 DMV Diagnostic Queries – Part 3

SQL Server Diagnostic Information Queries Detailed, Day 17

For Day 17 of this series, we start out with Query #39, which is PLE by NUMA Node. This query retrieves information from the sys.dm_os_performance_counters dynamic management view about your page life expectancy (PLE) by NUMA node. Query #39 is shown in Figure 1.

-- Page Life Expectancy (PLE) value for each NUMA node in current instance  (Query 39) (PLE by NUMA Node)
SELECT @@SERVERNAME AS [Server Name], [object_name], instance_name, cntr_value AS [Page Life Expectancy]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Buffer Node%' -- Handles named instances
AND counter_name = N'Page life expectancy' OPTION (RECOMPILE);

-- PLE is a good measurement of memory pressure
-- Higher PLE is better. Watch the trend over time, not the absolute value
-- This will only return one row for non-NUMA systems

-- Page Life Expectancy isn't what you think...
-- http://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/

Figure 1: Query #39 PLE by NUMA Node

I think that page life expectancy (PLE) is probably one of the best ways to gauge whether you are under internal memory pressure, as long as you think about it correctly. What you should do is monitor your PLE value ranges over time so that you know what your typical minimum, average, and maximum PLE values are at different times and on different days of the week. They will usually vary quite a bit according to your workload.

The ancient guidance that a PLE measurement of 300 or higher is good is really no longer relevant for modern database servers, which have much more physical RAM than servers did 10-12 years ago. Basically, higher PLE values are always better. You want to watch the ranges and trends over time, rather than focus on a single measurement.
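If you also want the single, instance-wide PLE value rather than the per-node values, you can read the Buffer Manager object instead of Buffer Node; a minimal sketch:

SELECT @@SERVERNAME AS [Server Name], cntr_value AS [Page Life Expectancy]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Buffer Manager%' -- Handles named instances
AND counter_name = N'Page life expectancy' OPTION (RECOMPILE);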

 

 

Query #40 is Memory Grants Pending. This query retrieves information from the sys.dm_os_performance_counters dynamic management view about the current value of the Memory Grants Pending performance counter. Query #40 is shown in Figure 2.

-- Memory Grants Pending value for current instance  (Query 40) (Memory Grants Pending)
SELECT @@SERVERNAME AS [Server Name], [object_name], cntr_value AS [Memory Grants Pending]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Memory Manager%' -- Handles named instances
AND counter_name = N'Memory Grants Pending' OPTION (RECOMPILE);

-- Run multiple times, and run periodically if you suspect you are under memory pressure
-- Memory Grants Pending above zero for a sustained period is a very strong indicator of internal memory pressure

Figure 2: Query #40 Memory Grants Pending

This query is another way to determine whether you are under severe internal memory pressure. The value of this query will change from second to second, so you will want to run it multiple times when you suspect you are under memory pressure. Any sustained value above zero is not a good sign. In fact, it is a very bad sign, showing that you are under pretty extreme memory pressure.
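If you do see a sustained non-zero value, you can look at which requests are actually waiting for a memory grant with the sys.dm_exec_query_memory_grants DMV; a minimal sketch:

-- Sessions that have requested a memory grant but not yet received one
SELECT session_id, request_time, requested_memory_kb, granted_memory_kb, wait_time_ms
FROM sys.dm_exec_query_memory_grants WITH (NOLOCK)
WHERE grant_time IS NULL OPTION (RECOMPILE);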

SQL Server Diagnostic Information Queries Detailed, Day 16

For Day 16 of this series, we start out with Query #37, which is CPU Utilization History. This query retrieves information from the somewhat undocumented sys.dm_os_ring_buffers dynamic management view about recent CPU utilization by SQL Server. Query #37 is shown in Figure 1.

-- Get CPU Utilization History for last 256 minutes (in one minute intervals)  (Query 37) (CPU Utilization History)
-- This version works with SQL Server 2016
DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks) FROM sys.dm_os_sys_info WITH (NOLOCK));

SELECT TOP(256) SQLProcessUtilization AS [SQL Server Process CPU Utilization],
               SystemIdle AS [System Idle Process],
               100 - SystemIdle - SQLProcessUtilization AS [Other Process CPU Utilization],
               DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS [Event Time]
FROM (SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
            record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
            AS [SystemIdle],
            record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int')
            AS [SQLProcessUtilization], [timestamp]
      FROM (SELECT [timestamp], CONVERT(xml, record) AS [record]
            FROM sys.dm_os_ring_buffers WITH (NOLOCK)
            WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
            AND record LIKE N'%<SystemHealth>%') AS x) AS y
ORDER BY record_id DESC OPTION (RECOMPILE);

-- Look at the trend over the entire period
-- Also look at high sustained Other Process CPU Utilization values

Figure 1: Query #37 CPU Utilization History

This query shows you the average CPU utilization history for the current instance of SQL Server, plus the summed average CPU utilization of all other processes on your machine, captured in one-minute increments for the past 256 minutes. This lets you go back in time and see what has been happening with processor utilization over that period. It is always nice to know whether an episode of high CPU utilization has been sustained or whether it has been going on for just a short period. It is also nice to understand how much CPU other processes are using on your machine.

Ideally, you have some sort of monitoring system in place to let you review your history and trends for more than four hours. Not all 3rd party monitoring systems are created equal. Some are much better than others. Personally, I really like SQLSentry PerformanceAdvisor.

Query #38 is Top Worker Time Queries. This query retrieves information from the sys.dm_exec_query_stats dynamic management view, the sys.dm_exec_sql_text dynamic management function, and the sys.dm_exec_query_plan dynamic management function about the highest cumulative worker time queries across your entire instance. Query #38 is shown in Figure 2.

-- Get top total worker time queries for entire instance (Query 38) (Top Worker Time Queries)
SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], LEFT(t.[text], 50) AS [Short Query Text],
qs.total_worker_time AS [Total Worker Time], qs.min_worker_time AS [Min Worker Time],
qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
qs.max_worker_time AS [Max Worker Time],
qs.min_elapsed_time AS [Min Elapsed Time],
qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
qs.max_elapsed_time AS [Max Elapsed Time],
qs.min_logical_reads AS [Min Logical Reads],
qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
qs.max_logical_reads AS [Max Logical Reads],
qs.execution_count AS [Execution Count], qs.creation_time AS [Creation Time]
-- ,t.[text] AS [Query Text], qp.query_plan AS [Query Plan] -- uncomment these columns if not copying results to Excel
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);

-- Helps you find the most expensive queries from a CPU perspective across the entire instance
-- Can also help track down parameter sniffing issues

Figure 2: Query #38 Top Worker Time Queries

This query is very useful when you see any signs of CPU pressure. It helps you understand which cached queries are using the most CPU resources across your entire instance. You might notice that one line of the query is commented out. I do this so that the query results will copy and paste into Excel. If you are not going to copy/paste the results, then you will usually get some more useful information by including the last two columns in the SELECT statement.

If you are seeing very wide variations in minimum, average, and maximum worker time for a cached query, this might be evidence that you have a parameter sniffing issue (where some input values for a query produce results that are not well-suited for the cached version of the query plan that is being used). The new Query Store feature in SQL Server 2016 is going to be very useful for detecting and correcting these types of issues.
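If you are on SQL Server 2016, enabling the Query Store for a database is a one-line change (shown below with its standard read-write mode, using a hypothetical database name); the runtime statistics it captures make it much easier to spot a query whose performance changed when a different plan started being used:

-- Enable the Query Store for a database (SQL Server 2016 and newer)
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);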

SQL Server Diagnostic Information Queries Detailed, Day 15

For Day 15 of this series, we start out with Query #35, which is Avg Task Counts. This query retrieves information from the sys.dm_os_schedulers dynamic management view about the current average load across your SQL OS schedulers. Query #35 is shown in Figure 1.

-- Get Average Task Counts (run multiple times)  (Query 35) (Avg Task Counts)
SELECT AVG(current_tasks_count) AS [Avg Task Count],
AVG(work_queue_count) AS [Avg Work Queue Count],
AVG(runnable_tasks_count) AS [Avg Runnable Task Count],
AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255 OPTION (RECOMPILE);

-- Sustained values above 10 suggest further investigation in that area
-- High Avg Task Counts are often caused by blocking/deadlocking or other resource contention

-- Sustained values above 1 suggest further investigation in that area
-- High Avg Runnable Task Counts are a good sign of CPU pressure
-- High Avg Pending DiskIO Counts are a sign of disk pressure

Figure 1: Query #35 Avg Task Counts

If you see high average task counts (above 10), that is usually a pretty good indicator of blocking/deadlocking. In some cases, it just means that your instance is very busy, with a high sustained level of activity. If you see sustained average runnable task counts above 1, that is a good indicator of CPU pressure. If you see average pending disk I/O counts above 0, that is a good indicator of I/O pressure or bottlenecks. You need to run this query multiple times, since the results will change from second to second.

Looking at the results of this query (after I have run it a few times over the course of a few minutes) gives me a good high-level sense of the workload and health of my SQL Server instance.
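Since this query averages across all of the schedulers, it can also be useful to look at the underlying per-scheduler values to see whether the load is evenly distributed; a minimal sketch:

-- Per-scheduler detail behind the averages above
SELECT scheduler_id, current_tasks_count, runnable_tasks_count,
       work_queue_count, pending_disk_io_count
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255 OPTION (RECOMPILE);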

 

Query #36 is Detect Blocking. This query retrieves information from the sys.dm_exec_requests dynamic management view and the sys.dm_exec_sql_text dynamic management function about any blocking activity that is occurring when you run the query. Query #36 is shown in Figure 2.

-- Detect blocking (run multiple times)  (Query 36) (Detect Blocking)
SELECT t1.resource_type AS [lock type], DB_NAME(resource_database_id) AS [database],
t1.resource_associated_entity_id AS [blk object], t1.request_mode AS [lock req],   -- lock requested
t1.request_session_id AS [waiter sid], t2.wait_duration_ms AS [wait time],        -- spid of waiter
(SELECT [text] FROM sys.dm_exec_requests AS r WITH (NOLOCK)                       -- get sql for waiter
CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle])
WHERE r.session_id = t1.request_session_id) AS [waiter_batch],
(SELECT SUBSTRING(qt.[text], r.statement_start_offset/2,
    (CASE WHEN r.statement_end_offset = -1
    THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2
    ELSE r.statement_end_offset END - r.statement_start_offset)/2)
FROM sys.dm_exec_requests AS r WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt
WHERE r.session_id = t1.request_session_id) AS [waiter_stmt],                     -- statement blocked
t2.blocking_session_id AS [blocker sid],                                          -- spid of blocker
(SELECT [text] FROM sys.sysprocesses AS p                                         -- get sql for blocker
CROSS APPLY sys.dm_exec_sql_text(p.[sql_handle])
WHERE p.spid = t2.blocking_session_id) AS [blocker_batch]
FROM sys.dm_tran_locks AS t1 WITH (NOLOCK)
INNER JOIN sys.dm_os_waiting_tasks AS t2 WITH (NOLOCK)
ON t1.lock_owner_address = t2.resource_address OPTION (RECOMPILE);

-- Helps troubleshoot blocking and deadlocking issues
-- The results will change from second to second on a busy system
-- You should run this query multiple times when you see signs of blocking

Figure 2: Query #36 Detect Blocking

If no blocking is happening when you run this query, it will not return any results. This is what you want to see! You need to run this query multiple times, since the results will often change from second to second. Don’t just run it once, and then conclude that there is no blocking happening at any time.

If any blocking is occurring, then this query will show you the blocked query text and the query text of the blocker. This information can be very useful when it comes to understanding what is going on when blocking or deadlocking is happening. Many times, excessive blocking and deadlocking is caused by missing indexes on a table, so proper index tuning can be a very effective solution.
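A quick complementary check (not part of this query set) is to look at sys.dm_exec_requests for sessions that report a blocking session, which makes it easy to spot the head of a blocking chain; a minimal sketch:

-- Currently blocked requests and who is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, wait_resource, [status]
FROM sys.dm_exec_requests WITH (NOLOCK)
WHERE blocking_session_id <> 0 OPTION (RECOMPILE);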

SQL Server Diagnostic Information Queries Detailed, Day 14

For Day 14 of this series, we start out with Query #33, which is Top Waits. This query retrieves information from the sys.dm_os_wait_stats dynamic management view about the cumulative wait statistics for the instance since the last time it was restarted (or the wait statistics were manually cleared). Query #33 is shown in Figure 1.

-- Clear Wait Stats with this command
-- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

-- Isolate top waits for server instance since last restart or wait statistics clear  (Query 33) (Top Waits)
WITH [Waits]
AS (SELECT wait_type, wait_time_ms/ 1000.0 AS [WaitS],
          (wait_time_ms - signal_wait_time_ms) / 1000.0 AS [ResourceS],
           signal_wait_time_ms / 1000.0 AS [SignalS],
           waiting_tasks_count AS [WaitCount],
           100.0 *  wait_time_ms / SUM (wait_time_ms) OVER() AS [Percentage],
           ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats WITH (NOLOCK)
    WHERE [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',
        N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
        N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',
        N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT',
        N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE',
        N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE',
        N'PREEMPTIVE_OS_LIBRARYOPS', N'PREEMPTIVE_OS_COMOPS', N'PREEMPTIVE_OS_CRYPTOPS',
        N'PREEMPTIVE_OS_PIPEOPS', N'PREEMPTIVE_OS_AUTHENTICATIONOPS',
        N'PREEMPTIVE_OS_GENERICOPS', N'PREEMPTIVE_OS_VERIFYTRUST',
        N'PREEMPTIVE_OS_FILEOPS', N'PREEMPTIVE_OS_DEVICEOPS',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_ASYNC_QUEUE',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',
        N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',
        N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',
        N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES',
        N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_HOST_WAIT',
        N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT', N'XE_LIVE_TARGET_TVF', N'XE_TIMER_EVENT')
    AND waiting_tasks_count > 0)
SELECT
    MAX (W1.wait_type) AS [WaitType],
    CAST (MAX (W1.WaitS) AS DECIMAL (16,2)) AS [Wait_Sec],
    CAST (MAX (W1.ResourceS) AS DECIMAL (16,2)) AS [Resource_Sec],
    CAST (MAX (W1.SignalS) AS DECIMAL (16,2)) AS [Signal_Sec],
    MAX (W1.WaitCount) AS [Wait Count],
    CAST (MAX (W1.Percentage) AS DECIMAL (5,2)) AS [Wait Percentage],
    CAST ((MAX (W1.WaitS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgWait_Sec],
    CAST ((MAX (W1.ResourceS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgRes_Sec],
    CAST ((MAX (W1.SignalS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgSig_Sec]
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.RowNum <= W1.RowNum
GROUP BY W1.RowNum
HAVING SUM (W2.Percentage) - MAX (W1.Percentage) < 99 -- percentage threshold
OPTION (RECOMPILE);

-- Cumulative wait stats are not as useful on an idle instance that is not under load or performance pressure

-- The SQL Server Wait Type Repository
-- http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx

-- Wait statistics, or please tell me where it hurts
-- http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

-- SQL Server 2005 Performance Tuning using the Waits and Queues
-- http://technet.microsoft.com/en-us/library/cc966413.aspx

-- sys.dm_os_wait_stats (Transact-SQL)
-- http://msdn.microsoft.com/en-us/library/ms179984(v=sql.120).aspx

Figure 1: Query #33 Top Waits

This query can be very useful when your instance has been experiencing performance problems. At the same time, I have seen many DBAs spend way too much time agonizing about their top wait statistics when they don't need to. SQL Server will always be waiting on some type of resource (which is why I try to filter out what are generally considered to be benign wait types). If your instance is performing well, and nobody is complaining about performance, then you can relax a little bit.

Another issue with the results of this query is that there is a lot of bad advice on the internet about what certain wait types mean and what, if anything, you should do if you see them. This often leads to what Paul Randal calls “knee-jerk” performance tuning, where you see a certain wait type, and then immediately want to make some configuration change without doing any further investigation or putting any deeper thought into the matter.

After all of those cautions, this query can be very useful in pointing you in one direction or another to do deeper investigation, especially when your instance has been performing poorly. If you do make any configuration changes, or do something else that might affect performance (such as adding an index), then it is a good idea to clear the wait statistics so that the old cumulative wait statistics don’t obscure what is going on after the change.

 

Query #34 is Connection Counts by IP Address. This query retrieves information from the sys.dm_exec_sessions dynamic management view and the  sys.dm_exec_connections dynamic management view about your current connection counts by IP address. Query #34 is shown in Figure 2.

-- Get a count of SQL connections by IP address (Query 34) (Connection Counts by IP Address)
SELECT ec.client_net_address, es.[program_name], es.[host_name], es.login_name,
COUNT(ec.session_id) AS [connection count]
FROM sys.dm_exec_sessions AS es WITH (NOLOCK)
INNER JOIN sys.dm_exec_connections AS ec WITH (NOLOCK)
ON es.session_id = ec.session_id
GROUP BY ec.client_net_address, es.[program_name], es.[host_name], es.login_name
ORDER BY ec.client_net_address, es.[program_name] OPTION (RECOMPILE);

-- This helps you figure out where your database load is coming from
-- and verifies connectivity from other machines

Figure 2: Query #34 Connection Counts by IP Address

This query helps you see the magnitude of your workload and judge whether it is in the normal range that you should be seeing. I think it is a good idea to have a baseline for how many connections your database server typically has from whatever other machines normally connect to it. This query can also help you confirm and troubleshoot connectivity issues from other machines. I can't tell you how many times people have claimed that my SQL Server instance was down because they could not connect to it. In the vast majority of cases, they simply had an incorrect connection string or there was a blocked port on their machine that prevented the connection. Remember, the database is always guilty until proven innocent!

SQL Server Diagnostic Information Queries Detailed, Day 13

For Day 13 of this series, we start out with Query #30, which is CPU Usage by Database. This query retrieves information from the sys.dm_exec_query_stats dynamic management view and from the sys.dm_exec_plan_attributes dynamic management function about total CPU usage by database for cached query plans. Query #30 is shown in Figure 1.

-- Get CPU utilization by database (Query 30) (CPU Usage by Database)
WITH DB_CPU_Stats
AS
(SELECT pa.DatabaseID, DB_NAME(pa.DatabaseID) AS [Database Name], SUM(qs.total_worker_time/1000) AS [CPU_Time_Ms]
 FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
 CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
              FROM sys.dm_exec_plan_attributes(qs.plan_handle)
              WHERE attribute = N'dbid') AS pa
 GROUP BY DatabaseID)
SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [CPU Rank],
       [Database Name], [CPU_Time_Ms] AS [CPU Time (ms)],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPU Percent]
FROM DB_CPU_Stats
WHERE DatabaseID <> 32767 -- ResourceDB
ORDER BY [CPU Rank] OPTION (RECOMPILE);

-- Helps determine which database is using the most CPU resources on the instance

Figure 1: Query #30 CPU Usage by Database

Simply speaking, this query shows you which databases are using the most CPU resources on the instance, at least as far as their cached query plans are concerned. If you are seeing any signs of CPU pressure, this query can help point you at the correct databases to investigate further, to see what queries are using the most CPU resources. There are several other queries in this complete set that can help you find the most expensive cached stored procedures and queries.

 

Query #31 is IO Usage by Database. This query retrieves information from the sys.dm_io_virtual_file_stats dynamic management function about your total cumulative I/O usage by database since SQL Server was last started. Query #31 is shown in Figure 2.

-- Get I/O utilization by database (Query 31) (IO Usage By Database)
WITH Aggregate_IO_Statistics
AS
(SELECT DB_NAME(database_id) AS [Database Name],
CAST(SUM(num_of_bytes_read + num_of_bytes_written)/1048576 AS DECIMAL(12, 2)) AS io_in_mb
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [DM_IO_STATS]
GROUP BY database_id)
SELECT ROW_NUMBER() OVER(ORDER BY io_in_mb DESC) AS [I/O Rank], [Database Name], io_in_mb AS [Total I/O (MB)],
       CAST(io_in_mb/ SUM(io_in_mb) OVER() * 100.0 AS DECIMAL(5,2)) AS [I/O Percent]
FROM Aggregate_IO_Statistics
ORDER BY [I/O Rank] OPTION (RECOMPILE);

-- Helps determine which database is using the most I/O resources on the instance

Figure 2: Query #31 IO Usage by Database

The figures that this query collects are cumulative since SQL Server last started, and they include all file activity against your database data files and log files. This includes your normal database workload, plus things like index maintenance, DBCC CHECKDB activity, database backups, and any log reader activity. Because of all this, the numbers you see here might be different than you expect.

Query #32 is Total Buffer Usage by Database. This query retrieves information from the sys.dm_os_buffer_descriptors dynamic management view about your current total buffer usage by database. Query #32 is shown in Figure 3.

-- Get total buffer usage by database for current instance  (Query 32) (Total Buffer Usage by Database)
-- This may take some time to run on a busy instance
WITH AggregateBufferPoolUsage
AS
(SELECT DB_NAME(database_id) AS [Database Name],
CAST(COUNT(*) * 8/1024.0 AS DECIMAL (10,2)) AS [CachedSize]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
WHERE database_id <> 32767 -- ResourceDB
GROUP BY DB_NAME(database_id))
SELECT ROW_NUMBER() OVER(ORDER BY CachedSize DESC) AS [Buffer Pool Rank], [Database Name], CachedSize AS [Cached Size (MB)],
       CAST(CachedSize / SUM(CachedSize) OVER() * 100.0 AS DECIMAL(5,2)) AS [Buffer Pool Percent]
FROM AggregateBufferPoolUsage
ORDER BY [Buffer Pool Rank] OPTION (RECOMPILE);

-- Tells you how much memory (in the buffer pool)
-- is being used by each database on the instance

Figure 3: Query #32 Total Buffer Usage by Database

This query shows you which databases are using the most space in the SQL Server Buffer Pool. If you see a database that is using a large amount of memory in the buffer pool, you might be able to improve the situation by doing some query or index tuning, or by using SQL Server Data Compression on some of your indexes. It may be that you just have a large database that has a lot of activity, so it has a lot of data in the buffer pool, by design.
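If you are considering SQL Server Data Compression for this purpose, you can estimate the space savings for an individual table or index before making any change; a minimal sketch using a hypothetical table name (keep in mind that Data Compression requires Enterprise Edition in the versions of SQL Server discussed here):

-- Estimate PAGE compression savings for a hypothetical table (run in the relevant database)
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo', @object_name = N'YourLargeTable',
    @index_id = NULL, @partition_number = NULL, @data_compression = N'PAGE';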

SQL Server Diagnostic Information Queries Detailed, Day 12

For Day 12 of this series, we start out with Query #27, which is Database Properties. This query retrieves information from the sys.databases catalog view, and from the sys.dm_os_performance_counters dynamic management view. Query #27 is shown in Figure 1.

-- Recovery model, log reuse wait description, log file size, log usage size  (Query 27) (Database Properties)
-- and compatibility level for all databases on instance
SELECT db.[name] AS [Database Name], db.recovery_model_desc AS [Recovery Model], db.state_desc, db.containment_desc,
db.log_reuse_wait_desc AS [Log Reuse Wait Description],
CONVERT(DECIMAL(18,2), ls.cntr_value/1024.0) AS [Log Size (MB)], CONVERT(DECIMAL(18,2), lu.cntr_value/1024.0) AS [Log Used (MB)],
CAST(CAST(lu.cntr_value AS FLOAT) / CAST(ls.cntr_value AS FLOAT) AS DECIMAL(18,2)) * 100 AS [Log Used %],
db.[compatibility_level] AS [DB Compatibility Level], db.page_verify_option_desc AS [Page Verify Option],
db.is_auto_create_stats_on, db.is_auto_update_stats_on, db.is_auto_update_stats_async_on, db.is_parameterization_forced,
db.snapshot_isolation_state_desc, db.is_read_committed_snapshot_on, db.is_auto_close_on, db.is_auto_shrink_on,
db.target_recovery_time_in_seconds, db.is_cdc_enabled, db.is_published, db.is_distributor, db.is_encrypted,
db.group_database_id, db.replica_id, db.is_memory_optimized_elevate_to_snapshot_on,
db.delayed_durability_desc, db.is_auto_create_stats_incremental_on,
db.is_query_store_on, db.is_sync_with_backup,
db.is_supplemental_logging_enabled, db.is_remote_data_archive_enabled
FROM sys.databases AS db WITH (NOLOCK)
INNER JOIN sys.dm_os_performance_counters AS lu WITH (NOLOCK)
ON db.name = lu.instance_name
INNER JOIN sys.dm_os_performance_counters AS ls WITH (NOLOCK)
ON db.name = ls.instance_name
WHERE lu.counter_name LIKE N'Log File(s) Used Size (KB)%'
AND ls.counter_name LIKE N'Log File(s) Size (KB)%'
AND ls.cntr_value > 0 OPTION (RECOMPILE);

-- Things to look at:
-- How many databases are on the instance?
-- What recovery models are they using?
-- What is the log reuse wait description?
-- How full are the transaction logs?
-- What compatibility level are the databases on?
-- What is the Page Verify Option? (should be CHECKSUM)
-- Is Auto Update Statistics Asynchronously enabled?
-- Make sure auto_shrink and auto_close are not enabled!

Figure 1: Query #27 Database Properties

This query gives you a lot of very useful information about all of your databases. I pay special attention to Log Reuse Wait Description and Log Used %, to spot problems that are preventing the space in the transaction log from being reused, which will eventually cause the log file to fill up and grow. I also look at the compatibility level (which is much more important in SQL Server 2014, with the new cardinality estimator) and the page verify option, which should always be CHECKSUM.

It is important to check the statistics-related properties, and to also make sure that auto shrink and auto close are not enabled. I don’t think I have ever seen an instance of SQL Server where I did not notice some database properties for multiple databases that were simply incorrect. This query is great for quickly spotting possible issues that need to be corrected.
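When you do spot an incorrect property, the fix is usually a quick ALTER DATABASE; a minimal sketch for the most common corrections, using a hypothetical database name:

ALTER DATABASE [YourDatabase] SET AUTO_CLOSE OFF WITH NO_WAIT;
ALTER DATABASE [YourDatabase] SET AUTO_SHRINK OFF WITH NO_WAIT;
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM WITH NO_WAIT;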

 

Query #28 is Missing Indexes All Databases. This query retrieves information from the sys.dm_db_missing_index_group_stats dynamic management view, the sys.dm_db_missing_index_groups dynamic management view, and the sys.dm_db_missing_index_details dynamic management view about “missing” indexes that the SQL Server Query Optimizer thinks that it would like to have. Query #28 is shown in Figure 2.

-- Missing Indexes for all databases by Index Advantage  (Query 28) (Missing Indexes All Databases)
SELECT CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS [index_advantage],
migs.last_user_seek, mid.[statement] AS [Database.Schema.Table],
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mig.index_handle = mid.index_handle
ORDER BY index_advantage DESC OPTION (RECOMPILE);

-- Getting missing index information for all of the databases on the instance is very useful
-- Look at last user seek time, number of user seeks to help determine source and importance
-- Also look at avg_user_impact and avg_total_user_cost to help determine importance
-- SQL Server is overly eager to add included columns, so beware
-- Do not just blindly add indexes that show up from this query!!!

Figure 2: Query #28 Missing Indexes All Databases

This query is very useful, but also very easy to misinterpret and misuse. I have seen many novice DBAs and developers use the results of this query to pretty badly over-index their databases, which affects their database size and hurts insert, update, and delete performance. I like to focus on the last_user_seek column, and see how long ago that was. Was it a few seconds or minutes ago, or was it days or weeks ago?

I then start looking at the user_seeks, avg_total_user_cost, and avg_user_impact columns to get a sense for how often SQL Server thinks it wants this proposed index, how expensive it is not to have the index, and how much the query optimizer thinks the cost of the query would be reduced if it did have this index that it is requesting.

Next, I’ll look at any other proposed indexes on the same table to see if I can come up with a wider, consolidated index that covers multiple requested indexes. Finally, I’ll look at the existing indexes on that table, and look at the index usage metrics for that table to have a better idea of whether a new index would be a good idea, based on the volatility of that table.
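For example, if this query showed two similar requests on the same hypothetical table (one on CustomerID alone, and one on CustomerID plus OrderDate, each with a couple of included columns), I might consolidate them into a single wider index rather than creating both; all of the names here are placeholders:

-- Hypothetical consolidated index covering two similar missing-index requests
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (OrderTotal, OrderStatus);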

 

Query #29 is VLF Counts. This query calls the DBCC LOGINFO command for each one of your system and user databases to get your current virtual log file (VLF) counts for your transaction logs. Query #29 is shown in Figure 3.

-- Get VLF Counts for all databases on the instance (Query 29) (VLF Counts)
-- (adapted from Michelle Ufford)
CREATE TABLE #VLFInfo (RecoveryUnitID int, FileID int,
                       FileSize bigint, StartOffset bigint,
                       FSeqNo bigint, [Status] bigint,
                       Parity bigint, CreateLSN numeric(38));

CREATE TABLE #VLFCountResults(DatabaseName sysname, VLFCount int);

EXEC sp_MSforeachdb N'Use [?];

                INSERT INTO #VLFInfo
                EXEC sp_executesql N''DBCC LOGINFO([?])'';

                INSERT INTO #VLFCountResults
                SELECT DB_NAME(), COUNT(*)
                FROM #VLFInfo;

                TRUNCATE TABLE #VLFInfo;'

SELECT DatabaseName, VLFCount
FROM #VLFCountResults
ORDER BY VLFCount DESC;

DROP TABLE #VLFInfo;
DROP TABLE #VLFCountResults;

-- High VLF counts can affect write performance
-- and they can make full database restores and crash recovery take much longer
-- Try to keep your VLF counts under 200 in most cases (depending on log file size)

-- Important change to VLF creation algorithm in SQL Server 2014
-- http://www.sqlskills.com/blogs/paul/important-change-vlf-creation-algorithm-sql-server-2014/

Figure 3: Query #29 VLF Counts

Every time a SQL Server log file grows, a certain number of VLFs are added to the log file. By the way, the formula for this was changed in SQL Server 2014, as discussed by Paul Randal here.

The reason why you care about this is because having a higher number of VLFs affects how long the recovery portion of a database restore takes. If you have ever watched the progress update in SSMS while you were doing a restore of a large database, and noticed that it finally got to 100%, but then kept grinding away for quite a while after that, you have seen this effect in action. It is very frustrating!

It also affects how long "crash recovery" takes, which occurs for every database on the instance whenever the SQL Server Service starts. This comes into play whether it is just a standalone instance starting or when you fail over to another node with a traditional SQL Server failover cluster instance (FCI). If you have ever been frustrated about how long an FCI failover takes, keeping your VLF counts under control is one thing you can do to help speed up the process.

If you have a high VLF count in a database, it is usually pretty easy to correct. If you are in the FULL recovery model, you will want to run a transaction log backup. Then, you shrink the transaction log file. Often, you will have to repeat this sequence more than once. Sometimes you will even need to generate some log activity (by doing something like reorganizing an index or two) to get the active portion of the log to a place where a shrink works.

Once you have done all of this, your transaction log file is likely to be extremely small, perhaps just a few MB in size. You will want to immediately manually grow it in relatively large increments (such as 1000MB, 2000MB, or 4000MB) until you have it at the desired size. You will also want to make sure the auto growth increment is set to a fixed size in MB (such as 1000MB, 2000MB, or 4000MB) so that any future auto growths don’t add an excessive number of VLFs.
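Putting those steps together, here is a minimal sketch of the whole sequence for a hypothetical database in the FULL recovery model; the backup path, logical log file name, and target sizes are all placeholders:

BACKUP LOG [YourDatabase] TO DISK = N'D:\Backups\YourDatabase_log.trn';

USE [YourDatabase];
DBCC SHRINKFILE (N'YourDatabase_log', 1);  -- may need to repeat after more log backups

-- Grow the log back to the desired size, with a fixed autogrowth increment
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 4000MB, FILEGROWTH = 1000MB);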

SQL Server Diagnostic Information Queries Detailed, Day 11

Today, we will go through four queries that are related to storage. We start out with Query #23, which is Volume Info.  This query retrieves information from the sys.master_files catalog view and the sys.dm_os_volume_stats dynamic management function about any operating system volumes or mount points on your server where you have any SQL Server database files. Query #23 is shown in Figure 1.

-- Volume info for all LUNS that have database files on the current instance (Query 23) (Volume Info)
SELECT DISTINCT vs.volume_mount_point, vs.file_system_type,
vs.logical_volume_name, CONVERT(DECIMAL(18,2), vs.total_bytes/1073741824.0) AS [Total Size (GB)],
CONVERT(DECIMAL(18,2), vs.available_bytes/1073741824.0) AS [Available Size (GB)],
CAST(CAST(vs.available_bytes AS FLOAT)/ CAST(vs.total_bytes AS FLOAT) AS DECIMAL(18,2)) * 100 AS [Space Free %]
FROM sys.master_files AS f WITH (NOLOCK)
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.[file_id]) AS vs
ORDER BY vs.volume_mount_point OPTION (RECOMPILE);

-- Shows you the total and free space on the LUNs where you have database files
-- Being low on free space can negatively affect performance

Figure 1: Query #23 Volume Info

This query shows you how large each operating system volume or mount point is, and how much space is available. You don’t ever want to run out of space on a volume where SQL Server database files are located (for obvious reasons), but many people are not aware that being low on space can affect the performance of the volume, with both magnetic storage and with flash storage.

With magnetic storage, being low on disk space means that you may no longer be getting the increased disk performance possible from short-stroking. This can affect your disk performance with magnetic drives because the actuator arm that moves the drive heads back and forth to read the data has to move further as an individual disk becomes more and more full, and because the data transfer rate is lower on the inner portion of the platters than on the outer portion.

The reason why I say that this only “may” be affecting your disk performance is because with many SANs, whatever space has been allocated for a volume may be spread across all of the available drives in the SAN, so the fact that your volume is low on space may have very little to do with what is happening with the overall SAN.

With flash storage, as the drive gets lower on space, it becomes more difficult for garbage collection and TRIM operations to complete without affecting performance. Generally speaking, your write performance will tend to decrease and become more inconsistent as the drive is closer to being full of data.

 

Query #24 is Drive Level Latency. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #24 is shown in Figure 2.

-- Drive level latency information (Query 24) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (io_stall_read_ms/num_of_reads)
    END AS [Read Latency],
    CASE
        WHEN num_of_writes = 0 THEN 0
        ELSE (io_stall_write_ms/num_of_writes)
    END AS [Write Latency],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE (io_stall/(num_of_reads + num_of_writes))
    END AS [Overall Latency],
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (num_of_bytes_read/num_of_reads)
    END AS [Avg Bytes/Read],
    CASE
        WHEN num_of_writes = 0 THEN 0
        ELSE (num_of_bytes_written/num_of_writes)
    END AS [Avg Bytes/Write],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
             SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
             SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
             SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
      GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);

-- Shows you the drive-level latency for reads and writes, in milliseconds
-- Latency above 20-25ms is usually a problem

Figure 2: Query #24 Drive Level Latency

This particular query aggregates all of the file activity for your database data and log files up to the volume level. This activity includes your normal workload, plus anything else (such as index maintenance, database backups, log reader activity for things like replication, database mirroring, and AlwaysOn AGs) that touches your database data and log files.

These numbers are cumulative since SQL Server was last started. This means that the latency numbers you see here will usually tend to be a little higher than what you might see in Performance Monitor, and also higher than what shows up at the SAN-level. This query helps you understand your disk performance for SQL Server at the volume level. Once you understand this, then you are ready to drill down to the individual database file level with the next query.

 

Query #25 is IO Stalls by File. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #25 is shown in Figure 3.

-- Calculates average stalls per read, per write, and per total input/output for each database file  (Query 25) (IO Stalls by File)
SELECT DB_NAME(fs.database_id) AS [Database Name], CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)/(1.0 + fs.num_of_reads + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)], mf.physical_name, mf.type_desc, fs.io_stall_read_ms, fs.num_of_reads,
fs.io_stall_write_ms, fs.num_of_writes, fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls], fs.num_of_reads + fs.num_of_writes AS [total_io],
io_stall_queued_read_ms AS [Resource Governor Total Read IO Latency (ms)], io_stall_queued_write_ms AS [Resource Governor Total Write IO Latency (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);

-- Helps determine which database files on the entire instance have the most I/O bottlenecks
-- This can help you decide whether certain LUNs are overloaded and whether you might
-- want to move some files to a different location or perhaps improve your I/O performance

Figure 3: Query #25 IO Stalls by File

This query is similar to the previous query, with all of the same caveats. This query goes down to the individual database file level rather than rolling up the data to the volume level. Once you see high read or write latency for a particular volume, you will want to then determine which database file(s) are seeing the highest latency on that volume. This will help you decide what you might be able to do to improve the situation.

 

Query #26 is IO Warnings. This query reads the five most recent SQL Server Error logs to look for 15 second I/O warnings. Query #26 is shown in Figure 4.

-- Look for I/O requests taking longer than 15 seconds in the five most recent SQL Server Error Logs (Query 26) (IO Warnings)
CREATE TABLE #IOWarningResults(LogDate datetime, ProcessInfo sysname, LogText nvarchar(1000));

    INSERT INTO #IOWarningResults
    EXEC xp_readerrorlog 0, 1, N'taking longer than 15 seconds';

    INSERT INTO #IOWarningResults
    EXEC xp_readerrorlog 1, 1, N'taking longer than 15 seconds';

    INSERT INTO #IOWarningResults
    EXEC xp_readerrorlog 2, 1, N'taking longer than 15 seconds';

    INSERT INTO #IOWarningResults
    EXEC xp_readerrorlog 3, 1, N'taking longer than 15 seconds';

    INSERT INTO #IOWarningResults
    EXEC xp_readerrorlog 4, 1, N'taking longer than 15 seconds';

SELECT LogDate, ProcessInfo, LogText
FROM #IOWarningResults
ORDER BY LogDate DESC;

DROP TABLE #IOWarningResults;

-- Finding 15 second I/O warnings in the SQL Server Error Log is useful evidence of
-- poor I/O performance (which might have many different causes)

-- Diagnostics in SQL Server help detect stalled and stuck I/O operations
-- https://support.microsoft.com/en-us/kb/897284

Figure 4: Query #26 IO Warnings

This query may take some time to complete if your SQL Server Error log files are very large. Finding any 15 second I/O warnings is very strong evidence that you were seeing very poor storage subsystem performance when they were recorded. The warnings go down to the database file level, so you can see which drive and which file was affected.

What you want to look for is whether there is any sort of recognizable pattern to when you are getting these warnings. For example, perhaps you often see a series of 15 second warnings for the data files for one database at around 3AM each morning. Perhaps this is when you are doing index maintenance, plus running DBCC CHECKDB at the same time, or some other repeating pattern of scheduled disk intensive activity.

On the other hand, it may be the case that you see many 15 second I/O warnings at random times, spread across many drive volumes and disk files. This would be a pretty good indicator of generally poor storage subsystem performance!

The results of these four queries are very useful to have available when you talk to your storage administrator about disk performance. The more evidence and metrics you have available about what SQL Server is seeing from a storage perspective, the stronger your case will be.

SQL Server Diagnostic Information Queries Detailed, Day 10

For Day 10 of this series, we start out with Query #21, which is Memory Dump Info.  This query retrieves information about any SQL Server memory dumps that have occurred from the sys.dm_server_memory_dumps dynamic management view. Query #21 is shown in Figure 1.

-- Get information on location, time and size of any memory dumps from SQL Server  (Query 21) (Memory Dump Info)
SELECT [filename], creation_time, size_in_bytes/1048576.0 AS [Size (MB)]
FROM sys.dm_server_memory_dumps WITH (NOLOCK)
ORDER BY creation_time DESC OPTION (RECOMPILE);

-- This will not return any rows if you have
-- not had any memory dumps (which is a good thing)

Figure 1: Query #21 Memory Dump Info

What you want to see when you run this query is no results, which means you have not had any SQL Server memory dumps. Memory dumps occur when SQL Server runs into problems or severe errors that are not quite bad enough to cause the entire SQL Server process to crash. If you have any of these, you can sometimes open the accompanying text file and get a rough idea what might have happened. Ultimately, you will need someone who knows what they are doing to look at the dump file with a debugger and the public symbols for your version and build of SQL Server (which might be Microsoft Support). Being on the latest Service Pack and Cumulative Update for your version of SQL Server will tend to reduce the number of memory dumps that you see.

 

Query #22 is Database Filenames and Paths. This query retrieves information from the  sys.master_files catalog view about the filenames and paths for all of the database files for all of the user and system databases on your instance. Query #22 is shown in Figure 2.

-- File names and paths for all user and system databases on instance  (Query 22) (Database Filenames and Paths)
SELECT DB_NAME([database_id]) AS [Database Name],
       [file_id], name, physical_name, [type_desc], state_desc,
       is_percent_growth, growth,
       CONVERT(bigint, growth/128.0) AS [Growth in MB],
       CONVERT(bigint, size/128.0) AS [Total Size in MB]
FROM sys.master_files WITH (NOLOCK)
ORDER BY DB_NAME([database_id]) OPTION (RECOMPILE);

-- Things to look at:
-- Are data files and log files on different drives?
-- Is everything on the C: drive?
-- Is TempDB on dedicated drives?
-- Is there only one TempDB data file?
-- Are all of the TempDB data files the same size?
-- Are there multiple data files for user databases?
-- Is percent growth enabled for any files (which is bad)?

Figure 2: Query #22 Database Filenames and Paths

This query lets you pretty easily see how your various database files are laid out across your file system, which is very useful to understand as you are investigating your storage subsystem performance. For example, it is fairly common to see high write latency for your tempdb database data files. Knowing that these files are located on your T: drive (for example), will help you understand the results from some of the upcoming queries more easily.

Depending on what kind of storage you have and how it is configured, you might want to separate your database data files from your log files (on separate LUNs). This old rule of thumb is not always the best thing to do when you have a small number of magnetic spindles. This query also lets you see how tempdb is configured, and whether any of your database files are using percent growth (which is a bad thing for performance).
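A quick follow-up check is to list just the files that are still using percent growth, and then switch each one to a fixed growth increment; a minimal sketch, with hypothetical names in the ALTER DATABASE:

SELECT DB_NAME(database_id) AS [Database Name], name, physical_name, growth
FROM sys.master_files WITH (NOLOCK)
WHERE is_percent_growth = 1 OPTION (RECOMPILE);

-- Then, for each file found (placeholder names):
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_Data', FILEGROWTH = 256MB);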

SQL Server Diagnostic Information Queries Detailed, Day 9

For Day 9 of this series, we start out with Query #19, which is BPE Configuration.  This query retrieves information about your buffer pool extension (BPE) configuration from the sys.dm_os_buffer_pool_extension_configuration dynamic management view. Query #19 is shown in Figure 1.

-- See if buffer pool extension (BPE) is enabled (Query 19) (BPE Configuration)
SELECT [path], state_description, current_size_in_kb,
CAST(current_size_in_kb/1048576.0 AS DECIMAL(10,2)) AS [Size (GB)]
FROM sys.dm_os_buffer_pool_extension_configuration WITH (NOLOCK) OPTION (RECOMPILE);

-- BPE is available in both Standard Edition and Enterprise Edition
-- It is a more interesting feature for Standard Edition

-- Buffer Pool Extension to SSDs in SQL Server 2014
-- http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/25/buffer-pool-extension-to-ssds-in-sql-server-2014.aspx

Figure 1: Query #19 BPE Configuration

BPE was a new feature that was added to SQL Server 2014. The idea behind it is that you can set aside some space in your file system for a read-only cache file of clean buffer pool pages, which makes it look like your buffer pool is larger than it actually is. SQL Server will look in the actual buffer pool first, then in the BPE file, and finally in your storage subsystem to find the data that it needs. The ideal scenario for this to be helpful is if you have a read-intensive OLTP workload that does a lot of random reads from your data files (but you don't have enough RAM to fit the data that is being read into the actual buffer pool). If you are using magnetic storage that has relatively poor random read I/O performance, and if your BPE file is on fast, local flash storage, then you might see a performance improvement from using BPE.

The problem with this in real life is that there is typically a lot of write activity to the BPE file to keep it up to date with the data that it is trying to cache. Some types of less expensive flash storage have performance issues when they are under sustained write pressure. It is also hard to drive an OLTP workload hard enough to cause it to actually use the BPE file in most scenarios, unless you set max server memory artificially low. The BPE feature is mainly interesting if you are using SQL Server 2014 Standard Edition, where you are restricted to using 128GB of RAM per instance for the Database Engine.
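If you do want to test BPE, it is enabled with a single command; a minimal sketch, where the file path and size are placeholders that should point at fast, local flash storage:

-- Enable buffer pool extension (SQL Server 2014 and newer); path and size are placeholders
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON (FILENAME = N'F:\BPE\ExtensionFile.BPE', SIZE = 64 GB);

-- Turn it back off if it does not help your workload
-- ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;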

 

Query #20 is BPE Usage. This query retrieves data from the sys.dm_os_buffer_descriptors dynamic management view. Query #20 is shown in Figure 2.

-- Look at buffer descriptors to see BPE usage by database (Query 20) (BPE Usage)
SELECT DB_NAME(database_id) AS [Database Name], COUNT(page_id) AS [Page Count],
CAST(COUNT(*)/128.0 AS DECIMAL(10, 2)) AS [Buffer size(MB)],
AVG(read_microsec) AS [Avg Read Time (microseconds)]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
WHERE database_id <> 32767
AND is_in_bpool_extension = 1
GROUP BY DB_NAME(database_id)
ORDER BY [Buffer size(MB)] DESC OPTION (RECOMPILE);

-- You will see no results if BPE is not enabled or if there is no BPE usage

Figure 2: Query #20 BPE Usage

This query (which usually takes a few seconds to run) will show you which databases are actually using your BPE file. I have not seen too many people using the BPE feature so far, which I think is a shame. If you have the right kind of workload, it may help performance. It is certainly worth some testing to try it out.