SQL Server Diagnostic Information Queries Detailed, Day 21

For Day 21 of this series, we start out with Query #48, which is SP Avg Elapsed Time. This query retrieves information from the sys.procedures object catalog view and the sys.dm_exec_procedure_stats dynamic management view about the cached stored procedures that have the highest average elapsed time in the current database. Query #48 is shown in Figure 1.

   1: -- Top Cached SPs By Avg Elapsed Time (Query 48) (SP Avg Elapsed Time)

   2: SELECT TOP(25) p.name AS [SP Name], qs.min_elapsed_time, qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time], 

   3: qs.max_elapsed_time, qs.last_elapsed_time, qs.total_elapsed_time, qs.execution_count, 

   4: ISNULL(qs.execution_count/DATEDIFF(Minute, qs.cached_time, GETDATE()), 0) AS [Calls/Minute], 

   5: qs.total_worker_time/qs.execution_count AS [AvgWorkerTime], 

   6: qs.total_worker_time AS [TotalWorkerTime], qs.cached_time

   7: FROM sys.procedures AS p WITH (NOLOCK)

   8: INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)

   9: ON p.[object_id] = qs.[object_id]

  10: WHERE qs.database_id = DB_ID()

  11: ORDER BY avg_elapsed_time DESC OPTION (RECOMPILE);

  12:  

  13: -- This helps you find high average elapsed time cached stored procedures that

  14: -- may be easy to optimize with standard query tuning techniques

Figure 1: Query #48 SP Avg Elapsed Time

This query gives you a chance to look like a super hero. It shows you the cached stored procedures that have the highest average elapsed time in the current database. This basically gives you a list of stored procedures to look at much more closely, to see if you can do any query optimization or index tuning to make them dramatically faster. If you are able to do your DBA magic and make a long-running stored procedure run much, much faster, people are going to notice, and perhaps think you are some sort of evil genius.

 

Query #49 is SP Worker Time. This query retrieves information from the sys.procedures object catalog view and the sys.dm_exec_procedure_stats dynamic management view about the cached stored procedures that have the highest cumulative worker time in the current database. Query #49 is shown in Figure 2.

   1: -- Top Cached SPs By Total Worker time. Worker time relates to CPU cost  (Query 49) (SP Worker Time)

   2: SELECT TOP(25) p.name AS [SP Name], qs.total_worker_time AS [TotalWorkerTime], 

   3: qs.total_worker_time/qs.execution_count AS [AvgWorkerTime], qs.execution_count, 

   4: ISNULL(qs.execution_count/DATEDIFF(Minute, qs.cached_time, GETDATE()), 0) AS [Calls/Minute],

   5: qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count 

   6: AS [avg_elapsed_time], qs.cached_time

   7: FROM sys.procedures AS p WITH (NOLOCK)

   8: INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)

   9: ON p.[object_id] = qs.[object_id]

  10: WHERE qs.database_id = DB_ID()

  11: ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);

  12:  

  13: -- This helps you find the most expensive cached stored procedures from a CPU perspective

  14: -- You should look at this if you see signs of CPU pressure

Figure 2: Query #49 SP Worker Time

This query shows you which cached stored procedures have the highest cumulative total worker time in the current database. Worker time relates to CPU cost. If your instance or server is under CPU pressure, then looking at the stored procedures that show up at the top of this diagnostic query should be a high priority. Even if you are not under sustained CPU pressure, keeping an eye on the top offenders in this query is a good idea. Quite often, you will find the same stored procedures showing up on several of these different “Top SP cost” queries, which means that the SP in question is expensive from multiple perspectives.
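
As a quick side note, another common way to sanity-check for CPU pressure (this is a companion sketch, not one of the numbered queries in this set) is to look at the ratio of signal wait time to total wait time in sys.dm_os_wait_stats, since signal waits represent time spent waiting for a CPU to become available after a wait has been signaled:

-- Companion sketch: signal waits as a percentage of total waits (cumulative since last restart)
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS DECIMAL(5,2)) AS [Signal Wait Pct]
FROM sys.dm_os_wait_stats WITH (NOLOCK)
WHERE wait_time_ms > 0 OPTION (RECOMPILE);

Higher sustained percentages suggest that runnable tasks are regularly waiting for a CPU. Treat it as a supporting signal alongside the worker time and CPU utilization queries, not as a hard threshold.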

SQL Server Diagnostic Information Queries Detailed, Day 20

For Day 20 of this series, we start out with Query #46, which is Query Execution Counts. This query retrieves information from the sys.dm_exec_query_stats dynamic management view, the sys.dm_exec_sql_text dynamic management function, and the sys.dm_exec_query_plan dynamic management function about the most frequently executed cached queries in the current database. Query #46 is shown in Figure 1.

   1: -- Get most frequently executed queries for this database (Query 46) (Query Execution Counts)

   2: SELECT TOP(50) LEFT(t.[text], 50) AS [Short Query Text], qs.execution_count AS [Execution Count],

   3: qs.total_logical_reads AS [Total Logical Reads],

   4: qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],

   5: qs.total_worker_time AS [Total Worker Time],

   6: qs.total_worker_time/qs.execution_count AS [Avg Worker Time], 

   7: qs.total_elapsed_time AS [Total Elapsed Time],

   8: qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time], 

   9: qs.creation_time AS [Creation Time]

  10: --,t.[text] AS [Complete Query Text], qp.query_plan AS [Query Plan] -- uncomment out these columns if not copying results to Excel

  11: FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)

  12: CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t 

  13: CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp 

  14: WHERE t.dbid = DB_ID()

  15: ORDER BY qs.execution_count DESC OPTION (RECOMPILE);

Figure 1: Query #46 Query Execution Counts

This query shows you which cached queries (which might be part of a stored procedure or not) are being called the most often. This is useful as a part of understanding the nature of your workload. Keep in mind that just because a query is called a lot does not necessarily mean that it is a key part of your workload. It might be, but it could be that it is not actually that expensive for individual calls or cumulatively. You will need to look at the other metrics for that query to determine that.

You may notice that I have one line of this query commented out. This is because Excel does not deal very well with large quantities of text or XML. If you are working with this in real time, you should probably uncomment that line, so you see the extra information that it retrieves.

 

Query #47 is SP Execution Counts. This query retrieves information from the sys.procedures object catalog view and the sys.dm_exec_procedure_stats dynamic management view about the most frequently executed cached stored procedures in the current database. Query #47 is shown in Figure 2.

   1: -- Top Cached SPs By Execution Count (Query 47) (SP Execution Counts)

   2: SELECT TOP(100) p.name AS [SP Name], qs.execution_count,

   3: ISNULL(qs.execution_count/DATEDIFF(Minute, qs.cached_time, GETDATE()), 0) AS [Calls/Minute],

   4: qs.total_worker_time/qs.execution_count AS [AvgWorkerTime], qs.total_worker_time AS [TotalWorkerTime],  

   5: qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time],

   6: qs.cached_time

   7: FROM sys.procedures AS p WITH (NOLOCK)

   8: INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)

   9: ON p.[object_id] = qs.[object_id]

  10: WHERE qs.database_id = DB_ID()

  11: ORDER BY qs.execution_count DESC OPTION (RECOMPILE);

  12:  

  13: -- Tells you which cached stored procedures are called the most often

  14: -- This helps you characterize and baseline your workload

Figure 2: Query #47 SP Execution Counts

This query shows you which stored procedures with cached query plans are being called the most often. This helps you understand the nature and magnitude of your workload. Ideally, you should have a general idea of what your normal workload looks like, in terms of how many calls/minute or per second you are seeing for your top stored procedures.

If this rate suddenly changes, you would want to investigate further to understand what might have happened. Understanding which stored procedures are called the most often can also help you identify possible candidates for middle-tier caching.

SQL Server Diagnostic Information Queries Detailed, Day 19

After eighteen days of queries, we have made it through all of the instance-level queries in this set. Now, we move on to the database-specific queries in the set. For these queries, you need to be connected to a particular database that you are concerned with, rather than the master system database.

For Day 19 of this series, we start out with Query #44, which is File Sizes and Space. This query retrieves information from the sys.database_files system catalog view and the sys.data_spaces system catalog view about the sizes and available space for all of your database files. Query #44 is shown in Figure 1.

   1: -- Individual File Sizes and space available for current database  (Query 44) (File Sizes and Space)

   2: SELECT f.name AS [File Name] , f.physical_name AS [Physical Name], 

   3: CAST((f.size/128.0) AS DECIMAL(15,2)) AS [Total Size in MB],

   4: CAST(f.size/128.0 - CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS int)/128.0 AS DECIMAL(15,2)) 

   5: AS [Available Space In MB], [file_id], fg.name AS [Filegroup Name],

   6: f.is_percent_growth, f.growth

   7: FROM sys.database_files AS f WITH (NOLOCK) 

   8: LEFT OUTER JOIN sys.data_spaces AS fg WITH (NOLOCK) 

   9: ON f.data_space_id = fg.data_space_id OPTION (RECOMPILE);

  10:  

  11: -- Look at how large and how full the files are and where they are located

  12: -- Make sure the transaction log is not full!!

Figure 1: Query #44 File Sizes and Space

This query lets you see how large each of your database files is and how much space is available in each one. For data files, you can also see which filegroup each file belongs to, and you can see exactly where each file is located in the file system. This is all extremely useful information.
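
If you want a quicker read on how full each file is, here is a minimal variation on Query #44 (a sketch that uses the same FILEPROPERTY technique, so test it against your own databases) that adds a percent-used column:

-- Sketch: same FILEPROPERTY approach as Query #44, with a percent-used column added
SELECT f.name AS [File Name],
       CAST(f.size/128.0 AS DECIMAL(15,2)) AS [Total Size in MB],
       CAST(CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS int)/128.0 AS DECIMAL(15,2)) AS [Space Used in MB],
       CAST(100.0 * CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS int)/f.size AS DECIMAL(5,2)) AS [Percent Used]
FROM sys.database_files AS f WITH (NOLOCK)
OPTION (RECOMPILE);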

 

Query #45 is IO Stats By File. This query retrieves information from the sys.dm_io_virtual_file_stats dynamic management function and the sys.database_files system catalog view about the cumulative I/O usage by database file. Query #45 is shown in Figure 2.

   1: -- I/O Statistics by file for the current database  (Query 45) (IO Stats By File)

   2: SELECT DB_NAME(DB_ID()) AS [Database Name], df.name AS [Logical Name], vfs.[file_id], df.type_desc,

   3: df.physical_name AS [Physical Name], CAST(vfs.size_on_disk_bytes/1048576.0 AS DECIMAL(10, 2)) AS [Size on Disk (MB)],

   4: vfs.num_of_reads, vfs.num_of_writes, vfs.io_stall_read_ms, vfs.io_stall_write_ms,

   5: CAST(100. * vfs.io_stall_read_ms/(vfs.io_stall_read_ms + vfs.io_stall_write_ms) AS DECIMAL(10,1)) AS [IO Stall Reads Pct],

   6: CAST(100. * vfs.io_stall_write_ms/(vfs.io_stall_write_ms + vfs.io_stall_read_ms) AS DECIMAL(10,1)) AS [IO Stall Writes Pct],

   7: (vfs.num_of_reads + vfs.num_of_writes) AS [Writes + Reads], 

   8: CAST(vfs.num_of_bytes_read/1048576.0 AS DECIMAL(10, 2)) AS [MB Read], 

   9: CAST(vfs.num_of_bytes_written/1048576.0 AS DECIMAL(10, 2)) AS [MB Written],

  10: CAST(100. * vfs.num_of_reads/(vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10,1)) AS [# Reads Pct],

  11: CAST(100. * vfs.num_of_writes/(vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10,1)) AS [# Write Pct],

  12: CAST(100. * vfs.num_of_bytes_read/(vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10,1)) AS [Read Bytes Pct],

  13: CAST(100. * vfs.num_of_bytes_written/(vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10,1)) AS [Written Bytes Pct]

  14: FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs

  15: INNER JOIN sys.database_files AS df WITH (NOLOCK)

  16: ON vfs.[file_id]= df.[file_id] OPTION (RECOMPILE);

  17:  

  18: -- This helps you characterize your workload better from an I/O perspective for this database

  19: -- It helps you determine whether you have an OLTP or DW/DSS type of workload

Figure 2: Query #45 IO Stats By File

This query lets you see all of the cumulative file activity for each of the files in the current database, since SQL Server was last started. This includes your normal workload activity, plus any other activity that touches your data and log files. This would include things like database backups, index maintenance, DBCC CHECKDB activity, and HA-related activity from things like transactional replication, database mirroring, and AlwaysOn AG-related activity.

Looking at the results of this query helps you understand what kind of I/O workload activity you are seeing on each of your database files. This helps you do a better job when it comes to designing and configuring your storage subsystem.
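
One related calculation I find useful (a hedged sketch that reuses the same sys.dm_io_virtual_file_stats data, not one of the numbered queries) is the average read and write latency per file, which gives you a rough idea of how well your storage is keeping up with that workload:

-- Sketch: approximate average I/O latency per file for the current database
SELECT df.name AS [Logical Name], df.type_desc,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS [Avg Read Latency (ms)],
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS [Avg Write Latency (ms)]
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
INNER JOIN sys.database_files AS df WITH (NOLOCK)
ON vfs.[file_id] = df.[file_id] OPTION (RECOMPILE);

Keep in mind that these are cumulative averages since the last restart, so they can hide short spikes.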

CPU-Z Benchmark Survey

The latest version of the free CPU-Z utility has a quick CPU benchmark test that just takes a couple of minutes to run. As part of a personal project that I am working on (which I think will be very interesting and beneficial to the SQL Server community), I am trying to collect as many CPU-Z CPU benchmark results as possible, covering as many different families and models of processors as possible. If you have about five minutes of spare time, perhaps you can help me with this!

With CPU-Z 1.75, simply click on the “Bench CPU” button on the Bench tab (as shown in Figure 2). Once the test finishes in a couple of minutes, take a screenshot of that tab (ALT-Print Screen in Windows), and paste it into an e-mail. Then take a screenshot of the CPU tab (so I can easily identify your processor), like you see in Figure 1, and include that in your e-mail. Another way to get these screenshots is to hit the F5 key while you are on those two tabs, which will save a .bmp file in the same directory as CPU-Z.

I am mainly looking for results for bare-metal, non-virtualized machines right now. If possible, make sure the Windows High Performance power plan is enabled, and that your machine is plugged in (if it is a laptop or tablet). Ideally, you would do this while your machine is relatively idle, so that all of the processing power is available for the test.

If you run this on a server, please don’t do it while it is in Production!

 


Figure 1: CPU-Z CPU Tab

 

Make sure to only click the Bench CPU button, not the Stress CPU button!


Figure 2: CPU-Z Bench Tab

Once you are done, simply send me your screenshots by e-mail. Please don’t try to return any results by comments on this blog post.

If you would like to do this for multiple machines, that would be great!  Thanks!

SQL Server Diagnostic Information Queries Detailed, Day 18

For Day 18 of this series, we start out with Query #41, which is Memory Clerk Usage. This query retrieves information from the sys.dm_os_memory_clerks dynamic management view about total memory usage by your active memory clerks. Query #41 is shown in Figure 1.

   1: -- Memory Clerk Usage for instance  (Query 41) (Memory Clerk Usage)

   2: -- Look for high value for CACHESTORE_SQLCP (Ad-hoc query plans)

   3: SELECT TOP(10) mc.[type] AS [Memory Clerk Type], 

   4:        CAST((SUM(mc.pages_kb)/1024.0) AS DECIMAL (15,2)) AS [Memory Usage (MB)] 

   5: FROM sys.dm_os_memory_clerks AS mc WITH (NOLOCK)

   6: GROUP BY mc.[type]  

   7: ORDER BY SUM(mc.pages_kb) DESC OPTION (RECOMPILE);

   8:  

   9: -- MEMORYCLERK_SQLBUFFERPOOL was new for SQL Server 2012. It should be your highest consumer of memory

  10:  

  11: -- CACHESTORE_SQLCP  SQL Plans         

  12: -- These are cached SQL statements or batches that aren't in stored procedures, functions and triggers

  13: -- Watch out for high values for CACHESTORE_SQLCP

  14:  

  15: -- CACHESTORE_OBJCP  Object Plans      

  16: -- These are compiled plans for stored procedures, functions and triggers

Figure 1: Query #41 Memory Clerk Usage

This query shows you which memory clerks are using the most memory on your instance. With SQL Server 2012 or newer, your top memory clerk by memory usage should be MEMORYCLERK_SQLBUFFERPOOL, meaning memory usage by the SQL Server Buffer Pool. It is very common to see a high value for the CACHESTORE_SQLCP memory clerk, indicating that you have multiple GB of cached ad hoc or prepared query plans in the plan cache. If you see that, then you should look at the next query more closely, for several things you can do to help mitigate this issue.

 

Query #42 is Ad hoc Queries. This query retrieves information from the sys.dm_exec_cached_plans dynamic management view and the sys.dm_exec_sql_text dynamic management function about the single-use ad hoc and prepared query plans. Query #42 is shown in Figure 2.

   1: -- Find single-use, ad-hoc and prepared queries that are bloating the plan cache  (Query 42) (Ad hoc Queries)

   2: SELECT TOP(50) [text] AS [QueryText], cp.cacheobjtype, cp.objtype, cp.size_in_bytes/1024 AS [Plan Size in KB]

   3: FROM sys.dm_exec_cached_plans AS cp WITH (NOLOCK)

   4: CROSS APPLY sys.dm_exec_sql_text(plan_handle) 

   5: WHERE cp.cacheobjtype = N'Compiled Plan' 

   6: AND cp.objtype IN (N'Adhoc', N'Prepared') 

   7: AND cp.usecounts = 1

   8: ORDER BY cp.size_in_bytes DESC OPTION (RECOMPILE);

   9:  

  10: -- Gives you the text, type and size of single-use ad-hoc and prepared queries that waste space in the plan cache

  11: -- Enabling 'optimize for ad hoc workloads' for the instance can help (SQL Server 2008 and above only)

  12: -- Running DBCC FREESYSTEMCACHE ('SQL Plans') periodically may be required to better control this

  13: -- Enabling forced parameterization for the database can help, but test first!

  14:  

  15: -- Plan cache, adhoc workloads and clearing the single-use plan cache bloat

  16: -- http://www.sqlskills.com/blogs/kimberly/plan-cache-adhoc-workloads-and-clearing-the-single-use-plan-cache-bloat/

Figure 2: Query #42 Ad hoc Queries

This query will show you which single-use ad hoc or prepared query plans are using the most space in the plan cache. Once you know who the culprits are, you can start investigating them more closely. Perhaps these queries can be converted to stored procedures or parameterized SQL. At the very least, I think you should enable “optimize for ad hoc workloads” at the instance level pretty much as a default setting. On top of this, it is usually a good idea to periodically flush that particular cache, using the DBCC FREESYSTEMCACHE (‘SQL Plans’); command.
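
For reference, here is what those two mitigation steps look like as commands (a sketch; 'optimize for ad hoc workloads' is an advanced option, and you should test both changes in your own environment first):

-- Enable 'optimize for ad hoc workloads' at the instance level (SQL Server 2008 and newer)
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Periodically clear just the ad hoc/prepared plan cache, rather than the entire plan cache
DBCC FREESYSTEMCACHE ('SQL Plans');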

 

Query #43 is Top Logical Reads Queries. This query retrieves information from the sys.dm_exec_query_stats dynamic management view, the sys.dm_exec_sql_text dynamic management function and the sys.dm_exec_query_plan dynamic management function about the cached query plans that have the highest total logical reads. Query #43 is shown in Figure 3.

   1: -- Get top total logical reads queries for entire instance (Query 43) (Top Logical Reads Queries)

   2: SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], LEFT(t.[text], 50) AS [Short Query Text],

   3: qs.total_logical_reads AS [Total Logical Reads],

   4: qs.min_logical_reads AS [Min Logical Reads],

   5: qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],

   6: qs.max_logical_reads AS [Max Logical Reads],   

   7: qs.min_worker_time AS [Min Worker Time],

   8: qs.total_worker_time/qs.execution_count AS [Avg Worker Time], 

   9: qs.max_worker_time AS [Max Worker Time], 

  10: qs.min_elapsed_time AS [Min Elapsed Time], 

  11: qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time], 

  12: qs.max_elapsed_time AS [Max Elapsed Time],

  13: qs.execution_count AS [Execution Count], qs.creation_time AS [Creation Time]

  14: --,t.[text] AS [Complete Query Text], qp.query_plan AS [Query Plan] -- uncomment out these columns if not copying results to Excel

  15: FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)

  16: CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t 

  17: CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp 

  18: ORDER BY qs.total_logical_reads DESC OPTION (RECOMPILE);

  19:  

  20:  

  21: -- Helps you find the most expensive queries from a memory perspective across the entire instance

  22: -- Can also help track down parameter sniffing issues

Figure 3: Query #43 Top Logical Reads Queries

Having logical reads means that you are finding the data you need to satisfy a query in the SQL Server Buffer Pool rather than having to go out to the storage subsystem, which is a good thing. However, queries that have high numbers of logical reads create extra internal memory pressure on your system. They also indirectly create read I/O pressure, since the data in the buffer pool has to be read from the storage subsystem initially. If you are seeing signs of memory pressure, then knowing which cached queries (across the entire instance) have the highest number of total logical reads can help you understand which queries are causing the most memory pressure.

Once you understand this, then you can start looking at individual queries in more detail. Perhaps there is a missing index that is causing a clustered index scan that is causing high numbers of logical reads in a query. Perhaps there is an implicit conversion in a JOIN or in a WHERE clause that is causing SQL Server to ignore a useful index. Maybe someone is pulling back more columns than they need for a query. There are lots of possibilities here.
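
As a simple, hypothetical illustration of the implicit conversion case (the table and column names here are made up, so adapt them to your own schema): if CustomerID is a varchar column with a useful index, comparing it to an nvarchar parameter forces SQL Server to convert the column value on every row, which typically turns an index seek into a scan.

-- Hypothetical example: mismatched data types cause an implicit conversion on the column
DECLARE @CustomerID nvarchar(20) = N'ALFKI';

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID;    -- varchar column converted to nvarchar, index seek unlikely

-- Matching the parameter's data type to the column avoids the conversion
DECLARE @CustomerID2 varchar(20) = 'ALFKI';

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID2;   -- index seek is possible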

These three Pluralsight Courses go into even more detail about how to run these queries and interpret the results:

SQL Server 2014 DMV Diagnostic Queries – Part 1

SQL Server 2014 DMV Diagnostic Queries – Part 2

SQL Server 2014 DMV Diagnostic Queries – Part 3

SQL Server Diagnostic Information Queries Detailed, Day 17

For Day 17 of this series, we start out with Query #39, which is PLE by NUMA Node. This query retrieves information from the sys.dm_os_performance_counters dynamic management view about your page life expectancy (PLE) by NUMA node. Query #39 is shown in Figure 1.

   1: -- Page Life Expectancy (PLE) value for each NUMA node in current instance  (Query 39) (PLE by NUMA Node)

   2: SELECT @@SERVERNAME AS [Server Name], [object_name], instance_name, cntr_value AS [Page Life Expectancy]

   3: FROM sys.dm_os_performance_counters WITH (NOLOCK)

   4: WHERE [object_name] LIKE N'%Buffer Node%' -- Handles named instances

   5: AND counter_name = N'Page life expectancy' OPTION (RECOMPILE);

   6:  

   7: -- PLE is a good measurement of memory pressure

   8: -- Higher PLE is better. Watch the trend over time, not the absolute value

   9: -- This will only return one row for non-NUMA systems

  10:  

  11: -- Page Life Expectancy isn’t what you think…

  12: -- http://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/

Figure 1: Query #39 PLE by NUMA Node

I think that page life expectancy (PLE) is probably one of the best ways to gauge whether you are under internal memory pressure, as long as you think about it correctly. What you should do is monitor your PLE value ranges over time so that you know what your typical minimum, average, and maximum PLE values are at different times and on different days of the week. They will usually vary quite a bit according to your workload.

The ancient guidance that a PLE measurement of 300 or higher is good is really not relevant for modern database servers, which have much larger amounts of physical RAM than servers did 10-12 years ago. Basically, higher PLE values are always better. You want to watch the ranges and trends over time, rather than focus on a single measurement.
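
If you do not have a monitoring tool that tracks PLE for you, a very simple approach (this is a minimal sketch, and the table name is just something I made up) is to capture the PLE values from Query #39 into a history table on a schedule, for example from a SQL Server Agent job:

-- Sketch: capture PLE by NUMA node over time so you can see your own ranges and trends
CREATE TABLE dbo.PLE_History (
    capture_time datetime NOT NULL DEFAULT (GETDATE()),
    instance_name nvarchar(128) NOT NULL,
    page_life_expectancy bigint NOT NULL);

INSERT INTO dbo.PLE_History (instance_name, page_life_expectancy)
SELECT instance_name, cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Buffer Node%'
AND counter_name = N'Page life expectancy';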

 

 

Query #40 is Memory Grants Pending. This query retrieves information from the sys.dm_os_performance_counters dynamic management view about the current value of the Memory Grants Pending performance counter. Query #40 is shown in Figure 2.

   1: -- Memory Grants Pending value for current instance  (Query 40) (Memory Grants Pending)

   2: SELECT @@SERVERNAME AS [Server Name], [object_name], cntr_value AS [Memory Grants Pending]                                                                                                       

   3: FROM sys.dm_os_performance_counters WITH (NOLOCK)

   4: WHERE [object_name] LIKE N'%Memory Manager%' -- Handles named instances

   5: AND counter_name = N'Memory Grants Pending' OPTION (RECOMPILE);

   6:  

   7: -- Run multiple times, and run periodically if you suspect you are under memory pressure

   8: -- Memory Grants Pending above zero for a sustained period is a very strong indicator of internal memory pressure

Figure 2: Query #40 Memory Grants Pending

This query is another way to determine whether you are under severe internal memory pressure. The value of this query will change from second to second, so you will want to run it multiple times when you suspect you are under memory pressure. Any sustained value above zero is not a good sign. In fact, it is a very bad sign, showing that you are under pretty extreme memory pressure.
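
If you do see Memory Grants Pending above zero, one natural follow-up (a sketch, not one of the numbered queries) is to look at sys.dm_exec_query_memory_grants to see which requests are currently waiting for a memory grant:

-- Sketch: requests that are waiting for a workspace memory grant right now
SELECT session_id, requested_memory_kb, granted_memory_kb, wait_time_ms, queue_id
FROM sys.dm_exec_query_memory_grants WITH (NOLOCK)
WHERE granted_memory_kb IS NULL OPTION (RECOMPILE);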

SQL Server Diagnostic Information Queries Detailed, Day 16

For Day 16 of this series, we start out with Query #37, which is CPU Utilization History. This query retrieves information from the somewhat undocumented sys.dm_os_ring_buffers dynamic management view about recent CPU utilization by SQL Server. Query #37 is shown in Figure 1.

   1: -- Get CPU Utilization History for last 256 minutes (in one minute intervals)  (Query 37) (CPU Utilization History)

   2: -- This version works with SQL Server 2016

   3: DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks) FROM sys.dm_os_sys_info WITH (NOLOCK)); 

   4:  

   5: SELECT TOP(256) SQLProcessUtilization AS [SQL Server Process CPU Utilization], 

   6:                SystemIdle AS [System Idle Process], 

   7:                100 - SystemIdle - SQLProcessUtilization AS [Other Process CPU Utilization], 

   8:                DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS [Event Time] 

   9: FROM (SELECT record.value('(./Record/@id)[1]', 'int') AS record_id, 

  10:             record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') 

  11:             AS [SystemIdle], 

  12:             record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') 

  13:             AS [SQLProcessUtilization], [timestamp] 

  14:       FROM (SELECT [timestamp], CONVERT(xml, record) AS [record] 

  15:             FROM sys.dm_os_ring_buffers WITH (NOLOCK)

  16:             WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 

  17:             AND record LIKE N'%<SystemHealth>%') AS x) AS y 

  18: ORDER BY record_id DESC OPTION (RECOMPILE);

  19:  

  20: -- Look at the trend over the entire period 

  21: -- Also look at high sustained Other Process CPU Utilization values

Figure 1: Query #37 CPU Utilization History

This query shows you the average CPU utilization history for the current instance of SQL Server, plus the summed average CPU utilization for all other processes on your machine, captured in one-minute increments for the past 256 minutes. This lets you go back in time and see what has been happening with processor utilization over that period. It is always nice to know whether an episode of high CPU utilization has been sustained or whether it has only been going on for a short period. It is also nice to understand how much CPU other processes are using on your machine.

Ideally, you have some sort of monitoring system in place to let you review your history and trends for more than four hours. Not all 3rd party monitoring systems are created equal. Some are much better than others. Personally, I really like SQLSentry PerformanceAdvisor.

Query #38 is Top Worker Time Queries. This query retrieves information from the sys.dm_exec_query_stats dynamic management view, the sys.dm_exec_sql_text dynamic management function, and the sys.dm_exec_query_plan dynamic management function about the highest cumulative worker time queries across your entire instance. Query #38 is shown in Figure 2.

   1: -- Get top total worker time queries for entire instance (Query 38) (Top Worker Time Queries)

   2: SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], LEFT(t.[text], 50) AS [Short Query Text],

   3: qs.total_worker_time AS [Total Worker Time], qs.min_worker_time AS [Min Worker Time],

   4: qs.total_worker_time/qs.execution_count AS [Avg Worker Time], 

   5: qs.max_worker_time AS [Max Worker Time], 

   6: qs.min_elapsed_time AS [Min Elapsed Time], 

   7: qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time], 

   8: qs.max_elapsed_time AS [Max Elapsed Time],

   9: qs.min_logical_reads AS [Min Logical Reads],

  10: qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],

  11: qs.max_logical_reads AS [Max Logical Reads], 

  12: qs.execution_count AS [Execution Count], qs.creation_time AS [Creation Time]

  13: -- ,t.[text] AS [Query Text], qp.query_plan AS [Query Plan] -- uncomment out these columns if not copying results to Excel

  14: FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)

  15: CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t 

  16: CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp 

  17: ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);

  18:  

  19:  

  20: -- Helps you find the most expensive queries from a CPU perspective across the entire instance

  21: -- Can also help track down parameter sniffing issues

Figure 2: Query #38 Top Worker Time Queries

This query is very useful when you see any signs of CPU pressure. It helps you understand which cached queries are using the most CPU resources across your entire instance. You might notice that one line of the query is commented out. I do this so that the query results will copy and paste into Excel. If you are not going to copy/paste the results, then you will usually get some more useful information by including the last two columns in the SELECT statement.

If you are seeing very wide variations in minimum, average, and maximum worker time for a cached query, this might be evidence that you have a parameter sniffing issue (where some input values for a query produce results that are not well-suited for the cached version of the query plan that is being used). The new Query Store feature in SQL Server 2016 is going to be very useful for detecting and correcting these types of issues.
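
If you are already on SQL Server 2016, turning on the Query Store for a database is a quick change (a sketch; the database name is a placeholder, and you will want to review the Query Store configuration options before enabling it in production):

-- Sketch: enable the Query Store for a database (SQL Server 2016 and newer)
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);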

SQL Server Diagnostic Information Queries Detailed, Day 15

For Day 15 of this series, we start out with Query #35, which is Avg Task Counts. This query retrieves information from the sys.dm_os_schedulers dynamic management view about the current average load across your SQL OS schedulers. Query #35 is shown in Figure 1.

   1: -- Get Average Task Counts (run multiple times)  (Query 35) (Avg Task Counts)

   2: SELECT AVG(current_tasks_count) AS [Avg Task Count], 

   3: AVG(work_queue_count) AS [Avg Work Queue Count],

   4: AVG(runnable_tasks_count) AS [Avg Runnable Task Count],

   5: AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]

   6: FROM sys.dm_os_schedulers WITH (NOLOCK)

   7: WHERE scheduler_id < 255 OPTION (RECOMPILE);

   8:  

   9: -- Sustained values above 10 suggest further investigation in that area

  10: -- High Avg Task Counts are often caused by blocking/deadlocking or other resource contention

  11:  

  12: -- Sustained values above 1 suggest further investigation in that area

  13: -- High Avg Runnable Task Counts are a good sign of CPU pressure

  14: -- High Avg Pending DiskIO Counts are a sign of disk pressure

Figure 1: Query #35 Avg Task Counts

If you see high average task counts (above 10), that is usually a pretty good indicator of blocking/deadlocking. In some cases, it just means that your instance is very busy, with a high sustained level of activity. If you see average runnable task counts above 0, that is a good indicator of CPU pressure. If you see average pending I/O counts above 0, that is a good indicator of I/O pressure or bottlenecks. You need to run this query multiple times, since the results will change from second to second.

Looking at the results of this query (after I have run it a few times over the course of a few minutes) gives me a good high-level sense of the workload and health of my SQL Server instance.
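
One low-tech way to take several samples (this is just a sketch; the GO n batch-repeat syntax is an SSMS/sqlcmd feature, not T-SQL) is to add a short delay to the batch and repeat it:

-- Sketch: sample the scheduler counts several times, ten seconds apart
SELECT AVG(current_tasks_count) AS [Avg Task Count],
       AVG(work_queue_count) AS [Avg Work Queue Count],
       AVG(runnable_tasks_count) AS [Avg Runnable Task Count],
       AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255;

WAITFOR DELAY '00:00:10';
-- GO with a count (SSMS/sqlcmd batch separator, not T-SQL) repeats the batch above
GO 6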

 

Query #36 is Detect Blocking. This query retrieves information from the sys.dm_exec_requests dynamic management view and the sys.dm_exec_sql_text dynamic management function about any blocking activity that is occurring when you run the query. Query #36 is shown in Figure 2.

   1: -- Detect blocking (run multiple times)  (Query 36) (Detect Blocking)

   2: SELECT t1.resource_type AS [lock type], DB_NAME(resource_database_id) AS [database],

   3: t1.resource_associated_entity_id AS [blk object],t1.request_mode AS [lock req],  --- lock requested

   4: t1.request_session_id AS [waiter sid], t2.wait_duration_ms AS [wait time],       -- spid of waiter  

   5: (SELECT [text] FROM sys.dm_exec_requests AS r WITH (NOLOCK)                     -- get sql for waiter

   6: CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) 

   7: WHERE r.session_id = t1.request_session_id) AS [waiter_batch],

   8: (SELECT SUBSTRING(qt.[text],r.statement_start_offset/2, 

   9:     (CASE WHEN r.statement_end_offset = -1 

  10:     THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2 

  11:     ELSE r.statement_end_offset END - r.statement_start_offset)/2) 

  12: FROM sys.dm_exec_requests AS r WITH (NOLOCK)

  13: CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt

  14: WHERE r.session_id = t1.request_session_id) AS [waiter_stmt],                    -- statement blocked

  15: t2.blocking_session_id AS [blocker sid],                                        -- spid of blocker

  16: (SELECT [text] FROM sys.sysprocesses AS p                                -- get sql for blocker

  17: CROSS APPLY sys.dm_exec_sql_text(p.[sql_handle]) 

  18: WHERE p.spid = t2.blocking_session_id) AS [blocker_batch]

  19: FROM sys.dm_tran_locks AS t1 WITH (NOLOCK)

  20: INNER JOIN sys.dm_os_waiting_tasks AS t2 WITH (NOLOCK)

  21: ON t1.lock_owner_address = t2.resource_address OPTION (RECOMPILE);

  22:  

  23: -- Helps troubleshoot blocking and deadlocking issues

  24: -- The results will change from second to second on a busy system

  25: -- You should run this query multiple times when you see signs of blocking

Figure 2: Query #36 Detect Blocking

If no blocking is happening when you run this query, it will not return any results. This is what you want to see! You need to run this query multiple times, since the results will often change from second to second. Don’t just run it once, and then conclude that there is no blocking happening at any time.

If any blocking is occurring, then this query will show you the blocked query text and the query text of the blocker. This information can be very useful when it comes to understanding what is going on when blocking or deadlocking is happening. Many times, excessive blocking and deadlocking is caused by missing indexes on a table, so proper index tuning can be a very effective solution.
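
As a lighter-weight companion check (a sketch, not one of the numbered queries in this set), sys.dm_exec_requests will also show you any currently blocked sessions and who is blocking them:

-- Sketch: currently blocked requests, with the blocking session and the blocked query text
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time,
       r.command, t.[text] AS [Query Text]
FROM sys.dm_exec_requests AS r WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS t
WHERE r.blocking_session_id <> 0 OPTION (RECOMPILE);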

SQL Server Diagnostic Information Queries Detailed, Day 14

For Day 14 of this series, we start out with Query #33, which is Top Waits. This query retrieves information from the sys.dm_os_wait_stats dynamic management view about the cumulative wait statistics for the instance since the last time it was restarted (or the wait statistics were manually cleared). Query #33 is shown in Figure 1.

   1: -- Clear Wait Stats with this command

   2: -- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

   3:  

   4: -- Isolate top waits for server instance since last restart or wait statistics clear  (Query 33) (Top Waits)

   5: WITH [Waits] 

   6: AS (SELECT wait_type, wait_time_ms/ 1000.0 AS [WaitS],

   7:           (wait_time_ms - signal_wait_time_ms) / 1000.0 AS [ResourceS],

   8:            signal_wait_time_ms / 1000.0 AS [SignalS],

   9:            waiting_tasks_count AS [WaitCount],

  10:            100.0 *  wait_time_ms / SUM (wait_time_ms) OVER() AS [Percentage],

  11:            ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS [RowNum]

  12:     FROM sys.dm_os_wait_stats WITH (NOLOCK)

  13:     WHERE [wait_type] NOT IN (

  14:         N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',

  15:         N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',

  16:         N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',

  17:         N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',

  18:         N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',

  19:         N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',

  20:         N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', 

  21:         N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',

  22:         N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', 

  23:         N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE',

  24:         N'PREEMPTIVE_OS_LIBRARYOPS', N'PREEMPTIVE_OS_COMOPS', N'PREEMPTIVE_OS_CRYPTOPS',

  25:         N'PREEMPTIVE_OS_PIPEOPS', N'PREEMPTIVE_OS_AUTHENTICATIONOPS',

  26:         N'PREEMPTIVE_OS_GENERICOPS', N'PREEMPTIVE_OS_VERIFYTRUST',

  27:         N'PREEMPTIVE_OS_FILEOPS', N'PREEMPTIVE_OS_DEVICEOPS',

  28:         N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',

  29:         N'QDS_ASYNC_QUEUE',

  30:         N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',

  31:         N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',

  32:         N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',

  33:         N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',

  34:         N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',

  35:         N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES',

  36:         N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_HOST_WAIT',

  37:         N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',

  38:         N'XE_DISPATCHER_WAIT', N'XE_LIVE_TARGET_TVF', N'XE_TIMER_EVENT')

  39:     AND waiting_tasks_count > 0)

  40: SELECT

  41:     MAX (W1.wait_type) AS [WaitType],

  42:     CAST (MAX (W1.WaitS) AS DECIMAL (16,2)) AS [Wait_Sec],

  43:     CAST (MAX (W1.ResourceS) AS DECIMAL (16,2)) AS [Resource_Sec],

  44:     CAST (MAX (W1.SignalS) AS DECIMAL (16,2)) AS [Signal_Sec],

  45:     MAX (W1.WaitCount) AS [Wait Count],

  46:     CAST (MAX (W1.Percentage) AS DECIMAL (5,2)) AS [Wait Percentage],

  47:     CAST ((MAX (W1.WaitS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgWait_Sec],

  48:     CAST ((MAX (W1.ResourceS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgRes_Sec],

  49:     CAST ((MAX (W1.SignalS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgSig_Sec]

  50: FROM Waits AS W1

  51: INNER JOIN Waits AS W2

  52: ON W2.RowNum <= W1.RowNum

  53: GROUP BY W1.RowNum

  54: HAVING SUM (W2.Percentage) - MAX (W1.Percentage) < 99 -- percentage threshold

  55: OPTION (RECOMPILE);

  56:  

  57: -- Cumulative wait stats are not as useful on an idle instance that is not under load or performance pressure

  58:  

  59: -- The SQL Server Wait Type Repository

  60: -- http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx

  61:  

  62: -- Wait statistics, or please tell me where it hurts

  63: -- http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

  64:  

  65: -- SQL Server 2005 Performance Tuning using the Waits and Queues

  66: -- http://technet.microsoft.com/en-us/library/cc966413.aspx

  67:  

  68: -- sys.dm_os_wait_stats (Transact-SQL)

  69: -- http://msdn.microsoft.com/en-us/library/ms179984(v=sql.120).aspx

Figure 1: Query #33 Top Waits

This query can be very useful when your instance has been experiencing performance problems. At the same time, I have seen many DBAs spend way too much time agonizing over their top wait statistics when they don’t need to. SQL Server will always be waiting on some type of resource (which is why I try to filter out what are generally considered to be benign wait types). If your instance is performing well, and nobody is complaining about performance, then you can relax a little bit.

Another issue with the results of this query is that there is a lot of bad advice on the internet about what certain wait types mean and what, if anything, you should do if you see them. This often leads to what Paul Randal calls “knee-jerk” performance tuning, where you see a certain wait type, and then immediately want to make some configuration change without doing any further investigation or putting any deeper thought into the matter.

After all of those cautions, this query can be very useful in pointing you in one direction or another to do deeper investigation, especially when your instance has been performing poorly. If you do make any configuration changes, or do something else that might affect performance (such as adding an index), then it is a good idea to clear the wait statistics so that the old cumulative wait statistics don’t obscure what is going on after the change.

 

Query #34 is Connection Counts by IP Address. This query retrieves information from the sys.dm_exec_sessions dynamic management view and the  sys.dm_exec_connections dynamic management view about your current connection counts by IP address. Query #34 is shown in Figure 2.

   1: -- Get a count of SQL connections by IP address (Query 34) (Connection Counts by IP Address)

   2: SELECT ec.client_net_address, es.[program_name], es.[host_name], es.login_name, 

   3: COUNT(ec.session_id) AS [connection count] 

   4: FROM sys.dm_exec_sessions AS es WITH (NOLOCK) 

   5: INNER JOIN sys.dm_exec_connections AS ec WITH (NOLOCK) 

   6: ON es.session_id = ec.session_id 

   7: GROUP BY ec.client_net_address, es.[program_name], es.[host_name], es.login_name  

   8: ORDER BY ec.client_net_address, es.[program_name] OPTION (RECOMPILE);

   9:  

  10: -- This helps you figure where your database load is coming from

  11: -- and verifies connectivity from other machines

Figure 2: Query #34 Connection Counts by IP Address

This query helps you see the magnitude of your workload and judge whether it is in the normal range that you should be seeing. I think it is a good idea to have a baseline for how many connections your database server typically has from whatever other machines normally connect to it. This query can also help you confirm and troubleshoot connectivity issues from other machines. I can’t tell you how many times people have claimed that my SQL Server instance was down because they could not connect to it. In the vast majority of cases, they simply had an incorrect connection string, or there was a blocked port on their machine that prevented the connection. Remember, the database is always guilty until proven innocent!

SQL Server Diagnostic Information Queries Detailed, Day 13

For Day 13 of this series, we start out with Query #30, which is CPU Usage by Database. This query retrieves information from the sys.dm_exec_query_stats dynamic management view and from the sys.dm_exec_plan_attributes dynamic management function about total CPU usage by database for cached query plans. Query #30 is shown in Figure 1.

   1: -- Get CPU utilization by database (Query 30) (CPU Usage by Database)

   2: WITH DB_CPU_Stats

   3: AS

   4: (SELECT pa.DatabaseID, DB_Name(pa.DatabaseID) AS [Database Name], SUM(qs.total_worker_time/1000) AS [CPU_Time_Ms]

   5:  FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)

   6:  CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID] 

   7:               FROM sys.dm_exec_plan_attributes(qs.plan_handle)

   8:               WHERE attribute = N'dbid') AS pa

   9:  GROUP BY DatabaseID)

  10: SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [CPU Rank],

  11:        [Database Name], [CPU_Time_Ms] AS [CPU Time (ms)], 

  12:        CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPU Percent]

  13: FROM DB_CPU_Stats

  14: WHERE DatabaseID <> 32767 -- ResourceDB

  15: ORDER BY [CPU Rank] OPTION (RECOMPILE);

  16:  

  17: -- Helps determine which database is using the most CPU resources on the instance

Figure 1: Query #30 CPU Usage by Database

Simply speaking, this query shows you which databases are using the most CPU resources on the instance, at least as far as their cached query plans are concerned. If you are seeing any signs of CPU pressure, this query can help point you at the correct databases to investigate further, to see what queries are using the most CPU resources. There are several other queries in this complete set that can help you find the most expensive cached stored procedures and queries.

 

Query #31 is IO Usage by Database. This query retrieves information from the sys.dm_io_virtual_file_stats dynamic management function about your total cumulative I/O usage by database since SQL Server was last started. Query #31 is shown in Figure 2.

   1: -- Get I/O utilization by database (Query 31) (IO Usage By Database)

   2: WITH Aggregate_IO_Statistics

   3: AS

   4: (SELECT DB_NAME(database_id) AS [Database Name],

   5: CAST(SUM(num_of_bytes_read + num_of_bytes_written)/1048576 AS DECIMAL(12, 2)) AS io_in_mb

   6: FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [DM_IO_STATS]

   7: GROUP BY database_id)

   8: SELECT ROW_NUMBER() OVER(ORDER BY io_in_mb DESC) AS [I/O Rank], [Database Name], io_in_mb AS [Total I/O (MB)],

   9:        CAST(io_in_mb/ SUM(io_in_mb) OVER() * 100.0 AS DECIMAL(5,2)) AS [I/O Percent]

  10: FROM Aggregate_IO_Statistics

  11: ORDER BY [I/O Rank] OPTION (RECOMPILE);

  12:  

  13: -- Helps determine which database is using the most I/O resources on the instance

Figure 2: Query #31 IO Usage by Database

The figures that this query collects are cumulative since SQL Server last started, and they include all file activity against your database data files and log files. This includes your normal database workload, plus things like index maintenance, DBCC CHECKDB activity, database backups, and any log reader activity. Because of all this, the numbers you see here might be different than you expect.

Query #32 is Total Buffer Usage by Database. This query retrieves information from the sys.dm_os_buffer_descriptors dynamic management view about your current total buffer usage by database. Query #32 is shown in Figure 3.

   1: -- Get total buffer usage by database for current instance  (Query 32) (Total Buffer Usage by Database)

   2: -- This may take some time to run on a busy instance

   3: WITH AggregateBufferPoolUsage

   4: AS

   5: (SELECT DB_NAME(database_id) AS [Database Name],

   6: CAST(COUNT(*) * 8/1024.0 AS DECIMAL (10,2))  AS [CachedSize]

   7: FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)

   8: WHERE database_id <> 32767 -- ResourceDB

   9: GROUP BY DB_NAME(database_id))

  10: SELECT ROW_NUMBER() OVER(ORDER BY CachedSize DESC) AS [Buffer Pool Rank], [Database Name], CachedSize AS [Cached Size (MB)],

  11:        CAST(CachedSize / SUM(CachedSize) OVER() * 100.0 AS DECIMAL(5,2)) AS [Buffer Pool Percent]

  12: FROM AggregateBufferPoolUsage

  13: ORDER BY [Buffer Pool Rank] OPTION (RECOMPILE);

  14:  

  15: -- Tells you how much memory (in the buffer pool) 

  16: -- is being used by each database on the instance

Figure 3: Query #32 Total Buffer Usage by Database

This query shows you which databases are using the most space in the SQL Server Buffer Pool. If you see a database that is using a large amount of memory in the buffer pool, you might be able to improve the situation by doing some query or index tuning, or by using SQL Server Data Compression on some of your indexes. It may be that you just have a large database that has a lot of activity, so it has a lot of data in the buffer pool, by design.
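
If data compression looks like a good fit, the change itself is a single index rebuild (this is a hedged example with placeholder object names; evaluate the CPU trade-off and use sp_estimate_data_compression_savings before you commit to it):

-- Hedged example (placeholder names): rebuild an index with PAGE compression to reduce
-- its size on disk and in the buffer pool, at the cost of some extra CPU
ALTER INDEX IX_YourIndex ON dbo.YourTable
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);  -- ONLINE = ON requires Enterprise Edition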