SQL Server Diagnostic Information Queries Detailed, Day 14

For Day 14 of this series, we start out with Query #33, which is Top Waits. This query retrieves information from the sys.dm_os_wait_stats dynamic management view about the cumulative wait statistics for the instance since the last time it was restarted (or the wait statistics were manually cleared). Query #33 is shown in Figure 1.

-- Clear Wait Stats with this command
-- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

-- Isolate top waits for server instance since last restart or wait statistics clear  (Query 33) (Top Waits)
WITH [Waits]
AS (SELECT wait_type, wait_time_ms/ 1000.0 AS [WaitS],
          (wait_time_ms - signal_wait_time_ms) / 1000.0 AS [ResourceS],
           signal_wait_time_ms / 1000.0 AS [SignalS],
           waiting_tasks_count AS [WaitCount],
           100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS [Percentage],
           ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats WITH (NOLOCK)
    WHERE [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',
        N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
        N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',
        N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT',
        N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE',
        N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE',
        N'PREEMPTIVE_OS_LIBRARYOPS', N'PREEMPTIVE_OS_COMOPS', N'PREEMPTIVE_OS_CRYPTOPS',
        N'PREEMPTIVE_OS_PIPEOPS', N'PREEMPTIVE_OS_AUTHENTICATIONOPS',
        N'PREEMPTIVE_OS_GENERICOPS', N'PREEMPTIVE_OS_VERIFYTRUST',
        N'PREEMPTIVE_OS_FILEOPS', N'PREEMPTIVE_OS_DEVICEOPS',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_ASYNC_QUEUE',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',
        N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',
        N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',
        N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES',
        N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_HOST_WAIT',
        N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT', N'XE_LIVE_TARGET_TVF', N'XE_TIMER_EVENT')
    AND waiting_tasks_count > 0)
SELECT
    MAX (W1.wait_type) AS [WaitType],
    CAST (MAX (W1.WaitS) AS DECIMAL (16,2)) AS [Wait_Sec],
    CAST (MAX (W1.ResourceS) AS DECIMAL (16,2)) AS [Resource_Sec],
    CAST (MAX (W1.SignalS) AS DECIMAL (16,2)) AS [Signal_Sec],
    MAX (W1.WaitCount) AS [Wait Count],
    CAST (MAX (W1.Percentage) AS DECIMAL (5,2)) AS [Wait Percentage],
    CAST ((MAX (W1.WaitS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgWait_Sec],
    CAST ((MAX (W1.ResourceS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgRes_Sec],
    CAST ((MAX (W1.SignalS) / MAX (W1.WaitCount)) AS DECIMAL (16,4)) AS [AvgSig_Sec]
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.RowNum <= W1.RowNum
GROUP BY W1.RowNum
HAVING SUM (W2.Percentage) - MAX (W1.Percentage) < 99 -- percentage threshold
OPTION (RECOMPILE);

-- Cumulative wait stats are not as useful on an idle instance that is not under load or performance pressure

-- The SQL Server Wait Type Repository
-- http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx

-- Wait statistics, or please tell me where it hurts
-- http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

-- SQL Server 2005 Performance Tuning using the Waits and Queues
-- http://technet.microsoft.com/en-us/library/cc966413.aspx

-- sys.dm_os_wait_stats (Transact-SQL)
-- http://msdn.microsoft.com/en-us/library/ms179984(v=sql.120).aspx

Figure 1: Query #33 Top Waits

This query can be very useful when your instance has been experiencing performance problems. At the same time, I have seen many DBAs spend way too much time agonizing over their top wait statistics when they don’t need to. SQL Server will always be waiting on some type of resource (which is why I try to filter out what are generally considered to be benign wait types). If your instance is performing well, and nobody is complaining about performance, then you can relax a little bit.

Another issue with the results of this query is that there is a lot of bad advice on the internet about what certain wait types mean and what, if anything, you should do if you see them. This often leads to what Paul Randal calls “knee-jerk” performance tuning, where you see a certain wait type, and then immediately want to make some configuration change without doing any further investigation or putting any deeper thought into the matter.

After all of those cautions, this query can be very useful in pointing you in one direction or another to do deeper investigation, especially when your instance has been performing poorly. If you do make any configuration changes, or do something else that might affect performance (such as adding an index), then it is a good idea to clear the wait statistics so that the old cumulative wait statistics don’t obscure what is going on after the change.
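If you would rather not clear the cumulative counters, a snapshot-and-diff approach gives you wait statistics for just a specific interval. This is a rough sketch, not one of the numbered queries in the set; the #WaitSnapshot temp table name and the one-minute delay are my own illustrative choices, and you would normally apply the same benign wait type filter as Query #33 to the results:

```sql
-- Snapshot the cumulative wait stats into a temp table
SELECT wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
INTO #WaitSnapshot
FROM sys.dm_os_wait_stats;

-- Let the workload run for a while (one minute here)
WAITFOR DELAY '00:01:00';

-- Diff the current counters against the snapshot to see waits for the interval
SELECT ws.wait_type,
       ws.wait_time_ms - wsn.wait_time_ms AS [Interval Wait (ms)],
       ws.waiting_tasks_count - wsn.waiting_tasks_count AS [Interval Wait Count]
FROM sys.dm_os_wait_stats AS ws
INNER JOIN #WaitSnapshot AS wsn
ON ws.wait_type = wsn.wait_type
WHERE ws.wait_time_ms - wsn.wait_time_ms > 0
ORDER BY [Interval Wait (ms)] DESC;

DROP TABLE #WaitSnapshot;
```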

 

Query #34 is Connection Counts by IP Address. This query retrieves information from the sys.dm_exec_sessions dynamic management view and the sys.dm_exec_connections dynamic management view about your current connection counts by IP address. Query #34 is shown in Figure 2.

-- Get a count of SQL connections by IP address (Query 34) (Connection Counts by IP Address)
SELECT ec.client_net_address, es.[program_name], es.[host_name], es.login_name,
COUNT(ec.session_id) AS [connection count]
FROM sys.dm_exec_sessions AS es WITH (NOLOCK)
INNER JOIN sys.dm_exec_connections AS ec WITH (NOLOCK)
ON es.session_id = ec.session_id
GROUP BY ec.client_net_address, es.[program_name], es.[host_name], es.login_name
ORDER BY ec.client_net_address, es.[program_name] OPTION (RECOMPILE);

-- This helps you figure out where your database load is coming from
-- and verifies connectivity from other machines

Figure 2: Query #34 Connection Counts by IP Address

This query helps you see the magnitude of your workload and judge whether it is in the normal range that you should be seeing. I think it is a good idea to have a baseline for how many connections your database server typically has from whatever other machines normally connect to it. This query can also help you confirm and troubleshoot connectivity issues from other machines. I can’t tell you how many times people have claimed my SQL Server instance was down because they could not connect to it. In the vast majority of cases, they simply had an incorrect connection string, or a blocked port on their machine prevented the connection. Remember, the database is always guilty until proven innocent!
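One simple way to build that baseline is to capture the counts into a permanent table on a schedule. A sketch, where dbo.ConnectionCountHistory is an illustrative table name of my own, not part of the original query set:

```sql
-- One-time setup: a history table for connection count snapshots
CREATE TABLE dbo.ConnectionCountHistory (
    capture_time datetime2 NOT NULL,
    client_net_address varchar(48) NULL,
    connection_count int NOT NULL);

-- Run this periodically (e.g., from a SQL Server Agent job)
INSERT INTO dbo.ConnectionCountHistory (capture_time, client_net_address, connection_count)
SELECT SYSDATETIME(), ec.client_net_address, COUNT(ec.session_id)
FROM sys.dm_exec_sessions AS es WITH (NOLOCK)
INNER JOIN sys.dm_exec_connections AS ec WITH (NOLOCK)
ON es.session_id = ec.session_id
GROUP BY ec.client_net_address;
```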

SQL Server Diagnostic Information Queries Detailed, Day 13

For Day 13 of this series, we start out with Query #30, which is CPU Usage by Database. This query retrieves information from the sys.dm_exec_query_stats dynamic management view and from the sys.dm_exec_plan_attributes dynamic management function about total CPU usage by database for cached query plans. Query #30 is shown in Figure 1.

-- Get CPU utilization by database (Query 30) (CPU Usage by Database)
WITH DB_CPU_Stats
AS
(SELECT pa.DatabaseID, DB_Name(pa.DatabaseID) AS [Database Name], SUM(qs.total_worker_time/1000) AS [CPU_Time_Ms]
 FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
 CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
              FROM sys.dm_exec_plan_attributes(qs.plan_handle)
              WHERE attribute = N'dbid') AS pa
 GROUP BY DatabaseID)
SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [CPU Rank],
       [Database Name], [CPU_Time_Ms] AS [CPU Time (ms)],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPU Percent]
FROM DB_CPU_Stats
WHERE DatabaseID <> 32767 -- ResourceDB
ORDER BY [CPU Rank] OPTION (RECOMPILE);

-- Helps determine which database is using the most CPU resources on the instance

Figure 1: Query #30 CPU Usage by Database

Simply speaking, this query shows you which databases are using the most CPU resources on the instance, at least as far as their cached query plans are concerned. If you are seeing any signs of CPU pressure, this query can help point you at the correct databases to investigate further, to see what queries are using the most CPU resources. There are several other queries in this complete set that can help you find the most expensive cached stored procedures and queries.
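Once you know which database to look at, the natural next step is to rank individual cached queries by cumulative CPU. This is a sketch in the same general style as the rest of the set, not one of the numbered queries:

```sql
-- Top 25 cached queries by cumulative worker (CPU) time
SELECT TOP(25) DB_NAME(t.dbid) AS [Database Name],
       LEFT(t.[text], 255) AS [Short Query Text],
       qs.execution_count,
       qs.total_worker_time AS [Total CPU (us)],
       qs.total_worker_time / qs.execution_count AS [Avg CPU (us)]
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS t
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);
```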

 

Query #31 is IO Usage by Database. This query retrieves information from the sys.dm_io_virtual_file_stats dynamic management function about your total cumulative I/O usage by database since SQL Server was last started. Query #31 is shown in Figure 2.

-- Get I/O utilization by database (Query 31) (IO Usage By Database)
WITH Aggregate_IO_Statistics
AS
(SELECT DB_NAME(database_id) AS [Database Name],
CAST(SUM(num_of_bytes_read + num_of_bytes_written)/1048576 AS DECIMAL(12, 2)) AS io_in_mb
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [DM_IO_STATS]
GROUP BY database_id)
SELECT ROW_NUMBER() OVER(ORDER BY io_in_mb DESC) AS [I/O Rank], [Database Name], io_in_mb AS [Total I/O (MB)],
       CAST(io_in_mb/ SUM(io_in_mb) OVER() * 100.0 AS DECIMAL(5,2)) AS [I/O Percent]
FROM Aggregate_IO_Statistics
ORDER BY [I/O Rank] OPTION (RECOMPILE);

-- Helps determine which database is using the most I/O resources on the instance

Figure 2: Query #31 IO Usage by Database

The figures that this query collects are cumulative since SQL Server last started, and they include all file activity against your database data files and log files. This includes your normal database workload, plus things like index maintenance, DBCC CHECKDB activity, database backups, and any log reader activity. Because of all this, the numbers you see here might be different than you expect.
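If the totals surprise you, splitting the cumulative figure into reads and writes can help explain why, since backups and index maintenance skew the two differently. A sketch along the same lines as Query #31 (my own variation, not one of the numbered queries):

```sql
-- Cumulative read vs. write I/O by database since the last restart
SELECT DB_NAME(database_id) AS [Database Name],
       CAST(SUM(num_of_bytes_read)/1048576.0 AS DECIMAL(14,2)) AS [Read MB],
       CAST(SUM(num_of_bytes_written)/1048576.0 AS DECIMAL(14,2)) AS [Write MB]
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY SUM(num_of_bytes_read + num_of_bytes_written) DESC;
```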

Query #32 is Total Buffer Usage by Database. This query retrieves information from the sys.dm_os_buffer_descriptors dynamic management view about your current total buffer usage by database. Query #32 is shown in Figure 3.

-- Get total buffer usage by database for current instance  (Query 32) (Total Buffer Usage by Database)
-- This may take some time to run on a busy instance
WITH AggregateBufferPoolUsage
AS
(SELECT DB_NAME(database_id) AS [Database Name],
CAST(COUNT(*) * 8/1024.0 AS DECIMAL (10,2)) AS [CachedSize]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
WHERE database_id <> 32767 -- ResourceDB
GROUP BY DB_NAME(database_id))
SELECT ROW_NUMBER() OVER(ORDER BY CachedSize DESC) AS [Buffer Pool Rank], [Database Name], CachedSize AS [Cached Size (MB)],
       CAST(CachedSize / SUM(CachedSize) OVER() * 100.0 AS DECIMAL(5,2)) AS [Buffer Pool Percent]
FROM AggregateBufferPoolUsage
ORDER BY [Buffer Pool Rank] OPTION (RECOMPILE);

-- Tells you how much memory (in the buffer pool)
-- is being used by each database on the instance

Figure 3: Query #32 Total Buffer Usage by Database

This query shows you which databases are using the most space in the SQL Server Buffer Pool. If you see a database that is using a large amount of memory in the buffer pool, you might be able to improve the situation by doing some query or index tuning, or by using SQL Server Data Compression on some of your indexes. It may be that you just have a large database that has a lot of activity, so it has a lot of data in the buffer pool, by design.
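If data compression looks promising for a memory-hungry database, you can estimate the savings before committing to it with the sys.sp_estimate_data_compression_savings stored procedure. In this sketch, dbo.MyBigTable is a placeholder object name; also note that data compression requires Enterprise Edition on the SQL Server versions this series covers:

```sql
-- Estimate the space savings from PAGE compression for one index
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'MyBigTable',   -- placeholder table name
    @index_id = 1,                  -- 1 = the clustered index
    @partition_number = NULL,       -- all partitions
    @data_compression = N'PAGE';
```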

SQL Server Diagnostic Information Queries Detailed, Day 12

For Day 12 of this series, we start out with Query #27, which is Database Properties. This query retrieves information from the sys.databases catalog view, and from the sys.dm_os_performance_counters dynamic management view. Query #27 is shown in Figure 1.

-- Recovery model, log reuse wait description, log file size, log usage size  (Query 27) (Database Properties)
-- and compatibility level for all databases on instance
SELECT db.[name] AS [Database Name], db.recovery_model_desc AS [Recovery Model], db.state_desc, db.containment_desc,
db.log_reuse_wait_desc AS [Log Reuse Wait Description],
CONVERT(DECIMAL(18,2), ls.cntr_value/1024.0) AS [Log Size (MB)], CONVERT(DECIMAL(18,2), lu.cntr_value/1024.0) AS [Log Used (MB)],
CAST(CAST(lu.cntr_value AS FLOAT) / CAST(ls.cntr_value AS FLOAT) AS DECIMAL(18,2)) * 100 AS [Log Used %],
db.[compatibility_level] AS [DB Compatibility Level], db.page_verify_option_desc AS [Page Verify Option],
db.is_auto_create_stats_on, db.is_auto_update_stats_on, db.is_auto_update_stats_async_on, db.is_parameterization_forced,
db.snapshot_isolation_state_desc, db.is_read_committed_snapshot_on, db.is_auto_close_on, db.is_auto_shrink_on,
db.target_recovery_time_in_seconds, db.is_cdc_enabled, db.is_published, db.is_distributor, db.is_encrypted,
db.group_database_id, db.replica_id, db.is_memory_optimized_elevate_to_snapshot_on,
db.delayed_durability_desc, db.is_auto_create_stats_incremental_on,
db.is_query_store_on, db.is_sync_with_backup,
db.is_supplemental_logging_enabled, db.is_remote_data_archive_enabled
FROM sys.databases AS db WITH (NOLOCK)
INNER JOIN sys.dm_os_performance_counters AS lu WITH (NOLOCK)
ON db.name = lu.instance_name
INNER JOIN sys.dm_os_performance_counters AS ls WITH (NOLOCK)
ON db.name = ls.instance_name
WHERE lu.counter_name LIKE N'Log File(s) Used Size (KB)%'
AND ls.counter_name LIKE N'Log File(s) Size (KB)%'
AND ls.cntr_value > 0 OPTION (RECOMPILE);

-- Things to look at:
-- How many databases are on the instance?
-- What recovery models are they using?
-- What is the log reuse wait description?
-- How full are the transaction logs?
-- What compatibility level are the databases on?
-- What is the Page Verify Option? (should be CHECKSUM)
-- Is Auto Update Statistics Asynchronously enabled?
-- Make sure auto_shrink and auto_close are not enabled!

Figure 1: Query #27 Database Properties

This query gives you a lot of very useful information about all of your databases. I pay special attention to Log Reuse Wait Description and Log Used %, to spot problems that are preventing the space in the transaction log from being reused, which will eventually cause the log file to fill up and grow. I also look at the compatibility level (which is much more important in SQL Server 2014, with the new cardinality estimator) and the page verify option, which should always be CHECKSUM.

It is important to check the statistics-related properties, and to also make sure that auto shrink and auto close are not enabled. I don’t think I have ever seen an instance of SQL Server where I did not notice some database properties for multiple databases that were simply incorrect. This query is great for quickly spotting possible issues that need to be corrected.
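When you do find incorrect settings, the remediation is usually a one-line ALTER DATABASE per property. A sketch of the most common fixes, using a placeholder database name:

```sql
-- Correct the most common database property problems (YourDatabase is a placeholder)
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM WITH NO_WAIT;
ALTER DATABASE [YourDatabase] SET AUTO_SHRINK OFF WITH NO_WAIT;
ALTER DATABASE [YourDatabase] SET AUTO_CLOSE OFF WITH NO_WAIT;
ALTER DATABASE [YourDatabase] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;
```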

 

Query #28 is Missing Indexes All Databases. This query retrieves information from the sys.dm_db_missing_index_group_stats dynamic management view, the sys.dm_db_missing_index_groups dynamic management view, and the sys.dm_db_missing_index_details dynamic management view about “missing” indexes that the SQL Server Query Optimizer thinks it would like to have. Query #28 is shown in Figure 2.

-- Missing Indexes for all databases by Index Advantage  (Query 28) (Missing Indexes All Databases)
SELECT CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS [index_advantage],
migs.last_user_seek, mid.[statement] AS [Database.Schema.Table],
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mig.index_handle = mid.index_handle
ORDER BY index_advantage DESC OPTION (RECOMPILE);

-- Getting missing index information for all of the databases on the instance is very useful
-- Look at last user seek time, number of user seeks to help determine source and importance
-- Also look at avg_user_impact and avg_total_user_cost to help determine importance
-- SQL Server is overly eager to add included columns, so beware
-- Do not just blindly add indexes that show up from this query!!!

Figure 2: Query #28 Missing Indexes All Databases

This query is very useful, but also very easy to misinterpret and misuse. I have seen many novice DBAs and developers use the results of this query to pretty badly over-index their databases, which affects their database size and hurts insert, update, and delete performance. I like to focus on the last_user_seek column, and see how long ago that was. Was it a few seconds or minutes ago, or was it days or weeks ago?

I then start looking at the user_seeks, avg_total_user_cost, and avg_user_impact columns to get a sense for how often SQL Server thinks it wants this proposed index, how expensive it is not to have the index, and how much the query optimizer thinks the cost of the query would be reduced if it did have this index that it is requesting.

Next, I’ll look at any other proposed indexes on the same table to see if I can come up with a wider, consolidated index that covers multiple requested indexes. Finally, I’ll look at the existing indexes on that table, and look at the index usage metrics for that table to have a better idea of whether a new index would be a good idea, based on the volatility of that table.
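As a purely hypothetical example of that consolidation step: suppose the DMVs request one index on (CustomerID) with OrderDate as an included column, and another on (CustomerID, OrderDate). A single wider index can often satisfy both (all object names here are made up for illustration):

```sql
-- One consolidated index instead of two narrow, overlapping requested indexes
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue)
WITH (ONLINE = ON);  -- ONLINE index builds require Enterprise Edition
```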

 

Query #29 is VLF Counts. This query calls the DBCC LOGINFO command for each one of your system and user databases to get your current virtual log file (VLF) counts for your transaction logs. Query #29 is shown in Figure 3.

-- Get VLF Counts for all databases on the instance (Query 29) (VLF Counts)
-- (adapted from Michelle Ufford)
CREATE TABLE #VLFInfo (RecoveryUnitID int, FileID int,
                       FileSize bigint, StartOffset bigint,
                       FSeqNo bigint, [Status] bigint,
                       Parity bigint, CreateLSN numeric(38));

CREATE TABLE #VLFCountResults(DatabaseName sysname, VLFCount int);

EXEC sp_MSforeachdb N'Use [?];

                INSERT INTO #VLFInfo
                EXEC sp_executesql N''DBCC LOGINFO([?])'';

                INSERT INTO #VLFCountResults
                SELECT DB_NAME(), COUNT(*)
                FROM #VLFInfo;

                TRUNCATE TABLE #VLFInfo;'

SELECT DatabaseName, VLFCount
FROM #VLFCountResults
ORDER BY VLFCount DESC;

DROP TABLE #VLFInfo;
DROP TABLE #VLFCountResults;

-- High VLF counts can affect write performance
-- and they can make full database restores and crash recovery take much longer
-- Try to keep your VLF counts under 200 in most cases (depending on log file size)

-- Important change to VLF creation algorithm in SQL Server 2014
-- http://www.sqlskills.com/blogs/paul/important-change-vlf-creation-algorithm-sql-server-2014/

Figure 3: Query #29 VLF Counts

Every time a SQL Server log file grows, a certain number of VLFs are added to the log file. By the way, the formula for this was changed in SQL Server 2014, as discussed by Paul Randal here.

You care about this because a higher number of VLFs affects how long the recovery portion of a database restore takes. If you have ever watched the progress update in SSMS while you were doing a restore of a large database, and noticed that it finally got to 100%, but then kept grinding away for quite a while after that, you have seen this effect in action. It is very frustrating!

It also affects how long “crash recovery” takes, which occurs for every database on the instance whenever the SQL Server Service starts. This comes into play whether it is just a standalone instance starting up or when you fail over to another node with a traditional SQL Server failover cluster instance (FCI). If you have ever been frustrated by how long an FCI failover takes, keeping your VLF counts under control is one thing you can do to help speed up the process.

If you have a high VLF count in a database, it is usually pretty easy to correct. If you are in the FULL recovery model, you will want to run a transaction log backup. Then, you shrink the transaction log file. Often, you will have to repeat this sequence more than once. Sometimes you will even need to generate some log activity (by doing something like reorganizing an index or two) to get the active portion of the log to a place where a shrink works.

Once you have done all of this, your transaction log file is likely to be extremely small, perhaps just a few MB in size. You will want to immediately manually grow it in relatively large increments (such as 1000MB, 2000MB, or 4000MB) until you have it at the desired size. You will also want to make sure the auto growth increment is set to a fixed size in MB (such as 1000MB, 2000MB, or 4000MB) so that any future auto growths don’t add an excessive number of VLFs.
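The whole sequence might look something like this sketch, where the database name, log file name, backup path, and sizes are all placeholders you would adjust for your environment:

```sql
-- 1. Back up the log so the active portion can move (FULL recovery model)
BACKUP LOG [YourDatabase] TO DISK = N'D:\SQLBackups\YourDatabase_Log.trn';

-- 2. Shrink the log file (repeat steps 1-2 if the VLF count stays high)
USE [YourDatabase];
DBCC SHRINKFILE (N'YourDatabase_log', 1);

-- 3. Grow the log back to its working size, with a large fixed autogrowth
--    increment so future growths do not add an excessive number of VLFs
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 4000MB, FILEGROWTH = 2000MB);
```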

SQL Server Diagnostic Information Queries Detailed, Day 11

Today, we will go through four queries that are related to storage. We start out with Query #23, which is Volume Info. This query retrieves information from the sys.master_files catalog view and the sys.dm_os_volume_stats dynamic management function about any operating system volumes or mount points on your server where you have any SQL Server database files. Query #23 is shown in Figure 1.

-- Volume info for all LUNS that have database files on the current instance (Query 23) (Volume Info)
SELECT DISTINCT vs.volume_mount_point, vs.file_system_type,
vs.logical_volume_name, CONVERT(DECIMAL(18,2), vs.total_bytes/1073741824.0) AS [Total Size (GB)],
CONVERT(DECIMAL(18,2), vs.available_bytes/1073741824.0) AS [Available Size (GB)],
CAST(CAST(vs.available_bytes AS FLOAT)/ CAST(vs.total_bytes AS FLOAT) AS DECIMAL(18,2)) * 100 AS [Space Free %]
FROM sys.master_files AS f WITH (NOLOCK)
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.[file_id]) AS vs
ORDER BY vs.volume_mount_point OPTION (RECOMPILE);

-- Shows you the total and free space on the LUNs where you have database files
-- Being low on free space can negatively affect performance

Figure 1: Query #23 Volume Info

This query shows you how large each operating system volume or mount point is, and how much space is available. You don’t ever want to run out of space on a volume where SQL Server database files are located (for obvious reasons), but many people are not aware that being low on space can affect the performance of the volume, with both magnetic storage and with flash storage.

With magnetic storage, being low on disk space means that you may be no longer getting the increased disk performance possible from short-stroking. This can affect your disk performance with magnetic drives because the arm that moves the drive heads back and forth to read the data has to move further as an individual disk becomes more and more full, plus the disk platters are moving slower on the inside portion compared to the outside portion of the disk.

I say that this only “may” be affecting your disk performance because with many SANs, whatever space has been allocated for a volume may be spread across all of the available drives in the SAN, so the fact that your volume is low on space may have very little to do with what is happening with the overall SAN.

With flash storage, as the drive gets lower on space, it becomes more difficult for garbage collection and TRIM operations to complete without affecting performance. Generally speaking, your write performance will tend to decrease and become more inconsistent as the drive is closer to being full of data.

 

Query #24 is Drive Level Latency. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #24 is shown in Figure 2.

-- Drive level latency information (Query 24) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point],
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (io_stall_read_ms/num_of_reads)
    END AS [Read Latency],
    CASE
        WHEN io_stall_write_ms = 0 THEN 0
        ELSE (io_stall_write_ms/num_of_writes)
    END AS [Write Latency],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE (io_stall/(num_of_reads + num_of_writes))
    END AS [Overall Latency],
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (num_of_bytes_read/num_of_reads)
    END AS [Avg Bytes/Read],
    CASE
        WHEN io_stall_write_ms = 0 THEN 0
        ELSE (num_of_bytes_written/num_of_writes)
    END AS [Avg Bytes/Write],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
             SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
             SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
             SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs
      GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);

-- Shows you the drive-level latency for reads and writes, in milliseconds
-- Latency above 20-25ms is usually a problem

Figure 2: Query #24 Drive Level Latency

This particular query aggregates all of the file activity for your database data and log files up to the volume level. This activity includes your normal workload, plus anything else (such as index maintenance, database backups, log reader activity for things like replication, database mirroring, and AlwaysOn AGs) that touches your database data and log files.

These numbers are cumulative since SQL Server was last started. This means that the latency numbers you see here will usually tend to be a little higher than what you might see in Performance Monitor, and also higher than what shows up at the SAN-level. This query helps you understand your disk performance for SQL Server at the volume level. Once you understand this, then you are ready to drill down to the individual database file level with the next query.

 

Query #25 is IO Stalls by File. This query retrieves information from the sys.master_files catalog view and the sys.dm_io_virtual_file_stats dynamic management function about I/O statistics for SQL Server data and log files. Query #25 is shown in Figure 3.

-- Calculates average stalls per read, per write, and per total input/output for each database file  (Query 25) (IO Stalls by File)
SELECT DB_NAME(fs.database_id) AS [Database Name], CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)/(1.0 + fs.num_of_reads + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)], mf.physical_name, mf.type_desc, fs.io_stall_read_ms, fs.num_of_reads,
fs.io_stall_write_ms, fs.num_of_writes, fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls], fs.num_of_reads + fs.num_of_writes AS [total_io],
io_stall_queued_read_ms AS [Resource Governor Total Read IO Latency (ms)], io_stall_queued_write_ms AS [Resource Governor Total Write IO Latency (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);

-- Helps determine which database files on the entire instance have the most I/O bottlenecks
-- This can help you decide whether certain LUNs are overloaded and whether you might
-- want to move some files to a different location or perhaps improve your I/O performance

Figure 3: Query #25 IO Stalls by File

This query is similar to the previous query, with all of the same caveats. This query goes down to the individual database file level rather than rolling up the data to the volume level. Once you see high read or write latency for a particular volume, you will want to then determine which database file(s) are seeing the highest latency on that volume. This will help you decide what you might be able to do to improve the situation.

 

Query #26 is IO Warnings. This query reads the five most recent SQL Server Error logs to look for 15 second I/O warnings. Query #26 is shown in Figure 4.

-- Look for I/O requests taking longer than 15 seconds in the five most recent SQL Server Error Logs (Query 26) (IO Warnings)
CREATE TABLE #IOWarningResults(LogDate datetime, ProcessInfo sysname, LogText nvarchar(1000));

INSERT INTO #IOWarningResults
EXEC xp_readerrorlog 0, 1, N'taking longer than 15 seconds';

INSERT INTO #IOWarningResults
EXEC xp_readerrorlog 1, 1, N'taking longer than 15 seconds';

INSERT INTO #IOWarningResults
EXEC xp_readerrorlog 2, 1, N'taking longer than 15 seconds';

INSERT INTO #IOWarningResults
EXEC xp_readerrorlog 3, 1, N'taking longer than 15 seconds';

INSERT INTO #IOWarningResults
EXEC xp_readerrorlog 4, 1, N'taking longer than 15 seconds';

SELECT LogDate, ProcessInfo, LogText
FROM #IOWarningResults
ORDER BY LogDate DESC;

DROP TABLE #IOWarningResults;

-- Finding 15 second I/O warnings in the SQL Server Error Log is useful evidence of
-- poor I/O performance (which might have many different causes)

-- Diagnostics in SQL Server help detect stalled and stuck I/O operations
-- https://support.microsoft.com/en-us/kb/897284

Figure 4: Query #26 IO Warnings

This query may take some time to complete if your SQL Server Error log files are very large. Finding any 15 second I/O warnings is very strong evidence that you were seeing very poor storage subsystem performance when they were recorded. The warnings go down to the database file level, so you can see which drive and which file was affected.

What you want to look for is whether there is any sort of recognizable pattern to when you are getting these warnings. For example, perhaps you often see a series of 15 second warnings for the data files for one database at around 3AM each morning. Perhaps this is when you are doing index maintenance, plus running DBCC CHECKDB at the same time, or some other repeating pattern of scheduled disk intensive activity.

On the other hand, it may be the case that you see many 15 second I/O warnings at random times, spread across many drive volumes and disk files. This would be a pretty good indicator of generally poor storage subsystem performance!
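One way to look for such a time-of-day pattern (a hedged sketch, not part of the original query set) is to aggregate the captured warnings by hour before the temporary table from Query #26 is dropped:

```sql
-- Hypothetical follow-up to Query #26: bucket the 15 second I/O warnings
-- by hour of day to spot recurring scheduled disk-intensive activity
-- (run this before the DROP TABLE #IOWarningResults statement)
SELECT DATEPART(HOUR, LogDate) AS [Hour of Day], COUNT(*) AS [Warning Count]
FROM #IOWarningResults
GROUP BY DATEPART(HOUR, LogDate)
ORDER BY [Warning Count] DESC;
```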

The results of these four queries are very useful to have available when you talk to your storage administrator about disk performance. The more evidence and metrics you have available about what SQL Server is seeing from a storage perspective, the stronger your case will be.

SQL Server Diagnostic Information Queries Detailed, Day 10

For Day 10 of this series, we start out with Query #21, which is Memory Dump Info.  This query retrieves information about any SQL Server memory dumps that have occurred from the sys.dm_server_memory_dumps dynamic management view. Query #21 is shown in Figure 1.

   1: -- Get information on location, time and size of any memory dumps from SQL Server  (Query 21) (Memory Dump Info)

   2: SELECT [filename], creation_time, size_in_bytes/1048576.0 AS [Size (MB)]

   3: FROM sys.dm_server_memory_dumps WITH (NOLOCK) 

   4: ORDER BY creation_time DESC OPTION (RECOMPILE);

   5:  

   6: -- This will not return any rows if you have 

   7: -- not had any memory dumps (which is a good thing)

Figure 1: Query #21 Memory Dump Info

What you want to see when you run this query is no results, which means you have not had any SQL Server memory dumps. Memory dumps occur when SQL Server runs into problems or severe errors that are not quite bad enough to cause the entire SQL Server process to crash. If you have any of these, you can sometimes open the accompanying text file and get a rough idea what might have happened. Ultimately, you will need someone who knows what they are doing to look at the dump file with a debugger and the public symbols for your version and build of SQL Server (which might be Microsoft Support). Being on the latest Service Pack and Cumulative Update for your version of SQL Server will tend to reduce the number of memory dumps that you see.

 

Query #22 is Database Filenames and Paths. This query retrieves information from the  sys.master_files catalog view about the filenames and paths for all of the database files for all of the user and system databases on your instance. Query #22 is shown in Figure 2.

   1: -- File names and paths for all user and system databases on instance  (Query 22) (Database Filenames and Paths)

   2: SELECT DB_NAME([database_id]) AS [Database Name], 

   3:        [file_id], name, physical_name, [type_desc], state_desc,

   4:        is_percent_growth, growth,

   5:        CONVERT(bigint, growth/128.0) AS [Growth in MB], 

   6:        CONVERT(bigint, size/128.0) AS [Total Size in MB]

   7: FROM sys.master_files WITH (NOLOCK)

   8: ORDER BY DB_NAME([database_id]) OPTION (RECOMPILE);

   9:  

  10: -- Things to look at:

  11: -- Are data files and log files on different drives?

  12: -- Is everything on the C: drive?

  13: -- Is TempDB on dedicated drives?

  14: -- Is there only one TempDB data file?

  15: -- Are all of the TempDB data files the same size?

  16: -- Are there multiple data files for user databases?

  17: -- Is percent growth enabled for any files (which is bad)?

Figure 2: Query #22 Database Filenames and Paths

This query lets you pretty easily see how your various database files are laid out across your file system, which is very useful to understand as you are investigating your storage subsystem performance. For example, it is fairly common to see high write latency for your tempdb database data files. Knowing that these files are located on your T: drive (for example) will help you understand the results from some of the upcoming queries more easily.

Depending on what kind of storage you have and how it is configured, you might want to separate your database data files from your log files (on separate LUNs). This old rule of thumb is not always the best choice when you have a small number of magnetic spindles. This query also lets you see how tempdb is configured, and whether any of your database files are using percent growth (which is a bad thing for performance).
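If Query #22 does turn up files using percent growth, you can switch them to a fixed growth increment. This is a minimal sketch with placeholder names (YourDatabase and YourLogicalFileName are assumptions, and 256MB is just an example increment):

```sql
-- Hypothetical example: change a file from percent growth to a fixed 256MB increment
-- YourDatabase and YourLogicalFileName are placeholders for your actual names
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = N'YourLogicalFileName', FILEGROWTH = 256MB);
```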

SQL Server Diagnostic Information Queries Detailed, Day 9

For Day 9 of this series, we start out with Query #19, which is BPE Configuration.  This query retrieves information about your buffer pool extension (BPE) configuration from the sys.dm_os_buffer_pool_extension_configuration dynamic management view. Query #19 is shown in Figure 1.

   1: -- See if buffer pool extensions (BPE) is enabled (Query 19) (BPE Configuration)

   2: SELECT [path], state_description, current_size_in_kb, 

   3: CAST(current_size_in_kb/1048576.0 AS DECIMAL(10,2)) AS [Size (GB)]

   4: FROM sys.dm_os_buffer_pool_extension_configuration WITH (NOLOCK) OPTION (RECOMPILE);

   5:  

   6: -- BPE is available in both Standard Edition and Enterprise Edition

   7: -- It is a more interesting feature for Standard Edition

   8:  

   9: -- Buffer Pool Extension to SSDs in SQL Server 2014

  10: -- http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/25/buffer-pool-extension-to-ssds-in-sql-server-2014.aspx

Figure 1: Query #19 BPE Configuration

BPE was a new feature that was added to SQL Server 2014. The idea behind it is that you can set aside some space in your file system for a read-only cache file of clean buffer pool pages, which makes it look like your buffer pool is larger than it actually is. SQL Server will look in the actual buffer pool first, and then in the BPE file, and finally in your storage subsystem to find the data that it needs. The ideal scenario for this to be helpful is if you have a read-intensive OLTP workload that does a lot of random reads from your data files (but you don’t have enough RAM to fit the data that is being read into the actual buffer pool). If you are using magnetic storage that has relatively poor random read I/O performance, and if your BPE file is on fast, local flash storage, then you might see a performance improvement from using BPE.

The problem with this in real life is that there is typically a lot of write activity to the BPE file to keep it up to date with the data that it is trying to cache. Some types of less expensive flash storage have performance issues when they are under sustained write pressure. It is also hard to drive an OLTP workload hard enough to cause it to actually use the BPE file in most scenarios, unless you set max server memory artificially low. The BPE feature is mainly interesting if you are using SQL Server 2014 Standard Edition, where you are restricted to using 128GB of RAM per instance for the Database Engine.

 

Query #20 is BPE Usage. This query retrieves data from the sys.dm_os_buffer_descriptors dynamic management view. Query #20 is shown in Figure 2.

   1: -- Look at buffer descriptors to see BPE usage by database (Query 20) (BPE Usage) 

   2: SELECT DB_NAME(database_id) AS [Database Name], COUNT(page_id) AS [Page Count],

   3: CAST(COUNT(*)/128.0 AS DECIMAL(10, 2)) AS [Buffer size(MB)], 

   4: AVG(read_microsec) AS [Avg Read Time (microseconds)]

   5: FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)

   6: WHERE database_id <> 32767

   7: AND is_in_bpool_extension = 1

   8: GROUP BY DB_NAME(database_id) 

   9: ORDER BY [Buffer size(MB)] DESC OPTION (RECOMPILE);

  10:  

  11: -- You will see no results if BPE is not enabled or if there is no BPE usage

Figure 2: Query #20 BPE Usage

This query (which usually takes a few seconds to run) will show you which databases are actually using your BPE file. I have not seen too many people using the BPE feature so far, which I think is a shame. If you have the right kind of workload, it may help performance. It is certainly worth some testing to try it out.

SQL Server Diagnostic Information Queries Detailed, Day 8

For Day 8 of this series, we start out with Query #16, which is Hardware Info.  This query retrieves some hardware-related information from the sys.dm_os_sys_info dynamic management view. Query #16 is shown in Figure 1.

   1: -- Hardware information from SQL Server 2016  (Query 16) (Hardware Info)

   2: SELECT cpu_count AS [Logical CPU Count], scheduler_count, hyperthread_ratio AS [Hyperthread Ratio],

   3: cpu_count/hyperthread_ratio AS [Physical CPU Count], 

   4: physical_memory_kb/1024 AS [Physical Memory (MB)], committed_kb/1024 AS [Committed Memory (MB)],

   5: committed_target_kb/1024 AS [Committed Target Memory (MB)],

   6: max_workers_count AS [Max Workers Count], affinity_type_desc AS [Affinity Type], 

   7: sqlserver_start_time AS [SQL Server Start Time], virtual_machine_type_desc AS [Virtual Machine Type], 

   8: softnuma_configuration_desc AS [Soft NUMA Configuration]

   9: FROM sys.dm_os_sys_info WITH (NOLOCK) OPTION (RECOMPILE);

  10:  

  11: -- Gives you some good basic hardware information about your database server

  12: -- Cannot distinguish between HT and multi-core

  13: -- Note: virtual_machine_type_desc of HYPERVISOR does not automatically mean you are running SQL Server inside of a VM

  14: -- It merely indicates that you have a hypervisor running on your host

  15: -- Soft NUMA configuration is a new column for SQL Server 2016

Figure 1: Query #16 Hardware Info

This query tells you some very useful information about your hardware, even though it is frustratingly incomplete. You can determine your physical socket count, and the total number of logical processors, but you cannot tell what type of processor you have or whether Intel Hyper-Threading is enabled or not. You can also determine how much RAM is visible to the machine you are running on (physical or virtual).

You can also tell whether a hypervisor is present on the host you are running on, and whether someone has manually modified CPU process affinity. New in SQL Server 2016, you can see whether soft NUMA has been enabled. Finally, you can see the last time the SQL Server Service was started, which is very important to know when you are interpreting the results of many of the other queries in this set. It would be very nice if the results from the next two queries were also included in this DMV.

 

Query #17 is System Manufacturer.  This query tries to read the SQL Server Error log to determine the manufacturer and model number of the server that you are running on. If you are running inside of a hypervisor virtual machine, you will see that instead of the actual host hardware. Query #17 is shown in Figure 2.

   1: -- Get System Manufacturer and model number from SQL Server Error log (Query 17) (System Manufacturer)

   2: EXEC sys.xp_readerrorlog 0, 1, N'Manufacturer'; 

   3:  

   4: -- This can help you determine the capabilities and capacities of your database server

   5: -- Can also be used to confirm if you are running in a VM

   6: -- This query might take a few seconds if you have not recycled your error log recently

   7: -- This query will return no results if your error log has been recycled since the instance was started

Figure 2: Query #17 System Manufacturer

Knowing the brand and model of the server you are running on helps you figure out other useful information, such as how many PCIe expansion slots it has, how much memory it supports, and what type of processors it supports. It also helps you figure out how old it might be. This query will return no results if the SQL Server Error log has rolled over since the instance was started. I really hate having to pull information like this from the SQL Server Error log; it would be much better to get it from a DMV.

 

Query #18 is Processor Description.  This query retrieves the processor description information from the Windows Registry. Query #18 is shown in Figure 3.

   1: -- Get processor description from Windows Registry  (Query 18) (Processor Description)

   2: EXEC sys.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'HARDWARE\DESCRIPTION\System\CentralProcessor\0', N'ProcessorNameString';

   3:  

   4: -- Gives you the model number and rated clock speed of your processor(s)

   5: -- Your processors may be running at less than the rated clock speed due

   6: -- to the Windows Power Plan or hardware power management

   7:  

   8: -- You can use CPU-Z to get your actual CPU core speed and a lot of other useful information

   9: -- http://www.cpuid.com/softwares/cpu-z.html

  10:  

  11: -- You can learn more about processor selection for SQL Server by following this link

  12: -- http://www.sqlskills.com/blogs/glenn/processor-selection-for-sql-server/

Figure 3: Query #18 Processor Description

I think it is critical to know exactly what processor(s) you are running on your SQL Server instances. This helps you understand the capabilities of the system and how old it might be. It is also very important to know this if you are thinking about doing a server upgrade or consolidation effort. Even though the gory details of processors are probably not that interesting to many database professionals, it is actually pretty important for licensing purposes and for performance and scalability.

SQL Server Diagnostic Information Queries Detailed, Day 7

Note: The three queries I am discussing today are not particularly interesting or useful if you are not using either a traditional, failover cluster instance (FCI), or an AlwaysOn Availability Group.

For Day 7 of this series, we start out with Query #13, which is SQL Server Error Log. This query retrieves the current configuration for the SQL Server failover cluster diagnostic log from the sys.dm_os_server_diagnostics_log_configurations dynamic management view. Query #13 is shown in Figure 1.

   1: -- Shows you where the SQL Server failover cluster diagnostic log is located and how it is configured  (Query 13) (SQL Server Error Log)

   2: SELECT is_enabled, [path], max_size, max_files

   3: FROM sys.dm_os_server_diagnostics_log_configurations WITH (NOLOCK) OPTION (RECOMPILE);

   4:  

   5: -- Knowing this information is important for troubleshooting purposes

   6: -- Also shows you the location of other error and diagnostic log files

Figure 1: Query #13 SQL Server Error Log

This query basically shows you where all the SQL Server related error and diagnostic logs are located. Knowing this can be very helpful if you need to troubleshoot problems with various parts of SQL Server. For example, if the SQL Server database engine will not start, opening the SQL Server Error log in a text editor will often give you some very useful information about what the problem is.

 

Query #14 is Cluster Node Properties. This query retrieves information from the sys.dm_os_cluster_nodes dynamic management view, which returns one row for each node in the failover cluster instance configuration. If you have a standalone instance of SQL Server, this query will return an empty rowset. Query #14 is shown in Figure 2.

   1: -- Get information about your cluster nodes and their status  (Query 14) (Cluster Node Properties)

   2: -- (if your database server is in a failover cluster)

   3: SELECT NodeName, status_description, is_current_owner

   4: FROM sys.dm_os_cluster_nodes WITH (NOLOCK) OPTION (RECOMPILE);

   5:  

   6: -- Knowing which node owns the cluster resources is critical

   7: -- Especially when you are installing Windows or SQL Server updates

   8: -- You will see no results if your instance is not clustered

Figure 2: Query #14 Cluster Node Properties

This query is useful because it lets you identify each node in the cluster, see its current status, and see which node currently owns the instance.

 

Query #15 is AlwaysOn AG Cluster. This query retrieves information from the sys.dm_hadr_cluster dynamic management view about the Windows Server Failover Clustering (WSFC) quorum, if one exists on the node that is hosting the SQL Server instance, and you are using an AlwaysOn AG. Query #15 is shown in Figure 3.

   1: -- Get information about any AlwaysOn AG cluster this instance is a part of (Query 15) (AlwaysOn AG Cluster)

   2: SELECT cluster_name, quorum_type_desc, quorum_state_desc

   3: FROM sys.dm_hadr_cluster WITH (NOLOCK) OPTION (RECOMPILE);

   4:  

   5: -- You will see no results if your instance is not using AlwaysOn AGs

   6:  

   7: -- Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters

   8: -- http://support.microsoft.com/kb/2920151

Figure 3: Query #15 AlwaysOn AG Cluster

This query is one way of detecting that there is an AlwaysOn AG in place on your instance, and then seeing what quorum type is being used.

If you are going to be using either traditional FCIs or AlwaysOn AGs (which require WSFC, but not shared storage), it is very important that you use Windows Server 2012 or newer for your operating system. The clustering code was basically rewritten in Windows Server 2012, and it was further improved in Windows Server 2012 R2.

SQL Server Diagnostic Information Queries Detailed, Day 6

For Day 6 of this series, we start out with Query #11, which is SQL Server NUMA Info. This query retrieves information from the sys.dm_os_nodes dynamic management view about the NUMA node structure that has been created by the SQLOS. By default, this will mimic the hardware processor locality (whether it is the actual hardware of the host, or a virtual machine), but this structure can be altered by using soft-NUMA. Query #11 is shown in Figure 1.

   1: -- SQL Server NUMA Node information  (Query 11) (SQL Server NUMA Info)

   2: SELECT node_id, node_state_desc, memory_node_id, processor_group, online_scheduler_count, 

   3:        active_worker_count, avg_load_balance, resource_monitor_state

   4: FROM sys.dm_os_nodes WITH (NOLOCK) 

   5: WHERE node_state_desc <> N'ONLINE DAC' OPTION (RECOMPILE);

   6:  

   7: -- Gives you some useful information about the composition and relative load on your NUMA nodes

   8: -- You want to see an equal number of schedulers on each NUMA node

   9: -- Watch out if SQL Server 2016 Standard Edition has been installed on a machine with more than 16 physical cores

  10:  

  11: -- Balancing Your Available SQL Server Core Licenses Evenly Across NUMA Nodes

  12: -- http://www.sqlskills.com/blogs/glenn/balancing-your-available-sql-server-core-licenses-evenly-across-numa-nodes/

Figure 1: Query #11 SQL Server NUMA Info

This query tells you how many NUMA nodes are being used by SQLOS. In a modern, NUMA-capable server, you should see one NUMA node for each physical processor socket (unless someone has disabled NUMA in the UEFI/BIOS). So a two-socket server should have two NUMA nodes. If you are using virtualization, you might see a  VM with only one NUMA node and many virtual processor cores, or whoever created the VM might have used multiple NUMA nodes with fewer virtual processor cores per NUMA node. You also want to confirm that you are seeing the same online_scheduler_count on each NUMA node.

Query #12 is System Memory. This query retrieves information from the sys.dm_os_sys_memory dynamic management view about the memory usage by the operating system. Query #12 is shown in Figure 2.

   1: -- Good basic information about OS memory amounts and state  (Query 12) (System Memory)

   2: SELECT total_physical_memory_kb/1024 AS [Physical Memory (MB)], 

   3:        available_physical_memory_kb/1024 AS [Available Memory (MB)], 

   4:        total_page_file_kb/1024 AS [Total Page File (MB)], 

   5:        available_page_file_kb/1024 AS [Available Page File (MB)], 

   6:        system_cache_kb/1024 AS [System Cache (MB)],

   7:        system_memory_state_desc AS [System Memory State]

   8: FROM sys.dm_os_sys_memory WITH (NOLOCK) OPTION (RECOMPILE);

   9:  

  10: -- You want to see "Available physical memory is high" for System Memory State

  11: -- This indicates that you are not under external memory pressure

Figure 2: Query #12 System Memory

This query tells you how much memory is visible to the operating system (whether it is the host OS or the guest OS for a virtual machine). It also tells you how much memory is available to the operating system, which is normally a relatively low amount when SQL Server has seen a normal workload. It is very important to look at the system_memory_state_desc column result, which is the result of two separate flags about the system memory state.

What you want to see is “Available physical memory is high”, which means that the OS has plenty of available memory and you are not under external memory pressure. In some cases you might instead see “Available physical memory is steady” or “Available physical memory is low”, either of which means that you are under external memory pressure, which is something you don’t want to see.

SQL Server Diagnostic Information Queries Detailed, Day 5

For Day 5 of the series, we have Query #9, which is SQL Server Agent Alerts. This query looks at the sysalerts table in the msdb system database. Query #9 is shown in Figure 1.

   1: -- Get SQL Server Agent Alert Information (Query 9) (SQL Server Agent Alerts)

   2: SELECT name, event_source, message_id, severity, [enabled], has_notification, 

   3:        delay_between_responses, occurrence_count, last_occurrence_date, last_occurrence_time

   4: FROM msdb.dbo.sysalerts WITH (NOLOCK)

   5: ORDER BY name OPTION (RECOMPILE);

   6:  

   7: -- Gives you some basic information about your SQL Server Agent Alerts (which are different from SQL Server Agent jobs)

   8: -- Read more about Agent Alerts here: http://www.sqlskills.com/blogs/glenn/creating-sql-server-agent-alerts-for-critical-errors/

Figure 1: Query #9 SQL Server Agent Alerts

In my experience, probably 90% of the SQL Server instances that I look at don’t have any SQL Server Agent Alerts in place, often because many DBAs are not sure what they are or why they are useful.

Despite the name, SQL Server Agent Alerts have very little to do with SQL Server Agent jobs. I have run into quite a few DBAs who mistakenly assume that SQL Server Agent Alerts are what occur when SQL Server Agent jobs fail. This is not the case. These are actually special alerts that you can create to detect when certain errors or other conditions occur, such as SQL Server performance condition alerts or WMI event alerts.

When one of these alerts is triggered, you can have SQL Server Agent run a SQL Server Agent job and/or notify a SQL Server operator via e-mail or pager/smart phone. This makes it much more likely that someone will notice a problem sooner rather than later. This blog post explains SQL Server Agent Alerts in more detail, and it also has a link to where you can download a T-SQL script that I wrote that you can use to create some very useful Agent Alerts on your server.
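As a minimal sketch of what creating one of these alerts looks like (the alert name and the ‘DBA Team’ operator are assumptions; the operator must already exist, created with sp_add_operator):

```sql
-- Hypothetical example: create an alert for severity 17 (insufficient resources)
-- errors and notify an existing operator by e-mail
-- ('DBA Team' is an assumed operator name; create one first with sp_add_operator)
EXEC msdb.dbo.sp_add_alert
     @name = N'Severity 017 - Insufficient Resources',
     @severity = 17,
     @include_event_description_in = 1,
     @delay_between_responses = 60;

EXEC msdb.dbo.sp_add_notification
     @alert_name = N'Severity 017 - Insufficient Resources',
     @operator_name = N'DBA Team',
     @notification_method = 1;  -- 1 = e-mail
```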

 

Query #10 is Windows Info. This query retrieves information from the sys.dm_os_windows_info dynamic management view about the operating system that you are running your instance of SQL Server on. Query #10 is shown in Figure 2.

   1: -- Windows information (Query 10) (Windows Info)

   2: SELECT windows_release, windows_service_pack_level, 

   3:        windows_sku, os_language_version

   4: FROM sys.dm_os_windows_info WITH (NOLOCK) OPTION (RECOMPILE);

   5:  

   6: -- Gives you major OS version, Service Pack, Edition, and language info for the operating system

   7: -- 10.0 is either Windows 10 or Windows Server 2016

   8: -- 6.3 is either Windows 8.1 or Windows Server 2012 R2 

   9: -- 6.2 is either Windows 8 or Windows Server 2012

  10: -- 6.1 is either Windows 7 or Windows Server 2008 R2

  11: -- 6.0 is either Windows Vista or Windows Server 2008

  12:  

  13: -- Windows SKU codes

  14: -- 4 is Enterprise Edition

  15: -- 7 is Standard Server Edition

  16: -- 8 is Datacenter Server Edition

  17: -- 10 is Enterprise Server Edition

  18: -- 48 is Professional Edition

  19:  

  20: -- 1033 for os_language_version is US-English

  21:  

  22: -- SQL Server 2016 requires Windows Server 2012 or newer

  23:  

  24: -- Quick-Start Installation of SQL Server 2016

  25: -- https://msdn.microsoft.com/en-us/library/bb500433(v=sql.130).aspx

  26:  

  27: -- Hardware and Software Requirements for Installing SQL Server 2016

  28: -- https://msdn.microsoft.com/en-us/library/ms143506(v=sql.130).aspx

  29:  

  30: -- Using SQL Server in Windows 8, Windows 8.1, Windows Server 2012 and Windows Server 2012 R2 environments

  31: -- http://support.microsoft.com/kb/2681562

Figure 2: Query #10 Windows Info

Unfortunately, the raw information that this query returns is a little cryptic. For example, the windows_release column returns a numeric value for the major release number of the operating system, such as 6.3. Unless you have a very good knowledge of computer trivia, you might not know that 6.3 means either Windows 8.1 or Windows Server 2012 R2. That is why I list the relevant possibilities in the comments below the query.

You also cannot tell directly from the version number alone whether you are dealing with the server operating system (like Windows Server 2012 R2) or a client operating system (like Windows 8.1), even though you should not be running a Production instance of SQL Server on a client operating system. The windows_sku column can sometimes help you indirectly determine that.
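If you would rather have the decoding done for you, a small variation on Query #10 (a sketch that simply inlines the SKU mappings from the query comments above) can translate windows_sku as part of the result set:

```sql
-- Hypothetical variation on Query #10: decode windows_sku using the
-- mappings listed in the query comments
SELECT windows_release, windows_service_pack_level,
       CASE windows_sku
            WHEN 4  THEN 'Enterprise Edition'
            WHEN 7  THEN 'Standard Server Edition'
            WHEN 8  THEN 'Datacenter Server Edition'
            WHEN 10 THEN 'Enterprise Server Edition'
            WHEN 48 THEN 'Professional Edition'
            ELSE 'Other (' + CAST(windows_sku AS varchar(10)) + ')'
       END AS [Windows SKU Description]
FROM sys.dm_os_windows_info WITH (NOLOCK) OPTION (RECOMPILE);
```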

Knowing the details of the operating system version and edition can be very useful. For example, Windows Server 2008 R2 Standard Edition is limited to 32GB of RAM, which often catches people by surprise.