Updated sys.dm_os_waiting_tasks script to correctly parse nodeId

I realized that I’d fixed an issue with my sys.dm_os_waiting_tasks script to correctly parse out the parallelism nodeId from the resource_description column (as newer versions include more info after the nodeId=X info) but I never blogged the update. Here it is for your use(and all other references have been updated to point to this post).

Enjoy!

(Note that ‘text’ on one line does not have delimiters because that messes up the code formatting plugin):

/*============================================================================
  File:     WaitingTasks.sql

  Summary:  Snapshot of waiting tasks

  SQL Server Versions: 2005 onward
------------------------------------------------------------------------------
  Written by Paul S. Randal, SQLskills.com

  (c) 2019, SQLskills.com. All rights reserved.

  For more scripts and sample code, check out 
    http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes. You may
  republish altered code as long as you include this copyright and give due
  credit, but you must obtain prior permission before blogging this code.
  
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/
SELECT
    [owt].[session_id] AS [SPID],
    [owt].[exec_context_id] AS [Thread],
    [ot].[scheduler_id] AS [Scheduler],
    [owt].[wait_duration_ms] AS [wait_ms],
    [owt].[wait_type],
    [owt].[blocking_session_id] AS [Blocking SPID],
    [owt].[resource_description],
    CASE [owt].[wait_type]
        WHEN N'CXPACKET' THEN
            SUBSTRING ( -- earlier versions don't have anything after the nodeID...
                [owt].[resource_description],
                CHARINDEX (N'nodeId=', [owt].[resource_description]) + 7,
                CHARINDEX (N' tid=', [owt].[resource_description] + ' tid=') -
                    CHARINDEX (N'nodeId=', [owt].[resource_description]) - 7
            )
        ELSE NULL
    END AS [Node ID],
    [eqmg].[dop] AS [DOP],
    [er].[database_id] AS [DBID],
    CAST ('https://www.sqlskills.com/help/waits/' + [owt].[wait_type] as XML) AS [Help/Info URL],
    [eqp].[query_plan],
    [est].text
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_os_tasks [ot] ON
    [owt].[waiting_task_address] = [ot].[task_address]
INNER JOIN sys.dm_exec_sessions [es] ON
    [owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
    [es].[session_id] = [er].[session_id]
FULL JOIN sys.dm_exec_query_memory_grants [eqmg] ON
    [owt].[session_id] = [eqmg].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
    [es].[is_user_process] = 1
ORDER BY
    [owt].[session_id],
    [owt].[exec_context_id];
GO

The Curious Case of… what is the wait resource (0:0:0)?

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

On Twitter today under the #sqlhelp hash tag, a former student asked what is the wait resource (0:0:0)?

This can show up in a couple of places:

  • In the output from sys.dm_exec_requests in the wait_resource column when the last_wait_type is PAGELATCH_EX or PAGELATCH_SH
  • In the output from sys.sysprocesses in the waitresource column when the lastwaittype is PAGELATCH_EX or PAGELATCH_SH

(as sys.sysprocesses is basically the same code as the DMV under the covers)

The simple answer is that in some cases of PAGELATCH_SH or PAGELATCH_EX waits in tempdb, the wait resource simply isn’t preserved by the code and so the wait resource for the last wait type is set to (0:0:0) in the output of those queries. It doesn’t mean anything special, and I’ve only ever seen it in tempdb. It’s been that way since SQL Server 2005 and is still there in all the latest versions today.

Simple answer: ignore it and troubleshoot the other waits.

The Curious Case of… faster disks causing more WRITELOG waits

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I was contacted last week by someone who was asking how are faster disks causing more WRITELOG waits. They were seeing lots of these waits, with an average wait time of 18ms. The log was stored on a Raid-1 array, using locally-attached spinning disks in the server. They figured that by moving the log to Raid-1 array of SSDs, they’d reduce the WRITELOG wait time and get better workload throughput.

They did so and got better performance, but were very surprised to now see WRITELOG as the most frequent wait type on the server, even though the average wait time was less than 1ms, and asked me to explain.

Let me start by saying this is not unusual, and I’ve seen this sequence of events many times with clients and students.

WRITELOG wait occurs when a log block is being flushed to disk, and the two most common causes of this are that the log block is full (60KB) or a transaction committed. If it’s the case that a workload is comprised of lots of very small transactions, then a reduction in wait time for the log block flush from 18ms down to less than 1ms is very likely to result in a higher throughput. This means more transactions happening and hence more log block flushes when the transactions commit, so *more* WRITELOG waits.

Bottom line: What was confusing to them was that performance tuning is usually about reducing the incidence of waits and/or the individual wait times. This is about the only example I can think of where you do something that increases performance and the wait type you were concerned about becomes more prevalent.