This is a performance tuning post that's been on my to-do list for quite a while. Wait stats analysis is a great way of looking at the symptoms of performance problems (see my Wait Stats category for more posts on this) but using the sys.dm_os_wait_stats DMV shows everything that's happening on a server. If you want to see what waits are occuring because of a single query or operation on a production system (e.g. the effect of MAXDOP hints on number and length of CXPACKET waits for a query) then using the DMV isn't usually feasible - you'd need to clear the wait stats and then have nothing else running on the system except the query/operation you want to investigate. The way to do it is with extended events.

Whenever a thread state changes to SUSPENDED because of a wait, the wait_info event from the sqlos event package fires to mark the beginning of the wait. When the thread gets back on the CPU again, the wait_info event fires again, noting the total wait time and the signal wait time (spent on the RUNNABLE queue). Just like with the DMV, we need to calculate the resource wait time (spent being SUSPENDED) ourselves.

Here's a simple example for you - imagine we want to investigate the waits that occur during an index rebuild. First thing is to create the event session:

-- Drop the session if it exists.
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'MonitorWaits')
    DROP EVENT SESSION MonitorWaits ON SERVER
GO

CREATE EVENT SESSION MonitorWaits ON SERVER
ADD EVENT sqlos.wait_info
    (WHERE sqlserver.session_id = 53 /* session_id of connection to monitor */)
ADD TARGET package0.asynchronous_file_target
    (SET FILENAME = N'C:\SQLskills\EE_WaitStats.xel',
    METADATAFILE = N'C:\SQLskills\EE_WaitStats.xem')
WITH (max_dispatch_latency = 1 seconds);
GO

You'll need to plug in the session ID of the SQL Server connection where you'll run the operation under investigation. I created a dummy production database with a table and some data so I can monitor the waits from rebuilding its clustered index.

You can also filter out any wait types you're not interested in by adding another predicate to the session that filters on the sqlos.wait_type. You'll need to specify numbers, and you can list the mapping between numbers and human-understandable wait types using this code:

SELECT xmv.map_key, xmv.map_value
FROM sys.dm_xe_map_values xmv
JOIN sys.dm_xe_packages xp
    ON xmv.object_package_guid = xp.guid
WHERE xmv.name = 'wait_types'
    AND xp.name = 'sqlos';
GO

Now I turn on the session and run my rebuild.

-- Start the session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = START;
GO

-- Go do the query

-- Stop the event session
ALTER EVENT SESSION MonitorWaits ON SERVER STATE = STOP;
GO

And then I can see how many events fired:

SELECT COUNT (*)
FROM sys.fn_xe_file_target_read_file
    ('C:\SQLskills\EE_WaitStats*.xel',
    'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

13324

And then pull them into a temporary table and aggregate them (various ways of doing this, I prefer this one):

-- Create intermediate temp table for raw event data
CREATE TABLE #RawEventData (
    Rowid  INT IDENTITY PRIMARY KEY,
    event_data XML);
 
GO

-- Read the file data into intermediate temp table
INSERT INTO #RawEventData (event_data)
SELECT
    CAST (event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file (
    'C:\SQLskills\EE_WaitStats*.xel',
    'C:\SQLskills\EE_WaitStats*.xem', null, null);
GO

SELECT
    waits.[Wait Type],
    COUNT (*) AS [Wait Count],
    SUM (waits.[Duration]) AS [Total Wait Time (ms)],
    SUM (waits.[Duration]) - SUM (waits.[Signal Duration]) AS [Total Resource Wait Time (ms)],
    SUM (waits.[Signal Duration]) AS [Total Signal Wait Time (ms)]
FROM
    (SELECT
        event_data.value ('(
/event/@timestamp)[1]', 'DATETIME') AS [Time],
        event_data.value ('(/event/data[@name=''wait_type'']/text)[1]', 'VARCHAR(100)') AS [Wait Type],
        event_data.value ('(/event/data[@name=''opcode'']/text)[1]', 'VARCHAR(100)') AS [Op],
        event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS [Duration],
        event_data.value ('(/event/data[@name=''signal_duration'']/value)[1]', 'BIGINT') AS [Signal Duration]
     FROM #RawEventData
    ) AS waits
WHERE waits.[op] = 'End'
GROUP BY waits.[Wait Type]
ORDER BY [Total Wait Time (ms)] DESC;
GO

-- Cleanup
DROP TABLE #RawEventData;
GO

Wait Type             Wait Count  Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
--------------------- ----------- -------------------- ----------------------------- ---------------------------
LATCH_EX              4919        15783                15693                         90
LATCH_SH              18          4399                 4398                          1
CXPACKET              16          3039                 3038                          1
PAGEIOLATCH_SH        813         468                  457                           11
PAGELATCH_SH          246         113                  109                           4
WRITELOG              99          71                   66                            5
PAGELATCH_UP          203         65                   63                            2
LOGBUFFER             123         58                   57                            1
WRITE_COMPLETION      82          55                   55                            0
PAGEIOLATCH_UP        35          27                   27                            0
IO_COMPLETION         80          25                   25                            0
SQLTRACE_LOCK         1           6                    6                             0
PAGEIOLATCH_EX        4           3                    3                             0
CMEMTHREAD            17          1                    0                             1
SOS_SCHEDULER_YIELD   6           0                    0                             0

You can easily change this to give average times and add filters to remove uninteresting waits.

If you're interested in getting more involved with extended events, check out Jonathan's excellent series from last December.

Happy waiting!

I just had to figure out how to do this so I figured a quick blog post is in order to save other people time in future.

If you ever need to use windbg to debug a SQL Server crash dump, or you want to capture call stacks using extended events (e.g. when debugging excessive spinlock contention), you'll need the correct symbol file (sqlservr.pdb) to go with sqlservr.exe of the instance you're interested in.

KB article 311503 has details on how to do this in general, but they're a little cryptic.

Assuming you've already installed the Windows debugging tools (32-bit or 64-bit), you'll have a tool called symchk in the folder where windbg resides (for my laptop, "C:\Program Files\Debugging Tools for Windows (x64)") - this is what will pull down sqlservr.pdb.

You need to point symchk at the executable you're interested in, tell it where to put the sqlservr.pdb, and tell it the location of the Microsoft symbol server.

For me, the following worked:

C:\>cd C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn

C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn>"C:\Program
 Files\Debugging Tools for Windows (x64)\symchk" sqlservr.exe /s SRV*c:\symbols*
http://msdl.microsoft.com/download/symbols

SYMCHK: FAILED files = 0
SYMCHK: PASSED + IGNORED files = 1

And then I copied the sqlservr.pdb into the \Binn directory.

Enjoy!

As part of the 2008 DBA class we're teaching down here in Melbourne, I did a demo of using predicates and file targets with extended events, so I want to blog the script for people to play with.

For background info on extended events see:

In this scenario, I'd like to track queries that make heavy usage of the CPU on my system.

First off I'll create a test database to play with.

USE MASTER;
GO
IF DATABASEPROPERTYEX ('production', 'Version') > 0 DROP DATABASE production;
GO

CREATE DATABASE production;
GO
USE production;
GO

CREATE TABLE t1 (c1 INT IDENTITY, c2 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID(), c3 CHAR (5000) DEFAULT 'a');
CREATE CLUSTERED INDEX t1_CL ON t1 (c1);
CREATE NONCLUSTERED INDEX t1_NCL ON t1 (c2);
GO

SET NOCOUNT ON;
INSERT INTO t1 DEFAULT VALUES;
GO 1000

-- Get the database ID to plug into the event session
SELECT DB_ID ('production');
GO

Notice that I grabbed the database ID of the new database I created, as I'll need that when defining the extended event predicate.

IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'EE_ExpensiveQueries')
   DROP EVENT SESSION EE_ExpensiveQueries ON SERVER;
GO

CREATE EVENT SESSION EE_ExpensiveQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed
   (ACTION (sqlserver.sql_text, sqlserver.plan_handle)
      WHERE sqlserver.database_id = 18 /*DBID*/  AND cpu > 10 /*total ms of CPU time*/)
ADD TARGET package0.asynchronous_file_target
   (SET FILENAME = N'C:\SQLskills\EE_ExpensiveQueries.xel', METADATAFILE = N'C:\SQLskills\EE_ExpensiveQueries.xem')
WITH (max_dispatch_latency = 1 seconds);
GO

I'm monitoring the event that fires whenever a statement completes, and I'm filtering by database ID (make sure you plug in the correct database ID when you try this yourself) and by the number of milliseconds of CPU time the statement used. Note: when using predicates you should always make sure that you order the predicate tests such that the most restrictive predicates (those most likely to evaluate to false) are first in the list, so the predicate evaluation 'short-circuits' as quickly as possible (a standard programming practice with boolean logic). I'm also using a file target, just to show how it can be done, instead of the ring buffer which is commonly used for extended events demos.

Now I'm going to turn on the event session and do some queries in the production database.

ALTER EVENT SESSION EE_ExpensiveQueries ON SERVER STATE = START;
GO

USE production;
GO

SELECT COUNT (*) FROM t1 WHERE c1 > 500;
GO

SELECT SUM (c1) FROM t1 WHERE c3 LIKE 'a';
GO

ALTER INDEX t1_CL ON t1 REORGANIZE;
GO

ALTER INDEX t1_CL ON t1 REBUILD;
GO

That should have generated some long-running queries. Now I'll switch in to the context of master (so the querying of the event session itself doesn't get captured by the event session) and see what I've captured.

USE master;
GO

SELECT COUNT (*) FROM sys.fn_xe_file_target_read_file
   ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', NULL, NULL);
GO

In this case I've got 3 entries. I can pull these out using the following code:

SELECT data FROM
   (SELECT CONVERT (XML, event_data) AS data FROM sys.fn_xe_file_target_read_file
      ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', NULL, NULL)
 ) entries;
GO

But I get three XML blobs back, like so:

 

What's more useful is to pull everything out of the XML blob programmatically using the code below:

SELECT
   data.value (
      '(/event[@name=''sql_statement_completed'']/@timestamp)[1]', 'DATETIME') AS [Time],
   data.value (
      '(/event/data[@name=''cpu'']/value)[1]', 'INT') AS [CPU (ms)],
      CONVERT (FLOAT, data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT')) / 1000000
      AS [Duration (s)],
   data.value (
      '(/event/action[@name=''sql_text'']/value)[1]', 'VARCHAR(MAX)') AS [SQL Statement],
      SUBSTRING (data.value ('(/event/action[@name=''plan_handle'']/value)[1]', 'VARCHAR(100)'), 15, 50)
      AS [Plan Handle]
FROM
   (SELECT CONVERT (XML, event_data) AS data FROM sys.fn_xe_file_target_read_file
      ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', null, null)
) entries
ORDER BY [Time] DESC;
GO

Time                    CPU (ms) Duration (s) SQL Statement                              Plan Handle
----------------------- -------- ------------ ------------------------------------------ --------------------------------------------------
2009-10-16 17:59:29.623 30       1.214875     ALTER INDEX t1_CL ON t1 REBUILD;           0x06001000EB672A07B8C0C807000000000000000000000000
2009-10-16 17:59:28.407 20       0.024076     ALTER INDEX t1_CL ON t1 REORGANIZE;        0x0600100003594903B8C0C807000000000000000000000000
2009-10-16 17:59:28.343 51       0.045144     SELECT SUM (c1) FROM t1 WHERE c3 LIKE 'a'; 0x06001000FAF5B11EB820C307000000000000000000000000

Now I can plug in one of the plan handles to a query of sys.dm_exec_query_plan to get the graphical query plan:

SELECT [query_plan] FROM sys.dm_exec_query_plan (0x06001000FAF5B11EB820C307000000000000000000000000);
GO

And clicking on the resulting XML 'link' gives the query plan:

 

 

And now I can tweak the production workload to potentially behave better.

Just for kicks I went back into the context of the production database and ran the XML parsing again to capture the query plan - try it yourself - pretty gnarly! :-)

Now, if I was going to make this more useful, I'd use a ring buffer target, with a polling mechanism every few seconds to make sure that I can capture the graphical query plan for expensive queries before the plan is pushed out of cache - maybe I'll get around to doing that and publish the complete solution.

In the meantime, here's a zip file containing the entire script that you can download and play with: EE_ExpensiveQueries.zip (1.72 kb)

Enjoy!

In the TechNet Magazine article I wrote on Advanced Troubleshooting with Extended Events, I mentioned the always-on event session called system_health. Jonathan Kehayias, a fellow MVP and blogging mad-man, posted a great article with SQL Server Central today about how to get historical deadlock graph info from it. His article explains some of the pre-reqs for doing this (like installing 2008 RTM CU1 - see KB 956717 - to get it working and a bug with the XML output) and gives some code.

After wrestling with the download of CU1 (it took 5 tries to get it to download!), I got it installed in one of my VPCs and gave Jon's code a whirl. I forced a deadlock by creating two tables, locking them from separate transactions and then trying to select from the other table. See the code below:

-- Connection 1:
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1);
GO

BEGIN TRAN
UPDATE t1 SET c1 = 2;
GO

-- Connection 2:
CREATE TABLE t2 (c1 INT);
INSERT INTO t2 VALUES (1);
GO

BEGIN TRAN
UPDATE t2 SET c1 = 2;
GO

-- Connection 1:
SELECT * FROM t2;
GO

-- Connection 2:
SELECT * FROM t1;
GO

And one of them will be killed by the deadlock monitor.

Msg 1205, Level 13, State 45, Line 1
Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Now running Jon's (slightly reformatted) code:

SELECT CAST (
    REPLACE (
        REPLACE (
            XEventData.XEvent.value ('(data/value)[1]', 'varchar(max)'),
            '<victim-list>', '<deadlock><victim-list>'),
        '<process-list>', '</victim-list><process-list>')
    AS XML) AS DeadlockGraph
FROM (SELECT CAST (target_data AS XML) AS TargetData
    FROM sys.dm_xe_session_targets st
    JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
    WHERE [name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent)
    WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report';

Gives the XML deadlock graph (which unfortunately can't be displayed graphically because the format differs from the Profiler/trace-flag output)

<deadlock-list>
  <deadlock>
    <victim-list>
      <victimProcess id="process53d9000" />
    </victim-list>
    <process-list>
      <process id="process53d9000" taskpriority="0" logused="256" waitresource="RID: 1:1:304:0" waittime="331" ownerId="1868" transactionname="user_transaction" lasttranstarted="2009-02-23T13:32:05.100" XDES="0x4eb8c10" lockMode="S" schedulerid="1" kpid="2216" status="suspended" spid="57" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2009-02-23T13:32:33.093" lastbatchcompleted="2009-02-23T13:32:05.100" clientapp="Microsoft SQL Server Management Studio - Query" hostname="CHICAGO" hostpid="2552" loginname="CHICAGO\Administrator" isolationlevel="read committed (2)" xactid="1868" currentdb="1" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
        <executionStack>
          <frame procname="" line="1" sqlhandle="0x02000000c70b7b1daba27f2ddf3e772600949464d524b6f2" />
        </executionStack>
        <inputbuf>
SELECT * FROM t1;
    </inputbuf>
      </process>
      <process id="process4985558" taskpriority="0" logused="256" waitresource="RID: 1:1:294:0" waittime="10181" ownerId="1877" transactionname="user_transaction" lasttranstarted="2009-02-23T13:32:08.067" XDES="0x4eb9be0" lockMode="S" schedulerid="1" kpid="2020" status="suspended" spid="56" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2009-02-23T13:32:23.240" lastbatchcompleted="2009-02-23T13:32:08.067" clientapp="Microsoft SQL Server Management Studio - Query" hostname="CHICAGO" hostpid="2552" loginname="CHICAGO\Administrator" isolationlevel="read committed (2)" xactid="1877" currentdb="1" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
        <executionStack>
          <frame procname="" line="1" sqlhandle="0x02000000f4d34532698b7ad324df813feb2ba5024730cb79" />
        </executionStack>
        <inputbuf>
SELECT * FROM t2;
    </inputbuf>
      </process>
    </process-list>
    <resource-list>
      <ridlock fileid="1" pageid="304" dbid="1" objectname="" id="lock47aea80" mode="X" associatedObjectId="72057594039042048">
        <owner-list>
          <owner id="process4985558" mode="X" />
        </owner-list>
        <waiter-list>
          <waiter id="process53d9000" mode="S" requestType="wait" />
        </waiter-list>
      </ridlock>
      <ridlock fileid="1" pageid="294" dbid="1" objectname="" id="lock47af200" mode="X" associatedObjectId="72057594038976512">
        <owner-list>
          <owner id="process53d9000" mode="X" />
        </owner-list>
        <waiter-list>
          <waiter id="process4985558" mode="S" requestType="wait" />
        </waiter-list>
      </ridlock>
    </resource-list>
  </deadlock>
</deadlock-list>

Cool!

Microsoft's Steffen Krause has written an excellent whitepaper on Tuning the Performance of Change Data Capture in SQL Server 2008, that I technically reviewed, and it was published late last year. To get an overview of Change Data Capture (CDC) before reading the whitepaper, see the TechNet Magazine article I wrote for the November issue, titled SQL Server 2008: Tracking Changes in Your Enterprise Database. The new whitepaper covers:

  • Configuration of sys.sp_cdc_enable_table parameters
  • Configuration of the CDC capture job
  • Using Extended Events to determine the performance and characteristics of CDC
  • Influence of CDC scan job parameters on CDC performance
  • Influence of workload characteristics on CDC performance
  • Influence of sys.sp_cdc_enable_table parameters on CDC performance
  • CDC cleanup job considerations
  • Transaction log file considerations

Check it out at http://msdn.microsoft.com/en-us/library/dd266396.aspx.

The January 2009 issue of TechNet Magazine is now available on the web and has a new article I wrote on Advanced Troubleshooting with Extended Events in SQL Server 2008. The article covers:

  • An overview of troubleshooting in SQL Server, along with links to a bunch of tools like DMVStats and RML
  • An overview of extended events, with descriptions of all the parts of the feature and how to investigate them with the new DMVs
  • Performance considerations of how you setup an extended events session, especially on multi-core servers
  • An example where I build an I/O chargeback mechanism to tie into a system controlled by the SQL Server 2008 resource governor

You can get to the article here, which also has a link to a screencast of me doing a demo. And you can bet that this is an area I'll be blogging about going forward too.

Enjoy!

PS This issue also has the editorial where I get made a Contributing Editor Smile

The second part of our radio interview with TechNet has been released (see here for part 1). In this installment we discuss troubleshooting and manageability in SQL Server 2008. You can get to it by going to the March 11th 2008 show here.

Enjoy!

Phew - last week Kimberly and I spent 3 days teaching the ins-and-outs of SQL Server 2008 for DBAs/IT-Pros to about 130 Microsoft SQL Server experts and MVPs (like Kalen Delaney, Adam Machanic and Ron Talmage). This was the (95% complete) Beta delivery of a course we've been developing for the last six months for Microsoft that they'll use to train their SQL experts around the world on the new release. It's been very interesting watching the features develop through the CTPs (especially since I left the fold last August) - and making demos work on pre-release builds of the CTPs.

Teaching the course was a *blast* - the thing I love about teaching a really geeky crowd is the plethora of great questions and opportunities for going deep with explanations. Our team actually wrote and delivered the concurrently presented Developer and BI tracks as well. As you can see from the list below (and this is just the features a DBA needs to use/know about), SQL Server 2008 isn't a dot release of Yukon at all, as some people have suggested. Over the three days we covered:

  • Database Mirroring (D)
  • Backup Compression
  • Peer-to-Peer Replication (D)
  • Transparent Data Encryption (D)
  • Extensible (Off-Box) Key Management
  • All Actions Audited (D)
  • Policy-Based Management
  • Resource Governor (D)
  • Extended Events (D)
  • Spatial Indexes
  • Integrated Full-Text Search
  • Sparse Columns (D)
  • Filtered Indexes
  • Change Tracking
  • Change Data Capture (D)
  • FILESTREAM (D)
  • Performance Data Collection
  • Query Optimizer Enhancements
  • Data Compression (D)
  • Service Broker
  • Partition-Level Lock Escalation (D)

The features marked with a (D) are ones I demo'd during the course (Kimberly demo'd a bunch of the others - especially the tools features). Some of the demos were challenging to make work in time as we only got a pre-CTP6 build mid-January just before we headed off to China.

So why am I posting this? Well, a bunch of these features are in CTP-6, which should be just around the corner, and I have some easy-to-understand demos of them that I'll be posting here over the next month or so. Also, if this course sounds interesting, Kimberly and I will be teaching it in various configurations over the next year - starting with SQL Connections in April, a soon-to-be-announced class in Iceland in March, and the ITPro portion of TechEd in June.

Watch this space starting next week (today's the last day of six straight weeks of teaching for us so this weekend's a break :-))

Theme design by Nukeation based on Jelle Druyts