I had a recent consulting engagement where a query had unpredictable performance on an isolated test environment.  I had two initial questions:

- What were the wait stats associated with the unpredictable query?

- What did the query execution plan look like in the “good” versus “bad” condition?

To address the query wait stats question, I set up an Extended Events session to track the query’s accumulated wait stats for each execution.  The performance issue would only happen a couple of times today, so I scheduled a job to loop the execution – without disconnecting, and keeping the session ID being referenced in the Extended Events session.  I set up a separate table to track each test run (begin/end time).  This way the long periods could be associated back to the Extended Events session.

We caught a few instances of the long running query, and the associated wait stats were primarily related to PAGEIOLATCH_SH.  That opened up other considerations which I don’t cover in this post, but I was still interested in seeing the execution plan for the long-running time periods versus the “steady” state.

For example, let’s say you have the following query:

EXEC sp_executesql    
    N'SELECT     p.ProductLine,
               SUM(f.SalesAmount) TotalSalesAmount
    FROM [dbo].[FactInternetSales] f
    INNER JOIN [dbo].[DimProduct] p ON
         f.ProductKey = p.ProductKey
    GROUP BY p.ProductLine
    ORDER BY p.ProductLine';
GO

This particular query leverages parallelism and a columnstore index in my database.  I can search for the query hash and query plan hash as follows (after executing on my test system):

SELECT TOP 1 last_execution_time, query_hash, query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE t.text LIKE '%SUM(f.SalesAmount)%'
ORDER BY last_execution_time DESC;
GO

I order it by last execution because I want the most recent execution version (and you have to be careful when grabbing the query_hash to make sure the text match in your WHERE clause isn’t capturing another query).  Obviously easier to do on an isolated test system – but not impossible in production if you pay attention to execution count statistics.  The following shows example results from the previous query:

snaghtml506cb63 thumb Capturing Transient Query Plan Changes

I was interested in the query_hash and also the query_plan – which for this example, is the parallel plan:

image thumb Capturing Transient Query Plan Changes

Now what if I re-execute my query – but with parallelism disabled at the server scope?  No query change – but now the plan must change since it can’t run in batch execution mode.  Will tracking it based on my query_hash still work? 

SELECT query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE qs.query_hash = 0x3759AECF09255926
GO

In this case, I haven’t changed my query, but the plan changed, and so my query_hash allowed me to see the new serial query plan:

image thumb Capturing Transient Query Plan Changes

image thumb Capturing Transient Query Plan Changes

In the particular consulting scenario I mentioned at the beginning of this post, I wrote the most recent query plan to a separate table for each test run so that when the long performance happened on the test environment, we could go back and compare the different plans based on the same query hash value.