Why You Need Query Store, Part III: Proactively analyze your workload


The amount of data collected by Query Store pales in comparison to the amount of data available in all of SQL Server that could be captured.  But the quality of the data overshadows the quantity, or lack thereof.  While most people target the execution statistics when viewing historical data in Query Store, I would argue that the query text and query plan information are equally valuable when it comes to taking the time to analyze your workload.

The query text and plan are stored in two separate tables, sys.query_store_query_text and sys.query_store_plan, respectively.  The text is stored as the nvarchar(max) data type, and the plan is stored as varbinary(max).  This means that the data can be mined, looking for patterns and explicit use of objects.  Want to know what queries use an index?  Look for it in the plans.  Want to know what queries have a RECOMPILE hint on them?  Look for it in the query text.  Anything you want to find with regard to query text or plans is there; you just have to know how to query the data.

Mining Query Store: Queries

But wait, can’t we just search the text?  Sure, with the query_sql_text column you can just wildcard your way through it looking for things.  For example, the query below will help you find every statement in Query Store that has RECOMPILE in the text.

SELECT
     [qsq].[query_id],
     [qsp].[plan_id],
     [qst].[query_sql_text],
     [rs].[avg_duration],
     TRY_CONVERT(XML, [qsp].[query_plan]) AS [QueryPlan_XML]
FROM [sys].[query_store_query] [qsq]
JOIN [sys].[query_store_query_text] [qst]
     ON [qsq].[query_text_id] = [qst].[query_text_id]
JOIN [sys].[query_store_plan] [qsp]
     ON [qsq].[query_id] = [qsp].[query_id]
JOIN [sys].[query_store_runtime_stats] [rs]
     ON [qsp].[plan_id] = [rs].[plan_id]
WHERE [qst].[query_sql_text] LIKE '%RECOMPILE%';

Mining Query Store: Plans

Query plans have a specific format, defined by Microsoft, which can be searched using wildcards, but the performance is abysmal and you’re better served using XQuery.  That’s right, I said XQuery.  If you don’t know, XQuery and I are frenemies.  I love the information I get when using it, but writing the T-SQL always makes me want to compulsively eat M&Ms and animal crackers until I feel sick.

Fortunately for me, Jonathan writes XQuery and creates the foundation of the queries I manage to cobble together for Query Store.  In his post, Finding what queries in the plan cache use a specific index, Jonathan has the code to look through the plan cache and find what queries use a specific index.  The challenge is that the plan cache can be volatile, particularly in an ad hoc workload.  Even in a parameterized/procedural workload there is churn, and the plans that are in cache change as different business processes run throughout the day, week, and month.

Consider the scenario where you use sys.dm_db_index_usage_stats to determine if an index is being used, and you find one that isn’t… how do you really know?  As we know, the plan cache is transitory, so a query that uses the index might not be in cache at that moment.  But it would be in Query Store, depending on your retention settings.  If we modify the query from Jonathan’s post, we can interrogate the Query Store data:


DECLARE @IndexName AS NVARCHAR(128) = '[FK_Sales_InvoiceLines_InvoiceID]';
-- Make sure the name passed is appropriately quoted

IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
--Handle the case where the left or right was quoted manually but not the opposite side
IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '['+@IndexName;
IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';
WITH XMLNAMESPACES
     (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
     stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
     obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
     obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
     obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
     obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
     obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
     query_plan
FROM (
     SELECT TRY_CONVERT(XML, [qsp].[query_plan]) AS [query_plan]
     FROM sys.query_store_plan [qsp]
     ) AS tab (query_plan)
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj);


Beyond looking at performance metrics over time for a query, take advantage of the plethora of data in the Query Store tables.  Use both the text and plan information to proactively look for anti-patterns in your workload: use of index hints, RECOMPILE, NOLOCK… all the things that may cause problems, and understand both object access and expensive operations in the plans.  There is no limit to the patterns, and anti-patterns, you can find.  You just have to write the code.
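As a starting point, a sketch like the following flags statements whose text contains common anti-pattern keywords (the LIKE patterns here are just examples to extend; string matching like this can produce false positives, e.g. the word RECOMPILE in a comment):

```sql
-- Flag Query Store statements whose text contains common anti-pattern keywords.
-- Extend the pattern list to hunt for whatever matters in your workload.
SELECT
     [qst].[query_text_id],
     [qst].[query_sql_text]
FROM [sys].[query_store_query_text] [qst]
WHERE [qst].[query_sql_text] LIKE '%NOLOCK%'
     OR [qst].[query_sql_text] LIKE '%RECOMPILE%'
     OR [qst].[query_sql_text] LIKE '%INDEX(%';
```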

Why You Need Query Store, Part II: Plan forcing

We hear a lot about all the data that Query Store captures, but it is tempting to argue that the best part of Query Store is the ability to force a specific plan for a query.  Arguably, if the data didn’t exist, we would have a harder time figuring out what plan to force, but plan forcing with Query Store is light-years easier than using a plan guide (the historical method for getting a query to use a specific plan).  Further, tracking what plans are forced and how the queries are performing is easy too, using the Queries With Forced Plans report in SSMS.

Items to Consider

Before you go too crazy trying to stabilize performance with hundreds of forced plans, here are a couple of things worth mentioning about plan forcing.

  1. I think of plan forcing as a temporary solution. I do not view it as a “set it and forget it” option.
  2. Use T-SQL for forcing and un-forcing (versus the UI) so you can track when it was done, and by whom, in change control. Nothing is entered in the ERRORLOG, or anywhere else, and there is currently no event that captures it.
  3. Evaluate both the execution statistics (average and total) and the different plans for the query to determine the “best” plan to force.
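For reference, forcing and un-forcing from T-SQL is done with the built-in stored procedures; the query_id and plan_id values below are placeholders, so substitute the values from your own Query Store:

```sql
-- Force plan 1882 for query 45 (IDs are examples)
EXEC sp_query_store_force_plan @query_id = 45, @plan_id = 1882;

-- Later, remove the forced plan for the same query
EXEC sp_query_store_unforce_plan @query_id = 45, @plan_id = 1882;
```

Wrapping these calls in your change-control process gives you the audit trail that the UI buttons do not.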

Why do I view plan forcing as temporary?  Because if I have plan variability, I’d really like to address it through code or schema changes.  Maybe there’s an index to add.  Maybe a section of the query needs a re-write.  To me, forcing the plan gives me a way to “stop the bleeding” – I can stabilize performance right now – until I can determine the right long-term solution and implement it.

For those of you with third-party applications, who cannot change code or indexes, you still want to monitor performance of those forced plans.  While you can’t change the code or the schema, the vendor can (and most likely will at some point), and your data will change.  Changes in data and its distribution could have a greater impact on plan choice than schema and code, and you may be preventing a better plan from being used by forcing the one that was good six months ago.

Deciding What Plan to Force

When you are looking at plan performance, deciding which plan to force can get tricky, depending on how many plans you have for a query.  I start by looking at the plans – does one of them stand out as a “better” plan because of the operators it uses (or doesn’t use), the lack of warnings in the plan, and so on?  Then I look at the execution statistics, and this is where you want to make sure you know what metric is displayed in the UI.  Examine graph A below, and decide what plan you would force before you continue reading.

Total plan duration over time


Graph A shows plan performance for query_id 137512 across a 48-hour window.  The y-axis is duration, and notice that it’s TOTAL duration.

Now take a look at graph B, below, which shows plan performance for the same query_id across the exact same window of time, but the y-axis is AVERAGE duration.  Does your forcing decision change?

Average plan duration over time


When we look at average duration across the different plans, it’s quite consistent.  So much so that I probably wouldn’t force a plan for this query.  I may investigate the plans in more detail to understand what the variation is, but for the most part, I’d probably leave it.  Remember that total duration represents the total amount of time spent executing the query during a time interval.  So the plans that had a lower total duration most likely were not executed as many times as those with a higher total duration.  This becomes evident when you hover over one of the circles and look at the metrics for that specific data point, as shown below.

Query performance details for a highlighted plan



I think plan forcing is a fantastic feature in SQL Server, whether you’re manually forcing plans or using Automatic Plan Correction.  If you are forcing plans, take the time to decide the “best” plan to force by not just reviewing the plan and the execution statistics, but also by testing the plan with different input parameters.  This can really help you understand what plan consistently provides stable performance for a query.  I recommend setting up an Extended Events session to monitor for forcing failures, and regularly re-visiting what plans are forced to see if they are still appropriate.  Finally, remember that if you turn off Query Store for any reason, plan forcing is no longer in effect.  But if Query Store happens to flip to READ_ONLY, as long as it’s enabled, your plans will still be forced.
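Helpfully, sys.query_store_plan exposes forcing status and failure information directly, which makes that periodic review easy to script:

```sql
-- Review forced plans and any plan-forcing failures
SELECT
     [query_id],
     [plan_id],
     [last_execution_time],
     [force_failure_count],
     [last_force_failure_reason_desc]
FROM [sys].[query_store_plan]
WHERE [is_forced_plan] = 1;
```

A non-zero force_failure_count is your cue to dig into why the optimizer could not use the forced plan.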



Why You Need Query Store, Part I: Historical Performance Data

The Query Store feature previewed in Azure SQL Database in summer 2015, was made generally available that fall, and was part of the SQL Server 2016 release the following summer.  Over the past four years (has it really been that long?!) I have devoted significant time to learning Query Store – not just understanding how it works from the bottom up, but also why it works the way it does.  I’ve also spent a lot of time sharing that information and helping customers understand and implement the feature, and then working with them to use the query store data to troubleshoot issues and stabilize performance.  During this time I have developed a deep appreciation for the Query Store feature, as its capabilities go far beyond its original marketing.  Is it perfect?  No.  But it’s a feature that Microsoft continues to invest in, and in this series of blog posts my aim is to help you understand why Query Store is a tool you need to leverage in your environment.

It’s all about the data

The most talked-about benefit of Query Store is its ability to help you dig into query performance and regressions, which is possible because of the data it captures.  The bulk of the Query Store data resides in seven system tables, exposed through views, with a total of over 150 distinct data points.  Notable within that collection of information are:

  • Statement text
    • individual queries are captured, whether they are ad hoc or part of an object (stored procedure)
  • Query Plan(s)
    • the plan that was used during execution is stored in Query Store, it’s the equivalent of what you get from the plan cache
  • Execution Statistics
    • resource use, execution count, duration data and more are stored
  • Wait Statistics
    • Azure SQL Database and SQL Server 2017 and higher can capture query-level wait statistics
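Each of these is exposed through a view that you can query directly.  For example, the query-level waits (SQL Server 2017 and higher) live in sys.query_store_wait_stats and can be rolled up by category:

```sql
-- Top wait categories by total wait time, per plan
SELECT
     [ws].[plan_id],
     [ws].[wait_category_desc],
     SUM([ws].[total_query_wait_time_ms]) AS [total_wait_ms]
FROM [sys].[query_store_wait_stats] [ws]
GROUP BY [ws].[plan_id], [ws].[wait_category_desc]
ORDER BY [total_wait_ms] DESC;
```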

Query Store is a database-scoped feature – it’s not something that’s enabled for the entire instance – therefore the data is stored within the user database.  To view that data you can use the built-in reports in Management Studio (the 18.x release has seven reports, earlier releases only have four), or you can write your own queries against the system views.
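Because it is database-scoped, enabling it is a per-database ALTER; the settings below are illustrative, not recommendations, and the database name is just an example:

```sql
-- Enable Query Store for a single database with explicit settings
ALTER DATABASE [WideWorldImporters]
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     MAX_STORAGE_SIZE_MB = 1024,
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30));
```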

Viewing data

Depending on how you’ve configured Query Store, you can look at historical query information going back days or weeks.  Not only can you see trends in query performance over time, but you can also compare query performance between specific windows of time.  The image below shows the top 25 queries for the past two weeks based on total duration.  The grid view on the left (which I prefer for analysis but is not the default) lets us see the statement text, the object to which it belongs (if it’s blank then it’s an ad hoc query), and the number of plans for the query.


Query Store Data for the Last Two Weeks

Top 25 Queries (Total Duration) for the Last Two Weeks
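If you prefer T-SQL to the UI, a rough equivalent of that report is sketched below.  It is simplified – the built-in report also buckets the results by collection interval – and total duration is derived here as avg_duration multiplied by count_executions per runtime-stats row:

```sql
-- Top 25 queries by total duration over the last 14 days
SELECT TOP (25)
     [qsq].[query_id],
     MAX([qst].[query_sql_text]) AS [query_sql_text],
     COUNT(DISTINCT [qsp].[plan_id]) AS [plan_count],
     SUM([rs].[avg_duration] * [rs].[count_executions]) AS [total_duration]
FROM [sys].[query_store_query] [qsq]
JOIN [sys].[query_store_query_text] [qst]
     ON [qsq].[query_text_id] = [qst].[query_text_id]
JOIN [sys].[query_store_plan] [qsp]
     ON [qsq].[query_id] = [qsp].[query_id]
JOIN [sys].[query_store_runtime_stats] [rs]
     ON [qsp].[plan_id] = [rs].[plan_id]
JOIN [sys].[query_store_runtime_stats_interval] [rsi]
     ON [rs].[runtime_stats_interval_id] = [rsi].[runtime_stats_interval_id]
WHERE [rsi].[start_time] >= DATEADD(DAY, -14, GETUTCDATE())
GROUP BY [qsq].[query_id]
ORDER BY [total_duration] DESC;
```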


The ability to view performance of the plans over time, as displayed on the right side of the image, is a huge benefit for any data professional who has to troubleshoot performance, or figure out what happened AFTER the fact.  For this query, query_id 45, there are two plans, plan_id 45 and plan_id 1882.  We can see that one plan, 1882, was used on September 9, 2019, and the other, 45, was used the next day, on September 10, 2019.  Had someone complained because performance was abysmal on the 9th, but then just fine on the 10th, we could look at this data and see immediately that it was due to a change in plan.  If you didn’t have Query Store, how would you know that?

You wouldn’t, unless you had set up a method to capture query plan and execution information, or you had a third-party monitoring tool.  On that topic, does Query Store replace the need for third-party monitoring tools?  It doesn’t.  I don’t believe you need one or the other, I think having access to both only makes your life easier.  If you don’t have the budget for a third-party tool, then Query Store is a fantastic option as it’s included in SQL Server, the data is collected within that user database, you control how long the data is kept, and it’s extremely easy to look at query performance using the built-in reports.