Query Store Performance Overhead…Updated

I wrote the original Query Store performance overhead post just over two years ago, and just like the data in your database keeps changing, so does SQL Server.  However, the question, “What is the performance overhead of enabling Query Store?” is still the most frequent question I am asked. 

So why am I writing this post?  Because there have been many improvements specific to Query Store that have taken the feature to the point where it can support all workloads, including those that are ad-hoc.  This is a big deal.

Quick history (in case you didn’t read the initial post)

When Query Store was originally released in Azure, the Microsoft engineers had the benefit of terabytes of telemetry to help them understand what was working, and what wasn’t.  This allowed them to make changes to the feature so that when it was released in SQL Server 2016, it was fully-functioning and reliable.

However, not every workload that runs in an on-premises environment also runs in Azure SQL Database.  As a result, many interesting problems were exposed when customers beginning turning on Query Store in their environment.  Many users have been critical of these initial limitations, but I’ve been impressed with the commitment of the SQL Server team to the Query Store feature.  In addition to making Query Store more robust throughout the SQL Server 2017 and SQL Server 2019 releases, the team has continued to add functionality.  In case you missed it, during yesterday’s PASS Summit keynote, Bob Ward and Conor Cunningham debuted the ability to add hints to queries via Query Store.  That’s big.  Query Store is feature that’s here to stay.

What’s new in Query Store?

There are multiple fixes and improvements since the initial SQL Server 2016 release, which include:

  • Internal memory limits
  • Smaller transactions for background flushes of data
  • Limiting the number of rows deleted as part of Query Store’s cleanup mechanism
  • The ability to customize the capture settings

Performance details

Internal memory limits

The most recent CU for each major release that supports Query Store limits the amount of internal memory used by Query Store at both the database and instance level.  I’ve talked about the memory that Query Store uses previously, and this limitation prevents the Query Store memory overhead from growing beyond a limit relative to the total available server memory.  From the documentation:

Query Store scalability improvement

The purpose of these memory limits is to prevent other performance issues such as high waits, memory pressure, and locking contention that could be exposed through the use of Query Store for extremely ad hoc workloads.  Without limits, if the memory used by Query Store grows by too large, it has to continually try and catch up with the workload, eventually causing bigger problems.  With memory limits in place, Query Store will back off when the internally managed limits are reached.  This ensures that impact on user workload remains minimal.  Keeping the overhead of Query Store use low is a core priority feature, so there was a decision to maintain overall performance even if it means that Query Store temporarily stops capturing queries and their runtime statistics.

If Query Store switches to a READ_ONLY state because memory limitations are encountered, you will see one of the following read_only_reasons in sys.database_query_store_options:

131072 – The number of different statements in Query Store has reached the internal memory limit. Consider removing queries that you do not need or upgrading to a higher service tier to enable transferring Query Store to read-write mode.

262144 – Size of in-memory items waiting to be persisted on disk has reached the internal memory limit. Query Store will be in read-only mode temporarily until the in-memory items are persisted on disk.

Smaller transactions for background flushes of data

Query Store asynchronously flushes data from memory to disk, and this is a background activity that previously could take an extended period of time, particularly for a high-volume workload.  The background flush occurs in smaller transactions, allowing Query Store settings to be altered quickly, without being blocked.  In addition, for extremely intensive workloads this could previously become a cycle of never-ending flushes that could become problematic.

Limiting the number of rows deleted as part of Query Store’s cleanup mechanism

Cleanup of the Query Store data occurs regularly, as part of the process to keep a limited amount of data (determined by STALE_QUERY_THRESHOLD_DAYS), and as necessary when the amount of Query Store data gets close to the value set for MAX_STORAGE_SIZE_MB.  Limiting the number of rows that are deleted as part of cleanup reduces the impact on the database transaction log.

The ability to customize the capture settings

SQL Server 2019 introduced a new option for QUERY_CAPTURE_MODE, which is CUSTOM.  This option provides the ability to set thresholds that control whether queries are captured based on execution count, compile CPU time, or execution CPU time.  For all workloads that are ad-hoc, this is the silver bullet we’ve been waiting for the SQL Server team to provide.

CUSTOM capture mode options

The aforementioned improvements, with the exception of CUSTOM capture mode, exist in the following releases (current as of this writing):

  • SQL Server 2019 CU8
  • SQL Server 2017 CU22
  • SQL Server 2016 SP2 CU15

What do you need to do?

Upgrade.

For anyone that does not have an ad hoc workload, get to the latest CU of the version you’re running.  For anyone with an ad hoc workload, SQL Server 2019 CU8+ is where you need to be to get the best Query Store experience and prevent a degradation in performance with it enabled.

Systems with an ad hoc workload on SQL Server 2019 should use the CUSTOM capture mode. For more parameterized workloads, AUTO can be used. These options are recommended to help prevent Query Store from becoming read_only, as described above (which is much more likely with capture mode ALL).

Finally, if you find yourself in a state where you believe Query Store is creating a performance problem (perhaps the CUSTOM capture settings were not properly configured), you now have the ability to forcibly turn off Query Store.  I view this as a last resort/panic button, but it’s a nice option to have in your back pocket.

10 thoughts on “Query Store Performance Overhead…Updated

  1. I noticed an instance of version SQL2017 CU21 also accepted ( and scripts out ) the settings:

    DATA_FLUSH_INTERVAL_SECONDS = 900, /* SQL Server 2016 SP2 CU15 / SQL Server 2017 CU22 / SQL Server 2019 CU8 */
    MAX_PLANS_PER_QUERY = 200, /* SQL Server 2016 SP2 CU15 / SQL Server 2017 CU22 / SQL Server 2019 CU8 */

    As per this article, you state the settings are only effective as of SQL2017 CU22.
    Doe you expect it to just neglect these settings or is it a flaw in the system they are not used as optimized ?

    1. The CUSTOM capture mode settings are only specific to SQL 2019. All other settings, including DATA_FLUSH_INTERVAL_SECONDS and MAX_PLANS_PER_QUERY, are available in SQL Server 2016 and higher.

    1. I don’t have exact numbers on the overhead, but because QS is collecting more data when wait stats capture is enabled, I would expect more overhead with an ad hoc workload. If you’re on 2019 and you’re using CUSTOM capture, then you can manage that overhead better by setting the thresholds to determine what queries are collected. It really depends on the workload, the volume of queries, and your settings.

  2. Hi Erin.
    How are you?
    My SQL is Microsoft SQL Server 2016 (SP2-CU17) (KB5001092) – 13.0.5888.11
    But when I run the following command to set the capture mode to CUSTOM, I got the error due to the CUSTOM mode.
    The GUI version doesn’t have the option to choose CUSTOM.
    ALTER DATABASE xxx
    SET QUERY_STORE (QUERY_CAPTURE_MODE = CUSTOM)

    Did I miss anything? I thought we should have the CUSTOM option for SQL 2016 SP2 CU15 and above.
    Thanks

  3. Hi Erin, this is a great article! I’m user and fan of Query Store. I’m facing an issue now where QS is using more and more storage. On an adhoc workload 400GB SQL 2017 DB (that I know I may have to upgrade to 2019 CU8) QS is using 30GB and keep growing, using AUTO capture mode, STALE_QUERY_THRESHOLD_DAYS on 40 and capturing waits information. Do you know if there’s any way to identify how are those 30GB distributed so I can take the best decision? I mean, if the more space is being used for waits information, or number of plans per query, etc. I was trying to find any information about that with no luck. Thanks in advance!

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

A Fond Farewell

If you haven’t guessed from the title, I’m writing this post because I am leaving SQLskills. This Friday, January 14th, is my last day, and

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.