I’m a huge fan of Query Store, which regular readers may know, but there’s a need to write a bit more about Query Store best practices. This isn’t a “you must use this feature” post, this is a “here is what you must know if you want to use this feature” post.
I have a lot of content about Query Store, but maybe what’s really important gets lost amongst everything else. Glenn prompted me to write this, after I worked with two customers last week that ran into issues related to their configuration of Query Store. Listed below are the things you must know before you enable Query Store. If you already have Query Store enabled and running, I recommend reviewing the information to ensure you are following best practices.
Settings
- Review my post on Query Store Settings to understand what different settings exist for Query Store, what values are recommended for each setting, and why.
- Most important: QUERY_CAPTURE_MODE set to AUTO, MAX_STORAGE_SIZE_MB set to 10GB at the absolute max, something less ideally (you may need to adjust CLEANUP_POLICY to keep less data, depending on your workload).
- Edit: November 17, 2020: If you are using SQL Server 2019 and have an ad hoc workload, I recommend CUSTOM for CAPTURE_MODE.
Trace Flags
- Review my post on Query Store Trace Flags to understand the two Trace Flags which are Query Store-related, what they do, and why you want to enable them.
- I definitely recommend both 7752 and 7745. If you have 7752 enabled and you are trying to make changes to Query Store configuration after a restart, please review my post discussing the Query Store data load. There is no shortcut to get this to load faster, and no way to kill it. This is why proper settings are important.
- Edit: November 17, 2020: Trace flag 7752 is enabled by default in SQL Server 2019.
Performance
- Review my post on Query Store Performance Overhead (updated November 2020). If you have a high-volume, ad hoc workload, it is quite possible that you should not enable Query Store. Jonathan told me about a customer of his that enabled Query Store and within two (2) hours, they had filled up 10GB of Query Store. We recommended that they not use Query Store at this time. There are improvements coming in SQL Server 2019 that will hopefully help manage Query Store with this type of workload.
Version
- I highly recommend that you run SQL Server 2016 SP2 CU7 or SQL Server 2017 CU15 (current CUs at the time of writing, I’ll try to keep this updated). There is a very important fix in CU7 that I strongly suggest you have in place. I don’t see this fix in a CU for SQL Server 2017 yet, so it’s possible you could run into a problem if you have an ad hoc workload.
- Edit: November 17, 2020: The current CUs for each release at the time of this writing are strongly recommended (please see Query Store Performance Overhead…Updated). The CUs are:
- SQL Server 2019 CU8
- SQL Server 2017 CU22
- SQL Server 2016 SP2 CU15
- Edit: November 17, 2020: The current CUs for each release at the time of this writing are strongly recommended (please see Query Store Performance Overhead…Updated). The CUs are:
Ad hoc?
- Not sure whether you have an ad hoc workload? Check out my post on Examining the Performance Impact of an Adhoc Workload.
If you still have questions about how to configure Query Store or what to watch out for, please leave a comment! I’m happy to share as much information and experience as possible.
29 thoughts on “Query Store Best Practices”
Hi Erin –
Thanks for taking the time to write this. We’re thinking about starting with Query Store in our dev / test environments, which we just upgraded to 2019. Any concerns with the 2019 releases?
Thanks,
Dan
Dan-
There was an issue with CU2, and that CU was withdrawn, but otherwise I’m not aware of any concerns.
Erin
Hi Erin, great stuff about the QS. Your presentations are always very informative. I wanted to ask a question during today’s PASS presentation but forgot to ask. We have an 8 replica AG cluster with some databases having 5-6 secondaries… Is QS data being synchronized to every replica and how much I/O does it consume?
Hi Michael-
Correct – the QS data is synchronized to every replica. In terms of I/O, I don’t know if you mean disk space in the user database (that’s something you can pull from sys.database_query_store_options), or something else.
Erin
so dealing with 3rd party db 12 terabytes, sql 2017 enterprise edition. Noticed that if it runs out of space and turns to RO all the queries go to hell. Have it t 20 gig now and it is a lifesaver for the parameter sniffing, but twice now it has run out of space and the world crashes. I thought the cleanup process would help this but I am guessing not sophisticated enough for that
Hi-
It depends on what CU you’re running, as there was a change to improve the cleanup process in CU16. I note that in this post: https://www.sqlskills.com/blogs/erin/query-store-performance-updated/.
However, the cleanup process is single-threaded, so it’s not wildly efficient. You really want to have Query Store sized so that automatic cleanup does NOT kick in so it doesn’t flip into READ_ONLY.
Ideally, the max QS size is 10GB or less, and you have to manage that with your CAPTURE_MODE (should be AUTO for your version) and the STALE_QUERY_THRESHOLD_DAYS. I would guess that the workload is mixed with a fair number of ad hoc queries, which is why the QS is so large.
Erin
you are my hero, so yes it is ad hoc hell but have to live with it, the auto fixing of bad plan is savior but a devil in a red dress. I inherited this, it seems we have hit a possible threshold. I have also seen people complain about if you have to purge to get back to read write and it is on an AG like ours. It is painful………… Will get to the CU this is on an earlier version thanks for your quick response
when you mention CU 16 you are referring to SQL 2017 correct?
thanks
Correct, the latest for SQL 2017 is CU24, but CU16 has the fix that might help.
so if everything in this DB is more ad hoc type would it make more sense to limit the days , right now at 15 thinking 10, FYI applying Cu24 next week
Aside from CAPTURE_MODE set to AUTO, the next thing you can change to manage the size of the Query Store is the retention. So yes, dropping it from 15 days to 10 days would help. I would make that change on a weekend/less busy time, as clean up will kick in and it may take a while to run. The cleanup method that deletes older data is better than the one that deletes when you hit the max size, but they are both slow.
Erin,
Thanks for article. It is really worth a read when someone laps in query store issues 🙂
I have noticed something weird in last few months for query store cleanup.
Brief:
I have enabled query store on RDS SQL Server instances having 2017 version. It looks like it does runs cleanup in the background which is either slow or cpu consuming. lately I noticed that approx. 50% of cpu utilization occurred because of cleanup by query store. most of the deletes were on sys.plan_persist_plan, plan_persist_runtime_stats, plan_persist_query_Text tables. When I put query store in read only mode, 50% of cpu and deletes transactions were cleared.
I have now made some changes increased size for query store to 10GB, although it was already set to 5GB and did not reach to max at all. secondly I have also disabled size base cleanup temporarily to see if still it does cleanup or not. I am not sure if it does cleanup on based of stale queries. could you please provide me some insight on this.
It is pretty scary if cleanup consumes 50% of CPU and runs for 2 and half hours during business hours 🙂
Thanks
Sukh
Hi Sukh-
What build of SQL Server 2017 are you running?
How did you tie the CPU use to Query Store cleanup? Was that through high CPU consumers in dm_exec_query_stats or something else?
Also, I’m not sure that increasing the size of QS will help here. There have been improvements to the delete process, but it is single-threaded and not extremely efficient, and even with a larger size for QS, cleanup will still kick in regularly. If you’re not getting close to the max size, the retention cleanup still runs on a regular basis, cleaning out queries older than X days, based on what you have configured for CLEANUP_POLICY (STALE_QUERY_THRESHOLD_DAYS).
Erin
Erin
Also worth to mention that I have made change of Capture mode from all to auto now. hope that minimize the capture and capture only relevant ones.
Thanks
Sukh
This is HIGHLY recommended.
Erin,
Our SQL version is SQL 2019 cu8 but the user databases are set to sql 2016 (130) compatibility level. Which best practice settings I should follow in this scenario, sql 2016 or sql 2019? We will be raising the compatibility level to 150 once the query store starts collecting data for at least a week.
Also, do u see any urgent need to go to SQL 2019 CU10 when using query store?
Thanks,
Wing
Wing-
Query Store functionality is independent of compatibility mode, so I would recommend following the recommendations for SQL 2019. I don’t see that there are any fixes in CU10 specific to Query Store, so I don’t believe you would absolutely need it. I definitely recommend CU9 or higher for SQL Server 2019 though, if you are going to use Query Store.
Erin
Thanks Erin. CU9 has a fix (13749461 – Fixes Query Store scalability improvement for ad-hoc workloads.) which would help our system. I would apply CU10.
You are right, it’s CU9 – I updated my original reply so that, down the road, folks make sure they are running CU9 at a minimum (but I typically recommend latest CU when upgrading).
Erin, we have been following the best practices for upgrading SQL 2016 to SQL 2019 in our staging env but leave the user db compatibility level to sql 2016 for three weeks now so the query store would collect the stats and query plan. Now we are ready to raise the user dbs to the compatibility of sql 2019. At the same time, we will turn off the Legacy Cardinality Estimator (we have it on in sql 2016) and turn on the auto turning. We are looking forward to seeing the query performance with this auto-tuning feature working side by side with querystore. In querystore, I turn on the new custom capture but using the defaults. Since this setting is new in sql 2019, i am wondering after raising the compatibility level, what i need to watch out for in order to adjust the settings to match our workload. Secondly, should we set the ‘query Optimizer fixes’ setting from off to on (has been off) if it has any benefits to the query store when doing the compatibility level change?
Wing-
Without looking at data that you have in Query Store, I don’t know if you need to adjust the options for CUSTOM capture mode. You can use the Query Store data to understand execution count, compile CPU and execution CPU and then set the thresholds based on that information. Whether you enable Query Optimizer fixes is up to you – you are introducing another variable (in addition to changing compat mode AND CE version), and I typically only change one thing at a time when I’m trying to understand the impact on query performance.
Erin
with our monster 3rd party db we have to purge the query store once a month, what would be nice if there was an option to keep the pinned plans you want to clean out all the rest, any idea if Microsoft may consider this?
Tim –
You can always create a User Voice request for that, but honestly I would think that’s low priority. I understand why you want that, but what would have to happen behind the scenes is inefficient. It would be like having a table with 1 million rows, but wanting to keep only 20 rows. If you deleted all but 20 rows, it would generate a huge amount of transaction log, you’ve have to batch it to prevent blocking, etc. I just don’t see Microsoft investing effort in that, but it never hurts to ask. I would definitely recommend providing a strong business case.
Erin
have found that queryStore is “nonFunctional” in sql server aws-rds after a multiAZ failover. data is frozen on what was collected 3 weeks in the past. Am thinking that perhaps the queryStore persisted to disk is corrupted? i am so depended on query store – as is — so useful.
do you please have any thoughts? i have opened tech tickets with both aws and microsoft.
Allen
I have not seen this issue previously – and if you hadn’t already opened a ticket with Microsoft, that is what I would recommend. I would also try to replicate it in a test environment.
Erin
apparently in aws rds SQL server with multi-az, failover may cause sql server querystore to quit functioning.
*** we were able to correct this in nonPROD *** – by REBOOT primary instance WITHOUT FAILOVER – and now querystore has started functioning again.
We are attempting to do the same in our production envirnonment ( where querystore is not functioning ) to see if this also corrects our prod environment.