Updating Statistics with Ola Hallengren’s Script

I am a HUGE fan of updating statistics as part of regular maintenance.  In fact, if you don’t know whether you have a step or job that updates out-of-date statistics on a regular basis, go check now!  This post will still be here when you get back 😊

At any rate, for a long time the default options for updating statistics were pretty much a sledgehammer.  Within the maintenance plan options, the Update Statistics Task only provides the option to update Index statistics, Column statistics, or both.  You can also specify whether it is a full scan or a sample for the update, but that’s about it:

Update Statistics Task (Maintenance Plan)

I don’t like this option because it means that statistics that have had little or no change will be updated.  I could have a 10 million row table where only 1000 rows change, and yet the statistics for that table will update.  This is a waste of resources.  For a small database, or a system that’s not 24×7, that isn’t such a big deal.  But in a database with multiple 10 million row tables, it is a big deal.

The sp_updatestats command isn’t a favorite of mine either.  I’ve written about that here, so I won’t re-hash it.

If you have used Ola Hallengren’s scripts for maintenance, you hopefully know that it will also update statistics using the @UpdateStatistics parameter.  The default value for this is NULL, which means do not update statistics.  To be clear, if you drop in Ola’s scripts and have it create the jobs for you, and then you start running the “IndexOptimize – USER_DATABASES” job, by default you’re not updating statistics.  By default, the code in the IndexOptimize – USER_DATABASES job is:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@LogToTable = 'Y'

If you want to have the job also update statistics, you need:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@LogToTable = 'Y'

With this variation, we are updating index and column statistics, which is great.  But…we are updating them regardless of whether it’s needed.  Statistic with no rows modified? Update it.  Statistic with 10 rows modified? Update it.

There has always been an option to update only statistics that have changed: the @OnlyModifiedStatistics option, which gets us behavior just like sp_updatestats.

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@OnlyModifiedStatistics = 'Y',
@LogToTable = 'Y'

With this option, if no rows have changed, the statistic will not be updated.  If one or more rows have changed, the statistic will be updated.

Since the release of SQL Server 2012 SP1, this has been my only challenge with Ola’s scripts.  In SQL Server 2008 R2 SP2 and SQL Server 2012 SP1, Microsoft introduced sys.dm_db_stats_properties, a dynamic management function that tracks modifications for each statistic.  I have written custom scripts that use this information to determine if stats should be updated, which I’ve talked about here.  Jonathan has also modified Ola’s script for a few of our customers to look at sys.dm_db_stats_properties to determine if enough data had changed to update stats, and a long time ago we emailed Ola to ask if he could include an option to set a threshold.  Good news: that option now exists!

Using Ola’s script to update statistics based on a threshold of change

With the IndexOptimize stored procedure, Ola now includes the @StatisticsModificationLevel option.  You can use it to set a threshold for modifications, so that only statistics with a specific volume of change are updated.  For example, if I want statistics updated when 5% of the data has changed, I use:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel = '5',
@LogToTable = 'Y'

Take note: the @OnlyModifiedStatistics option is not included here…you cannot use both options; it has to be one or the other.

This is great!  I can further customize this for different tables.  Consider a database that has a very volatile table, maybe dbo.OrderStatus, where auto-update may or may not kick in during the day, so I want to make sure stats are updated nightly:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@Indexes = 'ALL_INDEXES, -SalesDB.dbo.OrderStatus',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel = '10',
@LogToTable = 'Y'

This will address fragmentation and update statistics for all tables in the SalesDB database except dbo.OrderStatus, and it will update statistics if 10% or more of the rows have changed.

I would then have a second job to address fragmentation and stats for OrderStatus:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@Indexes = 'SalesDB.dbo.OrderStatus',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel = '1',
@LogToTable = 'Y'

For the dbo.OrderStatus table, statistics would be updated when only 1% of the data had changed.

I love the flexibility this provides!

You might be wondering why I chose 1%…take a close look at this important note which is included in Ola’s documentation:

Statistics will also be updated when the number of modified rows has reached a decreasing, dynamic threshold, SQRT(number of rows * 1000)

This is critical to understand, because if the threshold I set with @StatisticsModificationLevel works out to a number of rows HIGHER than the value from the formula above, statistics will update sooner than I expect.

For example, if I have 1 million rows in a table and I have @StatisticsModificationLevel = 10, then 10% of the rows, or 100,000, have to change in order to update statistics.  HOWEVER, if you evaluate SQRT(1,000,000 * 1000), you get 31,623, which means Ola’s script will update statistics after 31,623 rows have changed…well before 100,000.
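To make the interaction concrete, here’s a quick sketch of the arithmetic (in Python, just for the math; the function name is mine, not part of Ola’s script).  The effective threshold is simply the lower of the percentage you set and the SQRT formula:

```python
import math

def effective_threshold(row_count, modification_level_pct):
    """Rows that must change before a statistic is updated:
    the lower of the percentage threshold and SQRT(rows * 1000)."""
    pct_threshold = row_count * modification_level_pct / 100
    sqrt_threshold = math.sqrt(row_count * 1000)
    return min(pct_threshold, sqrt_threshold)

# 1 million rows with @StatisticsModificationLevel = 10:
# 10% is 100,000 rows, but SQRT(1,000,000 * 1000) is lower.
print(round(effective_threshold(1_000_000, 10)))   # 31623
```

Running the same math for a 10 million row table at 5% gives 100,000 rather than the 500,000 you might expect, which matches the discussion below.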

This may be important for some of you to understand in terms of these thresholds, so I dropped the information into a table to make it easier to comprehend (at least, it’s easier for me!).

Thresholds for Statistics Updates (percentage and SQRT algorithm)

Using my original example, if dbo.OrderStatus has about one million rows, then with 1% as the threshold, only 10,000 rows need to change before stats are updated.  If the SQRT algorithm were used, over 30,000 rows would need to change before stats were updated, and depending on the data skew, that might be too high.

Understand that as tables get larger, statistics will likely be updated before the set percentage value is reached because the SQRT algorithm has a lower threshold.  (Yes, I’m driving this point home.)  Consider a table with 10 million rows.  If I set the threshold to 5%, I would expect statistics to update after 500,000 modifications, but in fact they will update after 100,000.
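You can even work out the crossover point where the SQRT formula takes over from your percentage.  Setting rows × (pct/100) equal to SQRT(rows × 1000) and solving gives rows = 1000 × (100/pct)²; above that table size, the SQRT threshold is always the lower one.  A small sketch (Python, function name is my own):

```python
def sqrt_takeover_rows(modification_level_pct):
    """Table size above which SQRT(rows * 1000) is lower than the
    percentage threshold.  Derivation: rows * (p/100) = sqrt(rows * 1000)
    => rows = 1000 * (100/p)^2."""
    return 1000 * (100 / modification_level_pct) ** 2

# With @StatisticsModificationLevel = 5, the SQRT formula wins
# for any table larger than 400,000 rows.
print(sqrt_takeover_rows(5))    # 400000.0
print(sqrt_takeover_rows(10))   # 100000.0
```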

If you’re wondering where the SQRT algorithm comes from, please review Microsoft’s Statistics documentation.  This threshold was originally introduced with trace flag 2371 to lower the threshold for automatic updates.  It is applied by default starting in SQL Server 2016 when using compatibility level 130.  My assumption is that Ola determined this was a good threshold to use as a fail-safe/catch-all for his script, and I think it was a smart move on his part.  In general, I’d rather have statistics update too often than not often enough.  However, the new @StatisticsModificationLevel option gives us better control than we’ve had previously, unless we write a custom script (which is still an option…do what works best for you!).

Can you force a plan for a different query with Query Store?

This is a question I’ve gotten a few times in class: can you force a plan for a different query with Query Store?

tl;dr

No.

Assume you have two similar queries that have different query_id values in Query Store.  One of the queries has a plan that’s stable, and you want to force that plan for the other query.  Query Store provides no ability to do this in the UI, but you can try it with the stored procedure.  Let’s take a look…

Testing

Within WideWorldImporters we’ll execute an ad-hoc query with two different input values:

USE [master];
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = ON;
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

USE [WideWorldImporters];
GO

DECLARE @CustomerID INT;
SET @CustomerID = 972;

SELECT o.OrderDate, o.ContactPersonID, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
ON o.OrderID = ol.OrderID
WHERE o.CustomerID = @CustomerID;
GO

DECLARE @CustomerID2 INT;
SET @CustomerID2 = 972;

SELECT o.ContactPersonID, o.OrderDate, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
ON o.OrderID = ol.OrderID
WHERE o.CustomerID = @CustomerID2;
GO

Let’s see what’s in Query Store:


SELECT qt.query_text_id, q.query_id, qt.query_sql_text, p.plan_id, TRY_CAST(p.query_plan AS XML)
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q
ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p
ON q.query_id = p.query_id
WHERE qt.query_sql_text LIKE '%Sales.Orders%';
GO

Query information from Query Store

We see that we have two different queries and one plan for each. We can force the plan for the first query:

EXEC sp_query_store_force_plan @query_id = 3, @plan_id = 3;
GO

This works.  If we try to force that same plan for the other query:

EXEC sp_query_store_force_plan @query_id = 4, @plan_id = 3;
GO

Error when trying to force a different plan for a query

Trying to force plan_id 3 for query_id 4 throws this error:

Msg 12406, Level 11, State 1, Procedure sp_query_store_force_plan, Line 1 [Batch Start Line 34]
Query plan with provided plan_id (3) is not found in the Query Store for query (4). Check the plan_id value and rerun the command.

Summary
Within Query Store, the relationship between query_id and plan_id is managed internally (i.e. there are no foreign key constraints on the underlying tables), and there is validation that any plan_id you want to force for a query_id must have been generated for that specific query.
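Conceptually, the check behaves like this (a toy Python model of the validation, not Query Store’s actual implementation; the names and data are mine):

```python
# Toy model: each query_id owns the set of plan_ids generated for it.
plans_for_query = {3: {3}, 4: {4}}

def force_plan(query_id, plan_id):
    """Mimic sp_query_store_force_plan's validation: the plan must
    have been generated for this specific query."""
    if plan_id not in plans_for_query.get(query_id, set()):
        raise ValueError(
            f"Query plan with provided plan_id ({plan_id}) is not found "
            f"in the Query Store for query ({query_id}).")
    return f"Plan {plan_id} forced for query {query_id}"

print(force_plan(3, 3))   # succeeds
# force_plan(4, 3) raises, analogous to the Msg 12406 error shown earlier
```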

In this type of scenario, you have to get the plan shape you want for the query itself, which may require trying different input parameters.  The example I’ve provided is very simple, but when in doubt, check the input parameters for the plan that you want, then try those with the other query (the one that doesn’t yet have the plan you want to force).  Of course, if you have to use a query or index hint to get the desired plan, then it’s going to be a little trickier to get that plan for the original query.  Good luck!

Query Store and Plan Cache Flushing

I’ve had two comments recently on my blog about Query Store causing the plan cache to be flushed. There was a known issue related to the plan cache flushing after Query Store was enabled, but this was fixed in CU2 for SQL Server 2016 SP1. So I did some testing and here is what I think is causing the confusion:

When you enable Query Store, which is done with an ALTER DATABASE SET statement, the plan cache for the database is flushed.

Now, before anyone writes up a UserVoice item, understand that there are several ALTER DATABASE SET commands that cause the plan cache for a database to be flushed.  For example, taking a database OFFLINE causes the database plan cache to be flushed.  That one seems intuitive, right?  So why is the plan cache cleared when you enable Query Store, or change one of its settings?  To ensure that new Query Store data is not lost.  This relates to the internals of how Query Store works, which aren’t essential to dig into; the point is that this behavior is known by Microsoft and expected.

If you review the ALTER DATABASE SET documentation, and specifically review the Query Store options, you won’t find any mention of the database plan cache clearing.  But you can test it to see that it occurs…

Testing

First, disable Query Store for the WideWorldImporters database, and then free the plan cache:

USE [master];
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = OFF;
GO

DBCC FREEPROCCACHE;
GO

USE [WideWorldImporters];
GO

SELECT o.OrderDate, o.ContactPersonID, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
	ON o.OrderID = ol.OrderID
WHERE o.CustomerID = 972;
GO
SELECT o.OrderDate, o.ContactPersonID, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
	ON o.OrderID = ol.OrderID
WHERE o.CustomerID = 123;
GO

Now query the plan cache to confirm those plans are in cache:


SELECT t.dbid, t.text, s.creation_time, s.execution_count, p.query_plan
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.text LIKE '%Sales%';
GO

SQL Server’s plan cache after initial query execution

Great, they’re there. Now enable Query Store, then check the plan cache again.


USE [master];
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = ON;
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

SELECT t.dbid, t.text, s.creation_time, s.execution_count, p.query_plan
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.text LIKE '%Sales%';
GO

SQL Server’s plan cache after enabling Query Store (plans have been cleared)

The plan cache for the database has been cleared. Note that this only clears the plan cache for that database – plans for other databases still remain in cache. Run a few more queries to add some plans back to the plan cache, and confirm they’re there.


USE [WideWorldImporters];
GO

SELECT o.OrderDate, o.ContactPersonID, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
	ON o.OrderID = ol.OrderID
WHERE o.CustomerID = 972;
GO
SELECT o.OrderDate, o.ContactPersonID, ol.StockItemID, ol.Quantity
FROM Sales.Orders o
JOIN Sales.OrderLines ol
	ON o.OrderID = ol.OrderID
WHERE o.CustomerID = 123;
GO

SELECT t.dbid, t.text, s.creation_time, s.execution_count, p.query_plan
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.text LIKE '%Sales%';
GO

SQL Server’s plan cache after enabling Query Store AND running some queries

This time, change one of the settings for Query Store. It’s already enabled, but perhaps we want to change the INTERVAL_LENGTH_MINUTES setting from the default of 60 minutes to 30 minutes.


USE [master];
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (INTERVAL_LENGTH_MINUTES = 30);
GO

SELECT t.dbid, t.text, s.creation_time, s.execution_count, p.query_plan
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.text LIKE '%Sales%';
GO

SQL Server’s plan cache after changing a Query Store setting (plans have been cleared)

In checking the plan cache again, the ALTER DATABASE SET statement cleared the database’s cache.

Summary

As you can see, the database plan cache is cleared after you enable Query Store, or change any settings related to Query Store. This is the same behavior we see with other ALTER DATABASE SET commands (e.g. changing the recovery model).  Unfortunately, this is not documented, nor is anything written to the ERRORLOG.

Of note: I don’t expect that you are changing settings often (if you are, I’d like to understand that thought process, as once you find the right values for space and interval, I expect those settings to be static…and if you’re not sure where to start, feel free to check out my post discussing the different options). I also don’t expect that you are turning Query Store on and off throughout the day; that completely defeats the purpose of the feature. It should be enabled, and left enabled, all the time. You don’t know when a problem might occur, right?