Updating Statistics with Ola Hallengren’s Script

I am a HUGE fan of updating statistics as part of regular maintenance.  In fact, if you don’t know if you have a step or job that updates out of statistics on a regular basis, go check now!  This post will still be here when you get back 😊

At any rate, for a long time the default options for updating statistics were pretty much a sledgehammer.  Within the maintenance plan options, the Update Statistics Task only provides the option to update Index statistics, Column statistics, or both.  You can also specify whether it is a full scan or a sample for the update, but that’s about it:

Update Statistics Task (Maintenance Plan)

Update Statistics Task (Maintenance Plan)

I don’t like this option because it means that statistics that have had little or no change will be updated.  I could have a 10 million row table where only 1000 rows change, and yet the statistics for that table will update.  This is a waste of resources.  For a small database, or system that’s not 24×7, that isn’t such a big deal.  But in a database with multiple 10 million row tables, it is a big deal.

The sp_updatestats command isn’t a favorite of mine either.  I’ve written about that here, so I won’t re-hash it.

If you have used Ola Hallengren’s scripts for maintenance, you hopefully know that it will also update statistics using the @UpdateStatistics parameter.  The default value for this is NULL, which means do not update statistics.  To be clear, if you drop in Ola’s scripts and have it create the jobs for you, and then you start running the “IndexOptimize – USER_DATABASES” job, by default you’re not updating statistics.  The code the IndexOptimize – USER_DATABASES job has, by default, is:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@LogToTable = 'Y'

If you want to have the job also update statistics, you need:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@LogToTable = 'Y'

With this variation, we are updating index and column statistics, which is great.  But…we are updating them regardless of whether it’s needed.  Statistic with no rows modified? Update it.  Statistic with 10 rows modified? Update it.

There has always been an option to only update statistics that have changed, this is the @OnlyModifiedStatistics option, and this gets us behavior just like sp_updatestats.

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@OnlyModifiedStatistics = 'Y',
@LogToTable = 'Y'

With this option, if no rows have changed, the statistic will not be updated.  If one or more rows have changed, the statistic will be updated.

Since the release of SP1 for 2012, this has been my only challenge with Ola’s scripts.  In SQL Server 2008R2 SP2 and SQL Server 2012 SP1 they introduced the sys.dm_db_stats_properties DMV, which tracks modifications for each statistic.  I have written custom scripts to use this information to determine if stats should be updated, which I’ve talked about here.  Jonathan has also modified Ola’s script for a few of our customers to look at sys.dm_db_stats_properties to determine if enough data had changed to update stats, and a long time ago we had emailed Ola to ask if he could include an option to set a threshold. Good news, that option now exists!

Using Ola’s script to update statistics based on a threshold of change

With the IndexOptimize stored procedure Ola now includes the option of @StatisticsModificationLevel.  You can use this to set a threshold for modifications, so that only statistics with a specific volume of change are updated.  For example, if I want statistics updated if 5% of the data has changed, use:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel= '5',
@LogToTable = 'Y'

Take note: the option @OnlyModifiedStatistics option is not included here…you cannot use both options, it has to be one or the other.

This is great!  I can further customize this for different tables.  Consider a database that has a very volatile table, maybe dbo.OrderStatus, where auto-update may or may not kick in during the day, so I want to make sure stats are updated nightly:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@Indexes = 'ALL_INDEXES, -SalesDB.dbo.OrderStatus',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel= '10',
@LogToTable = 'Y'

This will address fragmentation and update statistics for all tables in the SalesDB database except dbo.OrderStatus, and it will update statistics if 10% or more of the rows have changed.

I would then have a second job to address fragmentation and stats for OrderStatus:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@Indexes = 'SalesDB.dbo.OrderStatus',
@UpdateStatistics = 'ALL',
@StatisticsModificationLevel= '1',
@LogToTable = 'Y'

For the dbo.OrderStatus table, statistics would be updated when only 1% of the data had changed.

I love the flexibility this provides!

You might be wondering why I chose 1%…take a close look at this important note which is included in Ola’s documentation:

Statistics will also be updated when the number of modified rows has reached a decreasing, dynamic threshold, SQRT(number of rows * 1000)

This is critical to understand because if the threshold I have set for @StatisticsModificationLevel ends up having a number of rows HIGHER than the formula above, statistics will update sooner than I expect.

For example, if I have 1 million rows in a table and I have @StatisticsModificationLevel = 10, then 10% of the rows, or 100,000, have to change in order to update statistics.  HOWEVER, if you plug 1 million into SQRT(1,000,000 * 1000), you get 31,623, which means Ola’s script will update statistics after 31,623 rows have changed…well before 100,000.

This may be important for some of you to understand in terms of these thresholds, so I dropped the information into a table to make it easier to comprehend (at least, it’s easier for me!).

Thresholds for Statistics Updates (percentage and SQRT algorithm)

Thresholds for Statistics Updates (percentage and SQRT algorithm)

Using my original example, if dbo.OrderStatus has about one million rows, then with 1% as the threshold, only 10,000 rows need to change before stats are updated.  If the SQRT algorithm were used, over 30,000 rows would need to change before stats were updated, and depending on the data skew, that might be too high.

Understand that as tables get larger, statistics will likely be updated before the set percentage value is reached because the SQRT algorithm has a lower threshold.  (Yes, I’m driving this point home.)  Consider a table with 10 million rows.  If I set the threshold to 5%, I would expect statistics to update after 500,000 modifications, but in fact they will update after 100,000.

If you’re wondering where the SQRT algorithm comes from, please review Microsoft’s Statistics documentation.  This threshold was originally introduced with trace flag 2371 to lower the threshold for automatic updates.  It is applied by default started in SQL Server 2016 when using compatibility level 130.  My assumption is that Ola determined this was a good threshold to use as a fail-safe/catch-all for his script, and I think it was smart move on his part.  In general, I’d rather have statistics update too often, rather than not often enough.  However, using the new @StatisticsModificationLevel option gives us better control than we’ve had previously, unless we write a custom script (which is still an option…do what works best for you!).

Do you need to update statistics after an upgrade?

This post originally went live on May 11, 2018, but modifications were made on May 14, 2018 after some additional internal discussions with Microsoft.  Changes made on May 14, 2018 are in blue. 

There are a variety of methods we use for helping customers upgrade to a new SQL Server version, and one question we often get asked is whether or not statistics need to be updated as part of the upgrade process.

tl;dr

Yes.  Update statistics after an upgrade. Further, if you’re upgrading to 2012 or higher from an earlier version, you should rebuild your indexes (which will update index statistics, so then you just need to update column statistics).

History

Some of you may remember that the stats blob changed between SQL Server 2000 and SQL Server 2005, and Microsoft specifically recommended updating statistics after upgrading from SQL Server 2000.  Official Microsoft documentation about the stats blog change in SQL Server 2005 is difficult to find, but this article includes the following paragraph:

After you upgrade from SQL Server 2000, update statistics on all databases. Use the sp_updatestats stored procedure to update statistics in user-defined tables in SQL Server databases. This step is not necessary for upgrading from SQL Server 2005.

Current Microsoft documentation related to upgrading does not state anything specific about updating statistics, but people continue to ask and if you peruse forums, blog posts, and other social media options, you’ll see recommendations to update statistics. Further, the documentation that Microsoft provides about when to update statistics does not mention anything about upgrades.

Side bar: I don’t recommend using sp_updatestats, and here’s why: Understanding What sp_updatestats Really Updates.

Today

The statistics blob has not changed since SQL Server 2000 to my knowledge, but I thought I would ask someone from Microsoft for an official recommendation to share publicly.  Here you go:

Microsoft suggests that customers test the need for a full update of statistics after a major version change and/or a database compatibility level change.

Further items to note:

  1. If Microsoft updates the format of statistics (e.g. the stats blog), customers will be expected to update statistics after an upgrade.
    1. Microsoft does not always upgrade the statistics format as part of a major version upgrade.
  2. There are occasions where Microsoft does not change the format of statistics, but they do change the algorithm for creating statistics as part of a major version upgrade or database compatibility level change.

In addition, there was a change in the nonclustered leaf level internals in SQL Server 2012, so if you are upgrading to 2012 or higher from an earlier version (e.g. 2008, 2008R2), rebuild your nonclustered indexes.  And remember, rebuilding indexes updates the statistics for those indexes with a fullscan, so you do not need to update them again.

Conclusion

As part of your upgrade methodology, it is recommended (by me, based on experience with a lot of customer upgrades) to build in time to update statistics.  I’ve gotten some pushback from customers who don’t want to update statistics after upgrade because it takes too long.  Some kind reminders:

  • Updating statistics is an online operation, therefore, the database and related applications are accessible and usable. A statistics update does take a schema modification lock so you’re not allowed to make any changes to a table while its stats are updating.  Therefore, if you decide to change your schema after upgrading your SQL Server version (not something I would typically recommend), do that before you update stats.
  • You need to update statistics regularly to provide the optimizer with current information about your data, so at some point it needs to be done. Immediately after an upgrade is a really good time, considering the aforementioned items.

If you’re not comfortable upgrading to a newer version of SQL Server, we can help!  I’m in the process of helping a customer migrate from SQL Server 2012 to SQL Server 2017, and I’m so excited to get them up to the latest version so they can start using some new features…like Query Store 😉

SQLskills SQL101: Updating SQL Server Statistics Part II – Scheduled Updates

In last week’s post I discussed the basics of how automatic updates to statistics occur in SQL Server.  This week I want to talk about scheduled (aka manual) updates, because as you might remember, we really want to control when statistics are updated.

In terms of updating statistics you have multiple options, including:

  • Update Statistics Task (Maintenance Plan)
  • sp_updatestats
  • UPDATE STATISTICS

For systems that do not have a full-time DBA, one of the easiest methods for managing statistics is the Update Statistics Task.  This task can be configured for all databases or certain databases, and you can determine what statistics it updates:

Update Statistics Task- deciding which statistics to update

Update Statistics Task – deciding which statistics to update

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

You might think you want to update All existing statistics.  If you just had a plan with just this task, that might be true.  But what I see most often is that someone configures the Rebuild Index task, and then has the Update Statistics task as the next step.  In that case, if you are running SQL Server 2014 and below, you want to update Column statistics only.  When you run the Rebuild Index task in SQL Server 2014, you rebuild all indexes, and when you rebuild an index, its statistic is updated with a fullscan.  Therefore, there is no need to update Index statistics after you rebuild all your indexes, but you do need to update column statistics.

This is a bit more complicated in SQL Server 2016.  The Rebuild Index task has more options in SQL Server 2016, which is nice for that specific task, but it makes providing guidance about statistics updates a bit trickier.  In SQL Server 2016 you can configure the Rebuild Index task so that it only rebuilds an index if a certain level of fragmentation exists.  Therefore, some of your indexes will rebuild (and thus have statistics updated) and some will not (and not have updated statistics).  How do you manage that with the Update Statistics task?  Well, in that case you probably select All existing statistics and update some statistics for a second time, which is really a waste.  Therefore, if you’re on SQL Server 2016, you probably want to look at more intelligent updates.

One method, which I would not say is intelligent, but it is an option, is to use sp_updatestats in a scheduled job that runs on a regular basis.  This command is one you run for a database, not for a specific statistic or index or table.  The sp_updatestats command will only update statistics if data has changed.  That sounds good, but the caveat is that only one (1) row has to have changed.  If I have a table with 2,000,000 rows, and only 5 rows have changed, I really don’t need to update statistics.

The other method is to use UPDATE STATISTICS in a scheduled job.  The UPDATE STATISTICS command can be run for individual statistics or for a table (updating all statistics for a table).  You can develop an intelligent method to use this command, which is what I recommend.  Rather than a blanket update to all statistics, or statistics where one row has changed, I prefer to update statistics that are outdated based on the amount of data that has changed.  Consider the aforementioned table with 2,000,000 rows.  If I let SQL Server update statistics automatically, I would need 400,500 rows to change.  It’s quite possible that with a table of that size I would want to statistics to update sooner – say after 200,000 rows had changed, or 10% of the table.

We can programmatically determine whether we need to update statistics using the sys.dm_db_stats_properties DMF.  This DMF tracks modifications, and also tells us how many rows were in the table when statistics were last updated, and the date statistics were updated. For example, if I update some rows in Sales.SalesOrderDetail, and then look at the output from the DMF, you can see that the modification counter matches the number of rows I changed* for the ProductID index:

USE [AdventureWorks2012];
GO

UPDATE [Sales].[SalesOrderDetail]
SET [ProductID] = [ProductID]
WHERE [ProductID] IN (921,873,712);
GO

SELECT
[so].[name] [TableName],
[ss].[name] [StatisticName],
[ss].[stats_id] [StatisticID],
[sp].[last_updated] [LastUpdated],
[sp].[rows] [RowsInTableWhenUpdated],
[sp].[rows_sampled] [RowsSampled],
[sp].[modification_counter] [NumberOfModifications]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
CROSS APPLY [sys].[dm_db_stats_properties] ([so].[object_id], [ss].stats_id) [sp]
WHERE [so].[name] =  N'SalesOrderDetail';
GO
Output from sys.dm_db_stats_properties

Output from sys.dm_db_stats_properties

 

 

 

 

 

 

*You’re correct, I technically didn’t change ProductID to a new value, but SQL Server doesn’t know that.  Also, there’s a foreign key on that column which is why I can’t easily change it a random number.

Armed with this type of data, we can intelligently decide whether we should update statistics because a percentage of rows (rather than just a fixed number of rows) have changed.  In the example above, only 8% of data changed – probably not enough to require me to update statistics.  It’s quite possible that some statistics need to be updated daily because there is a high rate of change, and other statistics only need to be updated weekly or monthly because data doesn’t change much at all.

Ultimately, when it comes to scheduled updates of statistics, you can go the sledgehammer route (Update Statistics task or sp_updatestats) or the selective update route (UPDATE STATISTICS and sys.dm_db_stats_properties).  Using the Update Statistics task or sp_updatestats is easier if you’re not familiar with SQL Server and if you have the maintenance window and resources for it to run.  To be perfectly clear: if you’re a system administrator and want to update statistics, I’d rather you use this approach than nothing at all.  Presumably, if you don’t have a full-time DBA, you also don’t need the system to be available 24×7, so you can take the performance hit at night or on the weekend while all statistics update.  In that situation I’m ok with the approach.

But, if you are a DBA and you know how to write T-SQL, then you can absolutely write some code that programmatically looks at your statistics and decides what to update and what to skip.  Whatever method you use, just make sure your updates are scheduled to run regularly through an Agent Job, and make sure you have Auto Update Statistics enabled just in case the job doesn’t run and you don’t get notified for some reason (this would be Plan B, because it’s always good for DBAs to have a Plan B!).

Additional Resources