One of my favorite topics in SQL Server is statistics, and in my next two posts I want to cover how they are updated: either by SQL Server or by you.
We’ll start with updates by SQL Server, and these happen automatically. In order for automatic updates of statistics to occur, the AUTO UPDATE STATISTICS database option must be enabled for the database:
This option is enabled by default for every new database you create in SQL Server 2005 and higher, and it is recommended to leave this option enabled. If you’re not sure if this option is enabled, you can check in the UI or you can use the following T-SQL:
SELECT [name] [DatabaseName], CASE WHEN [is_auto_update_stats_on] = 1 THEN 'Enabled' ELSE 'Disabled' END [AutoUpdateStats] FROM [sys].[databases] ORDER BY [name]; GO
If you want to enable the option, you can run:
USE [master]; GO ALTER DATABASE [<database_name_here] SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT; GO
With the option enabled, SQL Server marks statistics as out of date based on internal thresholds.
For SQL Server 2014 and earlier, the threshold was 500 rows plus 20% of the total rows in a table. For example, if I have a table with 10,000 rows in it, when 2500 rows have changed, then SQL Server marks the statistic as out of date. There are exceptions to this (e.g. when a table has less than 500 rows, or if the table is temporary), but in general this threshold is what you need to remember.
A new trace flag, 2371, was introduced in SQL Server 2008R2 SP1 to lower this threshold. This change was designed to target large tables. Imagine a table with 10 million rows; over 2 million rows would need to change before statistics would be marked as out of date. With trace flag 2371, the threshold is lower.
In SQL Server 2016, the threshold introduced by trace flag 2371 is used if you have the compatibility mode for a database set to 130. This means that in SQL Server 2016, you only need to use trace flag 2371 to get that lower threshold if you have the database compatibility mode set to 120 or lower.
If statistics have been marked as out of date, then they will be updated by SQL Server automatically the next time they are used in a query. Understand that they are not updated the moment they are out of date…they are not updated until they are needed. Imagine the following scenarios using the original threshold:
Example 1 – PhysicianData
|Sunday, March 19, 2017 2:00 AM||Statistics updated for table PhysicianData, which has 500,000 rows in it|
|Monday, March 21, 6:00 AM||Processing job runs, and 50,000 new rows are added to the PhysicianData table|
|Tuesday, March 21, 6:00 AM||Processing job runs, and 50,500 new rows are added to the PhysicianData table; statistics for PhysicianData are marked as out of date|
|Tuesday, March 21, 7:35 AM||A user queries PhysicianData for the first time since processing ran at 6:00 AM; statistics for PhysicianData are updated|
Example 2 – PatientData
|Sunday, March 19, 2017 2:00 AM||Statistics updated for table PatientData, which has 2,000,000 rows in it|
|Monday, March 20, all day||Different processes and user activities access PatientData, adding new rows, changing existing rows. By the end of day 100,000 rows have changed or been added.|
|Tuesday, March 21, all day||Different processes and user activities access PatientData, adding new rows, changing existing rows. By the end of day 250,000 rows have changed or been added.|
|Wednesday, March 22, all day||Different processes and user activities access PatientData, adding new rows, changing existing rows. At 8:15PM, 400,500 rows have changed or been added.|
|Wednesday, March 22, 8:16 PM||A user queries PatientData; statistics for PatientData are updated|
I’ve given two very contrived example to help you understand that statistics are not always updated the exact moment they are marked as out of date. They might be – if the table has a lot of activity, but they might not be.
As I stated originally, it is recommended to leave this option enabled for a database. However, we do not want to rely on SQL Server for our statistics updates. In fact, think of this option as a safety net for statistics. We want to control when statistics are updated, not SQL Server. Consider of the first scenario I described, where statistics updated at 7:35AM. If that’s a busy time of day and this is a large table, it could affect performance in the system. It’s preferable to have statistics updated when the system has less activity, so that resource use doesn’t contend with user activity, but we always want to leave Auto Update Statistics enabled for a database…just in case.
- Understanding When Statistics Will Automatically Update
- Further Testing with Automatic Updates to Statistics
- Controlling Autostat (AUTO_UPDATE_STATISTICS) behavior in SQL Server
- Changes to automatic update statistics in SQL Server – traceflag 2371
- Statistical maintenance functionality (autostats) in SQL Server