SQLskills SQL101: Updating SQL Server Statistics Part II – Scheduled Updates

In last week’s post I discussed the basics of how automatic updates to statistics occur in SQL Server.  This week I want to talk about scheduled (aka manual) updates, because as you might remember, we really want to control when statistics are updated.

In terms of updating statistics you have multiple options, including:

  • Update Statistics Task (Maintenance Plan)
  • sp_updatestats
  • UPDATE STATISTICS

For systems that do not have a full-time DBA, one of the easiest methods for managing statistics is the Update Statistics Task.  This task can be configured for all databases or certain databases, and you can determine what statistics it updates:

Update Statistics Task – deciding which statistics to update

You might think you want to update All existing statistics.  If you had a plan with just this task, that might be true.  But what I see most often is that someone configures the Rebuild Index task, and then has the Update Statistics task as the next step.  In that case, if you are running SQL Server 2014 and below, you want to update Column statistics only.  When you run the Rebuild Index task in SQL Server 2014 and below, you rebuild all indexes, and when you rebuild an index, its statistic is updated with a fullscan.  Therefore, there is no need to update Index statistics after you rebuild all your indexes, but you do need to update column statistics.
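
If you ever need to script that behavior outside of a maintenance plan, the COLUMNS option of UPDATE STATISTICS is the T-SQL equivalent of “Column statistics only.”  Here is a minimal sketch against a single table (the maintenance plan task simply does this for every table you select, and FULLSCAN is just one sampling choice):

USE [AdventureWorks2012];
GO

/*
	Update only the column (non-index) statistics for this table,
	e.g. after all of its indexes have just been rebuilt
	(the rebuild already updated the index statistics with a fullscan)
*/
UPDATE STATISTICS [Sales].[SalesOrderDetail] WITH FULLSCAN, COLUMNS;
GO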

This is a bit more complicated in SQL Server 2016.  The Rebuild Index task has more options in SQL Server 2016, which is nice for that specific task, but it makes providing guidance about statistics updates a bit trickier.  In SQL Server 2016 you can configure the Rebuild Index task so that it only rebuilds an index if a certain level of fragmentation exists.  Therefore, some of your indexes will rebuild (and thus have statistics updated) and some will not (and not have updated statistics).  How do you manage that with the Update Statistics task?  Well, in that case you probably select All existing statistics and update some statistics for a second time, which is really a waste.  Therefore, if you’re on SQL Server 2016, you probably want to look at more intelligent updates.

One method, which I would not say is intelligent, but it is an option, is to use sp_updatestats in a scheduled job that runs on a regular basis.  This command is one you run for a database, not for a specific statistic or index or table.  The sp_updatestats command will only update statistics if data has changed.  That sounds good, but the caveat is that only one (1) row has to have changed.  If I have a table with 2,000,000 rows, and only 5 rows have changed, I really don’t need to update statistics.
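
If you do go this route, the job step itself is about as simple as it gets; a minimal sketch, run in the context of each database you want covered:

USE [AdventureWorks2012];
GO

/*
	sp_updatestats updates every statistic in the current database
	that has had at least one row modified since its last update
	(using the default sampling rate)
*/
EXEC sp_updatestats;
GO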

The other method is to use UPDATE STATISTICS in a scheduled job.  The UPDATE STATISTICS command can be run for individual statistics or for a table (updating all statistics for a table).  You can develop an intelligent method to use this command, which is what I recommend.  Rather than a blanket update to all statistics, or statistics where one row has changed, I prefer to update statistics that are outdated based on the amount of data that has changed.  Consider the aforementioned table with 2,000,000 rows.  If I let SQL Server update statistics automatically, I would need 400,500 rows to change.  It’s quite possible that with a table of that size I would want statistics to update sooner – say after 200,000 rows had changed, or 10% of the table.
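
For reference, the command can be as broad or as narrow as you need; a couple of quick examples (the statistic name below is from AdventureWorks and purely illustrative):

USE [AdventureWorks2012];
GO

/* Update all statistics on a table, letting SQL Server choose the sample size */
UPDATE STATISTICS [Sales].[SalesOrderDetail];
GO

/* Update one specific statistic with a full scan of the table */
UPDATE STATISTICS [Sales].[SalesOrderDetail] [IX_SalesOrderDetail_ProductID] WITH FULLSCAN;
GO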

We can programmatically determine whether we need to update statistics using the sys.dm_db_stats_properties DMF.  This DMF tracks modifications, and also tells us how many rows were in the table when statistics were last updated, and the date statistics were updated. For example, if I update some rows in Sales.SalesOrderDetail, and then look at the output from the DMF, you can see that the modification counter matches the number of rows I changed* for the ProductID index:

USE [AdventureWorks2012];
GO

UPDATE [Sales].[SalesOrderDetail]
SET [ProductID] = [ProductID]
WHERE [ProductID] IN (921,873,712);
GO

SELECT
[so].[name] [TableName],
[ss].[name] [StatisticName],
[ss].[stats_id] [StatisticID],
[sp].[last_updated] [LastUpdated],
[sp].[rows] [RowsInTableWhenUpdated],
[sp].[rows_sampled] [RowsSampled],
[sp].[modification_counter] [NumberOfModifications]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
CROSS APPLY [sys].[dm_db_stats_properties] ([so].[object_id], [ss].stats_id) [sp]
WHERE [so].[name] =  N'SalesOrderDetail';
GO
Output from sys.dm_db_stats_properties

*You’re correct, I technically didn’t change ProductID to a new value, but SQL Server doesn’t know that.  Also, there’s a foreign key on that column, which is why I can’t easily change it to a random number.

Armed with this type of data, we can intelligently decide whether we should update statistics based on the percentage of rows that have changed, rather than just a fixed number of rows.  In the example above, only 8% of the data changed – probably not enough to require me to update statistics.  It’s quite possible that some statistics need to be updated daily because there is a high rate of change, and other statistics only need to be updated weekly or monthly because the data doesn’t change much at all.
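
To give you an idea of what that can look like, here is a rough sketch (not production code – the 10% threshold, the FULLSCAN, and the filters are all choices you would tune for your environment) that builds UPDATE STATISTICS statements from the DMF output:

USE [AdventureWorks2012];
GO

DECLARE @sql NVARCHAR(MAX) = N'';

/*
	Build an UPDATE STATISTICS statement for every statistic where
	10% or more of the rows have been modified since the last update
*/
SELECT @sql = @sql
	+ N'UPDATE STATISTICS '
	+ QUOTENAME(SCHEMA_NAME([so].[schema_id])) + N'.' + QUOTENAME([so].[name])
	+ N' ' + QUOTENAME([ss].[name])
	+ N' WITH FULLSCAN;' + NCHAR(13) + NCHAR(10)
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
CROSS APPLY [sys].[dm_db_stats_properties] ([so].[object_id], [ss].[stats_id]) [sp]
WHERE [so].[is_ms_shipped] = 0
	AND [sp].[rows] > 0
	AND [sp].[modification_counter] >= 0.10 * [sp].[rows];

PRINT @sql;	/* review the statements first */
/* EXEC [sys].[sp_executesql] @sql;	uncomment to actually run them */
GO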

Ultimately, when it comes to scheduled updates of statistics, you can go the sledgehammer route (Update Statistics task or sp_updatestats) or the selective update route (UPDATE STATISTICS and sys.dm_db_stats_properties).  Using the Update Statistics task or sp_updatestats is easier if you’re not familiar with SQL Server and if you have the maintenance window and resources for it to run.  To be perfectly clear: if you’re a system administrator and want to update statistics, I’d rather you use this approach than nothing at all.  Presumably, if you don’t have a full-time DBA, you also don’t need the system to be available 24×7, so you can take the performance hit at night or on the weekend while all statistics update.  In that situation I’m ok with the approach.

But, if you are a DBA and you know how to write T-SQL, then you can absolutely write some code that programmatically looks at your statistics and decides what to update and what to skip.  Whatever method you use, just make sure your updates are scheduled to run regularly through an Agent Job, and make sure you have Auto Update Statistics enabled just in case the job doesn’t run and you don’t get notified for some reason (this would be Plan B, because it’s always good for DBAs to have a Plan B!).

Additional Resources

SQLskills SQL101: Updating SQL Server Statistics Part I – Automatic Updates

One of my favorite topics in SQL Server is statistics, and in my next two posts I want to cover how they are updated: either by SQL Server or by you.

We’ll start with updates by SQL Server, and these happen automatically. In order for automatic updates of statistics to occur, the AUTO UPDATE STATISTICS database option must be enabled for the database:

Auto Update Statistics option via SSMS

This option is enabled by default for every new database you create in SQL Server 2005 and higher, and it is recommended to leave this option enabled. If you’re not sure if this option is enabled, you can check in the UI or you can use the following T-SQL:

SELECT
	[name] [DatabaseName],
	CASE
		WHEN [is_auto_update_stats_on] = 1 THEN 'Enabled'
		ELSE 'Disabled'
	END [AutoUpdateStats]
FROM [sys].[databases]
ORDER BY [name];
GO

If you want to enable the option, you can run:

USE [master];
GO
ALTER DATABASE [<database_name_here>] SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT;
GO

With the option enabled, SQL Server marks statistics as out of date based on internal thresholds.

For SQL Server 2014 and earlier, the threshold was 500 rows plus 20% of the total rows in a table. For example, if I have a table with 10,000 rows in it, when 2,500 rows have changed, SQL Server marks the statistic as out of date. There are exceptions to this (e.g. when a table has fewer than 500 rows, or if the table is temporary), but in general this threshold is what you need to remember.

A new trace flag, 2371, was introduced in SQL Server 2008 R2 SP1 to lower this threshold. This change was designed to target large tables. Imagine a table with 10 million rows; over 2 million rows would need to change before statistics would be marked as out of date. With trace flag 2371, the threshold decreases as the table gets larger – roughly the square root of 1,000 times the number of rows in the table – so for that 10 million row table, statistics would be marked as out of date after about 100,000 modifications.
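
If you are on a version that needs it (SQL Server 2008 R2 SP1 through SQL Server 2014), the trace flag has to be enabled globally – typically as the -T2371 startup parameter.  As a quick sketch, you can also turn it on with DBCC, though that only persists until the next restart:

/*
	Enable trace flag 2371 globally until the instance restarts;
	add -T2371 as a startup parameter to make it permanent
*/
DBCC TRACEON (2371, -1);
GO

/* Confirm the trace flag is enabled */
DBCC TRACESTATUS (2371);
GO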

In SQL Server 2016, the threshold introduced by trace flag 2371 is used if you have the compatibility mode for a database set to 130. This means that in SQL Server 2016, you only need to use trace flag 2371 to get that lower threshold if you have the database compatibility mode set to 120 or lower.

If statistics have been marked as out of date, then they will be updated by SQL Server automatically the next time they are used in a query. Understand that they are not updated the moment they are out of date…they are not updated until they are needed. Imagine the following scenarios using the original threshold:

Example 1 – PhysicianData

  • Sunday, March 19, 2017, 2:00 AM – Statistics updated for table PhysicianData, which has 500,000 rows in it
  • Monday, March 20, 6:00 AM – Processing job runs, and 50,000 new rows are added to the PhysicianData table
  • Tuesday, March 21, 6:00 AM – Processing job runs, and 50,500 new rows are added to the PhysicianData table; statistics for PhysicianData are marked as out of date
  • Tuesday, March 21, 7:35 AM – A user queries PhysicianData for the first time since processing ran at 6:00 AM; statistics for PhysicianData are updated

Example 2 – PatientData

  • Sunday, March 19, 2017, 2:00 AM – Statistics updated for table PatientData, which has 2,000,000 rows in it
  • Monday, March 20, all day – Different processes and user activities access PatientData, adding new rows and changing existing rows.  By the end of the day, 100,000 rows have changed or been added.
  • Tuesday, March 21, all day – Different processes and user activities access PatientData, adding new rows and changing existing rows.  By the end of the day, 250,000 rows have changed or been added.
  • Wednesday, March 22, all day – Different processes and user activities access PatientData, adding new rows and changing existing rows.  At 8:15 PM, 400,500 rows have changed or been added.
  • Wednesday, March 22, 8:16 PM – A user queries PatientData; statistics for PatientData are updated

I’ve given two very contrived examples to help you understand that statistics are not always updated the exact moment they are marked as out of date.  They might be – if the table has a lot of activity – but they might not be.

As I stated originally, it is recommended to leave this option enabled for a database.  However, we do not want to rely on SQL Server for our statistics updates.  In fact, think of this option as a safety net for statistics.  We want to control when statistics are updated, not SQL Server.  Consider the first scenario I described, where statistics were updated at 7:35 AM.  If that’s a busy time of day and this is a large table, it could affect performance in the system.  It’s preferable to have statistics updated when the system has less activity, so that resource use doesn’t contend with user activity, but we always want to leave Auto Update Statistics enabled for a database…just in case.

Additional Resources:

SQL Server 2016 Upgrade Testing with the New Cardinality Estimator: Context Matters

This week I’ve been working with SQL Server 2016 and it’s been a lot of fun – this is truly a great release from the SQL Server team.  Yesterday I was working on some upgrade testing with regard to the new Cardinality Estimator (CE), introduced in SQL Server 2014 and well explained in this whitepaper written by my friend Joe Sack.  If you are upgrading to SQL Server 2014 or 2016, you should test not just your upgrade process, but also the performance of your queries with the new CE.  In some cases we have seen good/great improvements in query performance with the new CE, but in other cases we have seen significant regressions which dramatically affect overall system performance.

There are a variety of things to look for when testing beyond just query duration and resource use – you should also be looking at estimates and plan shape, which means capturing the execution plan.  I was doing just that yesterday, and in my testing I was changing the compatibility mode for the user database and running my queries with the different CEs.  I was then confirming the CE version used for the query by looking at the plan and checking the CardinalityEstimationModelVersion attribute.  Remember that CE version is tied to the compatibility mode, and the CE that will be used is tied to the current database context.  So if you’re running cross database or distributed queries, this is something to which you need to pay attention.
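
Side note: if you want to check this at scale rather than plan-by-plan, one option (just a sketch, and the attribute only exists in plans generated by SQL Server 2014 and higher) is to pull the attribute straight out of the plan cache:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (25)
	[qs].[execution_count],
	[qp].[query_plan].value('(//StmtSimple/@CardinalityEstimationModelVersion)[1]', 'int') [CEVersion],
	[st].[text] [QueryText]
FROM [sys].[dm_exec_query_stats] [qs]
CROSS APPLY [sys].[dm_exec_query_plan]([qs].[plan_handle]) [qp]
CROSS APPLY [sys].[dm_exec_sql_text]([qs].[sql_handle]) [st]
ORDER BY [qs].[execution_count] DESC;
GO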

Consider this scenario…

I have a new installation of SQL Server 2016, and I restore my user database to the instance.  By default the compatibility level is not changed unless the database you’re restoring has a compatibility level of 90 (remember you can upgrade from SQL Server 2005+ to any other version).  In that case, it will automatically be bumped up to 100 for both SQL Server 2014 and SQL Server 2016.  For more details on supported compatibility levels for versions and the differences between the levels, see ALTER DATABASE Compatibility Level.

Because this is a new installation of SQL Server 2016, the system databases have a compatibility level of 130.  I restore two user databases to my instance (previously running on SQL Server 2012), call them AdventureWorks and WideWorldImporters.  I leave the compatibility level for AdventureWorks at 110, and set it to 130 for WideWorldImporters.  I then start running queries, some of which are specific to one database, and others which query both databases.

In the case where I’m in the context of AdventureWorks, the old CE is used – even when I query the WideWorldImporters database.  And if I’m in the WideWorldImporters database, querying across to AdventureWorks, the new CE is used.  Here’s the setup code:

/*
	Restore the databases
	(change locations as appropriate)
*/
USE [master];
GO

RESTORE DATABASE [WideWorldImporters]
	FROM  DISK = N'C:\Backups\WideWorldImporters-Full.bak'
	WITH  FILE = 1,
	MOVE N'WWI_Primary' TO N'C:\Databases\WideWorldImporters\WideWorldImporters.mdf',
	MOVE N'WWI_UserData' TO N'C:\Databases\WideWorldImporters\WideWorldImporters_UserData.ndf',
	MOVE N'WWI_Log' TO N'C:\Databases\WideWorldImporters\WideWorldImporters.ldf',
	MOVE N'WWI_InMemory_Data_1' TO N'C:\Databases\WideWorldImporters\WideWorldImporters_InMemory_Data_1',
	NOUNLOAD,
	REPLACE,
	STATS = 5;
GO

RESTORE DATABASE [AdventureWorks2016]
	FROM  DISK = N'C:\Backups\AW2014_Base.bak'
	WITH  FILE = 1,
	MOVE N'AdventureWorks2014_Data' TO N'C:\Databases\AdventureWorks2016\AdventureWorks2016_Data.mdf',
	MOVE N'AdventureWorks2014_Log' TO N'C:\Databases\AdventureWorks2016\AdventureWorks2016_Log.ldf',
	NOUNLOAD,
	REPLACE,
	STATS = 5;
GO

/*
	Set the compatibility levels
*/
USE [master];
GO
ALTER DATABASE [WideWorldImporters] SET COMPATIBILITY_LEVEL = 130;
GO
ALTER DATABASE [AdventureWorks2016] SET COMPATIBILITY_LEVEL = 110;
GO
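
Before running anything, it is worth a quick sanity check that the levels are what you expect:

SELECT
	[name] [DatabaseName],
	[compatibility_level] [CompatibilityLevel]
FROM [sys].[databases]
WHERE [name] IN (N'WideWorldImporters', N'AdventureWorks2016');
GO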

First, run the query in the context of WideWorldImporters, with the execution plan displayed:

USE [WideWorldImporters];
GO

SET STATISTICS XML ON;
GO

SELECT
	[o].[CustomerID],
	[c].[StoreID],
	[o].[OrderDate],
	[ol].[StockItemID],
	[ol].[Quantity],
	[ol].[UnitPrice]
FROM [WideWorldImporters].[Sales].[Orders] [o]
JOIN [WideWorldImporters].[Sales].[OrderLines] [ol] ON [o].[OrderID] = [ol].[OrderID]
JOIN [AdventureWorks2016].[Sales].[Customer] [c] ON [c].[CustomerID] = [o].[CustomerID]
WHERE [o].[OrderDate] BETWEEN '2016-05-01' AND '2016-05-31'
ORDER BY [o].[OrderDate] DESC;
GO

SET STATISTICS XML OFF;
GO

If you open up the graphical plan, display the Properties window (F4), and then click on the SELECT operator in the plan, you can see that the CardinalityEstimationModelVersion is 130 (you can also view the XML and do a search to find it there):

Execution plan using the new CE

Now run the query in the context of AdventureWorks, again with the execution plan displayed:

USE [AdventureWorks2016];
GO

SET STATISTICS XML ON;
GO

SELECT
	[o].[CustomerID],
	[c].[StoreID],
	[o].[OrderDate],
	[ol].[StockItemID],
	[ol].[Quantity],
	[ol].[UnitPrice]
FROM [WideWorldImporters].[Sales].[Orders] [o]
JOIN [WideWorldImporters].[Sales].[OrderLines] [ol] ON [o].[OrderID] = [ol].[OrderID]
JOIN [AdventureWorks2016].[Sales].[Customer] [c] ON [c].[CustomerID] = [o].[CustomerID]
WHERE [o].[OrderDate] BETWEEN '2016-05-01' AND '2016-05-31'
ORDER BY [o].[OrderDate] DESC;
GO

SET STATISTICS XML OFF;
GO

This time when you look at the graphical plan, the CardinalityEstimationModelVersion is 70, indicating it’s using the old CE:

Execution plan using the old CE

In this example, I have users connecting to both databases, and they can be querying one database or both.  Note that if you happen to keep users in the context of one database (e.g. a user database that has no data in it, but that you use as a “gateway” to other databases), then the compatibility level for that database will determine which CE the queries use.  The exception to all this, of course, is the CE-related trace flags.  You can override the compatibility level and force the optimizer to use a specific CE by using one of two trace flags:

  • Use Trace Flag 9481 to revert to the legacy CE behavior from the context of a database with a compatibility level of 120 or higher
  • Use Trace Flag 2312 to enable the new CE from the context of a database with a compatibility level below 120 (a per-query example follows below)
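
Both trace flags can also be applied to a single statement with the QUERYTRACEON query hint, which tends to be easier during targeted testing (note that QUERYTRACEON has its own permission requirements).  A quick sketch:

USE [WideWorldImporters];
GO

/*
	Force the legacy CE for just this statement,
	regardless of the database compatibility level
*/
SELECT
	[o].[CustomerID],
	[o].[OrderDate]
FROM [Sales].[Orders] [o]
WHERE [o].[OrderDate] BETWEEN '2016-05-01' AND '2016-05-31'
OPTION (QUERYTRACEON 9481);
GO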

Therefore, when you’re testing your upgrade to SQL Server 2014 or 2016, take the time to look beyond the basics.  Upgrading is not just about making sure nothing breaks, it’s also about making sure that performance doesn’t degrade, even if you don’t change hardware (especially if you don’t change hardware!).  We don’t typically expect that upgrading to a newer software version would negatively affect performance, but because the change to SQL Server’s Cardinality Estimator is significant, this is one you definitely want to test.