A new question on an older post, Understanding When Statistics Will Automatically Update, came in this week. If you look in the comments, you see that Rajendra asks:
…I want know, whether schema change of a table (adding new / (deleting an existing) column) also qualifies for UPDATE STATS?
It’s an interesting question that I’d never considered before. I had an instinct about the answer, but the scientist in me wanted to test and prove it, so here we go. The scenario and code are similar to my original post, so this should be a quick read for many of you
Start with a copy of the
AdventureWorks2012 database, which you can download from CodePlex, on a 2012 SP1 instance, and confirm that the Auto Update Statistics option is enabled:
IF (SELECT COUNT(*) FROM [sys].[databases] WHERE [name] = 'AdventureWorks2012' AND [is_auto_create_stats_on] = 0) = 0 BEGIN ALTER DATABASE [AdventureWorks2012] SET AUTO_UPDATE_STATISTICS ON END;
Create a copy of the Sales.SalesOrderDemo table for our testing, and add the clustered index and one nonclustered index:
USE [AdventureWorks2012]; GO SELECT * INTO [Sales].[TestSalesOrderDetail] FROM [Sales].[SalesOrderDetail]; GO CREATE CLUSTERED INDEX [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] ON [Sales].[TestSalesOrderDetail] ([SalesOrderID], [SalesOrderDetailID]); GO CREATE NONCLUSTERED INDEX [IX_TestSalesOrderDetail_ProductID] ON [Sales].[TestSalesOrderDetail] ([ProductID]); GO
Validate the current statistics and modifications; the statistics should be current since we just created the indexes, and modifications should be 0 since we haven’t run any inserts, updates, or deletes:
/* Note: I refer to this query throughout the post as "the sys.stats query", rather than including it multiple times! */ SELECT OBJECT_NAME([sp].[object_id]) AS "Table", [sp].[stats_id] AS "Statistic ID", [s].[name] AS "Statistic", [sp].[last_updated] AS "Last Updated", [sp].[rows], [sp].[rows_sampled], [sp].[unfiltered_rows], [sp].[modification_counter] AS "Modifications" FROM [sys].[stats] AS [s] OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp] WHERE [s].[object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail');
Excellent, we’re starting with a clean slate.
Let’s first add a little bit of data (1000 rows), to up our modification count:
BULK INSERT [AdventureWorks2012].[Sales].[TestSalesOrderDetail] FROM 'S:\SQLStuff\Dropbox\Statistics\Data\sod_1000.txt' WITH ( DATAFILETYPE = 'native', TABLOCK );
When we run our
sys.stats query again, we see the 1000 modifications for both indexes, because both indexes had rows added:
Great. Now add a new column to the table…let’s pretend that we have a new ProductID, but we want to keep the existing one for historical reasons, so our new column is, creatively, NewProductID:
ALTER TABLE [Sales].[TestSalesOrderDetail] ADD [NewProductID] INT;
Did adding this column cause statistics to update? This was our main question, and if we run our
sys.stats query we see:
Nothing changed… Adding a new column to the table does not invoke an automatic update to statistics. Nor should it. We haven’t modified data in any existing statistics. We would certainly expect a change in query plans, for existing queries that are modified to use this new column, but adding a new column to a table does not cause any existing statistics for the table to update.
More Testing (because it’s good practice to keep proving out what we expect)
Just for fun, let’s see what happens if we modify some more data:
UPDATE Sales.TestSalesOrderDetail SET ProductID = 717 WHERE SalesOrderID = 75123; GO
If we check
sys.stats, we see that our nonclustered index, IX_TestSalesOrderDetail_ProductID, has three more modifications than it had previously, but not enough data has changed overall for statistics to be invalidated.
What happens if we update NewProductID?
UPDATE [Sales].[TestSalesOrderDetail] SET [NewProductID] = [ProductID] + 1000;
Well, we updated 122317 rows, but only the NewProductID column, which isn’t part of either the clustered index or nonclustered index key. Therefore, no update to statistics:
To prove this out further, create a nonclustered index on NewProductID:
CREATE NONCLUSTERED INDEX [NCI_TestSalesOrderDetail_NewProductID] ON [Sales].[TestSalesOrderDetail] ([NewProductID]); GO
And if we verify the statistics and modifications, we see that our new NCI has zero modifications, as expected.
We’ll make another massive update to NewProductID, because the first update wasn’t correct, and then check
UPDATE [Sales].[TestSalesOrderDetail] SET [NewProductID] = [ProductID] + 10000;
Now we see the modifications – all 122317 of them. If we query the NewProductID column now, we should invoke an automatic update to statistics, because they are invalid:
SELECT NewProductID, SalesOrderID FROM [Sales].[TestSalesOrderDetail] WHERE NewProductID = 10718;
And there you have it. Adding a column to a table will not cause an automatic update to statistics to occur. Automatic updates occur only when the threshold for modifications has been exceeded for a statistics key.
/* clean up code */ USE [AdventureWorks2012]; GO DROP TABLE [Sales].[TestSalesOrderDetail];