Last November I blogged about how index usage stats don’t get updated when the associated index statistics (but not index) are used.

This post will describe another scenario that you should be aware of (the topic came up today in class while Kimberly was teaching – as we were trying to recall tricks to clearing stats for sys.dm_db_index_usage_stats)…

Imagine that I’ve queried a specific table as follows:

SELECT member_no, lastname, firstname, middleinitial, street, city, state_prov, country
FROM dbo.member
WHERE member_no = 1;

If I check sys.dm_db_index_usage_stats for any reference to the member table, I’ll see the following:

SELECT i.index_id, i.name, u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
     u.object_id = i.object_id AND
     u.index_id = i.index_id
WHERE u.object_id=object_id('dbo.member')

This returns:

image

Now let’s say that you have a weekly rebuild of specific indexes (for example):

ALTER INDEX member_ident
ON dbo.member REBUILD

If I check for usage stats after rebuilding the query (and before anyone has accessed the member table specifically) – the stats have been cleared out for that table. 

image

Why does this matter? 

If you’re using the sys.dm_db_index_usage_stats to determine which indexes should be removed, you’re running the risk of making decisions based on recently cleared out statistics.  This is similar to the case where a SQL Server instance has been recently restarted.  You should not be dropping indexes without knowing whether the accumulated statistics represent the full set of critical workloads. 

For tables with frequent index rebuilds, be sure to capture data from sys.dm_db_index_usage_stats before these jobs run.  This DMV is definitely a useful tool, but if you’re not careful, you could be dropping indexes based on missing information.

A few other noteworthy items:

  • Rebuilding an index only clears the stats for the index involved.  For example – if I have two rows in the DMV for two different indexes on the same object, rebuilding one index will only clear stats for that object.
  • Reorganizing the index does NOT clear the stats from sys.dm_db_index_usage_stats.
  • This is not the only circumstance where the stats will clear.
  • I have only tested this recently on SQL Server 2008 R2 and SQL Server 2012.  At one point when I was at Microsoft I blogged about seeing the stats persist after a REBUILD (more in the context of REBUILD itself – and I had mentioned the DMV in passing).  Which makes me think that perhaps this behavior wasn’t always the case.  If you’ve found a different behavior on different versions, please post your comment here.

Thanks!

Categories:
indexing | Performance

I was thinking back to a conversation I had some time ago regarding cases when sys.dm_db_index_usage_stats does not get updated after an index is used for just the statistics associated with the indexed columns.  What makes this profession so interesting is that almost every tool we use has a "yes, but..." associated with it.  What a DMV gives with one hand it might hide with another.

While I recalled the conversation, I needed to double check who I had actually discussed it with.  Admittedly, it’s a short list.  After checking with him over email, I realized that conversation had been with MVP, Microsoft RD, MCM Greg Low.  He also mentioned that Rob Farley once demonstrated the effect of creating a unique non-clustered index on a column which impacts the query plan but doesn’t actually register that use in sys.dm_db_index_usage_stats.

While the conversation stuck with me, I had never taken time to test it out.  In this post, I’ll be testing out a different scenario, but essentially a similar situation where index column stats are used for a query (to provide more accurate estimates) – but its use results in no update to sys.dm_db_index_usage_stats. 

Before getting to the demo, I do want to say that I still think sys.dm_db_index_usage_stats is incredibly useful for evaluating usage patterns and helping to identify indexes that aren’t pulling their weight (high cost – low benefit).  I also see the following scenario as an edge case – but something to most certainly be mindful of as one reason why you could all-of-the-sudden see cardinality estimate issues after dropping an index that wasn’t showing as being used at all.

In this demo, I’m using SQL Server 2008 R2 (10.50.1617) and the AdventureWorksDW database.   I’ll start off by disabling auto-creation of statistics (you’ll see why shortly).  While I see auto-creation enabled more often than not, I have seen cases where it has mindfully been disabled and cases where it was disabled for no good reason at all – a topic for another day:

-- Disable creation of statistics

USE [master]

GO

 

ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT

GO

Next I’ll execute the following query against the dbo.FactInternetSales table – and I’ve enabled the “Include Actual Execution Plan” in SSMS so I can see the actual plan:

USE [AdventureWorksDW]

GO

 

-- Estimated rows (no stats on TaxAmt) = 3,853

-- Actual rows (562)

SELECT RevisionNumber

FROM dbo.FactInternetSales

WHERE TaxAmt = 5.08

We can check the estimated versus actual rows in SSMS, but I’ll actually show the results in SQL Sentry Plan Explorer because I like the tabular format (and I don’t have to “hover” to see it).  The following screen shot is from the Plan Tree tab:

clip_image001

As you can see – the estimated rows were 3,853 versus 562 rows.  Now recall that automatic creation of statistics are disabled, so I’m going to manually create the following index:

 -- Create an index on TaxAmt (which means we also get stats)

CREATE INDEX IX_FactInternetSales_TaxAmt ON

dbo.FactInternetSales (TaxAmt)

If I re-execute the previous query against dbo.FactInternetSales, I see the following:

clip_image002

Now our estimates match the actual results.  We have a better estimate, although notice that a table scan was still chosen (although imagine the disparity had this table had significantly more rows - and associated impact).   But in essence, we did use the statistics associated with that index, so what about index usage stats? 

If I execute the following query, I get no results at all:

-- Was the index used? Not according to sys.dm_db_index_usage_stats

SELECT u.user_seeks, u.user_lookups, u.user_scans

FROM sys.dm_db_index_usage_stats u

INNER JOIN sys.indexes i ON

     u.object_id = i.object_id AND

     u.index_id = i.index_id

WHERE u.object_id=object_id('dbo.FactInternetSales') AND

     i.name = 'IX_FactInternetSales_TaxAmt'

So let’s drop the index we just created and see what happens:

USE [AdventureWorksDW]

GO

 

DROP INDEX [IX_FactInternetSales_TaxAmt] ON [dbo].[FactInternetSales] WITH ( ONLINE = OFF )

GO

 

-- Estimated rows with index = 3,853

-- Actual rows (562)

SELECT RevisionNumber

FROM dbo.FactInternetSales

WHERE TaxAmt = 5.08

As you may expect, we’re back to the estimation issue:

clip_image001[1]

As a final test, let’s enable auto-creation of statistics and re-execute the query (again, with no supporting index statistics):

USE [master]

GO

 

ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT

GO

 

USE [AdventureWorksDW]

GO

 

SELECT RevisionNumber

FROM dbo.FactInternetSales

WHERE TaxAmt = 5.08

The plan shows that actual versus estimated matches again (without the index) because auto-creation of statistics occurred in the background:

clip_image002[1]

And I can validate if auto-statistics were created as follows:

SELECT s.name, STATS_DATE(s.object_id, s.stats_id) auto_stats_date

FROM sys.stats s

INNER JOIN sys.stats_columns c ON

     s.stats_id = c.stats_id AND

     s.object_id = c.object_id

WHERE s.object_id = object_id('dbo.FactInternetSales') AND

     s.auto_created = 1 AND

     c.column_id = 20 – TaxAmt

And indeed – this was the case:

clip_image003

Will this stop me from using sys.dm_db_index_usage_stats to identify high cost/low benefit indexes? 

Absolutely not.  The potential benefit of identifying and eliminating wasteful indexing is too great and this is a fantastic (but not perfect) method to use in assessing an indexes’ value. 

However – I will also be mindful of such scenarios.  If someone tells me that plans have turned for the worse after an index cleanup operation, I’ll validate this very scenario.  And even if I found this as the root cause, my bias would revolve around creating the needed statistics, rather than creating an index that is not used for actual data access.

Wish list time… One “nice to have” for a future version of sys.dm_db_index_usage_stats would be to add a stats_lookup bigint column and a last_stats_lookup datetime column.  I would see it as a great way to ensure we address indexes that are used exclusively for the index column statistics associated with it.

Today there isn’t a fine-tuned way to clear the accumulated statistics from the “missing index” set of DMVs.  Certainly restarting SQL Server will do this, but you don’t have something like DBCC SQLPERF (‘sys.dm_os_wait_stats’, CLEAR) to reset statistics for dm_db_missing_index_columns, dm_db_missing_index_groups, dm_db_missing_index_group_stats and dm_db_missing_index_details.

While we don’t have this direct option, there is an interesting behavior that I’ll demonstrate on this blog post using SQL Server 2008 R2 (10.50.1617).

Right after restarting SQL Server, I run the following query to generate a few entries in the missing index DMVs (each query spawns a missing index recommendation):

USE AdventureWorks

GO

 

SELECT ProductID

FROM Sales.SalesOrderDetail

WHERE UnitPriceDiscount = 0.40

GO

 

SELECT SalesOrderID

FROM Sales.SalesOrderDetail

WHERE LineTotal = 236.421500

GO

 

SELECT SalesOrderID

FROM Sales.SalesOrderHeader

WHERE TaxAmt = '10.316'

GO

Next I execute the following query to see the three entries in the missing index DMVs:

SELECT      s.last_user_seek,

            d.object_id,

            d.equality_columns,

            d.inequality_columns,

            d.included_columns,

            d.statement,

            s.avg_user_impact

FROM sys.dm_db_missing_index_group_stats AS s

INNER JOIN sys.dm_db_missing_index_groups AS g

    ON (s.group_handle = g.index_group_handle)

INNER JOIN sys.dm_db_missing_index_details AS d

    ON (g.index_handle = d.index_handle)

This returns:

clip_image002

So I have three index suggestions so far. What happens if I create one of suggested indexes for the SalesOrderDetail table?

USE [AdventureWorks]

GO

 

CREATE NONCLUSTERED INDEX IDX_SalesOrderDetail_UnitPriceDiscount

ON [Sales].[SalesOrderDetail] ([UnitPriceDiscount])

INCLUDE ([ProductID])

GO

If I re-execute the query against the missing index DMVs – I’ll now see that BOTH suggestions from the SalesOrderDetail table got cleared out (even though I only added one of the suggestions), leaving behind the suggestion for the other table (SalesOrderHeader):

clip_image004

So by virtue of adding an index to SalesOrderDetail, it clears out BOTH suggestions for that table. 

And if I re-execute the query for the LineTotal column on SalesOrderDetail (that was not yet indexed), the missing index entry pops back in (since we didn’t create an index for this):

clip_image006

And just to test from another direction, I'll add the suggested index on the SalesOrderHeader table:

USE [AdventureWorks]

GO

 

CREATE NONCLUSTERED INDEX IDX_SalesOrderHeader_TaxAmt

ON [Sales].[SalesOrderHeader] ([TaxAmt])

Re-running the query against the missing index DMVs, sure enough, the SalesOrderHeader suggestion is removed, leaving the SalesOrderDetail suggestion (for the one index we haven’t yet created):

clip_image008

So in a nutshell, the missing index views aren’t as static as they may appear to be and if you’re adding indexes, you may expect to see suggestions disappear for that table (not waiting for a SQL Server restart) - and this happens whether or not you created indexes for each suggestion for a specific table.  I can understand the trade-offs of this behavior – especially since a new index on a table can have an impact on future execution plans.  Basically one index creation “resets” the suggestions.  But that also means that you should be sampling these DMVs over time and not rely fully on a  single sampling.

Theme design by Nukeation based on Jelle Druyts