The post A first look at the query_optimizer_estimate_cardinality XE event appeared first on Joe Sack.
In this post I’m just sharing my initial exploration steps regarding the query_optimizer_estimate_cardinality XE event. I’m not entirely sure how well this event will be documented, but I’m definitely interested in learning more about it.
For my test scenario, I attached a version of AdventureWorksLT2012, set it to compatibility level 120 and then created the following session:
CREATE EVENT SESSION [XE_Card_Calculator] ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
WITH (MAX_MEMORY = 4096KB,
      EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS,
      MAX_EVENT_SIZE = 0KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = OFF,
      STARTUP_STATE = OFF);
GO
If you’re looking for query_optimizer_estimate_cardinality in the GUI, keep in mind that it is in the Debug channel (you’ll need to select that channel in order to see the event). This also suggests there is likely nontrivial overhead to enabling it, so while there isn’t an explicit warning like there is for other, more invasive events, I would still use it with caution.
As for the description of this event in the GUI, it is as follows:
“Occurs when the query optimizer estimates cardinality on a relational expression.”
Okay – no big deal, right? Why care?
To keep things simple, I executed the following query against a single table (using RECOMPILE so I could capture the event on each test run):
SELECT AddressLine1 FROM [SalesLT].[Address] WHERE AddressID = 9 OPTION (RECOMPILE);
The actual query execution plan had a Clustered Index Seek with an estimate of 1 row, and the session surfaced two query_optimizer_estimate_cardinality events.
The first event had the following information:
| calculator | <CalculatorList> <FilterCalculator CalculatorName="CSelCalcUniqueKeyFilter" /> </CalculatorList> |
| creation_time | 2013-11-16 16:56:35.6666666 |
| input_relation | <Operator Name="LogOp_Select" ClassNo="32"> <StatsCollection Name="CStCollBaseTable" Id="1" Card="450.00" TableName="SalesLT.Address" /> <Operator Name="ScaOp_Comp " ClassNo="100"> <CompInfo CompareOp="EQ" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="AddressID" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="int" Value="(9)" /> </Operator> </Operator> </Operator> |
| query_hash | 13158512245962950952 |
| stats_collection | <StatsCollection Name="CStCollFilter" Id="2" Card="1.00" /> |
| stats_collection_id | 2 |
The second event had the following information:
| calculator | <CalculatorList /> |
| creation_time | 2013-11-16 16:56:35.6666666 |
| input_relation | <Operator Name="LogOp_SelectIdx" ClassNo="43"> <StatsCollection Name="CStCollBaseTable" Id="1" Card="450.00" TableName="SalesLT.Address" /> <Operator Name="ScaOp_Comp " ClassNo="100"> <CompInfo CompareOp="EQ" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="AddressID" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="int" Value="(9)" /> </Operator> </Operator> </Operator> |
| query_hash | 13158512245962950952 |
| stats_collection | <StatsCollection Name="CStCollFilter" Id="2" Card="1.00" /> |
| stats_collection_id | 2 |
There is a lot to dig through here, but a couple of values stood out: the CSelCalcUniqueKeyFilter calculator name and the Card="1.00" stats collection. And I know that AddressID happens to be my clustered, unique, primary key column for this table.
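The Card="1.00" estimate is consistent with what the calculator name implies: an equality predicate on a unique key can match at most one row, no matter how large the table is. As a toy illustration (my own model of the behavior, not the optimizer's actual logic):

```python
def unique_key_equality_estimate(table_cardinality: float) -> float:
    """Toy model of a unique-key equality filter: at most one row
    can match, regardless of the table's cardinality."""
    return min(1.0, table_cardinality)

# The Address table has 450 rows; the estimate is still 1.00.
print(unique_key_equality_estimate(450.0))  # 1.0
```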
What happens if I reference a non-unique key value that is covered by an index (such as StateProvince)?
SELECT AddressID FROM [SalesLT].[Address] WHERE StateProvince = 'Arizona' OPTION (RECOMPILE);
This query used an Index Seek as I expected, and this time query_optimizer_estimate_cardinality surfaced a new calculator value:
<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.029" TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="StateProvince" StatId="5" />
</CalculatorList>
The stats_collection value was as follows:
<StatsCollection Name="CStCollFilter" Id="2" Card="13.00">
<LoadedStats>
<StatsInfo DbId="5" ObjectId="69575286" StatsId="5" />
</LoadedStats>
</StatsCollection>
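For this case the numbers tie out: the filter's output cardinality looks like selectivity multiplied by input cardinality, and the displayed Selectivity of 0.029 appears to be a three-decimal rounding of 13/450. A quick sanity check (my own arithmetic, not optimizer output):

```python
# 13 estimated matching rows out of the base table's 450 rows.
input_card = 450.0
estimate = 13.0

# The underlying fraction, which displays as 0.029 when rounded.
selectivity = estimate / input_card

print(round(selectivity, 3))      # 0.029
print(selectivity * input_card)   # 13.0
```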
Also, for a scenario where I didn’t have stats, and had disabled them from being auto-created (to simulate a wild-guess scenario), I saw the following calculator list:
<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcPointPredsFreqBased">
<SubCalculator Role="DistinctCountPlan">
<DistinctCountCalculator CalculatorName="CDVCPlanLeaf" Guesses="1" CoveringStatId="4" CoveringStatDensity="450.000" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>
The “Guesses” attribute looks promising (think magic numbers, selectivity guesses, heuristics, or whatever you like to call them).
When executing a query that kicked off auto-stats operations, I saw the following operator information:
<Operator Name="LogOp_GbAgg" ClassNo="31">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="AncOp_PrjList " ClassNo="137">
<Operator Name="AncOp_PrjEl " ClassNo="138">
<Operator Name="ScaOp_AggFunc " ClassNo="90">
<AggFuncInfo AggType="STATMAN" />
<Operator Name="ScaOp_Identifier " ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" />
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>
And I saw the following calculator information (for the auto-stats operations):
<CalculatorList>
<DistinctCountCalculator CalculatorName="CDVCPlanTrivial" />
</CalculatorList>
And lastly, I tried a query containing a bad practice (fiddling with the column reference via concatenation) to see what steps would be taken:
SELECT [CustomerID] FROM [SalesLT].[Customer] WHERE LastName + ' ' = 'Gates' OPTION (RECOMPILE);
This query plan just had a Clustered Index Scan, but spawned five query_optimizer_estimate_cardinality events associated with it (and I tested this a few times to see if the 5-event output was consistent):
| calculator | input_relation |
| <CalculatorList> <FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" /> </CalculatorList> | <Operator Name="LogOp_Select" ClassNo="32"> <StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" /> <Operator Name="ScaOp_Comp " ClassNo="100"> <CompInfo CompareOp="EQ" /> <Operator Name="ScaOp_Arithmetic " ClassNo="87"> <ArithmeticInfo Operation="ADD" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(1)" Value="N' '" /> </Operator> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(5)" Value="N'Gates'" /> </Operator> </Operator> </Operator> |
| <CalculatorList /> | <Operator Name="LogOp_Project" ClassNo="29"> <OpProjectInfo /> <StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" /> <Operator Name="AncOp_PrjList " ClassNo="137"> <Operator Name="AncOp_PrjEl " ClassNo="138"> <Operator Name="ScaOp_Arithmetic " ClassNo="87"> <ArithmeticInfo Operation="ADD" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(1)" Value="N' '" /> </Operator> </Operator> </Operator> </Operator> </Operator> |
| <CalculatorList> <FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" /> </CalculatorList> | <Operator Name="LogOp_Select" ClassNo="32"> <StatsCollection Name="CStCollProject" Id="3" Card="847.00" /> <Operator Name="ScaOp_Comp " ClassNo="100"> <CompInfo CompareOp="EQ" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo ColumnName="Expr1002" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(5)" Value="N'Gates'" /> </Operator> </Operator> </Operator> |
| <CalculatorList /> | <Operator Name="LogOp_Project" ClassNo="29"> <OpProjectInfo /> <StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" /> <Operator Name="AncOp_PrjList " ClassNo="137"> <Operator Name="AncOp_PrjEl " ClassNo="138"> <Operator Name="ScaOp_Arithmetic " ClassNo="87"> <ArithmeticInfo Operation="ADD" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(1)" Value="N' '" /> </Operator> </Operator> </Operator> </Operator> </Operator> |
| <CalculatorList> <FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" /> </CalculatorList> | <Operator Name="LogOp_Select" ClassNo="32"> <StatsCollection Name="CStCollProject" Id="5" Card="847.00" /> <Operator Name="ScaOp_Comp " ClassNo="100"> <CompInfo CompareOp="EQ" /> <Operator Name="ScaOp_Identifier " ClassNo="99"> <IdentifierInfo ColumnName="Expr1002" /> </Operator> <Operator Name="ScaOp_Const " ClassNo="98"> <ConstInfo Type="nvarchar(5)" Value="N'Gates'" /> </Operator> </Operator> </Operator> |
Lots of scenarios to mull over and dig through as time permits.
Why care?
Many query performance issues (and associated query plan quality issues) are due to cardinality estimate skews. It would be great to have a way to more efficiently point to how the various estimates are being calculated and why the estimates are off.
I’m not sure how in-depth this event and associated calculators will be documented by Microsoft, and my assumption is that we’ll need to figure it out via collective reverse-engineering. But in the meantime this new XE event might prove to be quite useful for troubleshooting the more mysterious cardinality estimates.
The post New Article on SQLPerformance.com “Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1” appeared first on Joe Sack.
Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1
In this post I explore the online, single-partition rebuild improvement being introduced in SQL Server 2014 CTP1.
The post Data Page Count Influence on the Query Execution Plan appeared first on Joe Sack.
To illustrate the scenario, I created a table in the Credit database based on the charge table and I added two indexes, one clustered and one nonclustered:
USE [Credit];
GO
SELECT TOP 575000 [charge_no], [member_no], [provider_no], [category_no],
       [charge_dt], [charge_amt], [statement_no], [charge_code]
INTO [dbo].[charge_demo]
FROM [dbo].[charge];
GO
CREATE CLUSTERED INDEX [charge_demo_charge_no]
ON [dbo].[charge_demo] ([charge_no]);
GO
CREATE NONCLUSTERED INDEX [charge_demo_charge_amt]
ON [dbo].[charge_demo] ([charge_amt])
INCLUDE ([member_no])
WITH (FILLFACTOR = 100);
GO
Next, I checked the data page counts by index for this new 575,000 row table:
SELECT [index_id],
[in_row_data_page_count]
FROM [sys].[dm_db_partition_stats]
WHERE [object_id] = OBJECT_ID('dbo.charge_demo');
GO
The clustered index has 3,426 data pages and the nonclustered index has 1,567 data pages.
Next I looked at the execution plan for the following query:
SELECT [member_no], SUM([charge_amt]) AS [charge_amt]
FROM [dbo].[charge_demo]
WHERE [charge_amt] > 0
GROUP BY [member_no]
OPTION (RECOMPILE);
GO
The query execution plan (via SQL Sentry Plan Explorer) was as follows:
The overall estimated subtree cost for the plan ended up being 4.6168.
Next, I rebuilt the nonclustered index using a very low fill factor (far lower than I would ever recommend, but I was doing this to demonstrate the placement of the same number of rows over many more pages than the original default fill factor):
CREATE NONCLUSTERED INDEX [charge_demo_charge_amt]
ON [dbo].[charge_demo] ([charge_amt])
INCLUDE ([member_no])
WITH (FILLFACTOR = 1, DROP_EXISTING = ON);
GO
The clustered index still has 3,426 data pages (since we didn’t change it), but now the nonclustered index has 143,753 data pages instead of the original 1,567 data pages. And again, this represents the same 575,000 row count. Re-executing the original test query, I saw the following changed plan:
The overall estimated subtree cost for the plan increased to 54.3065 with a few other significant changes as well. The second plan switched to using a clustered index scan instead of a nonclustered index seek. Also, the second plan uses a stream aggregate with an “injected” sort operation, instead of the original plan’s hash match aggregate operation.
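The cost jump makes back-of-envelope sense: the same 575,000 rows now span roughly 92 times as many leaf pages, so scanning the nonclustered index became far more expensive than scanning the clustered index. Rough arithmetic on the page counts above (an approximation that ignores page header overhead and row size details):

```python
rows = 575_000
pages_ff100 = 1_567    # nonclustered index pages at FILLFACTOR = 100
pages_ff1 = 143_753    # nonclustered index pages at FILLFACTOR = 1

rows_per_page_100 = rows / pages_ff100  # ~367 rows per leaf page
rows_per_page_1 = rows / pages_ff1      # ~4 rows per leaf page

# The index's leaf level grew by roughly this factor.
print(round(rows_per_page_100), round(rows_per_page_1))
print(round(pages_ff1 / pages_ff100))   # ~92x more pages
```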
The post Validating Instance-Level Index View and MERGE Optimization Activity appeared first on Joe Sack.
I won’t rehash what they have collectively already covered thoroughly, but here is a quick tip about identifying indexed view and MERGE optimization activity via the sys.dm_exec_query_optimizer_info DMV. The following query shows the counter name and the number of occurrences of optimizations for MERGE statements and indexed views matched since the SQL Server instance last restarted:
SELECT [counter],
[occurrence]
FROM sys.[dm_exec_query_optimizer_info]
WHERE counter IN
('merge stmt',
'indexed views matched');
I see this as a “first cut” check, but there are some key limitations to why this would only be a starting data point and not the be-all and end-all approach.
Even with the limitations, if you see non-zero values for the counters, this might accelerate your investigation and application of the appropriate cumulative update. I prefer keeping up with serious issues in this case, but if you need to prioritize what gets patched in larger environments with thousands of SQL Server instances, this may help drive that prioritization.
The post Columnstore Segment Population Skew appeared first on Joe Sack.
Anyhow, this is a quick post on segment population skew based on parallel nonclustered Columnstore index creations.
I’ll use the same 123,695,104 row FactInternetSales table I used almost a year ago to demonstrate. I’ll create the following nonclustered Columnstore index just on one column, to keep things simple:
CREATE NONCLUSTERED COLUMNSTORE INDEX [NCSI_FactInternetSales] ON [dbo].[FactInternetSales] ( [ProductKey] );
The index takes 31 seconds to create on my laptop and it was created using 8 threads (which I can confirm via the SQL Server execution plan, in this case in SQL Sentry Plan Explorer):
Adding up the actual rows by thread, we get the 123,695,104 row count.
Now if we look at sys.column_store_segments, we can see that the last few segments were populated with less than the maximum 1,048,576 rows:
SELECT [partition_id], [column_id], [segment_id], [row_count]
FROM sys.column_store_segments
WHERE [row_count] < 1048576
      AND [column_id] = 2;
Now the purpose of this short post is to show what happens if we remove parallelism from the overall Columnstore index build (aside from increasing build time and reducing the memory grant):
DROP INDEX [NCSI_FactInternetSales] ON [dbo].[FactInternetSales];
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX [NCSI_FactInternetSales]
ON [dbo].[FactInternetSales] ( [ProductKey] )
WITH (DROP_EXISTING = OFF, MAXDOP = 1);
GO
Now instead of running in 31 seconds with 8 schedulers, this serial index build took (not surprisingly) 2 minutes and 10 seconds to build.
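For what it's worth, the two timings imply the parallel build scaled reasonably but not linearly. Rough numbers (single runs on a laptop, so treat these as ballpark only):

```python
serial_seconds = 130    # MAXDOP 1 build: 2 minutes 10 seconds
parallel_seconds = 31   # 8-thread build
threads = 8

speedup = serial_seconds / parallel_seconds  # ~4.2x faster
efficiency = speedup / threads               # ~52% per-thread efficiency

print(round(speedup, 2), round(efficiency, 2))
```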
How many segments fell beneath the 1,048,576 row count?
This time, just one segment, the last one to be populated, fell short. With 117 segments (segment_id 0 through segment_id 116) populated at 1,048,576 rows each, and 123,695,104 rows in total, our 118th segment (segment_id 117) holds the remaining 1,011,712 rows.
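The arithmetic behind that segment count can be verified directly (1,048,576 being the maximum rows per segment):

```python
ROWGROUP_MAX = 1_048_576
total_rows = 123_695_104

# How many completely full segments, and how many rows spill
# into the final, partially filled segment.
full_segments, remainder = divmod(total_rows, ROWGROUP_MAX)

print(full_segments)  # 117 full segments (segment_id 0 through 116)
print(remainder)      # 1,011,712 rows in the final segment
```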
Should the more tightly packed segments provide meaningful performance gains versus the parallel-built, partially filled version? I haven’t tested this yet, but I will at some point. Let me know if you get a chance to do so before I do. My wild guess would be that the benefit would be minor, at best – but as with most things I would like to see for myself.
The post Exceptions – what sys.dm_db_index_usage_stats doesn’t tell you (Part II) appeared first on Joe Sack.
This post will describe another scenario that you should be aware of (the topic came up today in class while Kimberly was teaching – as we were trying to recall tricks to clearing stats for sys.dm_db_index_usage_stats)…
Imagine that I’ve queried a specific table as follows:
SELECT member_no, lastname, firstname, middleinitial, street, city, state_prov, country
FROM dbo.member
WHERE member_no = 1;
If I check sys.dm_db_index_usage_stats for any reference to the member table, I’ll see the following:
SELECT i.index_id, i.name, u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
    u.object_id = i.object_id AND
    u.index_id = i.index_id
WHERE u.object_id = OBJECT_ID('dbo.member');
This returns:
Now let’s say that you have a weekly rebuild of specific indexes (for example):
ALTER INDEX member_ident
ON dbo.member REBUILD
If I check for usage stats after rebuilding the index (and before anyone has accessed the member table again), the stats have been cleared out for that table.
Why does this matter?
If you’re using the sys.dm_db_index_usage_stats to determine which indexes should be removed, you’re running the risk of making decisions based on recently cleared out statistics. This is similar to the case where a SQL Server instance has been recently restarted. You should not be dropping indexes without knowing whether the accumulated statistics represent the full set of critical workloads.
For tables with frequent index rebuilds, be sure to capture data from sys.dm_db_index_usage_stats before these jobs run. This DMV is definitely a useful tool, but if you’re not careful, you could be dropping indexes based on missing information.
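The "capture before the rebuild runs" advice generalizes to a delta-accumulation pattern: snapshot the counters on a schedule, and when a counter appears to go backwards (a reset from a rebuild or an instance restart), treat the new reading as activity since that reset. A sketch of the bookkeeping, purely illustrative (Python standing in for whatever collection job you use):

```python
def accumulate(samples):
    """Accumulate a monotonically increasing counter (e.g. user_seeks)
    across resets. `samples` is a list of successive raw readings."""
    total, prev = 0, None
    for value in samples:
        if prev is None or value < prev:
            # First sample, or the counter was reset (rebuild/restart):
            # the raw value is new activity since that reset.
            total += value
        else:
            total += value - prev
        prev = value
    return total

# seeks climb to 40, an index rebuild resets them, then they climb to 25
print(accumulate([10, 40, 5, 25]))  # 65
```

Note the same caveat the post makes: any activity between your last pre-reset sample and the reset itself is lost, so sample often enough for the gap to be acceptable.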
A few other noteworthy items:
Thanks!
The post Exceptions – what sys.dm_db_index_usage_stats doesn’t tell you appeared first on Joe Sack.
While I recalled the conversation, I needed to double check who I had actually discussed it with. Admittedly, it’s a short list. After checking with him over email, I realized that conversation had been with MVP, Microsoft RD, MCM Greg Low. He also mentioned that Rob Farley once demonstrated the effect of creating a unique non-clustered index on a column which impacts the query plan but doesn’t actually register that use in sys.dm_db_index_usage_stats.
While the conversation stuck with me, I had never taken time to test it out. In this post, I’ll be testing out a different scenario, but essentially a similar situation where index column stats are used for a query (to provide more accurate estimates) – but its use results in no update to sys.dm_db_index_usage_stats.
Before getting to the demo, I do want to say that I still think sys.dm_db_index_usage_stats is incredibly useful for evaluating usage patterns and helping to identify indexes that aren’t pulling their weight (high cost, low benefit). I also see the following scenario as an edge case, but it is something to be mindful of as one reason why you could all of a sudden see cardinality estimate issues after dropping an index that wasn’t showing as being used at all.
In this demo, I’m using SQL Server 2008 R2 (10.50.1617) and the AdventureWorksDW database. I’ll start off by disabling auto-creation of statistics (you’ll see why shortly). While I see auto-creation enabled more often than not, I have seen cases where it has mindfully been disabled and cases where it was disabled for no good reason at all – a topic for another day:
-- Disable auto-creation of statistics
USE [master]
GO
ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT
GO
Next I’ll execute the following query against the dbo.FactInternetSales table – and I’ve enabled the “Include Actual Execution Plan” in SSMS so I can see the actual plan:
USE [AdventureWorksDW]
GO
-- Estimated rows (no stats on TaxAmt) = 3,853
-- Actual rows = 562
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
We can check the estimated versus actual rows in SSMS, but I’ll actually show the results in SQL Sentry Plan Explorer because I like the tabular format (and I don’t have to “hover” to see it). The following screen shot is from the Plan Tree tab:
As you can see, the estimated row count was 3,853 versus 562 actual rows. Now recall that automatic creation of statistics is disabled, so I’m going to manually create the following index:
-- Create an index on TaxAmt (which means we also get stats)
CREATE INDEX IX_FactInternetSales_TaxAmt ON
dbo.FactInternetSales (TaxAmt)
If I re-execute the previous query against dbo.FactInternetSales, I see the following:
Now our estimates match the actual results. Notice that a table scan was still chosen; only the estimate improved (imagine the disparity, and the associated impact, had this table contained significantly more rows). In essence, though, the query did use the statistics associated with that index, so what about index usage stats?
If I execute the following query, I get no results at all:
-- Was the index used? Not according to sys.dm_db_index_usage_stats
SELECT u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
u.object_id = i.object_id AND
u.index_id = i.index_id
WHERE u.object_id=object_id('dbo.FactInternetSales') AND
i.name = 'IX_FactInternetSales_TaxAmt'
So let’s drop the index we just created and see what happens:
USE [AdventureWorksDW]
GO
DROP INDEX [IX_FactInternetSales_TaxAmt] ON [dbo].[FactInternetSales] WITH ( ONLINE = OFF )
GO
-- Estimated rows (index dropped) = 3,853
-- Actual rows = 562
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
As you may expect, we’re back to the estimation issue:
As a final test, let’s enable auto-creation of statistics and re-execute the query (again, with no supporting index statistics):
USE [master]
GO
ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT
GO
USE [AdventureWorksDW]
GO
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
The plan shows that actual versus estimated matches again (without the index) because auto-creation of statistics occurred in the background:
And I can validate if auto-statistics were created as follows:
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) auto_stats_date
FROM sys.stats s
INNER JOIN sys.stats_columns c ON
s.stats_id = c.stats_id AND
s.object_id = c.object_id
WHERE s.object_id = object_id('dbo.FactInternetSales') AND
s.auto_created = 1 AND
c.column_id = 20 -- TaxAmt
And indeed – this was the case:
Will this stop me from using sys.dm_db_index_usage_stats to identify high cost/low benefit indexes?
Absolutely not. The potential benefit of identifying and eliminating wasteful indexing is too great, and this is a fantastic (but not perfect) method to use in assessing an index's value.
However – I will also be mindful of such scenarios. If someone tells me that plans have turned for the worse after an index cleanup operation, I’ll validate this very scenario. And even if I found this as the root cause, my bias would revolve around creating the needed statistics, rather than creating an index that is not used for actual data access.
Wish list time… One “nice to have” for a future version of sys.dm_db_index_usage_stats would be to add a stats_lookup bigint column and a last_stats_lookup datetime column. I would see it as a great way to ensure we address indexes that are used exclusively for the index column statistics associated with it.
The post Clearing “missing index” suggestions for a single table appeared first on Joe Sack.
While we don’t have this direct option, there is an interesting behavior that I’ll demonstrate in this blog post using SQL Server 2008 R2 (10.50.1617).
Right after restarting SQL Server, I run the following query to generate a few entries in the missing index DMVs (each query spawns a missing index recommendation):
USE AdventureWorks
GO
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE UnitPriceDiscount = 0.40
GO
SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE LineTotal = 236.421500
GO
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE TaxAmt = '10.316'
GO
Next I execute the following query to see the three entries in the missing index DMVs:
SELECT s.last_user_seek,
d.object_id,
d.equality_columns,
d.inequality_columns,
d.included_columns,
d.statement,
s.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS s
INNER JOIN sys.dm_db_missing_index_groups AS g
ON (s.group_handle = g.index_group_handle)
INNER JOIN sys.dm_db_missing_index_details AS d
ON (g.index_handle = d.index_handle)
This returns:
So I have three index suggestions so far. What happens if I create one of the suggested indexes for the SalesOrderDetail table?
USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX IDX_SalesOrderDetail_UnitPriceDiscount
ON [Sales].[SalesOrderDetail] ([UnitPriceDiscount])
INCLUDE ([ProductID])
GO
If I re-execute the query against the missing index DMVs – I’ll now see that BOTH suggestions from the SalesOrderDetail table got cleared out (even though I only added one of the suggestions), leaving behind the suggestion for the other table (SalesOrderHeader):
So by virtue of adding an index to SalesOrderDetail, it clears out BOTH suggestions for that table.
And if I re-execute the query for the LineTotal column on SalesOrderDetail (that was not yet indexed), the missing index entry pops back in (since we didn’t create an index for this):
And just to test from another direction, I'll add the suggested index on the SalesOrderHeader table:
USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX IDX_SalesOrderHeader_TaxAmt
ON [Sales].[SalesOrderHeader] ([TaxAmt])
Re-running the query against the missing index DMVs, sure enough, the SalesOrderHeader suggestion is removed, leaving the SalesOrderDetail suggestion (for the one index we haven’t yet created):
So, in a nutshell, the missing index views aren’t as static as they may appear. If you’re adding indexes, you can expect suggestions for that table to disappear (without waiting for a SQL Server restart), and this happens whether or not you created an index for each suggestion on that table. I can understand the trade-offs of this behavior, especially since a new index on a table can affect future execution plans; essentially, one index creation “resets” the suggestions for that table. But it also means you should be sampling these DMVs over time rather than relying fully on a single sampling.