Thinking about the inaccurate_cardinality_estimate Event

SQL Server 2012 introduces the new inaccurate_cardinality_estimate extended event, and I had a few miscellaneous thoughts about it that I wanted to discuss on this post.

Background reading

If you’re unfamiliar with this event, there isn’t anything as of today in Books Online, but there are two blog references / discussions:

How the query execution engine calculates the threshold it uses to detect inaccurate cardinality estimation and fire the inaccurate_cardinality_estimate extended event? – By my friend and former PFE colleague Nacho Alonso Portillo (blog)

Extended Events – inaccurate_cardinality_estimate – David Ballantyne’s (blog | @davebally)

Give these both a read before digging in further on this event.  The event output of “actual_rows” and “estimated_rows” can be a bit confusing at first, so Nacho walks through the algorithm for when this event fires.

When should (or could) you use this event?

The primary tool I use for detecting cardinality estimates is in using the Actual (not estimated) query execution plan.  Pulling a plan from cache isn’t helpful, because it is just the estimated plan, so the best scenario for troubleshooting suspected cardinality estimate issues is to generate the actual plan.

And yet, there are scenarios where it isn’t practical or possible.  For example – what if the suspected cardinality estimate is related to a long running data modification statement?  And what if the issue only occurs in production?  You may not be able to actually capture the Actual plan in a contained way in SQL Server Management Studio without polluting production data. And yes, you could capture the plan via “Showplan XML Statistics Profile” or “Showplan Statistics Profile” – but at what cost to your production activity (remember the “observer overhead”).  Those are events that you should be very careful about capturing in a production environment.

But this leads us to a similar cost/benefit discussion of the inaccurate_cardinality_estimate event.  The description from Microsoft of this event is very clear about it’s overhead:

“Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.”

This event can help us spot cardinality estimate issues for live activity – and even map to the nodes in the query plan tree (more on this later) – but Microsoft is very clear that there can be “significant” performance overhead.  This is similar to the warning about pulling the execution plan too, by the way: “Occurs after a SQL statement is executed. This event returns an XML representation of the actual query plan. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.”.

So if you’re going to use this new inaccurate_cardinality_estimate event or other execution plan related events, don’t keep this running for a long period of time – and think carefully before enabling this on very high throughput systems or those systems running near to capacity.  What is the overhead percentage?  I haven’t tested this specifically using the inaccurate_cardinality_estimate extended in isolation, but I may do so eventually.  It isn’t urgent, as Jonathan Kehayias (blog | @sqlpoolboy) has already done plenty of work in this area (see Measuring “Observer Overhead” of SQL Trace vs. Extended Events”) to establish overhead with even some of the more standard events.

Where is inaccurate_cardinality_estimate in the GUI?  I don’t see it.

David Ballantyne (blog | @davebally) created a Connect item on this subject, and Microsoft noted that this was a debug event, requiring the debug channel to be enabled (checked off) in order to see it.

snaghtmlfbfd75 thumb Thinking about the inaccurate cardinality estimate Event

And of course, you can skip the New Session Wizard and just reference it directly in your CREATE EVENT SESSION definition.  For example:

 
CREATE EVENT SESSION [Track_CE_Issues] ON SERVER
ADD EVENT sqlserver.inaccurate_cardinality_estimate
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE=ON);
GO

What’s this “nodes” thing you mentioned earlier?

I’ll demonstrate with the following query in the Credit database that has a significant enough cardinality estimate skew to trigger the inaccurate_cardinality_estimate event:

    
USE [Credit];
GO

DECLARE @Column INT = 2,
    @Value INT = 10;

SELECT  [member].[member_no],
        [member].[street],
        [member].[city],
        [charge].[charge_no],
        [charge].[provider_no],
        [charge].[category_no],
        [charge].[charge_dt],
        [charge].[charge_amt],
        [charge].[charge_code]
FROM    [dbo].[charge]
INNER JOIN [dbo].[member]
        ON [member].[member_no] = [charge].[member_no]
WHERE   CHOOSE(@Column, [charge].[provider_no], [charge].[category_no]) = @Value;
GO

SQL Sentry Plan Explorer shows the following:

image thumb Thinking about the inaccurate cardinality estimate Event

I added in Node ID in the Plan Tree tab for cross reference.

As for the captured inaccurate_cardinality_estimate event data, we see the following:

image thumb Thinking about the inaccurate cardinality estimate Event

7 event rows were captured with seven pieces of information, including the estimated vs. actual rows (and again, read Nacho’s blog for the reason why actual_rows doesn’t match the final “actual rows” value).  We also see the node_id and thread_id associated with each event. (As an aside – if you’re wondering about the estimated_rows (4532 vs. 36257, for example) – note that this was a parallel execution plan which used 8 threads.)

You see that the operators associated with the firing include node_id 5 (Clustered Index Scan on [charge].[ChargePK]) and node_id 1 (Hash Match).  Also notice that while there are skews for the parallelism-related operators, they didn’t fire inaccurate_cardinality_estimate, which seems acceptable to me because the skew I really care about originates at node_id 5.

image thumb Thinking about the inaccurate cardinality estimate Event

Anything else?

This post was just a mix of thoughts about this new event.  Time will tell if it is practical (or safe) enough to use with any significant frequency.

SQLPerformance.com “TRANSACTION_MUTEX and Multi-Session Transaction Access”

My third guest blog post was published today on SQLPerformance.com:

TRANSACTION_MUTEX and Multi-Session Transaction Access

This is related to a recent SQL Server instance I was working with that had high TRANSACTION_MUTEX accumulated wait time.  Based on this observation, I explored a three different functionality areas that could be associated with this wait type.

Exploring Column Correlation and Cardinality Estimates

Last Thursday I presented a session at the PASS Winter 2012 Performance Palooza.  It was a great experience and I appreciated the opportunity.  The topic was “Troubleshooting Query Plan Quality Issues” and I received a few email questions after the presentation, so I thought I would walk through the full scenario, weaving in a few additional points that were motivated by the questions I received.

First, let me set up the scenario.  I used the Credit database, which you can download here

The first T-SQL I executed updated the member table with ten different city and state_prov combinations:

USE Credit;
GO

UPDATE  [dbo].[member]
SET     [city] = ‘Minneapolis’,
        [state_prov] = ‘MN’
WHERE   [member_no] % 10 = 0;

UPDATE  [dbo].[member]
SET     [city] = ‘New York’,
        [state_prov] = ‘NY’
WHERE   [member_no] % 10 = 1;

UPDATE  [dbo].[member]
SET     [city] = ‘Chicago’,
        [state_prov] = ‘IL’
WHERE   [member_no] % 10 = 2;

UPDATE  [dbo].[member]
SET     [city] = ‘Houston’,
        [state_prov] = ‘TX’
WHERE   [member_no] % 10 = 3;

UPDATE  [dbo].[member]
SET     [city] = ‘Philadelphia’,
        [state_prov] = ‘PA’
WHERE   [member_no] % 10 = 4;

UPDATE  [dbo].[member]
SET     [city] = ‘Phoenix’,
        [state_prov] = ‘AZ’
WHERE   [member_no] % 10 = 5;

UPDATE  [dbo].[member]
SET     [city] = ‘San Antonio’,
        [state_prov] = ‘TX’
WHERE   [member_no] % 10 = 6;

UPDATE  [dbo].[member]
SET     [city] = ‘San Diego’,
        [state_prov] = ‘CA’
WHERE   [member_no] % 10 = 7;

UPDATE  [dbo].[member]
SET     [city] = ‘Dallas’,
        [state_prov] = ‘TX’
WHERE   [member_no] % 10 = 8;
GO

Next, with “Include Actual Execution Plan” enabled, I executed the following query:

SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WHERE   [city] = ‘Minneapolis’;
GO

Looking at the estimated rows versus actual (using SQL Sentry Plan Explorer), I see that the estimate is spot-on with 1,000 rows estimated and 1,000 rows actual:

image thumb Exploring Column Correlation and Cardinality Estimates

Now my database has statistics auto-updates enabled, so even though I don’t have a supporting index on the city column, I do have supporting statistics (which were created in conjunction with my query execution):

EXEC dbo.sp_helpstats ‘member’;

image thumb Exploring Column Correlation and Cardinality Estimates

Looking at the statistics information via DBCC SHOW_STATISTICS, I see the following STAT_HEADER, DENSITY_VECTOR, and HISTOGRAM information (highlighting the Minneapolis histogram step):

DBCC SHOW_STATISTICS(‘member’, ‘_WA_Sys_00000006_0CBAE877′);

image thumb Exploring Column Correlation and Cardinality Estimates

So we see the DENSITY_VECTOR shows an “all density” value of 0.1 and we also see a Minneapolis RANGE_HI_KEY histogram step with an EQ_ROWS value of 1000.

Next, I executed the following query, looking at city AND state_prov:

SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WHERE   [city] = ‘Minneapolis’ AND
        [state_prov] = ‘MN’
OPTION (RECOMPILE);
GO

Now I personally know that these two columns are correlated, but the query optimizer does not.  The query optimizer assumes that these two columns are independent.  Here is the estimated versus actual for this query:

image thumb Exploring Column Correlation and Cardinality Estimates

We see an estimate of 100 rows, versus the actual 1,000 rows.

We also have new statistics generated for the state_prov column:

EXEC dbo.sp_helpstats ‘member’;

image thumb Exploring Column Correlation and Cardinality Estimates

And notice that the statistics_keys are for single-column statistics.  SQL Server does not automatically generate multi-column statistics.

Looking at the statistics information via DBCC SHOW_STATISTICS, I see the following STAT_HEADER, DENSITY_VECTOR, and HISTOGRAM information (highlighting the MN histogram step):

DBCC SHOW_STATISTICS(‘member’, ‘_WA_Sys_00000007_0CBAE877′);

image thumb Exploring Column Correlation and Cardinality Estimates

The “all density” value for state_prov is 0.125 (and notice that unlike with city, we have one state that has 3,000 EQ_ROWS value, for TX).

So the MN step shows 1,000 rows out of 10,000 rows (10%).  And the Minneapolis step shows 1,000 rows out of 10,000 rows (10%).  But SQL Server is not assuming that these two columns are correlated, and so we end up with an estimate of 1% of the rows (0.10 * 0.10).  And while this is a small scale example, imagine this for much larger skews.  What kind of impact could this have on the query execution plan?  And how many times do you have predicates referencing correlated columns in your query? 

Now, to help out the query optimizer, I can manually create multi-column statistics on city and state_prov:

CREATE STATISTICS [member_city_state_prov]
ON [dbo].[member]([city],[state_prov]);
GO

If I re-execute my original query with city and state_prov predicates, I see that my estimates are now exactly correct:

image thumb Exploring Column Correlation and Cardinality Estimates

But this isn’t the end of the story, because if you look at the STAT_HEADER, DENSITY_VECTOR, and HISTOGRAM of the manually created statistics, you’ll see the following:

image thumb Exploring Column Correlation and Cardinality Estimates

Notice that the DENSITY_VECTOR shows two rows – one with city and one with city, state_prov.  Both show an “all density” of 0.1 – which reflects our correlation between the two columns.

But also notice that the HISTOGRAM does NOT show multi-column steps.  It just shows the leading statistics key column, city – with steps equal to the various city values.  So in the case of Minneapolis, MN – the “all density” value was correct.

What about a scenario where I pick a mismatched city and state_prov combination (Minneapolis and Texas)?

SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WHERE   [city] = ‘Minneapolis’ AND
        [state_prov] = ‘TX’
OPTION (RECOMPILE);
GO

This time we get the following cardinality estimate skew:

image thumb Exploring Column Correlation and Cardinality Estimates

We estimated the rows based on DENSITY_VECTOR, but without a multi-column HISTOGRAM, the query optimizer doesn’t know that there are no Minneapolis city and Texas state_prov rows.  So while multi-column statistics can be helpful, there are limits.

Now what if I drop my statistics and add in a multi-column index instead? 

DROP STATISTICS  [dbo].[member].[member_city_state_prov];
GO

CREATE INDEX [member_city_state_prov]
ON [dbo].[member]([city],[state_prov]);
GO

The multi-column index will provide me with the same results for the Minneapolis / MN combination – as well as the same skew for the Minneapolis / TX combo.

One question I received was around the index choice after I created the index on city and state_prov.  Why didn’t that index get used via an index seek?

image thumb Exploring Column Correlation and Cardinality Estimates

Well, the new index did get used from a cardinality estimate perspective, but the final plan choice involved a clustered index scan.  The warning indicator you see on the SELECT was for the following missing index:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[member] ([city],[state_prov])
INCLUDE ([lastname],[firstname])
GO

This is where we should explore the cost alternatives.  Below shows the side-by-side costs of a Clustered Index Scan, a (forced) bookmark lookup using the index I created earlier just on city and state, and then finally the missing index suggestion with the additional INCLUDE columns (using INDEX hints to force the three different options):

– Clustered index scan
SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WITH (INDEX = [member_ident])
WHERE   [city] = ‘Minneapolis’ AND
        [state_prov] = ‘MN’
OPTION (RECOMPILE);
GO

image thumb Exploring Column Correlation and Cardinality Estimates

– Non-covering NCI
SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WITH (INDEX = [member_city_state_prov])
WHERE   [city] = ‘Minneapolis’ AND
        [state_prov] = ‘MN’
OPTION (RECOMPILE);
GO

image thumb Exploring Column Correlation and Cardinality Estimates

– Covering NCI
SELECT  [lastname],
        [firstname]
FROM    [dbo].[member]
WITH (INDEX = [member_city_state_prov_2])
WHERE   [city] = ‘Minneapolis’ AND
        [state_prov] = ‘MN’
OPTION (RECOMPILE);
GO

image thumb Exploring Column Correlation and Cardinality Estimates

So while the query optimizer used the index statistics for my cardinality estimate, the estimated cost of a Clustered Index Scan was 0.011 versus using the non-covering nonclustered index and key lookup cost of 0.158.  And of course, the fully covering index was the cheapest estimated cost out of the three – at 0.001, although whether it makes sense holistically to accommodate that one query with a covering index is another topic altogether.

New Course: “SQL Server: Common Performance Issue Patterns”

My latest course, “SQL Server: Common Performance Issue Patterns  was just published by Pluralsight.com.

The course description is as follows:

“There are a wealth of problems that can affect the performance of SQL Server workloads, and Joe shows you more than 35 common performance issue patterns. With 17 detailed demos, you’re shown how to recognize each pattern along with practical troubleshooting guidance that you can use. The course starts with high-level issues around people and practices, before moving to technical areas like I/O, concurrency, memory, and CPU. The course also has modules on specific areas like tempdb, application design, and factors that can affect query plan quality. The information that Joe presents is perfect for developers, DBAs, and anyone responsible for SQL Server, from complete beginners through to those with more experience who want a fresh way to troubleshoot SQL Server. It is applicable to all versions from SQL Server 2005 onwards.”

I really enjoyed creating this course and I hope you find it to be useful! Click here to get started.

Simple-Talk Article: “Fixing Gatekeeper Row Cardinality Estimate Issues”

I wrote a new article for Simple-Talk that was published recently:

Fixing Gatekeeper Row Cardinality Estimate Issues

In the article I talk about a specific cardinality estimation issue you may see for relational data warehouse queries (specifically against star schemas) and a few recommended solutions that may help eliminate or at least reduce the overall impact to query plan quality and performance.

SQLPerformance.com “Ten Common Threats to Execution Plan Quality”

My second guest blog post was published today on SQLPerformance.com:

Ten Common Threats to Execution Plan Quality

This is related to the subject of my recent SQLSaturday presentation and this is a big subject with a real tactical benefit from a performance tuning perspective, so I’ll be expanding it out over time in various formats (including other SQLSaturday deliveries and a separate Pluralsight course).

Distributed Query Plan Quality and SQL Server 2012 SP1

SQL Server 2012 Service Pack 1 is out today and it fixes an issue that has been around for several versions regarding distributed queries and poor quality execution plans due to bad cardinality estimates when the distributed query principal had insufficient permissions to gather the applicable statistics.  The description of this fix can be found here –> New or Enhanced Features in SQL Server 2012.

So does the fix work?  Based on one test I cobbled together today, it certainly seems so.

I used the following query for my test:

SELECT  1.[charge_no],
        1.[member_no],
        1.[provider_no],
        1.[category_no],
        1.[charge_dt],
        1.[charge_amt],
        1.[statement_no],
        1.[charge_code]
FROM    [dbo].[charge] AS c
INNER JOIN [JOSEPHSACK-PC\TIBERIUS].Credit.dbo.charge AS rc
        ON 1.[charge_no] = [rc].[charge_no]
WHERE   [rc].[member_no] = 7894
OPTION  (RECOMPILE);
GO

I’m joining the charge tables across two SQL Server 2012 SP1 instances.  My linked server account only has SELECT permission on the charge table on the remote server.

If I enable trace flag 9485 on the destination SQL Server instance (hosting the remote data) for my session, I can see the legacy behavior in the plan (showing estimates vs. actuals using SQL Sentry Plan Explorer):

image thumb Distributed Query Plan Quality and SQL Server 2012 SP1

image thumb Distributed Query Plan Quality and SQL Server 2012 SP1

Now compare this to a plan with the 9485 flag turned off on the remote SQL Server 2012 SP1 instance:

image thumb Distributed Query Plan Quality and SQL Server 2012 SP1

image thumb Distributed Query Plan Quality and SQL Server 2012 SP1

So my user account for the linked server no longer requires broader permissions to gather statistics – in SQL Server 2012 SP1.

Thank you to the SQL Server product team for getting in this change!  This issue has sprung up several times over the years for my own clients and its good to know that there is a supported solution once folks upgrade.

Avoid false negatives when comparing sys.dm_io_virtual_file_stats data to perfmon counter data

 

A quick and somewhat nuanced tip today… When evaluating sys.dm_io_virtual_file_stats against collected perfmon counter data, be careful not to erroneously invalidate the latency findings by measuring and comparing against the averages of already-averaged collected performance monitor data (e.g. Avg. Disk sec/Write, Avg. Disk sec/Read ).  I’ve seen scenarios where there is true I/O latency that has been identified in dm_io_virtual_file_stats, but when the DBA goes to other teams to discuss, they lose the argument due to bad comparisons.

To illustrate the point, I created a new database on one of my older, non-RAID USB drives:

CREATE DATABASE [SampleIOHammering]
CONTAINMENT = NONE
ON  PRIMARY
( NAME = N'SampleIOHammering',
FILENAME = N'F:\SampleIOHammering.mdf' ,
SIZE = 8388608KB , FILEGROWTH = 1048576KB )
LOG ON
( NAME = N'SampleIOHammering_log',
FILENAME = N'F:\SampleIOHammering_log.ldf' ,
SIZE = 1048576KB ,
FILEGROWTH = 1048576KB );
GO

ALTER DATABASE [SampleIOHammering] SET RECOVERY SIMPLE;
GO

I’ve intentionally used slow disk and the database creation operation took 7 minutes and 5 seconds (no IFI being used).

Immediately after creating the database, I executed the following query (concentrating on write activity):

SELECT  file_id,
io_stall_write_ms,
num_of_writes
FROM    sys.dm_io_virtual_file_stats(DB_ID('SampleIOHammering'), NULL);

This shows the following results – which don’t truly represent the full lifecycle of the file creation and the associated latency:

file_id io_stall_write_ms num_of_writes
1 76 10
2 25 15

How can I tell that the accumulated stats are representative of the actual database creation?  I did so through associated perfmon counters which were being collected via a data collector set concurrently… Here were the averages for the drive over the database creation period (and this is the average over the averages):

image thumb Avoid false negatives when comparing sys.dm io virtual file stats data to perfmon counter data

So we’re definitely seeing latency that is beyond the recommended number represented in average disk seconds per write, and if anything, that latency isn’t yet fully reflected in the virtual file stats DMV (yet).

Next I created a new table and loaded it with 1.6 million rows from another database:

USE [SampleIOHammering];
GO

CREATE TABLE [dbo].[charge]
(
[charge_no] INT NOT NULL,
[member_no] INT NOT NULL,
[provider_no] INT NOT NULL,
[category_no] INT NOT NULL,
[charge_dt] 
2013-05-23, 31
NOT NULL, [charge_amt] [money] NOT NULL, [statement_no] INT NOT NULL, [charge_code] CHAR(2) NOT NULL ); GO INSERT  [dbo].[charge] SELECT  1.[charge_no], 1.[member_no], 1.[provider_no], 1.[category_no], 1.[charge_dt], 1.[charge_amt], 1.[statement_no], 1.[charge_code] FROM    [Credit].[dbo].[charge] AS c; GO

This shows the following virtual stats results:

file_id io_stall_write_ms num_of_writes
1 8917 79
2 575633 3783

Unlike the database creation, we’re now seeing some real stat accumulations. If we divide the IO stall write ms by the number of writes, we see about 152 ms as the average latency for the transaction log file (file 2).

And now if we look at perfmon counters, we see the following associated latency data averages across the collection time period for my INSERT:

image thumb Avoid false negatives when comparing sys.dm io virtual file stats data to perfmon counter data

Focusing on Avg. Disk sec/Write – we see 52 milliseconds certainly isn’t ideal – but at this point you may have the storage or Windows admin say something like “This doesn’t match your average of 152 milliseconds…  Your DMV must be wrong.”

But what if you switched from using an average of  the perfmon data to the maximum?

image thumb Avoid false negatives when comparing sys.dm io virtual file stats data to perfmon counter data

Making this change shifts the numbers as follows:

image thumb Avoid false negatives when comparing sys.dm io virtual file stats data to perfmon counter data

Notice now that the maximum average disk seconds per write is 156 milliseconds at peak – which is more in the ballpark of our 152 millisecond average we saw from a virtual file stats perspective.  Granted it is the maximum measurement – but it brings us back into alignment with the two different data sources. Again, this is a nuanced point – but I’ve noticed that sometimes non-SQL folks are doubtful about the SQL perspective of things, and so it helps to make sure you are comparing data in a fair way.

Exploring Semantic Search Key Term Relevance

I missed that this got published last week, so here is just a quick pointer to an article I wrote regarding the relevance of SQL Server 2012’s Semantic Search functionality:

Exploring Semantic Search Key Term Relevance

In this article I investigate the general efficacy of the new functionality and come to some conclusions based on searching a library of 1,388 articles provided to me via Simple-Talk.

SQLPerformance.com “Configuring a Dedicated Network for Availability Group Communication”

My first guest blog post published today on SQLPerformance.com:

Configuring a Dedicated Network for Availability Group Communication

The subject was originally motivated by a question I received at a recent training event, regarding how one would go about using a dedicated network for the endpoints used in availability group communication.

Cheers!