sqlskills-logo-2015-white.png

Detecting Cardinality Estimate Issues with sys.dm_exec_query_stats

SQL Server 2008 R2 SP1 and SQL Server 2012 RTM+ includes actual row count statistics in the sys.dm_exec_query_stats dynamic management view.  This includes the total_rows, last_rows, min_rows and max_rows columns, which are intuitively named.

One potential use of the row statistics data is to detect cardinality estimate issues where the actual count of rows is significantly different from the query plan’s estimated rows.  While I can see using this information as a valid approach, there are limits, and I’ll walk through them in this post.

To illustrate the use cases, I’m going to use the Credit database to execute three different queries.  I’ll show you the plan tree tab from SQL Sentry Plan Explorer for each query.

Query 1: No Cardinality Estimate Issue

SELECT  region.region_name,
        member.lastname,
        member.firstname,
        member.member_no
FROM    dbo.member
INNER JOIN dbo.region
        ON region.region_no = member.region_no
WHERE   region.region_no = 9;
GO

The plan:

image

For each operation in this query, the estimated rows match the actual rows.

Query 2: Cardinality Estimate Issue, Leaf-Level + Final Operator

DECLARE @Column INT = 2,
    @Value INT = 10;

SELECT  [member].[member_no],
        [member].[street],
        [member].[city],
        [charge].[charge_no],
        [charge].[provider_no],
        [charge].[category_no],
        [charge].[charge_dt],
        [charge].[charge_amt],
        [charge].[charge_code]
FROM    [dbo].[charge]
INNER JOIN [dbo].[member]
        ON [member].[member_no] = [charge].[member_no]
WHERE   CHOOSE(@Column, [charge].[provider_no], [charge].[category_no]) = @Value;
GO

The plan:

image

Unlike the previous query, for this query we have leaf-level estimated vs. actual row count skews and skews that flow up to the final operation.

Query 3: Cardinality Estimate Leaf-Level Skew and No Skew for Root Operator

SELECT TOP (1000)
        [member].[member_no],
        [member].[lastname],
        [member].[firstname],
        [region].[region_no],
        [region].[region_name],
        [provider].[provider_name],
        [category].[category_desc],
        [charge].[charge_no],
        [charge].[provider_no],
        [charge].[category_no],
        [charge].[charge_dt],
        [charge].[charge_amt],
        [charge].[charge_code]
FROM    [dbo].[provider]
INNER JOIN [dbo].[charge]
        ON [provider].[provider_no] = [charge].[provider_no]
INNER JOIN [dbo].[member]
        ON [member].[member_no] = [charge].[member_no]
INNER JOIN [dbo].[region]
        ON [region].[region_no] = [member].[region_no]
INNER JOIN [dbo].[category]
        ON [category].[category_no] = [charge].[category_no];
GO

The plan:

image

For this third query, we see leaf-level skews (and intermediate-level for the Hash Match), but then the root of the plan does NOT have a skew (estimated 1,000 vs. actual 1,000).

Detecting Issues

So in my next query, I pull the estimated rows out from the query execution plan of the three queries and compare it to the last actual row count values from sys.dm_exec_query_stats (and I’m keeping this example query as simple as possible):

SELECT  t.text,
		p.[query_plan],
		s.[last_execution_time],
		p.[query_plan].value('(//@EstimateRows)[1]', 'varchar(128)') AS [estimated_rows],
		s.[last_rows]
FROM    sys.[dm_exec_query_stats] AS [s]
CROSS APPLY sys.[dm_exec_sql_text](sql_handle) AS [t]
CROSS APPLY sys.[dm_exec_query_plan](plan_handle) AS [p]
WHERE   DATEDIFF(mi, s.[last_execution_time], GETDATE()) < 1
GO

The results are as follows:

image

So what do we see here?  We show a final skew for just one query (Query #2).  The other two queries show no final skew.  For Query #1 – that is an accurate assessment, but for Query #3 the underlying skew is hidden since we’re only looking at the final operator estimated rows vs. actual.  So our actual row count stats are still useful – but for only one specific skew scenario.

Why Does this Matter?

Even if the query optimizer estimates final row counts accurately, the leaf-level skews can drive performance issues.  For example – for leaf-level or intermediate skews, you may be under-estimating memory grants sizes (risk of spills) or actually over-estimating memory grant sizes (potential concurrency issues for larger grant queries).  Another example may be the seek vs scan decision, with unnecessary I/O being driven by leaf-level skews.

So to summarize, the additional sys.dm_exec_query_stats data is useful for situations where we can detect the skew at the root of the query plan, but it doesn’t help you identify leaf-level and intermediate-level skews.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.