This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series:

In the previous post, I looked at root-level cardinality estimates vs. actual row counts across a set of 28 queries).  9 plans had improved cardinality but two of them had significant changes.  One of the plans for the new CE version had a root-level estimate of 11,199 rows vs. 1,468 actual rows.  Compare that to the original estimate of 348 vs. 1,468.

The query in-question was as follows:

SELECT AVG(OrderQty) AS 'Average Quantity',
NonDiscountSales = (OrderQty * UnitPrice)
FROM Sales.SalesOrderDetail
GROUP BY (OrderQty * UnitPrice)
ORDER BY (OrderQty * UnitPrice) DESC;

The pre-SQL Server 2014 CTP2 query plan was as follows (via SQL Sentry Plan Explorer, showing estimated rows):

image thumb18 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

And here was the new version of the plan (using the new Cardinality Estimator):

image thumb19 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

You can see that at the leaf-level, the Cluster Index Scan estimates remain the same, as does the consecutive Compute Scalar. Where it changes is in the Hash Match (Aggregate) operator – with the new CE estimating 11,200 rows versus the old estimate of 348.

Out of curiosity, I changed the original query to group by just OrderQty and another query to just group by UnitPrice (in other words – was the estimate issue related to our OrderQty * UnitPrice expression, or were the skews introduces at the single-column level?) Here was the OrderQty grouping query and plan tree (ignoring the useless nature of averaging the column you’re already group by):

SELECT AVG(OrderQty) AS 'Average Quantity',
OrderQty
FROM Sales.SalesOrderDetail
GROUP BY OrderQty
ORDER BY OrderQty  DESC;

image thumb20 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

And here was the UnitPrice query and plan tree:

SELECT AVG(OrderQty) AS 'Average Quantity',
UnitPrice
FROM Sales.SalesOrderDetail
GROUP BY UnitPrice
ORDER BY UnitPrice DESC;

image thumb21 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

No large skews for the prior two queries, compared to our original (OrderQty * UnitPrice) skew – shown below in plan tree form:

image thumb22 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

Aside from the standard methods of evaluating single-column statistics (do they exist, are they up to date, do I need to change the sampling) – I then wondered if some of the legacy methods for helping improve estimates would still apply?  For example, would adding a calculated column help?

ALTER TABLE Sales.[SalesOrderDetail]
ADD [NonDiscountSales] AS (OrderQty * UnitPrice);

Re-executing the original query, I saw the following plan tree (with auto-created stats to the rescue associated with the calculated column):

image thumb26 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

I wish I could look at the query_optimizer_estimate_cardinality results for the old version of the CE to compare the event output to the new CE.  I know it isn’t possible, but it would have been more helpful to understand what changed regarding the Hash Match (Aggregate) estimate.  As an aside, the exact math behind the estimates are hidden both in the new and old versions – so perhaps a comparison of calculators in this scenario would still be limited?

Regarding XE information, there were 12 query_optimizer_estimate_cardinality events raised for query with the larger estimate skew.  The second event out of the 12 events raised had the following calculator output (and it is interesting to see the SingleColumnStat reference in this output followed by two column names – but I believe this just means both single-column stats were loaded):

<CalculatorList>
<DistinctCountCalculator CalculatorName=”CDVCPlanProjectOrGroupBy”>
<SubCalculator Role=”DistinctCountCalculatorToDelegate”>
<DistinctCountCalculator CalculatorName=”CDVCPlanLeaf” SingleColumnStat=”OrderQty,UnitPrice” />
</SubCalculator>
</DistinctCountCalculator>
</CalculatorList>

And the stats collection was as follows:

<StatsCollection Name=”CStCollGroupBy” Id=”3″ Card=”11199.81“>
<LoadedStats>
<StatsInfo DbId=”7″ ObjectId=”898102240″ StatsId=”5″ />
<StatsInfo DbId=”7″ ObjectId=”898102240″ StatsId=”6″ />
</LoadedStats>
</StatsCollection>

The stats collection was interesting to me as it was our first appearance out of the 12 events of the GROUP BY estimate – 11,199.81 (show earlier in the plan tree rounded up).  The loaded stats StatsId of 5 was for the UnitPrice column and the StatsId of 6 was for the OrderQty column.

On the subject of SingleColumnStat, I wondered if creating multi-column stats (another legacy method for improving CE) on UnitPrice and OrderQty would be helpful.  I dropped the old calculated column and created the following multi-column stats instead:

CREATE STATISTICS [stats_SalesOrderDetail_OrderQty_UnitPrice] ON
Sales.[SalesOrderDetail](UnitPrice, OrderQty);

This answer is, multi-column stats were indeed helpful:

image thumb25 Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

We do see now a reference to MultiColumnStat in the CalculatorList for one of the query_optimizer_estimate_cardinality events:

<CalculatorList>
<DistinctCountCalculator CalculatorName=”CDVCPlanProjectOrGroupBy”>
<SubCalculator Role=”DistinctCountCalculatorToDelegate”>
<DistinctCountCalculator CalculatorName=”CDVCPlanLeaf” MultiColumnStat=”OrderQty,UnitPrice” />
</SubCalculator>
</DistinctCountCalculator>
</CalculatorList>

And we see the new stats object was loaded instead of looking at the two individual stats objects:

<StatsCollection Name=”CStCollGroupBy” Id=”3″ Card=”1600.00″>
<LoadedStats>
<StatsInfo DbId=”7″ ObjectId=”898102240″ StatsId=”10″ />
</LoadedStats>
</StatsCollection>

So we see that with skews introduced in the new CE – similar to previous versions and depending on the query and schema, multi-column stats and calculated columns can still be helpful in reducing the CE skew.

To be continued…