This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series:<\/p>\n
In the previous post, I looked at root-level cardinality estimates vs. actual row counts across a set of 28 queries).\u00a0 9 plans had improved cardinality but two of them had significant changes.\u00a0 One of the plans for the new CE version had a root-level estimate of 11,199 rows vs. 1,468 actual rows.\u00a0 Compare that to the original estimate of 348 vs. 1,468.<\/p>\n
The query in-question was as follows:<\/p>\n
\r\nSELECT AVG(OrderQty) AS 'Average Quantity',\r\nNonDiscountSales = (OrderQty * UnitPrice)\r\nFROM Sales.SalesOrderDetail\r\nGROUP BY (OrderQty * UnitPrice)\r\nORDER BY (OrderQty * UnitPrice) DESC;\r\n<\/pre>\nThe pre-SQL Server 2014 CTP2 query plan was as follows (via SQL Sentry Plan Explorer, showing estimated <\/em>rows):<\/p>\n
<\/a><\/p>\n
And here was the new version of the plan (using the new Cardinality Estimator):<\/p>\n
<\/a><\/p>\n
You can see that at the leaf-level, the Cluster Index Scan estimates remain the same, as does the consecutive Compute Scalar. Where it changes is in the Hash Match (Aggregate) operator \u2013 with the new CE estimating 11,200 rows versus the old estimate of 348.<\/p>\n
Out of curiosity, I changed the original query to group by just OrderQty and another query to just group by UnitPrice (in other words \u2013 was the estimate issue related to our OrderQty * UnitPrice expression, or were the skews introduces at the single-column level?) Here was the OrderQty grouping query and plan tree (ignoring the useless nature of averaging the column you\u2019re already group by):<\/p>\n
\r\nSELECT AVG(OrderQty) AS 'Average Quantity',\r\nOrderQty\r\nFROM Sales.SalesOrderDetail\r\nGROUP BY OrderQty\r\nORDER BY OrderQty\u00a0 DESC;\r\n<\/pre>\n
<\/a><\/p>\n
And here was the UnitPrice query and plan tree:<\/p>\n
\r\nSELECT AVG(OrderQty) AS 'Average Quantity',\r\nUnitPrice\r\nFROM Sales.SalesOrderDetail\r\nGROUP BY UnitPrice\r\nORDER BY UnitPrice DESC;\r\n<\/pre>\n
<\/a><\/p>\n
No large skews for the prior two queries, compared to our original (OrderQty * UnitPrice) skew \u2013 shown below in plan tree form:<\/p>\n
<\/a><\/p>\n
Aside from the standard methods of evaluating single-column statistics (do they exist, are they up to date, do I need to change the sampling) – I then wondered if some of the legacy methods for helping improve estimates would still apply?\u00a0 For example, would adding a calculated column help?<\/p>\n
\r\nALTER TABLE Sales.[SalesOrderDetail]\r\nADD [NonDiscountSales] AS (OrderQty * UnitPrice);\r\n<\/pre>\nRe-executing the original query, I saw the following plan tree (with auto-created stats to the rescue associated with the calculated column):<\/p>\n