According to the whitepaper “Mission-Critical Performance and Scale with SQL Server and Windows Server”, there’s a new cardinality estimator in SQL Server 2014. The whitepaper doesn’t say much about how it works, but it does mention that “trace information will be provided”.
That’s an XEvent trace, of course, and the trace event is “query_optimizer_estimate_cardinality”. I decided to see what kind of information it contains. I created an event session with that single event, added an action of sql_text, and named it, unsurprisingly, “Cardinality”.
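If you want to follow along, here’s a minimal sketch of such a session; the file target and its path are my choices, not requirements:

CREATE EVENT SESSION [Cardinality] ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file
    (SET filename = N'C:\Temp\Cardinality.xel');  -- any writable path will do
GO
ALTER EVENT SESSION [Cardinality] ON SERVER STATE = START;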
Before starting it up, you need to make sure you’re using the new cardinality estimator for your queries; you can check this by looking in the query plan at the value of the property “CardinalityEstimationModel110” and confirming that the value is “ON”. (Note: this is strange to me because, if the new estimator was introduced in SQL Server 2014, shouldn’t it be CardinalityEstimationModel120?) Databases restored from earlier versions’ backups don’t seem to have that property on; databases created and populated “from scratch” on SQL Server 2014 will. There’s also a trace flag, 2312, to enable the new estimation model. You also need to be creating a plan, so make sure the plan isn’t already in cache. Not getting the new model with databases restored from previous versions lets you compare the cardinality estimates and query plans between SQL Server 2014 CTP1 and previous SQL Server versions.
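As a sketch (these commands need elevated permissions, and the SELECT is just an illustrative statement, not one from my tests), you can evict cached plans and force the new model for a single query like this:

DBCC FREEPROCCACHE;  -- heavy-handed: clears the entire plan cache
GO
SELECT OrderID, COUNT(*) AS DetailLines
FROM "Order Details"
GROUP BY OrderID
OPTION (QUERYTRACEON 2312);  -- applies trace flag 2312 to this statement only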
Note: This post is already way too long to answer the question “What does a cardinality estimator in a database do?” To get a feel for it, type “Cardinality Estimation SQL Server” into your favorite search engine and I’ll meet you back here in a while… (hint… it estimates cardinality).
This event fires, naturally, multiple times during the creation of a query plan. I started off with a “simple” 5-table join with a GROUP BY clause and received 54 of these events. Each event contains three major pieces of diagnostic information:
–InputRelation – the logical operator and its portion of the parse tree; some operators include cardinality numbers.
–Calculator – information about the cardinality calculator(s) used for this estimation step.
–StatsCollection – the statistics read for this estimation step, and the resulting estimates.
After switching back and forth between the chunks of XML in SSMS, I decided to factor the most useful pieces of information into columns for easier reading. However, the XML can be arbitrarily complex, so I left the original XML columns intact as well. Some calculators provide different estimate information, so these should be factored out into CASE-based columns. After you read these sets of rows for a while, they become as intuitive as query plans did when you first learned to read them.
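Here’s a minimal sketch of that kind of shredding query, reading the .xel file written by the session above. The data element names input_relation, calculator, and stats_collection are my guesses based on the three chunks listed earlier; check sys.dm_xe_object_columns for the names your build actually uses:

SELECT
    t.ed.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text,
    t.ed.query('event/data[@name="input_relation"]/value/*')   AS InputRelation,
    t.ed.query('event/data[@name="calculator"]/value/*')       AS Calculator,
    t.ed.query('event/data[@name="stats_collection"]/value/*') AS StatsCollection
FROM
(
    SELECT CAST(event_data AS xml) AS ed
    FROM sys.fn_xe_file_target_read_file(N'C:\Temp\Cardinality*.xel', NULL, NULL, NULL)
) AS t;

From there, adding columns is a matter of more .value() calls against the three XML chunks.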
For example, my 5-table join used the Northwind view “Order Subtotals”. This view looks like:
SELECT "Order Details".OrderID, Sum(CONVERT(money,("Order Details".UnitPrice*Quantity*(1-Discount)/100))*100) AS Subtotal
FROM "Order Details"
GROUP BY "Order Details".OrderID
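For context, a hypothetical Northwind query of roughly that shape (five relations joined, with a GROUP BY; not necessarily my exact statement) might look like:

SELECT c.CompanyName, e.LastName, SUM(s.Subtotal) AS Total
FROM Customers AS c
JOIN Orders AS o ON o.CustomerID = c.CustomerID
JOIN Employees AS e ON e.EmployeeID = o.EmployeeID
JOIN Shippers AS sh ON sh.ShipperID = o.ShipVia
JOIN "Order Subtotals" AS s ON s.OrderID = o.OrderID
GROUP BY c.CompanyName, e.LastName;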
The first row for my query was for the global aggregate logical operator (LogOp_GbAgg). I sifted out the following pieces of information:
From InputRelation:
–Operator: LogOp_GbAgg
–Table1: Order Details
–Cardinality1: 2155
From Calculator:
–CalculatorType: DistinctCountCalculator
–CalculatorName: CDVCPlanLeaf
–SingleColumnStat: OrderID
–CoveringStatDensity: 2155
From StatsCollection:
–StatsCollectionName: CStCollGroupBy
–StatsCollectionCardinality: 830
–StatsInfoObject1: Order Details
–StatsId: 1
Which means that we expect to read 2155 rows from “Order Details” and group them by OrderID into 830 groups. That’s the estimate used for the query plan, and it works out to be exactly the same as the actual number of rows. Comparing it to the “old” (pre-2014) estimate, it’s exactly the same, at least for this query. In the other queries I’ve looked at, the “new” estimate is usually closer to the actual row count than the “old” one and, in many cases, results in a different query plan. As with every query optimizer change, there are also some regressions. I’ve only looked at fewer than 20 queries so far.
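You can sanity-check those two numbers directly against an unmodified Northwind database:

SELECT COUNT(*) AS DetailRows,             -- 2155 rows read
       COUNT(DISTINCT OrderID) AS Groups   -- 830 groups
FROM "Order Details";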
Along with this article, I’m providing the query I used to take apart the XEvent trace. You’re welcome to add more columns, and if you come up with more interesting ways to present the information, please let me know. BTW, the SQL query heavily uses SQL Server XQuery, so it’s going to be slow for large trace files unless you materialize intermediate results as tables and add XML indexes.
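A sketch of that materialize-and-index approach, under the same file-path assumption as before:

CREATE TABLE dbo.CardinalityTrace
(
    id int IDENTITY(1,1) PRIMARY KEY,  -- XML indexes require a clustered primary key
    ed xml NOT NULL
);

INSERT dbo.CardinalityTrace (ed)
SELECT CAST(event_data AS xml)
FROM sys.fn_xe_file_target_read_file(N'C:\Temp\Cardinality*.xel', NULL, NULL, NULL);

CREATE PRIMARY XML INDEX ixml_ed ON dbo.CardinalityTrace (ed);

After that, point the shredding query at dbo.CardinalityTrace instead of the file function.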
I also “consulted the binary” for lists of enumerated values of Operator, CalculatorType, CalculatorName, and StatsCollectionName, but there are too many to list here; they’re included in the zip file. It’s interesting to see which query clauses result in a special calculator (e.g. CSelCalcTrieBased or CSelCalcExpressionComparedToExpression) or special statistics collection names (my favorite amusing ones are CStCollFudge and CStCollBlackBox).
Hope this adds to your estimation investigation enjoyment.
Cheers, Bob