Combining multiple CONTAINS predicates in SQL Server 2012

The SQL Server 2005 “Performance Tuning and Optimization (Full-Text Search)” Books Online topic has the following recommendation:

“Combine multiple CONTAINS predicates into one CONTAINS predicate. In SQL Server you can specify a list of columns in the CONTAINS query.”

While this shows up in the SQL Server 2005 topic, no such statement is made in the 2008 and 2012 versions of the Books Online topic.

I was curious whether this 2005-era recommendation still applied to more recent versions.  To test this scenario, I made the following changes to the AdventureWorks2012 sample database on Microsoft SQL Server 2012 (SP1), 11.0.3000.0, Developer Edition:

USE [AdventureWorks2012];
GO
ALTER TABLE [Production].[ProductDescription]
ADD [Description2] NVARCHAR(400) NOT NULL DEFAULT '';
GO

-- Populating the new description column
UPDATE [Production].[ProductDescription]
SET [Description2] = [Description];
GO

CREATE FULLTEXT INDEX ON [Production].[ProductDescription] KEY INDEX [PK_ProductDescription_ProductDescriptionID] ON ([AW2008FullTextCatalog])
WITH (CHANGE_TRACKING AUTO);
GO

ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description]);
ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description2]);
GO

Now for an initial test, I executed the following single CONTAINS predicate query:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

[Figure: execution plan tree for the single CONTAINS query]

And the STATISTICS IO output was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now for a multi-CONTAINS predicate, I executed the following example:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike') OR
CONTAINS([Description2], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

[Figure: execution plan tree for the two-CONTAINS query]

The operators remained the same, but we have a change in estimated subtree cost and also a doubling up of cardinality estimates.

As for the STATISTICS IO output, it was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Logical reads were 34 versus the original 28.

For the last example, I combined the two CONTAINS predicates as follows:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS(([Description], [Description2]), 'bike');

SET STATISTICS IO OFF;

I found that with the combined CONTAINS, the plan shape for this example remained the same as the multiple-CONTAINS version. Operator costs and estimates also remained the same.
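
As a related aside, CONTAINS also accepts an asterisk instead of a column list, which searches every column in the full-text index (equivalent here, since only the two description columns are indexed).  A minimal variation, not part of the original test:

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS(*, 'bike');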

For SQL Server 2005, if you had multiple CONTAINS predicates referencing columns from the same table, your execution plan would show separate remote scan operators for each reference instead of them being combined natively into a single operator.  This behavior seems to have been addressed in recent versions, but if you have seen a variation of this issue given other conditions please share in the comments.  Thanks!

Finding a Deadlocked SQL Server Agent Job Name

Short and simple post – filed under “Joe – you’ve already forgotten about this twice, so write this down somewhere.”

Let’s say you have deadlock output that shows the following information for the “clientapp” attribute:

clientapp=SQLAgent – TSQL JobStep (Job 0xB813B96C59E6004CA8CD542D8A431A2E : Step 1)

Based on this output, we know a SQL Server Agent Job is involved, but what is the SQL Server Agent Job name?

Tested on SQL Server 2012 SP1, I can find it with the following query:

SELECT  [job_id] ,
        [name]
FROM [msdb].[dbo].[sysjobs]
WHERE [job_id] = 0xB813B96C59E6004CA8CD542D8A431A2E;
GO

I’ve forgotten about this twice over time because the job_id uses a uniqueidentifier data type (in this example, the job_id value is 6CB913B8-E659-4C00-A8CD-542D8A431A2E).  And as an aside, the query execution plan shows a CONVERT_IMPLICIT as I would expect and uses a Clustered Index Seek.
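
If you only have the hex token from the deadlock output handy, you can also convert it directly to its uniqueidentifier form.  A quick sketch, using the job token shown above:

-- Convert the binary job token from the deadlock output to its uniqueidentifier form
SELECT CONVERT(UNIQUEIDENTIFIER, 0xB813B96C59E6004CA8CD542D8A431A2E) AS [job_id];
-- Returns 6CB913B8-E659-4C00-A8CD-542D8A431A2E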

Troubleshooting the new Cardinality Estimator

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

This post, like the previous nine posts on this subject, uses SQL Server 2014 CTP2 as a reference point.  There may be changes by SQL Server 2014 RTM, and if so, I’ll write a post about applicable changes.

Now in terms of troubleshooting the new Cardinality Estimator, what I’m specifically referring to is the introduction of cardinality estimate skews that negatively impact query performance compared to the pre-SQL Server 2014 cardinality estimator functionality.   Ideally performance regressions should be rare, but when they happen, what are our troubleshooting options?

To frame this discussion, let’s first discuss what may or may not warrant action…

No Action (Necessarily) Needed

  • The estimates are identical to old CE functionality and the query plan is unchanged
  • The estimates are skewed compared to the old CE functionality, but the query plan “shape” is identical (and you see no side-effects from the skews, such as sort or hash spills and query runtime degradation)
  • The estimates are skewed compared to the old CE functionality, the plan is different, but performance is equal or improved – or even more stable
  • The estimates are skewed compared to the old CE functionality, the plan is different, and performance is somewhat impacted but not enough to justify changes (totally depends on your SLAs & workload performance requirements of course)

Action Potentially Needed

  • The estimates are skewed, the plan shape is unchanged, but the estimates lead to performance issues such as spills (due to under-estimates) or concurrency issues (due to over-estimates) for memory-intensive operators
  • The estimates are skewed, the plan is changed, and the plan quality leads to performance degradation (a variety of query optimizer plan choices which may lead to issues)

So what troubleshooting methods and options are available to us?

Troubleshooting Options

  • Do nothing (yeah, I know, but this can be a decision you ultimately make, looking at risk/effort/reward)
  • Revert to the pre-SQL Server 2014 CE version (for example, via database compatibility level change; see the sketch after this list)
  • Apply legacy troubleshooting methods, which may fix other issues directly or indirectly related to the skew and thus help close the gap (framing these legacy methods as questions below)
    • Are the statistics old and need updating?
    • Should the statistics sampling be changed?
    • Are multi-column stats needed to help establish a correlation where one currently isn’t seen by the query optimizer?
    • Parameter sniffing troubleshooting needed? (a much larger topic, but indulge me on including this as a method)
    • Is your table variable usage contributing to the skew?
    • Is your multi-statement table-valued function or scalar user-defined function contributing to the skew?
    • Any data-type conversions occurring for predicates (join or filter)?
    • Are you comparing column values from the same table?
    • Is your column reference being buried by a function or embedded in a complex expression?
    • Are hints being used and if so, is their usage appropriate?
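
For the second option in the list above, here is a minimal sketch of reverting a database to the pre-2014 CE behavior via the compatibility level (the database name is illustrative; the same technique appears later in this series):

-- Revert to the pre-SQL Server 2014 CE for a database (database name is hypothetical)
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;
GO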

The new CE story will unfold as customers start upgrading to SQL Server 2014 and I’ll be curious to see which regression patterns are most common.

Open Questions

  • Regarding the new query_optimizer_estimate_cardinality XE event… Will it be a practical source of information for most SQL Server users in cardinality estimator skew regression scenarios – or will it be something reserved for edge-cases and advanced Microsoft customer support scenarios?  I suspect this XE event will have limited applicability, but I’m reserving judgment for now.
  • Will SQL Server 2014 RTM introduce finer-grained methods for reverting to the pre-2014 cardinality estimator?
  • How will the new CE behave with newer functionality?  For example, Hekaton and clustered columnstore indexes.
  • Will this be it for CE changes for the next few versions?  There is plenty left on the CE-improvement wish list, so I hope not.

Non-Join Cross-Table Predicate Correlation Changes

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

In previous posts in this series, I discussed how the assumption of independence with regard to multiple predicates against the same table, in the absence of multi-column stats, is blunted a bit with the new Cardinality Estimator for SQL Server 2014.  So your estimates may increase for this scenario.

On the flip-side, when it comes to joins between two tables, you may see a reduction in join estimate values for scenarios where there are non-join filter predicates on the tables being joined.

Take the following example, using pre-SQL Server 2014 CE behavior:

USE [master]
GO

ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 110
GO

USE [Credit];
GO

SELECT  [m].[member_no] ,
[m].[lastname] ,
[p].[payment_no] ,
[p].[payment_dt] ,
[p].[payment_amt]
FROM    dbo.[member] AS [m]
INNER JOIN dbo.[payment] AS [p]
ON      [m].[member_no] = p.[member_no]
WHERE   [m].[region_no] = 2
AND [p].[payment_dt] = '1999-09-02 00:00:00.000'
OPTION  ( RECOMPILE );
GO

The SQL Sentry Plan Explorer plan tree view is as follows:

[Figure: plan tree under compatibility level 110]

We see that the estimates are spot-on for the Clustered Index Scan and Table Scan, but we have an over-estimate for the Hash Match operation (1,767 estimated vs. 1,003 actual).

Now if I set the database compatibility level to 120 and re-execute the query, here is what we see instead:

[Figure: plan tree under compatibility level 120]

We still have identical estimate vs. actual values for the Clustered Index Scan and Table Scan, and now our over-estimate for the Hash Match is less pronounced (1,140 rows estimated instead of the 1,767 rows previously estimated).

For the pre-SQL Server 2014 cardinality estimation process, the assumption is that the non-join predicates for the two tables are somehow correlated (in our example, region_no = 2 and payment_dt = ’1999-09-02 00:00:00.000′).  This is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), and so this can translate into a reduced row estimate for the join.

More on Exponential Backoff

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

Continuing the subject of exponential backoffs (from the 2nd and 3rd posts in this series), let’s restore the Credit sample database back to the baseline version and execute the following script:

USE [master];
GO

ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 120;
GO

USE [Credit];
GO

-- Add four new columns
ALTER TABLE [dbo].[member]
ADD [arbitrary_1] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_2] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_3] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_4] BIGINT NULL;

I changed the database compatibility level to the latest version so that the new CE is used, and then added four new columns.

Next, let’s update the values of the four new columns using different distributions:

;WITH    CTE_NTILE
AS ( SELECT   [member_no] ,
NTILE(10) OVER ( ORDER BY [member_no] DESC ) AS [arbitrary_1] ,
NTILE(2) OVER ( ORDER BY [member_no] DESC ) AS [arbitrary_2] ,
NTILE(4) OVER ( ORDER BY [member_no] DESC ) AS [arbitrary_3] ,
NTILE(250) OVER ( ORDER BY [member_no] DESC ) AS [arbitrary_4]
FROM     [dbo].[member]
)
UPDATE  [dbo].[member]
SET     [arbitrary_1] = [n].[arbitrary_1] ,
[arbitrary_2] = [n].[arbitrary_2] ,
[arbitrary_3] = [n].[arbitrary_3] ,
[arbitrary_4] = [n].[arbitrary_4]
FROM    [dbo].[member] AS [m]
INNER JOIN CTE_NTILE AS [n]
ON      [n].[member_no] = [m].[member_no];
GO

Looking at the estimates for single-predicate queries, if I execute the following, I’ll get an estimate of 1,000 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1
OPTION  ( RECOMPILE );

For this next query I’ll get an estimate of 5,000 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_2] = 1
OPTION  ( RECOMPILE );

And for this next query, an estimate of 2,500 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_3] = 1
OPTION  ( RECOMPILE );

And lastly (for single-predicate examples anyhow), an estimate of 40 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_4] = 1
OPTION  ( RECOMPILE );
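
As a quick aside, these single-predicate estimates line up with the NTILE bucket sizes over the member table (10,000 rows, which is also the row count used in the calculations later in this post).  A minimal sanity check, assuming the columns were populated as shown earlier:

-- Approximate rows per distinct value for each new column (should match the estimates above)
SELECT  COUNT(*) / 10 AS [arbitrary_1] ,   -- NTILE(10):  1,000 rows, 0.1 selectivity
COUNT(*) / 2 AS [arbitrary_2] ,            -- NTILE(2):   5,000 rows, 0.5 selectivity
COUNT(*) / 4 AS [arbitrary_3] ,            -- NTILE(4):   2,500 rows, 0.25 selectivity
COUNT(*) / 250 AS [arbitrary_4]            -- NTILE(250): 40 rows, 0.004 selectivity
FROM    [dbo].[member];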

Now let’s start adding multiple predicates per statement.  The first example with multiple predicates uses two predicates – one with a selectivity of 0.1 and one of 0.5:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- 0.1 selectivity
[arbitrary_2] = 1 -- 0.5 selectivity
OPTION  ( RECOMPILE );

The estimate for this query is 707.107 with the new CE, which we can derive using the POWER function in T-SQL as follows (I used Excel last time to do this, so see the previous posts for the background information on this calculation):

SELECT  10000 * 0.10 * POWER(0.500000, 0.50);

That returned 707.107.

Now what about a query with three predicates, with selectivities of 0.1, 0.5 and 0.25?

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- .1 selectivity
[arbitrary_2] = 1 AND -- .5 selectivity
[arbitrary_3] = 1 -- .25 selectivity
OPTION  ( RECOMPILE );

The estimate for this was 420.448, and we can derive this via the following expression (and notice the order of selectivities goes from smallest to highest):

-- Notice the selectivity order (0.10, 0.25, 0.50)
SELECT  10000 * 0.10 * POWER(0.250000, 0.50) * POWER(0.500000, 0.25);

Now let’s reference all four columns (with selectivities of 0.1, 0.5, 0.25 and 0.004):

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- .1 selectivity
[arbitrary_2] = 1 AND -- .5 selectivity
[arbitrary_3] = 1 AND -- .25 selectivity
[arbitrary_4] = 1  -- 0.004 selectivity
OPTION  ( RECOMPILE );

The estimate is 8.20193 and we can derive this via the following:

SELECT  10000 * 0.004 * POWER(0.1000000, 0.50) * POWER(0.2500000, 0.25) * POWER(0.5000000, 0.125);

The selectivities are ordered from most selective to least selective, and the less selective values get the “back-offs”: no back-off for the most selective value, then exponents of 1/2, 1/4, and 1/8 for the rest.
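
To generalize (my own restatement of the calculation above, not a formula from the documentation): with the table cardinality and the selectivities sorted from most to least selective, the estimate is rows * s1 * s2^(1/2) * s3^(1/4) * s4^(1/8).  A parameterized sketch:

-- Exponential backoff sketch: s1 is the most selective predicate, s4 the least
DECLARE @rows FLOAT = 10000 ,
@s1 FLOAT = 0.004 ,
@s2 FLOAT = 0.10 ,
@s3 FLOAT = 0.25 ,
@s4 FLOAT = 0.50;

SELECT  @rows * @s1 * POWER(@s2, 0.5) * POWER(@s3, 0.25) * POWER(@s4, 0.125) AS [estimate];
-- Returns ~8.20, matching the four-predicate example above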

New Course: “SQL Server: Database Creation, Configuration, and Alteration”

 

Today Pluralsight published my new course, “SQL Server: Database Creation, Configuration, and Alteration”.

This is a demo-centric, short course on how to use T-SQL to create and configure databases, files, and filegroups.  Areas I cover include:

  • How to create a database
  • How to configure files and filegroups
  • Configuration of file settings
  • Configuration of recovery model, security, collation
  • How to create a database snapshot
  • How to support FILESTREAM data
  • Enabling containment
  • Adding, removing and modifying files and filegroups
  • Changing growth options
  • Relocating files
  • Changing logical file names
  • Making a filegroup read-only
  • Modifying compatibility levels, “AUTO” options, access and database state
  • Changing a database to read-only
  • Restricting database access
  • Changing recovery and recovery time options
  • Changing various options like page verification, snapshot isolation, ANSI settings
  • Enabling change tracking
  • Enabling Transparent Data Encryption
  • Changing security and Service Broker options
  • Changing containment settings

There are now 29 courses from the SQLskills team as of today, with much more to come.

Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

In the previous post, I looked at root-level cardinality estimates vs. actual row counts across a set of 28 queries.  Nine plans had improved root-level estimates, but two plans showed significant skews.  One of the plans for the new CE version had a root-level estimate of 11,199 rows vs. 1,468 actual rows.  Compare that to the original estimate of 348 vs. 1,468.

The query in-question was as follows:

SELECT AVG(OrderQty) AS 'Average Quantity',
NonDiscountSales = (OrderQty * UnitPrice)
FROM Sales.SalesOrderDetail
GROUP BY (OrderQty * UnitPrice)
ORDER BY (OrderQty * UnitPrice) DESC;

The pre-SQL Server 2014 CTP2 query plan was as follows (via SQL Sentry Plan Explorer, showing estimated rows):

[Figure: pre-SQL Server 2014 CTP2 plan tree with estimated rows]

And here was the new version of the plan (using the new Cardinality Estimator):

[Figure: plan tree under the new Cardinality Estimator]

You can see that at the leaf level, the Clustered Index Scan estimates remain the same, as does the subsequent Compute Scalar. Where it changes is in the Hash Match (Aggregate) operator – with the new CE estimating 11,200 rows versus the old estimate of 348.

Out of curiosity, I changed the original query to group by just OrderQty, and another query to group by just UnitPrice (in other words – was the estimate issue related to our OrderQty * UnitPrice expression, or were the skews introduced at the single-column level?).  Here was the OrderQty grouping query and plan tree (ignoring the useless nature of averaging the column you’re already grouping by):

SELECT AVG(OrderQty) AS 'Average Quantity',
OrderQty
FROM Sales.SalesOrderDetail
GROUP BY OrderQty
ORDER BY OrderQty  DESC;

[Figure: plan tree for the OrderQty grouping query]

And here was the UnitPrice query and plan tree:

SELECT AVG(OrderQty) AS 'Average Quantity',
UnitPrice
FROM Sales.SalesOrderDetail
GROUP BY UnitPrice
ORDER BY UnitPrice DESC;

[Figure: plan tree for the UnitPrice grouping query]

No large skews for the prior two queries, compared to our original (OrderQty * UnitPrice) skew – shown below in plan tree form:

[Figure: plan tree for the original (OrderQty * UnitPrice) grouping query]

Aside from the standard methods of evaluating single-column statistics (do they exist, are they up to date, do I need to change the sampling), I then wondered whether some of the legacy methods for improving estimates would still apply.  For example, would adding a calculated column help?

ALTER TABLE Sales.[SalesOrderDetail]
ADD [NonDiscountSales] AS (OrderQty * UnitPrice);

Re-executing the original query, I saw the following plan tree (with auto-created stats on the calculated column coming to the rescue):

[Figure: plan tree after adding the calculated column]
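
To confirm that statistics were indeed auto-created on the new computed column, here is a quick sketch against the catalog views (the column name matches the calculated column added above):

-- Look for auto-created statistics covering the NonDiscountSales computed column
SELECT  s.[name] AS [stats_name] ,
s.[auto_created] ,
c.[name] AS [column_name]
FROM sys.[stats] AS [s]
INNER JOIN sys.[stats_columns] AS [sc]
ON [sc].[object_id] = [s].[object_id] AND [sc].[stats_id] = [s].[stats_id]
INNER JOIN sys.[columns] AS [c]
ON [c].[object_id] = [sc].[object_id] AND [c].[column_id] = [sc].[column_id]
WHERE s.[object_id] = OBJECT_ID('Sales.SalesOrderDetail')
AND c.[name] = 'NonDiscountSales';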

I wish I could look at the query_optimizer_estimate_cardinality results for the old version of the CE to compare the event output against the new CE.  I know it isn’t possible, but it would have helped in understanding what changed regarding the Hash Match (Aggregate) estimate.  As an aside, the exact math behind the estimates is hidden in both the new and old versions – so perhaps a comparison of calculators in this scenario would still be limited.

Regarding XE information, there were 12 query_optimizer_estimate_cardinality events raised for the query with the larger estimate skew.  The second of the 12 events had the following calculator output (it is interesting to see the SingleColumnStat reference in this output followed by two column names – but I believe this just means both single-column stats were loaded):

<CalculatorList>
<DistinctCountCalculator CalculatorName="CDVCPlanProjectOrGroupBy">
<SubCalculator Role="DistinctCountCalculatorToDelegate">
<DistinctCountCalculator CalculatorName="CDVCPlanLeaf" SingleColumnStat="OrderQty,UnitPrice" />
</SubCalculator>
</DistinctCountCalculator>
</CalculatorList>

And the stats collection was as follows:

<StatsCollection Name="CStCollGroupBy" Id="3" Card="11199.81">
<LoadedStats>
<StatsInfo DbId="7" ObjectId="898102240" StatsId="5" />
<StatsInfo DbId="7" ObjectId="898102240" StatsId="6" />
</LoadedStats>
</StatsCollection>

The stats collection was interesting to me, as this was the first appearance (out of the 12 events) of the GROUP BY estimate – 11,199.81 (shown earlier in the plan tree, rounded up).  The loaded stats StatsId of 5 was for the UnitPrice column and the StatsId of 6 was for the OrderQty column.

On the subject of SingleColumnStat, I wondered if creating multi-column stats (another legacy method for improving cardinality estimates) on UnitPrice and OrderQty would be helpful.  I dropped the calculated column and created the following multi-column stats instead:

CREATE STATISTICS [stats_SalesOrderDetail_OrderQty_UnitPrice] ON
Sales.[SalesOrderDetail](UnitPrice, OrderQty);

The answer is: multi-column stats were indeed helpful:

[Figure: plan tree after creating the multi-column statistics]

We now see a reference to MultiColumnStat in the CalculatorList for one of the query_optimizer_estimate_cardinality events:

<CalculatorList>
<DistinctCountCalculator CalculatorName="CDVCPlanProjectOrGroupBy">
<SubCalculator Role="DistinctCountCalculatorToDelegate">
<DistinctCountCalculator CalculatorName="CDVCPlanLeaf" MultiColumnStat="OrderQty,UnitPrice" />
</SubCalculator>
</DistinctCountCalculator>
</CalculatorList>

And we see that the new stats object was loaded instead of the two individual stats objects:

<StatsCollection Name="CStCollGroupBy" Id="3" Card="1600.00">
<LoadedStats>
<StatsInfo DbId="7" ObjectId="898102240" StatsId="10" />
</LoadedStats>
</StatsCollection>

So we see that when the new CE introduces skews, and depending on the query and schema, multi-column stats and calculated columns can still be helpful in reducing the skew – just as they were in previous versions.

To be continued…

Comparing Root-Level Skews in the new Cardinality Estimator

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

In the previous post I showed you how to start using the new SQL Server 2014 CE model, but what difference does it actually make?

Of course your own mileage may vary (and you should expect it to), but I thought I would kick the tires a bit by comparing root-level cardinality estimate skews for a specific workload against the AdventureWorks2008R2 database.  I used Jonathan’s “AdventureWorks BOL Workload.sql” script that he put together for use in the context of The AdventureWorks2008R2 Books Online Random Workload Generator (I made a few minor adjustments – for example, removing the queries that referenced the deprecated COMPUTE).

  1. I first cleared the cache of the (test) SQL Server 2014 CTP2 instance and then ran the script against the AdventureWorks2012 database using a pre-SQL Server 2014 database compatibility level (100).
  2. I then collected the root-level cardinality skew information, re-ran the test using the SQL Server 2014 database compatibility level (120), and collected the same skew information for that test as well.

The query I used for extracting estimated vs. actual last rows was as follows (for both tests):

SELECT  s.[query_hash],
p.[query_plan].value('(//@EstimateRows)[1]', 'varchar(128)') AS [estimated_rows],
s.[last_rows]
FROM    sys.[dm_exec_query_stats] AS [s]
CROSS APPLY sys.[dm_exec_sql_text](sql_handle) AS [t]
CROSS APPLY sys.[dm_exec_query_plan](plan_handle) AS [p]
WHERE   DATEDIFF(mi, s.[last_execution_time], GETDATE()) < 1
GO

Of course, root level cardinality skews don’t really give us the full picture (see my post Detecting Cardinality Estimate Issues with sys.dm_exec_query_stats), especially for significant skews that are in the leaf-level or intermediate levels of the plan that work themselves out by the time they reach the root. But this informal test was a way for me to get a general feel for changes across several statements in a workload.

Now for these tests, some query_hash values appeared for the old CE but not the new CE, and vice versa.  Rather than run down that bunny trail (background requests, for example), I just focused for the time being on those query_hash values that joined consistently across each test run.
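
For reference, a minimal sketch of how the two captures can be lined up on query_hash (the temp table names are hypothetical; each would hold the output of the DMV query above for one test run):

-- Hypothetical: #old_ce_results and #new_ce_results hold the DMV output per test run
SELECT  o.[query_hash] ,
o.[estimated_rows] AS [old_ce_estimate] ,
n.[estimated_rows] AS [new_ce_estimate] ,
o.[last_rows] AS [actual_rows]
FROM #old_ce_results AS [o]
INNER JOIN #new_ce_results AS [n]
ON [n].[query_hash] = [o].[query_hash];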

Here is what I saw – adding a bit of color to the noteworthy changes of old-CE vs. new-CE.

[Figure: root-level estimated vs. actual rows, old CE vs. new CE, with noteworthy changes highlighted]

Now with regard to one round of informal testing, I saw:

  • 9 root-level estimates improve (some slightly, some more significant)
  • 3 root-level estimates degrade slightly
  • 2 root-level estimates skew “more than slightly”
  • 13 plans remained the same

How consistent were the results across tests?  It was repeatable.  I did three test rounds where the skew was identical across workloads for the new CE and for the old – as I would expect (and ruling out any rogue background queries jumping in during my test, or new query hash values – there were 28 query_hash values I could keep track of).

Now again, this is just root-level skew.  Quite a bit can be buried in the leaf and intermediate levels of the plan, so I’ll be exploring changes there as well, but I thought this would be an interesting first cut at the overall workload estimate changes.

More to come…

Cardinality Estimation Model Version

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series.

In this post I just wanted to step back and discuss how to enable and identify the Cardinality Estimator version, at least as it stands in SQL Server 2014 CTP2.

As you saw in earlier posts, I enable the new CE behavior by changing a database’s compatibility level as follows:

USE [master]
GO
ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 120;
GO

Then when I execute a query within the database, looking at the execution plan I see the following attribute in the StmtSimple element:

<StmtSimple StatementCompId="1" StatementEstRows="1600000" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="21.0963" StatementText="SELECT    c.[charge_no],&#xD;&#xA;        m.[lastname],&#xD;&#xA;        m.[firstname]&#xD;&#xA;FROM [dbo].[charge] AS c&#xD;&#xA;INNER JOIN [dbo].[member] AS m ON&#xD;&#xA;    c.[member_no] + ' '  =  m.[member_no] &#xD;&#xA;OPTION (RECOMPILE)" StatementType="SELECT" QueryHash="0xF698FF11488A3A3D" QueryPlanHash="0x41F421A8F51D61CE" RetrievedFromCache="false">

Note that if my database compatibility level was for an earlier level, I see the following:

<StmtSimple StatementCompId="1" StatementEstRows="1597640" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="21.0895" StatementText="SELECT    c.[charge_no],&#xD;&#xA;        m.[lastname],&#xD;&#xA;        m.[firstname]&#xD;&#xA;FROM [dbo].[charge] AS c&#xD;&#xA;INNER JOIN [dbo].[member] AS m ON&#xD;&#xA;    c.[member_no] + ' '  =  m.[member_no] &#xD;&#xA;OPTION (RECOMPILE)" StatementType="SELECT" QueryHash="0xF698FF11488A3A3D" QueryPlanHash="0x41F421A8F51D61CE" RetrievedFromCache="false">

And I saw that value of 70 for compatibility levels 110 (SQL Server 2012), 100 (SQL Server 2008), and 90 (SQL Server 2005).  Perhaps this indicates that the cardinality estimation model being used pre-SQL Server 2014 CTP2 had its last major change back in the (70) SQL Server 7.0 days. Just a guess though.
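
As an aside, if you want to see which CE model version your cached plans are using without opening each plan, the attribute can be pulled out of the plan XML directly.  A sketch along the lines of the DMV query used earlier in this series:

-- Surface the CE model version attribute from cached plans
SELECT  qp.[query_plan].value('(//@CardinalityEstimationModelVersion)[1]', 'int') AS [ce_model_version] ,
st.[text]
FROM sys.[dm_exec_query_stats] AS [qs]
CROSS APPLY sys.[dm_exec_query_plan](qs.[plan_handle]) AS [qp]
CROSS APPLY sys.[dm_exec_sql_text](qs.[sql_handle]) AS [st];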

More to come…

The CSelCalcAscendingKeyFilter Calculator

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series (yeah – I’ve decided to call this a series, since there will be quite a bit to talk about, time permitting).

So for this post, I created a nonclustered index on the charge_dt column from the charge table in the Credit sample database and then inserted 10 new rows as follows:

INSERT INTO [dbo].[charge]
([member_no]
,[provider_no]
,[category_no]
,[charge_dt]
,[charge_amt]
,[statement_no]
,[charge_code])
SELECT TOP 10 [member_no]
,[provider_no]
,[category_no]
,GETDATE()
,[charge_amt]
,[statement_no]
,[charge_code]
FROM dbo.charge
GO

Notice I’m using GETDATE() for the charge_dt, and if you look at the last few rows of the histogram for charge_dt you’ll see that nothing in the year 2013 is represented (and I’m only inserting 10 new rows, and haven’t manually updated statistics recently):

[Figure: last few histogram steps for charge_dt]
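
For reference, this is how I would pull the histogram; the statistics name below is a guess for the new nonclustered index, so substitute whatever name it was created with:

-- View the histogram steps for charge_dt (statistics name is hypothetical)
DBCC SHOW_STATISTICS ('dbo.charge', 'ix_charge_charge_dt') WITH HISTOGRAM;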

In SQL Server 2012, what row estimate would you expect if I executed the following query?

SELECT [charge_no]
FROM dbo.[charge]
WHERE [charge_dt] = '2013-11-17 12:46:44.307'
OPTION (RECOMPILE);

In this case, the plan tree (via SQL Sentry Plan Explorer) is as follows:

[Figure: plan tree under compatibility level 110]

The estimate is for 1 row (actual is 10 rows).

What are my estimates if I switch the database to compatibility level 120 and re-execute the exact same query?

[Figure: plan tree under compatibility level 120]

In 2014 CTP2 – my estimate is for 10 rows, and re-checking the histogram, the maximum RANGE_HI_KEY is still 1999-10-13 10:51:19.317 and not a more recent value.

What does the query_optimizer_estimate_cardinality event show?

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="-1.000"
CalculatorFailed="true" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" />

<FilterCalculator CalculatorName="CSelCalcAscendingKeyFilter" Selectivity="0.000"
TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" UseAverageFrequency="true" StatId="7" />
</CalculatorList>

I highlighted a couple of interesting areas – including the CalculatorFailed="true" associated with CSelCalcColumnInInterval, and then what seems to be the application of a second calculator, CSelCalcAscendingKeyFilter – which I would assume implies something related to the classic ascending key issue – along with a "true" for "UseAverageFrequency".

If this is average frequency, then we’re looking at density times the row count, which would mean 1,600,010 rows times the 1.006087E-05 density from that particular statistics object:

[Figure: density vector for the charge_dt statistics]

However if you do that calculation, you’ll get 16.0974926087 instead of the 10 row estimate.
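
To check the math (the density value comes from the statistics output above; the statistics name in the DBCC call is again a guess):

-- Average frequency = density * table cardinality
SELECT 1.006087E-05 * 1600010 AS [average_frequency];   -- ~16.0975

-- The density vector for the charge_dt statistics (statistics name is hypothetical)
DBCC SHOW_STATISTICS ('dbo.charge', 'ix_charge_charge_dt') WITH DENSITY_VECTOR;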

Looking further in the query_optimizer_estimate_cardinality output (still associated with the prior calculator) I see the following:

<StatsCollection Name="CStCollFilter" Id="2" Card="10.00">
<LoadedStats>
<StatsInfo DbId="6" ObjectId="229575856" StatsId="7" />
</LoadedStats>
</StatsCollection>

Continuing the search, I loaded 90 more rows with that same ‘2013-11-17 12:46:44.307’ value and re-executed the query to see the query_optimizer_estimate_cardinality event information.

This time the estimates bumped up as follows (16 rows instead of a 10 row estimate, with 100 actual rows):

[Figure: plan tree showing the 16 row estimate vs. 100 actual rows]

There was still no change to the statistics (no additional step for the year 2013 data).

As for query_optimizer_estimate_cardinality event information, I saw the following:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="-1.000" CalculatorFailed="true" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" />
<FilterCalculator CalculatorName="CSelCalcAscendingKeyFilter" Selectivity="0.000" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" UseAverageFrequency="true" StatId="7" />
</CalculatorList>

<StatsCollection Name="CStCollFilter" Id="2" Card="16.10">
<LoadedStats>
<StatsInfo DbId="6" ObjectId="229575856" StatsId="7" />
</LoadedStats>
</StatsCollection>

The 16.10 reflects the average frequency value I expected (16.0974926087, rounded up).  But regarding the original estimate of 10, I wondered what would happen if I deleted 99 out of the 100 rows with the 2013 value.  Would I still see a Card="10.00" or something else?

This is what I saw:

[Figure: plan tree showing the 16 row estimate vs. 1 actual row]

The estimate was for 16 rows again – with 1 actual row returned.

Then I thought about reverting to the beginning of my test – restoring from backup to the vanilla version of the Credit database and then flipping it to 2014 mode again.  I did this, re-created the index on charge_dt and then re-ran the initial 10 row insert and follow-up query against the 2013 date value that didn’t exist as a histogram step.  I was able to repeat the 10 row estimate effect.

I also tried a reset of the test with just 2 rows initially inserted and got an estimate of 2 rows and Card="2.00" for the stats_collection output for query_optimizer_estimate_cardinality. Same thing if I did 15 rows (a 15 row estimate).  Doing 16 rows, the same – estimating 16 instead of 16.10.

And when I reset the test with 17 rows (which is above the average frequency), the estimate used is the average frequency of 16.10 and not 17.  Interesting!

Much more to explore on this overall subject.