Cardinality Estimation Archives - Joe Sack
https://www.sqlskills.com/blogs/joe/category/cardinality-estimation/
SQL Server Performance Tuning, High Availability and Disaster Recovery Blog

SQLIntersection Post-Conference Session
https://www.sqlskills.com/blogs/joe/sqlintersection-post-conference-session/
Mon, 19 May 2014
I am really happy to announce that Kimberly Tripp and I will be delivering a post-conference in November’s SQLIntersection conference at the MGM Grand. The session is “Queries Gone Wild 2: Statistics and Cardinality in Versions 2008, 2008R2, 2012, and 2014” and will be delivered November 14th, from 9AM to 4:30PM.

You can find registration details for the main conference, pre-conference, and post-conference choices here.

It will be an incredibly fun day and you can be sure to walk away with practical query tuning techniques and a strong foundation in statistics and cardinality estimation concepts (both for the new and legacy cardinality estimator).  

The post SQLIntersection Post-Conference Session appeared first on Joe Sack.

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
https://www.sqlskills.com/blogs/joe/optimizing-your-query-plans-with-the-sql-server-2014-cardinality-estimator/
Mon, 14 Apr 2014
I’ve been working since January on a new Microsoft white paper covering the main changes made in the SQL Server 2014 Cardinality Estimator, and I’m happy to announce that it was just published:

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

A big thanks to the contributor and reviewer team – including Yi Fang (Microsoft), Shep Sheppard (Microsoft), Mike Weiner (Microsoft), Paul White (SQL Kiwi Limited), Barbara Kess (Microsoft), Jimmy May (Microsoft), Sanjay Mishra (Microsoft), Vassilis Papadimos (Microsoft) and Jack Li (Microsoft). And thanks to Kathy MacDonald (Microsoft) for managing the whole process!

Deck from “Practical SQL Server Cardinality Estimation”
https://www.sqlskills.com/blogs/joe/deck-from-practical-sql-server-cardinality-estimation/
Wed, 02 Apr 2014
I uploaded the deck from my session for SQLSaturday #287 Madison 2014 and you can download it here.

It was an excellent event! Great organizers, volunteers, venue, and speakers – and a big turnout to match, with lots of first-time SQLSaturday attendees.

Grateful for the experience and hope to be back again next year.

MSTVF Fixed Cardinality Value in SQL Server 2014
https://www.sqlskills.com/blogs/joe/mstvf-fixed-cardinality-value-sql-server-2014/
Thu, 20 Mar 2014
In SQL Server 2014 CTP2, in the AdventureWorks2012 database, execute the following batch:

USE [AdventureWorks2012];
GO

SELECT [PersonID], [FirstName], [LastName], [JobTitle], [BusinessEntityType]
FROM [dbo].[ufnGetContactInformation](2)
OPTION (QUERYTRACEON 9481);  -- Legacy CE

SELECT [PersonID], [FirstName], [LastName], [JobTitle], [BusinessEntityType]
FROM [dbo].[ufnGetContactInformation](2)
OPTION (QUERYTRACEON 2312); -- New CE

The first query uses the legacy cardinality estimator and the second query uses the new cardinality estimator. Both queries reference a multi-statement table-valued function.

Looking at the plan tree view in SQL Sentry Plan Explorer for the legacy CE plan, you’ll see the following (estimating 1 row for the function operators):

[plan tree screenshot]

Looking at the new CE version of the plan tree, we see the following (estimating 100 rows for the function operators):

[plan tree screenshot]

SQL Server 2014 uses a new default fixed cardinality of 100 rows for multi-statement table-valued functions, compared to the legacy CE’s fixed estimate of 1 row.

A few thoughts:

  • Whether 1 row or 100 rows, we’re still using a fixed guess that may or may not reflect reality
  • I’m very wary of using MSTVFs in scenarios where the estimate is critical for plan quality (and oftentimes it is)
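Where the function logic allows it, one way to sidestep the fixed guess is rewriting the MSTVF as an inline table-valued function, which the optimizer expands into the calling query and estimates from base-table statistics. A minimal sketch – the table and function names below are hypothetical placeholders, not objects from the example above:

```sql
-- Hypothetical MSTVF: the optimizer cannot see inside it, so it uses the
-- fixed guess (1 row with the legacy CE, 100 rows with the 2014 CE).
CREATE FUNCTION [dbo].[ufnOrdersForCustomer_MSTVF] (@CustomerID INT)
RETURNS @Orders TABLE ([OrderID] INT, [OrderTotal] MONEY)
AS
BEGIN
    INSERT INTO @Orders ([OrderID], [OrderTotal])
    SELECT [OrderID], [OrderTotal]
    FROM [dbo].[OrderHeader]
    WHERE [CustomerID] = @CustomerID;
    RETURN;
END;
GO

-- Inline TVF equivalent: expanded like a view, so cardinality is estimated
-- from the statistics on [dbo].[OrderHeader] instead of a fixed guess.
CREATE FUNCTION [dbo].[ufnOrdersForCustomer_Inline] (@CustomerID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT [OrderID], [OrderTotal]
    FROM [dbo].[OrderHeader]
    WHERE [CustomerID] = @CustomerID
);
GO
```

When the function body contains more than a single statement, this rewrite isn’t always possible.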

March Speaking Engagements
https://www.sqlskills.com/blogs/joe/march-speaking-engagements/
Mon, 17 Mar 2014
If you’re in the area, just a heads-up that I’ll be speaking at the following events:

PASSMN Minnesota SQL Server User Group
Location: 3601 West 76th Street, Suite 600, Edina, MN 55437
Date: March 18, 2014
Time: 4:00 PM – 6:00 PM

SQLSaturday #287
Location: 6000 American Parkway, Building A, Madison, WI 53783
Date: March 29, 2014
Time: My session is in the morning, but there is a ton of good content all day!

I’ll be delivering the presentation “Practical SQL Server Cardinality Estimation”.  The session summary is as follows:

What is cardinality estimation and why should you care? In this session we’ll cover the mechanics behind cardinality estimation and why it is so very critical to overall query performance. We’ll cover concepts applicable to the last few major versions of SQL Server and also preview cardinality estimator changes being introduced in SQL Server 2014.

Hope to see you there!

Using the SQL Server 2014 Database Compatibility Level without the New CE
https://www.sqlskills.com/blogs/joe/using-the-sql-server-2014-database-compatibility-level-without-the-new-ce/
Sun, 02 Mar 2014
Consider the following scenario:

  • You want to migrate all databases on a specific SQL Server instance to a new SQL Server 2014 instance
  • You want to leverage new SQL Server 2014 functionality and move to the latest database compatibility level
  • You don’t want to enable the new Cardinality Estimator right away (various reasons why this might be – but perhaps you didn’t have time to fully test key workloads)
  • You don’t want to have to manually add QUERYTRACEON hints for specific queries

To accommodate this scenario, you can do the following:

  • Change the migrated databases to COMPATIBILITY_LEVEL = 120 in order to leverage the SQL Server 2014 database compatibility level
  • Enable trace flag 9481 at the server level as a startup trace flag (or via DBCC TRACEON – but remember that this doesn’t persist across restarts unless you re-execute it)

Trace flag 9481 (for using the legacy CE behavior) and trace flag 2312 (for using the new CE behavior) are both fully supported and documented by Microsoft here:

Enable plan-affecting SQL Server query optimizer behavior that can be controlled by different trace flags on a specific-query level

That KB article focuses mostly on QUERYTRACEON – but the CE trace flags can apply at the server-level scope as well.

There are other combinations of CE enabling/disabling that you can use, depending on your requirements, but I just wanted to point out what I think will be a more common scenario.
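The compatibility-level change and server-level trace flag described above can be sketched in T-SQL as follows – [MyDb] is a hypothetical name standing in for each migrated database:

```sql
-- Step 1: move the migrated database to the SQL Server 2014
-- database compatibility level.
ALTER DATABASE [MyDb] SET COMPATIBILITY_LEVEL = 120;
GO

-- Step 2: enable trace flag 9481 globally (-1) so the legacy CE is used
-- instance-wide. DBCC TRACEON does not persist across restarts; for a
-- durable setting, add -T9481 to the SQL Server startup parameters.
DBCC TRACEON (9481, -1);
GO
```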

For the New CE, Database Session Context Matters
https://www.sqlskills.com/blogs/joe/new-ce-database-session-context-matters/
Sun, 02 Mar 2014
Testing on SQL Server 2014 CTP2 (version 12.0.1524), imagine you have two databases – one called T1 and one called T2 – configured with the following database compatibility levels:

USE [master];
GO

ALTER DATABASE [T1] SET COMPATIBILITY_LEVEL = 120;
GO

ALTER DATABASE [T2] SET COMPATIBILITY_LEVEL = 110;
GO

Now with database T1, we know that using a compatibility level of 120 means we’ll be using the new cardinality estimator (CE) – assuming we’re in the database session context of a new CE database and don’t have a trace flag disabling the new CE behavior.

Executing the following query in the context of T1 does indeed mean we use the new CE:

USE [T1];
GO

SELECT [member_no],
       [provider_no],
       [category_no]
FROM [dbo].[charge]
WHERE [charge_no] = 422;

<StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.0032831" StatementText="SELECT [member_no],[provider_no],[category_no] FROM [dbo].[charge] WHERE [charge_no]=@1" StatementType="SELECT" QueryHash="0x274BD0D496403EEE" QueryPlanHash="0x6B5F27FE55FE8A5C" RetrievedFromCache="true">

But what if we change the query to be in the context of the T2 database (legacy CE) – but still access data from the T1 database?

USE [T2];
GO

SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422;

Now we see the query used the legacy CE:

<StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.0032831" StatementText="SELECT [member_no],[provider_no],[category_no] FROM [T1].[dbo].[charge] WHERE [charge_no]=@1" StatementType="SELECT" QueryHash="0x274BD0D496403EEE" QueryPlanHash="0x6B5F27FE55FE8A5C" RetrievedFromCache="true">

What if the cross-database query is executed from a new CE session context – but the destination is the legacy CE?

USE [T1];
GO

SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;

In this scenario, the query uses the new CE – based on the database session context – even though the destination database is set to compatibility level 110.

What about accessing data from two databases (rather than my previous example of just accessing data from one database)?  The following example results in a legacy CE plan:

USE [T2];
GO

SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422
UNION
SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;
GO

And this query results in a new CE plan:

USE [T1];
GO

SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422
UNION
SELECT [member_no],
       [provider_no],
       [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;
GO

So – bottom line – using the new CE isn’t just a matter of changing the database compatibility level. The database session context also matters.

Troubleshooting the new Cardinality Estimator
https://www.sqlskills.com/blogs/joe/troubleshooting-the-new-cardinality-estimator/
Sun, 29 Dec 2013
This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series:

  • A first look at the query_optimizer_estimate_cardinality XE event
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator – Part II
  • The CSelCalcAscendingKeyFilter Calculator
  • Cardinality Estimation Model Version
  • Comparing Root-Level Skews in the new Cardinality Estimator
  • Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

This post, like the previous nine posts on this subject, uses SQL Server 2014 CTP2 as a reference point.  There may be changes by SQL Server 2014 RTM, and if so, I’ll write a post about applicable changes.

Now in terms of troubleshooting the new Cardinality Estimator, what I’m specifically referring to is the introduction of cardinality estimate skews that negatively impact query performance compared to the pre-SQL Server 2014 cardinality estimator functionality.   Ideally performance regressions should be rare, but when they happen, what are our troubleshooting options?

To frame this discussion, let’s first discuss what may or may not warrant action…

No Action (Necessarily) Needed

  • The estimates are identical to old CE functionality and the query plan is unchanged
  • The estimates are skewed compared to the old CE functionality, but the query plan “shape” is identical (and you see no side-effects from the skews, such as sort or hash spills and query runtime degradation)
  • The estimates are skewed compared to the old CE functionality, the plan is different, but performance is equal or improved – or even more stable
  • The estimates are skewed compared to the old CE functionality, the plan is different, and performance is somewhat impacted but not enough to justify changes (totally depends on your SLAs & workload performance requirements of course)

Action Potentially Needed

  • The estimates are skewed, the plan shape is unchanged, but the estimates lead to performance issues such as spills (due to under-estimates) or concurrency issues (due to over-estimates) for memory-intensive operators
  • The estimates are skewed, the plan is changed, and the plan quality leads to performance degradation (a variety of query optimizer plan choices which may lead to issues)

So what troubleshooting methods and options are available to us?

Troubleshooting Options

  • Do nothing (yeah, I know, but this can be a decision you ultimately make, looking at risk/effort/reward)
  • Revert to the pre-SQL Server 2014 CE version (for example, via database compatibility level change)
  • Apply legacy troubleshooting methods, which may fix other issues directly or indirectly related to the skew and thus help close the gap (framing these legacy methods as questions below)
    • Are the statistics old and need updating?
    • Should the statistics sampling be changed?
    • Are multi-column stats needed to help establish a correlation where one currently isn’t seen by the query optimizer?
    • Parameter sniffing troubleshooting needed? (a much larger topic, but indulge me on including this as a method)
    • Is your table variable usage contributing to the skew?
    • Is your multi-statement table-valued function or scalar user-defined function contributing to the skew?
    • Any data-type conversions occurring for predicates (join or filter)?
    • Are you comparing column values from the same table?
    • Is your column reference being buried by a function or embedded in a complex expression?
    • Are hints being used and if so, is their usage appropriate?
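The first few statistics-related questions above can be sketched in T-SQL – the table, column, and statistics names here are hypothetical placeholders:

```sql
-- Old statistics? Refresh them with a full scan.
UPDATE STATISTICS [dbo].[SalesOrder] WITH FULLSCAN;

-- Correlation between columns invisible to the optimizer? Multi-column
-- statistics give it density information across the column combination.
CREATE STATISTICS [stat_region_channel]
ON [dbo].[SalesOrder] ([region_no], [channel_no])
WITH FULLSCAN;

-- Check how stale the existing statistics are.
SELECT [s].[name],
       STATS_DATE([s].[object_id], [s].[stats_id]) AS [last_updated]
FROM sys.stats AS [s]
WHERE [s].[object_id] = OBJECT_ID(N'dbo.SalesOrder');
```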

The new CE story will unfold as customers start upgrading to SQL Server 2014 and I’ll be curious to see which regression patterns are most common.

Open Questions

  • Regarding the new query_optimizer_estimate_cardinality XE event… Will it be a practical source of information for most SQL Server users in cardinality estimator skew regression scenarios – or will it be something reserved for edge-cases and advanced Microsoft customer support scenarios?  I suspect this XE event will have limited applicability, but I’m reserving judgment for now.
  • Will SQL Server 2014 RTM introduce finer-grained methods for reverting to the pre-2014 cardinality estimator?
  • How will the new CE behave with newer functionality?  For example, Hekaton and clustered columnstore indexes.
  • Will this be it for CE changes for the next few versions?  There is plenty left on the CE-improvement wish list, so I hope not.

Non-Join Cross-Table Predicate Correlation Changes
https://www.sqlskills.com/blogs/joe/non-join-cross-table-predicate-correlation-changes/
Fri, 20 Dec 2013
This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series:

  • A first look at the query_optimizer_estimate_cardinality XE event
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator – Part II
  • The CSelCalcAscendingKeyFilter Calculator
  • Cardinality Estimation Model Version
  • Comparing Root-Level Skews in the new Cardinality Estimator
  • Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

In previous posts for this series, I discussed how the assumption of independence with regards to multiple predicates against the same table, in absence of multi-column stats, is blunted a bit with the new Cardinality Estimator for SQL Server 2014.  So your estimates may increase for this scenario.

On the flip-side, when it comes to joins between two tables, you may see a reduction in join estimate values for scenarios where there are non-join filter predicates on the tables being joined.

Take the following example, using pre-SQL Server 2014 CE behavior:

USE [master]
GO

ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 110
GO

USE [Credit];
GO

SELECT [m].[member_no],
       [m].[lastname],
       [p].[payment_no],
       [p].[payment_dt],
       [p].[payment_amt]
FROM dbo.[member] AS [m]
INNER JOIN dbo.[payment] AS [p]
    ON [m].[member_no] = [p].[member_no]
WHERE [m].[region_no] = 2
    AND [p].[payment_dt] = '1999-09-02 00:00:00.000'
OPTION (RECOMPILE);
GO

The SQL Sentry Plan Explorer plan tree view is as follows:

[plan tree screenshot]

We see that the estimates are spot-on for the Clustered Index Scan and Table Scan, but we have an over-estimate for the Hash Match operation (1,767 estimated vs. 1,003 actual).

Now if I set the database compatibility level to 120 and re-execute the query, here is what we see instead:

[plan tree screenshot]

We still have identical estimate vs. actual values for the Clustered Index Scan and Table Scan, and now our over-estimate for the Hash Match is less pronounced (1,140 rows estimated instead of the 1,767 rows previously estimated).

For the pre-SQL Server 2014 cardinality estimation process, the assumption is that the non-join predicates for the two tables are somehow correlated (in our example, region_no = 2 and payment_dt = ‘1999-09-02 00:00:00.000’).  This is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), and so this can translate into a reduced row estimate for the join.

More on Exponential Backoff
https://www.sqlskills.com/blogs/joe/more-on-exponential-backoff/
Thu, 19 Dec 2013
This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series:

  • A first look at the query_optimizer_estimate_cardinality XE event
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator
  • “CSelCalcCombineFilters_ExponentialBackoff” Calculator – Part II
  • The CSelCalcAscendingKeyFilter Calculator
  • Cardinality Estimation Model Version
  • Comparing Root-Level Skews in the new Cardinality Estimator
  • Using Legacy Methods to Lessen SQL Server 2014 Cardinality Estimator Skews

Continuing the subject of exponential backoffs (from the 2nd and 3rd posts in this series), let’s restore the Credit sample database back to the baseline version and execute the following script:

USE [master];
GO

ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 120;
GO

USE [Credit];
GO

-- Add four new columns
ALTER TABLE [dbo].[member]
ADD [arbitrary_1] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_2] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_3] BIGINT NULL;

ALTER TABLE [dbo].[member]
ADD [arbitrary_4] BIGINT NULL;

I changed the database compatibility level to the latest version (120) so we use the new CE, and then added four new columns.

Next, let’s update the values of the four new columns using different distributions:

;WITH CTE_NTILE AS
(
    SELECT [member_no],
           NTILE(10) OVER (ORDER BY [member_no] DESC) AS [arbitrary_1],
           NTILE(2) OVER (ORDER BY [member_no] DESC) AS [arbitrary_2],
           NTILE(4) OVER (ORDER BY [member_no] DESC) AS [arbitrary_3],
           NTILE(250) OVER (ORDER BY [member_no] DESC) AS [arbitrary_4]
    FROM [dbo].[member]
)
UPDATE [dbo].[member]
SET [arbitrary_1] = [c].[arbitrary_1],
    [arbitrary_2] = [c].[arbitrary_2],
    [arbitrary_3] = [c].[arbitrary_3],
    [arbitrary_4] = [c].[arbitrary_4]
FROM [dbo].[member] AS [m]
INNER JOIN CTE_NTILE AS [c]
    ON [c].[member_no] = [m].[member_no];
GO

Looking at the estimates for single-predicate queries, if I execute the following, I’ll get an estimate of 1,000 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1
OPTION  ( RECOMPILE );

For this next query I’ll get an estimate of 5,000 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_2] = 1
OPTION  ( RECOMPILE );

And for this next query, an estimate of 2,500 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_3] = 1
OPTION  ( RECOMPILE );

And lastly (for single-predicate examples anyhow), an estimate of 40 rows:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_4] = 1
OPTION  ( RECOMPILE );

Now let’s start adding multiple predicates per statement.  The first example with multiple predicates uses two predicates – one with a selectivity of 0.1 and one of 0.5:

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- 0.1 selectivity
[arbitrary_2] = 1 -- 0.5 selectivity
OPTION  ( RECOMPILE );

The estimate for this query is 707.107 with the new CE, which we can derive using the POWER function in T-SQL as follows (I used Excel last time to do this, so see the previous posts for the background information on this calculation):

SELECT 10000 * 0.10 * POWER(0.500000, 0.50);

That returned 707.107.

Now what about a query with three predicates, with selectivities of 0.1, 0.5 and 0.25?

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- .1 selectivity
[arbitrary_2] = 1 AND -- .5 selectivity
[arbitrary_3] = 1 -- .25 selectivity
OPTION  ( RECOMPILE );

The estimate for this was 420.448, and we can derive it via the following expression (notice the order of selectivities goes from smallest to largest):

-- Notice the selectivity order (0.10, 0.25, .50)
SELECT  10000 * 0.10 * POWER(0.250000,0.50) * POWER(0.500000, 0.25);

Now let’s reference all four columns (with selectivities of 0.1, 0.5, 0.25 and 0.004):

SELECT  [member_no]
FROM    [dbo].[member]
WHERE   [arbitrary_1] = 1 AND -- .1 selectivity
[arbitrary_2] = 1 AND -- .5 selectivity
[arbitrary_3] = 1 AND -- .25 selectivity
[arbitrary_4] = 1  -- 0.004 selectivity
OPTION  ( RECOMPILE );

The estimate is 8.20193 and we can derive this via the following:

SELECT  10000 * 0.004* POWER(0.1000000, 0.50) * POWER(0.2500000, 0.25) * POWER(0.5000000, 0.125);

The selectivities are ordered from most selective to least selective, and the less selective values get the “back offs” in order of none, 1/2, 1/4, and 1/8.
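Putting the pattern together – this is my own generalization of the examples above, not a formula quoted from Microsoft – with table cardinality C and the predicate selectivities sorted ascending as p1 ≤ p2 ≤ p3 ≤ p4, the new CE computes:

```latex
\widehat{\text{rows}} = C \cdot p_1 \cdot p_2^{1/2} \cdot p_3^{1/4} \cdot p_4^{1/8},
\qquad p_1 \le p_2 \le p_3 \le p_4
```

Plugging in the four-column example: 10000 × 0.004 × 0.1^(1/2) × 0.25^(1/4) × 0.5^(1/8) ≈ 8.20, matching the estimate above.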
