Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

I’ve been working since January on a new Microsoft white paper covering the main changes in the SQL Server 2014 Cardinality Estimator, and I’m happy to announce that it has just been published:

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

A big thanks to the contributor and reviewer team – including Yi Fang (Microsoft), Shep Sheppard (Microsoft), Mike Weiner (Microsoft), Paul White (SQL Kiwi Limited), Barbara Kess (Microsoft), Jimmy May (Microsoft), Sanjay Mishra (Microsoft), Vassilis Papadimos (Microsoft) and Jack Li (Microsoft).  And thanks to Kathy MacDonald (Microsoft) for managing the whole process!

Finding Timeout-Prone Code with max_elapsed_time

Consider a scenario where your application timeout setting is 30 seconds and you’ve been asked to proactively start tuning workloads that are at risk of timing out.  There are several different places you can start, and I want to demonstrate an additional option today.

To demonstrate, I’ll configure the timeout in SQL Server Management Studio via the “Execution time-out” setting:

[Screenshot: the “Execution time-out” setting in SSMS query options]

Next – let’s create two procedures with different WAITFOR durations (one that won’t time out and one that will):

CREATE PROCEDURE [dbo].[TimeOutProne]
AS
WAITFOR DELAY '00:00:29';
GO

CREATE PROCEDURE [dbo].[TimeOutProneV2]
AS
WAITFOR DELAY '00:00:30';
GO

Execute both of these, starting with the one that succeeds, followed by the one that times out:

EXEC [dbo].[TimeOutProne];
GO

EXEC [dbo].[TimeOutProneV2];
GO

The first execute succeeds and the second times out with the following error message:

Msg -2, Level 11, State 0, Line 13
Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

Now execute the following query against dm_exec_procedure_stats:

SELECT  [database_id] ,
        OBJECT_NAME([object_id]) AS [object_name] ,
        [max_elapsed_time] ,
        [plan_handle] ,
        [sql_handle]
FROM [sys].[dm_exec_procedure_stats]
WHERE [max_elapsed_time] > 28000000;  -- microseconds, "but only accurate to milliseconds"

Interestingly enough – even though the second query timed out – we still have accumulated max_elapsed_time stats for it prior to timeout:

[Screenshot: dm_exec_procedure_stats results showing max_elapsed_time for both procedures]

I included plan_handle and sql_handle as hooks into other execution-related DMVs – otherwise keeping things minimal for demonstration purposes.

So if you’re seeing timeouts and want to start narrowing down which workloads are getting near or exceeding the timeout threshold, max_elapsed_time in both sys.dm_exec_procedure_stats and sys.dm_exec_query_stats can be a helpful piece of information to work with.
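The same approach extends beyond stored procedures. As a sketch of the ad hoc / prepared statement equivalent (reusing the 28-second threshold from the earlier example), sys.dm_exec_query_stats can be joined to sys.dm_exec_sql_text to recover the statement text:

```sql
-- Ad hoc / prepared statement equivalent of the procedure-stats check,
-- reusing the 28-second (28000000 microsecond) threshold.
SELECT  [qs].[max_elapsed_time] ,
        [st].[text] AS [statement_text] ,
        [qs].[plan_handle] ,
        [qs].[sql_handle]
FROM [sys].[dm_exec_query_stats] AS [qs]
CROSS APPLY [sys].[dm_exec_sql_text]([qs].[sql_handle]) AS [st]
WHERE [qs].[max_elapsed_time] > 28000000;
```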

Deck from “Practical SQL Server Cardinality Estimation”

I uploaded the deck from my session for SQLSaturday #287 Madison 2014 and you can download it here.

It was an excellent event!  Great organizers, volunteers, venue, and speakers – and a big turnout to match, with lots of first-time SQLSaturday attendees.

Grateful for the experience and hope to be back again next year.

MSTVF Fixed Cardinality Value in SQL Server 2014

In SQL Server 2014 CTP2, execute the following batch against the AdventureWorks2012 database:

USE [AdventureWorks2012];
GO

SELECT [PersonID], [FirstName], [LastName], [JobTitle], [BusinessEntityType]
FROM [dbo].[ufnGetContactInformation](2)
OPTION (QUERYTRACEON 9481);  -- Legacy CE

SELECT [PersonID], [FirstName], [LastName], [JobTitle], [BusinessEntityType]
FROM [dbo].[ufnGetContactInformation](2)
OPTION (QUERYTRACEON 2312); -- New CE

The first query uses the legacy cardinality estimator and the second query uses the new cardinality estimator.  Both queries reference a multi-statement table-valued function (MSTVF).

Looking at the plan tree view in SQL Sentry Plan Explorer for the legacy CE plan, you’ll see the following (estimating 1 row for the function operators):

[Plan tree: legacy CE plan estimating 1 row for the function operators]

Looking at the new CE version of the plan tree, we see the following (estimating 100 rows for the function operators):

[Plan tree: new CE plan estimating 100 rows for the function operators]

SQL Server 2014 uses a new fixed cardinality estimate of 100 rows for multi-statement table-valued functions, up from the legacy estimate of 1 row.

A few thoughts:

  • Whether 1 row or 100 rows, we’re still using a fixed guess that may or may not reflect reality
  • I’m very wary of using MSTVFs in scenarios where the estimate is critical for plan quality (and oftentimes it is)
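If the fixed guess is a problem and the function logic is simple enough, one option is rewriting the MSTVF as an inline table-valued function, which the optimizer expands like a view and estimates using the base table’s statistics. A hypothetical sketch (the table and column names here are invented for illustration):

```sql
-- Inline TVF version: no fixed cardinality guess; estimates come
-- from statistics on the underlying (hypothetical) [dbo].[Orders] table.
CREATE FUNCTION [dbo].[GetOrdersInline] (@CustomerID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT [OrderID], [OrderDate], [TotalDue]
    FROM [dbo].[Orders]
    WHERE [CustomerID] = @CustomerID
);
GO
```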

March Speaking Engagements

If you’re in the area, just a heads-up that I’ll be speaking at the following events:

PASSMN Minnesota SQL Server User Group
Location:
    3601 West 76th Street, Suite 600 Edina, MN 55437
Date:    March 18, 2014
Time:    4:00 PM – 6:00 PM

SQLSaturday #287
Location: 6000 American Parkway, Building A, Madison, WI 53783
Date: March 29, 2014
Time: My session is in the morning, but there is a ton of good content all day!

I’ll be delivering the presentation “Practical SQL Server Cardinality Estimation”.  The session summary is as follows:

What is cardinality estimation and why should you care? In this session we’ll cover the mechanics behind cardinality estimation and why it is so very critical to overall query performance. We’ll cover concepts applicable to the last few major versions of SQL Server and also preview cardinality estimator changes being introduced in SQL Server 2014.

Hope to see you there!

Using the SQL Server 2014 Database Compatibility Level without the New CE

Consider the following scenario:

  • You want to migrate all databases on a specific SQL Server instance to a new SQL Server 2014 instance
  • You want to leverage new SQL Server 2014 functionality and move to the latest database compatibility level
  • You don’t want to enable the new Cardinality Estimator right away (various reasons why this might be – but perhaps you didn’t have time to fully test key workloads)
  • You don’t want to have to manually add QUERYTRACEON hints for specific queries

To accommodate this scenario, you can do the following:

  • Change the migrated databases to COMPATIBILITY_LEVEL = 120 in order to leverage the SQL Server 2014 database compatibility level
  • Enable trace flag 9481 at the server-level as a startup trace flag (or via DBCC TRACEON – but remembering this doesn’t persist on restarts unless you re-execute)
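As a minimal sketch of those two steps (the database name here is hypothetical):

```sql
-- Move the migrated database to the SQL Server 2014 compatibility level.
ALTER DATABASE [MigratedDB] SET COMPATIBILITY_LEVEL = 120;
GO

-- Keep the legacy CE behavior instance-wide.  DBCC TRACEON does not
-- persist across restarts; use -T9481 as a startup parameter for that.
DBCC TRACEON (9481, -1);
GO
```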

Trace flag 9481 (for using the legacy CE behavior) and trace flag 2312 (for using the new CE behavior) are both fully supported and documented by Microsoft here:

Enable plan-affecting SQL Server query optimizer behavior that can be controlled by different trace flags on a specific-query level

That KB article focuses mostly on QUERYTRACEON – but the CE trace flags can apply at the server-level scope as well.

There are other combinations of CE enabling/disabling that you can use, depending on your requirements, but I just wanted to point out what I think will be a more common scenario.

For the New CE, Database Session Context Matters

Testing on SQL Server 2014 CTP2 (version 12.0.1524), imagine you have two databases – one called T1 and one called T2 – configured with database compatibility levels as follows:

USE [master];
GO

ALTER DATABASE [T1] SET COMPATIBILITY_LEVEL = 120;
GO

ALTER DATABASE [T2] SET COMPATIBILITY_LEVEL = 110;
GO
 

Now with database T1, we know that using a compatibility level of 120 means we’ll be using the new cardinality estimator (CE) – assuming we’re in the database session context of a new CE database and don’t have a trace flag disabling the new CE behavior.

Executing the following query in the context of T1 does indeed mean we use the new CE:

USE [T1];
GO

SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [dbo].[charge]
WHERE [charge_no] = 422;
 

<StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.0032831" StatementText="SELECT [member_no],[provider_no],[category_no] FROM [dbo].[charge] WHERE [charge_no]=@1" StatementType="SELECT" QueryHash="0x274BD0D496403EEE" QueryPlanHash="0x6B5F27FE55FE8A5C" RetrievedFromCache="true">

But what if we change the query to be in the context of the T2 database (legacy CE) – but still access data from the T1 database?

USE [T2];
GO

SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422;
 

Now we see the query used a legacy CE:

<StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.0032831" StatementText="SELECT [member_no],[provider_no],[category_no] FROM [T1].[dbo].[charge] WHERE [charge_no]=@1" StatementType="SELECT" QueryHash="0x274BD0D496403EEE" QueryPlanHash="0x6B5F27FE55FE8A5C" RetrievedFromCache="true">

What if the cross-database query is executed from a new CE session context – but the destination is the legacy CE?

USE [T1];
GO

SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;
 

In this scenario, the query uses the new CE – based on the database session context – even though the destination database is set to compatibility level 110.

What about accessing data from two databases (rather than my previous example of just accessing data from one database)?  The following example results in a legacy CE plan:

USE [T2];
GO

SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422
UNION
SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;
GO
 

And this query results in a new CE plan:

USE [T1];
GO

SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T1].[dbo].[charge]
WHERE [charge_no] = 422
UNION
SELECT  [member_no] ,
        [provider_no] ,
        [category_no]
FROM [T2].[dbo].[charge]
WHERE [charge_no] = 422;
GO
 

So – bottom line – using the new CE doesn’t mean just changing the database compatibility level.  Database session context also matters.
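If you want to audit which CE model your cached plans actually compiled with, one option is shredding the CardinalityEstimationModelVersion attribute out of the cached showplan XML. A sketch along those lines (note this can be expensive against a large plan cache):

```sql
-- Report the CE model version (70 = legacy, 120 = new) per cached plan.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  [st].[text] AS [statement_text] ,
        [qp].[query_plan].value(
            '(//StmtSimple/@CardinalityEstimationModelVersion)[1]', 'INT')
            AS [ce_model_version] ,
        [cp].[usecounts]
FROM [sys].[dm_exec_cached_plans] AS [cp]
CROSS APPLY [sys].[dm_exec_query_plan]([cp].[plan_handle]) AS [qp]
CROSS APPLY [sys].[dm_exec_sql_text]([cp].[plan_handle]) AS [st]
WHERE [qp].[query_plan] IS NOT NULL;
```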

New Course: “SQL Server: Transact-SQL Common Table Expressions”

Today Pluralsight published my new course, “SQL Server: Transact-SQL Common Table Expressions”.

This is a demo-centric, short course on how to create and use common table expressions (CTEs) correctly, for developers and DBAs from SQL Server 2005 onward.  Areas I cover include:

  • What are CTEs and Why Use Them?
  • CTE Syntax
  • Semicolon Requirements
  • When Column Names Must Be Specified
  • Execution Scope
  • Multiple CTE Query Definitions
  • Defining and Referencing Multiple CTEs
  • Multiple References To the Same CTE
  • CTEs with Data Modification Statements
  • CTEs and Views
  • CTEs and Stored Procedures
  • CTEs and Triggers
  • CTEs and User-Defined Functions
  • Recursive CTE Examples
  • Forward Referencing Restrictions
  • ORDER BY restrictions
  • INTO and CTE Restriction
  • CTE Nesting Restriction
  • OPTION Clause with Query Hints Restriction
  • Max Recursion Limitation
  • Late Filtering
  • Non-Recursive Self-Join
  • CTEs and Non-Deterministic Functions
  • CTEs vs. Derived Table Code Clarity
  • Windowing Functions, Predicates and CTEs
  • Return Lineage Information
  • Multiple Anchors
  • Sequence Generation
  • CTEs as Reference Table Replacements
  • Windowing Functions and Data Modifications
  • Referencing a Scalar Subselect in a Predicate

Much more to come from the SQLskills team!

Combining multiple CONTAINS predicates in SQL Server 2012

The SQL Server 2005 “Performance Tuning and Optimization (Full-Text Search)” books online topic has the following recommendation:

“Combine multiple CONTAINS predicates into one CONTAINS predicate. In SQL Server you can specify a list of columns in the CONTAINS query.”

While this shows up for the SQL Server 2005 topic, there is no such statement made for the 2008 and 2012 versions of the books online topic.

I was curious if this 2005-era recommendation still applied for more recent versions.  To test this scenario I made the following changes to the AdventureWorks2012 sample database on version Microsoft SQL Server 2012 (SP1), 11.0.3000.0 Developer Edition:

USE [AdventureWorks2012];
GO
ALTER TABLE [Production].[ProductDescription]
ADD [Description2] NVARCHAR(400) NOT NULL DEFAULT '';
GO

-- Populating the new description column
UPDATE [Production].[ProductDescription]
SET [Description2] = [Description];
GO

CREATE FULLTEXT INDEX ON [Production].[ProductDescription] KEY INDEX [PK_ProductDescription_ProductDescriptionID] ON ([AW2008FullTextCatalog])
WITH (CHANGE_TRACKING AUTO);
GO

ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description]);
ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description2]);
GO

Now for an initial test, I executed the following single CONTAINS predicate query:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
        [Description] ,
        [rowguid] ,
        [ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

[Plan tree: single CONTAINS predicate query]

And the STATISTICS IO output was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now for a multi-CONTAINS predicate, I executed the following example:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
        [Description] ,
        [rowguid] ,
        [ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike') OR
      CONTAINS([Description2], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

[Plan tree: multiple CONTAINS predicate query]

The operators remained the same, but we have a change in estimated subtree cost and also a doubling up of cardinality estimates.

As for the STATISTICS IO output, it was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Logical reads were 34 versus the original 28.

For the last example, I combined the two CONTAINS predicates as follows:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
        [Description] ,
        [rowguid] ,
        [ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS(([Description], [Description2]), 'bike');

SET STATISTICS IO OFF;

I found that with the combined CONTAINS, the plan shape for this example remained the same as the multiple-CONTAINS version. Operator costs and estimates also remained the same.

For SQL Server 2005, if you had multiple CONTAINS predicates referencing columns from the same table, your execution plan would show separate remote scan operators for each reference instead of them being combined natively into a single operator.  This behavior seems to have been addressed in recent versions, but if you have seen a variation of this issue given other conditions please share in the comments.  Thanks!

Finding a Deadlocked SQL Server Agent Job Name

Short and simple post – filed under “Joe – you’ve already forgotten about this twice, so write this down somewhere.”

Let’s say you have deadlock output that shows the following information for the “clientapp” attribute:

clientapp=SQLAgent – TSQL JobStep (Job 0xB813B96C59E6004CA8CD542D8A431A2E : Step 1)

Based on this output, we know a SQL Server Agent Job is involved, but what is the SQL Server Agent Job name?

Tested on SQL Server 2012 SP1, I can find it with the following query:

SELECT  [job_id] ,
        [name]
FROM [msdb].[dbo].[sysjobs]
WHERE [job_id] = 0xB813B96C59E6004CA8CD542D8A431A2E;
GO

I’ve forgotten about this twice over time because the job_id uses a uniqueidentifier data type (in this example, the job_id value is 6CB913B8-E659-4C00-A8CD-542D8A431A2E).  And as an aside, the query execution plan shows a CONVERT_IMPLICIT as I would expect and uses a Clustered Index Seek.
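If you just want to translate the binary value without touching msdb, CONVERT handles the byte reordering for you:

```sql
-- Convert the deadlock output's binary job id to its uniqueidentifier form.
SELECT CONVERT(UNIQUEIDENTIFIER, 0xB813B96C59E6004CA8CD542D8A431A2E) AS [job_id];
-- Returns 6CB913B8-E659-4C00-A8CD-542D8A431A2E
```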