Nonclustered indexes require the “lookup” key in the b-tree when?

I received a great question in email and it’s something I cover in our IEPTO1 (Immersion Event on Performance Tuning, Part 1) so I thought I’d write a post about it…


When you have a non-unique key value in a nonclustered index, SQL Server adds the RID / Row Identifier (if the NC is on a heap table) or the clustered key (if the table has a clustered index) to the navigational structure (the b-tree) of the nonclustered index to make non-unique index rows unique. What is the reason behind that implementation?


SQL Server requires the lookup value (the RID or the clustering key) to go up the tree when a nonclustered key is non-unique in order to efficiently locate rows in the leaf level.

Scenario: Take a non-unique, nonclustered index on gender (on an employee table that’s clustered by EmployeeID). To show this, I’ll create a simple tree structure with only 24 rows (12 female and 12 male employees).

Note: this is NOT exactly what the index looks like but for simplicity, I’ll use this. Also, this data set is small for simplicity (you can understand this problem with a small data set); the problem is even more exaggerated for larger and larger data sets. Finally, if you’re new to index internals, consider checking out my online course here.

Possible Option 1: Only push the non-unique nonclustered key up the tree

If the leaf level is sorted only by the nonclustered key of gender then there’s no order to the EmployeeID values within each gender grouping:

Imagine a nonclustered index on gender with a clustered index on EmployeeID. If the leaf level were not sorted then every request (DELETE / UPDATE) would have to scan ALL of the Fs to find the female row they were searching for… This is a “What If” picture; this is NOT how nonclustered indexes are stored.

In this case, the index has very few uses. We can get a count of Female or Male employees easily but not much else is efficient. A SELECT will have to scan (from the start – every time); it can stop once the record is found (EmployeeID is unique) but the cost of always starting at page 1 (for female) or page 4 (for male) would be too expensive with many rows / pages.

Having said that though, you might even start by wondering what an index on gender would benefit anyway? The most useful requests are exactly that (counts by gender, list by gender). So, for most queries this structure could work and it really wouldn’t matter for queries. Inserts could just go to the end of each section. But, what about deletes? What if an employee were to leave? If you were to execute the following:

DELETE [Employee] WHERE [EmployeeID] = 27;

How would SQL Server find that row? They would know that the employee is female when they went to the data row itself. But, finding this row within the female grouping of this nonclustered index would ALWAYS require a scan through all female rows. The same would be true for an update.

So, while this could work for inserts as well as some queries, it’s horribly inefficient for updates and deletes.

Possible Option 2: Sort the leaf level but still ONLY push the non-unique nonclustered key up the tree

Even if they were sorted – if you don’t “push” the clustered key value (EmployeeID) up the tree, the index cannot be efficiently searched / modified, we still have to start at the lowest value and scan until we find the row to DELETE / UPDATE. This is a “What If” picture; this is NOT how nonclustered indexes are stored.

Here we don’t save much; we can still stop a scan once a value was found. But, we can do that in the structure for Option 1 (as long as EmployeeID was defined as unique).  About the only added benefit of this structure is for an ORDER BY query.

SELECT [EmployeeID], [Gender]
FROM [Employee]
ORDER BY [Gender], [EmployeeID];

For Option 1 this query would have to add an additional sort. For Option 2, we’d save this. But, we’d still have the same problem for a DELETE / UPDATE; we’d have to scan to find the specific row to modify.

So, we’re really back to the same issues as Option 1: this could work for inserts as well as even a few more queries, it’s horribly inefficient for updates and deletes.

SQL Server pushes the “lookup value” up the tree

This makes each row in a nonclustered tree unique. This makes every query directly seekable to the specific page on which that row resides. We can support queries with additional predicates (such as WHERE EmployeeID >). Again, this index doesn’t provide a lot of uses BUT it’s still MORE useful when it’s sorted AND seekable by the composite key and where that key is unique up the tree.

A delete can go directly to the leaf page.

DELETE [Employee] WHERE [EmployeeID] = 27;

After going to the row, we’d find that EmployeeID 27 is female. Using the root page we can see that the first pointer is for EmployeeID 2 or higher. The second pointer is for EmployeeID 59 or higher. Since we’re looking for 27 – we’d use the first pointer which points to page 1 for data.

In the same way, an update can go directly to the leaf page. An insert has a specific place to insert (yes, these can cause splits).

With both the non-unique nonclustered key as well as the clustered key, you have a tree structure that can be efficiently searched / modified / updated.

So, while I could have used a larger and more realistic example for this – this definitely shows the need for the clustering key in the b-tree portion of an index.

Have fun and keep those questions coming!




SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in Vegas!







The wonderful skyline in Vegas. This is the view of the hotel “New York, New York” which is directly across the street from the MGM Grand where we’ve held SQLintersection for Fall 2013, Fall 2014, and we’ll be heading back for Fall 2015. We hope to see you there!

It’s been a busy few weeks… and it all came together amazing well last week as we had our twice yearly SQLintersection conference (the Fall event is usually held in Las Vegas). It was only 2 years ago that we decided to move to a different format and really handle a conference the way that WE want to (you can read a bit more about that here: SQLintersection: a new year, a new conference). And, I’m happy to say that most of it went really well. We’re still learning little tips and tricks along the way and there are ALWAYS things that we can improve but overall, if we can keep on the same track as last week’s event, we’ll be very happy!

First off, our speaker line-up was PHENOMENAL!

  • Aaron Bertrand, Sr. Consultant, SQL Sentry, Inc. [blog | twitter]
  • Andrew J. Kelly, Mentor, SolidQ [blog | twitter]
  • Bob Ward, Principal Architect Escalation Engineer, Microsoft [blog | twitter]
  • Brent Ozar, Brent Ozar Unlimited [blog | twitter]
  • David Pless, Dedicated Support Engineer (PFE), Microsoft
  • Glenn Berry, Principal Consultant, [blog twitter]
  • Jeremiah Peschka, Brent Ozar Unlimited [blog | twitter]
  • Jonathan Kehayias, Principal Consultant, [blog | twitter]
  • Joseph Sack, Sack Consulting [website | blog]
  • Kendra Little, Managing Director, Brent Ozar Unlimited [blog | twitter]
  • Kevin Farlee, Storage Engine Program Manager, SQL Server, Microsoft
  • Kevin Kline, Director of Engineering Services, SQL Sentry, Inc. [blog | twitter]
  • Kimberly L. Tripp, President/Founder, [blog | twitter]
  • Paul S. Randal, CEO/Owner, [blog | twitter]
  • Paul White, SQL Kiwi Limited [blog | twitter]
  • Shep Sheppard, Senior Program Manager, Microsoft AzureCAT [blog twitter]
  • Tim Chapman, Dedicated Support Engineer (PFE), Microsoft [blog | twitter]

Our Content was Top Notch!

In addition to our 8 full-day workshops (3 precons on Sunday, 3 precons on Monday, and 2 post-cons on Friday), our 44 regular conference sessions and our two fantastic SQL-oriented keynotes (in additional to many developer and show keynotes), we also held our first-ever SQLafterDark evening event. We put this together somewhat at the last minute as we decided that we wanted to have more of a social evening event just for SQL folks. And, despite the late addition – it worked out perfectly.

201411081046-2014-11-08 10.46.10

SQLintersection workshop workbooks – lots of content over 3 different workshop days – and a few folks did ALL THREE days of workshops!

SQLafterDark – a new and PERMANENT addition to SQLintersection


Almost every speaker was able to join us and we came up with some great prizes to raffle off AND some great questions for a trivia game. We started the evening with everyone coming in and grabbing a drink. We socialized for the first 30+ mins and I brought attendees and speakers together by hearing questions and then matching the right folks together. We all relaxed and unwound from an intense day of sessions and we all had a drink (the first round was sponsored by SQL Sentry and SQLskills).


Roundtables, an open bar, TONs of great prizes, a great set of SQL folks all in one place – to relax / chat / unwind. Definitely an event to repeat!


Since it was SQLafterDark, I decided to go with a grainy black and white image of the event. What a fantastic evening. A new STAPLE for our SQLintersection event.


Our 24-question, two-round trivia game was filled with both technical and pop-culture trivia (questions supplied by SQL speakers and tied to their interests). Here’s the team that WON the overall game for the night. They each received a SQL Sentry Plan Explorer Pro license for their having won the challenging game. They also got bragging rights for the rest of the conference!

Accessing our FANTASTIC content post-show

This is one of the areas that we still need to improve upon. And, for this I’d like to apologize. It’s now 3 days after the show and the content has not yet been posted online. For those of you who have sent me mail – keep on doing that. I’m happy to get you the content you need directly. As for a show (with many moving parts), I’ve learned that I can only “ask” so much. And, since I don’t directly control the posting, it’s harder than I imagined to get all of the “updates” posted to our update location. If you were at the event then you know the location where this content is supposed to be and you know that I had told you that it should be available by Monday. I’m not going to point fingers but I had been told that it would be posted Friday / Monday after the show. But, a combination of things has delayed this process (no, it’s NOT acceptable IMO). This is one of the things that we WILL be improving for our next show – I PROMISE!

So… until all of the updates are posted (which really should be VERY soon; I’ve been PROMISED today / tonight), feel free to email me with your requests for the content that you need / want (give me the speaker and session name and I’ll work to get the resources to you ASAP).

UPDATE: About 2 hours after I wrote this post, I received the email that our resources are online! Yeah. And, we’ll make sure that this process is streamlined for the next conference so that things don’t go into the following week. Thanks for your patience everyone!!

Finally, a few pictures from the DEVintersection and SQLintersection conferences!


Our individual show logos were highlighted on the walls…


This was just as we were setting up for the first evening keynote by Microsoft Executive Vice President Scott Guthrie. Scott’s on stage; he’s in red (way off in the distance).


Here, Scott Guthrie and Richard Campbell (of DotNetRocks and RunAs Radio) prepare for the opening keynote. Talk about two fantastic techie folks taking the stage…


One more of Scott and Richard.


Turn off the lights, we’re going to open the doors!


And now, it’s time to start the keynote!


A few more angles for our opening keynote session with Scott Guthrie – Executive Vice President of the Cloud and Enterprise group in Microsoft. He is best known for his work on ASP.NET, which he and colleague Mark Anders developed while at Microsoft.


A few more angles for our opening keynote session with Scott Guthrie – Executive Vice President of the Cloud and Enterprise group in Microsoft. He is best known for his work on ASP.NET, which he and colleague Mark Anders developed while at Microsoft.


Last one! A few more angles for our opening keynote session with Scott Guthrie – Executive Vice President of the Cloud and Enterprise group in Microsoft. He is best known for his work on ASP.NET, which he and colleague Mark Anders developed while at Microsoft.

201411111304-2014-11-11 13.04.16

Lunch with 1500+ of my favorite techie folks (the lunch hall had just opened so I grabbed a quick shot before folks sat down). I heard lots of great things about our food during the event – which is always a good thing as it’s hard to feed so many folks at the same time! But, Vegas does this really well!

And now for our SQLintersection sessions:


Here’s Kendra Little showing an execution plan.

Here’s Bob Ward delivering an 8:30am session on Troubleshooting. I love this session and his words of wisdom on how to troubleshoot so that you can avoid calling him! This session always makes me think of Cuba Gooding Jr. in Jerry Maguire – “Help me, help you.”


I had a packed house for sessions on statement execution and stored procedures and with lunch in between my two sessions, I got to answer a lot of questions and hang out in the room (I try to do this if I can as it gives me more time with attendees). THANKS for all of your great questions!


Another angle, another session… this time it’s optimizing stored procedures!


OK, thanks Paul – you got me again! ;-)


Here’s Paul White discussing Parallelism in a two-parter of deep 300 and 400 level sessions.


Here’s Brent talking something… (OK, I can’t remember exactly which session this was). But, it is Brent – just doing his magical thing!


And here’s Microsoft’s Tim Chapman talking about tools for troubleshooting.


Andrew Kelly talks TempDB optimizations and usage.


Another one of Andrew Kelly talking tempdb.


And here’s my lovely husband talking Index Fragmentation. Or, something. Really, I don’t listen to him anymore… Ha! Just kidding!! Just checking to see if anyone’s actually reading all of these captions. ;-)

More fun stuff

On Tuesday evening, we had a speaker cocktail party after speakers / attendees went to the expo hall. I don’t have a lot of pictures from this but I do have a favorite that I took of two of our dear friends – Mark Minasi (left) and Richard Campbell (right). These two are an ABSOLUTE blast to sit back and hang out with…

201411120402-2014-11-11 20.02.13

Then, on Friday evening, Paul and I wrapped up the week heading out for dinner and a show (“O”) with Bob Ward and his lovely wife, Ginger. What a great way to end the week! If you have the opportunity to see a Cirque du Soleil show, I highly recommend it. In fact, one of my favorite things to do while in Vegas is “hit a show” and we try to do this almost every time we’re in Vegas. We’ve seen some wonderful presenters over the years – Bette Midler, Cher, Elton John, Rod Stewart… and our favorite Wayne Newton (no, seriously – he was so genuine and so down to earth – it was incredible). My Mother and Wayne Newton were born on the EXACT same day so we actually got tickets because my Mother wanted to see him (we never would have chosen him on our own but we’re SOOOOO glad that we did). Anyway, Vegas is really AMAZING for shows!

Here are a few pictures from our evening out and about with Bob and Ginger:

201411141932-2014-11-14 19.32.04

There’s a piano bar just off the main entrance of the Bellagio. We stopped there before dinner. I asked to hear some ColdPlay and the pianist knocked it out of the park. It was EXCELLENT.


Here we are at dinner… tapas, margaritas, and wonderful conversation – what a way to end a crazy week. Thanks Bob / Ginger!

201411142153-2014-11-14 21.53.23

This is during the “pre-show” as everyone is getting settled into their seats.

201411142334-2014-11-14 23.34.40

This is the ceiling in the Bellagio “O” theater – a sight in an of itself. Gorgeous theater specially-built for “O” (which is a water show – and, the stage is a big part [and, super interesting part] of the show)! Highly recommended!!

201411142157-2014-11-14 21.57.43

Yep… he’s all mine. I can’t even get a quick selfie shot that’s nice. Welcome to my world.

OK, that’s it from our Fall SQLintersection show. We’re already planning for Spring and Fall 2015. Be sure to save the date!

SQLintersection 2015: Save the Date

Spring 2015 – the week of May 18, 2015 and the location is still TBD (we should know later this week)

Fall 2015 – week of November 2, 2015 and we’re back in Las Vegas, NV

Thanks so much! We’re really looking forward to our next show already (already planning a few things for SQLafterDark).


Building High Performance Stored Procedures

[NOTE: Update 12 Nov 2014. The most recent bug (KB article KB2965069) is THANKFULLY very unlikely for most environments (mentioned below). As a result, I’m glad to say that we can use OPTION (RECOMPILE) as a much easier (and safer) solution. So, while I’ll still leave solution 3 as an option if you run into troubles with OPTION (RECOMPILE), I can luckily say that if you’re on the latest SP/CU – the problems are incredibly unlikely. The BEST hybrid solution is shown below as solution 2.]

In my SQL PASS Summit 2014 session last week, I spoke about building high performance stored procedures and properly dealing with parameter sniffing problems… if you’re interested, the session was recorded for PASStv and you can watch it here.

It all ties to something incredibly common in development – I call it the multipurpose procedure. Simply put, it’s one application dialog with many possible options and then sitting behind it – ONE stored procedure. This one stored procedure is supposed to handle every possible case of execution with multiple parameters. For example, imagine a dialog to search for a customer.


The user can enter in any combination of these elements and the procedure’s header looks like this:

CREATE PROC [dbo].[GetCustomerInformation]
    @CustomerID      INT = NULL
    , @LastName	     VARCHAR (30) = NULL
    , @FirstName     VARCHAR (30) = NULL
    , @MiddleInitial CHAR(1) = NULL
    , @EmailAddress  VARCHAR(128) = NULL
    , @Region_no     TINYINT = NULL
    , @Cust_code     TINYINT = NULL

And then the procedure’s main WHERE clause looks like this:

WHERE ([C].[CustomerID] = @CustomerID OR @CustomerID IS NULL)
    AND ([C].[lastname] LIKE @LastName OR @LastName IS NULL)
    AND ([C].[firstname] LIKE @FirstName OR @FirstName IS NULL)
    AND ([C].[middleinitial] = @MiddleInitial OR @MiddleInitial IS NULL)
    AND ([C].[EmailAddress] LIKE @EmailAddress OR @EmailAddress IS NULL)
    AND ([C].[region_no] = @Region_no OR @Region_no IS NULL)
    AND ([C].[cust_code] = @Cust_code OR @Cust_code IS NULL)GO

Or, possibly like this:

WHERE [C].[CustomerID] = COALESCE(@CustomerID, [C].[CustomerID])
    AND [C].[lastname] LIKE COALESCE(@lastname, [C].[lastname])
    AND [C].[firstname] LIKE COALESCE(@firstname, [C].[firstname])
    AND [C].[middleinitial] = COALESCE(@MiddleInitial, [C].[middleinitial])
    AND [C].[Email] LIKE COALESCE(@EmailAddress, [C].[Email])
    AND [C].[region_no] = COALESCE(@Region_no, [C].[region_no])
    AND [C].[cust_code] = COALESCE(@Cust_code, [C].[Cust_code]);

Or, maybe even like this:

WHERE [C].[CustomerID] = CASE WHEN @CustomerID IS NULL THEN [C].[CustomerID] ELSE @CustomerID END
    AND [C].[lastname] LIKE CASE WHEN @lastname IS NULL THEN [C].[lastname] ELSE @lastname END
    AND [C].[firstname] LIKE CASE WHEN @firstname IS NULL THEN [C].[firstname] ELSE @firstname END
    AND [C].[middleinitial] = CASE WHEN @MiddleInitial IS NULL THEN [C].[middleinitial] ELSE @MiddleInitial END
    AND [C].[Email] LIKE CASE WHEN @EmailAddress IS NULL THEN [C].[Email] ELSE @EmailAddress END
    AND [C].[region_no] = CASE WHEN @Region_no IS NULL THEN [C].[region_no] ELSE @Region_no END
    AND [C].[cust_code] = CASE WHEN @Cust_code IS NULL THEN [C].[cust_code] ELSE @Cust_code END

But, no matter which of these it looks like – they’re all going to perform horribly. An OSFA procedure does not optimize well and the end result is that you’ll get one plan in cache. If that was generated from a different combination of parameters than the ones you’re executing, you might get an absolutely abysmal plan. The concept is fairly simple – when a procedure executes and there isn’t already a plan in cache (for that procedure), then SQL Server has to generate one. To do so it “sniffs” the input parameters and optimizes based on the parameters sniffed. This is the plan that gets stored with the procedure and saved for subsequent executions. Depending on your workload characteristics, you might end up with a very common plan in cache and other users also executing the common parameters might see reasonable performance (especially if the WHERE does NOT use LIKE and has nothing but equality-based criteria [those aren’t quite as bad as this one]). But, there will still be cases where an atypical execution performs horribly.

And, unfortunately, sometimes the worst combination happens – a very atypical execution is the one that gets sniffed and then everyone suffers.

So, the admin might update statistics or force a recompile (or, even restart SQL Server) to try and get around this problem. These things “fix” the immediate problem by kicking the plan out of cache but they are NOT A SOLUTION.

Solution 1: The Simple Solution – OPTION (RECOMPILE)

Since SQL Server 2005, we’ve had an option to add OPTION (RECOMPILE) to the offending statement. When OPTION (RECOMPILE) works – it works incredibly well. However, there have been bugs and you’ll want to make sure that you’re on the latest SP or CU. It’s definitely had a bit of a checkered past. Here’s a list of issues and links to KB articles / bugs around this problem:

  • KB2965069 FIX: Incorrect result when you execute a query that uses WITH RECOMPILE option in SQL Server 2012 or SQL Server 2014. Fixed in:
    • Cumulative Update 4 for SQL Server 2014
    • Cumulative Update 2 for SQL Server 2012 SP2
    • Cumulative Update 11 for SQL Server 2012 SP1
  • KB968693 FIX: A query that uses parameters and the RECOMPILE option returns incorrect results when you run the query in multiple connections concurrently in SQL Server 2008. Fixed in:
    • SQL Server 2008 CU4
    • SQL Server 2008 R2 CU1
  • I don’t have a KB article link for this one but in both SQL Server 2008 and SQL Server 2008 R2 there are versions where some performance related features were unable. For example, filtered indexes could not be used inside of stored procedures even if you used OPTION (RECOMPILE) because of changes with the behavior or bugs. So, I’d fall back on using DSE (shown by solution 3).

Anyway, despite these issues (the first one listed above – is INCREDIBLY unlikely), I LOVE the simplicity of OPTION(RECOMPILE) because all you have to do is change the offending statement like this:

FROM ...

The reason I love this is for the simplicity of the statement. The reason that I have an issue with just doing this as a standard is two-fold:

  • First, if you do this too often (or, for extremely frequently executed procedures), you might end up using too much CPU.
  • Second, I’ve seen some environments where this is used to fix ONE problem and then it becomes a standard for ALL problems. This DEFINITELY should NOT become a standard coding practice. You only want to use this if your testing shows parameter sniffing problems and plan unstability. Stored procedures that have stable plans should be compiled, saved, and reused.

A better solution is one where you recompile for unstable plans but for stable plans – you place those in cache for reuse.

A Hybrid Solution – recompile when unstable, cache when stable

My main problem with adding OPTION (RECOMPILE) to many statements is that it can be costly to do when you’re executing thousands of procedures over and over again. And, if you really think about it, not all combinations really need to be recompiled.

For example, take an easier combination of just 3 parameters: CustomerID, Lastname, and Firstname. The possible combinations for execution will be:

  1. CustomerID alone
  2. Lastname alone
  3. Firstname alone
  4. CustomerID and Lastname together
  5. CustomerID and Firstname together
  6. Firstname and Lastname together
  7. Or, all three – CustomerID, Lastname, and Firstname

My questions for you to consider are:

  • Are all of those going to want the same execution plan?
  • And, what if they supply wildcards?

The most important consideration is that ANY query that submits CustomerID should NOT be recompiled, right? Think about it – if someone says I want to search for all of the people that have an ‘e’ in their last name (Lastname LIKE ‘%e%’) AND that have an ‘a’ in their first name (Firstname LIKE ‘%a%’) AND that have a CustomerID of 123456789 then all of a sudden this becomes an incredibly simple query. We’re going to go an lookup CustomerID 123456789 and then check to see if they have an ‘a’ in their first name and an ‘e’ in their last name. But, the lookup is incredibly simple. So, ANY execution where CustomerID is supplied should NOT be recompiled; the plan for those executions is STABLE. However, Lastname and Firstname searches might be highly unstable – especially if there’s a leading wildcard. So, if wildcard is present then let’s recompile if they don’t supply at least 3 characters for BOTH the first name and last name – preceding the wildcard character. And, it’s also OK if they don’t supply a wildcard at all. Then, the overall number of recompilations will be much lower – we’ll save CPU and we’ll get better performance by using a lot of STABLE, precompiled plans.

To do this, there are TWO versions. Solution 2 uses OPTION (RECOMPILE) strategically. Solution 3 using dynamic string execution instead.

Solution 2 – recompile when unstable [by adding OPTION (RECOMPILE)], cache when stable (using sp_executesql)

There are absolutely some criteria when plan stability can be predicted. For these, we will build the statement to concatenate only the non-NULL parameters and then execute normally using sp_executesql. For where it’s unknown or unstable, we’ll recompile by OPTION (RECOMPILE) to the statement. For all executions, use sp_executesql. And, of course, this is ONLY substitution for parameters. And, this should only be done for the most frequently executed procedures. You don’t have to do this for every procedure but it’s extremely beneficial for things that are heavily executed.

-- Solution 2

CREATE PROC [dbo].[GetCustomerInformation]
    @CustomerID	BIGINT = NULL
    , @LastName	VARCHAR (30) = NULL
    , @FirstName VARCHAR (30) = NULL
    , @MiddleInitial CHAR(1) = NULL
    , @EmailAddress VARCHAR(128) = NULL
    , @Region_no TINYINT = NULL
    , @Member_code TINYINT = NULL
IF (@CustomerID IS NULL
    AND @LastName IS NULL
    AND @FirstName IS NULL
    AND @MiddleInitial IS NULL
    AND @EmailAddress IS NULL
    AND @Region_no IS NULL
    AND @Member_code IS NULL)
    RAISERROR ('You must supply at least one parameter.', 16, -1);

        @Recompile  BIT = 1;

SELECT @ExecStr =
	N'SELECT COUNT(*) FROM [dbo].[Customers] AS [C] WHERE 1=1';

	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[CustomerID] = @CustID';

	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[LastName] LIKE @LName'; 

	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[Firstname] LIKE @FName';

IF @MiddleInitial IS NOT NULL
	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[MiddleInitial] = @MI';

IF @EmailAddress IS NOT NULL
	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[EmailAddress] LIKE @Email';

IF @Region_no IS NOT NULL
	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[Region_no] = @RegionNo';

IF @Member_code IS NOT NULL
	SELECT @ExecStr = @ExecStr
		+ N' AND [C].[Member_code] = @MemberCode';

-- These are highly limited sets
IF (@CustomerID IS NOT NULL)
    SET @Recompile = 0

IF (PATINDEX('%[%_?]%', @LastName) >= 4
        OR PATINDEX('%[%_?]%', @LastName) = 0)
    AND (PATINDEX('%[%_?]%', @FirstName) >= 4
        OR PATINDEX('%[%_?]%', @FirstName) = 0)
    SET @Recompile = 0

IF (PATINDEX('%[%_?]%', @EmailAddress) >= 4
        OR PATINDEX('%[%_?]%', @EmailAddress) = 0)
    SET @Recompile = 0

IF @Recompile = 1
    --SELECT @ExecStr, @Lastname, @Firstname, @CustomerID;
    SELECT @ExecStr = @ExecStr + N' OPTION(RECOMPILE)';

EXEC [sp_executesql] @ExecStr
    , N'@CustID bigint, @LName varchar(30), @FName varchar(30)
    , @MI char(1), @Email varchar(128), @RegionNo tinyint
    , @MemberCode tinyint'
    , @CustID = @CustomerID
    , @LName = @LastName
    , @FName = @FirstName
    , @MI = @MiddleInitial
    , @Email = @EmailAddress
    , @RegionNo = @Region_no
    , @MemberCode = @Member_code;

Solution 2 is the preferred and best method to create a balance where you don’t just recompile every time. This will give you better long-term scalability. 

Solution 3 – recompile when unstable [by using DSE], cache when stable (using sp_executesql)

In the past, if I’ve ever run into problems with OPTION (RECOMPILE), I always consider (albeit, not lightly) rewriting the statement using a dynamic string instead; this always works! But, yes, it’s a pain to write. It’s a pain to troubleshoot. And, above all, yes, you have to be careful of SQL injection. There are ways to reduce and even eliminate the potential for problems (and, not all DSE can). Above all, you do need to make sure you have clean input (limiting only to valid input for the type of column such as characters for names and emails, etc.). Check out this blog post if you want more information Little Bobby Tables, SQL Injection and EXECUTE AS.

Based on that blog post, the solution to this problem is the following code.

-- To reduce the potential for SQL injection, use loginless
-- users. Check out the "Little Bobby Tables" blog post for
-- more information on what I'm doing in this code / execution
-- to reduce the potential surface area of the DSE here.

CREATE USER [User_GetCustomerInformation]

GRANT SELECT ON [dbo].[Customers]
TO [User_GetCustomerInformation];

-- You'll need this if you want to review the showplan
-- for these executions.
GRANT SHOWPLAN TO [User_GetCustomerInformation];

-- Solution 3

CREATE PROC [dbo].[GetCustomerInformation]
 @CustomerID BIGINT = NULL
 , @LastName VARCHAR (30) = NULL
 , @FirstName VARCHAR (30) = NULL
 , @MiddleInitial CHAR(1) = NULL
 , @EmailAddress VARCHAR(128) = NULL
 , @Region_no TINYINT = NULL
 , @Cust_code TINYINT = NULL
WITH EXECUTE AS N'User_GetCustomerInformation'
IF (@CustomerID IS NULL
 AND @LastName IS NULL
 AND @FirstName IS NULL
 AND @MiddleInitial IS NULL
 AND @EmailAddress IS NULL
 AND @Region_no IS NULL
 AND @Cust_code IS NULL)
 RAISERROR ('You must supply at least one parameter.', 16, -1);

DECLARE @spexecutesqlStr NVARCHAR (4000),
 @ExecStr NVARCHAR (4000),
 @Recompile BIT = 1;

SELECT @spexecutesqlStr =
 N'SELECT * FROM [dbo].[Customers] AS [C] WHERE 1=1';

SELECT @ExecStr =
 N'SELECT * FROM [dbo].[Customers] AS [C] WHERE 1=1';

 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[CustomerID] = @CustID';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[CustomerID] = CONVERT(BIGINT, ' + CONVERT(NVARCHAR(30),@CustomerID) + N')';

 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[LastName] LIKE @LName';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[LastName] LIKE CONVERT(VARCHAR(30),' + QUOTENAME(@LastName, '''''') + N')';

 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[Firstname] LIKE @FName';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[FirstName] LIKE CONVERT(VARCHAR(30),' + QUOTENAME(@FirstName, '''''') + N')';

IF @MiddleInitial IS NOT NULL
 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[MiddleInitial] = @MI';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[MiddleInitial] = CONVERT(CHAR(1), ' + QUOTENAME(@MiddleInitial, '''''') + N')';

IF @EmailAddress IS NOT NULL
 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[EmailAddress] LIKE @Email';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[EmailAddress] LIKE CONVERT(VARCHAR(128), ' + QUOTENAME(@EmailAddress, '''''') + N')';

IF @Region_no IS NOT NULL
 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[Region_no] = @RegionNo';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[Region_no] = CONVERT(TINYINT, ' + CONVERT(NVARCHAR(5),@Region_no) + N')';

IF @Cust_code IS NOT NULL
 SELECT @spexecutesqlStr = @spexecutesqlStr
 + N' AND [C].[Cust_code] = @MemberCode';
 SELECT @ExecStr = @ExecStr
 + N' AND [C].[Cust_code] = CONVERT(TINYINT, ' + CONVERT(NVARCHAR(5), @Cust_code) + N')';

-- These are highly limited sets
IF (@CustomerID IS NOT NULL)
 SET @Recompile = 0

IF (PATINDEX('%[%_?]%', @LastName) >= 4
 OR PATINDEX('%[%_?]%', @LastName) = 0)
 AND (PATINDEX('%[%_?]%', @FirstName) >= 4
 OR PATINDEX('%[%_?]%', @FirstName) = 0)
 SET @Recompile = 0

IF (PATINDEX('%[%_?]%', @EmailAddress) >= 4
 OR PATINDEX('%[%_?]%', @EmailAddress) = 0)
 SET @Recompile = 0

IF @Recompile = 1
 -- Use this next line for testing
 -- SELECT @ExecStr -- For testing
 EXEC (@ExecStr);
 -- Use this next line for testing
 -- SELECT @spexecutesqlStr, @Lastname, @Firstname, @CustomerID;
 EXEC [sp_executesql] @spexecutesqlStr
 , N'@CustID bigint, @LName varchar(30), @FName varchar(30)
 , @MI char(1), @Email varchar(128), @RegionNo tinyint
 , @CustomerCode tinyint'
 , @CustID = @CustomerID
 , @LName = @LastName
 , @FName = @FirstName
 , @MI = @MiddleInitial
 , @Email = @EmailAddress
 , @RegionNo = @Region_no
 , @CustomerCode = @Cust_code;

This solution should NOT be used as a standard. But, it is an option for handling some cases where OPTION (RECOMPILE) isn’t giving you the plans / performance that you need. This solution is more complicated for developers to code. But, I’m not saying you have to do this for absolutely everything. Just work through the most critical procedures that are causing you the most grief.

The RIGHT way to handle multipurpose procedures

The hybrid solution (solution 2) is more complicated for developers to code. Remember, I’m not saying you have to do this for absolutely everything. Just work through the most critical procedures that are causing you the most grief (because they’re the most heavily executed procedures). The hybrid solution really gives you the best of all worlds here:

  • It reduces CPU by only recompiling for unstable combinations
  • It allows every combination executed with sp_executesql to get its own plan (rather than one per procedure)
  • It allows stable plans to be put in cache and reused for subsequent executions (which reduces your impact to CPU and gives you long-term scalability)

Above all, you need to test that this works as expected but not only will you get better plans but you’ll also scale better!


For more information

  • Check out my PASS presentation DBA-313 here.
  • Check out my Pluralsight course: SQL Server: Optimizing Ad Hoc Statement Performance here.
  • Check out my Pluralsight course: SQL Server: Optimizing Stored Procedure Performance here.
  • Check out our Immersion Event: IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2 here.

And so it (PASS 2014) begins…

Tonight is the opening evening event for PASS 2014 and we’re (SQLskills, as a team), really looking forward to seeing familiar faces and hearing your fantastic questions, problems, and DBA-stories. I’m putting the final touches on my session (and also going over feedback from last year’s session) and I stumbled on a bunch of photos that I wanted to share from last year.

I look forward to seeing everyone and taking lots of funs pictures. But, above all – I love the stories. Be sure to stop by and see me!

Here’s a really cool shot that @SQLBalls took of my session last year:

Bradley Ball's Pano shot of Ballroom A for my session

Bradley Ball’s Pano shot of Ballroom A for my session

Here’s another shot during the session. I had lots of great questions from the group (from here down – you can click on the photos to see a larger image of the shot).


SQL PASS 2013 – my session on skewed data

And how COOL is this? Because my session was broadcast on #PASStv there were folks watching from all over. I got this shot from a group that was watching together in their offices. This year I’m on #PassTV again TOMORROW so I hope to see a few of you again. Well, see you on twitter. ;-)


Watching #PASStv

This shot is from 45 minutes after my session and we were still going. I’m HAPPY to say that I have the last slot of the day once again and I expect that it may look like this again… 45 minutes after my session ends! I think the questions, the problems, the excitement that attendees have is what I [personally] love the most about these conferences!


45 minutes after my session and we were still going!

And now for the friends… here’s Erin and I. She’s MUCH taller than I am. See…


@ErinStellato is MUCH taller than I am!


Out for a few at the SQL Sentry party 
IsThatQwAPhotoBomb     MeStevePaulJesFromSQLAsylum

And now the attendee party – these are always great fun. I’m so glad that a few of you sent me the shots from your phones! 
NascarAttendeePhoto_@LaerteSQLDBA     NascarAttendeePhoto_@murilocmiranda     NascarAttendeePhoto_@ScottOstr     NascarAttendeePhoto_@SQLenergy     NascarAttendeePhoto_@yourSQLman


And last year’s AFTER PARTY was absolutely spectacular. We were in Charlotte, NC so SQL Sentry (based in Charlotte) KNOCKED IT OUT OF THE PARK with a KILLER event. They gave away some drives and a few of us (er, well – me and @PaulRandal) just HAD to tag along. One of my favorite shots is of @AaronBertrand, @KEKline and I. What a great time!! And, if you really want to see some great video action – check out SQL Sentry’s “Who’s the World’s Fastest DBA” video from the event here.


Aaron, me, and Kevin at Charlotte Motor Speedway… I WANNA GO FAST!

This last shot isn’t a complete shot of the team but we had most of us when @SQLAsylum passed by us and we couldn’t hesitate to get a shot from Pat. Every year he shoots the event and gets some great (and sometimes very candid) shots. I’m looking forward to seeing his pics for this year! As always, THANKS Pat!! He blogged his goals for 2014 here.


Part of the SQLskills crew – we’ll get a COMPLETE group shot this year!

Looking forward to seeing everyone… I hope that everyone has a fantastic SUMMIT 2014.


Fall SQLintersection is coming up soon and we can’t wait!

In just a couple of weeks (9-14 November 2014), we’ll be at our twice-yearly conference, SQLintersection. We’re really looking forward to it as our conference is different than many others… our sessions are hand-picked and combined to create daily “tracks” with themes. Often, I’ll even ask a speaker if they could talk about x or y or z so that it fits in well with that day’s theme. And, I only ask experts in those areas to speak. But, not just experts – truly great speakers. We really want you to walk away from a session empowered. And, nothing is more empowering than truly understanding something. Our speakers are available. If you don’t get a question answered during a session then stay after. Or, come to one of our open Q&A sessions (these are great fun). The point – we’re there to get your questions answered so that you can go back to the office with immediate solutions; immediate ROI for your time away.

SQLintersection: Great Speakers!

I’m so excited about this lineup. All of these speakers are top-rated, SQL experts that have been around in this field for years (some can even boast decades) but are still focused on consulting. Every speaker is a SQL Server MVP (with the except of the Microsoft speakers – but, I don’t think anyone’s going to question their SQL knowledge :)) and some are Microsoft Certified Masters in SQL Server. But, no matter what – ALL are focused on providing the most ROI that’s possible in their session(s) and/or their workshops. Check out this list of speakers:

  • Aaron Bertrand, Sr. Consultant, SQL Sentry, Inc. [blog | twitter]
  • Andrew J. Kelly, Mentor, SolidQ [blog | twitter]
  • Bob Ward, Principal Architect Escalation Engineer, Microsoft [blog | twitter]
  • Brent Ozar, Brent Ozar Unlimited [blog | twitter]
  • David Pless, Dedicated Support Engineer (PFE), Microsoft
  • Glenn Berry, Principal Consultant, [blog | twitter]
  • Jeremiah Peschka, Brent Ozar Unlimited [blog | twitter]
  • Jonathan Kehayias, Principal Consultant, [blog | twitter]
  • Joseph Sack, Sack Consulting [website | blog]
  • Kendra Little, Managing Director, Brent Ozar Unlimited [blog | twitter]
  • Kevin Farlee, Storage Engine Program Manager, SQL Server, Microsoft
  • Kevin Kline, Director of Engineering Services, SQL Sentry, Inc. [blog | twitter]
  • Kimberly L. Tripp, President/Founder, [blog | twitter]
  • Paul S. Randal, CEO / Owner, [blog | twitter]
  • Paul White, SQL Kiwi Limited [blog | twitter]
  • Shep Sheppard, Senior Program Manager, Microsoft AzureCAT [blog | twitter]
  • Tim Chapman, Dedicated Support Engineer (PFE), Microsoft [blog | twitter]

SQLintersection: When is it all happening?

The show officially runs from Monday, November 10 through Thursday, November 13 but there are both pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Sunday, November 9 through Friday, November 14.

  • Sunday, November 9 – pre, pre-con day. There are three workshops running:
    • SQL Server Architecture: Choosing Virtualization, Clustering, AlwaysOn, and More by Jeremiah Peschka, Kendra Little, and Brent Ozar
    • Core SQL Server Fundamentals by Andrew Kelly
    • Mastering SQL Server Execution Plan Analysis by Paul White
  • Monday, November 10 – pre-con day. There are three workshops running. And, the first keynote of the conference is Monday evening.
    • Performance Troubleshooting Using Waits and Latches by Paul S. Randal
    • High Performance, Scalable Asynchronous Processing Using Service Broker by Jonathan Kehayias
    • Very Large Tables: Optimizing Performance and Availability through Partitioning by Kimberly L. Tripp
  • Tuesday, November 11 through Thursday, April 13 is the main conference. Conference sessions will run all day in multiple tracks:
    • Download our full schedule here
    • Be sure to check out some of our cross-conference events such as our Women In Tech luncheon on Tuesday!
  • Friday, November 13 is our final full day running with a few post-con workshops. There are a few workshops running.
    • Queries Gone Wild 2: Statistics and Cardinality in Versions 2008, 2008R2, 2012, and 2014 by Kimberly L. Tripp and Joe Sack
    • Developer’s Guide to Tuning Somebody Else’s SQL Server by Jeremiah Peschka, Kendra Little, and Brent Ozar

SQLintersection: Why is it for you?

If you want practical information delivered by speakers that not-only know the technologies but are competent and consistently, highly-rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

Multi-column statistics and exponential backoff

After writing SQLskills procs to analyze data skew and create filtered statistics, I received a couple of emails asking me to further explain the sentence “This is also true for multi-column, column statistics (where you need to use MORE than just the first column) and hypothetical indexes created with auto pilot.” (NOTE: this refers to the fact that these don’t work in the new CE.)

So, in this post, I want to further explain multi-column statistics and how both the legacy and new CE would calculate selectivity using the Credit sample database. If you don’t have a copy of this, you can download a SQL Server 2008 backup from here. To reproduce everything shown in this example, you must restore this to SQL Server 2014. FYI – at the time of this blog post, I’m running SQL Server 2014 CU3 (version = 12.0.2402.0).

What CE Model are you running?

Review the default cardinality estimation model used across all of your databases:

SELECT  [name] AS 'Database Name'
    , CASE
        WHEN [compatibility_level] = 120 THEN 'New SQL Server 2014 CE Model'
        ELSE 'Legacy CE Model'
      END AS 'Cardinality Estimation Model'
FROM    [sys].[databases];

What does the Legacy CE Model use if there are multi-column statistics?

Check out the indexes and statistics on the charge table:

EXEC [sp_helpstats] '[dbo].[charge]', 'all';

You should see that there are ONLY statistics on existing indexes:

statistics_name          statistics_keys
------------------------ ------------------
charge_category_link     category_no
charge_provider_link     provider_no
charge_statement_link    statement_no
ChargePK                 charge_no

We’re going to run some queries against the charge table and we’re going to query against the category_no and the provider_no columns… while indexes for category_no and provider_no might be helpful, I want to drop those and just see how the estimates work with multi-column, column statistics:

DROP INDEX [charge].[charge_provider_link];
DROP INDEX [charge].[charge_category_link];

Without those indexes, SQL Server does not have any column-level statistics for the category_no or provider_no columns. To help the queries we’re going to run, we’ll create a multi-column, column statistic:

ON [dbo].[charge] ([provider_no], [category_no]);

Now, we’ll see what happens when we run a query using the legacy CE model vs. the new CE model. Be sure to turn on “Show Actual Execution Plan” in the Query drop-down menu.

SELECT [ch].*
FROM [dbo].[charge] AS [ch]
WHERE [ch].[provider_no] = 434
    AND [ch].[category_no] = 10
GO -- TF 9481 = CardinalityEstimationModelVersion 70

Looking at the showplan: Plan&Estimate_withArrowsYou can see the estimate and the actual are fairly close (but, by no means perfect). To understand where this comes from – you need to understand the density vector component of a statistic:

DBCC SHOW_STATISTICS ('[dbo].[charge]', 'TestStat')

The “All density” columns can be used to calculate the average number of rows that are returned when that column (or, combination of columns – as you add the second, third, etc. columns – shown as rows [when present] in the density vector). And, the column “Columns” shows the combination. For another example, if we had created a statistic on columns such as Lastname, Firstname, and Middleinitial then the all density could help us understand the average number of rows returned when supplying just a lastname; or, when supplying both a lastname AND a firstname; or, finally, when supplying a lastname, firstname, and middleinitial. What the density vector does not provide is a way of knowing the selectivity of any of the secondary columns on their own; it knows ONLY of the left-based combinations. This is one of the reasons why column-level statistics are helpful; they can provide the densities of the secondary columns on their own.

For this query, we’re going to use the “All density” of the combination of  provider_no and category_no (from the 2nd row of output). The “All density” value is 0.0002914602. If we multiple that by the number of rows in the table (at the time the statistics were created) then we can get the average number of rows returned. To see the number of rows in the table, we need to review the statistics_header component as well as the density_vector. This is easiest by re-running our DBCC command without the WITH clause.

DBCC SHOW_STATISTICS ('[dbo].[charge]', 'TestStat');

The result is to multiply the “All density” of 0.0002914602 times 1600000 rows for an average of: 466.3363200000 (this is where the estimate of 466.336 is calculated).

What about the New CE Model in SQL Server 2014?

First, we’ll re-run the query but force the new CE model using trace flag 2312:

SELECT [ch].*
FROM [dbo].[charge] AS [ch]
WHERE [ch].[provider_no] = 434
AND [ch].[category_no] = 10
GO -- TF 2312 = CardinalityEstimationModelVersion 120


And, now we see the same plan – but, this time, the estimate is further off from the actual. And, another question might be – where did that number come from?

In SQL Server 2014, regardless of multi-column statistics, the estimate for multiple conjunctive (AND) predicates is calculated using exponential back-off. The idea is that they take the highest selectivity first and then multiply that by the subsequent square roots of the next three less selective predicates. Specifically:

most-selective-predicate * sqrt(next most selective predicate) * sqrt(sqrt(next most selective predicate))sqrt(sqrt(sqrt(next most selective predicate)))

In this case, they first need to calculate the selectivity of the two predicates supplied:

To calculate the selectivity for [provider_no] = 434, use the histogram from the TestStat multi-column statistic (but, they don’t use the density vector of the combination). The histogram actually has a step for 434 and it shows 6625.247 rows in the EQ_ROWS column. The selectivity of this can be calculated as 6625.247 / 1600000 OR 0.00414077937.

To calculate the selectivity for [category_no] = 10, use the histogram from the auto-created statistics on category_no (and, yes, this would have been created for this query if SQL Server hadn’t already created it for the other statement’s execution [which doesn’t entirely make sense because they didn’t use it. But, that’s another discussion for another day. And, if I’m being honest, as long as these get maintained properly, I’m ALL FOR AUTO CREATE STATS and I’d stand from the rooftops and scream it if it wouldn’t get me arrested… OK, sorry. I digress].

Once again, we’re doing well with our histograms as there’s an actual step for 10 and it shows 179692.4 rows in the EQ_ROWS column. The selectivity of this can be calculated as 179692.4 / 1600000 OR 0.112307750.

Now – to calculate our estimate… use the MOST selective predicate (0.00414077937) * the SQRT of the next most selective predicate (0.112307750) * the number of rows:

SELECT 0.00414077937 * sqrt(0.112307750) * 1600000;


New CE Model Estimate

NOTE: How this is calculated may change in a later SP or CU so I can only vouch for 2014 RTM through CU3.

Sure enough, this matches our showplan output.

What would the Legacy CE Model have done without multi-column statistics?

If we had not had our multi-column statistics then the Legacy CE Model would have just expected even distribution of providers across categories. To do this they’d simply multiply the selectivities (NOT backing off at all):

SELECT 0.00414077937 * 0.112307750 * 1600000 AS [Legacy CE Model Estimate];


Legacy CE Model Estimate

Tip: Drop the ‘TestStat’ statistic and run the query again using TF 9481 to see this.

So, which CE is best?

This is where the good news and bad news comes in. There’s NO model that can deal with every type of data distribution possible. When data is evenly distributed across the different columns then the old model can produce a more accurate result:

Legacy CE Model estimate: 744.07

Multi-column statistics (just the AVERAGE of provider and category TOGETHER and across ALL values): 466.34

New CE Model estimate: 2220.28


The point – I could create another example where the New CE Model is the best. Here the Legacy CE is the best but the legacy CE doesn’t even use it because they rely on multi-column statistics (and therefore averages across all categories and providers). So, the irony is that they get further away with the generalized multi-column statistic. But, I could also come up with yet another example where the New CE Model produces the best result (and another where the multi-column statistic is best). In the end, it completely depends on THE DISTRIBUTION OF THE DATA.

But, the really good news is that you have a lot of troubleshooting and control options here. My recommendation (you can read more about it in the post I linked to at the start of this post) is that you STAY using the Legacy CE Model and where estimates are OFF (or, where you’re troubleshooting a suspected cardinality estimation problem), TRY the new CE using the QUERYTRACEON option. You can even try TF 4137 (this has been available since SQL Server 2008*) if you know that one value is a direct subset of the other (TF 4137 uses the MINIMUM selectivity of the predicates and does NOT perform a calculation).

Above all – have fun!

* Thanks to Pedro for reminding me to mention that TF 4137 has been around since SQL Server 2008. So, if you have predicates that are subsets of the others (WHERE city = ‘Chicago’ AND state = ‘IL’ AND country = ‘USA’) then you would NOT want EITHER model to estimate (even exponential back-off is going to be wrong here even though it will be higher than the old model). Here, Chicago is a subset of IL and IL a subset of USA. Using the MINIMUM selectivity of these predicates (which would be the selectivity of Chicago) would be best. However, if you had any other predicates in there – then, all bets are off. See – this is SUPER HARD! ;-) ;-)

SQLskills procs to analyze data skew and create filtered statistics

At PASS Summit 2013 I delivered a presentation titled: Skewed Data, Poor Cardinality Estimates, and Plans Gone Bad (you can watch this session on PASStv here). Note: please fast forward 4 mins as they didn’t tell me they were recording / publishing my “chatting” before the session – which is weird IMO but if you want to hear me talk about diving and sharks and octopus then there it is. As for the actual content of the session, it’s all about understanding the limits of SQL Server’s histogram (which has a maximum of 201 steps) and the result that estimates from skewed data (using step averages) can be highly incorrect – resulting in estimates that are off and query plans that aren’t correct. As part of that session, I demo’ed some code to help programmatically analyze histograms for skew and also code that will create filtered statistics for you – based on a few parameters. In this post, I have an update to those scripts. But, there’s also a bit of good news and bad news that goes along with filtered statistics. Let me start with the good news…

The Good News

  • Filtered statistics work really well in SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012. (hmm… I bet you can already figure out the bad news)
  • The scripts have been updated to fix a couple of minor issues (one where I checked for version but didn’t allow them to run on SQL Server 2014).
  • sp_SQLskills_DropAllColumnStats has been modified to ONLY drop filtered statistics that start with SQLskills_FS (which is the naming convention I use when I create filtered stats).
  • sp_SQLskills_CreateFilteredStats no longer requires a “max” value for tables. The final filtered statistic created will be unbounded. Also, depending on how many steps you have, there were a few weird situations where you might end up with a statistic that is definitely >= value and < same value. This would NOT hurt any estimates but it also wasn’t very useful. Totally benign but also fixed.

The Bad News

Filtered statistics are not yet accessible by the SQL Server 2014 cardinality estimator (tested up to SQL 2014 CU3). This is also true for multi-column, column statistics (where you need to use MORE than just the first column) and hypothetical indexes created with auto pilot. But, there is good news here as well – you CAN still use SQL Server 2014 and take advantage of filtered statistics by accessing the legacy CE at the query level. In fact, if you run using the legacy CE in 2014 then you can access all of these things (and, again, you can do this on a query by query basis if you’d like). So, let me give you a few options and a recommendation.

Cardinality Estimator Options for SQL Server 2014

Option 1 – low impact (my recommendation)

  • Upgrade existing databases (through backup / restore)
  • Leave the existing compatibility level intact (SQL Server 7.0 CE is the default for compatibility modes of SQL Server 7.0 through SQL Server 2012). Remember, restoring / attaching  does not “upgrade” the compatibility level; it remains the level it was when it was backed up / detached.
  • For the following types of queries, test the query performance using trace flag 2312. If you see a benefit, use the new CE model in the query with OPTION (QUERYTRACEON 9481).
    •  That aren’t performing well
    • Whose estimates are far off from actual
    • Where you want to use filtered statistics or auto pilot

Option 2 – low impact

  • Change to compatibility level 120
  • Enable trace flag 9481 server-wide
  • Again, use query-level changes for accessing the new CE.

Option 3 – potentially high impact (potentially good; potentially bad)

  • Change to compatibility level 120 (New CE model)
  • When troubleshooting, test trace flag 9481 against queries that have regressed
  • Use trace flag 9481 for queries that regressed

Playing with the skew analysis scripts and filtered statistics scripts

If you’re still wanting to determine if your very large tables show signs of skew and potentially poor estimates – check out my PASStv presentation from last year. Then, replace the SQLskills project with the scripts in this zip (SQLskillsProcs). If you keep your database compatibility mode at 110 (for SQL Server 2012) then you won’t need to make any changes to the other examples from that session. However, if you change to the SQL Server 2014 compatibility mode (120) then you’ll need to add this to the statements where you want to leverage filtered statistics.

SELECT blah blah

9481 = Legacy CE model for SQL Server versions 7.0 – 2012

3604 = this will display the output to the results window

9204 = this will display the statistics used by the query (note, I can’t seem to get this to work when using the new CE at all)

Have fun and let me know what you find. Cheers, kt

New York .Net and SQL Server User Groups – We’re back!!

Date & Time: Tuesday, February 25, 2014 – 6:30-9pm

Evening theme/title: SQL Server – Are you talking to me?

Groups/Target Audience: SQL Server and .NET User Groups, anyone that works with SQL Server that wants to talk techie on Tuesday evening

Cost: FREE (but, you must register; link at the end of this post)

Presenters: Kimberly L. Tripp & Jonathan Kehayias,

Abstract: Our title seems strange but it runs across multiple themes. First we’ll take a look at Service Broker in SQL Server and how it can extend service oriented architectures into the data tier with asynchronous message processing. As for the other theme, database developers, DBAs, architects – don’t always talk to each other. In this evening session, we’ll cover a mix of really interesting topics that help bridge the gap across these disciplines. Reading plans and understanding plans with Plan Explorer (both the free and the Pro versions will be discussed) – this is something everyone should be able to do. And, we’ll raffle off a PE Pro Edition (thanks SQL Sentry) in addition to talking about the FREE Plan Explorer. Then, we’ll dive into statement execution methods and plan cache. Different coding strategies on our servers can have a profound effect on both cache and performance. But, how can you write more effective code that takes less caching and has better plans? Some of this content comes from a newly released Pluralsight course titled: SQL Server – Optimizing Ad Hoc Statement Performance and for those of you that attend – Pluralsight will give you a monthly trial. And, we’ll also raffle off two Annual Plus subscriptions (thanks Pluralsight)!

How can we all just get along? Come to this fast-paced, demo-packed session led by industry-experts Kimberly L. Tripp and Jonathan Kehayias of SQLskills. And, after the presentation, Kimberly, Jonathan, and Paul Randal (also of SQLskills) will hang out for an open Q&A on almost-anything-SQL. We’ll post a list of topics and open up the floor!


Rough Agenda:

  • 6:30pm – Pizza (thanks Pluralsight & SQL Sentry), networking, settling in
  • 7:00pm
    • Service Broker (~15 mins) – Jonathan Kehayias
    • SQL Sentry’s Plan Explorer (~15 mins) – Jonathan Kehayias
    • Statement Execution (~15 mins) – Kimberly L. Tripp
    • Plan Cache (~15 mins) – Kimberly L. Tripp
  • 8:15pm – Raffles (must be present to win)
    • SQL Sentry Plan Explorer Pro License
    • Pluralsight Annual Plus Subscriptions
    • $100 off (per day) for an Immersion Event of your choice (maximum value $500 for a 5-day event)
    • Everyone – 30-day trial subscription to Pluralsight
  • 8:30pm – Open Q&A (all 3 of us)


11 Times Square (west 41st street and 8th avenue)
New York, NY 10036

Check-in with security in the lobby and then you’ll proceed to the 9th floor. Our meeting will be held in the Music Box/Winter Garden MPR.

And a final, big thanks to our evening hosts!

Microsoft contact: Jesus Aguilar,
SQL Server User Group contact:
Joe Lax,
.NET User Group Contact: Andrew Brust,

Want to join us – register here (cost: free):

Physical Database Design Considerations and a bit of freediving

Something that always amazes me is that people think SQL Server is easy. And, I’d like to (at least partially) blame Microsoft marketing for that. Over the years they’ve propagated this notion of SQL Server being a simplified database environment even “zero-administration” at one point (anyone remember that marketing nightmare?). And, while there are MANY things that are simplified – it’s by no means SIMPLE. And, simply put, it’s just like EVERY OTHER TOOL you’ve ever used. If you want to use it effectively, you need to learn how to use it properly.

Photo taken by Deron Verbeck of WHOA ( They're a group with whom I always go out when I'm in Kona. If you want to see wildlife and get out on the water with very knowledgeable folks - this is the company to pick!

Photo (of me w/some beautiful mantas) taken by Deron Verbeck of WHOA ( They’re a group with whom I always go out when I’m in Kona, HI. If you want to see marine wildlife and get out on the water with very knowledgeable folks (on a fantastic boat) – this is the company to pick!

Over the weekend I took a class in freediving (like snorkeling, but my goal is to dive/photograph animals [mostly cetaceans – who really don’t like bubbles] “on a single breath”). But, it’s not just about holding your breath. There are breathing techniques for slowing down your heart rate and making yourself more comfortable in the water. There are skills for coming back up out of the water after a long breath hold (it’s called recovery breathing and it’s meant to help reduce/prevent shallow water black-out [Samba] and full black-outs). And, there’s a HUGE emphasis on safety/buddy dives. Even with just a two-day class, I was diving deeper and holding my breath longer than I ever thought possible. I believe in classes. I believe in safety. I believe that you can never stop learning. I believe in IMMERSION (in more ways than one :).

My main point: It’s always better to know how to do something properly EARLY so that you can avoid problems later. Luckily, with SQL Server, it’s [usually] not that critical/dangerous (but, what about hospital systems, transit systems, safety/911 systems, etc…).

My point, the more you know about the RDBMS, the better you’ll be able to design for it. The better your applications will scale. The longer life your applications will have and the longer you’ll be able to use the hardware that you buy.

Get more information on Physical Database Design

These are the points that I’ve been making in a couple of my most recent releases and I wanted to make sure you knew they were available:

For FREE and available for download – my interview on DotNetRocks. I love these guys. They’re great friends of ours (did you know that Richard MC’ed our wedding?). But, it’s so much more than their being friends/friendly. They’re absolute geeks and they can really do killer interviews. Check out show 913: (yes, nine-HUNDRED thirteen): Physical Database Design with Kim Tripp. I’ve also done a few other shows with them. If you want a few more shows, check out this post: Getting ready for DotNetRocks tonight (in NYC).

My class on Pluralsight: SQL Server: Why Physical Database Design Matters. And, you can even get a 10-day/200 hour trial subscription HERE.

You always want to keep learning and there are lots of great options – including conferences. If you’re attending either SQL PASS in Charlotte next week OR SQLintersection in Vegas at the end of the month – please stop by and say hi. We’re looking forward to seeing a lot of you over the next few weeks.


PS – Here are a few additional links if you’re interested:

And, since I LOVE u/w photography – here are a few shots I took while out with WHOA:


White-tipped Oceanic Shark

Bottlenose Dolphin

Bottlenose Dolphin

Short-finned Pilot Whales

Short-finned Pilot Whales

Black-tipped Oceanic Shark

Black-tipped Oceanic Shark

Always wear sunscreen

OK, I know… this post is going to remind many of you of Baz Luhrmann’s Everybody’s Free to Wear Sunscreen. And, for me, ’99 was already probably too late (having been a beach lifeguard in the ’80s). But, I’d like to add a vote to that being very good advice.

This post is not going to be a technical one. It’s more of an experience one. And, one that I think everyone can benefit from. It’s true with everything in life – the sooner you find a problem, the easier it is to solve.

Back in 2010, I had a spot on my cheek that didn’t feel like much more than a little sliver of skin sticking up. It seemed to go away but then come back again. I saw a dermatologist who said that this is very common and is called Actinic Keratosis. It’s sometimes referred to as “pre-cancer” but the likelihood of it becoming cancerous is about 1 in 100 (this is just want I remember) and that we have an easy solution to freeze it off using cryotherapy. So, it was frozen off and gone. Well, for a few months. Then, it came back. And, then I had it frozen off again.

But, as life would have it, time went by. And, the spot got a bit bigger. But, somehow in my mind – the thought of it becoming cancerous was so low. I just became complacent. Last year, Paul started harassing me that I should go back and get it looked at. I argued that it was no big deal. It’s nothing. And, it wasn’t really doing anything interesting. Until this year. This year it actually scabbed and bled a couple of times. And, it seemed to be getting a tad larger. And, so I started doing some research, I read that there were a few signs that warranted having a skin lesion looked at:

  • Irregular in shape
  • Raised on the skin
  • Sore that never seems to completely heal or go away
  • Growing
  • Larger than the eraser of a pencil

And, all of a sudden I was a bit worried. So, I scheduled an appointment with a dermatologist (my prior dermatologist had left the practice). But, with my schedule and their schedule – this appointment got pushed out and pushed out and pushed out. Finally, I went to the dermatologist after we returned from our summer vacation. Where, as usual, we had been diving. And, yes, where I had been in the sun. So, as usual, I wasn’t really tan. I was more of a brownish shade of red. And, off I went to go see a dermatologist.

Now, I don’t know where things went wrong but this is how I remember that appointment (this is the short version):

(pleasantries, etc.)

Me: Actually, it looks about the best its looked in 6 months. The combination of salt water and sun seems to have done it some good.

Derm: Started to look at my cheek and ended up looking at the wrong spot… that’s how uninteresting it was…

Me: Well, I started to get nervous a few weeks back when I read the ABCs of skin cancer:

Derm: Oh, no. That only applies to BROWN spots.

OK, so this was a new dermatologist. And, we didn’t really know each other. And, she seemed pretty convincing that I was there for no reason. She proceeded to tell me that while the Actinic Keratosis that I was diagnosed with was sometimes called “pre-cancer” that the likelihood of it being cancer was really low (yes, I’d heard that before) AND that it usually takes decades for it to become cancerous.

So, while I wasn’t so impressed with the bedside manners of this dermatologist, they were [at least] making me feel like there was nothing to be worried about. In the back of my mind, I started thinking – maybe I don’t even need to get this biopsied. Maybe I should just get it frozen off again.

But, a bit more time with this dermatologist made me just want to “get it over with.” To be honest, I should have asked a lot more questions. However, I was already feeling like I was wasting their time. So, I just decided to move forward. We did the biopsy. I remember asking what would need to be done if it was something and their response was “not much more than what we’re doing today.” (but, again, it felt more like a brush-off than a real answer)

I left. I went home. I got more and more frustrated for the next few hours as I realized what an incredibly frustrating visit I had had. The doctor didn’t really interact. They didn’t seem to want to answer questions. I thought – at least I can research things and investigate this. I have people to chat with about this, etc. What would the experience be (with that doctor) for someone else who doesn’t use the internet, etc. I ended up writing a 2 page letter to the head of dermatology. Here’s a VERY small highlight:

Today, I had my first and last appointment with Dr. XYZ of Dermatology.

… They need to understand that just because my problem isn’t interesting to them, it’s still something that I want taken seriously. …

The good news is that they took away most of my fears of something nasty; I’m looking forward to hearing my incredibly lame results.

So, I waited. It was a Tuesday when I had the appointment. It was Tuesday night when I wrote the letter. Then, on Wednesday, I was even more frustrated with the biopsy. I was told not to remove the band-aid for 24 hours but when I did, I had a grand-canyon-like crevice on my face. [Note about my overreaction: it was about the size of a dime. But, it was on my face and it was awful looking. All I could think about was – sh*t, that’s gonna leave a mark. And, sh*t why didn’t they tell me that this biopsy was going to be so much bigger than the original spot and much deeper than I expected. Ugh. Again, I wish I had asked more questions.]

And, now I was even more frustrated. It looked like I was going to have a scar – for nothing. I ended up calling my regular doctor who put all of this to rest. Yes, shave biopsies tend to look really gnarly at first but they tend to heal pretty well. And, even a couple of days later, it was looking much better. OK, no biggie.

But, then I got the results (Friday afternoon at 4:30pm btw – have a nice weekend). It was Squamous Cell Carcinoma. Yes, it is cancer. On. my. face. And, Dr personality referred me to a specialist and my interactions with them were done.

So, that was it. I had a consult setup for the following week and I was going to have the procedure done at that time as well. It’s called Mohs Surgery.

OK, there goes my weekend; that was not what I expected.

So, I started reading and reading and reading and reading (er, the internet is both a good and a bad place to go for information). Probably what stood out in my mind the most was this: The average age for SCC is 66. People that get it at a young age are more likely to get other cancers. Great.

The good thing though is that Mohs Surgery didn’t sound too bad and I thought that it was probably the right direction. But, then I started to read about topical solutions (specifically Imiquimod). And, I started to wonder if Mohs was the best next step. And, then I decided I wanted a second opinion. So, I chatted with a few folks and got a referral for another dermatologist. She was out for a couple of days but her nurse talked to the head of their dermatology department and agreed that I should chat with someone else before I have the surgery. So, we cancelled that and I went in for my second opinion.

And, the long story short is that she was wonderful. She felt that Imiquimod wasn’t really the best for what I had. Instead, we should go with Fluorouracil. The Mohs Surgery has a 99% success rate but also was likely to leave a very large scar. My SCC wasn’t very deep but it extended beyond the area of the biopsy. So, we weren’t really sure how large the area actually was. The topical cream has an 85-90% success rate and would be a bit of a pain in the *ss for a few weeks (about 2 months) but would be a lot less likely to scar. And, of course, I can always go back and have the surgery.

So, we decided to go that route. And, that’s really the point of this post. I’m about to start teaching again next week, a couple of conferences are coming up, and my face isn’t all that attractive. In fact, tomorrow’s my birthday and Paul’s decided that the best present for me is a paper bag with two holes that just reads “wife” (yep, isn’t he nice :) ). And, it was going to be obvious that something is up.

But, even more than the lovely visual that is my cheek; I’ve learned a lot from this experience. And, if anything, maybe this will help some of you!

  1. If you don’t like a doctor, get another one. And, tell the clinic/hospital, etc. about your experience. In chatting with a few folks – they said that they had experienced something similar but they never wrote a letter. I realize that a doctor’s time is important. But, so is mine.
  2. Get a second opinion. Talk to people.
  3. Do research. The best post I’ve found about Fluorouracil is this one: My experience isn’t as bad as his but the cream definitely stings. The spot itches. It’s a bigger pain in the *ss than I had expected.
  4. Stay up on your health. The sooner you find a problem, the easier it is to solve. My SCC isn’t horrible. We did catch it very early. It’s very slow growing. But, I still [probably] should have taken care of this last year.
  5. Always wear sunscreen

So, I’ll leave you with a quick shot of what I looked like a few days ago. I’m not sure I’ll do the whole timeline thing as I really just don’t want to remember this nasty thing. But, it’s not contagious and I am fine. It just looks nasty!


Thanks for reading,