SQLskills SQL101: Indexes on Foreign Keys

As many of you know, we started our SQLskills SQL101 series a couple of weeks ago… it’s been great fun for the whole team to go back through our most common questions / concerns and set the record straight, per se. We’ve still got a lot of things to discuss but indexing is one of many questions / discussions and unfortunately, misunderstandings.

I’m going to tie today’s post with a question I received recently: if I have a table that has 6 foreign key columns/references should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference.

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

What is a Foreign Key Enforcing?

Imagine you have two tables: Employees and Departments. The Employee table has a column called DepartmentID and it represents the department of which that employee is a member. The department ID must be a valid department. So, to guarantee that the department ID exists and is valid – we create a foreign key to the DepartmentID column of the Departments table. When a row is inserted or updated in the Employees table, SQL Server will check to make sure that the value entered for DepartmentID is valid. This reference is very inexpensive because the foreign key MUST reference a column which is unique (which is in turn, enforced by a unique index).

What Must Exist in Order to Create a Foreign Key Reference?

A foreign key can be created on any column(s) that has a unique index on the referenced table. That unique index can be created with a CREATE INDEX statement OR that index could have been created as part of a constraint (either a UNIQUE or PRIMARY KEY constraint). A foreign key can reference ANY column(s) that has a UNIQUE index; it does not have to have been created by a constraint. And, this can be useful during performance tuning. A UNIQUE index offers options that constraints do not. For example, a UNIQUE index can have included columns and filters. A foreign key reference CAN reference a UNIQUE index with included columns; however, it cannot reference a UNIQUE index with a filter (I wish it could).

A good example of this might occur during database tuning and specifically during index consolidation (something I do after I do after query tuning and when I’m determining the best index for the database / for production). I often review existing indexes as well as any missing index recommendations, etc. Check out my SQLskills SQL101: Indexing Basics post for more information about these concepts.

Imagine I have the following:

Table: Employees, column: NationalID (SocialSecurityNumbery)

Column NationalID: this is an alternate key for Employees as their Primary Key is EmployeeID. Because it’s another column on which you will lookup employees and you want to make sure it’s unique, you decide to enforce it with a UNIQUE constraint on it. You may even reference it from other tables.

However, later, while doing database tuning, you decide that you need the following index:

CREATE INDEX [QueryTuningIndex]
ON [dbo].[Employees] ([NationalID])
INCLUDE ([LastName], [FirstName])

This index would be similar to and redundant with the existing constraint-based index on NationalID. But, you really want this new index to help performance (you’ve tested that this index is helpful to some frequently executed and important queries so you’ve already decided that the costs outweigh the negatives).

And, this is where the excellent feature to reference a unique index comes in… instead of adding this new one and keeping the existing constraint, change the index to the following:

ON [dbo].[Employees] ([NationalID])
INCLUDE ([LastName], [FirstName])

The uniqueness is always enforced ONLY on the key-portion of the index. So, this new index – even with included columns – still does this. The only bad news is that SQL Server has already associated the foreign key with the original constraint-based index so you’ll still need to remove the foreign key to drop the constraint (I wish this weren’t true). But, you’ll still have data integrity handled by the new unique index – as long as you create the new index before you drop the foreign key and the original unique constraint. Having said that, there’s more bad news – because there will be a short timeframe where the foreign key does not exist, you must do this off hours and when little-to-no activity is occurring. This will reduce the possibility of rows being inserted / updated that do not have a valid reference. You’ll certainly find out when you add the referential constraint again as the default behavior of adding the foreign key will be to verify that all rows have a reference row. NOTE: there is a way to skip this checking but it is NOT recommended as your constraint will be marked as untrusted. It’s is ALWAYS PREFERRED to create a foreign key with CHECK. Here’s a script that will walk you through the entire example – leveraging the default behavior to recheck the data when the constraint is created. Be sure to run this is a test / junk database.

Creating an Index on a Foreign Key Column

Foreign keys can reference any column(s) that have a UNIQUE index (regardless of whether it was created by a constraint).

Foreign keys can reference any column(s) that have a UNIQUE index (regardless of whether it was created by a constraint).

Now that you know the options for the column being referenced, let’s consider what’s required for the referencing column? The column on which the foreign key is created will not have an index by default. I wrote about this in a prior post: When did SQL Server stop putting indexes on Foreign Key columns? and the main point is that SQL Server has NEVER automatically created indexes on foreign key columns. But, many of us recommend that you do! (but, I also wish indexing were just that simple because this might not be an index you keep forever…)

See, if EVERY foreign key column automatically had an index created for you – then SQL Server might end up requiring it to always be there. This would then remove the option of consolidating this index with others when you were later running through performance tuning techniques. So, while it’s generally a good idea to have an index on a foreign key column; it might not ALWAYS be a good idea to keep that narrow index as you add other (probably, slightly-wider indexes).

However, initially, creating this index is what I call: Phase 1 of tuning for joins.

But, there are 3 phases of tuning for joins and these phases are all during query tuning and not database tuning. So, as you do deeper tuning, you might end up consolidating this foreign key index with another index(es) to reduce the overall number of indexes on your table.

Finally, the most important point (and this answers the original question), the index must be ONE per foreign key (with only the column(s) of that specific foreign key); you will create one index for each foreign key reference.

Indexes on Foreign Keys Can Help Improve Performance

There are two ways in which these indexes can improve performance.

First, they can help the foreign key reference maintain integrity on the referenced table. Take the example of Employees and Departments. Not only must SQL Server check that a DepartmentID is valid when inserting / updating an Employee row but SQL Server must also make sure that referential integrity is maintained when DepartmentIDs are removed from the Departments table. An index on the Employees table (on the DepartmentID columns) can be used to quickly check if any rows reference the DepartmentID being deleted from the Departments table. Without an index on DepartmentID in the Employees table, SQL Server would potentially have to scan the Employees table; this can be quite expensive.

Second, and this doesn’t always work, SQL Server may be able to use the index to help improve join performance. And, this is where I’ll cut the ideas a bit short as other phases of join tuning are more complex for this SQL101 post. So while there are other strategies that can be used to tune joins when this doesn’t work, it’s still a fantastic starting point. In fact, I generally recommend indexing foreign keys as part of your very early / development phase for a database. But, again, these indexes might be consolidated later in favor of other indexes.


Indexing for performance has many steps and many strategies, I hope to keep uncovering these in our SQL101 series but between this post and the Indexing Basics post, you’re well on your way to kicking off a better tuning strategy for your tables!

And, don’t forget to check out all of our SQL101 posts here!

Thanks for reading,

SQLskills SQL101: Indexing Basics

SQLskills introduced our new SQL101 recently and well… indexing is something that everyone needs to get right. But, it’s not a simple task. And, as I start to sit down to write a SQL101 post on indexing, I suspect I’m going to struggle keeping it simple? However, there are some core points on which I will focus and I’ll be sure to list a bunch of additional resources to get you more information from here! Remember, the point of each of our SQL101 posts is to make sure that everyone’s on the same page and has the same basic idea / background about a topic. And, for indexing, that’s incredibly important (and, often, misunderstood).

What is an Index?

Simply, it’s a structure that’s applied to a set of [or, subset of] data to enforce ordering – either to quickly check uniqueness or to aid in accessing data quickly. Simply, that’s why you create an index. You’re either wanting to enforce data integrity (such as uniqueness) or you’re trying to improve performance in some way.

How Does an Index Enforce Uniqueness?

If you want to maintain uniqueness over a column (or a combination of columns), SQL Server takes the required data (and very likely more data than you specifically chose) and sorts that data in an index. By storing the data in sorted order, SQL Server is able to quickly determine if a value exists (by efficiently navigating the index structure). For this intro post, it doesn’t entirely matter exactly what’s in the index but it does matter which index you choose and for what purpose.

Relational Rules Enforced by Indexing

In a relational database, relational rules rule the world. And, many learn some of the rules rather quickly. One of these rules is that relational theory says that every table must have a primary key. A primary key can consist of multiple columns; however, none of the columns can allow NULLs and the combination of those columns must be unique. While I agree that every table should have a primary key, what’s chosen AS the primary key can be more complex than the relational rules allow. Behind the scenes, a primary key is enforced by an index (to enforce and check for uniqueness). The type of index that is used depends on whether or not you explicitly state the index type or not. If you do not explicitly state an index type, SQL Server will default to trying to enforcing your primary key constraint with a unique clustered index. If a clustered index already exists, SQL Server will create a nonclustered index instead.

And, this is where things get tricky… a clustered index is a very important index to define. Internally, the clustering key defines how the entire data set is initially ordered. If not well chosen then SQL Server might end up with a structure that’s not as efficient as it could be. There’s quite a bit that goes into choosing a good clustering and I’ll stress that I think it’s one of the most important decisions to be made for your tables. And, it also needs to be made early as later changes to your clustering key can be difficult at best (often requiring downtime and complex coordinated scripting after already suffering poor performance before you make the decision to change).

So, let’s keep this simple… you’ll want to choose the clustering key wisely and early. And, you’ll want to get a good understanding on the things that depend on the clustering key. I did a discussion and demo in my Pluralsight course: Why Physical Database Design Matters and I’ve discussed this quite a bit in my blog category: Clustering key.

The key point is that the primary does NOT have to be enforced with a clustered index. Sometimes your primary key is not an ideal choice as the clustering key. Some great clustering key choices are:

  • Composite key: EntryDate, RowID where EntryDate is an ever-increasing date value that follows the insert pattern of your data. For example, OrderDate for a table that stores Orders. RowID should be something that helps to uniquely identify the rows (something like an identity column is useful). Key points: choose the smallest (but reasonable) data types for both the date and the ID. Ideally, use DATETIME2(p) where p is the level of precision desired. And, for an identity column – choose INT if you know you’ll never get anywhere near 2 billion rows. However, if you even think you’ll have “hundreds of millions” of rows, I’d probably go straight for BIGINT so that you never have to deal with the problems that you’ll have if you run out.
  • Identity column: When I don’t have a good composite key like that above, I’ll often consider an identity column for clustering – even if my queries aren’t specifically using this value. Even if you’re not explicitly using this value, SQL Server is using it behind the scenes in its nonclustered indexes. Please note that this is both a good thing and a bad thing. If your clustering key is very narrow then you’re not unnecessarily widening your nonclustered indexes. That isn’t to say that you won’t have a few wider nonclustered indexes but choosing a wide clustering key makes all of your nonclustered indexes wide when they might not need to be.

OK, I feel like I’ve started to open a car of worms with this one. But, the key points are:

  1. The primary key does NOT have to be clustered (and sometimes it’s better not to be)
  2. The clustering key needs to be chosen early and based on many factors – there’s no single right answer ALL the time… for example, if you don’t need any nonclustered indexes then the width of the clustering key becomes less of an issue.

At the end of this post, I’ll point you to more resources to help you to make a better decision.

What about Indexing for Performance?

In addition to enforcing uniqueness (and, allowing SQL Server to quickly determine whether or not a value already exists), indexes are used to help performance. And here’s where there are some very simple yet important things to understand. There are two types of performance tuning methods that I want to describe here: query tuning and database tuning. What’s often done most is query tuning. While that might be [temporarily] good for that query, it’s NOT a good long-term strategy for the server. I always START with query tuning but that’s not an ideal strategy to implement directly on your production server.

Query Tuning

Database Tuning Advisor for "query tuning" from SSMS

Indexing for performance using the DTA [Database Tuning Advisor] for “query tuning” from SSMS

Query tuning is where you focus on nothing but the query to come up with a better indexing strategy. You might use the Database Engine Tuning Advisor on the specific query right from SQL Server Management Studio (see image). Or, you might use the “green hint” found in the show actual executing plan window. While these might significantly help you with that query, they may not be ideal to implement in your database (where other users are executing, where other indexes exist, and where resource access/usage are at a premium).

Don’t get me wrong, query tuning is a MUST. But, it’s just a starting point. I always start my tuning process by determining the best indexes for a query. But, you can’t stop there. You MUST do “database tuning” if you want your production database to truly scale.

Database Tuning

Before you create a desired index in production (or, while you’re doing testing / analysis in development / QA [quality assurance]) you really want to check to see if this index is going to be good for production.

Are there other similar indexes?

Maybe you can consolidate some of these indexes into one. Yes, this consolidated index might not be the best for the individual queries but by creating one index instead of three, you’ll be reducing the cost of this index for data modifications, maintenance, storage, etc.

Are there any suggested missing indexes?

Again, before I create a new index, I want to see if I can get more uses out of it. Can I consolidate this new index with existing and/or missing recommendations? If I can then I’ll get more uses out of this index.

Are there good maintenance strategies in place?

Before you go and create more indexes, make sure that your existing indexes are being maintained. You should also check that the indexes being maintained are actually being used. To be honest, you should do that BEFORE you do any tuning at all.

SUMMARY: Steps for Database Tuning and Server Scalability

  1. Get rid of the dead weight. Clean up unused indexes. Consolidate similar indexes.
  2. Make sure your index maintenance strategy is in place. There’s no point in adding indexes if you’re not cleaning up fragmentation and reducing splits.
  3. Then, you can consider adding indexes BUT only after you’ve done the following:
    1. Query tuning
    2. Existing index consolidation
    3. Missing index consolidation
    4. And, of course, TESTING!

Well… that was much longer than I had hoped. But, there are a lot of good concepts here. Unfortunately, indexing for performance is just NOT a simple discussion. You can’t just put an index on every column and expect things to work well. Conversely, some of these tools seem helpful but they mostly do query tuning and not database tuning. If you really want to get better performance diving into indexing is a fantastic way to do this! So, if you’re motivated – here are a ton of resources to consider!

Learning more about Indexing

If you want to learn more about index structures, check out this older (but still useful page of videos). On it, watch them in this order.

  1. Index Internals
  2. Index Internals Demo
  3. Index Fragmentation
  4. Index Fragmentation Demo
  5. Even better – skip 3 and 4 and go to Paul’s Pluralsight course on SQL Server: Index Fragmentation Internals, Analysis, and Solutions
  6. Index Strategies
  7. Index Strategies Demonstration

Also, check out these blog posts:

I’m also working on a much more extensive course on Indexing for Pluralsight, this should be available within the next few months.

Thanks for reading!

Work-Life-Balance and Alternative-Obsessions

Mixing work and pleasure: a wonderful image from an incredible project that blended two styles and two mediums for an amazing result.

Mixing work and pleasure: a wonderful image from an incredible project that blended two styles and two mediums for an amazing result.


Over the past few years, Paul and I have worked to get to a point where we can work with SQL but also satisfy our urge to explore (and dive!). Finding a perfect balance between work and life can be challenging; I’m not sure we’ve done it as we tend to work really hard, and have periods when we’re working a lot. But, then we’ve also found another passion that allows us to completely disconnect. We’re extremely fortunate to have the same crazy passion for travel and adventure and especially diving… And, above all, we’re truly fortunate to have an incredible SQLskills team.

Where in the World are Kimberly and Paul?

So, whether you call it work-life-balance or an alternative-obsession, Paul and I are usually out diving if we’re not out fixing performance problems, dealing with poor design, cracking cases of corruption, etc. But, occasionally, we also allow our obsession to spill into our work; you may have noticed some of the images that we use on the SQLskills website? Sometimes I display images during breaks in class or even between sessions as conferences. Those are all images that I’ve taken on some of our incredible adventures. And, you can read more about the website images here; I have another website | Instagram | Facebook where I post photos – feel free to follow! And, not surprisingly, while I’m shooting stills, Paul’s usually shooting video; he posts many of these online on Facebook.

An Earthscape for SQLskills

And, that brings me to the image in this post…

Back in 2015, I learned about a wonderful project called Between Worlds, described as A documentary film and book featuring the collaboration between Waterscape photographer Henthorne and Earthscape artist Andres Amador.

Paul and I had met Jason [Henthorne] on a dive trip back in 2012 and we’ve always loved his work (not to mention; he’s a really nice guy!) so we looked into joining the Kickstarter for Between Worlds. One of the levels allowed us to get an “earthscape” as a sponsor. The image above is the result! I wish we could have seen it in person but the tide has long since washed it away. We’re so happy to have been part of this fantastic work; it’s always fun to watch a project go from concept to design to implementation to completion… in any world. ;-)

More Stunning Work from Two Wonderful Artists

You can still see a lot more from the two wonderful Between Worlds artists:

So, this post is just a small reminder that there’s a lot of beauty out there; take time every day to find it and you’ll find inspiration everywhere. Most importantly, find what you love and love what you do…  

Thanks for reading,

USE THIS: sp_helpindex

Since I’ve rewritten sp_helpindex a few times, I have a few blogs posts in this category. Each time I do an update I’ll make the NEW version titled USE THIS and I’ll change all others to (OLD).

The point of this updated version of sp_helpindex is to add details for newer functionality (included columns and filters) as well as better describe what’s REALLY in your indexes (at the leaf level AND up the b-tree). This particular version won’t error for new index types AND it’s “generic” and not version specific (meaning you don’t have to have an sp_SQLskills_SQL2012_helpindex and instead just have one sp_SQLskills_helpindex across versions). You use sp_SQLskills_helpindex in exactly the same way as sp_helpindex; if you can pass it into sp_helpindex then you can pass it into sp_SQLskills_helpindex!


sp_helpindex tablename WORKS
sp_helpindex owner.tablename DOES NOT WORK
sp_helpindex 'owner.tablename' WORKS


To setup this version of sp_helpindex, you need TWO scripts. Both scripts are generic and work on all versions from SQL Server 2005 through SQL Server 2016. To produce the detailed output, you need to first create these system procedures.


Step 1: Setup sp_SQLskills_ExposeColsInIndexLevels

Create this procedure first: sp_SQLskills_ExposeColsInIndexLevels.sql  (7 kb).

This is what gives us the tree/leaf definitions.

Step 2: Setup the replacement procedure for sp_helpindex -> sp_SQLskills_helpindex

Create the new sp_helpindex, use: sp_SQLskills_helpindex.sql  (18 kb) to create sp_SQLskills_helpindex.


Have fun! And, post a comment and/or shoot me an email if you find any issues!

Thanks for reading,

SQLskills SQL101: Stored Procedures

Start with a CLEAN slate!

Sometimes going back-to-the-basics is the best thing to do to fully understand something. Some of us have been doing things in a certain way for so long that we’ve missed a very basic / important point. So… SQLskills is introducing a SQL101 series of blog posts to do just that – get back to the basics! These will be posts hitting some of the core / critical points around technologies that we often see used the WRONG way or where there are many misunderstandings that lead to serious problems. And, well, stored procedures are definitely something we see misused in SQL Server (resulting in poor performance).

What is a Stored Procedure?

Simply, it’s a set of Transact-SQL commands that have been grouped together for reuse. It’s a way of taking some process you do repetitively and “wrapping” it into a named unit – allowing parameters. Think of something like sales by customer where you pass in a customer number and it outputs their sales. By creating this “process” as a stored procedure you’re allowing SQL Server to also store an executable plan in the plan cache which can save time in compilation / execution.

The Good News: Centralization

Having your code stored on the server can be fantastic for code reuse and manageability / maintainability… this can also allow the database developers to change schema without the application being aware of those changes. Or worse, requiring the application to change. Instead, the stored procedure can be changed to continue to behave the same way after the schema change as it did before the change. This is especially valuable when there are multiple applications accessing the database. And, there are other reasons but these are some of the most important IMO!

The Bad News: Performance

This executable plan that SQL Server comes up with might not be good for all executions (based on the parameters). Again, think of the case for sales by customer; some customers have very few purchases while other customers have many. When an executable plan is created it is tied to the set of parameters used when that procedure is executed (by default – there are statement options [hints] that can override this) and there isn’t already a plan in the cache. That plan is then placed in the cache so that subsequent users benefit from having a cached plan (saving compile time) but since that plan was optimized with the parameters first passed subsequent executions passing different parameters may warrant a plan change (which SQL Server does NOT do by default). And, this becomes significantly more complex as you have more parameters and more variations in how it’s executed.

More info: once you’ve read this post, get more insight into stored procedures by reading: Stored Procedure Execution with Parameters, Variables, and Literals

The Best News: Recompilation

Now that you’re aware of the fact that a stored procedure may be sensitive to the parameters passed, your database developers can dive deeper and learn options to handle this better. One way is through recompilation but here’s where we open a can of worms. There are quick / simple solutions that can help a procedure in the short-term and there are longer-term solutions that are even better for stored procedures that execute often and are used by a lot of users. Since this is only SQL101, I have a quick guide to what I would / wouldn’t use for recompilation as well as some resources to use to get more information.

Recompile the RIGHT Way

Recompilation is not always bad. But, you want to make sure that you always recompile as little as possible. Ideally, using a strategy that only recompiles the statement(s) that need it – is best. And, as of SQL Server 2005, you have a way to do this using OPTION (RECOMPILE). This is – BY FAR – the easiest strategy to use. However, it can become costly if it’s used too often. So, here’s a quick guide to some of the options with a brief description (note: before you use these options you should do more learning / testing as this is starting to head to 200+ level):

  • CREATE <procedure> WITH RECOMPILE: do not use this. First, you rarely want the entire procedure to be recompiled. Second, because it’s recompiled on every execution SQL Server does not place it in the plan cache so troubleshooting performance problems around these procedures is more difficult. Finally, without getting into deeper details – the type of optimization performed for recompilations done WITH RECOMPILE is not as advanced.
  • EXECUTE <procedure> WITH RECOMPILE: for the same reasons as above, this is not recommended either. But, even while it’s not ideal, I DO use this for testing. Testing: YES. Production: NO. I describe this and do some demos in my DBA Fundamentals webcast that you can watch online here.
  • Statement-level
    • OPTION (RECOMPILE): this is fantastic. But, it can be OVER-used. So, use it sparingly. It’s a great way to temporarily solve a problem but then you might want to use the hybrid solution for long-term scalability / performance.
    • OPTION (OPTIMIZE for …): there are specific cases where this can be useful – primarily when you’re trying to tune for a specific case or a specific “type” of value. But, this takes important (and, sometimes deep) knowledge of the data to get this right. And, it needs to be “watched” more carefully as data changes.
    • OPTION (OPTIMIZE FOR UNKNOWN): This was new in SQL Server 2008 and allows you to tune for the “average” case without having to know what that case is (like you need to know with
  • Hybrid option: sometimes cache / sometimes recompile. This can give you the best of both worlds where you programmatically chose to cache stable plans and recompile unstable plans (which sounds more complicated than it is but it does take more knowledge / more testing to get it right). To help you with this one, check out Building High Performance Procedures. And, make sure to review the DBA Fundamentals webcast here.

OK, so, while I tried to be brief – the “instructor” in me wanted to give you lots of nuggets of info. I hope you found this post helpful! And, if you want to find all of our SQLskills SQL101 blog posts – check out: SQLskills.com/help/SQL101

Thanks for reading!



Explicitly naming CL key columns in NC indexes – when and why

This morning, as I was prepping for my PASS Data Architecture Virtual Chapter presentation on “Why Physical Database Design Matters,” I went to tweet that folks could join the presentation. I stumbled on a discussion that led to Robert Davis (@SQLSoldier) asking this: I was searching to see if you had a blog post on missing indexes recommending CI keys as included columns in NCIs.

And, there was no way I’d be able to do that in 140 characters… so, I thought a blog post would be easier!

First off, you do NOT need to explicitly state clustering key columns in your nonclustered index definition; those that aren’t already part of the index – ARE ADDED automatically by SQL Server. So, it seems weird that the missing index DMVs always add the clustering key to the recommended nonclustered indexes – even if it’s not required?

SIDE NOTE: The “green hint” / missing index recommendation shown in showplan does this as well (since it uses the missing index DMVs behind the scenes).

But, let’s go back to why it’s in there to begin with..

Why does SQL Server NEED a lookup value?

SQL Server uses the clustering key (if the table has a rowstore clustered index defined) as the way to “lookup” a corresponding row when accessing the data from a nonclustered index. The main reason to do this is that the nonclustered index does not have all of the information the query needs so SQL Server has to look up the rest of the data by accessing the data row.

SIDE NOTE: If the table is a HEAP (a table without a clustered index) then SQL Server uses a “fixed RID” which consists of an 8 byte row-ID (2 bytes for the fileID, 4 bytes for the pageID, and 2 bytes for the slot number). There are pros/cons to heaps vs. clustered tables but that’s not the main point of this post; I’m solely going to focus on what happens when a table has a clustered index.

If you think of an index in the back of a book on animals, you have duplicate data (like a “common name”) that’s ordered in the index. We use that sorted order to find the animal in which we’re interested and then the index tells us on what page we can find the rest of the information about that animal.

Nonclustered indexes in SQL Server work similarly. However, there isn’t a person “scanning” the sorted data (as we would a regular index). Instead, SQL Server has TWO structures to an index. One structure (the leaf level) is VERY much like an index in the back of a book (in this case, the common name + a lookup value to get to the rest of the data). The other part of an index structure is called the b-tree; this structure (0 to n levels) is solely used for efficient navigation to the leaf level. (If you’re wondering when an index tree would have 0 levels – only when the data of the entire table is only 1 page. SQL Server will not add a navigational level to a table that’s only one page. Yes, rare. But, yes, I expected someone would have asked! And, again, I’m somewhat simplifying things but I want to get to the main point of the question… :))

So, now we know WHY SQL Server needs the clustering key columns added to the index. And, we know that SQL Server adds them when they’re not explicitly defined… but, is it a problem and why do they do it?

Is it a problem to EXPLICITLY define clustered key columns in a nonclustered index?

No. But, I personally don’t recommend it. If the query doesn’t need it nor is the query going to use it (and, isn’t that the point of a missing index recommendation), then there’s no reason to add it. I never recommend adding anything that isn’t explicitly needed. I can only speculate on why they do it AND it does have one benefit.

First, no, it’s completely unnecessary. But, it does NOT cause any problems (SQL Server won’t add clustering key columns to the index more than once; they’re either explicitly added or internally added – but, they’re never in the index twice).

And, it does provide ONE benefit in that columns that are explicitly defined show up in sp_helpindex and in the management studio tools. So, this helps people to better see what’s actually in their indexes (but, this is kind of a weird way around the inadequacies of both the tools as well as sp_helpindex). So, while there is a benefit; it’s only making up for shortcomings in the tools.

SIDE NOTE: Whether or not you believe me is also part of the problem because none of the standard utilities / sps, etc. expose that SQL Server is doing this. But, you can check table size with sp_spaceused and you should see the size is the same whether the clustering key column is explicitly added or not. And, there are [3rd party] ways to expose that these columns ARE part of the index (I’ve written about my sp_helpindex rewrites before and I know there are a bunch of other tools out there that show it as well! I’m a bit overdue for yet-another sp_helpindex rewrite / update but I do have one planned. So, I hope to get to that within the next few days. Stay tuned. But, the 2012 version still works in 2014/2016 for clustered/nonclustered indexes. I have a new version that works with ALL index types but it still needs a bit more testing – email me if you have columnstore, in-memory indexes, full-text, spatial, or hypothetical indexes and want to do some tests for me!)

When should you explicitly define clustered key columns in a nonclustered index?

This is the real reason I wrote this post was to get to this point. And, it’s something I hadn’t thought about until I ran into it personally…

So, when SHOULD you explictly define the clustering key columns in a nonclustered index? When they ARE needed by the query.

This sounds rather simple but if the column is USED by the query then the index MUST have the column explicitly defined. Yes, I realize that SQL Server will add it… so it’s not necessary NOW but what if things change? (this is the main point!)

What could go wrong?

If you don’t explicitly add clustering key columns in a nonclustered index and the queries actually rely on that clustering key column to be there (for better performance)… then your queries will not be able to use the nonclustered index if the clustered index changes.

What if you want to change your clustering key? What if you later want to create a clustered columnstore index. In either case, all of your nonclustered indexes will no longer include those [former] clustering key columns and the queries that relied on their being present, won’t perform.

So, the simple answer is – always define the index explicitly with the columns needed (in the key, in INCLUDE, wherever they are REQUIRED). If a column is not needed by that query then do not explicitly name it (it might happen behind the scenes but it won’t add extra data where it’s not needed if the clustering key were to change at a later time).

Using this strategy when you’re doing query tuning will make your indexes more readable, more effective / scalable and should work even if there are later changes to your schema!

SIDE NOTE: Indexing is a very difficult topic to discuss in only a blog post and “query tuning” is not the same as “server tuning.” Sometimes the index that’s best for a query is not ideal to add to your server… but, once you get to the point where you want to create an index on your server – explicitly state only the columns absolutely necessary. Don’t just add things “because SQL Server would have done it anyway.”

Hope this helps! And, thanks for reading!!


Setting CE TraceFlags on a query-by-query (or session) basis

IMPORTANT NOTE: Be sure to read the ENTIRE post as there’s a workaround that QUERYTRACEON is ALLOWED inside of stored procedures… just NOT in an adHoc statement. Not documented but I think this is excellent! There are still some uses of what I’ve written but this specific use is much easier because of this undoc’ed “feature.” :-)

While the new cardinality estimator can offer benefits for some queries, it might not be perfect for ALL queries. Having said that, there is an OPTION clause that allows you to set the CE for that query. The setting to use depends on the CE that you’re running under currently. And, as of SQL Server 2016, even determining this can be difficult. At any given time, there are multiple settings that might affect your CE.

In SQL Server 2014, your CE was set by the database compatibility model. If you’re running with compatibility mode 120 or higher, then you’re using the new CE. If you’re running with compatibility mode 110 or lower, then you’re using the Legacy CE. In SQL Server 2016, the database compatibility mode is not the only setting that can affect the CE that you’re using. In SQL Server 2016, they added ‘database scoped configuations’ and introduced:


Having said that, an administrator can always override this setting by setting one of the CE trace flags globally [using DBCC TRACEON (TF, -1) – but I don’t recommend this!].

To use the LegacyCE when the database is set to the new CE, use Trace Flag 9481.
To use the New CE when the database is set to the LegacyCE, use Trace Flag 2312.

Generally, I recommend that most people STAY with the LegacyCE until they’ve thoroughly tested the new CE. Then, and only then, change the compatibility mode. But, even with extensive testing, you might still want some queries to run with the LegacyCE while most run with the new CE (or, potentially the opposite). What I like most about the addition of the new CE is that we have the ability to set EITHER!

But, before setting this (or, overriding how the database is set), let’s make sure we know how it’s set currently… If you’re wondering which CE you’re running under, you can see it within the graphical showplan (in the Properties [F4] window, use: CardinalityEstimationModelVersion OR search for that within the showplan XML).


Above all, what I like most is – CHOICE. I can even set this for a specific query:

FROM [dbo].[Member] AS [m]
WHERE [m].[firstname] LIKE 'Kim%'

However, the bad news is the QUERYTRACEON is limited to SysAdmin only (be sure to read the UPDATEs at the end of this post). Jack Li (Microsoft CSS) wrote a great article about a problem they solved by using a logon trigger to change the CE for an entire session: Wanting your non-sysadmin users to enable certain trace flags without changing your app? Now, I do want to caution you that setting master to trustworthy is not something you should take lightly. But, you should NOT let anyone other than SysAdmin have any other rights in master (other than the occasional EXEC on an added user-defined SP). Here are a couple of posts to help warn you of the danger:

A warning about the TRUSTWORTHY database option
Guidelines for using the TRUSTWORTHY database setting in SQL Server

Having said that, what I really want is to set this on a query by query basis AND I don’t want to elevate the rights of an entire stored procedure (in order to execute DBCC TRACEON). So, I decided that I could create a procedure in master, set master to trustworthy (with the caution and understanding of the above references/articles), and then I can reference it within my stored procedures NOT having to use use 3-part naming (for the sp_ version of the procedure):


USE master;

    (@TraceFlag int,
     @OnOff     bit = 0)
DECLARE @OnOffStr char(1) = @OnOff;
-- Sysadmins can add supported trace flags and then use this
-- from their applications
IF @TraceFlag NOT IN (
            9481 -- LegacyCE if database is compat mode 120 or higher
          , 2312 -- NewCE if database compat mode 110 or lower
        RAISERROR('The Trace Flag supplied is not supported. Please contact your system administrator to determine inclusion of this trace flag: %i.', 16, 1, @TraceFlag);
        DECLARE @ExecStr nvarchar(100);
        IF @OnOff = 1
            SELECT @ExecStr = N'DBCC TRACEON(' + CONVERT(nvarchar(4), @TraceFlag) + N')';
            SELECT @ExecStr = N'DBCC TRACEOFF(' + CONVERT(nvarchar(4), @TraceFlag) + N')';
        -- SELECT (@ExecStr)
        -- RAISERROR (N'TraceFlag: %i has been set to:%s (1 = ON, 0 = OFF).', 10, 1, @TraceFlag, @OnOffStr);


As for using this procedure, you have TWO options. If you can modify the stored procedure then you can wrap a single statement with the change in trace flag. But, to make it take effect, you’ll need to recompile that statement. So, if your stored procedure looks like the following:

( params )
statement; <<<--- problematic statement

Then you can change this to:

( params )
EXEC sp_SetTraceFlag 2312, 1
statement OPTION (RECOMPILE); <<<--- MODIFIED problematic statement
EXEC sp_SetTraceFlag 2312, 0

If you don’t want to set master to trustworthy then you can add a similar procedure to msdb (which is already set to TRUSTWORTHY) and then use 3-part naming to reference it.

USE msdb;

    (@TraceFlag int,
     @OnOff bit = 0)
DECLARE @OnOffStr char(1) = @OnOff;
-- Sysadmins can add supported trace flags and then use this
-- from their applications
IF @TraceFlag NOT IN (
              9481 -- LegacyCE if database is compat mode 120 or higher
            , 2312 -- NewCE if database compat mode 110 or lower
         RAISERROR('The Trace Flag supplied is not supported. Please contact your system administrator to determine inclusion of this trace flag: %i.', 16, 1, @TraceFlag);
         DECLARE @ExecStr nvarchar(100);
         IF @OnOff = 1
             SELECT @ExecStr = N'DBCC TRACEON(' + CONVERT(nvarchar(4), @TraceFlag) + N')';
             SELECT @ExecStr = N'DBCC TRACEOFF(' + CONVERT(nvarchar(4), @TraceFlag) + N')';
         -- SELECT (@ExecStr)
         -- RAISERROR (N'TraceFlag: %i has been set to:%s (1 = ON, 0 = OFF).', 10, 1, @TraceFlag, @OnOffStr);


To use this, you’ll need to use 3-part naming:

( params )
EXEC msdb.dbo.msdbSetTraceFlag 2312, 1
statement OPTION (RECOMPILE); <<<--- MODIFIED problematic statement
EXEC msdb.dbo.msdbSetTraceFlag 2312, 0

Finally, another option is to wrap a statement with the change in trace flag.

EXEC sp_SetTraceFlag 2312, 1;
EXEC sp_SetTraceFlag 2312, 0;  -- don't remember to turn it back off!

Now, you have strategic access to EITHER CE and you don’t have to elevate anyone’s specific rights to SysAdmin. You can even let developers use the changes to the new CE or the LegacyCE in their code which is incredibly useful!

UPDATE: If you can’t change the code then another excellent option would be use to a plan guide (permissions are only at the object/database level: To create a plan guide of type OBJECT, requires ALTER permission on the referenced object. To create a plan guide of type SQL or TEMPLATE, requires ALTER permission on the current database.). Here’s a good post on how to add QUERYTRACEON into a plan guide: Using QUERYTRACEON in plan guides by Gianluca Sartori.

UPDATE #2: Wow! A bit of digging and I stumbled on this ConnectItem: QUERYTRACEON with no additional permissions and after I up-voted it, I noticed the workaround by Dan Guzman: There is currently undocumented behavior in SQL 2014 that allows use of QUERYTRACEON in a stored procedure without sysadmin permissions regardless of module signing or EXECUTE AS. This is a workaround in situations where the query is already in a stored procedure or when a problem ad-hoc query needing the trace flag can be encapsulated in a proc. However, until it’s documented, this workaround is at-your-own-risk. I did NOT know this… So, QUERYTRACEON CAN be used in a stored procedure even without SysAdmin permissions (I actually like this!). And, I tested this in BOTH 2014 AND 2016 and it works! It’s not documented nor is it recommended but I’m happy to see that you do NOT need to do this for QUERYTRACEON. It looks like it evens works in earlier versions. Most of us ran into problems with the clause erroring on adhoc statements so we just didn’t expect it to work INSIDE of a proc. Well… better late than never!

Thanks for reading,

New features plus excellent tips & tricks – 2016 is looking great!

There are so many exciting things going on this year and it’s just getting started! If I’m being honest, I haven’t been as excited for a new release of SQL Server for a long time. Don’t get me wrong – I love new features and I love fixes but I’m also a firm believer in “if it ain’t broke and the features aren’t critical, there’s no rush to upgrade.” And, even though I’m excited about SQL 2016, that’s not going to prevent migration testing and a solid QA pass before upgrading. But, IMO, SQL Server 2016 is a game changer for table design and indexing. I still need to work with it a bit more and I still believe that the BEST large-table architecture isn’t super simple. But, I truly feel that the table design combinations available with SQL Server 2016 (in terms of partitioning [with BOTH partitioned tables AND partitioned views] and indexing) are EXACTLY what you need for high performance hybrid (OLTP+ DSS) implementations. So, this makes me really excited for the upcoming release. I know that most of ySQL_Server_evolution_194x194ou aren’t going to jump ship immediately but you should start learning as much as you can – NOW.

So, since SQL Server 2016 is getting ready to release – check out some of the Microsoft resources that are available already:

Also, if you’re looking for really technical posts from PSS – they just blogged about 2016 with a post titled: SQL 2016 – It Just Runs Faster Announcement. Watch them as that post announces a new techie series on 2016 starting with that post!


And, all of this is also getting us excited about our upcoming SQLintersection show in Orlando the week of April 18. Our phenomenal line-up of presenters has been working on content for our SQLintersection show and a few have blogged about the event:


And, if that’s not enough to get you excited… Richard Campbell of DotNetRocks and RunAsRadio has been recording some DEVintersection / SQLintersection “countdown” sessions on Channel 9 and a few are already available:


I recorded one with Richard yesterday and that’s what’s got me thinking about everything that’s starting to come together for 2016. And, we’re also starting to put together our trivia questions and our SQL questions for our SQLafterDark evening event… this is one of our favorite events at SQLintersection. Yes, the tech content is absolutely top-notch and the speakers are all technical folks that work in the trenches day after day so they’re presenting what they know works; and they can also present well as they’re seasoned speakers. However, an evening event where you kick-back, have a few beers with speakers and peers, and play a goofy (but also interesting and fun) trivia game… who can beat that! Our event is a place where you can really get to intersect with other attendees and speakers and get your tough questions answered! Also, remember to use the discount code ‘SQLskills’ (without quotes) to save $50 on your registration.

So, lots going on and things are really shaping up to be an amazing year. Are you excited about the new release?

Thanks for reading,

SQLintersection Spring 2016 – Our best line-up yet!

As we head into our 7th SQLintersection this April, I’m excited to say that it’s one of our most diverse, complete, and information-packed shows yet! We’ve added a 3rd precon day to our show and with our postcon day, there are 9 full-day workshops from which to choose. We have 3 SQL keynotes that are relevant and timely including two from Microsoft experts Bob Ward and Buck Woody and a third is all about productivity using their fantastic tools (SQL Sentry).

If you’re interested in how we got here – check out some of my past posts:

  1. SQLintersection: a new year, a new conference
  2. SQLintersection’s Fall Conference – It’s all about ROI!
  3. Fall SQLintersection is coming up soon and we can’t wait!
  4. SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in VegasSQLafterDark

With minimal to no marketing filler, we’ve largely kept our conference focus on ROI and technical content (performance / troubleshooting / tales-from-the-trenches with best practices on how to fix them ) but we’ve also added even more social events so that you really get time to intersect with the conference attendees and speakers. The addition of the SQL-specific evening event SQLafterDark was wildly popular from some of our past shows and that’s returning for Spring!

And, so are many of our phenomenal speakers. But, we also have a few new industry-experts rounding out our line-up.

SQLintersection: Great Speakers!

Once again, I think a great show starts with great speakers and current / useful content. All of these speakers are industry-experts that have worked in data / SQL for years (some can even boast decades) but all are still focused on consulting and working in the trenches. And, they’re good presenters! Not only will you hear useful content but you’ll do so in a way that’s digestible and applicable. Every speaker is either an MCM (Master), a SQL Server MVP, (or both), or a Microsoft employee. But, regardless of their official credentials – ALL are focused on providing the most ROI that’s possible in their session(s) and/or their workshops. Check out this phenomenal list of speakers:

  • Aaron Bertrand
  • Allen White
  • Ben Miller
  • Bob Ward
  • Brent Ozar
  • Buck Woody
  • David Pless
  • Glenn Berry
  • Grant Fritchey
  • Jason Hall
  • Jeremiah Peschka
  • Kimberly L. Tripp
  • Paul S. Randal
  • Randy Knight
  • Scott Fallen
  • Tim Chapman
  • Tim Mitchell
  • Tim Radney

You can read everyone’s full bio on our speaker page here.

SQLintersection: When is it all happening?

The conference officially runs from Monday, April 19 through Thursday, April 21 with pre-conference and post-conference workshops that extend the show over a total of up to 7 full days. For the full conference, you’ll want to be there from Saturday, April 16 through Friday, April 22.

  • Saturday, April 16 – pre-con day. There is one workshops running:
    • Intro to SSIS with Tim Mitchell
  • Sunday, April 17 – pre-con day. There are two workshops running:
    • SQL Server 2014 and 2016 New Features and Capabilities with David Pless and Tim Chapman
    • Developer’s Guide to SQL Server Performance with Brent Ozar and Jeremiah Peschka
  • Monday, April 18 – pre-con day. There are three workshops running during the day with the first keynote of the conference on Monday evening:
    • Performance Troubleshooting using Waits and Latches with Paul Randal
    • Analyzing and Improving I/O Subsystem Performance with Glenn Berry
    • PowerShell for the DBA from 0-60 in a Day with Ben Miller
  • Tuesday, April 19 through Thursday, April 21 is the main conference. Conference sessions will run all day in multiple tracks:
    • Check out our sessions online here
    • Be sure to check out our cross-conference events and sessions .
    • Get your pop-culture trivia and techie-SQL-trivia hat on and join us for SQLafterDark on Wednesday evening, April 20
  • Friday, May 21 is our final full day running with three post-conference workshops running:
    • Queries Gone Wrong: Statistics, Cardinality, Solutions with Kimberly Tripp
    • Leveraging SQL Server in the Cloud with Brent Ozar and Jeremiah Peschka
    • Building Better SSIS Packages with Tim Mitchell

SQLintersection: Why is it for you?

If you want practical information delivered by speakers that not-only know the technologies but are competent and consistently, highly-rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

Check us out: www.SQLintersection.com.

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

Inconsistent analysis in read committed using locking

Well, I had a second post already planned and partially written but a comment on the first post (in what’s going to become a multi-part series) has made me decide to do a different part 2 post. So, make sure you read this post first: Locking, isolation, and read consistency.

The comment was about how read committed using locking has inconsistencies – and, yes, that’s true. Read committed using locking is prone to multiple inconsistencies; there are things you can do to reduce some anomalies but in a highly volatile OLTP environment you will get inconsistencies for long running readers. This is why having a secondary server for analysis is always a great idea. Then, everyone gets the same version of the truth – the point in time to which the secondary was updated (is that nightly, weekly, monthly?). Now, if the secondary is getting updates through transactional replication, you are back to the same problem. But, here, I cannot imagine NOT using read committed with versioning. Probably one of the BEST places to use versioning is on a replication subscriber. Replication won’t be blocked by readers and readers won’t be blocked by replication:


Why is read committed with versioning preferred for readers?

More and more, we’re being tasked with analyzing “current” data. This is harder and harder to do without introducing anomalies. So, I’ll discuss non-repeatable reads in the bounds of a single statement. To get this to occur with a small data set and without setting up a bunch of connections, etc. I’ll engineer the anomaly.


To do this, I’m going to use a sample database called Credit. You can download a copy of it from here. Restore the 2008 database if you’re working with 2008, 2008R2, 2012, or 2014. Only restore the 2000 database if you’re working with 2000 or 2005 (but, you can’t do versioning on 2000 so this is a 2005 or higher example).

Also, if you’re working with SQL Server 2014, you’ll potentially want to change your compatibility mode to allow for the new cardinality estimation model. However, I often recommend staying with the old CE until you’ve done some testing. To read a bit more on that, check out the section titled: Cardinality Estimator Options for SQL Server 2014 in this blog post.

So, to setup for this example – we need to:

(1) Restore the Credit sample database

(2) Leave the compatibility mode at the level restored (this will use the legacy CE). Consider testing with the new CE but for this example, the behavior (and all examples / locking, etc.) don’t actually change.

Scenario / example

(3) This is where things will get a bit more interesting – you’ll need a couple of windows to create the scenario…

FIRST WINDOW – setup the table:

USE [Credit];

IF OBJECTPROPERTY(object_id('[dbo].[MembersOrdered]'), 'IsUserTable') = 1
    DROP TABLE [dbo].[MembersOrdered];

-- Create a copy of the Member table to mess with!
INTO [dbo].[MembersOrdered]
FROM [dbo].[Member];

-- Create a bad clustered index that's prone to both fragmentation and
-- record relocation. This is part of what makes this scenario more likely!
-- But, that's not the main point of discussion HERE. This is good
-- because it's also easy to visualize
CREATE CLUSTERED INDEX [MembersOrderedLastname]
ON [dbo].[MembersOrdered]
    ([lastname], [firstname], [middleinitial]);

SELECT COUNT(*) FROM [dbo].[MembersOrdered];

Now… we’ll engineer the problem. But, I’ll describe it visually here:

This is a rough approximation of what the data looks like when the table is clustered by lastname

This is a rough approximation of what the data looks like when the table is clustered by lastname (not that I’m recommending that you do this).

The table has NO nonclustered indexes and ONLY a clustered index on lastname. The data is essentially ordered by lastname (not going to get technical about what the data looks like on the page here or the fact that it’s really the slot array that maintains order not the actual rows… for this example – the logical structure of the table is good enough to get the point).

(4) So, now that the table has been created – we will start by creating a problem (a row that will end up blocking us):

SECOND WINDOW – we’ll create a blocking transaction (t1)…

USE [Credit];

UPDATE [dbo].[MembersOrdered]
    SET [lastname] = 'test',
        [firstname] = 'test'
WHERE [member_no] = 9965;

Don’t get me wrong – this is a horribly bad practice (to have interaction in the midst of a transaction and to only send part of the transaction… if you walk away – SQL Server is incredibly patient and will hold locks until the transaction is finished [or, killed by an administrator]). But, the point… now our table will essentially look like this:


Yes, there are other locks that exist but this is the only one we really care about here...

Yes, there are other locks that exist but this is the only one we really care about here…

And, yes, there are MORE locks than just what I show here. SQL Server will also hold an IX lock on the page where this row resides and an IX lock on the table as well. And, of course, SQL Server will also hold a shared lock on the database. All of these locks (IX and the database-level S lock) are used as indicators. Individually, they don’t really block all that many other operations. You can have MANY IX locks on the same page at the same time and you can have many IX locks on the same table at the same time. Essentially these locks are used to show that someone has the “intent” to exclusively lock resources on this resource.

If you’re interested, check out sys.dm_tran_locks to see the locks held at each point of this example…

(5) So, now that the table has a locked row, we will try again to get a row count. Go back to the FIRST WINDOW and JUST re-run the query to get a count:

SELECT COUNT(*) FROM [dbo].[MembersOrdered];

This will be blocked. And, it will sit blocked until the transaction finishes (either through a commit or rollback OR by being killed – none of which we’ll do…yet ).

So, what’s interesting and we need a bit of back-story to fill in is that our count query reads ONLY committed rows (we know that – it’s read committed locking). What a lot of people don’t know is that the row-level shared locks are released as soon as the resource has been read. This is to reduce blocking and because there’s no need (in read committed ) to preserve the state of the row (like repeatable reads does). And, I’ll prove that with an update…

(6) Now we’ll update a row that’s just sitting there and not locked (but, may have already been counted [locked, read, and then the lock was released] by our count query).

THIRD WINDOW – update and move a row that was already counted

USE [Credit];

UPDATE [dbo].[MembersOrdered]
 SET [lastname] = 'ZZZuckerman',
 [firstname] = 'ZZZachary'
WHERE [lastname] = 'ANDERSON'
 AND [firstname] = 'AMTLVWQBYOEMHD';

OK, now here’s another horribly bad practice (not using a key for a modification). But, this table has all sorts of problems (which is what makes it a wonder example for this). And, if I don’t go with something that’s efficiently indexed for this update then the locking would be different. I NEED to get row-level locking here. So, I have to do the update above. And, now what’s happened:

There are so many bad things going on with this . What if there are two rows with that name (ok, unlikely) but you still shouldn't do modifications without a KEY (primary / or candidate)!

There are so many bad things going on with this poor table. What if there are two rows with that name (ok, unlikely) but you still shouldn’t do modifications without a KEY (primary / or candidate)!

(7) Now, it’s time to “complete” the transaction that was blocking our count query.

Go back to the SECOND WINDOW and complete the transaction (to be honest, you can COMMIT or ROLLBACK – it really won’t matter for this example):


(8) Step 7 will have unblocked the row count query so all you need to do is go back to the first window and check the count that’s sitting there… which will be 10001  rows in our table that only has 10000 rows. Why? Because you were NOT guaranteed repeatable reads even in the bounds of a single statement. 

Re-run the count query and you’ll get 10000 again.

Why read committed with versioning solves this problem?

When you change the database to read_commmitted_snapshot you change ALL read committed statements to use version-based reads. The best part is that we’ll never WAIT for in-flight rows nor will we see rows whose transactions completed within the bounds of your statement [the longer running the statements the more that this is possible]. The incorrect count would not occur for a few reasons but I’ll create a list of the reasons here:

When the row modification (t1) is made – the row will be versioned (SQL uses “copy on write” / “point in time” = snapshot) to take the BEFORE modification VERSION of the row and put it into the version store.

When the count begins it will just quickly continue (and not be blocked by the X lock held on the ‘Gohan’ row). This is not a problem, the columns are changing but the row is not being deleted – yes, we should count that row.

If I could switch to the window and run the update fast enough, it too would not be blocked by the version-based read NOR would the moved row be visible (even if it made it there in time) by the version-based read because it’s new location (handled as an INSERT) would be NEWER than our count statement’s START time. Our count will reconcile to the point in time when the statement BEGAN. Nothing is blocked and those changes (even when committed before we get there) are NOT visible to our version-based read.

Is Summary 

Yes, read committed with versioning is often MUCH preferred over read committed using locking. But, versioning does come at an expense. You do need to do monitoring and analysis of THE version store (in THE tempdb). Someday (SQL Server 2016???) maybe we’ll multiple tempdbs (that would be awesome) and ideally, we’ll have some nice capabilities for controlling / limiting version store overhead (of one database over another). But, for right now – you have to make sure that your tempdb is optimized! And, if you have an optimized tempdb and you don’t have a lot of really poorly written long-running queries, then you might find out that the overhead is less than you expected. This is absolutely something that you should be not only considering but using!

Whitepaper: Working with tempdb in SQL Server 2005

Plus, be sure to review Paul’s tempdb category for other tips / tricks, etc.

OK, so now the originally planned part 2 is now trending to be part 3. Depending on questions – I’ll get to that in the next few days!

Thanks for reading,