SQLskills SQL101: Indexing Basics

SQLskills introduced our new SQL101 recently and well… indexing is something that everyone needs to get right. But, it’s not a simple task. And, as I start to sit down to write a SQL101 post on indexing, I suspect I’m going to struggle keeping it simple? However, there are some core points on which I will focus and I’ll be sure to list a bunch of additional resources to get you more information from here! Remember, the point of each of our SQL101 posts is to make sure that everyone’s on the same page and has the same basic idea / background about a topic. And, for indexing, that’s incredibly important (and, often, misunderstood).

What is an Index?

Simply, it’s a structure that’s applied to a set of [or, subset of] data to enforce ordering – either to quickly check uniqueness or to aid in accessing data quickly. Simply, that’s why you create an index. You’re either wanting to enforce data integrity (such as uniqueness) or you’re trying to improve performance in some way.

How Does an Index Enforce Uniqueness?

If you want to maintain uniqueness over a column (or a combination of columns), SQL Server takes the required data (and very likely more data than you specifically chose) and sorts that data in an index. By storing the data in sorted order, SQL Server is able to quickly determine if a value exists (by efficiently navigating the index structure). For this intro post, it doesn’t entirely matter exactly what’s in the index but it does matter which index you choose and for what purpose.

Relational Rules Enforced by Indexing

In a relational database, relational rules rule the world. And, many learn some of the rules rather quickly. One of these rules is that relational theory says that every table must have a primary key. A primary key can consist of multiple columns; however, none of the columns can allow NULLs and the combination of those columns must be unique. While I agree that every table should have a primary key, what’s chosen AS the primary key can be more complex than the relational rules allow. Behind the scenes, a primary key is enforced by an index (to enforce and check for uniqueness). The type of index that is used depends on whether or not you explicitly state the index type or not. If you do not explicitly state an index type, SQL Server will default to trying to enforcing your primary key constraint with a unique clustered index. If a clustered index already exists, SQL Server will create a nonclustered index instead.

And, this is where things get tricky… a clustered index is a very important index to define. Internally, the clustering key defines how the entire data set is initially ordered. If not well chosen then SQL Server might end up with a structure that’s not as efficient as it could be. There’s quite a bit that goes into choosing a good clustering and I’ll stress that I think it’s one of the most important decisions to be made for your tables. And, it also needs to be made early as later changes to your clustering key can be difficult at best (often requiring downtime and complex coordinated scripting after already suffering poor performance before you make the decision to change).

So, let’s keep this simple… you’ll want to choose the clustering key wisely and early. And, you’ll want to get a good understanding on the things that depend on the clustering key. I did a discussion and demo in my Pluralsight course: Why Physical Database Design Matters and I’ve discussed this quite a bit in my blog category: Clustering key.

The key point is that the primary does NOT have to be enforced with a clustered index. Sometimes your primary key is not an ideal choice as the clustering key. Some great clustering key choices are:

  • Composite key: EntryDate, RowID where EntryDate is an ever-increasing date value that follows the insert pattern of your data. For example, OrderDate for a table that stores Orders. RowID should be something that helps to uniquely identify the rows (something like an identity column is useful). Key points: choose the smallest (but reasonable) data types for both the date and the ID. Ideally, use DATETIME2(p) where p is the level of precision desired. And, for an identity column – choose INT if you know you’ll never get anywhere near 2 billion rows. However, if you even think you’ll have “hundreds of millions” of rows, I’d probably go straight for BIGINT so that you never have to deal with the problems that you’ll have if you run out.
  • Identity column: When I don’t have a good composite key like that above, I’ll often consider an identity column for clustering – even if my queries aren’t specifically using this value. Even if you’re not explicitly using this value, SQL Server is using it behind the scenes in its nonclustered indexes. Please note that this is both a good thing and a bad thing. If your clustering key is very narrow then you’re not unnecessarily widening your nonclustered indexes. That isn’t to say that you won’t have a few wider nonclustered indexes but choosing a wide clustering key makes all of your nonclustered indexes wide when they might not need to be.

OK, I feel like I’ve started to open a car of worms with this one. But, the key points are:

  1. The primary key does NOT have to be clustered (and sometimes it’s better not to be)
  2. The clustering key needs to be chosen early and based on many factors – there’s no single right answer ALL the time… for example, if you don’t need any nonclustered indexes then the width of the clustering key becomes less of an issue.

At the end of this post, I’ll point you to more resources to help you to make a better decision.

What about Indexing for Performance?

In addition to enforcing uniqueness (and, allowing SQL Server to quickly determine whether or not a value already exists), indexes are used to help performance. And here’s where there are some very simple yet important things to understand. There are two types of performance tuning methods that I want to describe here: query tuning and database tuning. What’s often done most is query tuning. While that might be [temporarily] good for that query, it’s NOT a good long-term strategy for the server. I always START with query tuning but that’s not an ideal strategy to implement directly on your production server.

Query Tuning

Database Tuning Advisor for "query tuning" from SSMS

Indexing for performance using the DTA [Database Tuning Advisor] for “query tuning” from SSMS

Query tuning is where you focus on nothing but the query to come up with a better indexing strategy. You might use the Database Engine Tuning Advisor on the specific query right from SQL Server Management Studio (see image). Or, you might use the “green hint” found in the show actual executing plan window. While these might significantly help you with that query, they may not be ideal to implement in your database (where other users are executing, where other indexes exist, and where resource access/usage are at a premium).

Don’t get me wrong, query tuning is a MUST. But, it’s just a starting point. I always start my tuning process by determining the best indexes for a query. But, you can’t stop there. You MUST do “server tuning” if you want your production database to truly scale.

Server Tuning

Before you create a desired index in production (or, while you’re doing testing / analysis in development / QA [quality assurance]) you really want to check to see if this index is going to be good for production.

Are there other similar indexes?

Maybe you can consolidate some of these indexes into one. Yes, this consolidated index might not be the best for the individual queries but by creating one index instead of three, you’ll be reducing the cost of this index for data modifications, maintenance, storage, etc.

Are there any suggested missing indexes?

Again, before I create a new index, I want to see if I can get more uses out of it. Can I consolidate this new index with existing and/or missing recommendations? If I can then I’ll get more uses out of this index.

Are there good maintenance strategies in place?

Before you go and create more indexes, make sure that your existing indexes are being maintained. You should also check that the indexes being maintained are actually being used. To be honest, you should do that BEFORE you do any tuning at all.

SUMMARY: Steps for Server Tuning and Scalability

  1. Get rid of the dead weight. Clean up unused indexes. Consolidate similar indexes.
  2. Make sure your index maintenance strategy is in place. There’s no point in adding indexes if you’re not cleaning up fragmentation and reducing splits.
  3. Then, you can consider adding indexes BUT only after you’ve done the following:
    1. Query tuning
    2. Existing index consolidation
    3. Missing index consolidation
    4. And, of course, TESTING!

Well… that was much longer than I had hoped. But, there are a lot of good concepts here. Unfortunately, indexing for performance is just NEVER just a simple discussion. You can’t just put an index on every column and expect things to work well. Conversely, some of these tools seem helpful but they mostly do query tuning and not server tuning. If you really want to get better performance diving into indexing is a fantastic way to do this! So, if you’re motivated – here are a ton of resources to consider!

Learning more about Indexing

If you want to learn more about index structures, check out this older (but still useful page of videos). On it, watch them in this order.

  1. Index Internals
  2. Index Internals Demo
  3. Index Fragmentation
  4. Index Fragmentation Demo
  5. Even better – skip 3 and 4 and go to Paul’s Pluralsight course on SQL Server: Index Fragmentation Internals, Analysis, and Solutions
  6. Index Strategies
  7. Index Strategies Demonstration

Also, check out these blog posts:

I’m also working on a much more extensive course on Indexing for Pluralsight, this should be available within the next few months.

UPDATE: Check out my Pluralsight course – SQL Server: Indexing for Performance for more details! Enjoy!!

Thanks for reading!
k

Work-Life-Balance and Alternative-Obsessions

Mixing work and pleasure: a wonderful image from an incredible project that blended two styles and two mediums for an amazing result.

Mixing work and pleasure: a wonderful image from an incredible project that blended two styles and two mediums for an amazing result.

 

Over the past few years, Paul and I have worked to get to a point where we can work with SQL but also satisfy our urge to explore (and dive!). Finding a perfect balance between work and life can be challenging; I’m not sure we’ve done it as we tend to work really hard, and have periods when we’re working a lot. But, then we’ve also found another passion that allows us to completely disconnect. We’re extremely fortunate to have the same crazy passion for travel and adventure and especially diving… And, above all, we’re truly fortunate to have an incredible SQLskills team.

Where in the World are Kimberly and Paul?

So, whether you call it work-life-balance or an alternative-obsession, Paul and I are usually out diving if we’re not out fixing performance problems, dealing with poor design, cracking cases of corruption, etc. But, occasionally, we also allow our obsession to spill into our work; you may have noticed some of the images that we use on the SQLskills website? Sometimes I display images during breaks in class or even between sessions as conferences. Those are all images that I’ve taken on some of our incredible adventures. And, you can read more about the website images here; I have another website | Instagram | Facebook where I post photos – feel free to follow! And, not surprisingly, while I’m shooting stills, Paul’s usually shooting video; he posts many of these online on Facebook.

An Earthscape for SQLskills

And, that brings me to the image in this post…

Back in 2015, I learned about a wonderful project called Between Worlds, described as A documentary film and book featuring the collaboration between Waterscape photographer Henthorne and Earthscape artist Andres Amador.

Paul and I had met Jason [Henthorne] on a dive trip back in 2012 and we’ve always loved his work (not to mention; he’s a really nice guy!) so we looked into joining the Kickstarter for Between Worlds. One of the levels allowed us to get an “earthscape” as a sponsor. The image above is the result! I wish we could have seen it in person but the tide has long since washed it away. We’re so happy to have been part of this fantastic work; it’s always fun to watch a project go from concept to design to implementation to completion… in any world. ;-)

More Stunning Work from Two Wonderful Artists

You can still see a lot more from the two wonderful Between Worlds artists:

So, this post is just a small reminder that there’s a lot of beauty out there; take time every day to find it and you’ll find inspiration everywhere. Most importantly, find what you love and love what you do…

Thanks for reading,
Kimberly

USE THIS: sp_helpindex

Since I’ve rewritten sp_helpindex a few times, I have a few blogs posts in this category. Each time I do an update I’ll make the NEW version titled USE THIS and I’ll change all others to (OLD).

The point of this updated version of sp_helpindex is to add details for newer functionality (included columns and filters) as well as better describe what’s REALLY in your indexes (at the leaf level AND up the b-tree). This particular version won’t error for new index types AND it’s “generic” and not version specific (meaning you don’t have to have an sp_SQLskills_SQL2012_helpindex and instead just have one sp_SQLskills_helpindex across versions). You use sp_SQLskills_helpindex in exactly the same way as sp_helpindex; if you can pass it into sp_helpindex then you can pass it into sp_SQLskills_helpindex!

EXAMPLES:

sp_helpindex tablename WORKS
sp_helpindex owner.tablename DOES NOT WORK
sp_helpindex 'owner.tablename' WORKS

 

To setup this version of sp_helpindex, you need TWO scripts. Both scripts are generic and work on all versions from SQL Server 2005 through SQL Server 2016. To produce the detailed output, you need to first create these system procedures.

 

Step 1: Setup sp_SQLskills_ExposeColsInIndexLevels

Create this procedure first: sp_SQLskills_ExposeColsInIndexLevels.sql  (7 kb).

This is what gives us the tree/leaf definitions.

Step 2: Setup the replacement procedure for sp_helpindex -> sp_SQLskills_helpindex

Create the new sp_helpindex, use: sp_SQLskills_helpindex.sql  (18 kb) to create sp_SQLskills_helpindex.

 

Have fun! And, post a comment and/or shoot me an email if you find any issues!

Thanks for reading,
Kimberly