Indexes: just because you can, doesn’t mean you should!

I've decided to create a new series of posts – just because you can, doesn't mean you should! These are going to go in a new category called "Just because" and I've already got a few lined up in this series. The first one is one I've spoken about often and one that's near/dear to my heart – indexes.

I've often received the question – why did they increase the limit from 249 nonclustered indexes to 999 nonclustered in SQL Server 2008? Does that mean that I have a need for more – potentially, many more – indexes? Simply put – yes and no. (a new spin on "it depends" :)). However, for most environments, NO is the correct answer. And, a VERY STRONG NO at that. To help give you some insight, I've answered the question "how many indexes should a table have" in this SQL Server Magazine Q&A blog post titled How Many Indexes Should I Create? (however, it's a difficult question to answer in general). But if everyone says that tables should have minimal indexes then why would the SQL team increase the limit from 249 to 999? It just doesn't seem to make sense.

And, it's a great start to my "just because" series in that this is one area where you can really shoot yourself in the foot if you create too many indexes (which, btw, has become a VERY common problem). So… I want to break this down into a few parts (and therefore posts).

Why shouldn't I create a lot of indexes?

  • Indexes have overhead for every INSERT/DELETE on the table. Every INSERT must add a row to each/every nonclustered index. Every DELETE must remove a row from each/every nonclustered index.
  • Indexes require disk space. While nonclustered indexes are generally a lot smaller than the table itself their total size (of all indexes together) can often exceed the size of the table by 2 or 3 times. This can be quite normal. However, a table that is poorly indexed and severely over indexed might have index space which is 6-10 times the size of the base table. This is wasted space on disk and also in cache. While disk space is relatively cheap, memory is not. But, that's not even the most important point. There's a lot more to it… For more insight, check out the post titled: Disk space is cheap… (that's NOT the point!)
  • Even if you have the disk space to store all of these indexes, it's unlikely that you can fit all of your tables and all of your excessive indexes in cache. Most environments have a hard time fitting all of their data in general in cache – let alone a poorly/overindexed set of data. As a result, you're going to be constantly marshalling data in and out of memory and causing your system to perform excessive IOs.
  • You can cause the optimizer to waste time in evaluating indexes as well as cause compilation plans to be bigger. This can in turn waste both time and waste even more cache. Especially if you have a lot of plans being generated through adhoc statements. Each statement might waste quite a bit of cache. For more insight, check out the post titled: Clearing the cache – are there other options?

Why is this a common problem?

I'll go into more details in a moment but there are really three reasons:

  1. SHOWPLAN shows missing index recommendations when evaluating a plan. Again, some folks take that to mean that they absolutely should create the index.  Often, I've seen multiple developers working on a project all get their index recommendations in a vacuum (per se) and then they just provide scripts to the DBA for the indexes to implement. At this point, someone should evaluate the overall combination but often no one does. And, this leads to another reason why this is such a big problem.
  2. SQL Server 2005 introduced the missing index DMVs and some people believe that all of the indexes it recommends should be created (which is far from the truth).
  3. SQL Server (every version) will allow you to create redundant indexes. Not just similar (and redundant) but completely duplicate. There is a reason and I wrote about it in this SQL Server Magazine Q&A titled: Why SQL Server Lets You Create Redundant Indexes

The end result of this combination is that more and more servers I see – have too many indexes and often redundant or similar indexes. This wastes resources, time, cache and ultimately adds to a servers inability to scale as the data set grows. See, during development when the data sets were smaller, none of this really matters. Almost none is detected until more data and more users (and unfortunately, more code and more applications) exist. And, at that point it's too late to make schema changes BUT, it's not too late to make index changes. Phew. Believe me, many problems are MUCH harder to solve after an application is in production but indexes are one of the easier ones.

Common Problem 1: Index Recommendations from SHOWPLAN (the green hint)

 

Don't get me wrong – I *LOVE* that this exists. And I *always* look at what they recommend. It's a good way to see what tables might have holes in my indexing strategy. But, there's a potential problem here. These recommendations are tied to the Missing Index DMVs and so BOTH have this fault but it's A LOT more apparent when using the "green hint" than using the DMVs. My main reason to think twice about these recommendations is that the Missing Index DMVs recommendations are based SOLELY on the plan that was executed. SQL Server did not analyze the statements for different algorithms. If a hash join was used then the index recommended will be an index that will help the hash join. However, your query might be SIGNIFICANTLY better off using a merge join instead. No, please understand that this is NOT a generalization (I'm not saying that merge is better than hash) it's just an example of a situation where I saw a query go from many minutes to HALF with the best index for the hash join (hey, that's good, right?) BUT, the query went down to only 4 seconds with the merge join's index. There was NO comparison here.

There are many cases where the plan that was executed IS the best plan but there are also cases where it's not. As a result, if I'm about to create the index from the "green hint" in SHOWPLAN, I will always run the individual query through DTA (the Database Tuning Advisor) when possible. This will more thoroughly evaluate the query, providing me with alternatives. Yes, it's "focused" tuning but if you're already going to create an index for that query then you might as well work out the best one. Having said that, I will ONLY create the index recommended for the table that the "green hint" recommended and I will see if that gives me enough of a gain. DTA will often make additional recommendations for both indexes and stats. For stats, I would HIGHLY recommend creating all of the statistics it recommends but ONLY the table(s) where you actually create the indexes it recommends. Personally, I don't always create all of the recommended indexes. If I'm going to create them I'll do so iteratively – testing the improvements as I go so I know what indexes give me the best gains. But, I will always add the statistics for the tables where I create their recommended indexes. Why? Because the stats (and usually multi-column stats) are there to help the indexes and their uses. So, it's pretty much always a good thing to create those stats.

Common Problem 2: The Missing Index DMVs

Don't get me wrong – I *LOVE* that these exist too and just like SHOWPLAN, I use these as well. There are some great examples out there of ways to better evaluate what's in them, what should be created and there are even some cool posts about clever ways to corroborate what's recommended by looking at showplan XML, etc. Check out these links:

However, there's a dark side to these missing index DMVs and these points aren't often mentioned and – they might SIGNIFICANTLY change the way that you view these recommendations. First, they're not perfect (ok, nothing is but I had to remind you :). The Missing Index DMVs might recommend indexes that you have already created – even after you created them, even after a reboot, etc. They will just continue to recommend the index as if it didn't exist. Check out Paul's post titled: Missing index DMVs bug that could cost your sanity. And, second, my issue ties in with Common Problem 1. Again, these recommendations are NOT for the best index but for indexes that will help the plans as they were executed.

However, there's a HUGE win that comes from the Missing Index DMVs OVER SHOWPLAN. And, that's that they tune for query classes. What this means is that a query that's higher up in the "user impact" category will likely affect many queries and therefore it is much higher on the list to consider. But, even armed with this information, I would NOT automate the creation of these indexes based on the results of queries that access these DMVs. Instead, as an application runs in production and as workload characteristics are being evaluated – save this information and use it when you're tuning to help you know better what indexes might give you the best improvements.

Common Problem 3: Duplicate/redundant Indexes

I can't help you with this one. Except feel free to increase the vote count on this on connect. I found that someone recently (3/17/2011) re-added this as a DCR here: https://connect.microsoft.com/SQLServer/feedback/details/652071/duplicate-indexes#tabs. I've made a comment to further explain it and so I hope that they will reevaluate it.

In Summary

The tools are great. You definitely WANT to use them. However, you must understand their caveats and pitfalls and add indexes only when it's appropriate. Adding more and more indexes does NOT improve performance. And, above all – be sure to review your existing indexes to see if you can drop an existing index OR make your index slightly different to consolidate index definitions into one index that does a lot more with less overhead, etc. I call this Index Consolidation. And, this is another topic for another day. :)

So – why would you want to create lots and lots of indexes?

Ha, I bet you never thought we'd get here? And, for today – I'd rather you just DON'T do it. Just because you can, doesn't mean you should. I'll give you some reasons to consider LOTS of indexes…later…after you've cleaned up all of these redundant indexes! And, to help you find redundant indexes – use my latest version of sp_helpindex from my category: sp_helpindex rewrite. This will help you to more accurately see what's really in those indexes.

Thanks for reading!
kt

Become a SQLskills Insider NOW!

10 thoughts on “Indexes: just because you can, doesn’t mean you should!

  1. Great start, Kimberly. I am always telling my students (and my developers) "just because you can does not mean you should", so I love that you are using that same phrase in the series.

  2. Hey there Glenn – Absolutely! I love that phrase. And, I’ve got a long list of things that certainly fit into that category. I bet you do too!! A topic/series for all, eh?

    Cheers,
    kt

  3. Kimberly, have you talked about multi-attribute indexes from the perspective of how to decide if you need to add new ones or alter an existing single-attribute index to use more than one attribute? Does SHOWPLAN or the Missing Index DMV provide that level of guidance? "Yo, dummy – if you add that TIMESTAMP field to that index over there, your ORDER BY [my_timestamp] query will run in 1/20th the time", that kind of thing?

  4. Kim, didn’t you say something like this 15 or 20 years ago? Surely after all this time SQL Server should be able to handle indexes on every column combination on every table, right? ;)

  5. I love that you pointed out the issue of individual developers "tuning in a vacuum" (re Showplan recommendations), and this is really just a symptom of a wider problem. Too many people get too attached to the idea that THEIR query should be the highly-tuned one, and in a sufficiently complex system compromises always have to be made. If we directed all traffic to the fast lane, nobody would have a faster journey.

  6. Kim,
    You have pointed out the every mistake i have done while Indexing has done by me for the 1st time( I was the developer and i was the acting DBA.. (So worse:)), Starting from the Phrase "Just because you can, doesn’t mean you should" (Awesome!!!..)

    I don’t understand for first time, u have noticed certain things from that list.

    Thanks, :)

  7. Some background before I start with the questions: I am a Microsoft Dynamics AX developer currently in the process of reviewing a rather large project with lots of customisations to the standard Dynamics ERP.

    My first question: Is it correct to say that if you have a table on which many INSERTS and UPDATES are happing, you should definitely have a clustered index on the unique identifier column (PK) of the table?
    And is it also correct to say that for tables from where a lot of SELECTS are happing rather have non-clustered indexes, covering the fields most commonly selected?

    My second question: I am trying to delete a clustered index that was on a table with more than 60mil records. The table holds transaction records and the clustered index includes fields such as a TransType, a Date, and a Period Ending; these are not unique fields nor does the data in them follow a set pattern.
    I am having a hard time understanding what the thought process behind this index was, and have come to the conclusion that the developer might not have understood clustered indexes. Do you agree?

    Can I do anything to speed up the deletion of the index, and how would I know if my DROP INDEX statement is just hanging?
    Thanks for allot of great info on your blog!

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Wow! Wow! Wow! THANK YOU!

I announced my retirement from SQL/tech here and your comments on my blog, on LinkedIn, and on Facebook were overwhelming and humbling! I’m so touched

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.