I've decided to create a new series of posts – just because you can, doesn't mean you should! These are going to go in a new category called "Just because" and I've already got a few lined up in this series. The first one is one I've spoken about often and one that's near/dear to my heart – indexes.
I've often received the question – why did they increase the limit from 249 nonclustered indexes to 999 nonclustered in SQL Server 2008? Does that mean that I have a need for more – potentially, many more – indexes? Simply put – yes and no. (a new spin on "it depends" :)). However, for most environments, NO is the correct answer. And, a VERY STRONG NO at that. To help give you some insight, I've answered the question "how many indexes should a table have" in this SQL Server Magazine Q&A blog post titled How Many Indexes Should I Create? (however, it's a difficult question to answer in general). But if everyone says that tables should have minimal indexes then why would the SQL team increase the limit from 249 to 999? It just doesn't seem to make sense.
And, it's a great start to my "just because" series in that this is one area where you can really shoot yourself in the foot if you create too many indexes (which, btw, has become a VERY common problem). So… I want to break this down into a few parts (and therefore posts).
Why shouldn't I create a lot of indexes?
- Indexes have overhead for every INSERT/DELETE on the table. Every INSERT must add a row to each/every nonclustered index. Every DELETE must remove a row from each/every nonclustered index.
- Indexes require disk space. While nonclustered indexes are generally a lot smaller than the table itself their total size (of all indexes together) can often exceed the size of the table by 2 or 3 times. This can be quite normal. However, a table that is poorly indexed and severely over indexed might have index space which is 6-10 times the size of the base table. This is wasted space on disk and also in cache. While disk space is relatively cheap, memory is not. But, that's not even the most important point. There's a lot more to it… For more insight, check out the post titled: Disk space is cheap… (that's NOT the point!)
- Even if you have the disk space to store all of these indexes, it's unlikely that you can fit all of your tables and all of your excessive indexes in cache. Most environments have a hard time fitting all of their data in general in cache – let alone a poorly/overindexed set of data. As a result, you're going to be constantly marshalling data in and out of memory and causing your system to perform excessive IOs.
- You can cause the optimizer to waste time in evaluating indexes as well as cause compilation plans to be bigger. This can in turn waste both time and waste even more cache. Especially if you have a lot of plans being generated through adhoc statements. Each statement might waste quite a bit of cache. For more insight, check out the post titled: Clearing the cache – are there other options?
Why is this a common problem?
I'll go into more details in a moment but there are really three reasons:
- SHOWPLAN shows missing index recommendations when evaluating a plan. Again, some folks take that to mean that they absolutely should create the index. Often, I've seen multiple developers working on a project all get their index recommendations in a vacuum (per se) and then they just provide scripts to the DBA for the indexes to implement. At this point, someone should evaluate the overall combination but often no one does. And, this leads to another reason why this is such a big problem.
- SQL Server 2005 introduced the missing index DMVs and some people believe that all of the indexes it recommends should be created (which is far from the truth).
- SQL Server (every version) will allow you to create redundant indexes. Not just similar (and redundant) but completely duplicate. There is a reason and I wrote about it in this SQL Server Magazine Q&A titled: Why SQL Server Lets You Create Redundant Indexes
The end result of this combination is that more and more servers I see – have too many indexes and often redundant or similar indexes. This wastes resources, time, cache and ultimately adds to a servers inability to scale as the data set grows. See, during development when the data sets were smaller, none of this really matters. Almost none is detected until more data and more users (and unfortunately, more code and more applications) exist. And, at that point it's too late to make schema changes BUT, it's not too late to make index changes. Phew. Believe me, many problems are MUCH harder to solve after an application is in production but indexes are one of the easier ones.
Common Problem 1: Index Recommendations from SHOWPLAN (the green hint)
Don't get me wrong – I *LOVE* that this exists. And I *always* look at what they recommend. It's a good way to see what tables might have holes in my indexing strategy. But, there's a potential problem here. These recommendations are tied to the Missing Index DMVs and so BOTH have this fault but it's A LOT more apparent when using the "green hint" than using the DMVs. My main reason to think twice about these recommendations is that the Missing Index DMVs recommendations are based SOLELY on the plan that was executed. SQL Server did not analyze the statements for different algorithms. If a hash join was used then the index recommended will be an index that will help the hash join. However, your query might be SIGNIFICANTLY better off using a merge join instead. No, please understand that this is NOT a generalization (I'm not saying that merge is better than hash) it's just an example of a situation where I saw a query go from many minutes to HALF with the best index for the hash join (hey, that's good, right?) BUT, the query went down to only 4 seconds with the merge join's index. There was NO comparison here.
There are many cases where the plan that was executed IS the best plan but there are also cases where it's not. As a result, if I'm about to create the index from the "green hint" in SHOWPLAN, I will always run the individual query through DTA (the Database Tuning Advisor) when possible. This will more thoroughly evaluate the query, providing me with alternatives. Yes, it's "focused" tuning but if you're already going to create an index for that query then you might as well work out the best one. Having said that, I will ONLY create the index recommended for the table that the "green hint" recommended and I will see if that gives me enough of a gain. DTA will often make additional recommendations for both indexes and stats. For stats, I would HIGHLY recommend creating all of the statistics it recommends but ONLY the table(s) where you actually create the indexes it recommends. Personally, I don't always create all of the recommended indexes. If I'm going to create them I'll do so iteratively – testing the improvements as I go so I know what indexes give me the best gains. But, I will always add the statistics for the tables where I create their recommended indexes. Why? Because the stats (and usually multi-column stats) are there to help the indexes and their uses. So, it's pretty much always a good thing to create those stats.
Common Problem 2: The Missing Index DMVs
Don't get me wrong – I *LOVE* that these exist too and just like SHOWPLAN, I use these as well. There are some great examples out there of ways to better evaluate what's in them, what should be created and there are even some cool posts about clever ways to corroborate what's recommended by looking at showplan XML, etc. Check out these links:
- Bart Duncan's post: Are you using SQL's Missing Index DMVs?
- Brent Ozar's post on SQL Server Pedia: Find Missing Indexes
- Michelle Ufford's post: A Look At Missing Indexes and another post: Find Missing Indexes in Stored Procs with T-SQL
However, there's a dark side to these missing index DMVs and these points aren't often mentioned and – they might SIGNIFICANTLY change the way that you view these recommendations. First, they're not perfect (ok, nothing is but I had to remind you :). The Missing Index DMVs might recommend indexes that you have already created – even after you created them, even after a reboot, etc. They will just continue to recommend the index as if it didn't exist. Check out Paul's post titled: Missing index DMVs bug that could cost your sanity. And, second, my issue ties in with Common Problem 1. Again, these recommendations are NOT for the best index but for indexes that will help the plans as they were executed.
However, there's a HUGE win that comes from the Missing Index DMVs OVER SHOWPLAN. And, that's that they tune for query classes. What this means is that a query that's higher up in the "user impact" category will likely affect many queries and therefore it is much higher on the list to consider. But, even armed with this information, I would NOT automate the creation of these indexes based on the results of queries that access these DMVs. Instead, as an application runs in production and as workload characteristics are being evaluated – save this information and use it when you're tuning to help you know better what indexes might give you the best improvements.
Common Problem 3: Duplicate/redundant Indexes
I can't help you with this one. Except feel free to increase the vote count on this on connect. I found that someone recently (3/17/2011) re-added this as a DCR here: https://connect.microsoft.com/SQLServer/feedback/details/652071/duplicate-indexes#tabs. I've made a comment to further explain it and so I hope that they will reevaluate it.
The tools are great. You definitely WANT to use them. However, you must understand their caveats and pitfalls and add indexes only when it's appropriate. Adding more and more indexes does NOT improve performance. And, above all – be sure to review your existing indexes to see if you can drop an existing index OR make your index slightly different to consolidate index definitions into one index that does a lot more with less overhead, etc. I call this Index Consolidation. And, this is another topic for another day. :)
So – why would you want to create lots and lots of indexes?
Ha, I bet you never thought we'd get here? And, for today – I'd rather you just DON'T do it. Just because you can, doesn't mean you should. I'll give you some reasons to consider LOTS of indexes…later…after you've cleaned up all of these redundant indexes! And, to help you find redundant indexes – use my latest version of sp_helpindex from my category: sp_helpindex rewrite. This will help you to more accurately see what's really in those indexes.
Thanks for reading!