The Accidental DBA (Day 20 of 30): Are your indexing strategies working? (aka Indexing DMVs)

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

As an accidental DBA you are constantly wearing many hats. You’ve heard that indexes are critical for performance (and that’s absolutely true) but it’s not just any indexes – it’s the RIGHT indexes. The unfortunate thing about indexing is that there’s both a science and an art to it. The science of it is that EVERY single query can be tuned and almost any SINGLE index can have a positive effect on certain/specific scenarios. What I mean by this is that I can show an index that’s ABSOLUTELY fantastic in one scenario but yet it can be horrible for others. In other words, there’s no short cut or one-size-fits-all answer to indexing.

This is where the art comes in – indexing for performance is really about finding the right balance between too many and too few indexes, as well as trying to get more from the indexes that you do keep.

Having said that, it’s way beyond the scope of what we can talk about in a short post. Unfortunately, a full discussion about these things would take a lot of time but if you’re interested in digging in a bit deeper on any of these topics…

  • How do indexes work?
  • What’s the clustered index?
  • What are good/general strategies for indexing?
  • When should columns be put in the key (and in what order should they be defined) versus when should they be in the INCLUDE list?
  • Would we benefit from a filtered index?

… then check out my new (in 2017) online training course SQL Server: Indexing for Performance.

These are difficult discussions. And really, if you’re only meant to “maintain” the system and keep the lights on – these are probably beyond what you can effectively do in your already-packed day.

So, who is handling this job, to create and define the database’s indexes? It’s probably the application developers. These might be in-house developers (which is good – you’ll have someone with whom to consult when you do your analysis) or they might be vendor developers (which might be both good and bad). The good side is that some vendors are open to discussions on their customer support lines and may help you through the issues that you’re seeing. The bad news is that some vendors are not open to discussions and they might not support you if you make any indexing changes.

So, first and foremost – make sure you thoroughly understand the environment you’re about to analyze and do not change anything without verifying that you can.

Introduction

The good news is that it’s NOT all doom and gloom. There are some very helpful DMVs and resources that you can use to analyze your environment and see where some of your indexing issues are. Again, you might not be able to change them (immediately) but, you will be armed with information to help you discuss what you are seeing.

When I’m analyzing a new-to-me system, I tend to break down my index analysis into three parts:

  1. Are there any indexes just lying around not doing anything useful… time to get rid of the dead weight!
  2. Are there any indexes that are bloated and unhealthy – costing me time and space… time to analyze the health of my existing (and useful) indexes
  3. Then, and only then do I feel like you can add more indexes.

Part I: Getting rid of the dead weight

Fully duplicate indexes

SQL Server lets you create redundant/duplicate indexes. This is annoying but it’s always been the case. It certainly begs the question about why SQL Server lets you do this and I wrote up an answer to this in an article on SQL Server Magazine here: Why SQL Server Lets You Create Redundant Indexes (http://sqlmag.com/blog/why-sql-server-lets-you-create-redundant-indexes). Regardless of why, you still need to remove them. And, even if you don’t remember seeing duplicate indexes, you might be surprised. Without knowing index internals, it might be harder to recognize duplicates than you think. It’s not always as easy as Index1 on col1 and Index2 on col1. Internally, SQL Server adds columns to your index and most commands (like sp_helpindex) do not show these internally added columns. The good news is that I have a version of sp_helpindex that does show you the entire structure. And, tied to that updated version of sp_helpindex, I built a script for finding duplicate indexes at either the table-level or database-wide. Check out these links:

But, you could BREAK the application if they’ve used index hints. So, beware! Generally, it might be best to disable an index for a while before you just drop it.

Unused Indexes

Almost as expensive as a duplicate index is one that never gets used. However, this is a lot more challenging to determine. There is a fantastic DMV (sys.dm_db_index_usage_stats) that gives you information about index usage but, it’s not perfect. And, some of the behaviors have changed in some releases (sigh). If you’re really wanting to best understand your index usage patterns you’ll have to persist this information over a business cycle and be sure to persist it prior to index maintenance (see this connect item: Rebuilding an index clears stats from sys.dm_db_index_usage_stats). https://connect.microsoft.com/SQLServer/feedback/details/739566/rebuilding-an-index-clears-stats-from-sys-dm-db-index-usage-stats Note: this is only an issue is SQL Server 2012.

Bug, again, even the information tracked in this DMV isn’t perfect. One of my biggest frustrations is the user_updates only tracks the number of STATEMENTS, not the number of ROWS modified. For example, if I execute this statement (without a WHERE clause) UPDATE Table SET ColumnX = Value and it affects 10,000 rows, then the user_updates column will be incremented by 1 for BOTH the table and any indexes that include ColumnX. So, you might have an even higher (possibly MUCH higher) value for updates.

And, there’s more to it than that. Instead of duplicating this information, I’ll link to a FANTASTIC post by Microsoft PFE Ignacio [Nacho] Alonso’s FAQ around sys.dm_db_index_usage_stats. http://blogs.msdn.com/b/ialonso/archive/2012/10/08/faq-around-sys-dm-db-index-usage-stats.aspx.

Finally, both Paul and I have written about this DMV as well as how to persist it. Check out these posts:

Similar or semi-redundant indexes

You might have some indexes that are good candidates for consolidation:

  • Indexes that have the same key (but possibly different INCLUDEd columns)
    • Index1: Key = LastName
    • Index2: Key = LastName, INCLUDE = FirstName
    • In this case you don’t “NEED” Index1. There’s NOTHING that Index1 does that Index2 cannot also do. However, Index2 is wider. So, a query that solely wants the following will have more I/Os to do because of the wider index:
      • SELECT LastName, count(*) FROM table GROUP BY LastName
    • But, the argument is – how critical is that query? How often is that index really used? Remember, you can use sys.dm_db_index_usage_stats to help you determine how often it’s used.
  • Indexes that have left-based subsets of other index keys
    • Index1: Key = LastName, FirstName, MiddleInitial
    • Index2: Key = LastName INCLUDE = SSN
    • Index3: Key = LastName, FirstName INCLUDE = phone
    • In this case each index does provide some specific (and unique) uses. However, you have a lot of redundancy there.
    • What if you created a new Index: LastName, FirstName, MiddleInitial INCLUDE (SSN, phone)
    • Again, this new index is wider than any of the prior 3 but this new index has even more uses and it has less overall overhead (only one index to maintain, only one index on disk, only one index in cache [and, it’s more likely to stay in cache]). But, you still have to determine how critical each of the queries are that were using the narrower indexes? As well as how much more expensive they are with the new index.

So, this is where the “art” of indexing comes into play. Index consolidation is a critical step in reducing waste and table bloat but there isn’t a simple answer to every consolidation option. This is another “it depends” case.

Part II: Analyze the health of your existing indexes

This topic has been talked about in many places. And, we’ve even chatted about it in our Accidental DBA series here: The Accidental DBA (Day 14 of 30): Index Maintenance.

In the content of this post, I want to make sure that after I’ve cleaned up the dead weight, my existing and useful indexes are healthy. And, you might want to review your index maintenance strategies and see if they’re “lighter” and take less time. And, be sure that they don’t miss anything. A couple of key reminders:

  • Make sure your index maintenance routines at indexes on tables AND views
  • Make sure your index routines use a LIMITED scan if you’re only analyzing avg_fragmentation_in_percent

And, here are a few other resources that you might find handy on this topic:

Part III: Adding more indexes

This is a tricky one. There are lots of good/bad practices around adding indexes. One of the worst is that most folks just add indexes without really fully analyzing (and CORRECTLY analyzing) their existing indexes. The reason I say correctly analyzing their existing indexes is that the tools (like sp_helpindex and SSMS) hide some of the information about columns that might have been added to your indexes. So, unless you really know what your indexes look like you won’t be able to correctly add new indexes while consolidating your existing indexes.

The primary tool that I want to discuss here is the “user impact” aspect of the missing index DMV queries that exist out there (and, there are some great examples of using the missing index DMVs). And, while I STRONGLY encourage you to use them as a GUIDE, I do want you to remember that they’re not perfect. Here are my main issues/concerns/gripes:

  • The missing index DMVs (and therefore the “index/green hint” that shows up in showplan) only tune the plan that was executed. If the plan performed a hash join then the index is going to help the hash join. But, it’s unlikely that the join type will change. And, it might be the case that a different index would perform a different join type and the query would be even faster. If you’re about to trust the missing index DMVs recommendations(or, the green hint), then consider reverse-engineering the queries that are being tuned by these recommendations (see Jon’s post on how to do this) and then (if possible) run these queries through DTA (the Database Engine Tuning Advisor). DTA has capabilities that the missing index DMVs do not in that DTA can “hypothesize” about alternate strategies. This makes the index recommendations even better!
  • The missing index DMVs only think about the BEST index for EACH query. And, that does make sense (from a QUERY tuning perspective) but, you need to do SYSTEM tuning. You can’t just create individual indexes for each and every query that needs one. You definitely want to consider the indexes that have the highest user impact but you also don’t want to forget about consolidation.
  • The missing index DMVs can show indexes that you already have. Missing index DMVs bug that could cost your sanity…

Here are a few Missing Index DMV queries/resources:

Summary

These are the primary things that I’m looking for when I want to see how the already implemented indexing strategies are working as well as the order in which I begin to analyze and change indexes. But, beware: you can negatively impact the environment so it’s important that adequate testing is done to make sure that what you’re doing has a net-positive effect.

Finally, I also did a video summarizing these things that I’m describing here – you might want to check this out as well: http://technet.microsoft.com/en-US/sqlserver/gg545020.aspx.

Thanks for reading!
kt

5 thoughts on “The Accidental DBA (Day 20 of 30): Are your indexing strategies working? (aka Indexing DMVs)

  1. Thanks for lots of information.

    Can you please provide me the clarification on the DUPLICATE Index

    If I have an Indexes like the following
    ◦Index1: Key = Column1
    ◦Index2: Key = Column1, INCLUDE = Column2
    ◦Index3: Key = Column1, INCLUDE = Column3
    ◦Index4: Key = Column1, INCLUDE = Column4
    ◦Index5: Key = Column1, INCLUDE = Column5

    NOTE: Above mentioned Column2 to Column5 used in 4 functions, Each function uses only one column. From the query plan found that each function was doing Index Scan, to make to make it Index Seek I have created duplicate Indexes with INCLUDE option.
    for ex: Select SUM(Column2) From Table1 WHERE Column1=222 [here column1 is a primary key]

    Query 1 => Can I eliminate all the 5 Indexes and create one single index like the following
    ◦Index1: Key = Column1, INCLUDE = Column2,Column2,Column4,Column5

    Query 2 => If “Column1” is a “PRIMARY KEY” how do we handle this situation

    1. Hey there Nagendra – Regardless of whether or not col1 is the primary key is irrelevant. The only real issue is whether or not col1 is the CLUSTERING KEY. It’s true that a lot of primary keys are enforced through a clustered index (thereby defining the clustering key) but if the PK is nonclustered then it will have no impact on other indexes. I explain some of this in the post: How can you tell if an index is REALLY a duplicate? (here: http://3.209.169.194/blogs/kimberly/how-can-you-tell-if-an-index-is-really-a-duplicate/).

      Having said that – you also say that col2-col5 are used in functions. Do you mean aggregates? It’s actually a VERY important difference… If ALL of your queries are searching ONLY on col1 and then aggregating data from ONLY one column then YES, you could create a single index that includes all four of the columns col2-col5 BUT that index would then be wider making a scan more expensive. If each of these queries were individually run frequently AND very important then you might want to consider separate individual indexes. If the table is really wide (lots of OTHER columns) and the aggregates aren’t performed all that often (but, you still want to improve them) then a single index would be fine.

      Finally, if col1 were the CLUSTERING key then the only thing I’d consider is how many additional columns there are in the table and the performance impact on other operations. If the table is narrow (possibly only col1 – col5) AND it’s not heavily updated then you could consider clustering on col1 and then every query will do a seek with a partial scan.

      Hmmm… I’ve just re-read your statement again. To be honest, it’s not really making sense. A covering index doesn’t allow for a seek. A seek is tied to the leading column and whether or not the index is useful (an index on Lastname, Firstname can be used for Lastname alone OR Lastname and Firstname together but NOT Firstname alone).

      So, I think I need specific queries and a table definition. I think you may have reversed some of the columns / information in your example.

      But, maybe my response will still help?! Let me know!
      k

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

SQLskills SQL101: Partitioning

Continuing on our path to understanding the basics and core concepts, there’s a big topic that’s often greatly misunderstood and that’s partitioning. I know I’m

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.