{"id":468,"date":"2011-07-14T11:39:00","date_gmt":"2011-07-14T11:39:00","guid":{"rendered":"\/blogs\/kimberly\/post\/UnderstandingDuplicateIndexes.aspx"},"modified":"2013-02-23T20:51:39","modified_gmt":"2013-02-24T04:51:39","slug":"how-can-you-tell-if-an-index-is-really-a-duplicate","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/how-can-you-tell-if-an-index-is-really-a-duplicate\/","title":{"rendered":"How can you tell if an index is REALLY a duplicate?"},"content":{"rendered":"<p>There&#8217;s a lot of confusion around duplicate indexes and I think a lot of it has to do with how sp_helpindex (or even SSMS) display what&#8217;s in an index. What&#8217;s actually in the index &#8211; and how it&#8217;s structured &#8211; are not always what they seem. This was the original motivation behind my rewrite of sp_helpindex but even since then, I&#8217;ve still seen a lot of confusion. In today&#8217;s blog post I&#8217;m going to first explain EXACTLY which indexes are the same and which aren&#8217;t &#8211; as well as the faults in the tools.<\/p>\n<p>So, starting with index structures&#8230; (<em>it all starts with internals <\/em>:) )<\/p>\n<p>The clustered index IS the data. The KEY to the clustered index (what I often refer to as <em>the clustering key<\/em>) defines the way that the data is ordered (not necessarily truly, physically ordered on disk &#8211; but ordered LOGICALLY). <em>And, no, I&#8217;m not going to rehash every bit of internals here&#8230; just a few reminders.<\/em><\/p>\n<p>A nonclustered index is duplicated data (similar to an index in the back of a book). This duplicated data can be used to help reference the actual data (exactly like an index in the back of a book) OR can be used to actually respond to requests (for example, if you&#8217;re solely looking for a count of people with a name that begins with &#8221; then an index that has LastName in the index could be used to count them &#8211; without actually &#8220;looking up&#8221; the actual data). So, there are some very powerful uses to indexes. <em>But, alas, this isn&#8217;t a post on using indexes or indexing strategies &#8211; this is ALL about internals (and understanding the structure of an index).<\/em> So, I&#8217;m going to cut to the chase!<\/p>\n<p>A nonclustered index <strong><span style=\"text-decoration: underline;\">always<\/span> <\/strong>has:<\/p>\n<ul>\n<li>Key (this is what defines the order of the index)<\/li>\n<li>A leaf-level entry (this is the actual data stored in the index + the <strong><em>lookup value* <\/em><\/strong>+ any included columns) &#8211; however, ALL of these columns are only stored once (and they&#8217;ve ALWAYS only been stored once here so, even if you reference a column that&#8217;s part of the <strong><em>lookup value<\/em><\/strong>, SQL Server will NOT duplicate it again).<\/li>\n<\/ul>\n<p>*So, what is this <strong><em>lookup value<\/em><\/strong>?<\/p>\n<p>The lookup value is what SQL Server uses to reference the actual data row. If a table has a clustered index then the lookup value is the clustering key (and EVERY column defined by it). If a table does not have a clustered index (it&#8217;s then a HEAP) then SQL Server uses something called a RID. A RID is an 8-byte structure consisting of 2:4:2 bytes which breakdown into 2 for the FileID, 4 for the PageID and 2 for the slot number. And while RID&#8217;s (and their history) are very interesting &#8211; it&#8217;s not entirely relevant here (how they specifically work and\/or their structures) but if\/when they&#8217;re in an index, I&#8217;ll list is as RID.<\/p>\n<p>Let&#8217;s now put all of this together with an example (or two) starting with a relatively easy one.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nUSE [JunkDB];\r\nGO\r\n\r\nCREATE TABLE Test\r\n(\r\n    TestID  int identity,\r\n    [Name]  char(16)\r\n);\r\nGO\r\n\r\nCREATE UNIQUE CLUSTERED INDEX TestCL ON Test (TestID);\r\nGO\r\n\r\nCREATE INDEX TestName ON Test([Name]);\r\nGO\r\n\r\nsp_helpindex Test\r\nGO\r\n<\/pre>\n<p>The output shows:<\/p>\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\" title=\"\">\r\nindex_name      index_description                               index_keys\r\nTestCL          clustered, unique located on PRIMARY            TestID\r\nTestName        nonclustered located on PRIMARY                 Name\r\n<\/pre>\n<p>And, while that <em>looks <\/em>right, it&#8217;s horribly misleading. The index that&#8217;s on TestName also includes TestID in the index as well. And, not just in the leaf level but in the tree (for ordering purposes). So, that should really show Name, TestID. But, it&#8217;s a bit more confusing if you throw in this:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nCREATE UNIQUE INDEX TestNameUnique ON Test([Name]);\r\nGO\r\n\r\nEXECUTE sp_helpindex Test;\r\nGO\r\n<\/pre>\n<p>The output shows:<\/p>\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\" title=\"\">\r\nindex_name      index_description                               index_keys\r\nTestCL          clustered, unique located on PRIMARY            TestID\r\nTestName        nonclustered located on PRIMARY                 Name\r\nTestNameUnique  nonclustered, unique located on PRIMARY         Name\r\n<\/pre>\n<p>And, at this point, it doesn&#8217;t look like there&#8217;s any difference at all between the second and third indexes (well, except that the third index requires that the values be unique &#8211; shown in the description). But, as for the &#8220;index_keys&#8221; they look the same. However, they&#8217;re actually not the same (in all parts of the tree). So, this is why I tend to differentiate between the &#8220;leaf&#8221; and the non-leaf levels of an index (when I describe them). And, it only gets more complicated when you throw in included columns (2005+).<\/p>\n<p>So, how to do you tell the difference? Unfortunately, there are NO tools within SQL Server (or even any 3rd party tools that I know of) that display this through the UI, etc. But, you could start by using my replacement to sp_helpindex. I&#8217;ve updated and rewritten it a few times but the latest one will always be in the sp_helpindex reqrites category here: <a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/category\/sp_helpindex-rewrites\/\">https:\/\/www.sqlskills.com\/blogs\/kimberly\/category\/sp_helpindex-rewrites.aspx<\/a>. I personally think that&#8217;s the easiest (and is the reason why I originally wrote it!). Using it you can see how the output is more detailed.<\/p>\n<p>The output shows (specifically showing ONLY the last 2 columns):<\/p>\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\" title=\"\">\r\ncolumns_in_tree              columns_in_leaf\r\n[TestID]                     All columns &quot;included&quot; \u2013 the leaf level IS the data row.\r\n[Name], [TestID]             [Name], [TestID]\r\n[Name]                       [Name], [TestID]\r\n<\/pre>\n<p>Now, we&#8217;re getting somewhere. We can definitely see that these two indexes ARE different. A nonunique nonclustered needs to have the lookup value pushed up into the tree (for navigation). A unique nonclustered index does not.<\/p>\n<p>Next, let&#8217;s go with a [just slightly more challenging] example:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nUSE [JunkDB];\r\nGO\r\n\r\nCREATE TABLE Member\r\n(\r\n    MemberNo    int     identity,\r\n    FirstName   varchar(30)     NOT NULL,\r\n    LastName    varchar(30)     NOT NULL,\r\n    RegionNo    int\r\n);\r\nGO\r\n\r\nCREATE UNIQUE CLUSTERED INDEX MemberCL ON Member(MemberNo);\r\nGO\r\n    \r\nCREATE INDEX MemberIndex1 \r\nON Member(FirstName, RegionNo, MemberNo)\r\nINCLUDE (LastName);\r\nGO\r\n\r\nCREATE INDEX MemberIndex2 \r\nON Member(FirstName, RegionNo)\r\nINCLUDE (LastName);\r\nGO\r\n\r\nCREATE INDEX MemberIndex3 \r\nON Member(FirstName, RegionNo)\r\nINCLUDE (MemberNo, LastName);\r\nGO\r\n\r\nCREATE UNIQUE INDEX MemberIndex4 \r\nON Member(Firstname, RegionNo)\r\nINCLUDE (MemberNo, lastname);\r\nGO\r\n<\/pre>\n<p>First, let&#8217;s review with sp_helpindex:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nEXECUTE sp_helpindex Member;\r\nGO\r\n<\/pre>\n<p>The output shows:<\/p>\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\" title=\"\">\r\nindex_name      index_description                           index_keys\r\nMemberCL        clustered, unique located on PRIMARY        MemberNo\r\nMemberIndex1    nonclustered located on PRIMARY             FirstName, RegionNo, MemberNo\r\nMemberIndex2    nonclustered located on PRIMARY             FirstName, RegionNo\r\nMemberIndex3    nonclustered located on PRIMARY             FirstName, RegionNo\r\nMemberIndex4    nonclustered, unique located on PRIMARY     FirstName, RegionNo\r\n<\/pre>\n<p>Looking solely at sp_helpindex it LOOKS like the first nonclustered index is different from the others and that the 2nd, 3rd and 4th nonclustered indexes are the same. However, that&#8217;s actually <strong>NOT <\/strong>true. Next, let&#8217;s use my version of sp_helpindex.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nEXECUTE sp_SQLskills_SQL2008_helpindex Member;\r\nGO\r\n<\/pre>\n<p>The output (again, just showing the last two columns of output):<\/p>\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\" title=\"\">\r\ncolumns_in_tree                           columns_in_leaf\r\n[MemberNo]                                All columns &quot;included&quot; \u2013 the leaf level IS the data row.\r\n[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]\r\n[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]\r\n[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]\r\n[FirstName], [RegionNo]                   [FirstName], [RegionNo], [MemberNo], [LastName]\r\n\r\n<\/pre>\n<p>From this, you can see that all 4 indexes have the same leaf level but index 4 has a slightly different tree structure. Ultimately, it&#8217;s indexes 1, 2 and 3 that are the same and index 4 is actually [slightly] different. How they&#8217;re different (outside of the fact that the 4th index guarantees uniqueness) is a bit beyond the scope of this post. But, yes, there are some [relatively minor in this case] differences. And, since I&#8217;m ONLY looking for indexes that are the same then only 1, 2 and 3 fit that requirement.<\/p>\n<p>And, things get MUCH more complicated if you have multicolumn clustering key and\/or more complicated INCLUDEs.<\/p>\n<p><strong>Having said that &#8211; how do you find duplicate indexes? <\/strong><\/p>\n<p>Well&#8230; I had started this by putting together a simple way for you to check for dupes with my version of sp_helpindex but then I figured out an issue with included columns. I&#8217;m showing\u00a0the structures as they&#8217;re defined (and stored). But, in terms of usage &#8211; the order of the columns in the INCLUDE does not matter. As a result, two indexes with different orders for their included columns will show as two different indexes (technically they are and they are stored\u00a0differently). However, there are absolutely NO differences (in terms of usage) so I&#8217;ll need to write code to adjust for that (to truly find duplicates).<\/p>\n<p>For now, here&#8217;s some quick code to get you <em>closer<\/em>. I had written some of this when we discussed this in our London Immersion Event. However, I&#8217;ve tweaked it even further here after thinking about some interesting exceptions. This code will find absolute duplicates (where the order of the structure is completely identical). To use it you&#8217;ll need to enter in the schemaname and tablename\u00a0(in bold below).<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nIF (SELECT count(*) FROM #FindDupes) IS NOT NULL\r\n    DROP TABLE #FindDupes;\r\nGO\r\n\r\nCREATE TABLE #FindDupes\r\n(\r\n index_id           int,\r\n is_disabled        bit,\r\n index_name         sysname,\r\n index_description  varchar(210),\r\n index_keys         nvarchar(2126),\r\n included_columns   nvarchar(max),\r\n filter_definition  nvarchar(max),\r\n columns_in_tree    nvarchar(2126),\r\n columns_in_leaf    nvarchar(max)\r\n);\r\nGO\r\n\r\nDECLARE @SchemaName sysname,\r\n        @TableName  sysname,\r\n        @ExecStr    nvarchar(max);\r\n\r\nSELECT @SchemaName = N'schemaname',     \u2014 MODIFY\r\n       @TableName = N'tablename';       \u2014 MODIFY\r\n        \r\nSELECT @ExecStr = 'EXECUTE sp_SQLskills_SQL2008_helpindex ''' \r\n                    + QUOTENAME(@SchemaName) \r\n                    + N'.' \r\n                    + QUOTENAME(@TableName)\r\n                    + N'''';\r\n\r\nINSERT #FindDupes\r\nEXEC (@ExecStr);\r\n\r\nSELECT t1.index_id, COUNT(*) AS 'Duplicate Indexes w\/Lower Index_ID', \r\n        N'DROP INDEX ' \r\n            + QUOTENAME(@SchemaName, N']') \r\n            + N'.' \r\n            + QUOTENAME(@TableName, N']') \r\n            + N'.' \r\n            + t1.index_name AS 'Drop Index Statement'\r\nFROM #FindDupes AS t1\r\n    INNER JOIN #FindDupes AS t2\r\n        ON t1.columns_in_tree = t2.columns_in_tree\r\n            AND t1.columns_in_leaf = t2.columns_in_leaf \r\n            AND ISNULL(t1.filter_definition, 1) = ISNULL(t2.filter_definition, 1) \r\n            AND PATINDEX('%unique%', t1.index_description) = PATINDEX('%unique%', t2.index_description)\r\n            AND t1.index_id &gt; t2.index_id\r\nGROUP BY t1.index_id, N'DROP INDEX ' + QUOTENAME(@SchemaName, N']') \r\n                    + N'.' \r\n                    + QUOTENAME(@TableName, N']') \r\n                    + N'.' + t1.index_name;\r\nGO\r\n<\/pre>\n<p>I&#8217;m planning to bulid a sproc around this next as well as get rid of cases where the included columns are just in different orders next. It will be another sp_ for master and I do hope to wrap it up shortly! That code will REALLY find ALL unnecessary duplicates and help you to drop them! Stay tuned!!<\/p>\n<p><strong>Cheers and thanks for reading!!<\/strong><\/p>\n<p>kt<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s a lot of confusion around duplicate indexes and I think a lot of it has to do with how sp_helpindex (or even SSMS) display what&#8217;s in an index. What&#8217;s actually in the index &#8211; and how it&#8217;s structured &#8211; are not always what they seem. This was the original motivation behind my rewrite of [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,17,30,36,37,48,63,78],"tags":[],"class_list":["post-468","post","type-post","status-publish","format-standard","hentry","category-clustered-index","category-clustering-key","category-filtered-indexes","category-indexes","category-inside-the-storage-engine","category-nonclustered-indexes","category-sp_helpindex-rewrites","category-tips"],"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/468","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/comments?post=468"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/468\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/media?parent=468"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/categories?post=468"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/tags?post=468"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}