{"id":2739,"date":"2014-11-20T11:40:48","date_gmt":"2014-11-20T19:40:48","guid":{"rendered":"http:\/\/3.209.169.194\/blogs\/kimberly\/?p=2739"},"modified":"2014-12-02T00:02:54","modified_gmt":"2014-12-02T08:02:54","slug":"nonclustered-indexes-lookup-key-btree","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/nonclustered-indexes-lookup-key-btree\/","title":{"rendered":"Nonclustered indexes require the &#8220;lookup&#8221; key in the b-tree when?"},"content":{"rendered":"<p>I received a great question in email and it&#8217;s something I cover in our IEPTO1 (<a href=\"https:\/\/www.sqlskills.com\/sql-server-training\/iepto1\/\" target=\"_blank\">Immersion Event on Performance Tuning, Part 1<\/a>) so I thought I&#8217;d write a post about it&#8230;<\/p>\n<h3><strong>Question:\u00a0<\/strong><\/h3>\n<address style=\"padding-left: 30px;\"><em>When you have a non-unique key value in a nonclustered index, SQL Server adds the RID \/\u00a0Row Identifier\u00a0(if the NC is on a heap table) or the clustered key (if the table has a clustered index)\u00a0to the navigational structure (the b-tree) of the nonclustered index to make non-unique index rows unique. <strong>What is the reason behind that implementation?<\/strong><\/em><\/address>\n<h3><strong>Answer:\u00a0<\/strong><\/h3>\n<p>SQL Server requires the lookup value (the RID or the clustering key) to go up the tree when a nonclustered key is non-unique in order\u00a0to\u00a0efficiently locate rows in the leaf level.<\/p>\n<p>Scenario: Take a non-unique, nonclustered index on gender (on an employee table that&#8217;s clustered by EmployeeID). To show this, I&#8217;ll create a simple tree structure with only 24 rows (12 female and 12 male employees).<\/p>\n<p>Note: this is NOT exactly what the index looks like but for simplicity,\u00a0I&#8217;ll use this. Also,\u00a0this data set is small for simplicity (you can understand this problem with a small data set); the problem is even more exaggerated for larger and larger data sets. Finally, if you&#8217;re new to index internals, consider checking out my online course <a href=\"http:\/\/technet.microsoft.com\/en-us\/sqlserver\/gg508878.aspx\" target=\"_blank\">here<\/a>.<\/p>\n<h2>Possible Option 1: Only push the non-unique nonclustered key up the tree<\/h2>\n<p>If the leaf level is sorted only by the nonclustered key of gender then there&#8217;s no order to the EmployeeID\u00a0values within each gender grouping:<\/p>\n<figure id=\"attachment_2743\" aria-describedby=\"caption-attachment-2743\" style=\"width: 720px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/Slide4.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2743 size-full\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/Slide4.jpg\" alt=\"\" width=\"720\" height=\"440\" srcset=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/Slide4.jpg 720w, https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/Slide4-300x183.jpg 300w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/a><figcaption id=\"caption-attachment-2743\" class=\"wp-caption-text\">Imagine a nonclustered index on gender with a clustered index on EmployeeID. If the leaf level were not sorted then every request (DELETE \/ UPDATE) would have to scan ALL of the Fs to find the female row they were searching for&#8230; This is a &#8220;What If&#8221; picture; this is NOT how nonclustered indexes are stored.<\/figcaption><\/figure>\n<p>In this case,\u00a0the index has very few uses. We can get a count of Female or Male employees easily but not much else is efficient. A SELECT will have to scan (from the start &#8211; every time); it can stop once the record is found (EmployeeID is unique) but the cost of always starting at page 1 (for female) or page 4\u00a0(for male) would be too expensive with many rows \/ pages.<\/p>\n<p>Having said that though, you might even start by\u00a0wondering what an index on gender would benefit anyway? The most useful requests are exactly that (counts by gender, list by gender). So, for most <em><strong>queries<\/strong> <\/em>this structure could work and it\u00a0really wouldn&#8217;t matter for <strong><em>queries<\/em><\/strong>. Inserts <em>could<\/em> just go to the end of each section.\u00a0But, what about deletes? What if an employee were to leave? If you were to execute the following:<\/p>\n<pre class=\"brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate\" title=\"\">\r\nDELETE [Employee] WHERE [EmployeeID] = 27;\r\nGO<\/pre>\n<p>How would SQL Server find that row? They would know that the employee is female when they went to the data row itself. But, finding this row within the female grouping of this nonclustered index would ALWAYS require a\u00a0scan through all female rows. The same would be true for an update.<\/p>\n<p>So, while this could work for inserts as well as <em>some <strong>queries<\/strong><\/em>, it&#8217;s horribly inefficient for updates and\u00a0deletes.<\/p>\n<h2>Possible Option 2: Sort the leaf level but still ONLY push the non-unique nonclustered key up the tree<\/h2>\n<figure id=\"attachment_2751\" aria-describedby=\"caption-attachment-2751\" style=\"width: 746px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2751 size-full\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update2.jpg\" alt=\"\" width=\"746\" height=\"433\" srcset=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update2.jpg 746w, https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update2-300x174.jpg 300w\" sizes=\"auto, (max-width: 746px) 100vw, 746px\" \/><\/a><figcaption id=\"caption-attachment-2751\" class=\"wp-caption-text\">Even if they were sorted &#8211; if you don&#8217;t &#8220;push&#8221; the clustered key value (EmployeeID) up the tree, the index cannot be efficiently searched \/ modified, we still have to start at the lowest value and scan until we find the row to DELETE \/ UPDATE. This is a &#8220;What If&#8221; picture; this is NOT how nonclustered indexes are stored.<\/figcaption><\/figure>\n<p>Here we don&#8217;t save much; we can\u00a0still\u00a0stop\u00a0a scan once a value was found. But, we can\u00a0do that in the structure for Option 1 (as long as EmployeeID was defined as unique). \u00a0About the only added benefit of this structure is for an ORDER BY query.<\/p>\n<pre class=\"brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate\" title=\"\">\r\nSELECT [EmployeeID], [Gender]\r\nFROM [Employee]\r\nORDER BY [Gender], [EmployeeID];\r\nGO<\/pre>\n<p>For Option 1 this query would have to add an additional sort. For Option 2, we&#8217;d save this. But, we&#8217;d still have the same problem for a DELETE \/ UPDATE; we&#8217;d have to scan to find the specific row to modify.<\/p>\n<p>So, we&#8217;re really back to the same issues as Option 1:\u00a0this could work for inserts as well as\u00a0<em>even a few more\u00a0<strong>queries<\/strong><\/em>, it&#8217;s horribly inefficient for updates\u00a0and\u00a0deletes.<\/p>\n<h2>SQL Server\u00a0pushes the &#8220;lookup value&#8221;\u00a0up the tree<\/h2>\n<p>This makes each row in a nonclustered tree unique. This makes every query directly seekable to the specific page on which that row resides.\u00a0We can support queries with additional predicates (such as WHERE EmployeeID &gt;). Again, this index doesn&#8217;t provide a lot of uses BUT it&#8217;s still MORE useful when it&#8217;s sorted AND seekable by the composite key and where that key is unique up the tree.<\/p>\n<p>A delete can go directly to the leaf page.<\/p>\n<pre class=\"brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate\" title=\"\">\r\nDELETE [Employee] WHERE [EmployeeID] = 27;\r\nGO<\/pre>\n<p>After going to the row, we&#8217;d find that EmployeeID 27 is female. Using the root\u00a0page we can see that the first pointer\u00a0is for\u00a0EmployeeID 2 or higher. The second pointer is for EmployeeID 59 or higher. Since we&#8217;re looking for 27 &#8211; we&#8217;d use the first pointer which points to page\u00a01 for data.<\/p>\n<p>In the same way, an update can go directly to the leaf page. An insert has a specific place to insert (yes, these can cause splits).<\/p>\n<figure id=\"attachment_2754\" aria-describedby=\"caption-attachment-2754\" style=\"width: 746px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update31.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2754 size-full\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update31.jpg\" alt=\"\" width=\"746\" height=\"433\" srcset=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update31.jpg 746w, https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2014\/11\/NonclusteredIndexTreeStructure_Update31-300x174.jpg 300w\" sizes=\"auto, (max-width: 746px) 100vw, 746px\" \/><\/a><figcaption id=\"caption-attachment-2754\" class=\"wp-caption-text\">With both the non-unique nonclustered key as well as the clustered key, you have a tree structure that can be efficiently searched \/ modified \/ updated.<\/figcaption><\/figure>\n<p>So, while I could have used a larger and more realistic example for this &#8211; this definitely shows the need for the clustering key in the b-tree portion of an index.<\/p>\n<p><strong>Have fun and keep those questions coming!<\/strong><\/p>\n<p>Cheers,<br \/>\nk<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I received a great question in email and it&#8217;s something I cover in our IEPTO1 (Immersion Event on Performance Tuning, Part 1) so I thought I&#8217;d write a post about it&#8230; Question:\u00a0 When you have a non-unique key value in a nonclustered index, SQL Server adds the RID \/\u00a0Row Identifier\u00a0(if the NC is on a [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,36,48],"tags":[],"class_list":["post-2739","post","type-post","status-publish","format-standard","hentry","category-clustering-key","category-indexes","category-nonclustered-indexes"],"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/2739","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/comments?post=2739"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/2739\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/media?parent=2739"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/categories?post=2739"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/tags?post=2739"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}