{"id":533,"date":"2009-06-07T21:29:00","date_gmt":"2009-06-07T21:29:00","guid":{"rendered":"\/blogs\/kimberly\/post\/The-Tipping-Point-Query-Answers.aspx"},"modified":"2013-01-11T23:11:22","modified_gmt":"2013-01-12T07:11:22","slug":"the-tipping-point-query-answers","status":"publish","type":"post","link":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/the-tipping-point-query-answers\/","title":{"rendered":"The Tipping Point Query Answers"},"content":{"rendered":"<p>\nOK, I&#39;ll definitely take a beating from all of you for having gone so long between my survey posts and now. I won&#39;t even go into the details but between some crazy work schedules, multiple sinus problems and even migraines&#8230; well, I&#39;ve been a bit behind. Let&#39;s just say that April\/May were rough at best. I&#39;m feeling better and well, now I&#39;m trying to catch up. I had really gotten the blogging bug in March but I completely lost it in April. But, this tipping point series is in dire need of lots of explaining so I&#39;m really hoping to get a few posts done in this area for sure!\n<\/p>\n<p>\nFirst, I started the discussion around this in a few surveys:\n<\/p>\n<p>\n<strong>Survey\/Question 1<\/strong>\n<\/p>\n<blockquote>\n<p>\n\tQ1 was described as this: if a table has 1 million rows at 20 rows per page (50,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged <a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/why-arent-those-nonclustered-indexes-being-used\/\" target=\"_blank\">here<\/a>. Here&#39;s what the survey said as of today:\n\t<\/p>\n<p>\n\t<img decoding=\"async\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2009\/6\/tippingpointquery1survey.jpg\" alt=\"\" \/>\n\t<\/p>\n<p>\n\tAnd, for Q1 the&nbsp;correct result (<strong>Between 0-2% of the rows<\/strong>) is actually the best result (but, by no means the overwhelming majority at only 28%). However, often people just &quot;think&quot; the answer is very small. So&#8230; I did a few more questions\/surveys.&nbsp;\n\t<\/p>\n<\/blockquote>\n<p>\n<strong>Survey\/Question 2<\/strong>\n<\/p>\n<blockquote>\n<p>\n\tQ2 was described as this: if a table has 1 million rows at&nbsp;100 rows per page (10,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged <a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/tipping-point-queries-more-questions-to-really-test-you\/\" target=\"_blank\">here<\/a>. Here&#39;s what the survey said as of today:\n\t<\/p>\n<\/p>\n<p>\n\t<img decoding=\"async\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2009\/6\/tippingpointquery2survey.jpg\" alt=\"\" \/>\n\t<\/p>\n<p>\n\tAnd, for Q2 the&nbsp;correct result (<strong>Less than .5% of the rows<\/strong>) is actually at a tie for the best (but, again, even a small percentage at only 22%). Again, often people just &quot;think&quot; the answer is very small. So&#8230; I did one more question\/survey.&nbsp;\n\t<\/p>\n<\/blockquote>\n<p>\n<strong>Survey\/Question 3<\/strong>\n<\/p>\n<blockquote>\n<p>\n\tQ3 was described as this: if a table has 1 million rows at&nbsp;2 rows per page (500,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged <a href=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/tipping-point-queries-more-questions-to-really-test-you\/\" target=\"_blank\">here<\/a>. Here&#39;s what the survey said as of today:\n\t<\/p>\n<p>\n\t<img decoding=\"async\" src=\"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-content\/uploads\/2009\/6\/tippingpointquery3survey.jpg\" alt=\"\" \/>\n\t<\/p>\n<p>\n\tAnd, for Q3 the&nbsp;correct result (<strong>Between 10-20% of the rows<\/strong>) is actually NOT the highest answer. And, this is even more convincing that there&#39;s confusion around what&#39;s going on and why.\n\t<\/p>\n<\/blockquote>\n<p>\n<strong>The Tipping Point<\/strong>\n<\/p>\n<p>\n<strong>What is the tipping point? <\/strong>\n<\/p>\n<blockquote>\n<p>\n\tIt&#39;s the point where the number of rows returned is &quot;no longer selective enough&quot;. SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.\n\t<\/p>\n<\/blockquote>\n<p>\n<strong>When does the tipping point occur? <\/strong>\n<\/p>\n<blockquote>\n<p>\n\tIt depends&#8230; it&#39;s MOSTLY tied to the number of pages in the table. Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs. However, parallelism, some server settings (processor affinity and I\/O affinity), memory and table size &#8211; all can have an impact. And, since it can vary &#8211; I typically estimate somewhere between 25% and 33% as a <strong><u>rough<\/u><em> <\/em><\/strong>tipping point (and, you&#39;ll see from a bunch of my examples, that number is not EXACT). Then, I translate that into rows.\n\t<\/p>\n<p>\n\t<strong>Math for Tipping Point Query 3:<\/strong> If a table has 500,000 pages then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000 and 166,000 ROWS the query will tip. Turning that into a percentage 125,000\/1million = 12.5% and 166,000\/1million = 16.6%. So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan. For this table, that percentage seems &quot;reasonable&quot;. But, most of us say that the tipping point happens at a much lower percentage&#8230; why? Because&nbsp;row size &#8211; which&nbsp;determines table size (and therefore pages) is really what has the greatest impact. So, let&#39;s look at Tipping&nbsp;Point Query 2&#8230;&nbsp;\n\t<\/p>\n<p>\n\t<strong>Math for Tipping Point Query 2:<\/strong> If a table has 10,000 pages then 25% = 2,500 and 33% = 3,333. So, somewhere between 2,500 and&nbsp;3,333 ROWS the query will tip. Turning that into a percentage 2,500\/1million = .25% and 3,333\/1million = .33% (not even 1%). So, if a table has only 10,000 pages (and 1 million rows) then queries that return less than a quarter of 1% of the data are likely to USE the nonclustered index to lookup the data and queries over one third of one percent are LIKELY to use a table scan. For this table, that percentage seems really low BUT, at the same time it makes sense (to a point) that a small table would be scanned&#8230; but, for less than 1%. 1% is NOT selective enough.&nbsp;For small tables, it might not matter all that much&nbsp;(they&#39;re small, they fit in cache, etc.) but for bigger tables &#8211; it might be a big performance problem.&nbsp;\n\t<\/p>\n<p>\n\t<strong>Math for Tipping Point Query 1:<\/strong> If a table has 50,000 pages then 25% = 12,500 and 33% = 16,666. So, somewhere between 12,500 and&nbsp;16,666 ROWS the query will tip. Turning that into a percentage 12,500\/1million = 1.25% and 16,666\/1million = 1.66% (under 2%). So, if a table has 50,000 pages (and 1 million rows) then queries that return less than 1.25% of the data are likely to USE the nonclustered index to lookup the data and queries over 1.66% are LIKELY to use a table scan. Again, this seems like a low number. Again, for small tables, it might not matter all that much&nbsp;(they&#39;re small, they fit in cache, etc.) but as tables get larger and larger &#8211; it CAN be a big performance problem.&nbsp;\n\t<\/p>\n<\/blockquote>\n<p>\n<strong>Why is the tipping point interesting? <\/strong>\n<\/p>\n<ul>\n<li>\n<div>\n\tIt shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn&#39;t mean that SQL Server&#39;s going to use that index)\n\t<\/div>\n<\/li>\n<li>\n<div>\n\tIt happens at a point that&#39;s typically MUCH earlier than expected&#8230; and, in fact, sometimes this is a VERY bad thing!\n\t<\/div>\n<\/li>\n<li>\n<div>\n\tOnly nonclustered indexes that do not cover a query have a tipping point. Covering indexes don&#39;t have this same issue (which further proves why they&#39;re so important for performance tuning)\n\t<\/div>\n<\/li>\n<li>\n<div>\n\tYou might find larger tables\/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and\/or force&#8230; and, is that a good thing?\n\t<\/div>\n<\/li>\n<\/ul>\n<p>\n<strong>Real example of an interesting tipping point<\/strong>\n<\/p>\n<p>\nEarlier today, I went on facebook and twitter and gave the following information &#8211; very vaguely &#8211; and I asked &quot;why&quot; is Q2 so much slower than Q1 if Q2 returns only 10 more rows. Same table and&nbsp;no hints (other than MAXDOP)&#8230;\n<\/p>\n<blockquote>\n<p>\n\tQ1: <font color=\"#0000ff\"><font color=\"#0000ff\">SELECT <font color=\"#808080\"><font color=\"#808080\">*&nbsp;<font color=\"#0000ff\"><font color=\"#0000ff\">FROM&nbsp;table <font color=\"#0000ff\"><font color=\"#0000ff\">WHERE&nbsp;colx <font color=\"#808080\"><font color=\"#808080\">&lt; 597420 <font color=\"#0000ff\"><font color=\"#0000ff\">OPTION <font color=\"#808080\"><font color=\"#808080\">(<font color=\"#0000ff\"><font color=\"#0000ff\">MAXDOP 1<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">)\n\t<\/p>\n<ul>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">returns 197,419 rows<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">takes 116,031 ms (1 minute, 52 seconds)<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">1,197,700 logical reads, <font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">5 physical reads, 137,861 read-ahead reads<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">7,562 ms CPU time\n<p>\n\t\tQ2: <font color=\"#0000ff\"><font color=\"#0000ff\">SELECT <font color=\"#808080\"><font color=\"#808080\">*<font color=\"#000000\">&nbsp;<font color=\"#0000ff\"><font color=\"#0000ff\">FROM<font color=\"#000000\">&nbsp;table <font color=\"#0000ff\"><font color=\"#0000ff\">WHERE&nbsp;colx <font color=\"#808080\"><font color=\"#808080\">&lt; 597430 <font color=\"#0000ff\"><font color=\"#0000ff\">OPTION <font color=\"#808080\"><font color=\"#808080\">(<font color=\"#0000ff\"><font color=\"#0000ff\">MAXDOP 1<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">)\n\t\t<\/p>\n<\/li>\n<\/ul>\n<ul>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">returns 197,4<u>2<\/u>9 rows<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">takes&nbsp;244,094 ms (4 minutes,&nbsp;4 seconds)<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">801,685&nbsp;logical reads,&nbsp;<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">1410 physical reads,&nbsp;801,678 read-ahead reads<\/li>\n<li><font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">9,188&nbsp;ms CPU time<\/li>\n<\/ul>\n<p>\n\tThere were lots of great guesses&#8230; but, it&#39;s the tipping point. SQL Server chose to &quot;tip&quot; the second query because it was &quot;over the line&quot;. But, it&#39;s important to realize that there are cases when that&#39;s NOT a good idea. And, what are your options?\n\t<\/p>\n<p>\n\tIn SQL Server 2005 &#8211; the only option is to force the nonclustered index to be used:\n\t<\/p>\n<blockquote>\n<p>\n\t\tQ2: <font color=\"#0000ff\"><font color=\"#0000ff\">SELECT <font color=\"#808080\"><font color=\"#808080\">*<font color=\"#000000\">&nbsp;<font color=\"#0000ff\"><font color=\"#0000ff\">FROM<font color=\"#000000\">&nbsp;table WITH (INDEX (NCInd)) <font color=\"#0000ff\"><font color=\"#0000ff\">WHERE&nbsp;colx <font color=\"#808080\"><font color=\"#808080\">&lt; 597430 <font color=\"#0000ff\"><font color=\"#0000ff\">OPTION <font color=\"#808080\"><font color=\"#808080\">(<font color=\"#0000ff\"><font color=\"#0000ff\">MAXDOP 1<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">)\n\t\t<\/p>\n<p>\n\t\t<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\"><font size=\"2\" color=\"#000000\">But, this can be TERRIBLY bad on some machines where the IOs could be a lot faster (and where data might already be in cache). These <strong><em>specific <\/em><\/strong>numbers are exactly that &#8211; specific to this HARDWARE (and, I chose not-so-optimal HW in this case to highlight this problem). And, depending on what number you use (what if this is a parameter in sps?) you might force SQL Server to do WAY more IOs by forcing the index than allowing the tipping point to do its job. But, depending on your hardware (and\/or what you know to be in cache at the time of execution), it might be better to force an index instead of letting SQL Server choose. So, should I force the index? Be careful, if you&#39;re wrong &#8211; it could take more time and actually be slower.\n\t\t<\/p>\n<\/blockquote>\n<p>\n\tIn SQL Server 2008 &#8211; there&#39;s a new hint &#8211; FORCESEEK:\n\t<\/p>\n<blockquote>\n<p>\n\t\t&#8212; Cut\/paste error with this hint&#8230; no INDEX\n\t\t<\/p>\n<p>\n\t\t<strike>Q2: <font color=\"#0000ff\"><font color=\"#0000ff\">SELECT <font color=\"#808080\"><font color=\"#808080\">*<font color=\"#000000\">&nbsp;<font color=\"#0000ff\"><font color=\"#0000ff\">FROM<font color=\"#000000\">&nbsp;table WITH (INDEX (FORCESEEK)) <font color=\"#0000ff\"><font color=\"#0000ff\">WHERE&nbsp;colx <font color=\"#808080\"><font color=\"#808080\">&lt; 597430 <font color=\"#0000ff\"><font color=\"#0000ff\">OPTION <font color=\"#808080\"><font color=\"#808080\">(<font color=\"#0000ff\"><font color=\"#0000ff\">MAXDOP 1<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">) <\/strike>\n\t\t<\/p>\n<p>\n\t\tQ2: <font color=\"#0000ff\"><font color=\"#0000ff\">SELECT <font color=\"#808080\"><font color=\"#808080\">*<font color=\"#000000\">&nbsp;<font color=\"#0000ff\"><font color=\"#0000ff\">FROM<font color=\"#000000\">&nbsp;table WITH (FORCESEEK) <font color=\"#0000ff\"><font color=\"#0000ff\">WHERE&nbsp;colx <font color=\"#808080\"><font color=\"#808080\">&lt; 597430 <font color=\"#0000ff\"><font color=\"#0000ff\">OPTION <font color=\"#808080\"><font color=\"#808080\">(<font color=\"#0000ff\"><font color=\"#0000ff\">MAXDOP 1<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\">)&nbsp; &#8212; Thanks to @SQL_Kiwi\n\t\t<\/p>\n<p>\n\t\t<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\"><font size=\"2\" color=\"#000000\">FORCESEEK is better because it doesn&#39;t tie you to a particular index directly but it also doesn&#39;t let SQL Server tip to a table scan. However, just like forcing an index &#8211; you can be wrong!\n\t\t<\/p>\n<\/blockquote>\n<p>\n\t<font size=\"3\" color=\"#808080\"><font size=\"3\" color=\"#808080\"><font size=\"2\" color=\"#000000\">So, what should you do? It depends. If you know your data well and you do some extensive testing you <em>might<\/em> consider using a hint (there are some clever things you can do programmatically in sps, I&#39;ll try and dedicate a post to this soon). However, a much better choice (if at all possible) is to consider covering (that&#39;s really my main point :). In my queries, covering is unrealistic because my queries want all columns (the evil SELECT *) but, if your queries are narrower AND they are high-priority, you are better off with a covering index (in many cases) over a hint because an index which covers a query, never tips.\n\t<\/p>\n<p>\n\tThat&#39;s the answer to the puzzle for now but there&#39;s definitely a lot more to dive into. The Tipping Point can be a very good thing &#8211; and it usually works well. But, if you&#39;re finding that you can force an index and get better performance you might want to do some investigating and see if it&#39;s this. Then consider how likely a hint is to help and now you know where you can focus.\n\t<\/p>\n<p>\n\tThanks for reading,<br \/>\n\tkt\n\t<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>OK, I&#39;ll definitely take a beating from all of you for having gone so long between my survey posts and now. I won&#39;t even go into the details but between some crazy work schedules, multiple sinus problems and even migraines&#8230; well, I&#39;ve been a bit behind. Let&#39;s just say that April\/May were rough at best. [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48,65,66,77],"tags":[],"class_list":["post-533","post","type-post","status-publish","format-standard","hentry","category-nonclustered-indexes","category-sql-server-2005","category-sql-server-2008","category-the-tipping-point"],"_links":{"self":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/533","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/comments?post=533"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/posts\/533\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/media?parent=533"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/categories?post=533"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlskills.com\/blogs\/kimberly\/wp-json\/wp\/v2\/tags?post=533"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}