The Tipping Point Query Answers

OK, I'll definitely take a beating from all of you for having gone so long between my survey posts and now. I won't even go into the details but between some crazy work schedules, multiple sinus problems and even migraines… well, I've been a bit behind. Let's just say that April/May were rough at best. I'm feeling better and well, now I'm trying to catch up. I had really gotten the blogging bug in March but I completely lost it in April. But, this tipping point series is in dire need of lots of explaining so I'm really hoping to get a few posts done in this area for sure!

First, I started the discussion around this in a few surveys:

Survey/Question 1

Q1 was described as this: if a table has 1 million rows at 20 rows per page (50,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q1 the correct result (Between 0-2% of the rows) is actually the best result (but, by no means the overwhelming majority at only 28%). However, often people just "think" the answer is very small. So… I did a few more questions/surveys. 

Survey/Question 2

Q2 was described as this: if a table has 1 million rows at 100 rows per page (10,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q2 the correct result (Less than .5% of the rows) is actually at a tie for the best (but, again, even a small percentage at only 22%). Again, often people just "think" the answer is very small. So… I did one more question/survey. 

Survey/Question 3

Q3 was described as this: if a table has 1 million rows at 2 rows per page (500,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q3 the correct result (Between 10-20% of the rows) is actually NOT the highest answer. And, this is even more convincing that there's confusion around what's going on and why.

The Tipping Point

What is the tipping point?

It's the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

When does the tipping point occur?

It depends… it's MOSTLY tied to the number of pages in the table. Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs. However, parallelism, some server settings (processor affinity and I/O affinity), memory and table size – all can have an impact. And, since it can vary – I typically estimate somewhere between 25% and 33% as a rough tipping point (and, you'll see from a bunch of my examples, that number is not EXACT). Then, I translate that into rows.

Math for Tipping Point Query 3: If a table has 500,000 pages then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000 and 166,000 ROWS the query will tip. Turning that into a percentage 125,000/1million = 12.5% and 166,000/1million = 16.6%. So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan. For this table, that percentage seems "reasonable". But, most of us say that the tipping point happens at a much lower percentage… why? Because row size – which determines table size (and therefore pages) is really what has the greatest impact. So, let's look at Tipping Point Query 2… 

Math for Tipping Point Query 2: If a table has 10,000 pages then 25% = 2,500 and 33% = 3,333. So, somewhere between 2,500 and 3,333 ROWS the query will tip. Turning that into a percentage 2,500/1million = .25% and 3,333/1million = .33% (not even 1%). So, if a table has only 10,000 pages (and 1 million rows) then queries that return less than a quarter of 1% of the data are likely to USE the nonclustered index to lookup the data and queries over one third of one percent are LIKELY to use a table scan. For this table, that percentage seems really low BUT, at the same time it makes sense (to a point) that a small table would be scanned… but, for less than 1%. 1% is NOT selective enough. For small tables, it might not matter all that much (they're small, they fit in cache, etc.) but for bigger tables – it might be a big performance problem. 

Math for Tipping Point Query 1: If a table has 50,000 pages then 25% = 12,500 and 33% = 16,666. So, somewhere between 12,500 and 16,666 ROWS the query will tip. Turning that into a percentage 12,500/1million = 1.25% and 16,666/1million = 1.66% (under 2%). So, if a table has 50,000 pages (and 1 million rows) then queries that return less than 1.25% of the data are likely to USE the nonclustered index to lookup the data and queries over 1.66% are LIKELY to use a table scan. Again, this seems like a low number. Again, for small tables, it might not matter all that much (they're small, they fit in cache, etc.) but as tables get larger and larger – it CAN be a big performance problem. 

Why is the tipping point interesting?

  • It shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn't mean that SQL Server's going to use that index)
  • It happens at a point that's typically MUCH earlier than expected… and, in fact, sometimes this is a VERY bad thing!
  • Only nonclustered indexes that do not cover a query have a tipping point. Covering indexes don't have this same issue (which further proves why they're so important for performance tuning)
  • You might find larger tables/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and/or force… and, is that a good thing?

Real example of an interesting tipping point

Earlier today, I went on facebook and twitter and gave the following information – very vaguely – and I asked "why" is Q2 so much slower than Q1 if Q2 returns only 10 more rows. Same table and no hints (other than MAXDOP)…

Q1: SELECT FROM table WHERE colx < 597420 OPTION (MAXDOP 1)

  • returns 197,419 rows
  • takes 116,031 ms (1 minute, 52 seconds)
  • 1,197,700 logical reads, 5 physical reads, 137,861 read-ahead reads
  • 7,562 ms CPU time

    Q2: SELECT * FROM table WHERE colx < 597430 OPTION (MAXDOP 1)

  • returns 197,429 rows
  • takes 244,094 ms (4 minutes, 4 seconds)
  • 801,685 logical reads, 1410 physical reads, 801,678 read-ahead reads
  • 9,188 ms CPU time

There were lots of great guesses… but, it's the tipping point. SQL Server chose to "tip" the second query because it was "over the line". But, it's important to realize that there are cases when that's NOT a good idea. And, what are your options?

In SQL Server 2005 – the only option is to force the nonclustered index to be used:

Q2: SELECT * FROM table WITH (INDEX (NCInd)) WHERE colx < 597430 OPTION (MAXDOP 1)

But, this can be TERRIBLY bad on some machines where the IOs could be a lot faster (and where data might already be in cache). These specific numbers are exactly that – specific to this HARDWARE (and, I chose not-so-optimal HW in this case to highlight this problem). And, depending on what number you use (what if this is a parameter in sps?) you might force SQL Server to do WAY more IOs by forcing the index than allowing the tipping point to do its job. But, depending on your hardware (and/or what you know to be in cache at the time of execution), it might be better to force an index instead of letting SQL Server choose. So, should I force the index? Be careful, if you're wrong – it could take more time and actually be slower.

In SQL Server 2008 – there's a new hint – FORCESEEK:

— Cut/paste error with this hint… no INDEX

Q2: SELECT * FROM table WITH (INDEX (FORCESEEK)) WHERE colx < 597430 OPTION (MAXDOP 1)

Q2: SELECT * FROM table WITH (FORCESEEK) WHERE colx < 597430 OPTION (MAXDOP 1)  — Thanks to @SQL_Kiwi

FORCESEEK is better because it doesn't tie you to a particular index directly but it also doesn't let SQL Server tip to a table scan. However, just like forcing an index – you can be wrong!

So, what should you do? It depends. If you know your data well and you do some extensive testing you might consider using a hint (there are some clever things you can do programmatically in sps, I'll try and dedicate a post to this soon). However, a much better choice (if at all possible) is to consider covering (that's really my main point :). In my queries, covering is unrealistic because my queries want all columns (the evil SELECT *) but, if your queries are narrower AND they are high-priority, you are better off with a covering index (in many cases) over a hint because an index which covers a query, never tips.

That's the answer to the puzzle for now but there's definitely a lot more to dive into. The Tipping Point can be a very good thing – and it usually works well. But, if you're finding that you can force an index and get better performance you might want to do some investigating and see if it's this. Then consider how likely a hint is to help and now you know where you can focus.

Thanks for reading,
kt

32 thoughts on “The Tipping Point Query Answers

  1. Well done – thanks for a insightful post!

    I didn’t even know non-clustered indexes were utilitised (or not) this way. I assume a tool like Database Engine Tuning Advisor takes into account the "tipping point" when analysing a work load to suggest indexes?

  2. Very interesting, and very informative. I believe this explains behavior that we have been seeing. I hadn’t thought to examine row size…

  3. You said for Query 3: If a table has 500,000 pages(with two rows per page) then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000(I think this should be 250,000) and 166,000(I think this should be 332,000) ROWS the query will tip because there are two rows per page.

    Correct me if I am getting it wrong.

  4. I can’t see the name associated with comment #6 but the math is always based on a fraction of the PAGES but then converted into rows. So, for Query 3 we start with 500,000 pages (so 25% = 125,000 and 33% = 166,000). So, when we translate that into rows we keep THOSE numbers 125,000-166,000 and see how that translates into percentages of the table which is 12.5-16.6%.

    Cheers!
    kt

  5. Great post. Thanks.

    But I still want to continue #6 comment and your reply. If the tipping point is 30% of the PAGES, what is the reasom for keeping THOSE numbers and translate it to rows? why not to calculate the number of rows in this 30% of PAGES? what is the rational of keeping the same percentage. Just need to understand.

    Thanks.

  6. It’s the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

  7. Thanks for such an in-depth explanation that was actually intelligible and understandable. Our programmers were trying to explain the basis of this to me last week, but for the life of me I just couldn’t understand what in goodness sakes they were talking about!

  8. OK, for years I’ve been saying that SQL Server doesn’t care about the order in which you define the columns of your table because internally SQL Server will re-arrange your columns to store all of the fixed width columns first and the variable columns last. In both the fixed-width portion of the row as well as the variable-width portion of the row, the columns are defined in the order in which they are declared. So, what does matter?

  9. Hi Kimberly.

    As with Manu and Aviel’s comments, I don’t understand the leap from the percentage of the number of pages to saying that equates to the number of rows. In your questions you state the number of rows per page but then when you start doing the math you seem to be implying there is only one row per page. In answer to Manu’s question you said that “the math is always based on a fraction of the PAGES but then converted into rows” but you don’t explain why that is.

    It’s a bit like that Sidney Harris cartoon “Then a miracle occurs”.

    Cheers
    Simon

    1. Hey there Simon – Actually, I think people get confused with this one because they’re making it harder than it is… Simply put, the access pattern for a bookmark lookup (the process of getting the row’s clustering key value from the nonclustered index seek) produces random [page] I/Os in the data (the clustered index structure). This is how we determine how many I/Os it will take before it’s a bad idea. Based solely on the math – if the table has 10,000 pages then somewhere between 2500-3333 RANDOM I/Os and we’d have been better off just doing a table scan. So, these are RANDOM I/Os tied to the fact that there’s no correlation between the nonclustered order (something like Social Security Number) and the clustered index order (something like EmployeeID). So… it’s that number of ROWS which will cause the random I/Os. We use pages to see how many pages it takes but that’s actually the number of [row] lookups.

      Hope that helps!
      kt

      1. Seems Simon is right, the math seems flawed.

        ” Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs.”

        “If a table has 500,000 pages then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000 and 166,000 ROWS the query will tip.”

        No, somewhere between 125,000 and 166,000 PAGES is where the tipping point interval is defined. Unless 1 Page = 1 Row.
        Extra proof is recalculation of your conclusion:

        “So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan.”

        So 12.5 % of 1,000,000 rows = 125,000 rows. Giving 1 page = 2 rows, there are 125,000/2 = 62,500 pages als tippingpoint. This is half of your original statement of what the tipping point was.

        1. Actually, the math is absolutely correct. YES – the expectation is that each page access driven from a nonclustered index TO the clustered index will be on a DIFFERENT page. That’s exactly the point as there’s no correlation between a nonclustered seek (something like Lastname) and a clustered index on ID (something like employee ID). So, the calculation is determined from looking at PAGES in the table. Then, you convert that number directly into rows because those rows are statistically expected to be on different pages. Even if there is a correlation – SQL Server won’t know of it.

          Hope that helps – but, alas, the math/numbers/calculations are correct!
          k

          1. Ok, I checked a bit deeper:
            1 million rows at 2 rows per page (500,000 pages), Tiping point interval 25%-30%. What is the corresponding number of rows?

            25% = 125,000 pages. You need between 62,500 and 125,000 rows to access so many pages at 1 pages=2 rows.

            33% = 166,000 pages. You need between 83,000 and 166,000 rows to access so many pages at 1 pages=2 rows.

            So, somewhere between (62,500 – 125,000) and (83,000 – 166,000) ROWS the query will tip. It would be more correct to
            get the middle of the interval. So I would rewrite the statement

            “So, somewhere between 125,000 and 166,000 ROWS the query will tip.”

            to

            “So, somewhere between 93,750 and 124,500 ROWS the query will tip.”
            Turned into percentage: Between 9% and 12%.

          2. May i switch to example #2 in order to show a more pronounced difference between our math:

            1 million rows at 100 rows per page (10,000 pages),
            Tiping point interval 25%-30%. What is the corresponding number of rows?

            25% of 10,000 pages = 2,500 pages. You need between 2,500 – 250,000 rows to access that many pages at 1 page = 100 rows
            33% of 10,000 pages = 3,300 pages. You need between 3,300 – 330,000 rows to access that many pages at 1 page = 100 rows

            tipping point interval (2,500 – 250,000) and (3,300 – 330,000)

            tipping interval (taking the middle value): 126,250 rows – 166,650 rows. Into percentage: 12,6% – 16,6 %. Your answer was ‘Less than .5% of the rows’.

          3. Hey there Willem – this is a bit of a challenging discussion to have over blog comments but I assure you that while there COULD be 100 rows that you’re interest ALL on one page and that there COULD be a correlation, there is NO WAY for SQL Server to know that. The ONLY assumption that SQL Server will make is that the rows are on DIFFERENT pages. So, back to the math…

            If a table has 10,000 PAGES then the SOLE number that defines the tipping point is 25-33% of THAT (and only that) – which is 2500-3333. When you convert that into lookups from the nonclustered to the clustered that 2500-3333 ROWS. As a percentage 2500/1000000 = .25% to 3333/1000000 = .33 which is LESS THAN .5%.

            Having said that – both the math and the answer are correct.

            Getting back to the point though – it’s NOT the math that’s important it’s the “usefulness” of narrow nonclustered indexes. If a nonclustered index does NOT cover a query, then it loses its usefulness fairly early. It’s not really the math that’s important – it’s the concept that putting indexes on columns that are in your WHERE clauses is “good enough” for performance. Often, a nonclustered is NOT useful.

            Go ahead and try this against a few of your larger tables. It WILL work exactly as I described (if the choices are a narrow non-covering nonclustered vs. a [clustered] table scan).

            Hope that helps!
            k

          4. Thank you for your time and explanation. The fact however that SQL Server doesn’t know the correlation between the nonclustered seek and clustered index wasn’t obvious to me. In Oracle, there is the ‘Clustering Factor’ for this correlation. So if the CLUSTERING_FACTOR approaches the
            number of blocks in the table, the rows are ordered. If it approaches the number of rows
            in the table, the rows are randomly ordered. But Oracle uses heap tables by default, and in the SQL Server world, where the non-clustered indexes refer to a surrogate clustered key, the aforemantioned correlation can safely assumed to be absent. Thanks again,
            Willem

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

SQLskills SQL101: Partitioning

Continuing on our path to understanding the basics and core concepts, there’s a big topic that’s often greatly misunderstood and that’s partitioning. I know I’m

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.