sqlskills-logo-2015-white.png

Auto-Update Stats Default Sampling Test

In this post I’ll just share the results of a simple set of tests regarding automatic-update sampling.  Nothing fancy – I just populated the category table from the Credit sample database with varying row counts and then took note of the associated duration and sampling percent.  The query I used to kick of auto-updates after the row inserts was the following:

    
SELECT  [category_no],
[category_desc],
[category_code]
FROM [dbo].[category]
WHERE [category_desc] = 'Meals'
OPTION (RECOMPILE);

I also show the StatMan MAXDOP just for kicks (once true sampling kicks in, it is set to MAXDOP “1”, so not terribly interesting). I executed a few test iterations beyond what I’m showing here, but I just include two test runs per scenario just for comparison:

Test ID Row Count Data Pages StatMan Duration (ms) TABLESAMPLE PERCENT MAXDOP
1 10 1 0 100.0000 16
2 10 1 0 100.0000 16
1 1,000,000 3,228 1228 34.6964 1
2 1,000,000 3,228 1184 34.6964 1
1 2,000,000 6,455 1297 18.0480 1
2 2,000,000 6,455 1248 18.0480 1
1 10,000,000 32,275 1287 4.2324 1
2 10,000,000 32,275 1171 4.2324 1
1 20,000,000 64,550 1481 2.3625 1
2 20,000,000 64,550 1489 2.3625 1
1 40,000,000 129,103 2071 1.3617 1
2 40,000,000 129,103 1890 1.3617 1
1 80,000,000 258,210 2487 0.8129 1
2 80,000,000 258,210 2301 0.8129 1

A few comments:

  • Notice the sample percent is identical between identical row/page tests (each test started with a restore to the same base state)
  • Notice that as the data pages grow, the duration of the stats update doesn’t grow significantly, and then consider the sample percentage multiplied by the total data page value (for example – ~1757 data pages for 40M vs. ~2098 data pages for 80M)
  • Duration is just based on my laptop and I was more interested in the relative duration across test runs

Now what happens if I rebuild the index with a low fill factor? (Just doing this on the 80M row version):

    

USE [Credit]
GO
ALTER INDEX [category_ident] ON [dbo].[category]
REBUILD PARTITION = ALL
WITH (FILLFACTOR = 10);
GO

I’ll do a non-updating update so that I can get the threshold for the column updates incremented and thus kick off an auto-update of statistics:

  
UPDATE TOP (16000500) [category]
SET [category_desc] = [category_desc];
GO

Here are the results (Test ID 3 in bold):

Test ID Row Count Data Pages StatMan Duration (ms) TABLESAMPLE PERCENT MAXDOP
1 80,000,000 258,210 2487 0.8129 1
2 80,000,000 258,210 2301 0.8129 1
3 80,000,000 2,543,516 1399 0.4568 1

Same row count, significantly more pages, lower duration (interesting) and a reduced table sample percent – but if you do the math, more pages than the previous samplings. Lots of other fun things we could try, but that’s all for now…

2 thoughts on “Auto-Update Stats Default Sampling Test

  1. Joe,

    When the sampling kicks in, am I correct in thinking there is no way of influencing the sampling algirthm. For example, if you have billion row tables with highly varaible and slwed data, 240 histogram steps does not go very far. So if you know that there are values of importance iit would be nice for the step boundaries to fall on these popular values. I suspect that if you want to do this filtered statistics and partitioning is the way to go ?, is this correct ?.

    Regards,

    Chris

  2. Hi Chris,

    When auto-update statistics kicks in, you are correct in that we cannot influence the sampling algorithm. Regarding steps – 200 sometimes does not go very far (and it doesn’t even need to be a very large table for this to be the case).

    Filtered stats can be helpful, but if you are using parameterization or other techniques where the qualifying value is not known at runtime, those filtered stats may not actually be used. Regarding your comment on partitioning, this isn’t inherently helpful in this case, unless you are using in conjunction with filtered stats, which is what I think you were referencing.

    Another option is to see if manual updates of stats with higher sampling percentages may prove helpful. Sometimes it can, but you need to test it out.

    Best Regards,

    Joe

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.