In this post I’ll just share the results of a simple set of tests regarding automatic-update sampling. Nothing fancy – I just populated the category table from the Credit sample database with varying row counts and then took note of the associated duration and sampling percent. The query I used to kick of auto-updates after the row inserts was the following:
1 2 3 4 5 6 | SELECT [category_no], [category_desc], [category_code] FROM [dbo].[category] WHERE [category_desc] = 'Meals' OPTION (RECOMPILE); |
I also show the StatMan MAXDOP just for kicks (once true sampling kicks in, it is set to MAXDOP “1”, so not terribly interesting). I executed a few test iterations beyond what I’m showing here, but I just include two test runs per scenario just for comparison:
Test ID | Row Count | Data Pages | StatMan Duration (ms) | TABLESAMPLE PERCENT | MAXDOP |
1 | 10 | 1 | 0 | 100.0000 | 16 |
2 | 10 | 1 | 0 | 100.0000 | 16 |
1 | 1,000,000 | 3,228 | 1228 | 34.6964 | 1 |
2 | 1,000,000 | 3,228 | 1184 | 34.6964 | 1 |
1 | 2,000,000 | 6,455 | 1297 | 18.0480 | 1 |
2 | 2,000,000 | 6,455 | 1248 | 18.0480 | 1 |
1 | 10,000,000 | 32,275 | 1287 | 4.2324 | 1 |
2 | 10,000,000 | 32,275 | 1171 | 4.2324 | 1 |
1 | 20,000,000 | 64,550 | 1481 | 2.3625 | 1 |
2 | 20,000,000 | 64,550 | 1489 | 2.3625 | 1 |
1 | 40,000,000 | 129,103 | 2071 | 1.3617 | 1 |
2 | 40,000,000 | 129,103 | 1890 | 1.3617 | 1 |
1 | 80,000,000 | 258,210 | 2487 | 0.8129 | 1 |
2 | 80,000,000 | 258,210 | 2301 | 0.8129 | 1 |
A few comments:
- Notice the sample percent is identical between identical row/page tests (each test started with a restore to the same base state)
- Notice that as the data pages grow, the duration of the stats update doesn’t grow significantly, and then consider the sample percentage multiplied by the total data page value (for example – ~1757 data pages for 40M vs. ~2098 data pages for 80M)
- Duration is just based on my laptop and I was more interested in the relative duration across test runs
Now what happens if I rebuild the index with a low fill factor? (Just doing this on the 80M row version):
1 2 3 4 5 6 | USE [Credit] GO ALTER INDEX [category_ident] ON [dbo].[category] REBUILD PARTITION = ALL WITH (FILLFACTOR = 10); GO |
I’ll do a non-updating update so that I can get the threshold for the column updates incremented and thus kick off an auto-update of statistics:
1 2 3 | UPDATE TOP (16000500) [category] SET [category_desc] = [category_desc]; GO |
Here are the results (Test ID 3 in bold):
Test ID | Row Count | Data Pages | StatMan Duration (ms) | TABLESAMPLE PERCENT | MAXDOP |
1 | 80,000,000 | 258,210 | 2487 | 0.8129 | 1 |
2 | 80,000,000 | 258,210 | 2301 | 0.8129 | 1 |
3 | 80,000,000 | 2,543,516 | 1399 | 0.4568 | 1 |
Same row count, significantly more pages, lower duration (interesting) and a reduced table sample percent – but if you do the math, more pages than the previous samplings. Lots of other fun things we could try, but that’s all for now…
2 thoughts on “Auto-Update Stats Default Sampling Test”
Joe,
When the sampling kicks in, am I correct in thinking there is no way of influencing the sampling algirthm. For example, if you have billion row tables with highly varaible and slwed data, 240 histogram steps does not go very far. So if you know that there are values of importance iit would be nice for the step boundaries to fall on these popular values. I suspect that if you want to do this filtered statistics and partitioning is the way to go ?, is this correct ?.
Regards,
Chris
Hi Chris,
When auto-update statistics kicks in, you are correct in that we cannot influence the sampling algorithm. Regarding steps – 200 sometimes does not go very far (and it doesn’t even need to be a very large table for this to be the case).
Filtered stats can be helpful, but if you are using parameterization or other techniques where the qualifying value is not known at runtime, those filtered stats may not actually be used. Regarding your comment on partitioning, this isn’t inherently helpful in this case, unless you are using in conjunction with filtered stats, which is what I think you were referencing.
Another option is to see if manual updates of stats with higher sampling percentages may prove helpful. Sometimes it can, but you need to test it out.
Best Regards,
Joe
Comments are closed.