Auto-Update Stats Default Sampling Test

In this post I’ll just share the results of a simple set of tests regarding automatic-update sampling. Nothing fancy – I just populated the category table from the Credit sample database with varying row counts and then took note of the associated duration and sampling percent. The query I used to kick of auto-updates after the row inserts was the following:

    
SELECT  [category_no],
[category_desc],
[category_code]
FROM [dbo].[category]
WHERE [category_desc] = 'Meals'
OPTION (RECOMPILE);

I also show the StatMan MAXDOP just for kicks (once true sampling kicks in, it is set to MAXDOP “1”, so not terribly interesting). I executed a few test iterations beyond what I’m showing here, but I just include two test runs per scenario just for comparison:

Test ID	Row Count	Data Pages	StatMan Duration (ms)	TABLESAMPLE PERCENT	MAXDOP
1	10	1	0	100.0000	16
2	10	1	0	100.0000	16
1	1,000,000	3,228	1228	34.6964	1
2	1,000,000	3,228	1184	34.6964	1
1	2,000,000	6,455	1297	18.0480	1
2	2,000,000	6,455	1248	18.0480	1
1	10,000,000	32,275	1287	4.2324	1
2	10,000,000	32,275	1171	4.2324	1
1	20,000,000	64,550	1481	2.3625	1
2	20,000,000	64,550	1489	2.3625	1
1	40,000,000	129,103	2071	1.3617	1
2	40,000,000	129,103	1890	1.3617	1
1	80,000,000	258,210	2487	0.8129	1
2	80,000,000	258,210	2301	0.8129	1

A few comments:

Notice the sample percent is identical between identical row/page tests (each test started with a restore to the same base state)
Notice that as the data pages grow, the duration of the stats update doesn’t grow significantly, and then consider the sample percentage multiplied by the total data page value (for example – ~1757 data pages for 40M vs. ~2098 data pages for 80M)
Duration is just based on my laptop and I was more interested in the relative duration across test runs

Now what happens if I rebuild the index with a low fill factor? (Just doing this on the 80M row version):

    

USE [Credit]
GO
ALTER INDEX [category_ident] ON [dbo].[category]
REBUILD PARTITION = ALL
WITH (FILLFACTOR = 10);
GO

I’ll do a non-updating update so that I can get the threshold for the column updates incremented and thus kick off an auto-update of statistics:

  
UPDATE TOP (16000500) [category]
SET [category_desc] = [category_desc];
GO

Here are the results (Test ID 3 in bold):

Test ID	Row Count	Data Pages	StatMan Duration (ms)	TABLESAMPLE PERCENT	MAXDOP
1	80,000,000	258,210	2487	0.8129	1
2	80,000,000	258,210	2301	0.8129	1
3	80,000,000	2,543,516	1399	0.4568	1

Same row count, significantly more pages, lower duration (interesting) and a reduced table sample percent – but if you do the math, more pages than the previous samplings. Lots of other fun things we could try, but that’s all for now…

2 thoughts on “Auto-Update Stats Default Sampling Test”

Chris Adkin says:

March 13, 2013 at 5:46 am

Joe,

When the sampling kicks in, am I correct in thinking there is no way of influencing the sampling algirthm. For example, if you have billion row tables with highly varaible and slwed data, 240 histogram steps does not go very far. So if you know that there are values of importance iit would be nice for the step boundaries to fall on these popular values. I suspect that if you want to do this filtered statistics and partitioning is the way to go ?, is this correct ?.

Regards,

Chris
Joseph Sack says:

March 13, 2013 at 5:53 am

Hi Chris,

When auto-update statistics kicks in, you are correct in that we cannot influence the sampling algorithm. Regarding steps – 200 sometimes does not go very far (and it doesn’t even need to be a very large table for this to be the case).

Filtered stats can be helpful, but if you are using parameterization or other techniques where the qualifying value is not known at runtime, those filtered stats may not actually be used. Regarding your comment on partitioning, this isn’t inherently helpful in this case, unless you are using in conjunction with filtered stats, which is what I think you were referencing.

Another option is to see if manual updates of stats with higher sampling percentages may prove helpful. Sometimes it can, but you need to test it out.

Best Regards,

Joe

Comments are closed.

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Auto-Update Stats Default Sampling Test

2 thoughts on “Auto-Update Stats Default Sampling Test”

Other articles

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 2″

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 1”

SQLIntersection Post-Conference Session

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

Finding Timeout-Prone Code with max_elapsed_time

Deck from “Practical SQL Server Cardinality Estimation”

Imagine feeling confident enough to handle whatever your database throws at you.