In my post Row and Page Count Influence on Estimated CPU and I/O Cost I demonstrated how I/O cost of a clustered index scan had sensitivity to page counts but not row counts. For this post I’ll lay out a direct connection between data page counts and the query execution plan that gets generated.
To illustrate the scenario, I created a table in the Credit database based on the charge table and I added two indexes, one clustered and one nonclustered:
USE [Credit]; GO SELECT TOP 575000 [charge_no], [member_no], [provider_no], [category_no], [charge_dt], [charge_amt], [statement_no], [charge_code] INTO [dbo].[charge_demo] FROM [dbo].[charge]; GO CREATE CLUSTERED INDEX [charge_demo_charge_no] ON [dbo].[charge_demo] ([charge_no]); GO CREATE NONCLUSTERED INDEX [charge_demo_charge_amt] ON [dbo].[charge_demo] ([charge_amt]) INCLUDE ([member_no]) WITH (FILLFACTOR = 100); GO
Next, I checked the data page counts by index for this new 575,000 row table:
SELECT [index_id], [in_row_data_page_count] FROM [sys].[dm_db_partition_stats] WHERE [object_id] = OBJECT_ID('dbo.charge_demo'); GO
The clustered index has 3,426 data pages and the nonclustered index has 1,567 data pages.
Next I looked at the execution plan for the following query:
SELECT [member_no], SUM([charge_amt]) AS [charge_amt] FROM [dbo].[charge_demo] WHERE [charge_amt] > 0 GROUP BY [member_no] OPTION (RECOMPILE); GO
The query execution plan (via SQL Sentry Plan Explorer) was as follows:
The overall estimated subtree cost for the plan ended up being 4.6168.
Next, I rebuilt the nonclustered index using a very low fill factor (far lower than I would ever recommend, but I was doing this to demonstrate the placement of the same number of rows over many more pages than the original default fill factor):
CREATE NONCLUSTERED INDEX [charge_demo_charge_amt] ON [dbo].[charge_demo] ([charge_amt]) INCLUDE ([member_no]) WITH (FILLFACTOR = 1, DROP_EXISTING = ON); GO
The clustered index still has 3,426 data pages (since we didn’t change it), but now the nonclustered index has 143,753 data pages instead of the original 1,567 data pages. And again, this represents the same 575,000 row count. Re-executing the original test query, I saw the following changed plan:
The overall estimated subtree cost for the plan increased to 54.3065 with a few other significant changes as well. The second plan switched to using a clustered index scan instead of a nonclustered index seek. Also, the second plan uses a stream aggregate with an “injected” sort operation, instead of the original plan’s hash match aggregate operation.