Selectivity Guesses in absence of Statistics

Let’s say you have a heap table with 1,000,000 rows in it. Let’s also say that your automatic creation of statistics are disabled, as well as updates to the statistics (and in this scenario, there are NO existing indexes or statistics).

What kind of selectivity guess would the optimizer make for a query like the following? (I copied over the FactInternetSales table with no indexes and put 1,000,000 rows in it – tested on SQL Server 2012, 11.0.2316)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales

WHERE ProductKey = 537;

First of all, the actual plan on my system was as follows:

If I look at the properties of the table scan, I see the actual number of rows was 23,042 versus the estimated of 31,622.8. (As an aside – execute SELECT POWER(1000000.0,.75) and you’ll see how the estimated number of rows was derived). I also see that the table cardinality is known (and correct) at 1,000,000.

What if I have a predicate referencing another column, like OrderDateKey? This too results in an estimate of 31,622.8 rows.

The following table shows a sampling of various queries and associated search conditions, along with their associated row estimates:

Query	Estimated Rows	Percentage (based on 1,000,000 rows)
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey = 537;	31,622.8	3.16228%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE OrderDateKey = 20040224;	31,622.8	3.16228%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey BETWEEN 537 AND 600;	90,000	9%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey BETWEEN 537 AND 537;	90,000	9%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost < 100.00;	300,000	30%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost > 100.00;	300,000	30%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost > 100.00 AND TotalProductCost < 200.00;	90,000	9%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost >= 100.00 AND TotalProductCost <= 200.00;	90,000	9%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost LIKE 100.00;	100,000	10%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘%A%’;	539,232	53.9232%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘A%’;	269,616	26.9616%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘%A’;	269,616	26.9616%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE TotalProductCost > (ProductStandardCost + 10.00);	300,000	30%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IN (1);	31,622.8	3.16228%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IN (1,2);	62,245.6	6.22456%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey = 1 OR ProductKey = 2;	62,245.6	6.22456%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IN (1,2,3);	91,900	9.19%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey = 1 OR ProductKey = 2 OR ProductKey = 3;	91,900	9.19%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey = 1 AND ProductKey = 2 AND ProductKey = 3;	10,746.1	1.07461%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey = 1 AND ProductKey = 2;	13,335.2	1.33352%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE (TaxAmt * UnitPrice) > 10.00;	300,000	30%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1);	968,377	96.8377%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey <> 1;	968,377	96.8377%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1,2);	13,335.2	1.33352%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1,2,3);	10746.1	1.07461%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey <> 1 AND ProductKey <> 2 AND ProductKey <> 3;	10746.1	1.07461%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE EXISTS (SELECT 1);	1,000,000	100%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE NOT EXISTS (SELECT 1);	0	0%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NOT NULL;	1,000,000	100%
SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NULL;	0	0%
ALTER TABLE dbo.FactInternetSales ALTER COLUMN ProductKey int NULL; SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NOT NULL;	968,377	96.8377%
ALTER TABLE dbo.FactInternetSales ALTER COLUMN ProductKey int NULL; SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NULL;	31,622.8	3.16228%

There are numerous other scenarios I could have put in this table, but you probably get the point by now that the search conditions help define the which “guess” calculation is used in absence of statistics. When there are additional details beyond table cardinality, such as the nullability of a column or constraints, SQL Server can often leverage it accordingly.

4 thoughts on “Selectivity Guesses in absence of Statistics”

tobi says:

May 10, 2012 at 9:22 pm

This table could be used to construct "cardinality hints" by introducing useless predicates which alter misestimated cardinality. A nasty hack, but it seems nicer that join hints which cause side-effects like forcing join order.
Chandan says:

May 20, 2012 at 10:05 pm

Thanks for writing this Joe. Does it mean that in a table which is a heap and with no -nonclustered index, the actual row-counts guessed by the optimizer while compiling a plan will always be wrong?

What about a heap with some non-clustered index on some columns?

Thanks
Chandan
joe says:

May 21, 2012 at 7:06 am

Hi Chandan,

If auto-create / auto-update statistics are enabled (which most often they should be) – the guesses don’t come in to play (for this scenario at least). Basically the guesses come into play when there are no stats involved (whether stats objects or stats associated with an index).

Cheers

Comments are closed.

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Selectivity Guesses in absence of Statistics

4 thoughts on “Selectivity Guesses in absence of Statistics”

Other articles

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 2″

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 1”

SQLIntersection Post-Conference Session

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

Finding Timeout-Prone Code with max_elapsed_time

Deck from “Practical SQL Server Cardinality Estimation”

Imagine feeling confident enough to handle whatever your database throws at you.