sqlskills-logo-2015-white.png

Selectivity Guesses in absence of Statistics

Let’s say you have a heap table with 1,000,000 rows in it.  Let’s also say that your automatic creation of statistics are disabled, as well as updates to the statistics (and in this scenario, there are NO existing indexes or statistics).

What kind of selectivity guess would the optimizer make for a query like the following? (I copied over the FactInternetSales table with no indexes and put 1,000,000 rows in it – tested on SQL Server 2012, 11.0.2316)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 537;

First of all, the actual plan on my system was as follows:

SNAGHTML1904f4f

If I look at the properties of the table scan, I see the actual number of rows was 23,042 versus the estimated of 31,622.8.  (As an aside – execute SELECT POWER(1000000.0,.75) and you’ll see how the estimated number of rows was derived).  I also see that the table cardinality is known (and correct) at 1,000,000.

SNAGHTML19c0871

What if I have a predicate referencing another column, like OrderDateKey?  This too results in an estimate of 31,622.8 rows.

The following table shows a sampling of various queries and associated search conditions, along with their associated row estimates:

Query Estimated Rows Percentage (based on 1,000,000 rows)
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey = 537;

31,622.8 3.16228%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE OrderDateKey = 20040224;

31,622.8 3.16228%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey BETWEEN 537 AND 600;

90,000 9%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey BETWEEN 537 AND 537;

90,000 9%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost < 100.00;

300,000 30%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost > 100.00;

300,000 30%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost > 100.00 AND

TotalProductCost < 200.00;

90,000 9%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost >= 100.00 AND

TotalProductCost <= 200.00;

90,000 9%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost LIKE 100.00;

100,000 10%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE SalesOrderNumber LIKE ‘%A%’;

539,232 53.9232%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE SalesOrderNumber LIKE ‘A%’;

269,616 26.9616%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE SalesOrderNumber LIKE ‘%A’;

269,616 26.9616%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE TotalProductCost > (ProductStandardCost + 10.00);

300,000 30%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IN (1);

31,622.8 3.16228%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IN (1,2);

62,245.6 6.22456%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey = 1 OR

ProductKey = 2;

62,245.6 6.22456%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IN (1,2,3);

91,900 9.19%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey = 1 OR

ProductKey = 2 OR

ProductKey = 3;

91,900 9.19%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey = 1 AND

ProductKey = 2 AND

ProductKey = 3;

10,746.1 1.07461%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey = 1 AND

ProductKey = 2;

13,335.2 1.33352%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE (TaxAmt * UnitPrice) > 10.00;

300,000 30%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey NOT IN (1);

968,377 96.8377%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey <> 1;

968,377 96.8377%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey NOT IN (1,2);

13,335.2 1.33352%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey NOT IN (1,2,3);

10746.1 1.07461%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey <> 1 AND

ProductKey <> 2 AND

ProductKey <> 3;

10746.1 1.07461%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE EXISTS (SELECT 1);

1,000,000 100%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE NOT EXISTS (SELECT 1);

0 0%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IS NOT NULL;

1,000,000 100%
SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IS NULL;

0 0%
ALTER TABLE dbo.FactInternetSales

ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IS NOT NULL;

968,377 96.8377%
ALTER TABLE dbo.FactInternetSales

ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber

FROM dbo.FactInternetSales

WHERE ProductKey IS NULL;

31,622.8 3.16228%

There are numerous other scenarios I could have put in this table, but you probably get the point by now that the search conditions help define the which “guess” calculation is used in absence of statistics.  When there are additional details beyond table cardinality, such as the nullability of a column or constraints, SQL Server can often leverage it accordingly.

4 thoughts on “Selectivity Guesses in absence of Statistics

  1. This table could be used to construct "cardinality hints" by introducing useless predicates which alter misestimated cardinality. A nasty hack, but it seems nicer that join hints which cause side-effects like forcing join order.

  2. Thanks for writing this Joe. Does it mean that in a table which is a heap and with no -nonclustered index, the actual row-counts guessed by the optimizer while compiling a plan will always be wrong?

    What about a heap with some non-clustered index on some columns?

    Thanks
    Chandan

  3. Hi Chandan,

    If auto-create / auto-update statistics are enabled (which most often they should be) – the guesses don’t come in to play (for this scenario at least). Basically the guesses come into play when there are no stats involved (whether stats objects or stats associated with an index).

    Cheers

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.