Warning: array_merge(): Expected parameter 1 to be an array, null given in D:\Wordpress\blogs\paul\wp-includes\blocks\shortcode.php on line 31 Warning: array_merge(): Expected parameter 1 to be an array, null given in D:\Wordpress\blogs\paul\wp-includes\blocks\social-link.php on line 45 Which index will SQL Server use to count all rows - Paul S. Randal

Which index will SQL Server use to count all rows

This is a question that came up yesterday on Twitter: will SQL Server always do a table scan to find the result of SELECT COUNT (*) FROM mytable?

The answer is no. The query processor will use the index with the smallest number of pages – i.e. with the least I/O cost.

Let me quickly show you. First I'll create a simple table with no indexes.

CREATE TABLE CTest (c1 INT IDENTITY, c2 BIGINT DEFAULT 1, c3 CHAR (1000) DEFAULT 'a');
GO
SET NOCOUNT ON;
GO
INSERT INTO CTest DEFAULT VALUES;
GO 10000

Don't forget to unclick the Include Actual Query Plan button

 

before running the 10000 inserts otherwise it will take forever (generating 10000 graphical plans) and SSMS will barf with:

The query has exceeded the maximum number of result sets that can be displayed in the Execution Plan pane. Only the first 250 result sets are displayed in the Execution Plan pane.

Now if we do the SELECT COUNT (*), the plan is:

 

The query processor has no choice but to do a table scan.

Now I'll add a nonclustered index that will have less pages that the table itself:

CREATE NONCLUSTERED INDEX CTest_1 ON CTest (c2);
GO

And the select plan is now:

 

Notice that the Table Scan operator has changed to an Index Scan operator on the new CTest_1 index. This is because the nonclustered index has fewer pages than the table and so the I/O cost is lower.

Now I'll create an index that's even smaller, over the integer column:

CREATE NONCLUSTERED INDEX CTest_2 ON CTest (c1);
GO

And the plan should change to use the smallest index again:

 

And it does, as I expected.

Let's look at the relative page counts for each index and the table:

SELECT [index_id], [page_count]
FROM sys.dm_db_index_physical_stats (DB_ID (), OBJECT_ID ('CTest'), NULL, NULL, 'LIMITED');
GO

index_id    page_count
———– ——————–
0           1436
2           28
3           19

Any time the query processor is working out what plan to use, one of the key factors in plan choice is the I/O cost.

Hope this helps!

13 thoughts on “Which index will SQL Server use to count all rows

  1. Thanks Paul for this excellent blog.
    just want to add you can get addtional information about IO with
    set statistics io on

    Thanx.
    Vinay

  2. Actually, just to add to this – SQL Server will do a leaf level scan of the smallest nonclustered index for count(*) as well as count(not-nullable column). So, if you’re particularly interested in a count – having a small nc to scan is very important. In fact, a small/interesting trick is to create a nonclustered index on your clustering key. That’s the smallest index that could exist (but, it has VERY limited uses).

    Cheers,
    kt

  3. Thanks Paul..I went through many articles about Table Index vs Index Scan , all articles have different thoughts , can you pls explain , how both work internally to get the data.

  4. Thanks for writing that up 5 years ago Paul…came in helpful today. Was trying to figure out if SELECT COUNT(*) was worse than SELECT COUNT(pk_id) and if not, why? I could’ve sworn I heard you say once that SELECT COUNT(pk_id) was better…but apparently not. This post explains it. Tried Kimberly’s “trick” as well on a wide 24 million row table. Realized that the optimizer was already using a non-clustered index on another INT field. Once I added the non-clustered index on the pk field the optimizer used the new index, but with only a slight performance gain. Now I’m tempted to drop all the non-clustered indexes and see what happens to performance….. I did find that using/changing MAXDOP hint affected performance and that in this case of a 4 proc VM using MAXDOP(2) had better performance over MAXDOP(1) or no hint.

    1. Answer to an old comment: As far I know it is true for Oracle that you should prevent a SELECT COUNT(*) (for this reason we used usually COUNT(1) at my previous company).

      MS SQL Server interpretes the * (or a constant value as 1) and counts all row using the smallest index (even if you created a heap table with all rows allows NULLs and inserts only NULLs in every column).

      Using the pk_id in the COUNT() would be counterproductive, because it could prevent this behavior, when the pk_id allows NULLs (whyever).

      Another important remark:
      Of course SQL Server could not use filtered indexes for counting (even if the are much smaller than the unfiltered), except the SELECT COUNT(*) has exact the same WHERE condition.

      1. An interesting trick on big tables where counting rows is done often is to add a column to the table with BIT (default(0)) datatype and the create a nonclustered index over it.

        1. Maybe for heaps, but for clustered tables is this not usuable. Clustered key is always in index row, then you can add field or filds from clustered index to new not clustered index. :-)

          –creating table [BIGTABLE]
          CREATE TABLE [dbo].[BIGTABLE](
          [cl_ix_val] [int] IDENTITY(1,1) NOT NULL,
          [bit_val] [bit] NOT NULL default(0),
          [data] [varchar](max) NULL)
          go

          CREATE CLUSTERED INDEX [CL_IX_BIGTABLE] ON [dbo].[BIGTABLE]
          ( [cl_ix_val] ASC)
          go

          CREATE NONCLUSTERED INDEX [right_index] ON [dbo].[BIGTABLE]
          ( [cl_ix_val] ASC)
          go

          CREATE NONCLUSTERED INDEX [false_index] ON [dbo].[BIGTABLE]
          ( [bit_val] ASC)
          go

          –data prepare
          insert into bigtable (data) values (‘datadatadata…’)
          go

          insert into bigtable (data)
          select data from bigtable
          go 20

          — wow
          SELECT (select name from sysindexes x where indid=index_id and object_id=id) as name, index_type_desc, index_level,page_count ,avg_record_size_in_bytes
          FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N’dbo.bigtable’), NULL, 0 , ‘DETAILED’);

          — MSSQL server is clever
          set statistics io on
          SET SHOWPLAN_TEXT ON
          select count(*) from bigtable
          — :-)

  5. Hi
    I have count function giving different results when it uses a clustered index and when it uses a non-clustered index. “DW_Id” is Primary Key :

    select count(RECID),count(DW_Id),count(*) from MP_CORPEDWStage.dbo.AX_dbo_CUSTINVOICETRANS

    uses clustered index and gives

    ———– ———– ———–
    1082527 1082527 1082527

    When I remove count(RECID), it uses non-clustered index and gives different result.

    select count(DW_Id),count(*) from MP_CORPEDWStage.dbo.AX_dbo_CUSTINVOICETRANS

    ———– ———–
    1082519 1082519

    I am not sure what I am missing. How do I know which count is correct.
    I tried rebuilding both the indexes, dbcc updateusage(database,table) etc but still nothing changes….

    1. Sounds like the nonclustered index has a filter on it or is potentially missing some rows. Does a DBCC CHECKTABLE come back clean? What does sys.partitions list for each of the indexes?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.