I gave a presentation on columnstore indexing last week and one question I received was regarding the performance of a (hypothetical) narrow, supporting nonclustered index versus a columnstore index.  We discussed how nonclustered indexes were still going to be more effective for specific types of queries (for example – singleton operations, smaller range scans, etc.).  But what about a situation where we’re aggregating data across the entire fact table? Was there any way that a narrow nonclustered index could measure up against a columnstore index?

The ability to contrast the performance is of course very much dependent on the query construction, table design, data distribution and other factors.  With that said, I thought I would test out a scenario as follows…

First of all, I used FactInternetSales – pumped up to 123,695,104 rows.  The query I used for comparing performance was as follows (with a few hint-modifications to force index and parallelism caps that I used later):

SELECT     p.ProductLine,

           SUM(f.SalesAmount) TotalSalesAmount

FROM [dbo].[FactInternetSales] f

INNER JOIN [dbo].[DimProduct] p ON

     f.ProductKey = p.ProductKey

GROUP BY p.ProductLine

ORDER BY p.ProductLine;

 

I created the following nonclustered indexes (index on joining keys – and then an INCLUDE on the SalesAmount which is being aggregated):

CREATE NONCLUSTERED INDEX [NCI_FactInternetSales_ProductKey]

ON [dbo].[FactInternetSales]

([ProductKey] ASC)

INCLUDE ( [SalesAmount]) ON [PRIMARY];

GO

 

CREATE NONCLUSTERED INDEX [NCI_DimProduct_ProductKey_ProductLine] ON [dbo].[DimProduct]

(    [ProductKey] ASC,

     [ProductLine] ASC

)ON [PRIMARY];

GO

 

I also created a columnstore index that covered all columns in the fact table.  To force my query to ignore that index for the comparison test, I added the following line:

OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

 

So how did the performance compare between a supporting nonclustered index versus the columnstore index? (By the way, I used a warm cache for both tests – executing twice):

Index Support

Elapsed Time

CPU Time

Columnstore

256 ms.

1045 ms.

Nonclustered index

34189 ms.

33322 ms

 

The columnstore index plan was as follows:

clip image002 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

The nonclustered index plan was as follows:

clip image004 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

One obvious difference is that the columnstore index execution leveraged parallelism.  The query also benefited from batch execution mode:

clip image006 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

If I hobble the columnstore index by capping max degree of parallelism, the results (in comparison) are as follows (third row represents the test):

Intended Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

 

Looking at the plan, we see the columnstore index wasn’t used at all:

clip image008 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

If I force use of the columnstore index with MAXDOP 1, I see:

Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

Columnstore index (MAXDOP 1) – forced columnstore index

29859 ms.

30107 ms.

Row

 

The plan shows the index is used, but it is row mode (due to capping parallelism):

clip image010 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

clip image012 thumb Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

Looping back to the inciting discussion – even with a supporting nonclustered index, this specific query which scanned all 123 million rows and returned a smaller result set with aggregated values performed significantly better when using a columnstore index in conjunction with batch execution mode.