sqlskills-logo-2015-white.png

Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

I gave a presentation on columnstore indexing last week and one question I received was regarding the performance of a (hypothetical) narrow, supporting nonclustered index versus a columnstore index.  We discussed how nonclustered indexes were still going to be more effective for specific types of queries (for example – singleton operations, smaller range scans, etc.).  But what about a situation where we’re aggregating data across the entire fact table? Was there any way that a narrow nonclustered index could measure up against a columnstore index?

The ability to contrast the performance is of course very much dependent on the query construction, table design, data distribution and other factors.  With that said, I thought I would test out a scenario as follows…

First of all, I used FactInternetSales – pumped up to 123,695,104 rows.  The query I used for comparing performance was as follows (with a few hint-modifications to force index and parallelism caps that I used later):

SELECT     p.ProductLine,

           SUM(f.SalesAmount) TotalSalesAmount

FROM [dbo].[FactInternetSales] f

INNER JOIN [dbo].[DimProduct] p ON

     f.ProductKey = p.ProductKey

GROUP BY p.ProductLine

ORDER BY p.ProductLine;

 

I created the following nonclustered indexes (index on joining keys – and then an INCLUDE on the SalesAmount which is being aggregated):

CREATE NONCLUSTERED INDEX [NCI_FactInternetSales_ProductKey]

ON [dbo].[FactInternetSales]

([ProductKey] ASC)

INCLUDE ( [SalesAmount]) ON [PRIMARY];

GO

 

CREATE NONCLUSTERED INDEX [NCI_DimProduct_ProductKey_ProductLine] ON [dbo].[DimProduct]

(    [ProductKey] ASC,

     [ProductLine] ASC

)ON [PRIMARY];

GO

 

I also created a columnstore index that covered all columns in the fact table.  To force my query to ignore that index for the comparison test, I added the following line:

OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

 

So how did the performance compare between a supporting nonclustered index versus the columnstore index? (By the way, I used a warm cache for both tests – executing twice):

Index Support

Elapsed Time

CPU Time

Columnstore

256 ms.

1045 ms.

Nonclustered index

34189 ms.

33322 ms

 

The columnstore index plan was as follows:

clip_image002

The nonclustered index plan was as follows:

clip_image004

One obvious difference is that the columnstore index execution leveraged parallelism.  The query also benefited from batch execution mode:

clip_image006

If I hobble the columnstore index by capping max degree of parallelism, the results (in comparison) are as follows (third row represents the test):

Intended Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

 

Looking at the plan, we see the columnstore index wasn’t used at all:

clip_image008

If I force use of the columnstore index with MAXDOP 1, I see:

Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

Columnstore index (MAXDOP 1) – forced columnstore index

29859 ms.

30107 ms.

Row

 

The plan shows the index is used, but it is row mode (due to capping parallelism):

clip_image010

clip_image012

Looping back to the inciting discussion – even with a supporting nonclustered index, this specific query which scanned all 123 million rows and returned a smaller result set with aggregated values performed significantly better when using a columnstore index in conjunction with batch execution mode.

3 thoughts on “Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index

  1. Very interesting post, Joe! I am just curious, did you do the normal DBCC DROPCLEANBUFFERS between query runs for your testing? Did you try using PAGE compression on the non-clustered indexes? Finally, what kind of hardware were you running these tests on? Probably a laptop, but you know me, I would like to know what CPU, how much RAM, what kind of storage it has.

  2. Thanks Glenn! Here is some additional information (and I’ll be trying to get a parallel plan for the NCI scenario so we can compare that performance as well):

    -) No exec of DBCC DROPCLEANBUFFERS (each test run I did execute twice, and then I used the second exec for the post)
    -) I didn’t try page compression – BUT – I think that is a good idea to test (thanks)
    -) Hardware is my laptop, Intel Core i7 2820 (4 cores)
    -) 16 GB RAM
    -) Samsung SSD PM810 2.5" 7

    More questions spawning from this post – so I’ll update here…

  3. Posting parallel NCI query (per Adam Twitter suggestion):

    SELECT ProductLine,
    SUM(SalesAmt)
    FROM (SELECT DISTINCT ProductLine, 0%ProductKey +
    ProductKey AS Pkey
    FROM [dbo].[DimProduct]) AS p
    OUTER APPLY
    (SELECT SUM(y.SalesAmount) SalesAmt
    FROM [dbo].[FactInternetSales] AS y
    WITH(INDEX ([NCI_FactInternetSales_ProductKey]))
    WHERE p.Pkey = y.ProductKey) AS x
    GROUP BY ProductLine
    OPTION (QUERYTRACEON 8649);

    Same results (had NULL row) but matched data set. 16 seconds duration using NCI and parallelism.

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.