This post covers examples from Denali  SQL Server 2012 CTP3, version 11.0.1440.

In my last post I hinted towards some interesting findings regarding parallelism and columnstore indexes.  I’ll talk about what I observed in this post.

Just as a quick level-set, columnstore indexing provides two new areas of functionality within SQL Server 2012 that can potentially improve query performance for typical relational data warehouse workloads.  The features contributing to this potential performance gain include the Vertipaq columnstore technology (which uses compression and stores the data by column instead of row) and the new method of query processing that processes blocks of column data in batches.  You can benefit from from the columnstore storage alone  – but you may also see further query performance improvements if the query runs in "batch" execution mode versus "row".  

One key point to underscore– just because you add a columnstore index to a table doesn’t mean that your query will use “batch” execution mode. 

Let’s walk through an example of row vs. batch using the AdventureWorksDWDenali database.  The test server used in this illustration has four visible schedulers (not including the DAC) and 8GB of RAM.

Also – in order to increase the cost of the queries against this table and make it worth the columnstore index's time, I put together the below query to bump up the row count in FactInternetSales from 60,398 rows to 483,184 rows (SalesOrderNumber was part of the primary key, so I populated it with NEWID() and kept the other column values static from the existing rows and then used "GO 3" to populate it a few times):

INSERT dbo.FactInternetSales
(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
)

SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, LEFT(CAST(NEWID() AS NVARCHAR(36)),20), SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount,
TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate,
ShipDate
FROM
dbo.FactInternetSales
GO 3


 

The following query is executed against a table with no columnstore index (I’ve also enabled the actual Execution Plan to show afterwards – and also execute the query twice in order to measure results with the data in cache):

SET STATISTICS IO ON
SET STATISTICS
TIME
ON

SELECT c.CommuteDistance
       d.CalendarYear
,
       SUM(f.SalesAmount)
TotalSalesByCommuteDistance
FROM
dbo.FactInternetSales as f
INNER JOIN dbo.DimCustomer as c
ON
           f.CustomerKey =
c.CustomerKey
INNER JOIN dbo.DimDate d
ON
           d.DateKey =
f.OrderDateKey
GROUP BY c.CommuteDistance
,
         d.CalendarYear 

The results for STATISTICS IO were as follows:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 1, logical reads 20289, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimCustomer'. Scan count 1, logical reads 979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the results of the SET STATISTICS TIME ON were as follows:

SQL Server Execution Times:

   CPU time = 562 ms,  elapsed time = 625 ms.

 

As for the execution plan, if we hover over the Clustered Index Scan for FactInternetSales you’ll see the “Actual Execution Mode”.

clip image0024 thumb Row and batch execution modes and columnstore indexes

So we see a value of “Row” (the execution mode we’re used to) and “Actual Number of Batches” of 0.

 

Now let’s create a columnstore index:

 

CREATE NONCLUSTERED COLUMNSTORE INDEX [CSI_FactInternetSales] ON [dbo].[FactInternetSales]
(
[OrderDateKey]
,
[CustomerKey]
,
[SalesAmount]
)

GO

Re-executing the query after creating the index, I see the following STATISTICS IO output:

 

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 1, logical reads 597, physical reads 0, read-ahead reads 1167, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimCustomer'. Scan count 1, logical reads 979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

That’s a reduction in logical reads against FactInternetSales from 20289 down to 597.  Very likeable.

What about STATISTICS TIME?

 

SQL Server Execution Times:

   CPU time = 453 ms,  elapsed time = 485 ms.

 

Okay so CPU time was 562 ms now it is 453 ms.  Elapsed time was 625 ms and now it is 485 ms. 

 

What about the execution plan, were we using batch execution mode?

 

clip image0044 thumb Row and batch execution modes and columnstore indexes

No.  Still using “Row” with a columnstore index scan.   So things brings me to what I observed a few days ago. 

One thing I didn’t mention is that in my test environment -  my max degree of parallelism was set to “1”. 

 

What happens if I uncap the max degree of parallelism for my 4-scheduler server and re-execute the query?

clip image0054 thumb Row and batch execution modes and columnstore indexes

 

After lifting the cap on max degree of parallelism, the query executes using a parallel plan and also switches from “row” to “batch” mode.  Notice also the “Actual Number of Batches” – which shows a value of 1,482.

 

This is all nice – but what about the query performance and I/O.  Any further reductions?

 

The I/O results are as follows:

Table 'DimCustomer'. Scan count 5, logical reads 1072, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 3, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What’s this? Zero for all stats related to FactInternetSales?  I even dropped clean buffers to double-check, and it still displayed all zeros.  (More on this for another day.)

 

What about the execution statistics?

 

SQL Server Execution Times:

 

                 CPU time = 78 ms,  elapsed time = 114 ms.

 

Now we're talking big improvements. CPU time was 562 ms without columnstore, then 453 ms with columnstore + row processing and then 78 ms for columnstore + batch processing.  Elapsed time was 625 ms without columnstore and then 485 ms for columnstore + row processing and then 114 ms for columnstore + batch processing.  So more non-trivial reductions.

 

So a few key findings in closing:

 

·        Columnstore + row processing can provide a performance boost (getting column based storage and compression)

·        Columnstore + batch processing can provide additional performance improvements on top of the columnstore index I/O benefits

·        When I capped parallelism, I didn’t see batch processing – so check your execution plan and be aware that there are other factors that impact whether batch processing vs. row processing as well