Row and batch execution modes and columnstore indexes

This post covers examples from ~~Denali~~ SQL Server 2012 CTP3, version 11.0.1440.

In my last post I hinted towards some interesting findings regarding parallelism and columnstore indexes. I’ll talk about what I observed in this post.

Just as a quick level-set, columnstore indexing provides two new areas of functionality within SQL Server 2012 that can potentially improve query performance for typical relational data warehouse workloads. The features contributing to this potential performance gain include the Vertipaq columnstore technology (which uses compression and stores the data by column instead of row) and the new method of query processing that processes blocks of column data in batches. You can benefit from from the columnstore storage alone – but you may also see further query performance improvements if the query runs in "batch" execution mode versus "row".

One key point to underscore– just because you add a columnstore index to a table doesn’t mean that your query will use “batch” execution mode.

Let’s walk through an example of row vs. batch using the AdventureWorksDWDenali database. The test server used in this illustration has four visible schedulers (not including the DAC) and 8GB of RAM.

Also – in order to increase the cost of the queries against this table and make it worth the columnstore index's time, I put together the below query to bump up the row count in FactInternetSales from 60,398 rows to 483,184 rows (SalesOrderNumber was part of the primary key, so I populated it with NEWID() and kept the other column values static from the existing rows and then used "GO 3" to populate it a few times):

INSERT dbo.FactInternetSales
(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate)

SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, LEFT(CAST(NEWID() AS NVARCHAR(36)),20), SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount,
TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM dbo.FactInternetSales
GO 3

The following query is executed against a table with no columnstore index (I’ve also enabled the actual Execution Plan to show afterwards – and also execute the query twice in order to measure results with the data in cache):

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT c.CommuteDistance,
       d.CalendarYear,
       SUM(f.SalesAmount) TotalSalesByCommuteDistance
FROM dbo.FactInternetSales as f
INNER JOIN dbo.DimCustomer as c ON
           f.CustomerKey = c.CustomerKey
INNER JOIN dbo.DimDate d ON
           d.DateKey = f.OrderDateKey
GROUP BY c.CommuteDistance,
         d.CalendarYear

The results for STATISTICS IO were as follows:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 1, logical reads 20289, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimCustomer'. Scan count 1, logical reads 979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the results of the SET STATISTICS TIME ON were as follows:

SQL Server Execution Times:

CPU time = 562 ms, elapsed time = 625 ms.

As for the execution plan, if we hover over the Clustered Index Scan for FactInternetSales you’ll see the “Actual Execution Mode”.

So we see a value of “Row” (the execution mode we’re used to) and “Actual Number of Batches” of 0.

Now let’s create a columnstore index:

CREATE NONCLUSTERED COLUMNSTORE INDEX [CSI_FactInternetSales] ON [dbo].[FactInternetSales]
(
[OrderDateKey],
[CustomerKey],
[SalesAmount]
)

Re-executing the query after creating the index, I see the following STATISTICS IO output:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 1, logical reads 597, physical reads 0, read-ahead reads 1167, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimCustomer'. Scan count 1, logical reads 979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That’s a reduction in logical reads against FactInternetSales from 20289 down to 597. Very likeable.

What about STATISTICS TIME?

SQL Server Execution Times:

CPU time = 453 ms, elapsed time = 485 ms.

Okay so CPU time was 562 ms now it is 453 ms. Elapsed time was 625 ms and now it is 485 ms.

What about the execution plan, were we using batch execution mode?

No. Still using “Row” with a columnstore index scan. So things brings me to what I observed a few days ago.

One thing I didn’t mention is that in my test environment – my max degree of parallelism was set to “1”.

What happens if I uncap the max degree of parallelism for my 4-scheduler server and re-execute the query?

After lifting the cap on max degree of parallelism, the query executes using a parallel plan and also switches from “row” to “batch” mode. Notice also the “Actual Number of Batches” – which shows a value of 1,482.

This is all nice – but what about the query performance and I/O. Any further reductions?

The I/O results are as follows:

Table 'DimCustomer'. Scan count 5, logical reads 1072, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'DimDate'. Scan count 3, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactInternetSales'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What’s this? Zero for all stats related to FactInternetSales? I even dropped clean buffers to double-check, and it still displayed all zeros. (More on this for another day.)

What about the execution statistics?

SQL Server Execution Times:

CPU time = 78 ms, elapsed time = 114 ms.

Now we're talking big improvements. CPU time was 562 ms without columnstore, then 453 ms with columnstore + row processing and then 78 ms for columnstore + batch processing. Elapsed time was 625 ms without columnstore and then 485 ms for columnstore + row processing and then 114 ms for columnstore + batch processing. So more non-trivial reductions.

So a few key findings in closing:

· Columnstore + row processing can provide a performance boost (getting column based storage and compression)

· Columnstore + batch processing can provide additional performance improvements on top of the columnstore index I/O benefits

· When I capped parallelism, I didn’t see batch processing – so check your execution plan and be aware that there are other factors that impact whether batch processing vs. row processing as well

5 thoughts on “Row and batch execution modes and columnstore indexes”

Tim Felber says:

April 25, 2013 at 8:56 am

I’ve deployed column store indexes on some large fact tables in our data warehouse and have hit an issue with Row vs Batch that I can’t understand. Our fact table has 911,073,320 rows, contains store level sales information for magazines, and is partitioned by the agency distributing magazines to those stores. When I run a query like the following

select f.magnet_dealer_number, i.Issue_Code, sum(f.net_draw) AS Draw, sum(f.net_sale) AS Sale
from WH_OR.Fact_StoreIssue_Detail f
INNER JOIN WH_OR.DIM_BIPAD b on f.bipad_id = b.BipadID
Inner join WH_OR.Dim_Issue i on f.Issue_id = i.Issue_ID
where f.agency_id = 2004 and b.BipadTitle = ‘ADMIN MAGAZINE (86640)’
and i.Issue_Code in (‘2013-02′,’2013-03′,’2013-04’)
group by f.magnet_dealer_number, i.Issue_Code

It uses the column store indexes in Row Processing mode and pretty much times out in our reporting tool (5 minutes +). If I remove the filter on Agency_ID (our partition key) the query kicks into Batch processing mode and completes around 1 second.

Are there any things you can think of that would trigger this. First thought was maybe row count in the partition but 2004 contains 105,516,664 (our largest partition). This still seems like it would be large enough to want batch processing.

Other info
Max DOP for the server is set to 0 with the server recognizing 32 procs. Threshold cost is set to 5.
Clusted index for each partition is created with Max DOP of 1.

Appreciate any feed back.

Tim Felber
tfelber@MagNetData.net
1. Joseph Sack says:
  
  April 25, 2013 at 9:36 am
  
  Hi Tim,
  
  The predicate you’ve narrowed it down to (Agency_Id = 2004), shouldn’t inhibit batch-execution mode directly (per the various things we know can indeed block batch execution mode) – so that makes me wonder about other options… There is a known issue with IN and NOT IN with semi-joins – but you’ve already narrowed it down to an unrelated predicate.
  
  A few things I’d be curious about:
  – Estimated total subtree cost of the plans in comparison – knowing how they differ between the batch vs. row plan
  – Estimated vs. actual number of rows – are there big skews in the plan that runs in row-mode? And does the EstimatedExecuteMode = batch and ActualExecutionMode = row? I wonder if there is a switch from batch to row due to a spill?
  – Do you see any batch_hash_table_build_bailout extended events during the row execution?
  Cheers,
  Joe
  1. Tim Felber says:
    
    April 26, 2013 at 8:18 am
    
    Your Question “Estimated vs. actual number of rows – are there big skews in the plan that runs in row-mode?” seems to be where the problem is. For the Row Execution, it estimated the number of rows to be 1, where the Actual Number of Rows was 105,516,664. On Batch Execution method it estimated 911,073,000, and the Actual Number of Rows was 1038 with 299 Batches.
    
    I ran the query and watched for batch_hash_table_build_bailout and it did not seem to occur during execution.
    
    Last night I decided to rebuild the fact table using a different clustered index key structure to see what difference that would make(maybe get a Clustered Index Seek Query Plan). This however, actually corrected my problem where I am now getting Batch Execution on the Column Store Index when filtering by Agency_ID. The old Clustered Index was based on the field OffSaleDateID, DealerID, AgencyID with the partition on AgencyID. The New Clustered Index is based on IssueID, DealerID, AgencyID and still partitioned by AgencyID. I am guessing maybe the Statistics based on the updated index call for the different query plan, but I don’t understand why the order the table is stored would affect the Column Store Index and it usage other then partitions which in this case didn’t change.
    
    Thanks for your Reply and helping me make sense of this.
    
    Tim
    1. Joseph Sack says:
      
      April 26, 2013 at 9:34 am
      
      Excellent – thanks for the update Tim!

Comments are closed.

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Row and batch execution modes and columnstore indexes

5 thoughts on “Row and batch execution modes and columnstore indexes”

Other articles

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 2″

New Course: “SQL Server: Common Query Tuning Problems and Solutions – Part 1”

SQLIntersection Post-Conference Session

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

Finding Timeout-Prone Code with max_elapsed_time

Deck from “Practical SQL Server Cardinality Estimation”

Imagine feeling confident enough to handle whatever your database throws at you.