In this blog post we’ll explore columnstore index batch execution mode limits.  I’m using SQL Server 2012 Developer Edition, version 11.0.3349 (SP1, CU3).

To start off with, I’ll create a table that uses six different supported nonclustered columnstore index data types:


CREATE TABLE dbo.[FactContrived]
col02 INT NOT NULL DEFAULT 2147483647,
col03 BIGINT NOT NULL DEFAULT 9223372036854775807,
col05 MONEY NOT NULL DEFAULT 922337203685477.5807);

I’ll populate this table with 1,048,576 rows, all using the same value for each row (contrived by-design):


INSERT dbo.[FactContrived]

INSERT dbo.[FactContrived]
SELECT  [col00],
FROM dbo.[FactContrived];
GO 20

Next I’ll create a nonclustered columnstore index on each column in the table:

ON dbo.[FactContrived]
([col00], [col01], [col02], [col03], [col04], [col05] )

Once created, I’ll check the page count statistics with the following query:

SELECT  [partition_id],
SUM([in_row_used_page_count]) AS [in_row_used_page_count],
SUM([lob_used_page_count]) AS [lob_used_page_count],
SUM([row_count]) AS [row_count]
FROM    sys.[dm_db_partition_stats]
WHERE   [object_id] = OBJECT_ID('FactContrived') AND
[index_id] = INDEXPROPERTY
GROUP BY [partition_id];

This query returns the following results:

image thumb Exploring Columnstore Index Batch Sizes

So as expected, the nonclustered columnstore index is stored within LOB pages (1,050 total).

Next I’ll check the segment metadata for the specific partition_id:

FROM    sys.[column_store_segments]
WHERE [partition_id] = 72057594039762944;

This returns the following 7 rows:

image thumb1 Exploring Columnstore Index Batch Sizes

Observations on this output:

  • We have 7 columns instead of 6 that are defined for the index.  Recall that there is no primary key or unique column for this table – and also look at column_id 7 and its associated on_disk_size and primary_dictionary_id of “-1”.
  • Our data was duplicated across each column when we populated it and notice the min_data_id and max_data_id is identical for each column as expected (except column_id 7).

Next I’ll check the dictionary metadata:

SELECT  [column_id],
FROM    sys.column_store_dictionaries
WHERE [partition_id] = 72057594039762944;

This returns the following 6 rows (no dictionary for our column_id 7):

image thumb2 Exploring Columnstore Index Batch Sizes

Notice each dictionary just has one entry (since each column has just one unique value in the table across the 1,048,576 rows).  Also notice the very small on_disk_size is bytes – and the dictionary type of “1” which according to BOL is a “Hash dictionary containing int values.”

Now I’ll execute a query that leverages the columnstore index using batch execution mode:

SELECT    [col00],
MAX([col00]) AS [MaxCol00]
FROM [dbo].[FactContrived]
GROUP BY [col00]

The plan (via SQL Sentry Plan Explorer) is as follows:

image thumb3 Exploring Columnstore Index Batch Sizes

The Plan Tree tab shows that batch execution mode was used for the Columnstore Index Scan and the Hash Match (Partial Aggregate) operators:

image thumb4 Exploring Columnstore Index Batch Sizes

As for thread distribution, while the plan had a degree of parallelism of 8 and and one branch with 8 reserved threads, only one thread has rows associated with it:

image thumb5 Exploring Columnstore Index Batch Sizes

Now regarding the actual number of batches, we see the following:

image thumb6 Exploring Columnstore Index Batch Sizes

So with 1,166 batches of rows across 1,048,576 rows, we’re looking at an average batch size of 899.29 rows.

Experimenting with referencing the other columns of this table, we’ll see the same number of batches each time (I’ll spare you the repeated results).

Now when columnstore indexes were first discussed a year or so ago, you would see references to batch sizes of approximately 1,000 rows.  For example, the SQL Server Columnstore Index FAQ talks about how a batch “typically represents about 1000 rows of data.”  Then a few months after SQL Server 2012 I recall seeing references to it being approximately 900 rows.  The results of this particular example show we’re closer to that 900 row value.

Now for contrast – I went ahead and dropped the nonclustered columnstore index, truncated the table, and repopulated it with random values instead of using one unique value per column.  Below is a revisiting of the metadata for the new data distributions after adding back a nonclustered columnstore index…

Partition Stats

image thumb7 Exploring Columnstore Index Batch Sizes

Notice we have more LOB pages for the random data vs. the uniform data – as expected.

Segment Metadata

image thumb8 Exploring Columnstore Index Batch Sizes

Notice the higher on_disk_size and the varying min/max data ranges per segment.  Also notice that with the random, more unique data, there only one of the columns (our tinyint column) has a primary_dictionary.

Dictionary Metadata

image thumb9 Exploring Columnstore Index Batch Sizes

And we see one dictionary entry for our tinyint column.

Revisiting our original test query, the plan shape changes somewhat (additional Parallel Gather Streams), given that we now actually have 256 rows returned based on the tinyint column:

image thumb10 Exploring Columnstore Index Batch Sizes

As for row distribution across threads, we still see one thread handling the batch-mode operations and then a spread of the row-mode rows across the other operators:

image thumb11 Exploring Columnstore Index Batch Sizes

Now as for the number of batches, we see the following:

image thumb12 Exploring Columnstore Index Batch Sizes

So 1,280 batches over 1,048,576 rows.  Averaging 819.2 rows per batch, vs. our previous test’s 899.29 rows.

There is more to explore on this subject – but that’s all for today as I’m about to jump on a plane to Chicago to help deliver IE2: Immersion Event on Performance Tuning.