In this blog post we\u2019ll explore columnstore index batch execution mode limits.\u00a0 I\u2019m using SQL Server 2012 Developer Edition, version 11.0.3349 (SP1, CU3).<\/p>\n
To start off with, I\u2019ll create a table that uses six different supported nonclustered columnstore index data types:<\/p>\n
\r\nUSE [CS];\r\nGO\r\n\r\nCREATE TABLE dbo.[FactContrived]\r\n(\r\ncol00 TINYINT NOT NULL DEFAULT 255,\r\ncol01 SMALLINT NOT NULL DEFAULT 32767,\r\ncol02 INT NOT NULL DEFAULT 2147483647,\r\ncol03 BIGINT NOT NULL DEFAULT 9223372036854775807,\r\ncol04 SMALLMONEY NOT NULL DEFAULT 214748.3647,\r\ncol05 MONEY NOT NULL DEFAULT 922337203685477.5807);\r\nGO\r\n<\/pre>\nI\u2019ll populate this table with 1,048,576 rows, all using the same value for each row (contrived by-design):<\/p>\n
\r\nSET NOCOUNT ON;\r\nGO\r\n\r\nINSERT dbo.[FactContrived]\r\nDEFAULT VALUES;\r\nGO\r\n\r\nINSERT dbo.[FactContrived]\r\nSELECT\u00a0 [col00],\r\n[col01],\r\n[col02],\r\n[col03],\r\n[col04],\r\n[col05]\r\nFROM dbo.[FactContrived];\r\nGO 20\r\n <\/pre>\nNext I\u2019ll create a nonclustered columnstore index on each column in the table:<\/p>\n
\r\nCREATE NONCLUSTERED COLUMNSTORE INDEX [NCI_FactContrived]\r\nON dbo.[FactContrived]\r\n([col00], [col01], [col02], [col03], [col04], [col05] )\r\nWITH (MAXDOP = 1);\r\nGO\r\n<\/pre>\nOnce created, I\u2019ll check the page count statistics with the following query:<\/p>\n
\r\nSELECT\u00a0 [partition_id],\r\nSUM([in_row_used_page_count]) AS [in_row_used_page_count],\r\nSUM([lob_used_page_count]) AS [lob_used_page_count],\r\nSUM([row_count]) AS [row_count]\r\nFROM\u00a0\u00a0\u00a0 sys.[dm_db_partition_stats]\r\nWHERE\u00a0\u00a0 [object_id] = OBJECT_ID('FactContrived') AND\r\n[index_id] = INDEXPROPERTY\r\n(OBJECT_ID('FactContrived'),\r\n'NCI_FactContrived',\r\n'IndexId')\r\nGROUP BY [partition_id];\r\nGO\r\n<\/pre>\nThis query returns the following results:<\/p>\n
<\/a><\/p>\n
So as expected, the nonclustered columnstore index is stored within LOB pages (1,050 total).<\/p>\n
Next I\u2019ll check the segment metadata for the specific partition_id:<\/p>\n
\r\nSELECT\r\n[column_id],\r\n[segment_id],\r\n[encoding_type],\r\n[row_count],\r\n[primary_dictionary_id],\r\n[secondary_dictionary_id],\r\n[min_data_id],\r\n[max_data_id],\r\n[on_disk_size]\r\nFROM\u00a0\u00a0\u00a0 sys.[column_store_segments]\r\nWHERE [partition_id] = 72057594039762944;\r\n<\/pre>\nThis returns the following 7 rows:<\/p>\n
<\/a><\/p>\n
Observations on this output:<\/p>\n
\n
- We have 7 columns instead of 6 that are defined for the index.\u00a0 Recall that there is no primary key or unique column for this table \u2013 and also look at column_id 7 and its associated on_disk_size and primary_dictionary_id of \u201c-1\u201d.<\/li>\n
- Our data was duplicated across each column when we populated it and notice the min_data_id and max_data_id is identical for each column as expected (except column_id 7).<\/li>\n<\/ul>\n
Next I\u2019ll check the dictionary metadata:<\/p>\n
\r\nSELECT\u00a0 [column_id],\r\n[dictionary_id],\r\n[type],\r\n[entry_count],\r\n[on_disk_size]\r\nFROM\u00a0\u00a0\u00a0 sys.column_store_dictionaries\r\nWHERE [partition_id] = 72057594039762944;\r\n<\/pre>\nThis returns the following 6 rows (no dictionary for our column_id 7):<\/p>\n
<\/a><\/p>\n
Notice each dictionary just has one entry (since each column has just one unique value in the table across the 1,048,576 rows).\u00a0 Also notice the very small on_disk_size is bytes \u2013 and the dictionary type of \u201c1\u201d which according to BOL is a \u201cHash dictionary containing int values.\u201d<\/p>\n
Now I\u2019ll execute a query that leverages the columnstore index using batch execution mode:<\/p>\n
\r\nSELECT\u00a0\u00a0\u00a0 [col00],\r\nMAX([col00]) AS [MaxCol00]\r\nFROM [dbo].[FactContrived]\r\nGROUP BY [col00]\r\nOPTION (RECOMPILE);\r\nGO\r\n<\/pre>\nThe plan (via SQL Sentry Plan Explorer<\/a>) is as follows:<\/p>\n
<\/a><\/p>\n
The Plan Tree tab shows that batch execution mode was used for the Columnstore Index Scan and the Hash Match (Partial Aggregate) operators:<\/p>\n
<\/a><\/p>\n
As for thread distribution, while the plan had a degree of parallelism of 8 and and one branch with 8 reserved threads, only one thread has rows associated with it:<\/p>\n
<\/a><\/p>\n
Now regarding the actual number of batches, we see the following:<\/p>\n
<\/a><\/p>\n
So with 1,166 batches of rows across 1,048,576 rows, we\u2019re looking at an average batch size of 899.29 rows.<\/p>\n
Experimenting with referencing the other columns of this table, we\u2019ll see the same number of batches each time (I\u2019ll spare you the repeated results).<\/p>\n
Now when columnstore indexes were first discussed a year or so ago, you would see references to batch sizes of approximately 1,000 rows.\u00a0 For example, the SQL Server Columnstore Index FAQ talks about how a batch \u201ctypically represents about 1000 rows of data<\/a>.\u201d\u00a0 Then a few months after SQL Server 2012 I recall seeing references to it being approximately 900 rows.\u00a0 The results of this particular example show we\u2019re closer to that 900 row value.<\/p>\n
Now for contrast \u2013 I went ahead and dropped the nonclustered columnstore index, truncated the table, and repopulated it with random values instead of using one unique value per column.\u00a0 Below is a revisiting of the metadata for the new data distributions after adding back a nonclustered columnstore index\u2026<\/p>\n
Partition Stats<\/strong><\/p>\n
<\/a><\/p>\n
Notice we have more LOB pages for the random data vs. the uniform data \u2013 as expected.<\/p>\n
Segment Metadata<\/strong><\/p>\n
<\/a><\/p>\n
Notice the higher on_disk_size and the varying min\/max data ranges per segment.\u00a0 Also notice that with the random, more unique data, there only one of the columns (our tinyint column) has a primary_dictionary.<\/p>\n
Dictionary Metadata<\/strong><\/p>\n
<\/a><\/p>\n
And we see one dictionary entry for our tinyint column.<\/p>\n
Revisiting our original test query, the plan shape changes somewhat (additional Parallel Gather Streams), given that we now actually have 256 rows returned based on the tinyint column:<\/p>\n
<\/a><\/p>\n
As for row distribution across threads, we still see one thread handling the batch-mode operations and then a spread of the row-mode rows across the other operators:<\/p>\n
<\/a><\/p>\n
Now as for the number of batches, we see the following:<\/p>\n
<\/a><\/p>\n
So 1,280 batches over 1,048,576 rows.\u00a0 Averaging 819.2 <\/strong>rows per batch, vs. our previous test\u2019s 899.29 rows.<\/p>\n
There is more to explore on this subject \u2013 but that\u2019s all for today as I\u2019m about to jump on a plane to Chicago to help deliver IE2: Immersion Event on Performance Tuning<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"
In this blog post we\u2019ll explore columnstore index batch execution mode limits.\u00a0 I\u2019m using SQL Server 2012 Developer Edition, version 11.0.3349 (SP1, CU3). To start off with, I\u2019ll create a table that uses six different supported nonclustered columnstore index data types: USE [CS]; GO CREATE TABLE dbo.[FactContrived] ( col00 TINYINT NOT NULL DEFAULT 255, col01 […]<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[],"class_list":["post-998","post","type-post","status-publish","format-standard","hentry","category-columnstore-indexes"],"yoast_head":"\n
Exploring Columnstore Index Batch Sizes - Joe Sack<\/title>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\t\n\t\n\t\n