Today I thought I’d try out some of the new, highly-touted columnstore improvements in SQL Server 2014. First off, to figure out what’s new I used three main sources: one is the “canon”, the whitepaper, “Enhancements to SQL Server Column Stores“, presented at Sigmod 2013. The second source is the SQL Server 2014 CTP1 Books Online page, that documents and illustrates the DDL to define and work with clustered (and nonclustered) columnstore indexes. The third source is Niko Neugebauer’s excellent blog post series and experiments on Clustered Columnstore Indexes, that starts here.
Rather than repeat what those folks have already implemented, documented, and experimented with respectively, I went off on a different tangent. I’d like to find out how the new clustered columnstore index affects query plans, and look for changes in queries between SQL Server 2012 and 2014 CTP1. I started by restoring AdventureWorks2014 to a SQL Server 2014 instance and a SQL Server 2012 SP1 instance. And taking the FactInternetSales table up to over a million (about 1.15 million) rows using a combination of select…into and insert…select.
On the SQL Server 2012 instance a made in copy of the table without a nonclustered columnstore index and one with a nonclustered columnstore index. The nonclustered columnstore index contained all of the columns in the table. SQL Server 2014 had a third copy of this table, one that used a clustered columnstore index.
With each table (named FactInternetSalesNew, FactInternetSalesNewNCI, and FactInternetSalesNewCCI so I could keep trace of them), I did the relevant variation of the following query:
select ProductAlternateKey, CalendarYear, SUM(SalesAmount) as SumSales
from FactInternetSalesNew fact
join DimDate on fact.OrderDateKey = DateKey
join DimProduct on DimProduct.ProductKey = fact.ProductKey
where CalendarYear BETWEEN 2006 AND 2007
and ProductAlternateKey LIKE ‘BK%’
group by ProductAlternateKey, CalendarYear;
The query plans were interesting is their differences and similarities.
1.The “traditional” row-based storage used a HashJoin with a Bitmap for one dimension (DimDate) and a MergeJoin to join in the Product dimension. This was the same in SQL Server 2012 SP1 and 2014. This is a different plan shape than any of the columnstore plans, with the dimensions being joined in a different order and the aggregation happening at a different point in the plan.
2.The SQL Server 2012 nonclustered columnstore index used BatchHashTableBuild iterators to build hash tables for each dimension, then HashJoins for the join with each dimension. The scan of the fact table, BatchHashTableBuilds and HashJoins, and a HashMatch (partial aggregation) iterator after the joins used Batch Mode; the rest of the iterators used Row mode.
3. The SQL Server 2014 nonclustered columnstore index did NOT use BatchHashTableBuild iterators, instead the join between the Fact table and DateDim used a HashMatch. However, the HashMatch iterator did contain the new (for 2014) property “BitmapCreator=True”. The HashMatch iterator to join in the Product dimension did use a HashMatch, but the BitmapCreator property was not present. All iterators EXCEPT the Exchange, DimensionTable scans, and the last HashMatch (aggregate) used Batch Mode.
4. The SQL Server 2014 clustered columnstore index plan was essentially the same as the nonclustered columnstore index plan.
For clarity’s sake, I’ll continue the series and see how the cardinality estimation works with columnstore indexes in the next post.
For expediency’s sake, the plans are posted as an attachment. I’ll hopefully get back to replace them with pictures soon.
So, to summarize:
1. The query plan for nonclustered columnstore index changes for the example query between SQL Server 2012 SP1 and 2014 CTP1. All experiments are performed on my 2-proc VM.
2. In SQL Server 2014 CTP1, the plan for nonclustered and clustered columnstore indexes are the same. The savings of the clustered columnstore index is that you don’t have to store your actual data twice.