Last month I wrote a post called Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument and I was asked a good question by Eric Humphrey (twitter) about whether there was any advantage of using a sequence object for hash partitioning over an identity/modulus/computed-column technique.
At that point I hadn’t compared the two techniques, but I wanted to perform a test eventually, so here is what I found… This was a quick-and-dirty test, by the way, as I really should have been working on something else (first), but I needed a little “fun” time.
Before testing, I made a few minor modifications to the schema used in my original post:
1. I split the range of the partition function as follows:
ALTER PARTITION FUNCTION pfFactInternetSales () SPLIT RANGE (0)
2. I changed the sequence object to be zero-based instead of one-based:
CREATE SEQUENCE dbo.Seq_FactInternetSales
AS int
START WITH 0
INCREMENT BY 1
MINVALUE 0
MAXVALUE 10
CYCLE
CACHE 10;
That is all that was changed with the sequence object technique I used before. As for a modulus technique, I created two additional columns to another version of the FactInternetSales table:
ProductID int NOT NULL IDENTITY(1,1),
PartitionBucketKey AS ProductID %11 PERSISTED,
After adding the columns I referenced the PartitionBucketKey table as the partition key for the clustered index and imported the data from an INSERT…SELECT (as I did with the sequence object method).
So did the performance differ between the two techniques?
Here are the results of eight separate tests (each executed with a cold data cache for the SELECT, truncated destination table and pre-sized data and log files):
|
Test number
|
Rows loaded
|
Buckets
|
Hash Partition
|
Method
|
|
1
|
1,207,960 rows
|
11 buckets
|
26 seconds
|
Sequence
|
|
2
|
1,207,960 rows
|
11 buckets
|
26 seconds
|
Sequence
|
|
3
|
1,207,960 rows
|
11 buckets
|
29 seconds
|
Sequence
|
|
4
|
1,207,960 rows
|
11 buckets
|
26 seconds
|
Sequence
|
|
5
|
1,207,960 rows
|
11 buckets
|
27 seconds
|
Modulus
|
|
6
|
1,207,960 rows
|
11 buckets
|
24 seconds
|
Modulus
|
|
7
|
1,207,960 rows
|
11 buckets
|
25 seconds
|
Modulus
|
|
8
|
1,207,960 rows
|
11 buckets
|
26 seconds
|
Modulus
|
So the modulus method seems to perform ever-so-slightly faster than the sequence method. Stylistically, the sequence method involved a more direct way to create X number of hash buckets (basically no calculation necessary). With that said, the execution plans are nearly identical.
Sequence Plan (estimated cost – 17.4357)
Modulus Plan (estimated cost – 17.635)
I’m sure there are other variations of this test that I could try out, but that’s all for now. I’m curious about other techniques people may be using (like a CLR function to do the hashing), so please post them here if you have used an interesting variation on this post’s theme.