Last month I wrote a post called Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument, and Eric Humphrey (twitter) asked a good question: is there any advantage to using a sequence object for hash partitioning over an identity/modulus/computed-column technique?
At that point I hadn’t compared the two techniques, but I wanted to perform a test eventually, so here is what I found… This was a quick-and-dirty test, by the way, as I really should have been working on something else (first), but I needed a little “fun” time.
Before testing, I made a few minor modifications to the schema used in my original post (a sketch of the resulting zero-based layout follows the list):
1. I split the range of the partition function as follows:
ALTER PARTITION FUNCTION pfFactInternetSales () SPLIT RANGE (0)
2. I changed the sequence object to be zero-based instead of one-based:
CREATE SEQUENCE dbo.Seq_FactInternetSales
AS int
START WITH 0
INCREMENT BY 1
MINVALUE 0
MAXVALUE 10
CYCLE
CACHE 10;
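
For reference, here is a minimal sketch of what the zero-based layout might look like after the split. The partition function name comes from the original post; the partition scheme name, the RANGE LEFT boundaries, and the use of [PRIMARY] are assumptions for illustration, not the exact definitions used there.

-- Sketch only: pfFactInternetSales is from the original post; psFactInternetSales,
-- RANGE LEFT, and [PRIMARY] are assumptions to show the 11-bucket layout.
CREATE PARTITION FUNCTION pfFactInternetSales (int)
AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);  -- 11 partitions for bucket values 0 through 10

CREATE PARTITION SCHEME psFactInternetSales
AS PARTITION pfFactInternetSales
ALL TO ([PRIMARY]);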
That is all that was changed with the sequence object technique I used before. As for the modulus technique, I added two columns to another version of the FactInternetSales table:
ProductID int NOT NULL IDENTITY(1,1),
PartitionBucketKey AS ProductID % 11 PERSISTED,
After adding the columns, I referenced the PartitionBucketKey column as the partitioning key for the clustered index and imported the data with an INSERT…SELECT (as I did with the sequence object method).
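To make the modulus setup concrete, here is a minimal sketch of the table, the partitioned clustered index, and the load. The table name, the trimmed column list, the index name, and the psFactInternetSales scheme are assumptions for illustration; only ProductID, PartitionBucketKey, and the general INSERT…SELECT approach come from the post.

-- Sketch only: table/index/scheme names and the trimmed column list are assumptions.
CREATE TABLE dbo.FactInternetSales_Modulus
(
    ProductID          int NOT NULL IDENTITY(1,1),
    PartitionBucketKey AS ProductID % 11 PERSISTED,  -- hash bucket 0 through 10
    OrderDateKey       int NOT NULL,
    SalesAmount        money NOT NULL
    -- ...remaining FactInternetSales columns elided...
);

-- Clustered index keyed and partitioned on the computed bucket column
CREATE CLUSTERED INDEX CIX_FactInternetSales_Modulus
ON dbo.FactInternetSales_Modulus (PartitionBucketKey)
ON psFactInternetSales (PartitionBucketKey);

-- Load via INSERT...SELECT; ProductID is generated by IDENTITY and
-- PartitionBucketKey is computed, so neither appears in the column list
INSERT INTO dbo.FactInternetSales_Modulus (OrderDateKey, SalesAmount)
SELECT OrderDateKey, SalesAmount
FROM dbo.FactInternetSales;

The sequence version’s load would differ only in that the bucket value comes from NEXT VALUE FOR dbo.Seq_FactInternetSales in the INSERT…SELECT rather than from a computed column.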
So did the performance differ between the two techniques?
Here are the results of eight separate tests (each executed with a cold data cache for the SELECT, a truncated destination table, and pre-sized data and log files):
Test number | Rows loaded | Buckets | Duration (seconds) | Hash partition method
1 | 1,207,960 | 11 | 26 | Sequence
2 | 1,207,960 | 11 | 26 | Sequence
3 | 1,207,960 | 11 | 29 | Sequence
4 | 1,207,960 | 11 | 26 | Sequence
5 | 1,207,960 | 11 | 27 | Modulus
6 | 1,207,960 | 11 | 24 | Modulus
7 | 1,207,960 | 11 | 25 | Modulus
8 | 1,207,960 | 11 | 26 | Modulus
So the modulus method seems to perform ever-so-slightly faster than the sequence method. Stylistically, the sequence method is a more direct way to define the number of hash buckets (the bucket value cycles on its own, with no modulus calculation required). With that said, the execution plans are nearly identical.
Sequence Plan (estimated cost – 17.4357)
Modulus Plan (estimated cost – 17.635)
I’m sure there are other variations of this test that I could try out, but that’s all for now. I’m curious about other techniques people may be using (like a CLR function to do the hashing), so please post them here if you have used an interesting variation on this post’s theme.
2 Responses to SEQUENCE Object versus Modulus Hash Partitioning
Interesting… I wonder how that compares to a simple row_number().
Try increasing the cache size to 11 on the sequence and throw a (TABLOCKX) hint on the insert!
Thanks Joe. So the upshot is that we can do this in versions prior to 2012. The downside is that the computed column, since it is persisted, takes up additional space.