sqlskills-logo-2015-white.png

SEQUENCE Object versus Modulus Hash Partitioning

Last month I wrote a post called Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument and I was asked a good question by Eric Humphrey (twitter) about whether there was any advantage of using a sequence object for hash partitioning over an identity/modulus/computed-column technique. 

At that point I hadn’t compared the two techniques, but I wanted to perform a test eventually, so here is what I found…  This was a quick-and-dirty test, by the way, as I really should have been working on something else (first), but I needed a little “fun” time.

Before testing, I made a few minor modifications to the schema used in my original post:

1.      I split the range of the partition function as follows:

 

ALTER PARTITION FUNCTION pfFactInternetSales () SPLIT RANGE (0)

 

2.      I changed the sequence object to be zero-based instead of one-based:

 

CREATE SEQUENCE dbo.Seq_FactInternetSales

    AS int

               START WITH 0

               INCREMENT BY 1

    MINVALUE 0

    MAXVALUE 10

    CYCLE

    CACHE 10;

That is all that was changed with the sequence object technique I used before. As for a modulus technique, I created two additional columns to another version of the FactInternetSales table:

ProductID int NOT NULL IDENTITY(1,1),

               PartitionBucketKey AS ProductID %11  PERSISTED,

 

After adding the columns I referenced the PartitionBucketKey table as the partition key for the clustered index and imported the data from an INSERT…SELECT (as I did with the sequence object method).

 

So did the performance differ between the two techniques?

 

Here are the results of eight separate tests (each executed with a cold data cache for the SELECT, truncated destination table and pre-sized data and log files):

 

Test number

Rows loaded

Buckets

Hash Partition

Method

1

1,207,960 rows

11 buckets

26 seconds

Sequence

2

1,207,960 rows

11 buckets

26 seconds

Sequence

3

1,207,960 rows

11 buckets

29 seconds

Sequence

4

1,207,960 rows

11 buckets

26 seconds

Sequence

5

1,207,960 rows

11 buckets

27 seconds

Modulus

6

1,207,960 rows

11 buckets

24 seconds

Modulus

7

1,207,960 rows

11 buckets

25 seconds

Modulus

8

1,207,960 rows

11 buckets

26 seconds

Modulus

So the modulus method seems to perform ever-so-slightly faster than the sequence method.  Stylistically, the sequence method involved a more direct way to create X number of hash buckets (basically no calculation necessary).  With that said, the execution plans are nearly identical.

Sequence Plan (estimated cost – 17.4357)

clip_image002

Modulus Plan (estimated cost – 17.635)

clip_image004

I’m sure there are other variations of this test that I could try out, but that’s all for now. I’m curious about other techniques people may be using (like a CLR function to do the hashing), so please post them here if you have used an interesting variation on this post’s theme.

2 thoughts on “SEQUENCE Object versus Modulus Hash Partitioning

  1. Interesting.. I wonder how that compares to simple row_number().

    Try increasing the cache size to 11 on the sequence and throw on a (tablockx) on the insert !

  2. Thanks Joe. So the upshot is we can do this in versions prior to 2012. The downside is the computed column, since it is persisted, takes up additional space.

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.