sqlskills-logo-2015-white.png

Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument

When I first heard about SQL Server 2012’s SEQUENCE object – I thought it was an interesting feature to be added and one that I have been asked about by customers in the past (from those who had worked on different database platforms).  But when I looked at the CYCLE argument of SEQUENCE, that’s when I really got interested.

I wondered if it could be used in the service of implementing hash partitioning (of sorts) – allowing me to evenly distribute rows across a set number of partitions based on a hash key.  In this scenario I want the distribution to be evenly spread out, but NOT partition based on other business keys (like a datetime column or other attribute that has business or application meaning).

So will a column with a sequence default also work as a partition key? 

I started off by creating a new table based on AdventureWorkDWDenali’s FactInternetSales table:

— Create demonstration Fact table, no constraints, indexes, keys
— Tested on version 11.0.1750 (SQL Server 2012 RC0)
USE [SequenceDemo];
GO
CREATE TABLE [dbo].[FactInternetSales](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NOT NULL,
[OrderQuantity] [smallint] NOT NULL,
[UnitPrice] [money] NOT NULL,
[ExtendedAmount] [money] NOT NULL,
[UnitPriceDiscountPct] [float] NOT NULL,
[DiscountAmount] [float] NOT NULL,
[ProductStandardCost] [money] NOT NULL,
[TotalProductCost] [money] NOT NULL,
[SalesAmount] [money] NOT NULL,
[TaxAmt] [money] NOT NULL,
[Freight] [money] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
[OrderDate] datetime NULL,
[DueDate] datetime NULL,
[ShipDate] datetime NULL
)ON [PRIMARY];
GO

Next I created the sequence object (increment by 1, with a min of 1, max of 10, caching of 10 at a time and a cycling of values):

CREATE SEQUENCE dbo.Seq_FactInternetSales
AS int
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 10
CYCLE
CACHE 10;

After that, I added a new column to the Fact table called PartitionBucketKey and associated it with the new sequence object:

ALTER TABLE [dbo].[FactInternetSales]
ADD PartitionBucketKey int DEFAULT
(NEXT VALUE FOR dbo.Seq_FactInternetSales);

Next, I created a partition function and scheme:

— Create a new partition function
CREATE PARTITION FUNCTION pfFactInternetSales(int)
AS RANGE LEFT FOR VALUES (1,2,3,4,5,6,7,8,9);
— Create a new partition scheme
— And yes, being lazy about the FGs, as I just want to see whether the
— individual partitions fan-out the way I want…
CREATE PARTITION SCHEME psFactInternetSales
AS PARTITION pfFactInternetSales
ALL TO ( [PRIMARY] );

Next up, I created a clustered index on the table referencing the PK columns used in the original version of this table but then referencing the PartitionBucketKey in partition scheme:

— Create it on the new column referencing the sequence
CREATE CLUSTERED INDEX IX_FactInternetSales
ON  dbo.FactInternetSales(SalesOrderNumber, SalesOrderLineNumber)
ON psFactInternetSales(PartitionBucketKey);

It’s show time.  Now I went ahead and populated 60,398 rows from the original table.  Not much for this test I realize, but this was just an initial proof-of-concept:

INSERT dbo.FactInternetSales
(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey,
PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber,
SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost,
TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber,
CustomerPONumber, OrderDate, DueDate, ShipDate)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey,
PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber,
SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost,
TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber,
CustomerPONumber, OrderDate, DueDate, ShipDate
FROM [AdventureWorksDWDenali].[dbo].[FactInternetSales];

Now I’ll check if the 60,398 rows were divided up evenly over the 10 partitions:

SELECT partition_number, row_count
FROM sys.dm_db_partition_stats
WHERE object_id = object_id(‘[dbo].[FactInternetSales]’);

clip_image001

It worked.  And if you look at the individual rows, you’ll see the cycle of sequence values were defined based on the PK composite key (SalesOrderNumber, SalesOrderLineNumber):

SELECT SalesOrderNumber, SalesOrderLineNumber, PartitionBucketKey
FROM [dbo].[FactInternetSales]
ORDER BY SalesOrderNumber, SalesOrderLineNumber;

clip_image003

Okay, so it works.  But is this a wise thing to do? 

I don’t know yet.  I have other questions about this technique and I’d like to do more testing on various scenarios.  But I do like the fact that I’m able to leverage a native engine feature in service of another native engine feature.  Time will tell if this is a viable pattern or a known anti-pattern.

4 thoughts on “Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument

  1. Can’t you do something similar just using an identity, the modulus operator, and a computed column? Does this gain you anything over that approach?

    Thanks,

  2. Good questions Eric! I haven’t tried that technique before. I’d be interested in the CPU overhead of that versus this technique (and also would like to compare execution plans).

    One thing I’m also wondering is if the "cache" option of the sequence will provide a benefit for large data loads. It would definitely be interesting to test several options. I was also curious to compare it to a UDF (T-SQL and CLR). We’ll see if I can get to that.

    If you have time to validate, let me know and share here if you can.

    Thanks!

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.