Clarifying LEFT and RIGHT in the defintion of a PARTITION FUNCTION in SQL Server 2005

NOTE: Generally, I recommend RIGHT-based partition function so that you don't have to deal with datetime timetick issues at all. However, this post can really help you if you still want to use LEFT-based partition functions. Enjoy!!! 

When creating partitioned tables in SQL Server 2005, a partition function requires a LEFT or RIGHT designation. In general, I recommend that you choose to create a LEFT-based partition. However, choosing LEFT always creates some confusion because a left-based partition function requires upper boundaries. This results in a more complicated partition function definition (than a RIGHT partition function).

Background First
In SQL Server 2005, you can create truly Partitioned Objects (objects are inclusive to Tables and Indexes) and to create a partitioned table you must base that table on a Partition Scheme (PS) and the PS must be based on a Partition Function (PF). Now, if you're new to the concept of partitioning, you should seriously consider reading the whitepaper I wrote for MSDN which is posted here.

Regardless of your having read the whitepaper, I want to add a few basic principles and make sure that some of the key points are clear. Simply put, a PF is the logical definition of your boundary cases (or possibly better described as the intervals over which each partition covers). A PF must cover the entire domain from negative infinity to positive infinity. If a value doesn't have a place to go then you'd have a problem with the logic of your partition function. Having said that, when you define the PF you are ONLY stating values, not where they will physically will reside. Futhermore, the expression on which the data will be based has to be somewhat straightforward and consistent. In other words, simple. If the PF were extremely complex it could compromise the usage and performance of the Partitioned Object. Finally, the definition of the PF is really just a way to easily define these boundaries. As a result, you can choose to enter a specific value (like June 1, 2005) or a function like the following to create a PF over 4 months with each month being computed from current:

CREATE PARTITION FUNCTION TestFunctionValues(datetime)
AS
RANGE RIGHT FOR VALUES (
   CONVERT(char(4), year(dateadd(m, 1, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 1, getdate()))), 2) + '01'
    — next month (executed in May, 2005 yields June 2005)
   , CONVERT(char(4), year(dateadd(m, 2, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 2, getdate()))), 2) + '01'
    — executed in May, 2005 yields July 2005
   , CONVERT(char(4), year(dateadd(m, 3, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 3, getdate()))), 2) + '01'
    — executed in May, 2005 yields August 2005
   , CONVERT(char(4), year(dateadd(m, 4, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 4, getdate()))), 2) + '01'
    — executed in May, 2005 yields September 2005
)
GO

Since all of these complex expressions are evaluated to a constant at creation, you can immediately verify your boundary cases by accessing some catalog views. Here is the catalog view query which helps you see what your partition boundaries are:

SELECT PF.[name], RV.boundary_id, RV.[value]
FROM sys.partition_range_values AS RV
 JOIN sys.partition_functions AS PF
  ON RV.function_id = PF.function_id
WHERE PF.[name] = 'TestFunctionValues'

In the above partition function, you'll notice that in addition to the values, I also have a RIGHT designation. The choice of using RIGHT, helps to target whether or not the value is a member of the first partition or the second partition. In the case of RIGHT, this says that the value is a lower boundary of the second partition. In an example that involves something like "sales" where data rows are ever increasing and their usage is for more recent rather than older data you migth want to always have the most recent four months of data available (in the real world you'd probably want more than 4 months but I'm going to simplify to four so that we have fewer boundaries with which to work). So, if we want 4 partitions – one for June, July, August and September – we need to define the partitions in the form of intervals (using boundaries). The first boundary point will either reside in the first partition (as an upper boundary) or the second partition (as a lower boundary). For four partitions, there are numerous ways in which you could define the partition function such that the logical/physical placement of the data is THE SAME however, the behavior of the partition function during later SPLIT and MERGE operations is also impacted by the choice of LEFT/RIGHT so understanding how they work is critical in making your decision to use one or the other. More than anything, will need to respect the full domain from negative infinity to positive infinity.

The following partition function is one of the easiest and most logical to understand. We need four partitions so let's just enter our boundary points and be done.

CREATE PARTITION FUNCTION RightPartitionFunction1(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040601', — Jun 2004
   '20040701', — Jul 2004
   '20040801', — Aug 2004
   '20040901' — Sep 2004
)

However, remember that we have the full domain to cover. As a result, the above partition function will create 5 partitions where the partitions will be:
 Partition 1: all data less than 20040601
 Partition 2: all data greater then/equal to 20040601 and less than 20040701
 Partition 3: all data greater then/equal to 20040701 and less than 20040801
 Partition 4: all data greater then/equal to 20040801 and less than 20040901
 Partition 5: all data greater then/equal to 20040901

This really isn't too bad but it seems to waste the first partition because it's completely empty… so, you could remove the first boundary point and just supply the second, third and forth. In that case, the first partition would cover June (instead of being empty) and you could get down to 4 partitions with only 3 boundary points.

CREATE PARTITION FUNCTION RightPartitionFunction2(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040701', — Jul 2004
   '20040801', — Aug 2004
   '20040901' — Sep 2004
)

This will create 4 partitions where the partitions will be:
 Partition 1: all data less than 20040701
 Partition 2: all data greater then/equal to 20040701 and less than 20040801
 Partition 3: all data greater then/equal to 20040801 and less than 20040901
 Partition 4: all data greater then/equal to 20040901

OK, so this seems a bit more optimal. And by definition alone, I would agree with you. Where it becomes problematic is in the merge and split operations. When a partition function goes through a merge of a boundary point, that boundary point is essentially removed. This seems simple but also has an impact on the data which resides in that partition (the partition which includes that boundary point). So, if this partitioned table is NEVER going to be modified and you never need to plan for a merge or split, then you can choose whatever definition is easier for you to use. However, if you plan to do a merge or split, you need to understand the ramifications of the LEFT/RIGHT decision. If November 1 comes around and you want the data for October to be added to your partitioned table and you ONLY want to show the most four months, then you will need to add October and remove June. The process will essentially consist of the following:

  1. Create a place for new data (Oct) to reside
  2. Add this location to the partition scheme (more details on the scheme v. the function shortly)
  3. Split the last boundary to add the new month (Oct)
  4. Switch-in the staging table of October data so it becomes part of the partitioned table
  5. Create a staging table into which the old data (Jun) will go
  6. Switch-out the June data into the staging table
  7. Merge the boundary point to remove June from the table

More background first – Partition Schemes
You will want to have a place to put the new data (there are lots of things we could look at here… do you want to ALWAYS only use 4 filegroups and when you remove June's data do you want to reuse that space – or do you have a new location on which October can be placed). This is a whole different topic over which we should chat… but for simplicity let's just say that we want to use a new location. To tell the PF where this location is, you need to work with the physical definition of the partition function. OK, that might not be clear. So far, I've only told you about the PF and so far, I've only said that the PF described the logical boudaries. You may have been wondering how SQL Server maps these logical boundaries to the phyiscal files and the answer is through the Partition Scheme. The partition scheme (at initial creation) sets up the mapping of the boundaries to their phyiscal locations (through filegroups). For the first and second PFs created above the respective partition schemes would be:

CREATE PARTITION SCHEME [PartitionScheme1]
AS
PARTITION [RightPartitionFunction1] TO
  ( [Primary]    — should be empty?
  , [PartitionedTablesFG1]  — for June
  , [PartitionedTablesFG2]  — for July
  , [PartitionedTablesFG3]  — for Aug
  , [PartitionedTablesFG4]  — for Sept
  )
and/or

CREATE PARTITION SCHEME [PartitionScheme2]
AS
PARTITION [RightPartitionFunction2] TO
  ( [PartitionedTablesFG1]  — for June
  , [PartitionedTablesFG2]  — for July
  , [PartitionedTablesFG3]  — for Aug
  , [PartitionedTablesFG4]) — for Sept

In both cases, it seems as though these do the same thing with the exception that the first PF ends up with an extra partition which is essentially empty. In this case (in a RIGHT-based PF), the second is a better choice (coming up there will be a reason for why we create an empty partition).

Back to SPLIT and MERGE
OK, so back to the issue of spliting and merging. Our goal is to remove June and add October so that the "current" data shows the four months of July, Aug, Sept and Oct. If we want to create a location on which October can be placed, we need to add another filegroup to the scheme. The way we do this is that we first need to add the filegroup/file to the database and then we need to add this new filegroup to our partition scheme.

Step 1 – Create a place for new data (Oct) to reside
Create a new filegroup and new file.

ALTER DATABASE CreditPT
 ADD FILEGROUP PartitionedTablesFG5
GO

ALTER DATABASE CreditPT
 ADD FILE
  (NAME = N'CreditPTFG5File1',
  FILENAME = N'C:\SQLTemp\CreditPTFG5File1.ndf',
  SIZE = 30, FILEGROWTH = 10, MAXSIZE = 50)
  TO FILEGROUP [PartitionedTablesFG5]
GO

Step 2 – Add this location to the partition scheme
The way to do this is to add a "NEXT USED" filegroup. Syntax:

ALTER PARTITION SCHEME PartitionScheme2 NEXT USED [PartitionedTablesFG5]
GO

Once the partition scheme has a new filegroup to use (and you can only have one at a time – note that if you add a second it solely replaces the first and does not warn you that you already had a next used), then the next time a partition is needed (which will be on the next split) then this "next used" partition will be used.

Step 3 – Split the last boundary to add the new month (Oct)
Now that we have a filegroup AND we have told the partition scheme to use it for the next split operation, we can split the PS to include this 5th filegroup

ALTER PARTITION FUNCTION RightPartitionFunction2()
SPLIT RANGE ('20041001')
GO

Step 4 – Switch-in the staging table of October data so it becomes part of the partitioned table
This step really needs more to it. To switch in your staging table you need to already have a staging table. To make this an optimal operation, you MUST create your staging table on the same fielgroup on which the partition will reside. What you're trying to do is make the switch optimal by making it a metadata only operation. This step really consists of creating an identical table with identical indexes and creating it with a constraint that matches the partitions' definition (so that SQL Server can trust that it only includes data within the interval for that partition).

CREATE TABLE ChargesPTRightOct
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL
      CONSTRAINT ChargesPTRightOctChargeDtCK
       CHECK (Charge_dt >= '20041001'
         AND Charge_dt < '20041101'),
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG5]
GO

ALTER TABLE ChargesPTRightOct
ADD CONSTRAINT ChargesPTRightOctPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
—   ON Credit4MonthPFN (charge_dt)
GO

INSERT ChargesPTRightOct (member_no, provider_no, category_no
      , charge_dt, charge_amt
      , statement_no, charge_code)
 SELECT member_no, provider_no, category_no
   , dateadd(yy, 5, charge_dt), (charge_amt + charge_no)/10
   , statement_no, charge_code
 FROM CreditPT.dbo.Charge
 WHERE month(charge_dt) IN (10)
 ORDER BY charge_dt, charge_no
GO

ALTER TABLE ChargesPTRightOct
SWITCH TO ChargesPTRight PARTITION 5
GO

Step 5 – Create a staging table into which the old data (Jun) will go
Similarly, you will want to create a place for the June data to go – kind of like a “staging” table but more for the exit of the table.

CREATE TABLE ChargesPTRightJun
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL,
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG1]
GO

ALTER TABLE ChargesPTRightJun
ADD CONSTRAINT ChargesPTRightJunPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
—   ON Credit4MonthPFN (charge_dt)
GO

Step 6 – Switch-out the June data into the staging table
This must also be a metadata only step. You need to make sure that it's created on the same filegroup as where the data currently resides (see the ON [ParitionedTablesFG1] clause. This is the current location of this partition. As a result, the switch out will be very fast.

ALTER TABLE ChargesPTRight
SWITCH PARTITION 1
TO ChargesPTRightJun
GO

Step 7 – Merge the boundary point to remove June from the table
This last step is the whole reason for this blog entry… yes, we're getting there ;).

ALTER PARTITION FUNCTION RightPartitionFunction2()
MERGE RANGE ('20040701')
GO

When you merge this boundary point, you will remove the value AND the parition in which it resides. Since this boundary point resides in the second partition (PartitionedTablesFG2) then it's actually PartitionedTablesFG2 that gets removed. As a result ALL of the data that resides on it, must be moved to the filegroup that now covers this boundary (which is PartitionedTablesFG1 and which was just emptied). As a result, MERGE does NOT optimally merge with a RIGHT-based partition on the rolling range scenario.

And – as a result, I recommend LEFT-based partition functions… Since the whitepaper tends to focus on using LEFT-based parition functions, please review that for more details. You can access the “Paritioned Tables” whitepaper on MSDN, here.

Hope this helps and thanks for reading!
kt

7 thoughts on “Clarifying LEFT and RIGHT in the defintion of a PARTITION FUNCTION in SQL Server 2005

  1. Hey, superb article about the partitioning. I really enjoyed reading it.

    Maybe there’s something more you could write about: How does table partitioning work together with Data Warehouse Models where one usually keeps the history of business entities denormalized in one table. The partition column is required to be part of the primary key as far as I know. So, either have no PK on the table at all, have a composite PK including the partition column which would be some kind of timestamp? Or simply forget about table partitioning?

    Maybe you already have some best practice up your sleeve. ;)

    Cheers,
    Frank

  2. Hey there Frank – In general, for partitioning, I recommend that your Primary Key be your clustering key AND that your key be a composite key of the partitioning key (which is typically datetime or smalldatetime) + the unique row identifier (not necessarily a guid… and, better when it’s an identity col). This key meets my primary requirements for clustering that the key is unique (by adding the identity), narrow (it’s all relative and this is relatively narrow at 12 bytes), and static (the partitioning key should never change if you can avoid it… it certainly should not be a volatile value). The last criteria that I look for is ever-increasing as that helps to avoid fragmentation.

    If you want some more details, I blogged about this here: https://www.sqlskills.com/blogs/kimberly/PermaLink.aspx?guid=bdaee3f7-1e15-414b-b75f-a290db645159

    Thanks for reading (and thanks for the complements!),
    kt

  3. Hi Kimberly,
    thanks a lot for your response. I took my time to read your splendid article about clustered indexes (and also followed some links to other articles).
    So if I follow your advice, I’ll have a 4bit integer and an 8bit datetime column making up my table primary (clustered) key where the datetime column is also used for partitioning. Okay, that’s clear so far. But then I am left with the Guid identifying the business entity in the OLTP database. I cannot use this column for referential integrity unless I create a composite unique key (not index) on that guid column and the datetime column again. So I could have achieved this already with building the table primary key from the guid and datetime column (with all the maintenance consequences of course). However, I wonder what I would need referential integrity (foreign keys a.s.o.) for?
    But thanks very much for your help. I wish I could have attended your seminar at Trivadis these days. :)
    Cheers, Frank

  4. Your blog is the first thing I’ve found that seems to cover partitioning in enough detail that I’m actually beginning to understand it – and because of that, guess what, you get my questions!

    I’ve been asked to add partitions to a table (SQL Server 2005) that’s already partitioned.

    Background: The table is named Solicits. The PK is a combination of a unique ID and date YYYYMMDD.

    There is one partition named MailDate and that was blocked into month-long sets.
    20090101–20090131, 20090201–20090228,…… 20091101-20091130 (currently being populated)
    20091201- ….This is the last one currently defined.

    AND – the partition schemes where spread over 12 file groups:
    Sol2009_01, Sol2009_02,……Sol2009_12

    I’ve been asked to create additional 2010 file groups, Schemes and Functions – something like this:
    1. File groups Sol2010_01 through Sol2010_12
    2. ALTER PARTITION SCHEME mailDate NEXT USE [Sol2010_01]
    through:
    ALTER PARTITION SCHEME mailDate NEXT USE [Sol2010_12]

    3. ALTER PARTITION FUNCTION MailDate () SPLIT RANGE (20100101)
    through:
    ALTER PARTITION FUNCTION MailDate () SPLIT RANGE (20101201)

    Questions:
    1. If I first create all the file groups and then execute all the ALTER PARTITION SCHEME statements followed by the ALTER PARTITION FUNCTION statements, I don’t see how it can “tie” a specific PARTITION FUNCTION to specific PARTITION SCHEME. Correct?

    2. On the other hand, if I create one file group, then execute one ALTER PARTITION SCHEME and finally one ALTER PARTITION FUNCTION and then start over and create the next file group, next ALTER PARTITION SCHEME and next ALTER PARTITION FUNCTION, will everything tie appropriately?

    3. Next, I’m concerned how the NEXT USE is interpreted. If the last PARTITION SCHEME I create is for Dec., 2010 (file group Sol2010_12), wouldn’t it have the NEXT USE associated with it? What happens next month when Dec 2009 data starts arriving, would it be inserted in the file group previously defined for Sol2009_12 data or would it be directed to the NEXT USE which would be Sol2010_12 – which is the wrong space.

    4. Last question. When they created all these file_groups, they placed them on the same i: drive. I don’t see how there could be any performance gain with the file groups/SCHEMEs on the same drive. Performance wise, wouldn’t we expect the same if we create just one file group for 2010 and then divide it with the ALTER PARTITION FUNCTION SPLITS?

  5. After a couple more readings and a "Doh!" or two, I seem to have figured most of it out –

    Questions: –
    1. If I first create all the file groups and then execute all the ALTER PARTITION SCHEME statements followed by the ALTER PARTITION FUNCTION statements, …….

    >>> 1. I was [i]correct[/i] – actually I found out that I couldn’t even do what I was proposing. The file groups can be created prior to the ALTER PARTION SCHEME and ALTER PARTION FUNCTION statements but the APS and APF need to run in pairs. They are then mapped to the FG as named in each APS.

    2. On the other hand, if I create one file group, then execute one ALTER PARTITION SCHEME and finally one ALTER PARTITION FUNCTION ……..

    >>> 2. True, True. This is the step by step process I used.

    3. Next, I’m concerned how the NEXT USE is interpreted. If the last PARTITION SCHEME I create is for Dec., 2010 (file group Sol2010_12), wouldn’t it have the NEXT USE associated with it? ……

    >>> 3. This was an unfounded concern. I tested adding data and everything fell into the correct partition based on the date.

    4. Last question. When they created all these file_groups, they placed them on the same i: drive. I don’t see how there could be any performance gain with the file groups/SCHEMEs on the same drive. Performance wise, wouldn’t we expect the same if we create just one file group for 2010 and then divide it with the ALTER PARTITION FUNCTION SPLITS?

    >>> 4. I think the benefit of having multiple file groups is when splitting off old data for archiving. Having file groups on different physical devices would always be a plus but that’s another issue.

    Anyway, thanks for great write-up.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Wow! Wow! Wow! THANK YOU!

I announced my retirement from SQL/tech here and your comments on my blog, on LinkedIn, and on Facebook were overwhelming and humbling! I’m so touched

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.