Back in November I kicked off a survey that had you run some code to get some details about your cluster keys, nonclustered indexes, and table size. I got results from more than 500 systems across the world, resulting in 97565 lines of data – thanks!

The purpose of the survey is to highlight one of the side-effects of not adhering to the general guidelines (i.e. there are exceptions to these) for choosing a clustered index key. It should be, if possible:

  1. Narrow
  2. Static
  3. Ever-increasing
  4. Unique

The survey and this post are intended to show how not adhering to Rule #1 can lead to performance problems.

Both Kimberly and I have explained in the past the architecture of nonclustered indexes – where every nonclustered index row has to have a link back to the matching heap or clustered index record. The link must be a unique value as it must definitively match a single record in the heap or clustered index. For nonclustered indexes on a table with a clustered index, this link is the cluster key (or keys) as these are guaranteed to be unique. Ah, you say, but what about when the clustered index is NOT defined as unique? That’s where Rule #4 comes in. For a non-unique clustered index, there will be a hidden 4-byte column (called the uniquifier) added when necessary as a tie-breaker when multiple clustered index records have the same key values. This increases the clustered index key size by 4 bytes (the uniquifier is an integer) when needed.

But I digress. The crux of the matter is that every nonclustered index record will include the cluster keys. The wider the cluster key size is (e.g. a few natural keys), the more overhead there is in each nonclustered index record, compared to using, for instance, an integer (4-byte) or bigint (8-byte) surrogate cluster key. This can mean you’ve got the potential for saving huge amounts of space by moving to smaller clustered index keys – as we’ll see from the data I collected.

The survey code I got you to run returned:

  1. Number of nonclustered indexes
  2. Number of cluster keys
  3. Total cluster key size
  4. Number of table rows
  5. Calculation of bytes used in all the nonclustered indexes to store the cluster keys in each row

I did not take into account filtered indexes in 2008, or variable length cluster key columns, as to be honest although these will make a difference, for the purposes of my discussion here (making you aware of the problem), they’re irrelevant. It also would have made the survey code much more complex for me to figure out :-)

Now let’s look at some of the results I received. To make things a little simpler, I discarded results from tables with less then ten thousand rows, and with clustered index key sizes less than 9. This dropped the number of data points from 97565 down to 22425.

The graphs below show the estimated amount off savings that could be had in GB from moving to an 8-byte bigint, plotted against the first four factors in the list above.

 

 byncindexes How cluster key size can lead to GBs of wasted space

 bycluskeys How cluster key size can lead to GBs of wasted space

 byclussize How cluster key size can lead to GBs of wasted space

 bytablerows How cluster key size can lead to GBs of wasted space

And here are the top 20 in terms of potential savings so you can see how the rough table schema:

NCIndexes  ClusterKeys  KeyWidth  TableRows      KeySpaceGB  SavingsGB
———  ———–  ——–  ————-  ———-  ———
6          4            72        891,751,171    358.8       352.1
6          3            16        3,189,075,035  285.1       261.4
1          5            45        4,479,327,954  187.7       154.4
6          4            72        453,251,463    182.4       179.0
4          3            16        2,766,814,206  164.9       144.3
4          2            89        371,745,035    123.3       120.5
2          4            774       76,337,053     110.1       109.5
2          4            774       76,331,676     110.0       109.5
2          4            774       75,924,837     109.5       108.9
2          4            774       75,533,539     108.9       108.3
5          4            72        318,217,628    106.7       104.3
7          1            60        269,590,810    105.5       103.4
22         3            13        389,203,725    103.7       100.8
22         3            13        329,772,049    87.8        85.4
2          2            509       90,311,271     85.6        85.0
17         1            510       9,334,362      75.4        75.3
22         3            13        267,380,864    71.2        69.2
2          7            172       219,929,560    70.5        68.8
22         3            13        261,967,851    69.8        67.8
6          5            31        395,800,250    68.6        65.6

Wow – that’s some pretty amazing stuff – and that doesn’t even account for the space taken up by page headers etc.

You might be thinking – why do I care? There are plenty of reasons:

  • If you can save tens or hundreds of GBs by changing the cluster key to something much smaller, that translates directly into a reduction in size of your backups and data file disk space requirements.
  • Smaller databases mean faster backups and restores.
  • Making the nonclustered indexes smaller means that index maintenance (from inserts/updates/deletes) and index fragmentation removal will be much faster and generate less transaction log.
  • Making the nonclustered indexes smaller means that consistency checking will be much faster – nonclustered index checking takes 30% of the CPU usage of DBCC CHECKDB.
  • Reducing the width of nonclustered index records means the density of records (number of records per nonclustered index page) increases dramatically, leading to faster index processing, more efficient buffer pool (i.e. memory) usage, and fewer I/Os as more of the indexes can fit in memory.
  • Anything you can do to reduce the amount of transaction log directly affects the performance of log backups, replication, database mirroring, and log shipping.

As you can see, there are many reasons to keep the cluster key as small as possible – all directly translating into performance improvements. For those of you that think that moving to a bigint may cause you to run out of possible keys, see this blog post where I debunk that – unless you’ve got 3 million years and 150 thousand petabytes to spare…

One thing I’m not doing in this post is advocating any particular key over any other (although bigint identity does fit all the criteria from the top of the post) – except to try to keep it as small as possible. Choosing a good cluster key entails understanding the data and workload as well as the performance considerations of key size that I’ve presented here. And in some very narrow cases, not having a cluster key at all is acceptable – which means there’s 8 bytes in each nonclustered index record (just to forestall those who may want to post a comment arguing against clustered indexes in general :-)

Changing the cluster key can be tricky – Kimberly blogged a set of steps to follow plus some code to help you on our SQL Server Magazine blog back in April 2010.

Later this week I’ll blog some code that will run through your databases and spit out table names that could have significant space savings from changing the cluster key.

I won’t be blogging or tweeting much in January as we’ll be in Indonesia diving, but I will be posting photos later in the month.

Enjoy!