More considerations for the clustering key – the clustered index debate continues!

OK, I've talked about the clustering key many times. Here, I thought I'd bring together one final series of details (and links) to help you really understand why I'm so adamant about DESIGNING a clustering key and not just letting SQL Server pick it for you (for example when it defaults to making the primary key clustered). Just because SQL Server defaults to making the primary key clustered doesn't actually mean it's a GOOD clustering key!

The key things that I've always recommended about a GOOD clustering key are that it is: unique, narrow, static and ever-increasing. For more details on the reasons behind this, check out these posts:

And, today, there are two additional items that I want to add to this list: your clustering key should be non-nullable and fixed-width!

First, why non-nullable?

In a regular data row you will ALWAYS have a null block. This is 2 bytes for a column count (in that row) and 1 bit per column (to store the actual null values). If you want to get more details on the internals of a data row, see Paul's post: Inside the Storage Engine: Anatomy of a record. However, the btree of the clustered index and the nonclustered index leaf/non-leaf levels do NOT have to have this "null block" of information if the columns in the index do NOT allow nulls. While this may be only 3 or 4 bytes (depending on the number of columns), it's still 3 or 4 bytes that would have to be added to EVERY nonclustered index's leaf level (for every row). And, it just doesn't have to be there. Why waste space when you don't have to!
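As a rough sketch of that arithmetic, here is how the size of a null block could be worked out, assuming the layout described above (2 bytes for the column count plus 1 bit per column, rounded up to whole bytes). The variable name is only for illustration:

```sql
-- Sketch: size of the null block for a row with @column_count columns.
-- Assumes 2 bytes for the column count plus 1 bit per column,
-- with the bitmap rounded up to a whole number of bytes.
DECLARE @column_count int = 3;

-- Integer division after adding 7 rounds the bitmap up to full bytes.
SELECT 2 + (@column_count + 7) / 8 AS null_block_bytes;
```

With 3 columns this returns 3; with 9 through 16 columns it returns 4, which is where the "3 or 4 bytes" comes from.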

Second, and along the same lines, is the need to use fixed-width columns!

Just like in data rows, index rows only have a variable block (offsets + end of row) when there are variable-width columns. If a clustered index is chosen that has one or more variable-width columns then you are adding at least 4 bytes to every row that might not have otherwise needed to be there. In fact, often people forget that uniquifiers are stored in the variable block portion of the row and as a result, a uniquifier really needs 8 bytes (4 bytes for the int + 2 bytes in the variable block for the offset + 2 bytes for the end of row marker).
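The uniquifier overhead is just the sum of the three pieces listed above. A minimal sketch of that sum, with illustrative variable names and assuming the row has no other variable-width columns:

```sql
-- Sketch: bytes added to a row that carries a uniquifier and has no other
-- variable-width columns, using the breakdown from the paragraph above.
DECLARE @uniquifier_value int = 4,  -- the int itself
        @offset_entry     int = 2,  -- its offset entry in the variable block
        @end_of_row       int = 2;  -- the end-of-row marker

SELECT @uniquifier_value + @offset_entry + @end_of_row AS uniquifier_bytes;
```

This returns 8, which is the per-row difference to watch for in the CLTable2 maximums later in the post.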

To show you this, I've created four tables, each with the same three column data types (table 4 has col1 as a nullable int, and you can't do that with an identity, so I used INSERT/SELECT to copy the data over):

col1 int identity not null,
col2 datetime2(7) not null default sysdatetime(),
col3 datetime2(7) null default sysdatetime()

And, each of the tables has these four nonclustered indexes:

  • Non-unique nonclustered on col2
  • Unique nonclustered on col2
  • Non-unique nonclustered on col3
  • Unique nonclustered on col3
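For reference, here is a sketch of what one of these tables and its four nonclustered indexes might look like. The table name dbo.CLTableN and all of the index names are hypothetical stand-ins; only the column definitions come from the post, and the clustered index is deliberately left out here.

```sql
-- Sketch only: dbo.CLTableN and the index names are hypothetical.
-- The column definitions are the ones listed above.
CREATE TABLE dbo.CLTableN
(
    col1 int          IDENTITY NOT NULL,
    col2 datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    col3 datetime2(7) NULL     DEFAULT SYSDATETIME()
);

-- The four nonclustered indexes, in the order listed above.
CREATE NONCLUSTERED INDEX NCCol2              ON dbo.CLTableN (col2);
CREATE UNIQUE NONCLUSTERED INDEX NCUniqueCol2 ON dbo.CLTableN (col2);
CREATE NONCLUSTERED INDEX NCCol3              ON dbo.CLTableN (col3);
CREATE UNIQUE NONCLUSTERED INDEX NCUniqueCol3 ON dbo.CLTableN (col3);
```

Created in this order, the nonclustered indexes should end up as index_id 2 through 5, with index_id 1 reserved for the clustered index.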

The difference is solely in the definition of the clustering key and whether or not the values are unique (or nullable)!

CLTable1 is clustered and non-nullable but NOT defined as unique (the values are unique)

CLTable2 is clustered and non-nullable but there are duplicate values in the clustering key

CLTable3 is defined as a unique clustered index

CLTable4 is clustered, nullable and there are duplicate values in the clustering key
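The four clustering definitions could be sketched roughly as follows. The index names are hypothetical, and the post does not say how the duplicate values were loaded into CLTable2 and CLTable4, so the data loading is not shown.

```sql
-- Sketch only: index names are hypothetical; data loading is omitted.

-- CLTable1: clustered, NOT NULL, not declared unique (values happen to be unique).
CREATE TABLE dbo.CLTable1
(
    col1 int          IDENTITY NOT NULL,
    col2 datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    col3 datetime2(7) NULL     DEFAULT SYSDATETIME()
);
CREATE CLUSTERED INDEX CLTable1_CL ON dbo.CLTable1 (col1);

-- CLTable2: same definition, but the data contains duplicate col1 values.
CREATE TABLE dbo.CLTable2
(
    col1 int          IDENTITY NOT NULL,
    col2 datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    col3 datetime2(7) NULL     DEFAULT SYSDATETIME()
);
CREATE CLUSTERED INDEX CLTable2_CL ON dbo.CLTable2 (col1);

-- CLTable3: the clustered index is declared UNIQUE, so no uniquifier is ever needed.
CREATE TABLE dbo.CLTable3
(
    col1 int          IDENTITY NOT NULL,
    col2 datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    col3 datetime2(7) NULL     DEFAULT SYSDATETIME()
);
CREATE UNIQUE CLUSTERED INDEX CLTable3_CL ON dbo.CLTable3 (col1);

-- CLTable4: col1 is a nullable int (so no identity) and the data contains duplicates.
CREATE TABLE dbo.CLTable4
(
    col1 int          NULL,
    col2 datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    col3 datetime2(7) NULL     DEFAULT SYSDATETIME()
);
CREATE CLUSTERED INDEX CLTable4_CL ON dbo.CLTable4 (col1);
```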

The end result is the sizes of all of the keys and the wasted space from different keys!

For CLTable1, CLTable2 and CLTable3 – all of the minimums are the same. For CLTable1 and CLTable3 – the maximums are the same as the minimums. In CLTable3 it's because the records MUST be unique. In CLTable1 it's because the records just happen to be unique right now (meaning there's no overhead unless there are actual dupes).

index_id    min    max    avg
1           27     27     27
1           11     11     11
2           13     13     13
2           19     19     19
3           13     13     13
3           15     15     15
4           16     16     16
4           22     22     22
5           16     16     16
5           18     18     18
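The post doesn't show the query behind these numbers. One way to get per-level record sizes like these is sys.dm_db_index_physical_stats in DETAILED mode. This is a sketch, assuming a table named dbo.CLTable1 in the current database:

```sql
-- Sketch: per-level record sizes for every index on one table.
-- DETAILED mode returns one row per index level and populates the
-- record-size columns (they are NULL in LIMITED mode).
SELECT  ips.index_id,
        ips.index_level,                        -- 0 = leaf level
        ips.min_record_size_in_bytes AS [min],
        ips.max_record_size_in_bytes AS [max],
        ips.avg_record_size_in_bytes AS [avg]
FROM    sys.dm_db_index_physical_stats
            (DB_ID(), OBJECT_ID(N'dbo.CLTable1'), NULL, NULL, N'DETAILED') AS ips
ORDER BY ips.index_id, ips.index_level;
```

With this ordering the leaf level comes first for each index_id, followed by the non-leaf levels. How many rows you get per index depends on how deep each index is.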

For CLTable2 the maximums are each 8 bytes larger EXCEPT in the btrees of UNIQUE nonclustered indexes, where SQL Server does NOT need to duplicate the clustering key (the second rows for index_id 3 and 5, where the min and max stay equal):

index_id    min    max    avg
1           27     35     27.73
1           11     19     11.76
2           13     21     13.73
2           19     27     20.09
3           13     21     13.73
3           15     15     15.00
4           16     24     16.73
4           22     30     22.92
5           16     24     16.73
5           18     18     18.00

NOTE: If you're not familiar with index internals and when/why SQL Server duplicates the clustering key in nonclustered indexes, check out my Index Internals chapter from the SQL Server 2008 Internals title (here's the Amazon link: ) and my companion content from Chapter 6 here: Companion content for Chapter 6 (Index Internals) of SQL Server 2008 Internals. And, if you really want to see what's in your indexes, check out my updated versions of sp_helpindex here:

Finally, CLTable4 has the most interesting results…

index_id    min    max    avg
1           27     35     27.73
1           14     22     14.78
2           16     24     16.73
2           22     30     22.92
3           16     24     16.73
3           15     15     15
4           16     24     16.73
4           22     30     22.92
5           16     24     16.73
5           18     18     18

Nonclustered indexes 2 and 3 (and the clustered index's btree) are all 3 bytes larger… why? Because the clustering key allows nulls and each of these structures has only 2-3 columns. As a result, the space needed for the null block is 3 bytes (2 bytes for the column count [NCol] and 1 byte for the actual null bitmap [less than 9 columns]).

The btree for index id 3 only needs the nonclustered key column (col2) which does not allow NULLs and the index is unique so the tree is the same as before.

Nonclustered indexes 4 and 5 already had a nullable column and therefore already had a null block.
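To make the 3-byte difference easy to see, here is a small sketch that lines up the minimum record sizes from the CLTable2 and CLTable4 results above and subtracts them. The column names and the row_in_listing numbering (1 = first row shown for that index_id, 2 = second) are just for illustration.

```sql
-- Minimum record sizes copied from the CLTable2 and CLTable4 results above.
-- row_in_listing: 1 = first row shown for that index_id, 2 = second row.
SELECT  v.index_id,
        v.row_in_listing,
        v.cltable4_min - v.cltable2_min AS extra_bytes
FROM (VALUES
        (1, 1, 27, 27),
        (1, 2, 11, 14),
        (2, 1, 13, 16),
        (2, 2, 19, 22),
        (3, 1, 13, 16),
        (3, 2, 15, 15),
        (4, 1, 16, 16),
        (4, 2, 22, 22),
        (5, 1, 16, 16),
        (5, 2, 18, 18)
     ) AS v (index_id, row_in_listing, cltable2_min, cltable4_min)
ORDER BY v.index_id, v.row_in_listing;
```

extra_bytes comes out as 3 for the second index_id 1 row, both index_id 2 rows and the first index_id 3 row, and 0 everywhere else, which matches the explanation above.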

My point – it all adds up and if it's not truly necessary, then avoid it!

Thanks for reading!

4 thoughts on “More considerations for the clustering key – the clustered index debate continues!”

  1. I’m curious, what’s the history of the clustered index name? A cluster is a grouping and a unique sequential key is not a cluster. Did a clustered index behave differently in the old SQL Server engine or is there a definition of cluster I’m not finding on Google?

    A perfect example of something with a non-intuitive name in SQL Server is the timestamp. So I’m more than willing to accept this is another case. Just wondering if there is a reason for it.

    Thanks for the great blogs on the topic.

    1. In earlier versions the clustered index was generally created on something like a foreign key constraint or something like name. But, this was mainly to reduce the negatives of page-level locking. So, yes, you could say that data was “clustered” together. For example, a clustered index on LastName, FirstName, MiddleInitial would cluster data by name.

      Having said that, it’s also a way of “ordering [or, grouping]” data. From a pure definition perspective: a small group of things that are closely packed together.

      So… I suspect it was just the best name at the time and often did reflect “clusters” of data. And, for high-end systems where hash partitioning has been used then the data is often clustered by a composite key of HashID, RowID so here the data is grouped again.

      But, I wouldn’t get caught up in the cluster term as much as what it actually does. I’ve found that lots of naming conventions in databases/computers (in general) are often uninspiring. ;-) ;-)

      Hope that helps!
