An example of a nasty cluster key

I'm teaching a class this week on database maintenance, for DBAs inside Microsoft. One of the things we're discussing today is index fragmentation and how poor cluster key choice can lead to page splits, poor performance, index fragmentation, and so on – not just in the clustered index, but also in nonclustered indexes.

One of the students looked in a database underpinning an application and found a unique cluster key, which is the worst I've ever seen (although not the worst that Kimberly's ever seen apparently – the mind boggles!).

The cluster key is defined as a combination of the following column types:

16-byte GUID
varbinary (16)
nvarchar (512)
nvarchar (256)
tinyint

Now, the wide cluster key isn't a big deal UNLESS there are nonclustered indexes, but there are in this case – so the cluster key is included in all nonclustered index rows. And the random GUID high-order key is always a bad idea, as it means the clustered index will be heavily fragmented as records are inserted. This is all simplified and generalizations (and I open this can of worms happily) – but you get the idea.

Good design up-front, with an understanding of how key choice affects the behavior of SQL Server and how indexes are stored and indexed, can lead to vastly reduced performance problems and maintenance issues.

15 thoughts on “An example of a nasty cluster key”

Adam Nachman says:

May 20, 2009 at 1:14 pm

Sounds like an auto-gen CI. No-one could possibly deliberately select that … I think Kimberly blogged about an entity framework that created something like this a little while back.

Reply
Mladen Prajdić says:

May 20, 2009 at 1:04 pm

Come one Paul, now you’re just messing with us.
There’s no way this is CI, is there? :)

Reply
paul says:

May 20, 2009 at 1:28 pm

This is a deliberately chosen CI key from a large server application.

Reply
Gail says:

May 21, 2009 at 2:41 am

The Windows 7 download application, by any chance?

Reply
BradC says:

May 21, 2009 at 6:54 am

Nasty. Not quite as bad, but I recently ran across an internal app that has a table with the following Primary Key/Clustered Index on a 65 million row table:

tinyint
varchar(5)
varchar(5)
varchar(5)
varchar(5)
varchar(5)
varchar(5)
varchar(5)
varchar(5)
varchar(5)

And yes, there are 4 non-clustered indexes, including another one on the same TinyInt field (obviously useless). Total data space for the table is about 5GB. Total index space is 17.7GB :)

Holy crap. I just looked at the table contents, and all 65 million rows in the table have the same TinyInt value!! Talk about a selective index!

Reply
Tim Ford says:

May 21, 2009 at 8:50 am

I love how I can see one blog post and suddenly every bad thing I’ve ever seen professionally in my life pales in comparison. My horror story was the 16 million row table with a CI on int, varchar(750) (both ASC order) with a NCI on the same columns in DESC order. Thanks Paul.

Reply
paul says:

May 21, 2009 at 6:56 am

@brad nice…
@gail no… didn’t hear any more info about that – don’t suppose we will

Reply
Robert says:

May 21, 2009 at 2:54 pm

I wonder why people still use GUIDs instead of Int/BigInt identity columns. If the GUID on that CI is Sequential, it would certainly help with the fragmentation, but even that does not save the CI from a "Horrid" label.

Reply
Tibor Karaszi says:

May 22, 2009 at 7:33 am

Hehe, I love how they managed to hit *both*: random first column and wide cl ix…

Reply
Walt Baby says:

May 22, 2009 at 5:46 pm

>>so the cluster key is included in all nonclustered index rows.

Is that always true? From Kalen’s article in which she links to your blog I get a different understanding.

http://sqlblog.com/blogs/kalen_delaney/archive/2008/03/16/nonclustered-index-keys.aspx

I assumed that the fact that the clustered index key was in the nonclustered index meant that it was part of the key. And all my previous testing seemed to bear that out; I must never have tested at this level of detail with a unique nonclustered index. What this means is that I have been wrong in the following claim: "Explicitly declaring your clustered key columns as part of your nonclustered indexes does not make any difference in the storage of the index." That statement is true for nonunique indexes, but not true for unique.

Thanks – Walt Baby

Reply
Paul Randal says:

May 22, 2009 at 7:14 pm

It’s in every nonclustered index row (meaning at the leaf leve), just not necessarily in the non-leaf records – depending on whether its a non-unique or unique nonclustered index. By saying nonclustered index row, just like saying clustered index row, I mean the leaf-level records in the index – not the non-leaf records, which are usually very different in their contents.

That’s why I didn’t say that the clustered index keys are part of the nonclustered index key – because it depends.

This is one part of DBCC CHECKDB that I took quite a while over while writing it to make sure it copes with every case.

Thanks

Reply
Paul Randal says:

May 22, 2009 at 7:15 pm

This is also factored into the Books Online sections I wrote on calculating the size of a nonclustered index.

Reply
Gail says:

May 23, 2009 at 1:33 pm

@Tim Worst I’ve ever seen was a table used for a homebrewed service broker’s message table. High inserts, high updates. 26 nonclustered indexes, all of which shared the same leading column (which had only one value in the entire table). 12 of those shared one second column, the other 13 all had another second column. The widest index was 9 columns.

Oddly enough it didn’t perform well. </sarcasm>

Reply
Mehdi says:

April 11, 2018 at 2:37 am

As you know, the clustered index should be: Narrow, Unique, Static(never updated) and Ever-increasing.
I think create a clustered index on a customer ID is good idea, because that is unique and static and narrow.
The only problem is that the IDs will not be inserted to the table incrementally.
This cause insert speed will decrease and also index fragmentation.
If we consider an identity column for clustering, our queries aren’t specifically using this value.

What is the solution to this problem?

Reply
1. Paul Randal says:
  
  April 11, 2018 at 8:51 am
  
  The solution is well beyond the scope of a comment. Depends on what the table is going to be used for, what queries will be against it, insert pattern, and so on. I recommend you watch Kimberly’s Pluralsight course on SQL Server: Indexing for Performance.
  
  Reply

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

An example of a nasty cluster key

15 thoughts on “An example of a nasty cluster key”

Leave a Reply Cancel reply

Other articles

The Curious Case of… missing 832 errors during memory corruption

The Curious Case of… recovering from a ransomware attack

30%-off health checks and self-service via credit card – no waiting!

The Curious Case of…. alcoholism

The Mentoring Class of 2023

Mentoring 2023 – want some help from me?

Imagine feeling confident enough to handle whatever your database throws at you.