I'm teaching a class this week on database maintenance, for DBAs inside Microsoft. One of the things we're discussing today is index fragmentation and how poor cluster key choice can lead to page splits, poor performance, index fragmentation, and so on – not just in the clustered index, but also in nonclustered indexes.
One of the students looked in a database underpinning an application and found a unique cluster key, which is the worst I've ever seen (although not the worst that Kimberly's ever seen apparently – the mind boggles!).
The cluster key is defined as a combination of the following column types:
Now, the wide cluster key isn't a big deal UNLESS there are nonclustered indexes, but there are in this case – so the cluster key is included in all nonclustered index rows. And the random GUID high-order key is always a bad idea, as it means the clustered index will be heavily fragmented as records are inserted. This is all simplified and generalizations (and I open this can of worms happily) – but you get the idea.
Good design up-front, with an understanding of how key choice affects the behavior of SQL Server and how indexes are stored and indexed, can lead to vastly reduced performance problems and maintenance issues.
15 thoughts on “An example of a nasty cluster key”
Sounds like an auto-gen CI. No-one could possibly deliberately select that … I think Kimberly blogged about an entity framework that created something like this a little while back.
Come one Paul, now you’re just messing with us.
There’s no way this is CI, is there? :)
This is a deliberately chosen CI key from a large server application.
The Windows 7 download application, by any chance?
Nasty. Not quite as bad, but I recently ran across an internal app that has a table with the following Primary Key/Clustered Index on a 65 million row table:
And yes, there are 4 non-clustered indexes, including another one on the same TinyInt field (obviously useless). Total data space for the table is about 5GB. Total index space is 17.7GB :)
Holy crap. I just looked at the table contents, and all 65 million rows in the table have the same TinyInt value!! Talk about a selective index!
I love how I can see one blog post and suddenly every bad thing I’ve ever seen professionally in my life pales in comparison. My horror story was the 16 million row table with a CI on int, varchar(750) (both ASC order) with a NCI on the same columns in DESC order. Thanks Paul.
@gail no… didn’t hear any more info about that – don’t suppose we will
I wonder why people still use GUIDs instead of Int/BigInt identity columns. If the GUID on that CI is Sequential, it would certainly help with the fragmentation, but even that does not save the CI from a "Horrid" label.
Hehe, I love how they managed to hit *both*: random first column and wide cl ix…
>>so the cluster key is included in all nonclustered index rows.
Is that always true? From Kalen’s article in which she links to your blog I get a different understanding.
I assumed that the fact that the clustered index key was in the nonclustered index meant that it was part of the key. And all my previous testing seemed to bear that out; I must never have tested at this level of detail with a unique nonclustered index. What this means is that I have been wrong in the following claim: "Explicitly declaring your clustered key columns as part of your nonclustered indexes does not make any difference in the storage of the index." That statement is true for nonunique indexes, but not true for unique.
Thanks – Walt Baby
It’s in every nonclustered index row (meaning at the leaf leve), just not necessarily in the non-leaf records – depending on whether its a non-unique or unique nonclustered index. By saying nonclustered index row, just like saying clustered index row, I mean the leaf-level records in the index – not the non-leaf records, which are usually very different in their contents.
That’s why I didn’t say that the clustered index keys are part of the nonclustered index key – because it depends.
This is one part of DBCC CHECKDB that I took quite a while over while writing it to make sure it copes with every case.
This is also factored into the Books Online sections I wrote on calculating the size of a nonclustered index.
@Tim Worst I’ve ever seen was a table used for a homebrewed service broker’s message table. High inserts, high updates. 26 nonclustered indexes, all of which shared the same leading column (which had only one value in the entire table). 12 of those shared one second column, the other 13 all had another second column. The widest index was 9 columns.
Oddly enough it didn’t perform well. </sarcasm>
As you know, the clustered index should be: Narrow, Unique, Static(never updated) and Ever-increasing.
I think create a clustered index on a customer ID is good idea, because that is unique and static and narrow.
The only problem is that the IDs will not be inserted to the table incrementally.
This cause insert speed will decrease and also index fragmentation.
If we consider an identity column for clustering, our queries aren’t specifically using this value.
What is the solution to this problem?
The solution is well beyond the scope of a comment. Depends on what the table is going to be used for, what queries will be against it, insert pattern, and so on. I recommend you watch Kimberly’s Pluralsight course on SQL Server: Indexing for Performance.