I'm teaching a class this week on database maintenance, for DBAs inside Microsoft. One of the things we're discussing today is index fragmentation and how poor cluster key choice can lead to page splits, poor performance, index fragmentation, and so on – not just in the clustered index, but also in nonclustered indexes.

One of the students looked in a database underpinning an application and found a unique cluster key, which is the worst I've ever seen (although not the worst that Kimberly's ever seen apparently – the mind boggles!).

The cluster key is defined as a combination of the following column types:

  • 16-byte GUID
  • varbinary (16)
  • nvarchar (512)
  • nvarchar (256)
  • tinyint

Now, the wide cluster key isn't a big deal UNLESS there are nonclustered indexes, but there are in this case – so the cluster key is included in all nonclustered index rows. And the random GUID high-order key is always a bad idea, as it means the clustered index will be heavily fragmented as records are inserted. This is all simplified and generalizations (and I open this can of worms happily) - but you get the idea.

Good design up-front, with an understanding of how key choice affects the behavior of SQL Server and how indexes are stored and indexed, can lead to vastly reduced performance problems and maintenance issues.