Q: What about using uniqueidentifier fields for keys in a clustered index, will this affect performance? They are random numbers so they are not monotonically increasing…
Ha, I always want to time how long it’s going to take for me to get this question. Seriously, I don’t think I’ve EVER given an indexing lecture (where I talked about the clustering key) where someone didn’t ask about GUIDs. And – once again – a great question! I do not really have a problem with GUIDs in general. I think they have a purpose in that they are relatively “random” numbers (they are not really random entirely as they get part of their value from SQL Server, part from Windows, part is based on time, part is ever increasing and part is random; however, the final values do not create any form of pattern). They truly make sense in distributed applications or those where you do not want a predictable row identifier; however, they are not always an appropriate choice for a key. (I think this should be a discussion in and of itself. Issues related to GUIDs. OK, I’ll work on this one!) But this leads me to another issue – is the primary key always clustered and/or always need to be an identity. These three things are really three separate things and they do NOT have to be on the same column. I often prefer that all three are the same column as an identity is naturally unique, naturally static, naturally narrow as well as creates an ever-increasing pattern that minimizes splits/fragmentation due to inserts (again, only when this is also the clustering key). So I guess the question is – does it really need to be the Primary key? And well, that’s the debatable but if all three criteria come together that often gives you the largest gains. If the Primary key is not an identity (and it’s a very large transaction processing table with a lot of inserts/updates/deletes) then I would consider adding a surrogate column – solely for clustering – and make the Primary Key non-clustered. The non-clustered index will certainly become fragmented (if on a GUID) so as long as proper maintenance practices are in place you can reduce the excess disk space and improve modification performance.
Q: What about UPDATEs that change the row’s clustering columns?
This question was related to “what can cause fragmentation” where I was discussing UPDATEs to variable width columns. While updates to variable width columns can cause a row to widen (and possibly result in a page split) it’s a good point to mention that volatility in the clustering key is far worse. In fact, in last month’s webcast, I spent quite a bit of time discussing proper creation of the clustering key – to minimize fragmentation and in general, improve performance. There are three criteria for which I look in a clustering key: the key should be Unique, Narrow and Static. As final criteria – and this helps to minimize fragmentation proactively – I look for the key to have an ever-increasing pattern. Sometimes this means creating the clustered index on an identity column and if one doesn’t exist you might consider adding a column solely for this purpose! I probably wouldn’t worry about this for small tables but for an extremely large and volatile table this can often be a good idea. You might also consider (depending on access patterns), creating the clustered index on a combination of columns – such as date, identity; this creates the ever increasing pattern and also ensures uniqueness for the value as date is likely to have duplicates. There are multiple options and there are multiple considerations – all of which affect query performance, data modification performance, maintenance requirements and therefore availability. Make sure to review many of the other questions in this area as a few have a great amount of detail.