Index Creation Questions:


Q: In an OLTP server do clustered indexes create a negative performance impact to the constant inserts and updates?


The wrong clustered index can for sure… In most environments, having the right clustered index (an index on an ever-increasing key) will be better than a heap (a table without a clustered index) and much better than the wrong clustered index (a clustered index that creates wildly random access patterns for insert – i.e. LastName). So – the idea with the clustered index debate (as I often refer to this as J) is that the right clustered index balances an improvement in performance for inserts (by minimizing splits as well as having the needed pages already in cache – and by needing less cache overall) with keeping the table more compact (i.e. less fragmentation). However, if you have a high volume of inserts (typically more than 400 per second) then you might want to add additional hot spots (cluster on a composite key like “state, id” which creates as many hot spots as you deal with states) to better balance this increased volume. I still wouldn’t go with something random (like name) and if the table has a lot of scans/reads I certainly wouldn’t go with a heap. You do have to be careful with updates though – if there’s a high volume of updates and the updates are to varchar then there are a few things I’d think about: fixed length fields (which is rarely an option), default values (GREAT choice if the updates come in with no value and later get updated) and rebuilds with a more appropriate fillfactor.


 


Q: Is a unique key the same as a primary key?


Almost… SQL Server has two ways of enforcing entity integrity: the Primary Key (of which there can only be one per table) and Unique Key(s) (of which there can be many – 249 is the maximum number of non-clustered indexes so unique keys and other non-clustered indexes combined can be no more than 249).


 


Q: I thought you said the primary key constraint automatically gets a clustered index, how can you make it non-clustered key?


When specifying index-based constraints (Primary Key and Unique Key) you can specify index type. The default index type (when not specified) is Clustered for the Primary Key and non-clustered for a unique key. Here’s complete syntax for adding these keys to existing tables.


 


ALTER TABLE dbo.Employee
  ADD CONSTRAINT EmployeePK
    PRIMARY KEY CLUSTERED (EmployeeID)


 


ALTER TABLE dbo.Employee
  ADD CONSTRAINT EmployeeSSNUK
    UNIQUE NONCLUSTERED (SSN)


 


In either case you can CHANGE the underlined part of the syntax to change the index type.


 


Q: How is a unique key created? Explicitly or is it Implied via Indexes?


Well, I have to admit I’m not entirely sure of the question here but I think I do… When you create a unique key constraint SQL Server enforces that uniqueness by automatically adding a unique non-clustered index to the table.


 


Q: Are GUIDs as efficient as identity for the clustering key?


Not really. They do meet the criteria of being static and unique however, they’re not as narrow (they’re 16 bytes) and if the newid function is used to populate the values (which is the common approach) then the values are not ever-increasing. With the values populated by the newid function you can end up with a lot of fragmentation quickly. There are alternatives to the newid function for generating the GUID value as an incrementing value through a windows call. Gert Drapers has an extended procedure to help you do this on www.SQLDev.net. I’m not sure of the specific link but I will update this here when I do!


 


Q: When you say to avoid a “volatile” clustering key, do you mean the data value actually changing/being updated?


Yes! Since the clustering key is in EVERY nonclustered index you want to choose it wisely and make sure that it’s relatively static. If it changes that will require all of the redundant versions of the value to change as well…


 


Q: If you create an identity column in a table just for the sake of creating a clustered index on it, how does this improve performance (especially if most of the queries use other fields in the table)?


Another great question! Remember two things: the internals of SQL Server rely heavily on the clustering key (even if you don’t) and if the table stays more compact (by not becoming fragmented) then queries [especially large scan queries] improve. So – it’s a better balance. Now to add a bit more – in many cases having the right nonclustered index can yield better performance for most queries anyway!


 


Q: You said in the power point slides that generally a few wide indexes are better than several narrow; However, later you said that one narrow clustered index on an id column + non clustered indexes is the way to go. So which is it? Or did I not really understand?


You’ve got it – it’s a combination of the two. The clustered should be narrow/static/unique but non-clustered indexes tend to be more effective when they are slightly wider (i.e. they contain a couple to a few columns rather than just one) so that they can service more types of query requests.


 


Q: Do I need to change my indexes if I use the like operator? I noticed that I get Index Scans instead of seeks when I use LastName like ‘%johnson%’ v. LastName = ‘johnson’?


Well, this one is kind of mixed. But – let me give you something to think about (which might help)… Think about the phone book for a second – if you have someone’s last name it’s pretty easy to find them. However, if you only know they have “johnson” somewhere in their name it’s a lot harder, right. This could yield ‘Cajohnson’, ‘Ejohnson’, ‘Smith-Johnson’ and so forth. So – there’s really no effective way to get at that data… SQL Server will always need to scan for that type of request (now I could add that covering indexes might help but quite honestly I think there’s a better option here). SO – changing index strategy really won’t help. But – I often see applications that just automatically add the % before and after the requested data. If this is the case, can you ask the user if it’s a real value or a pattern value? Or can you even allow them to supply the exact request? When the value is specific (i.e. Johnson) then the query will be fast, when the value is pattern matching (i.e. %johnson%) then SQL Server will need to scan and the query won’t be as fast – but it will be correct! If you want to search searching for more wildcards and more complex pattern matching into descriptions, etc. then you’ll probably want to look into Full-text Indexing.