The title is a common question I've received in the past and I thought I'd take a few minutes to explain a bit about keys and indexes…

This is by no means a lot of detail regarding relational theory, etc. but there are a few things that we should quickly review to make sure that the basis for indexes being created makes sense. First, from a relational theory perspective every table must have a primary key. From SQL Server's perspective it's not a requirement but it's generally a good idea. A primary/unique key are entity identifiers. Each are a unique way of identifying a row. There are subtle differences between the two – in implementation:

Primary Key

  • In SQL Server the Primary Key is enforced through a Primary Key Constraint.
  • None of the columns that make up the primary key allow nulls.
  • The values in the Primary Key must be unique – to enforce uniqueness (as well as make it efficient), SQL Server creates a unique clustered [composite] index on the column(s) that make up the key.

Unique Key

  • In SQL Server the Primary Key is enforced through a Unique Key Constraint.
  • The columns that make up the unique key CAN allow nulls but not for more than one complete key. Meaning that allowing Nulls on a single column unique key really only allows NULL (only one NULL value). Overall, allowing Nulls in a column (like Social Security Number) doesn't really make much sense but if you need to then you can't go with a unique constraint – instead consider a unique index. At that point, I typically get the question of what's the difference between a unique key and a unique index…
  • Allowing Nulls values in the columns that make up a composite unique key makes more sense as long as the complete key is not null for more than one row.
  • The values in the unique Key must be unique and to enforce uniqueness (as well as make it efficient), SQL Server creates a unique NON-clustered [composite] index on the column(s) that make up the key.

What's the difference between a unique index and a unique constraint?

  • A unique key CAN be referenced by a foreign key constraint and a column which has only a unique index cannot be referenced. NOTE: This is NO LONGER true with SQL Server 2005. See this post: http://www.sqlskills.com/blogs/kimberly/post/Foreign-Keys-can-reference-UNIQUE-indexes-(without-constraints).aspx.
  • Constraints are checked before indexes and this can lead to a large multi-row insert/select or update to fail before modification. However, indexes might (for a large modification) be validated at the end (instead of row by row) so a large modification that has a failure will need to rollback at the end of the modification rather than before. This is a good point – and one I hadn't really thought of until I was poking around some of the Q&A on SQLMag's website. Here's where I found it.

So, all of this leads me up to the original question (yes, you knew I get here someday :) and that's “When did SQL Server stop putting indexes on Foreign Key columns?”

First, SQL Server has NEVER put an index on a foreign key column… Indexes are used (as described above) to make the lookup (in a primary or unique key) for a duplicate value FAST. If the keys are ordered then checking to see if one already exists is trivial (i.e. fast). There is NO reason for SQL Server to put an index on a foreign key column as the column does not (and probably would never be) unique (if it is then it's likely to have a primary or unique key on it as well – as in a 1-1 relationship). So, that leads me to another key point…

Are there any benefits to indexing foreign key columns? YES

  • Better performance on maintaining the relationship on a delete of a primary/unique key. When you delete a key row, SQL Server must check to see if there are any rows which reference the row being deleted.
    • If the foreign key relationship is defined with NO ACTION (on update/delete) then a referenced row CANNOT be deleted as it would leave the referencing rows “orphaned.” To find the rows efficiently an index on the foreign key column helps!
    • If the foreign key relationship is defined with CASCADE (on update/delete) then when a referenced row is modified all of the referencing rows must be modified as well (either updated to reflect the new value or on cascade delete). To find the rows to modify efficiently, an index on the foreign key column helps!
  • Better join performance – for many of the reasons above, SQL Server can more effectively find the rows to join to when tables are joined on primary/foreign key relationships. However, this is NOT always the “best” indexing choice for joins but it is a good start. 

Finally, if you want a few titles related to relational theory check out these links:

An Introduction to Database Systems, Eighth Edition
     by C.J. Date
     E.F. Codd

Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design
     Mike Hernandez

Have fun!
kt