As many of you know, we started our SQLskills SQL101 series a couple of weeks ago… it’s been great fun for the whole team to go back through our most common questions / concerns and set the record straight, per se. We’ve still got a lot of things to discuss but indexing is one of many questions / discussions and unfortunately, misunderstandings.
I’m going to tie today’s post with a question I received recently: if I have a table that has 6 foreign key columns/references should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference.
This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.
What is a Foreign Key Enforcing?
Imagine you have two tables: Employees and Departments. The Employee table has a column called DepartmentID and it represents the department of which that employee is a member. The department ID must be a valid department. So, to guarantee that the department ID exists and is valid – we create a foreign key to the DepartmentID column of the Departments table. When a row is inserted or updated in the Employees table, SQL Server will check to make sure that the value entered for DepartmentID is valid. This reference is very inexpensive because the foreign key MUST reference a column which is unique (which is in turn, enforced by a unique index).
What Must Exist in Order to Create a Foreign Key Reference?
A foreign key can be created on any column(s) that has a unique index on the referenced table. That unique index can be created with a CREATE INDEX statement OR that index could have been created as part of a constraint (either a UNIQUE or PRIMARY KEY constraint). A foreign key can reference ANY column(s) that has a UNIQUE index; it does not have to have been created by a constraint. And, this can be useful during performance tuning. A UNIQUE index offers options that constraints do not. For example, a UNIQUE index can have included columns and filters. A foreign key reference CAN reference a UNIQUE index with included columns; however, it cannot reference a UNIQUE index with a filter (I wish it could).
A good example of this might occur during database tuning and specifically during index consolidation (something I do after I do after query tuning and when I’m determining the best index for the database / for production). I often review existing indexes as well as any missing index recommendations, etc. Check out my SQLskills SQL101: Indexing Basics post for more information about these concepts.
Imagine I have the following:
Table: Employees, column: NationalID (SocialSecurityNumbery)
Column NationalID: this is an alternate key for Employees as their Primary Key is EmployeeID. Because it’s another column on which you will lookup employees and you want to make sure it’s unique, you decide to enforce it with a UNIQUE constraint on it. You may even reference it from other tables.
However, later, while doing database tuning, you decide that you need the following index:
CREATE INDEX [QueryTuningIndex] ON [dbo].[Employees] ([NationalID]) INCLUDE ([LastName], [FirstName])
This index would be similar to and redundant with the existing constraint-based index on NationalID. But, you really want this new index to help performance (you’ve tested that this index is helpful to some frequently executed and important queries so you’ve already decided that the costs outweigh the negatives).
And, this is where the excellent feature to reference a unique index comes in… instead of adding this new one and keeping the existing constraint, change the index to the following:
CREATE UNIQUE INDEX [QueryTuningIndex] ON [dbo].[Employees] ([NationalID]) INCLUDE ([LastName], [FirstName])
The uniqueness is always enforced ONLY on the key-portion of the index. So, this new index – even with included columns – still does this. The only bad news is that SQL Server has already associated the foreign key with the original constraint-based index so you’ll still need to remove the foreign key to drop the constraint (I wish this weren’t true). But, you’ll still have data integrity handled by the new unique index – as long as you create the new index before you drop the foreign key and the original unique constraint. Having said that, there’s more bad news – because there will be a short timeframe where the foreign key does not exist, you must do this off hours and when little-to-no activity is occurring. This will reduce the possibility of rows being inserted / updated that do not have a valid reference. You’ll certainly find out when you add the referential constraint again as the default behavior of adding the foreign key will be to verify that all rows have a reference row. NOTE: there is a way to skip this checking but it is NOT recommended as your constraint will be marked as untrusted. It’s is ALWAYS PREFERRED to create a foreign key with CHECK. Here’s a script that will walk you through the entire example – leveraging the default behavior to recheck the data when the constraint is created. Be sure to run this is a test / junk database.
Creating an Index on a Foreign Key Column
Now that you know the options for the column being referenced, let’s consider what’s required for the referencing column? The column on which the foreign key is created will not have an index by default. I wrote about this in a prior post: When did SQL Server stop putting indexes on Foreign Key columns? and the main point is that SQL Server has NEVER automatically created indexes on foreign key columns. But, many of us recommend that you do! (but, I also wish indexing were just that simple because this might not be an index you keep forever…)
See, if EVERY foreign key column automatically had an index created for you – then SQL Server might end up requiring it to always be there. This would then remove the option of consolidating this index with others when you were later running through performance tuning techniques. So, while it’s generally a good idea to have an index on a foreign key column; it might not ALWAYS be a good idea to keep that narrow index as you add other (probably, slightly-wider indexes).
However, initially, creating this index is what I call: Phase 1 of tuning for joins.
But, there are 3 phases of tuning for joins and these phases are all during query tuning and not database tuning. So, as you do deeper tuning, you might end up consolidating this foreign key index with another index(es) to reduce the overall number of indexes on your table.
Finally, the most important point (and this answers the original question), the index must be ONE per foreign key (with only the column(s) of that specific foreign key); you will create one index for each foreign key reference.
Indexes on Foreign Keys Can Help Improve Performance
There are two ways in which these indexes can improve performance.
First, they can help the foreign key reference maintain integrity on the referenced table. Take the example of Employees and Departments. Not only must SQL Server check that a DepartmentID is valid when inserting / updating an Employee row but SQL Server must also make sure that referential integrity is maintained when DepartmentIDs are removed from the Departments table. An index on the Employees table (on the DepartmentID columns) can be used to quickly check if any rows reference the DepartmentID being deleted from the Departments table. Without an index on DepartmentID in the Employees table, SQL Server would potentially have to scan the Employees table; this can be quite expensive.
Second, and this doesn’t always work, SQL Server may be able to use the index to help improve join performance. And, this is where I’ll cut the ideas a bit short as other phases of join tuning are more complex for this SQL101 post. So while there are other strategies that can be used to tune joins when this doesn’t work, it’s still a fantastic starting point. In fact, I generally recommend indexing foreign keys as part of your very early / development phase for a database. But, again, these indexes might be consolidated later in favor of other indexes.
Indexing for performance has many steps and many strategies, I hope to keep uncovering these in our SQL101 series but between this post and the Indexing Basics post, you’re well on your way to kicking off a better tuning strategy for your tables!
And, don’t forget to check out all of our SQL101 posts here!
Thanks for reading,
7 thoughts on “SQLskills SQL101: Indexes on Foreign Keys”
Nice post Kim. I’ve always thought that declared foreign keys should automatically create indexes on the keys unless you use the “DONT_ADD_INDEX_I_KNOW_WHAT_IM_DOING” option.
There are countless perf problems in the wild caused by the lack of these.
Every time I hear a comment that having more indexes slows down modifications, etc., I just smile. DELETEs are particularly prone to issues with this. But even for UPDATE operations, any data you modify has to be found first. I think people writing that always assume all data is updated via the PK, which is so often isn’t.
Hey there Greg – Yes, I remember your post many moons ago asking if people thought that SQL Server should automatically add them. That would be great – especially if there were an option like you suggest – that would allow us to consolidate / drop them when we know that they’re not needed or that we can create an even better index!
“which it so often isn’t” -> don’t write in the morning before sufficient coffee :-)
LOL. True that! ;-)
Your article brings up some very good points. However, I support a database that is heavily normalized. When I create an index on a foreign key, at some point later it will show up as unused so then it is removed. How do I deal with this?
The main point is that this is a *starting* point for indexing. And, one of the reasons that I’m glad that it’s not required is that you CAN drop them in favor of better indexes (consolidation) or if they’re not used. But, yes, if they’re getting used (for joins) then you’ll want to keep them around. I would definitely start with these but during test / QA – you may end up dropping a few (but, only after significant and thorough testing).
Hope that helps!