What columns are in that index?

I’ve been helping several customers with tuning lately, which typically means I’m also looking at indexes and asking questions. A pattern I’ve noticed is columns from the clustering key being added to the key definition for the nonclustered index.  This brings up some interesting considerations when it comes to creating your nonclustered indexes.

Review

We’re going to start with a copy of WideWorldImporters, and create a table and indexes from it.
Notice that I’m adding a column to this table named OrderID_and_OrderLineID, which is the OrderID and the OrderLineID values concatenated. In Sales.OrderLines, OrderLine is unique, but OrderID is not. The two of them concatenated are unique, and I made this my primary key, as well as my clustering key (remember, you can have a primary key that’s different than your clustering key, which is a separate topic for another day).

USE [WideWorldImporters]
GO

DROP TABLE IF EXISTS [Sales].[SQLskills_OrderLines];
GO

CREATE TABLE [Sales].[SQLskills_OrderLines](
     [OrderLineID] [int] NOT NULL,
     [OrderID] [int] NOT NULL,
     [OrderID_and_OrderLineID] [bigint] NOT NULL,
     [StockItemID] [int] NOT NULL,
     [Description] [nvarchar](100) NOT NULL,
     [PackageTypeID] [int] NOT NULL,
     [Quantity] [int] NOT NULL,
     [UnitPrice] [decimal](18, 2) NULL,
     [TaxRate] [decimal](18, 3) NOT NULL,
     [PickedQuantity] [int] NOT NULL,
     [PickingCompletedWhen] [datetime2](7) NULL,
     [LastEditedBy] [int] NOT NULL,
     [LastEditedWhen] [datetime2](7) NOT NULL,
     CONSTRAINT [PK_Sales_SQLskills_OrderLines] PRIMARY KEY CLUSTERED
     (
          [OrderID_and_OrderLineID] ASC
     )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
     ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
     ) ON [USERDATA];
GO

INSERT INTO [Sales].[SQLskills_OrderLines](
     [OrderLineID]
     ,[OrderID]
     ,[OrderID_and_OrderLineID]
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
     )
SELECT
     [OrderLineID]
     ,[OrderID]
     ,CAST(CAST([OrderID] AS VARCHAR(20)) + CAST([OrderLineID] AS VARCHAR(20)) AS BIGINT)
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
FROM [Sales].[OrderLines];
GO

We’ll use Kimberly’s version of sp_helpindex, sp_SQLskills_helpindex, to see what’s truly in the index, and where. I much prefer this version to the built-in sp_helpindex, as it includes information about included columns, filters, and tells you what columns are in the tree and leaf levels.

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Clustered index for SQLskills_OrderLines
Clustered index for SQLskills_OrderLines

This output is wide, which doesn’t lend itself well to a post, so you may need to zoom in on the image. I also trimmed out the included and filtered columns as neither are relevant in this discussion.

Notice that the only column in columns_in_tree is the concatenated OrderID and OrderLineID, and then all columns are in columns_in_leaf (because the leaf level of the clustered index is all the data).  Now we’ll add a nonclustered index on OrderID:

CREATE NONCLUSTERED INDEX [NCI_OrderID]
     ON [Sales].[SQLskills_OrderLines] ([OrderID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index for SQLskills_OrderLines
Nonclustered index for SQLskills_OrderLines

In columns_in_tree we have our index key, OrderID, but we also have our clustering key, OrderID_and_OrderLineID, and both are in columns_in_leaf. What happens if we create a nonclustered index on OrderLineID, and make it unique?

CREATE UNIQUE NONCLUSTERED INDEX [NCI_OrderLineID]
     ON [Sales].[SQLskills_OrderLines] ([OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Unique nonclustered index for SQLskills_OrderLines
Unique nonclustered index for SQLskills_OrderLines

In the sp_SQLskills_helpindex output you’ll notice that OrderLineID is the only column in columns_in_tree, and both OrderLineID and OrderID_and_OrderLineID are in columns_in_leaf.

Remember that when you identify a nonclustered index as unique, it will not add the clustering key to the tree level, it just adds it to the leaf level. But if the nonclustered index is not declared as unique, then SQL Server adds the clustering key to the tree level to uniquely identify the row.

If we had created a clustered index that we didn’t also identify as unique, then SQL Server adds a UNIQUIFIER to the clustered index key, which then also gets added to the nonclustered index.

Defining the clustering key as part of the non-clustered key

What I’ve seen lately when looking at indexes is this:

CREATE NONCLUSTERED INDEX [NCI_OrderID_2]     
     ON [Sales].[SQLskills_OrderLines] ([OrderID], [OrderID_and_OrderLineID])
     ON [USERDATA];
GO

This is the same as the NCI_OrderID index I created, but someone has specifically placed the clustering key in the nonclustered index key. I always ask why they added the clustering key. Typically the answer is: I know it’s going to be needed in the query. That’s fine, but understand that it doesn’t matter whether you add the clustering key to the key for a non-unique nonclustered index or not; if you don’t, SQL Server will add it.

We’ve kept it simple so far with a single column for our clustering key. What if we have multiple columns? We’ll change up our table a bit:

USE [WideWorldImporters]
GO

DROP TABLE IF EXISTS [Sales].[SQLskills_OrderLines];
GO

CREATE TABLE [Sales].[SQLskills_OrderLines](
     [OrderLineID] [int] NOT NULL,
     [OrderID] [int] NOT NULL,
     [OrderID_and_OrderLineID] [bigint] NOT NULL,
     [StockItemID] [int] NOT NULL,
     [Description] [nvarchar](100) NOT NULL,
     [PackageTypeID] [int] NOT NULL,
     [Quantity] [int] NOT NULL,
     [UnitPrice] [decimal](18, 2) NULL,
     [TaxRate] [decimal](18, 3) NOT NULL,
     [PickedQuantity] [int] NOT NULL,
     [PickingCompletedWhen] [datetime2](7) NULL,
     [LastEditedBy] [int] NOT NULL,
     [LastEditedWhen] [datetime2](7) NOT NULL,
     CONSTRAINT [PK_Sales_SQLskills_OrderLines] PRIMARY KEY CLUSTERED
     (
     [OrderID] ASC, [OrderLineID] ASC, [StockItemID] ASC, [PackageTypeID] ASC
     )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
     ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
     ) ON [USERDATA];
GO

INSERT INTO [Sales].[SQLskills_OrderLines](
     [OrderLineID]
     ,[OrderID]
     ,[OrderID_and_OrderLineID]
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen])
SELECT
     [OrderLineID]
     ,[OrderID]
     ,CAST(CAST([OrderID] AS VARCHAR(20)) + CAST([OrderLineID] AS VARCHAR(20)) AS BIGINT)     
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
FROM [Sales].[OrderLines];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Four column clustered index for SQLskills_OrderLines
Four column clustered index for SQLskills_OrderLines

I don’t generally advocate a wide clustered index, but there are certain scenarios where it’s needed/appropriate. If we take this table and add a nonclustered index, not unique, on one column, then you’ll see all four columns from the clustering key in the tree level:

CREATE NONCLUSTERED INDEX [NCI_Quantity]
     ON [Sales].[SQLskills_OrderLines] ([Quantity])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index (with wide CI) for SQLskills_OrderLines
Nonclustered index (with wide CI) for SQLskills_OrderLines

For a large table, the size of this nonclustered index has the potential to be much, much larger than a table where the clustering key is only a single column.

If we add a unique nonclustered index, then that clustering key is only in the leaf level:

CREATE UNIQUE NONCLUSTERED INDEX [NCI_OrderID_and_OrderLineID]
     ON [Sales].[SQLskills_OrderLines] ([OrderID_and_OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Unique nonclustered index (with wide CI) for SQLskills_OrderLines
Unique nonclustered index (with wide CI) for SQLskills_OrderLines

But then…what happens if we add one or two columns from the clustering key into the nonclustered index key?

CREATE NONCLUSTERED INDEX [NCI_PickingCompletedWhen]
     ON [Sales].[SQLskills_OrderLines] ([PickingCompletedWhen], [OrderID], [OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index with column from clustering key for SQLskills_OrderLines
Nonclustered index with column from clustering key for SQLskills_OrderLines

It doesn’t matter! All the columns from the clustering key are added to the tree level because I did not identify it as unique! If I had defined the nonclustered index as unique, then only those two columns that I specified in the key would be in the tree, and the other two columns from the clustering key would be part of the leaf level.

Conclusion

There are a couple points to remember when designing your nonclustered indexes. First, whenever you know that the column, or combination or columns, makes a unique key for a nonclustered index, I always recommend that you specify UNIQUE when you create the index. That information is great for the optimizer to have when generating query plans. Second, think carefully about the impact of your clustering key on your nonclustered indexes. Any non-unique nonclustered index will be larger (more pages, eventually more levels) because the entire clustering key will be in the tree level. Further, if you happen to have a clustering key with a column (or multiple columns) that are frequently modified, then those modifications have to be propagated to the tree levels of the nonclustered indexes as well.

Happy indexing!

One thought on “What columns are in that index?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

A Fond Farewell

If you haven’t guessed from the title, I’m writing this post because I am leaving SQLskills. This Friday, January 14th, is my last day, and

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.