SQLskills at the PASS Summit

The PASS Summit is coming up this fall in Seattle, and it will be here before you know it.  Though I want to mention that I am thoroughly enjoying summer and really not ready for pumpkin anything!  I’m returning to Summit this year, as are Jonathan and Glenn.  We are all looking forward to the conference and with the schedule announced this week, I wanted to include some more details about  the full day pre-conference sessions that Jonathan and I are each presenting.  Lucky Jonathan, we are presenting together as part of the Developer Learning Pathway along with fellow Clevelander Bert Wagner (one of my favorite peeps, ask him about coffee!).

I’ve included links to all of our sessions below, and since Jonathan and I are teaching on consecutive days, you can attend both our workshops 😊

My pre-conference session will dive deep into how to leverage Query Store to detect and correct performance issues, both with existing SQL Server 2016 and newer environments, and Jonathan’s session will teach you how to properly use the new features in SQL Server 2017 and SQL Server 2019 to get better performance.  If you’re planning an upgrade OR moving to Azure, both of these workshops are applicable.  We think these sessions complement each other very well, and even though we aren’t presenting together, we will reference content/concepts discussed in the other session.  Do you have to attend both to make sense of it all?  Definitely not.  But we do have a ton of great information to cover and are already working on new demos to showcase the latest and greatest in SQL Server.

If you have questions, please email us!  You can email me or Jonathan directly to ask about our pre-conference sessions if you don’t have a clear picture after reading the abstract.  We know that there are a lot of great workshops at Summit, and we definitely want to make sure you learn what will help you most.  Glenn, Jonathan and I look forward to seeing you in November!

Erin’s workshop on Monday, November 4, 2019: Performance Tuning with Query Store in SQL Server

Jonathan’s workshop on Tuesday, November 5, 2019: Bigger Hardware or Better Code and Design?

Erin’s general session on Wednesday, November 6, 2019 at 3:15PM: Understanding Execution Plans

Glenn’s half-day session on Wednesday, November 6, 2019 at 3:15PM: Dr. DMV’s Troubleshooting Toolkit

Jonathan’s general session on Thursday, November 7, 2019 at 10:45 AM: Eliminating Anti-Patterns and RBAR for Faster Performance

Glenn’s general session on Friday, November 8, 2019 at 9:30AM: Hardware 301: Choosing Database Hardware for SQL Server 2019

 

 

What columns are in that index?

I’ve been helping several customers with tuning lately, which typically means I’m also looking at indexes and asking questions. A pattern I’ve noticed is columns from the clustering key being added to the key definition for the nonclustered index.  This brings up some interesting considerations when it comes to creating your nonclustered indexes.

Review

We’re going to start with a copy of WideWorldImporters, and create a table and indexes from it.
Notice that I’m adding a column to this table named OrderID_and_OrderLineID, which is the OrderID and the OrderLineID values concatenated. In Sales.OrderLines, OrderLine is unique, but OrderID is not. The two of them concatenated are unique, and I made this my primary key, as well as my clustering key (remember, you can have a primary key that’s different than your clustering key, which is a separate topic for another day).

USE [WideWorldImporters]
GO

DROP TABLE IF EXISTS [Sales].[SQLskills_OrderLines];
GO

CREATE TABLE [Sales].[SQLskills_OrderLines](
     [OrderLineID] [int] NOT NULL,
     [OrderID] [int] NOT NULL,
     [OrderID_and_OrderLineID] [bigint] NOT NULL,
     [StockItemID] [int] NOT NULL,
     [Description] [nvarchar](100) NOT NULL,
     [PackageTypeID] [int] NOT NULL,
     [Quantity] [int] NOT NULL,
     [UnitPrice] [decimal](18, 2) NULL,
     [TaxRate] [decimal](18, 3) NOT NULL,
     [PickedQuantity] [int] NOT NULL,
     [PickingCompletedWhen] [datetime2](7) NULL,
     [LastEditedBy] [int] NOT NULL,
     [LastEditedWhen] [datetime2](7) NOT NULL,
     CONSTRAINT [PK_Sales_SQLskills_OrderLines] PRIMARY KEY CLUSTERED
     (
          [OrderID_and_OrderLineID] ASC
     )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
     ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
     ) ON [USERDATA];
GO

INSERT INTO [Sales].[SQLskills_OrderLines](
     [OrderLineID]
     ,[OrderID]
     ,[OrderID_and_OrderLineID]
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
     )
SELECT
     [OrderLineID]
     ,[OrderID]
     ,CAST(CAST([OrderID] AS VARCHAR(20)) + CAST([OrderLineID] AS VARCHAR(20)) AS BIGINT)
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
FROM [Sales].[OrderLines];
GO

We’ll use Kimberly’s version of sp_helpindex, sp_SQLskills_helpindex, to see what’s truly in the index, and where. I much prefer this version to the built-in sp_helpindex, as it includes information about included columns, filters, and tells you what columns are in the tree and leaf levels.

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Clustered index for SQLskills_OrderLines

Clustered index for SQLskills_OrderLines

This output is wide, which doesn’t lend itself well to a post, so you may need to zoom in on the image. I also trimmed out the included and filtered columns as neither are relevant in this discussion.

Notice that the only column in columns_in_tree is the concatenated OrderID and OrderLineID, and then all columns are in columns_in_leaf (because the leaf level of the clustered index is all the data).  Now we’ll add a nonclustered index on OrderID:

CREATE NONCLUSTERED INDEX [NCI_OrderID]
     ON [Sales].[SQLskills_OrderLines] ([OrderID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index for SQLskills_OrderLines

Nonclustered index for SQLskills_OrderLines

In columns_in_tree we have our index key, OrderID, but we also have our clustering key, OrderID_and_OrderLineID, and both are in columns_in_leaf. What happens if we create a nonclustered index on OrderLineID, and make it unique?

CREATE UNIQUE NONCLUSTERED INDEX [NCI_OrderLineID]
     ON [Sales].[SQLskills_OrderLines] ([OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Unique nonclustered index for SQLskills_OrderLines

Unique nonclustered index for SQLskills_OrderLines

In the sp_SQLskills_helpindex output you’ll notice that OrderLineID is the only column in columns_in_tree, and both OrderLineID and OrderID_and_OrderLineID are in columns_in_leaf.

Remember that when you identify a nonclustered index as unique, it will not add the clustering key to the tree level, it just adds it to the leaf level. But if the nonclustered index is not declared as unique, then SQL Server adds the clustering key to the tree level to uniquely identify the row.

If we had created a clustered index that we didn’t also identify as unique, then SQL Server adds a UNIQUIFIER to the clustered index key, which then also gets added to the nonclustered index.

Defining the clustering key as part of the non-clustered key

What I’ve seen lately when looking at indexes is this:

CREATE NONCLUSTERED INDEX [NCI_OrderID_2]     
     ON [Sales].[SQLskills_OrderLines] ([OrderID], [OrderID_and_OrderLineID])
     ON [USERDATA];
GO

This is the same as the NCI_OrderID index I created, but someone has specifically placed the clustering key in the nonclustered index key. I always ask why they added the clustering key. Typically the answer is: I know it’s going to be needed in the query. That’s fine, but understand that it doesn’t matter whether you add the clustering key to the key for a non-unique nonclustered index or not; if you don’t, SQL Server will add it.

We’ve kept it simple so far with a single column for our clustering key. What if we have multiple columns? We’ll change up our table a bit:

USE [WideWorldImporters]
GO

DROP TABLE IF EXISTS [Sales].[SQLskills_OrderLines];
GO

CREATE TABLE [Sales].[SQLskills_OrderLines](
     [OrderLineID] [int] NOT NULL,
     [OrderID] [int] NOT NULL,
     [OrderID_and_OrderLineID] [bigint] NOT NULL,
     [StockItemID] [int] NOT NULL,
     [Description] [nvarchar](100) NOT NULL,
     [PackageTypeID] [int] NOT NULL,
     [Quantity] [int] NOT NULL,
     [UnitPrice] [decimal](18, 2) NULL,
     [TaxRate] [decimal](18, 3) NOT NULL,
     [PickedQuantity] [int] NOT NULL,
     [PickingCompletedWhen] [datetime2](7) NULL,
     [LastEditedBy] [int] NOT NULL,
     [LastEditedWhen] [datetime2](7) NOT NULL,
     CONSTRAINT [PK_Sales_SQLskills_OrderLines] PRIMARY KEY CLUSTERED
     (
     [OrderID] ASC, [OrderLineID] ASC, [StockItemID] ASC, [PackageTypeID] ASC
     )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
     ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
     ) ON [USERDATA];
GO

INSERT INTO [Sales].[SQLskills_OrderLines](
     [OrderLineID]
     ,[OrderID]
     ,[OrderID_and_OrderLineID]
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen])
SELECT
     [OrderLineID]
     ,[OrderID]
     ,CAST(CAST([OrderID] AS VARCHAR(20)) + CAST([OrderLineID] AS VARCHAR(20)) AS BIGINT)     
     ,[StockItemID]
     ,[Description]
     ,[PackageTypeID]
     ,[Quantity]
     ,[UnitPrice]
     ,[TaxRate]
     ,[PickedQuantity]
     ,[PickingCompletedWhen]
     ,[LastEditedBy]
     ,[LastEditedWhen]
FROM [Sales].[OrderLines];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Four column clustered index for SQLskills_OrderLines

Four column clustered index for SQLskills_OrderLines

I don’t generally advocate a wide clustered index, but there are certain scenarios where it’s needed/appropriate. If we take this table and add a nonclustered index, not unique, on one column, then you’ll see all four columns from the clustering key in the tree level:

CREATE NONCLUSTERED INDEX [NCI_Quantity]
     ON [Sales].[SQLskills_OrderLines] ([Quantity])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index (with wide CI) for SQLskills_OrderLines

Nonclustered index (with wide CI) for SQLskills_OrderLines

For a large table, the size of this nonclustered index has the potential to be much, much larger than a table where the clustering key is only a single column.

If we add a unique nonclustered index, then that clustering key is only in the leaf level:

CREATE UNIQUE NONCLUSTERED INDEX [NCI_OrderID_and_OrderLineID]
     ON [Sales].[SQLskills_OrderLines] ([OrderID_and_OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Unique nonclustered index (with wide CI) for SQLskills_OrderLines

Unique nonclustered index (with wide CI) for SQLskills_OrderLines

But then…what happens if we add one or two columns from the clustering key into the nonclustered index key?

CREATE NONCLUSTERED INDEX [NCI_PickingCompletedWhen]
     ON [Sales].[SQLskills_OrderLines] ([PickingCompletedWhen], [OrderID], [OrderLineID])
     ON [USERDATA];
GO

EXEC sp_SQLskills_helpindex 'Sales.SQLskills_OrderLines';
GO
Nonclustered index with column from clustering key for SQLskills_OrderLines

Nonclustered index with column from clustering key for SQLskills_OrderLines

It doesn’t matter! All the columns from the clustering key are added to the tree level because I did not identify it as unique! If I had defined the nonclustered index as unique, then only those two columns that I specified in the key would be in the tree, and the other two columns from the clustering key would be part of the leaf level.

Conclusion

There are a couple points to remember when designing your nonclustered indexes. First, whenever you know that the column, or combination or columns, makes a unique key for a nonclustered index, I always recommend that you specify UNIQUE when you create the index. That information is great for the optimizer to have when generating query plans. Second, think carefully about the impact of your clustering key on your nonclustered indexes. Any non-unique nonclustered index will be larger (more pages, eventually more levels) because the entire clustering key will be in the tree level. Further, if you happen to have a clustering key with a column (or multiple columns) that are frequently modified, then those modifications have to be propagated to the tree levels of the nonclustered indexes as well.

Happy indexing!