On index key size, index depth, and performance

In my Insider newsletter a couple of weeks ago, I discussed how index fragmentation is often considered when designing indexes, but index depth often isn’t. In the newsletter I said I’d do a more comprehensive blog post with some data, so this is it.

Fanout and Index Depth

The index depth is determined by the fanout of the index. From the newsletter:

The fanout of an index measures, for a page at level x in an index, how many pages it references in the level below (nearer the leaf level). The higher the fanout is, the fewer the number of levels in the index.

The index key size impacts the size of the structure needed to reference it. Specifically, the index key is pushed up to all entries (and all levels) in the index as it’s used to allow navigation through the index from the root page down to the leaf level.

The larger the index key size, the fewer index records can be stored in an index page and so the lower the fanout. The lower the fanout is the more levels are required in the index, depending on the number of pages at the leaf level.

For instance, if the fanout is 10 in an index, that means each index page can hold 10 index records, referencing 10 pages at the level below in the index. If the index has 10,000 pages at the leaf level, there needs to be 1,000 pages in the level above, then 100 pages, then 10 pages, and finally the root page. That’s a total of 5 levels.

For the same data, if the index fanout is changed to 100, and the index has 10,000 pages at the leaf level, the next level needs 100 pages, and then there’s the root page. That’s a total of only three levels.

I want to measure whether there’s a noticeable performance difference based on the fanout, and hence index depth, of an index from varying it’s key size, for single-row select operations. There won’t be any noticeable effect on scans, as that only involves a single traversal of the index, to find the starting point of the scan. (Ok, it’s a little more complicated than that for scans if any of the index leaf-level pages change while the scan is positioned on them, but that’s not relevant here.)

Test Description

The test I’m going to use is:

  • Create a table with 2 million records, each record being large enough that only one record can fit on each data page (I was going to do ten million rows, but that was just taking too long)
  • Drop any existing clustered index
  • Create a new clustered index with varying key size from 8 to 900 bytes (creating the index after populating the table guarantees the tightest space usage)
  • Ensure that all the index is in memory (clear wait stats and make sure there are no page reads from disk during the next step)
  • Time how long it takes to do a single row lookup of all 2 million rows (run 5 tests and average the times)

Here’s the code for my test:

USE [master];

IF DATABASEPROPERTYEX (N'IndexDepthTest', N'Version') != 0
    DROP DATABASE [IndexDepthTest];

    NAME = N'IndexDepthTest_data',
    FILENAME = N'T:\IDT\IndexDepthTest_data.mdf',
    SIZE = 32768MB,
    NAME = N'IndexDepthTest_log',
    FILENAME = N'N:\IDT\IndexDepthTest_log.ldf',
    SIZE = 2048MB,
    FILEGROWTH = 256MB);



USE [IndexDepthTest];

CREATE TABLE [DepthTest] (
    [c2] CHAR (8) DEFAULT 'c2',		-- to allow 16-byte key
    [c3] CHAR (92) DEFAULT 'c3',	-- to allow 100-byte key
    [c4] CHAR (300) DEFAULT 'c4',	-- to allow 400-byte key
    [c5] CHAR (500) DEFAULT 'c5',	-- to allow 900-byte key
    [c6] CHAR (4000) DEFAULT 'c6');	-- to force one row per leaf page

GO 2000000

-- Run one of the following sets of DROP/CREATE statements

-- No existing clustered index to drop
CREATE CLUSTERED INDEX [8ByteKey] ON [DepthTest] ([c1]);

DROP INDEX [8ByteKey] ON [DepthTest];
CREATE CLUSTERED INDEX [16ByteKey] ON [DepthTest] ([c1], [c2]);

DROP INDEX [16ByteKey] ON [DepthTest];
CREATE CLUSTERED INDEX [100ByteKey] ON [DepthTest] ([c1], [c3]);

DROP INDEX [100ByteKey] ON [DepthTest];
CREATE CLUSTERED INDEX [400ByteKey] ON [DepthTest] ([c1], [c3], [c4]);

DROP INDEX [400ByteKey] ON [DepthTest];
CREATE CLUSTERED INDEX [900ByteKey] ON [DepthTest] ([c1], [c3], [c4], [c5]);

FROM sys.dm_db_index_physical_stats (
    DB_ID (N'IndexDepthTest'),
    OBJECT_ID (N'DepthTest'),

WHILE (@c != 5)
    DECLARE @a BIGINT = 0;

    WHILE (@a != 2000000)
        SELECT @b = [c1] FROM [DepthTest] WHERE [c1] = @a;
        SELECT @a = @a + 1;

    SELECT GETDATE () - @t;
    SELECT @c = @c + 1;

My test server is a Dell R720 with 16 physical cores (Intel E5-2670 @ 2.60 GHz), 64GB of memory, a Fusion-io/SanDisk 640GB SSD for storage, and I’m running the test on SQL Server 2012.

The test is designed both to make sure that the index is traversed all the way down to the leaf level (and the leaf record has to be accessed to check the existence of the value being selected and to retrieve it), and to make sure that all pages in the index are in memory.

I’ll walk through the steps for the 8-byte cluster key and then present the data for all the tests.

It took a few minutes to do the 2 million inserts, and then create the first clustered index. The results of the DMV call were:

index_depth index_level page_count           record_count
----------- ----------- -------------------- --------------------
4           0           2000000              2000000
4           1           4214                 2000000
4           2           16                   4214
4           3           1                    16

So with a index key size of 8 bytes, the index needs 4214 pages at level 1 in the index structure to hold references to all 2 million leaf-level pages. This means the fanout value is 2000000 / 4214, which is approximately 474.

The times for the 2 million selects for the 8-byte cluster key were 21.983s, 21.94s, 21.973s, 21.967s, 21.963s, with an average of 21.9652s, and a per-select average of 10.98 microseconds.

Test Results

Running the test for each of my test key sizes produced the following results:

Key Size Index Depth Total Page Count Fanout Average Time for selects Rough time per select
-------- ----------- ---------------- ------ ------------------------ ---------------------
8        4           2004231          474    21.9652 secs             10.9826 microsecs
16       4           2006980          288    21.8122 secs             10.9061 microsecs
100      5           2028182          72     22.9522 secs             11.4976 microsecs
400      6           2111124          19     23.7482 secs             11.8741 microsecs
900      8           2285728          8      25.5732 secs             12.7866 microsecs

The results clearly show that there’s a performance penalty for index seeks when the index has more levels. At each level of the index during a seek, a binary search takes place, to find the right index record to use to navigate down to the next level lower in the index, and this binary search takes CPU time.

For each additional level in the index, my results show that it takes roughly 0.4 to 0.5 microseconds of extra time, and that’s pure CPU time as there were no page reads during the tests.

You might wonder why the per-select time for the 16-byte key index is less than for the 8-byte key index, even though they have the same depth of 4 in my test. That’s to do with the binary search algorithm. On average, the number of comparisons that need to be done for a binary search of x elements is log (x) for log base 2. For the 8-byte index, the fanout (i.e. number of records per page for the binary search) is 474, giving an average number of comparisons of 8.9. For the 16-byte index, the fanout is 288, giving the average number of comparisons of 8.2. This slight drop accounts for the slight drop we see in the test time – it’s a tiny bit more efficient for a lower fanout with the same index depth. I’m not going to say that this means you’re better off with a GUID cluster key than a bigint – that’s a whole other discussion with much more to consider than just single-row select performance :-)


My results show that index depth does matter.

Index depth is determined by the number of rows in the index and the index key size. You can’t control the number of rows, but you can control the index key size. Where possible, the smaller you can keep the index key size, the smaller the index depth will be for the same number of records, and the faster an index traversal from root page to leaf level will be.

Even though we’re only talking about fractions of a microsecond, for workloads with huge numbers of single-row select operations, that all adds up, and especially so on older, slower processors where the difference will be more pronounced than in my tests. And these results also counter the argument that says “index depth doesn’t matter because it’s all in memory anyway”.

Bottom line – this is one more reason to keep your index keys as narrow as possible.

Btw, Kimberly goes into all this in much more detail in her excellent 4-hour Pluralsight course on SQL Server: Why Physical Database Design Matters.

Low priority locking wait types

[Edit 2016: Check out my new resource – a comprehensive library of all wait types and latch classes – see here.]

SQL Server 2014 (and Azure SQL Database V12) added some cool new functionality for online index operations to allow you to prevent long-term blocking because of the two blocking locks that online index operations require.

At the start of any online index operation, it acquires a S (share) table lock. This lock will be blocked until all transactions that are changing the table have committed, and while the lock is pending, it will block any transactions wanting to change the table in any way. The S lock is only held for a short amount of time, then dropped to an IS (Intent-Share) lock for the long duration of the operation. At the end of any online index operation, it acquires a SCH-M (schema modification) table lock, which you can think of as a super-exclusive lock. This lock will be blocked by any transaction accessing or changing the table, and while the lock is pending, it will block any transactions wanting to read or change the table in any way.

The new syntax allow you to specify how long the online index operation will wait for each of these locks, and what to do when the timeout expires (nothing: NONE, kill the online index operation: SELF, or kill the blockers of the online index operation: BLOCKERS – see Books Online for more info). While the online index operation is blocked, it shows a different lock wait type than we’re used to seeing, and any lock requests are allowed to essentially jump over the online index operation in the lock pending queues – i.e. the online index operation waits with lower priority than everything else on the system.

To demonstrate this, I’ve got a table called NonSparseDocRepository, with a clustered index called NonSparse_CL, and 100,000 rows in the table.

First, I’ll kick off an online index rebuild of the clustered index, specifying a 1 minute wait, and to kill itself of the wait times out:

ALTER INDEX [NonSparse_CL] ON [nonsparsedocrepository] REBUILD

I let it run for ten seconds or so, so make sure it got past the initial table S lock required. Now, in another connection, I’ll start a transaction that takes an IX table lock, which will block the final SCH-M lock the online index operation requires:


UPDATE [NonSparseDocRepository]
SET [c4] = '1'
WHERE [DocID] = 1;

And then I’ll wait until the drive light on my laptop goes off, which lets me know that the online index rebuild is stalled. If I look in sys.dm_os_waiting_tasks (using the script in this post), I’ll see the rebuild is blocked (script output heavily edited for clarity and brevity):

session_id exec_context_id scheduler_id wait_duration_ms wait_type                blocking_session_id resource_description
57         0               4            7786             LCK_M_SCH_M_LOW_PRIORITY 58                  objectlock

Look at the wait type: LCK_M_SCH_M_LOW_PRIORITY. The _LOW_PRIORITY suffix indicates that this is a special lock wait attributable to the online index operation being blocked.

This also neatly proves that the wait-at-low-priority feature applies to both the blocking locks that online index operations require, even if the first one isn’t blocked.

And eventually the online index operation fails, as follows:

Msg 1222, Level 16, State 56, Line 1
Lock request time out period exceeded.

If I leave that open transaction in the other connection (holding its IX table lock), and try the index rebuild again, with the exact same syntax, it’s immediately blocked and the sys.dm_os_waiting_tasks script shows:

session_id exec_context_id scheduler_id wait_duration_ms wait_type                blocking_session_id resource_description
57         0               4            8026             LCK_M_S_LOW_PRIORITY     58                  objectlock

This shows that the initial blocking lock is blocked, and is waiting at low priority.

So if either of these wait types show up during your regular wait statistics analysis, now you know what’s causing them.