Do changes to index keys really do in-place updates?

There was an interesting discussion on Twitter this morning (started by my good friend Erin Stellato (blog|twitter)) about in-place updates for index records where the key value changes and the record remains on the same page. Various sources including the SQL Server 2008 Internals book (pg 361 – I didn't write or review that bit :-) describe the process, but they describe it incorrectly unfortunately. They say that the record will remain in the exact same location on the same page and only the key storage bytes need to change.

This is not true. True in-place updates of index key values do not happen in 2005 onwards. Let's go down the rabbit hole…

Here's a simple example:

CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (2000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 500));
GO
CHECKPOINT;
GO

Now we look at the page itself (and you can get the stored-proc I use below from my old blog post Inside The Storage Engine: sp_AllocationMetadata – putting undocumented system catalog views to work):

EXEC sp_allocationMetadata 'test';
GO

Object Name Index ID Alloc Unit ID     Alloc Unit Type First Page Root Page First IAM Page
———– ——– —————– ————— ———- ——— ————–
test        1        72057594039762944 IN_ROW_DATA     (1:144)    (1:144)   (1:145)

DBCC TRACEON (3604);
GO
DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

Memory Dump @0x0000000016A6A000

0000000016A6A000:   01010400 00c20001 00000000 00000800 †…..Â……….
0000000016A6A010:   00000000 00000100 1c000000 bd174108 †…………½.A.
0000000016A6A020:   90000000 01000000 1d000000 8a000000 †…………Š…
0000000016A6A030:   12000000 00000000 00000000 239dbd4a †…………#.½J
0000000016A6A040:   00000000 00000000 00000000 00000000 †…………….
0000000016A6A050:   00000000 00000000 00000000 00000000 †…………….
0000000016A6A060:   30000800 01000000 03000002 001100e1 †0…………..á
0000000016A6A070:   07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000016A6A080:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A090:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0A0:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0B0:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000016A6A820:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A830:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A840:   6c000021 21212121 21212121 21212121 †l..!!!!!!!!!!!!!
0000000016A6A850:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
<snip>
0000000016A6BFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0:   21212121 21212121 21212121 21216000 †!!!!!!!!!!!!!!`.

OFFSET TABLE:

Row – Offset                        
0 (0x0) – 96 (0x60)

Now I'll update the record:

UPDATE test SET c1 = 2 WHERE c1 =1;
GO

And let's look at the log records generated (in the output below I've removed all the extraneous stuff from the log that occurs before my transaction):

SELECT [Current LSN], [Operation], [Context],
 [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation           Context               Log Record Length Page ID        Slot ID
———————– ——————- ——————— —————– ————– ———–
0000001d:00000096:0011  LOP_BEGIN_XACT      LCX_NULL              112               NULL           NULL
0000001d:00000096:0012  LOP_DELETE_ROWS     LCX_MARK_AS_GHOST     2116              0001:00000090  0
0000001d:00000096:0013  LOP_MODIFY_HEADER   LCX_PFS               76                0001:00000001  0
0000001d:00000096:0014  LOP_SET_BITS        LCX_PFS               56                0001:00000001  0
0000001d:00000096:0015  LOP_INSERT_ROWS     LCX_CLUSTERED         2120              0001:00000090  1
0000001d:00000096:0016  LOP_COMMIT_XACT     LCX_NULL              52                NULL           NULL
0000001d:000000a2:0001  LOP_EXPUNGE_ROWS    LCX_CLUSTERED         64                0001:00000090  0
0000001d:000000a2:0002  LOP_SET_BITS        LCX_PFS               56                0001:00000001  0
0000001d:000000a2:0003  LOP_MODIFY_HEADER   LCX_PFS               76                0001:00000001  0

The row is deleted and then inserted. Look at the log record lengths! And look at the slot numbers. The new row is in slot 1 (the LOP_INSERT_ROWS record) and slot 0 is ghosted (the LOP_DELETE_ROWS record), and then we see ghost cleanup running (the LOP_EXPUNGE_ROWS record), which removes the old slot 0, and the slot array shifts down so the new record is slot 0 again. See my blog post Inside the Storage Engine: Ghost cleanup in depth for info on ghost cleanup internals.

But is it in the same physical place on the page? No – a whole new record was created.

DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

Memory Dump @0x0000000013CAC000

0000000013CAC000:   01010400 00800001 00000000 00000800 †…………….
0000000013CAC010:   00000000 00000100 1c000000 bd172210 †…………½.".
0000000013CAC020:   90000000 01000000 1d000000 a2000000 †…………¢…
0000000013CAC030:   01000000 76020000 00000000 239dbd4a †….v…….#.½J
0000000013CAC040:   01000000 00000000 00000000 00000000 †…………….
0000000013CAC050:   00000000 00000000 00000000 00000000 †…………….
0000000013CAC060:   3c000800 01000000 03000002 001100e1 †<…………..á
0000000013CAC070:   07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000013CAC080:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC090:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000013CAC820:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC830:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC840:   6c300008 00020000 00030000 02001100 †l0…………..
0000000013CAC850:   e1075061 756c5061 756c5061 756c5061 †á.PaulPaulPaulPa
0000000013CAC860:   756c5061 756c5061 756c5061 756c5061 †ulPaulPaulPaulPa
<snip>
0000000013CADFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000013CADFF0:   21212121 21212121 21212121 41084108 †!!!!!!!!!!!!A.A.

OFFSET TABLE:

Row – Offset                        
0 (0x0) – 2113 (0x841)              

Notice that where the original record ended (offset 0x830 into the page). In the first page dump, everything after that is empty space. In the second page dump, the new record starts there. And look at the slot array (called the offset table by DBCC PAGE). In the first page dump the original record started at offset 0x60, and in the second dump it starts at offset 0x841. Even though ghost cleanup ran, the record did NOT migrate to it's original location.

It's very clear that an in-place update did not happen, and that the entire record is recreated in it's entirety.

In this case, the record ended up being physical slot 0 again, but that's a pathological case because there's only one slot on the page. If you repeat the experiment with two records (e.g. c1=1 and c1=2) and then update c1 of the first record to 3, you'll see the new record being physical slot 2, then after ghost cleanup becoming physical slot 1. But again, it's deleted and reinserted in it's entirety.

Now I'll show you a case where the row has to move to a new page – definitely not an in-place update!

CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (4000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 1000));
GO
INSERT INTO test VALUES (2, REPLICATE ('Erin', 1000));
GO
CHECKPOINT;
GO

I've filled up the page with two rows.

DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

<snip>
Metadata: ObjectId = 2105058535      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 8                          m_slotCnt = 2                        m_freeCnt = 58
m_freeData = 8130                    m_reservedCnt = 0                    m_lsn = (29:150:2)
<snip>
0000000016A6BFA0:   696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFB0:   696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFC0:   696e0000 21212121 21212121 21212121 †in..!!!!!!!!!!!!
0000000016A6BFD0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0:   21212121 21212121 21212121 11106000 †!!!!!!!!!!!!..`.

OFFSET TABLE:

Row – Offset                        
1 (0x1) – 4113 (0x1011)             
0 (0x0) – 96 (0x60)

Now I'll update the first row to have c1 = 3 so my row moves away from Erin's :-)

UPDATE test SET c1 = 3 WHERE c1 =1;
GO

And look at the log:

SELECT [Current LSN], [Operation], [Context],
 [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation           Context                Log Record Length Page ID        Slot ID
———————– ——————- ———————- —————– ————– ———–
0000001d:000000a3:0015  LOP_BEGIN_XACT      LCX_NULL               136               NULL           NULL
0000001d:000000a3:0016  LOP_MODIFY_ROW      LCX_PFS                80                0001:00000001  0
0000001d:000000a3:0017  LOP_MODIFY_ROW      LCX_IAM                88                0001:00000091  0
0000001d:000000a3:0018  LOP_HOBT_DELTA      LCX_NULL               64                NULL           NULL
0000001d:000000a3:0019  LOP_FORMAT_PAGE     LCX_INDEX_INTERIOR     84                0001:00000092  -1
0000001d:000000a3:001a  LOP_INSERT_ROWS     LCX_INDEX_INTERIOR     84                0001:00000092  0
0000001d:000000a3:001b  LOP_ROOT_CHANGE     LCX_CLUSTERED          96                0001:00000082  52
0000001d:000000a3:001c  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL
0000001d:000000a3:001d  LOP_BEGIN_XACT      LCX_NULL               120               NULL           NULL
0000001d:000000a3:001e  LOP_MODIFY_ROW      LCX_PFS                80                0001:00000001  0
0000001d:000000a3:001f  LOP_MODIFY_ROW      LCX_IAM                88                0001:00000091  0
0000001d:000000a3:0020  LOP_HOBT_DELTA      LCX_NULL               64                NULL           NULL
0000001d:000000a3:0021  LOP_FORMAT_PAGE     LCX_HEAP               84                0001:00000093  -1
0000001d:000000a3:0022  LOP_INSERT_ROWS     LCX_CLUSTERED          4092              0001:00000093  0
0000001d:000000a3:0023  LOP_DELETE_SPLIT    LCX_CLUSTERED          60                0001:00000090  1
0000001d:000000a3:0024  LOP_MODIFY_HEADER   LCX_HEAP               84                0001:00000090  0
0000001d:000000a3:0025  LOP_INSERT_ROWS     LCX_INDEX_INTERIOR     88                0001:00000092  1
0000001d:000000a3:0026  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL
0000001d:000000a3:0027  LOP_INSERT_ROWS     LCX_CLUSTERED          4120              0001:00000093  1
0000001d:000000a3:0028  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL

This is really interesting. Here's what happens (missing a few things for clarity):

  • 0000001d:000000a3:0016: page (1:146) is allocated to be the new root page for the clustered index
  • 0000001d:000000a3:0019: it gets minimally formatted (just the page header – although the page header is 96 bytes, there is 12 bytes of empty space at the end)
  • 0000001d:000000a3:001a: the index record pointing at (1:144) is inserted in the root page
  • 0000001d:000000a3:001b: the index root os changedd from (1:144) to (1:146) in metadata
  • 0000001d:000000a3:001e: page (1:147) is allocated to be the second data page at the leaf of the clustered index
  • 0000001d:000000a3:0022: a page split of page (1:144) occurs, moving the Erin row to slot 0 of page (1:147)
  • 0000001d:000000a3:0023: the Erin row is removed from page (1:144) – not ghosted
  • 0000001d:000000a3:0025: the index record pointing at (1:147) is inserted in the root page
  • 0000001d:000000a3:0027: the updated Paul row is inserted as slot 1 of page (1:147)

As you can see, this is as far as you can get from doing an in-place update. Page (1:144) is left with a single ghost-record on (the Paul record – the Erin record isn't ghosted because it was moved because of a split). The next access to the page will cause ghost cleanup to remove it and deallocate the page.

So no, in-place updates of key values do not happen, even though they appear to at first glance. You can experiment to show the same thing happening in nonclustered indexes too.

Another misconception busted!

PS Some of you may remember when such in-place updates really *did* happen. That was before 2005 when the Storage Engine (specifically my old Access Methods team) did the index maintenance (changing clustered and nonclustered index structures based on key value changes). A whole bunch of features in 2005 meant that we switched it over to the Query Processor to drive the index maintenance, and nifty short-cuts like we used to do aren't possible any more.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.