Inside the Storage Engine: Proof that records are not always physically stored in index key order

I mentioned this in my Anatomy of a page post – its a common misconception that records in an index are ALWAYS stored in the same physical order as the logical order defined by the index key. Here’s proof for you that this is incorrect (as well as introducing you to the other dump styles for DBCC PAGE).

I’m going to create a table with a clustered index on an integer column and keep the table to a single page for simplicity:

USE [master];
GO
IF DATABASEPROPERTY (N'rowordertest', 'Version') > 0 DROP DATABASE [rowordertest];
GO

CREATE DATABASE [rowordertest];
GO
USE [rowordertest];
GO

CREATE TABLE [t1] ([c1] INT, [c2] VARCHAR (10));
CREATE CLUSTERED INDEX [t1c1] ON [t1] ([c1]);
GO

Now I’m going to insert a few rows into the table, with c1 from 2 to 5, deliberately not inserting c1 = 1:

INSERT INTO [t1] VALUES (2, REPLICATE ('b', 10));
INSERT INTO [t1] VALUES (3, REPLICATE ('c', 10));
INSERT INTO [t1] VALUES (4, REPLICATE ('d', 10));
INSERT INTO [t1] VALUES (5, REPLICATE ('e', 10));
GO

Now, using DBCC IND we see that the data page is (1:143) and dumping that with DBCC PAGE gives the following (skipping the header output):

DBCC IND ('rowordertest', 't1', 1);
GO

DBCC TRACEON (3604);
GO

DBCC PAGE ('rowordertest', 1, 143, 3);
GO
Slot 0 Offset 0x60 Length 27
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5BA3C060
00000000:   30000800 02000000 0300f802 0011001b  0...............
00000010:   00626262 62626262 626262             .bbbbbbbbbb
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 4
c1 = 2

Slot 0 Column 2 Offset 0x11 Length 10
c2 = bbbbbbbbbb

(skip slots 2 and 3 for brevity)

Slot 3 Offset 0xb1 Length 27
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5BA3C0B1
00000000:   30000800 05000000 0300f802 0011001b  0...............
00000010:   00656565 65656565 656565             .eeeeeeeeee
UNIQUIFIER = [NULL]

Slot 3 Column 1 Offset 0x4 Length 4
c1 = 5

Slot 3 Column 2 Offset 0x11 Length 10
c2 = eeeeeeeeee

DBCC PAGE with dump-style 3 always outputs the records on a page in their logical order (because that’s how the slot array is ordered). Notice that the record with c1 = 2 is stored at offset 0x60 in the page and the last record on the page with c1 = 5 is stored at offset 0xb1. Now we’ll insert a record with c1 = 1. This will become the first logical record in the index, but will it cause the page to be shuffled so the records can all be stored in logical order?

INSERT INTO [t1] VALUES (1, REPLICATE ('a', 10));
GO

DBCC PAGE ('rowordertest', 1, 143, 3);
GO
Slot 0 Offset 0xcc Length 27
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x61FCC0CC
00000000:   30000800 01000000 0300f802 0011001b  0...............
00000010:   00616161 61616161 616161             .aaaaaaaaaa
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 4
c1 = 1

Slot 0 Column 2 Offset 0x11 Length 10
c2 = aaaaaaaaaa

Slot 1 Offset 0x60 Length 27
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

(snip)

The answer is no. Even though the record with c1 = 1 is output by DBCC PAGE first, look at its offset within the page – 0xCC – clearly the last record on the page and stored in a different physical order than the logical order defined by the index key. Further proof can be obtained by looking at a raw hex dump of the page using dump style 2 with DBCC PAGE:

DBCC PAGE ('rowordertest', 1, 143, 2);
GO
(snip)

6204C000:   01010400 00400001 00000000 00000800  .....@..........
6204C010:   00000000 00000500 44000000 0f1fe700  ........D.......
6204C020:   8f000000 01000000 13000000 60000000  ............`...
6204C030:   16000000 00000000 00000000 00000000  ................
6204C040:   01000000 00000000 00000000 00000000  ................
6204C050:   00000000 00000000 00000000 00000000  ................
6204C060:   30000800 02000000 0300f802 0011001b  0...............
6204C070:   00626262 62626262 62626230 00080003  .bbbbbbbbbb0....
6204C080:   00000003 00f80200 11001b00 63636363  ............cccc
6204C090:   63636363 63633000 08000400 00000300  cccccc0.........
6204C0A0:   f8020011 001b0064 64646464 64646464  .......ddddddddd
6204C0B0:   64300008 00050000 000300f8 02001100  d0..............
6204C0C0:   1b006565 65656565 65656565 30000800  ..eeeeeeeeee0...
6204C0D0:   01000000 0300f802 0011001b 00616161  .............aaa
6204C0E0:   61616161 61616100 00000000 00000000  aaaaaaa.........
6204C0F0:   00000000 00000000 00000000 00000000  ................

(snip)

You can clearly see that the last row I inserted, with c1 = 1 and the replicated ‘a’s is stored after the other records on the page, even though its key is logically before the others. And just to nail the point home, I’ll use DBCC PAGE to look at the slot array:

DBCC PAGE ('rowordertest', 1, 143, 1);
GO
(snip)

OFFSET TABLE:

Row - Offset
4 (0x4) - 177 (0xb1)
3 (0x3) - 150 (0x96)
2 (0x2) - 123 (0x7b)
1 (0x1) - 96 (0x60)
0 (0x0) - 204 (0xcc)

(snip)

The slot array grows backwards, which is why its dumped in what looks like reverse logical order. You can see that slot 0, which represents the first logical record on the page, is stored at an offset greater than the others.

10 thoughts on “Inside the Storage Engine: Proof that records are not always physically stored in index key order

  1. HI Paul,

    what is logical order here…

    I understand that Physical order is something how the data is physically stored inside SQL Server..!

    Logical Order meaning, will there be any pointer assigned to the physical data?

    What is the difference between Physical and Logical Order of Data.. Could you please explain in detail?

    1. Logical order is the order based on the index keys. Physical order is the order in which the Storage Engine stores the records in data file pages. In a brand new index, conceptually, the records in an index will be in the same logical and physical order. When things like page splits occur, that ordering is broken. When there’s free space on a page, the Storage Engine can use it to store a record in the wrong physical order, but the slot array maintains logical ordering, as the demo shows.

    1. If the table has a clustered index, the rows will be read from the page in key order (using the slot array at the bottom of the page). If the table is a heap, the rows will be read in the order defined by the slot array (usually the insert order). Note that the order of rows you get when doing a select may not be exactly in key order depending on how the Access Methods reads the pages. If you want a definite ordering, you must use an ORDER BY.

  2. Hi Paul,
    If I only know the offset at the beginning of each record, how do I get the length of each record?

      1. thannks paul,it help me a lot。
        I have one more question,Is the data structure of the NDF file consistent with that of the MDF file?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.