Time for the first post in the Inside the Storage Engine series. I’m going to focus on SQL Server 2005 in this series and I’ll point out major differences between 2005 and previous versions. Please drop me a line if there’s something you’d like to see explained and demo’d.


Before jumping into how things work, I’d like to go over two commands I’ll be using a lot – DBCC PAGE and DBCC IND. These are both undocumented and unsupported commands, but are very safe to use as they’re used extensively inside and outside Microsoft when troubleshooting. Nevertheless, use at your own risk. They’re quite well known in the SQL community and I and others have publicized them before (I even demo’d them last year at ITForum in Spain).


To illustrate their use, I’m going to use a simple script I wrote to prove that page splits never roll back. I was having a discussion with someone a while ago about this question and the answer is always no. A page split occurs when an insert or update has to happen at a certain point in an index page, and there’s no room on the page to accomomodate the new or updated record. Page splits are done internally as separate ‘system’ transactions. Once a system transaction commits, it cannot be rolled back – even if the user transaction it was part of rolls back.


So, let’s run through the script. First thing to do is create a database containing a table with an index (as page splits only happen in indexes).



USE MASTER;


GO


IF DATABASEPROPERTY (N‘pagesplittest’, ‘Version’) > 0 DROP DATABASE pagesplittest;


GO


CREATE DATABASE pagesplittest;


GO


USE pagesplittest;


GO


CREATE TABLE t1 (c1 INT, c2 VARCHAR (1000));


CREATE CLUSTERED INDEX t1c1 ON t1 (c1);


GO


Now I’m going to fill up a page in the index, but leave a gap in the c1 values so that I can force a page split by inserting the missing key value.



INSERT INTO t1 VALUES (1, REPLICATE (‘a’, 900));


INSERT INTO t1 VALUES (2, REPLICATE (‘b’, 900));


INSERT INTO t1 VALUES (3, REPLICATE (‘c’, 900));


INSERT INTO t1 VALUES (4, REPLICATE (‘d’, 900));


– leave a gap at 5


INSERT INTO t1 VALUES (6, REPLICATE (‘f’, 900));


INSERT INTO t1 VALUES (7, REPLICATE (‘g’, 900));


INSERT INTO t1 VALUES (8, REPLICATE (‘h’, 900));


INSERT INTO t1 VALUES (9, REPLICATE (‘i’, 900));


GO


I can find out what the first index page is using the DBCC IND command:



DBCC IND (‘pagesplittest’, ‘t1′, 1);


GO


This command list all the pages that are allocated to an index. Here’s the output in this case:


SEQA3.jpg


The columns mean:




  • PageFID – the file ID of the page


  • PagePID – the page number in the file


  • IAMFID – the file ID of the IAM page that maps this page (this will be NULL for IAM pages themselves as they’re not self-referential)


  • IAMPID – the page number in the file of the IAM page that maps this page


  • ObjectID – the ID of the object this page is part of


  • IndexID – the ID of the index this page is part of


  • PartitionNumber – the partition number (as defined by the partitioning scheme for the index) of the partition this page is part of


  • PartitionID – the internal ID of the partition this page is part of




  • PageType – the page type. Some common ones are:



    • 1 – data page


    • 2 – index page


    • 3 and 4 – text pages


    • 8 – GAM page


    • 9 – SGAM page


    • 10 – IAM page


    • 11 – PFS page


  • IndexLevel – what level the page is at in the index (if at all). Remember that index levels go from 0 at the leaf to N at the root page (except in clustered indexes in SQL Server 2000 and 7.0 – where there’s a 0 at the leaf level (data pages) and a 0 at the next level up (first level of index pages))


  • NextPageFID and NextPagePID – the page ID of the next page in the doubly-linked list of pages at this level of the index


  • PrevPageFID and PrevPagePID – the page ID of the previous page in the doubly-linked list of pages at this level of the index

So you can see we’ve got a single page clustered index with an IAM page. Note that the page IDs returned may differ on your server. Let’s look at the data page:



DBCC TRACEON (3604);


GO


DBCC PAGE (pagesplittest, 1, 143, 3);


GO


The traceflag is to make the output of DBCC PAGE go to the console, rather than to the error log. The syntax for DBCC PAGE is:



dbcc page ( {‘dbname’ | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])


The filenum and pagenum parameters are taken from the page IDs that come from various system tables and appear in DBCC or other system error messages. A page ID of, say, (1:143) has filenum = 1 and pagenum = 143.


The printopt parameter has the following meanings:




  • 0 – print just the page header


  • 1 – page header plus per-row hex dumps and a dump of the page slot array (unless its a page that doesn’t have one, like allocation bitmaps)


  • 2 – page header plus whole page hex dump


  • 3 – page header plus detailed per-row interpretation

The per-row interpretation works for all page types, including the allocation bitmaps. In our case, we asked for a detailed output. I’ll explain the various parts of the output in a post about the anatomy of a page. Here’s the output from DBCC PAGE, with a bunch of the repeated per-row info removed for brevity:



PAGE: (1:143)



BUFFER:



BUF @0x02C49720


bpage = 0×05400000                   bhash = 0×00000000                   bpageno = (1:143)
bdbid = 8                            breferences = 0                      bUse1 = 22163
bstat = 0xc0010b                     blog = 0x32159bb                     bnext = 0×00000000


PAGE HEADER:



Page @0×05400000


m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0×4                 m_level = 0                          m_flagBits = 0×8000
m_objId (AllocUnitId.idObj) = 68     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042384384                                
Metadata: PartitionId = 72057594038386688                                 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 8                          m_slotCnt = 8                        m_freeCnt = 744
m_freeData = 7432                    m_reservedCnt = 0                    m_lsn = (18:113:2)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0                      


Allocation Status


GAM (1:2) = ALLOCATED                SGAM (1:3) = ALLOCATED              
PFS (1:1) = 0×60 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           


Slot 0 Offset 0×60 Length 917


Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS


Memory Dump @0x5C24C060


00000000:   30000800 01000000 0300f802 00110095 †0……………        
00000010:   03616161 61616161 61616161 61616161 †.aaaaaaaaaaaaaaa        


<snip> I’ve removed this section to save space


00000380:   61616161 61616161 61616161 61616161 †aaaaaaaaaaaaaaaa        
00000390:   61616161 61††††††††††††††††††††††††††aaaaa                   
UNIQUIFIER = [NULL]                 


Slot 0 Column 1 Offset 0×4 Length 4


c1 = 1                              


Slot 0 Column 2 Offset 0×11 Length 900


c2 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
a                                   


Slot 1 Offset 0x3f5 Length 917


Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS


Memory Dump @0x5C24C3F5


<snip> And again…


DBCC execution completed. If DBCC printed error messages, contact your system administrator.


As you can see from the output, each row is 917 bytes long and there’s only 744 bytes free (look at the m_freecnt value in the PAGE HEADER section). This means that we can’t insert another row on that page of the same length – there just isn’t space – but that’s what we’re going to do! Remember that this page currently has nine rows on it. Let’s force a page split:



BEGIN TRAN;


GO


INSERT INTO t1 VALUES (5, REPLICATE (‘a’, 900));


GO


Now we know there wasn’t enough room so the page must have split. Let’s check DBCC IND again to see if another page was allocated to the index – here’s the output:


SEQA41.jpg


Two pages have been added – an index page and another data page. Before we added the extra row and caused the page split, the index only needed one page. Now that there are two data pages, there needs to be an index page to allow searches through the index b-tree. Let’s take a look at the two data pages to see which rows are stored on which page. Doing DBCC PAGE on them shows that page (1:143) has 5 rows, with c1 values 1 through 5, and page (1:154) has the 4 rows with c1 values 6 through 9 (I’m not going to post all the DBCC PAGE output – that would make the post way too long and it gives you an incentive to try the commands out). This is what we’d expect, as the page split occurs at the insertion point, and the row being inserted is put onto the page that split.


Now let’s rollback the user transaction and see what happens:



ROLLBACK TRAN;


GO


Running the DBCC PAGE commands again shows that the index structure remains the same as after the split. Page (1:154) has the 4 rows on it from the split and page (1:143) has the other rows but not the one we inserted in the explicit transaction.


So, proof that a page split is never rolled back. I’ll be making much more use of these two DBCC commands in future posts and I’ll do the page anatomy one later this week. Let me know if there’s anything in particular you’d like to see described in this series.