Inside the Storage Engine: Anatomy of a page

Next up in the Inside the Storage Engine series is a discussion of page structure. Pages exist to store records. A database page is an 8192-byte (8KB) chunk of a database data file. They are aligned on 8KB boundaries within the data files, starting at byte-offset 0 in the file. Here’s a picture of the basic structure:

Header

The page header is 96 bytes long. What I’d like to do in this section is take an example page header dump from DBCC PAGE and explain what all the fields mean. I’m using the database from the page split post and I’ve snipped off the rest of the DBCC PAGE output.

DBCC TRACEON (3604);
DBCC PAGE (N'pagesplittest', 1, 143, 1);
GO

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 68     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594042384384
Metadata: PartitionId = 72057594038386688                                 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (1:154)
pminlen = 8                          m_slotCnt = 4                        m_freeCnt = 4420
m_freeData = 4681                    m_reservedCnt = 0                    m_lsn = (18:116:25)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 1333613242

Here’s what all the fields mean (note that the fields aren’t quite stored in this order on the page):

m_pageId
- This identifies the file number the page is part of and the position within the file. In this example, (1:143) means page 143 in file 1.
m_headerVersion
- This is the page header version. Since version 7.0 this value has always been 1.
m_type
- This is the page type. The values you’re likely to see are:
  - 1 – data page. This holds data records in a heap or clustered index leaf-level.
  - 2 – index page. This holds index records in the upper levels of a clustered index and all levels of non-clustered indexes.
  - 3 – text mix page. A text page that holds small chunks of LOB values plus internal parts of text tree. These can be shared between LOB values in the same partition of an index or heap.
  - 4 – text tree page. A text page that holds large chunks of LOB values from a single column value.
  - 7 – sort page. A page that stores intermediate results during a sort operation.
  - 8 – GAM page. Holds global allocation information about extents in a GAM interval (every data file is split into 4GB chunks – the number of extents that can be represented in a bitmap on a single database page). Basically whether an extent is allocated or not. GAM = Global Allocation Map. The first one is page 2 in each file. More on these in this post.
  - 9 – SGAM page. Holds global allocation information about extents in a GAM interval. Basically whether an extent is available for allocating mixed-pages. SGAM = Shared GAM. the first one is page 3 in each file. More on these in this post.
  - 10 – IAM page. Holds allocation information about which extents within a GAM interval are allocated to an allocation unit (portion of a table or index). IAM = Index Allocation Map. More on these in this post.
  - 11 – PFS page. Holds allocation and free space information about pages within a PFS interval (every data file is also split into approx 64MB chunks – the number of pages that can be represented in a byte-map on a single database page. PFS = Page Free Space. The first one is page 1 in each file. More on these in this post.
  - 13 – boot page. Holds information about the database. There’s only one of these in the database. It’s page 9 in file 1.
  - 15 – file header page. Holds information about the file. There’s one per file and it’s page 0 in the file.
  - 16 – diff map page. Holds information about which extents in a GAM interval have changed since the last full or differential backup. The first one is page 6 in each file.
  - 17 – ML map page. Holds information about which extents in a GAM interval have changed while in bulk-logged mode since the last backup. This is what allows you to switch to bulk-logged mode for bulk-loads and index rebuilds without worrying about breaking a backup chain. The first one is page 7 in each file.
  - 18 – a page that’s be deallocated by DBCC CHECKDB during a repair operation.
  - 19 – the temporary page that ALTER INDEX … REORGANIZE (or DBCC INDEXDEFRAG) uses when working on an index.
  - 20 – a page pre-allocated as part of a bulk load operation, which will eventually be formatted as a ‘real’ page.
m_typeFlagBits
- This stores a few values about the page. For data and index pages, if the field is 4, that means all the rows on the page are the same fixed size. If a PFS page has m_typeFlagBits of 1, that means that at least one of the pages in the PFS interval mapped by the PFS page has at least one ghost record.
m_level
- This is the level that the page is part of in the b-tree.
- Levels are numbered from 0 at the leaf-level and increase to the single-page root level (i.e. the top of the b-tree).
- In SQL Server 2000, the leaf level of a clustered index (with data pages) was level 0, and the next level up (with index pages) was also level 0. The level then increased to the root. So to determine whether a page was truly at the leaf level in SQL Server 2000, you need to look at the m_type as well as the m_level.
- For all page types apart from index pages, the level is always 0.
m_flagBits
- This stores a number of different flags that describe the page. For example, 0x200 means that the page has a page checksum on it (as our example page does) and 0x100 means the page has torn-page protection on it.
- Some bits are no longer used from SQL Server 2005 onward.
m_objId
m_indexId
- In SQL Server 2000, these identified the actual relational object and index IDs to which the page is allocated. In SQL Server 2005 this is no longer the case. The allocation metadata totally changed so these instead identify what’s called the allocation unit that the page belongs to. This post explains how an allocation unit ID is calculated. Note that for databases upgraded from SQL Server 2000, they will still be the the actual object ID and index ID. Also for databases on all versions, many system tables still have these be the actual object and index IDs.
m_prevPage
m_nextPage
- These are pointers to the previous and next pages at this level of the b-tree and store 6-byte page IDs.
- The pages in each level of an index are joined in a doubly-linked list according to the logical order (as defined by the index keys) of the index. The pointers do not necessarily point to the immediately adjacent physical pages in the file (because of fragmentation).
- The pages on the left-hand side of a b-tree level will have the m_prevPage pointer be NULL, and those on the right-hand side will have the m_nextPage be NULL.
- In a heap, or if an index only has a single page, these pointers will both be NULL for all pages. There’s a special case when they won’t be NULL – if the heap is rebuilt using ALTER TABLE … REBUILD. This uses the index rebuild code to build the leaf-level of a clustered index, but the linkages aren’t actually used for anything. See here for more details.
pminlen
- This is the size of the fixed-length portion of the records on the page.
m_slotCnt
- This is the count of records on the page.
m_freeCnt
- This is the number of bytes of free space in the page.
m_freeData
- This is the offset from the start of the page to the first byte after the end of the last record on the page. It doesn’t matter if there is free space nearer to the start of the page.
m_reservedCnt
- This is the number of bytes of free space that has been reserved by active transactions that freed up space on the page. It prevents the free space from being used up and allows the transactions to roll-back correctly. There’s a very complicated algorithm for changing this value.
m_lsn
- This is the Log Sequence Number of the last log record that changed the page.
m_xactReserved
- This is the amount that was last added to the m_reservedCnt field.
m_xdesId
- This is the internal ID of the most recent transaction that added to the m_reservedCnt field.
m_ghostRecCnt
- The is the count of ghost records on the page.
m_tornBits
- This holds either the page checksum or the bits that were displaced by the torn-page protection bits – depending on what form of page protection is turned on for the database.

Note that I didn’t include the fields starting with Metadata:. That’s because they’re not part of a page header. During SQL Server 2005 development I did some major work rewriting the guts of DBCC PAGE and to save everyone using it from having to do all the system table look-ups to determine what the actual object and index IDs are, I changed DBCC PAGE to do them internally and output the results.

Records

See this blog post for details.

Slot Array

It’s a very common misconception that records within a page are always stored in logical order. This is not true. There is another misconception that all the free-space in a page is always maintained in one contiguous chunk. This also is not true. (Yes, the image above shows the free space in one chunk and that very often is the case for pages that are being filled gradually.)

If a record is deleted from a page, everything remaining on the page is not suddenly compacted – inserters pay the cost of compaction when it’s necessary, not deleters.

Consider a completely full page – this means that record deletions cause free space holes within the page. If a new record needs to be inserted onto the page, and one of the holes is big enough to squeeze the record into, why go to the bother of compacting it? Just stick the record in and carry on.

What if the record should logically have come at the end of all other records on the page, but we’ve just inserted it in the middle – doesn’t that screw things up somewhat? No, because the slot array is ordered and gets reshuffled as records are inserted and deleted from pages. As long as the first slot array entry points to the logically first record on the page, everything’s fine.

Each slot entry is just a two-byte pointer into the page – so it’s far more efficient to manipulate the slot array than it is to manipulate a bunch of records on the page. Only when the Storage Engine knows there’s enough free space contained within the page to fit in a record, but it’s spread about the page does it compact the records on the page to make the free space into a contiguous chunk. One interesting fact is that the slot array grows backwards from the end of the page, so the free space is squeezed from the top by new rows, and from the bottom by the slot array.

Posted Under:

Inside the Storage Engine, On-Disk Structures

80 thoughts on “Inside the Storage Engine: Anatomy of a page”

Gail says:

October 31, 2007 at 12:45 pm

Very interesting. Thanks for that.

Based on your breakdown of the m_type, these are the system pages at the beginning of each db file

0 – File header
1 – PFS
2 – GAM
3 – SGAM
6 – DCM
7 – ML

Out of interest, what are on pages 4 and 5?

I had a look at them, but they seem to both be empty (DBCC page returns m_type=0, m_pageId = (0,0)), but the PFS page lists them as fill, and the DCM lists them a changed (in the test DB I looked at at least)

Reply
Paul Randal says:

October 31, 2007 at 7:27 pm

Hi Gail – pages 4 and 5 are unused. Page 5 used to be the first page of the sysobjects table in SQL Server 2000, and I don’t remember what page 4 was – I don’t believe it was used in SQL Server 2000. Thanks

Reply
1. sravan says:
  
  October 3, 2014 at 6:46 am
  
  9-boot page
  
  what is page 8?
  
  Reply
  1. Paul Randal says:
    
    October 7, 2014 at 8:00 am
    
    First page of a random system table.
    
    Reply
    1. HEMANTH says:
      
      October 8, 2014 at 6:32 am
      
      Mr.Paul Is It page 8:First page of a random system table.I Need Proof About This One
      
      Reply
      1. Paul Randal says:
        
        October 8, 2014 at 10:33 am
        
        Go look at page 8 using DBCC PAGE and you’ll see it’s used for a system table.
Gail says:

November 1, 2007 at 8:10 am

Thanks.

Reply
Alejandro Chavez says:

December 7, 2007 at 9:59 pm

Paul,
I think if you want to obtain the AllocationUnitId you only have to perform the following operation

AllocationUnitID = AllocationUnitID.objId * 2^16+2^56

Where the AllocationUnitID.objId is the value located in the page header (bytes 24-27)

Again, you have shown very important details of the internal of SQL Server. I enjoyed your session with you and Kimberly in Vegas
Regards,
Alejandro Chavez

Reply
Eric says:

December 11, 2008 at 6:54 am

Thanks, for very interesting info.

In SQL 2000 m_objId in page header – it’s really object ID.
In SQL 2005 and 2008 dbcc page show abstract value for m_objId, and really objectID in Metadata.
Can You tell: How server calculate this data?

Reply
Lonely Swordsman says:

May 29, 2009 at 8:12 pm

Thank everyone,what does the value ‘0’ for m_type mean?please.

Reply
paul says:

May 29, 2009 at 8:20 pm

If the data file was zero initialized, m_type = 0 is an unformatted page (or one that has been subsequently corrupted). If the data file was instant initialized, the page may or may not be corrupt.

Whatever the case, if you’re seeing that on a page that’s supposed to be allocated, that’s corruption.

Reply
Lonely Swordsman says:

June 1, 2009 at 8:24 am

When I dump the pages of a data file by DBCC PAGE(…),both of m_type are "0" for the fifth(4) and sixth(5) dumped pages,so I don’t think they may be all corruption.The following are the fifth and sixth dumped pages:

PAGE HEADER:
Page @0x108E8000
m_pageId = (0:0) m_headerVersion = 0 m_type = 0
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 0 m_indexId (AllocUnitId.idInd) = 0 Metadata: AllocUnitId = 0
Metadata: PartitionId = 0 Metadata: IndexId = -1 Metadata: ObjectId = 0
m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 0
m_slotCnt = 0 m_freeCnt = 0 m_freeData = 0
m_reservedCnt = 0 m_lsn = (0:0:0) m_xactReserved = 0
m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 0

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x0 0_PCT_FULL
DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED

PAGE HEADER:
Page @0x108EA000
m_pageId = (0:0) m_headerVersion = 0 m_type = 0
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 0 m_indexId (AllocUnitId.idInd) = 0 Metadata: AllocUnitId = 0
Metadata: PartitionId = 0 Metadata: IndexId = -1 Metadata: ObjectId = 0
m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 0
m_slotCnt = 0 m_freeCnt = 0 m_freeData = 0
m_reservedCnt = 0 m_lsn = (0:0:0) m_xactReserved = 0
m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 0

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x0 0_PCT_FULL
DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED

Thank you very much for your answer.

Reply
paul says:

June 1, 2009 at 2:16 pm

As I explained in a comment a few above yours, these are reserved pages that used to have special meaning. They are formatted with zeroes.

Reply
Lonely Swordsman says:

June 2, 2009 at 3:42 am

Yes,I see,thanks a lot.

Reply
Lonely Swordsman says:

June 2, 2009 at 5:16 am

Anyone has detailed manual for SQL SERVER SQLdiag and Microsoft PSS PerfStats?they are all SQL SERVER performance tools,but I can’t find the manuals anywhere,if you have ,would you like to send a copy to me?thanks.

Reply
max says:

June 24, 2009 at 4:37 am

> In SQL 2000 m_objId in page header – it’s really object ID.
> In SQL 2005 and 2008 dbcc page show abstract value for m_objId, and really objectID in Metadata.
> Can You tell: How server calculate this data?

In SQL 2005/2008 m_objId is calculated from auid field in sys.sysallocunits. Also, through this table real objId connected with m_objId. I think, partitioning is guilty.

Reply
Ashish says:

August 12, 2009 at 10:56 pm

can anyone tell me ,how the sql server 2005/2008 calculate MetaData information in Page header ( MetaData:AllocUnitId, Metadata:PartitionId , MetaData: indexId, MetaData:ObjectId) .Thanks in advance.

Reply
paul says:

August 13, 2009 at 4:10 am

That info is calculated for you by DBCC PAGE – otherwise you need to do a series of lookups using various system tables. Why do you need to do it yourself when I put it into DBCC PAGE already?

Reply
Ashish K. Bajpai says:

August 25, 2009 at 11:31 pm

Hello Paul,

Thanks for replying.Actually when i saw SQL server 2005 internal system critical tables….I’ve no idea about that filed….and how they gives there valuse….I think that values are important for me to study a database(SQL 2005) internal.

Reply
himanshu says:

March 22, 2013 at 3:02 am

Hi sir,
How to make relation between table object id in syschobj table and object id which page header hold.

in above example object id is 2073058421 and page object id is 444

Reply
1. Paul Randal says:
  
  March 22, 2013 at 6:41 am
  
  The output from DBCC PAGE tells you the real object ID – look for the Metadata: ObjectId value.
  
  Reply
Rajesh Kumaran says:

April 3, 2013 at 6:11 am

Hi Paul,

Could you please tell me why would an object_id be 0?

DBCC PAGE (‘dbresume’, 10, 1581624, 1)

Metadata: ObjectId = 0

Thanks
Raj

Reply
1. Paul Randal says:
  
  April 3, 2013 at 6:36 am
  
  If the page is deallocated, or the allocation unit it used to belong to doesn’t exist any more, or the page is corrupt, or the page has never been allocated, or…
  
  Reply
Rajesh Kumaran says:

April 3, 2013 at 6:40 am

OK , thanks very much Paul

Thanks
Raj

Reply
constantin.negut says:

August 15, 2013 at 8:12 am

Inside 8192 there are many structures.
Which structure participate at beeing disk writed as data.
I mean in this memory area(on which the above resides) we have no memory address.
It have to be memory page address and who and what maps this page on disk.

Reply
1. constantin.negut says:
  
  August 15, 2013 at 8:27 am
  
  I reply myself.i think that sql memory manager call windows kernel MM which seems not to care about sql pages.It is just interested in page having 8192 bytes.
  The IO read/writes on disk also have no binding to any sql data.These can be anyway in this sql pages.Only the rest of interpretation is done by SQL, after it’s request for memory are satisfied by the kernel
  
  I do not know if this request allocation are threaded or not ,or if kernel thread in a certain way the requests.
  
  Reply
  1. Paul Randal says:
    
    August 22, 2013 at 9:41 am
    
    I get it now. The 8192 becomes a SQL Server data page once it’s read into memory. Windows knows nothing about what the memory is being used for, or what the data is that it’s reading from/writing to disk.
    
    Reply
2. Paul Randal says:
  
  August 22, 2013 at 9:40 am
  
  Not sure I quite understand your question – but the entire 8192 are written to disk, even if only a single byte on the page is changed.
  
  Reply
naveen says:

August 24, 2013 at 6:55 am

hai paul,
nice article this gives the entire info about the page anatomy of datafile.
can you please tell me the exact size of a slot array in page
and for a page how many slot arrays are there.

Reply
1. Paul Randal says:
  
  August 24, 2013 at 9:49 am
  
  Two bytes per row on the page, no fixed size. One slot array per page.
  
  Reply
  1. naveen says:
    
    August 25, 2013 at 9:10 am
    
    Thanks paul for your reply and is there any fixed size for a slot i.e i want to know the actual size of a slot on the page cause in one page i have 5 slots and in the other page i have only one slot
    
    Reply
    1. Paul Randal says:
      
      August 25, 2013 at 4:04 pm
      
      Yes, it’s two bytes per row, fixed size, growing backwards from the end of the page.
      
      Reply
      1. Hai Ton says:
        
        May 27, 2014 at 11:31 am
        
        Hi Paul,
        I was confused about the slot array when watching Index Internal part of mcm training from Kimberly. I thought that slot array only take 2 bytes per page. After reading your explanation, I finally understand it.
        
        Thanks.
pardhu says:

August 30, 2013 at 9:03 pm

Very interesting article.
I hope you could answer one of the practical problems that a dba usually faces at his work.
DBCC Checkdb() has returned the corrupted page id. I want to restore the healthy page from the full and transactional log backups that i have. How do i do that? Which of the paramenters will help me?

Thank You

Reply
1. Paul Randal says:
  
  September 1, 2013 at 10:59 am
  
  The page ID in the corruption message is the one you use for the single-page restore.
  
  Reply
ES says:

September 23, 2013 at 4:04 am

Hi Paul,

While testing DBCC PAGE/IND I ran “SELECT TOP 10 * INTO TEST10 FROM Users” statement. And I am a bit stuck…I know that the original table holds 12 rows per page, so I expected to see 1 in-row data page allocated for the new table.
DBCC IND shows 3 pages: 1 IAM, 2 DATA; and the first of data pages appears to be empty:
m_slotCnt = 0
m_freeCnt = 8096
m_freeData = 96
… and the second page holds the data.
Could you please clarify why is there an empty pages being allocated?
Thank you very much!

Reply
1. Paul Randal says:
  
  September 23, 2013 at 6:47 am
  
  Can you email me a repro please?
  
  Reply
  1. ES says:
    
    September 23, 2013 at 8:42 am
    
    Sent to paul@sqlskills.com
    Thanks!
    
    Reply
Naveen M says:

March 13, 2014 at 3:02 am

Hai Paul,

Great article about the anatomy of the pages inside the ingine and one more can we change the page size in sql server 2012 or 2014 versions of sql server. however i know that we can’t change them in 2008R2 release but did not have clarity about the sql server 2012 page size.

Reply
1. Paul Randal says:
  
  March 13, 2014 at 1:51 pm
  
  No – page size is fixed.
  
  Reply
  1. Naveen M says:
    
    March 14, 2014 at 2:15 am
    
    Thanks Paul For your quick help.:)
    
    Reply
Sudhakar says:

April 15, 2014 at 7:13 am

Do you an idea about m_type = 20 in SQL 2008 R2?

Reply
1. Paul Randal says:
  
  April 15, 2014 at 7:21 am
  
  Yes – it’s a page pre-allocated as part of a bulk load. It’ll eventually be formatted correctly.
  
  Reply
Hans says:

April 5, 2015 at 10:16 am

SQL Server stores data in pages of 8k (8192) bytes. In a data page, 96 bytes are reserved for page header. Considering the maximum allowed capacity of 8060 bytes for a data page, there are still 36 bytes remained. Where does this block of 36 bytes go?

Reply
1. Paul Randal says:
  
  April 5, 2015 at 10:36 am
  
  The 8060 bytes is the maximum size of one record, not the amount of data space on the page – 8096 bytes.
  
  For a maximum-sized record of 8060 bytes, add two bytes for the slot array entry, 10 bytes for a possible heap forwarded-record backpointer, 14-bytes for a possible versioning tag, and that’s 26 bytes used. The other 10 bytes are for possible future use.
  
  If there are more than one record on the page, all 8096 bytes of data space can be used.
  
  Reply
Alan says:

December 9, 2015 at 3:26 pm

Hi, Paul

I am reading the article at https://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/

I have a database ‘MyDB’, and there is a table ‘MyTable’ in the database.

I have the following questions:

1. If I do as follows:

(1) use the following query to find the object ID for a table:

Use MyDB;
select sys.objects.name, sys.objects.object_id from sys.objects where (name = ‘MyTable’);

(2) then use the following command to find all pages allocated to ‘MyTable’:

dbcc ind(MyDB, ‘MyTable’, -1);

(3) then in the result table, I choose one of the data pages(page type = 1), and use the following command:

DBCC TRACEON(3604);
DBCC PAGE (MyDB, 1, 17386, 3);

Then in the dumped content(page header) of step (3), the m_objId (AllocUnitId.idObj) field should equal to the object ID obtained in step (1). Is that correct?

And whether that works for both user tables and system base tables, such as ‘sys.syscolpars’ base table?

Based on my test, the above two conclusion are both correct.

2. What is the meaning of Metadata: ObjectId? In article https://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/, the ‘metadata: objectId’ ‘m_objId’. But from my own tests, the ‘metadata: objectId’ always equals to ‘m_objId’. Why? The original article does not explain the metadata clearly.

I am using SQL Server 2005, 2008, 2008R2, 2012 and 2014

Thanks

Reply
1. Paul Randal says:
  
  December 9, 2015 at 3:40 pm
  
  The m_objId is only the actual object ID for system tables with IDs less than 100, and for tables in databases upgraded from SQL Server 2000. For all other tables, the m_objId is part of the allocation unit ID.
  
  Reply
Sac says:

February 9, 2016 at 1:15 am

Hello Paul,

If a slot array has a 2 byte pointer for each data row in a page and a page accommodates 8060 bytes of data so that would leave 36 bytes in a page.
Does that mean that the slot array can only cover 18 records on the page ?

Reply
1. Paul Randal says:
  
  February 9, 2016 at 11:36 am
  
  No – 8060 is the maximum in-row size of a single record. All 8096 bytes of the page can be used for records – the maximum being 736 records (for a clustered index on a smallint with no other table columns – 9-byte record plus 2-byte slot array entry).
  
  Reply
Damodar polani says:

August 10, 2016 at 6:15 pm

Hi Paul,
My question is about deallocated page (mtype = 18). After the repair, the deallocated index page is still showing mtype =2 in my case. Why is it now showing it as mtype =18. I tried with DM_db_database_page_allocations filtering out mtype 18. But it reflects no records.

Reply
1. Paul Randal says:
  
  August 23, 2016 at 10:20 am
  
  It depends what repair did. If it rebuilt the index to fix a corruption, that’s not a targeted deallocation of the page (especially an index page, type=2).
  
  Reply
Alex says:

March 30, 2017 at 9:25 pm

Hi Paul! In SQL 2012 in page header there is DB Frag ID, it’s always 1, what is it?

Reply
1. Paul Randal says:
  
  March 30, 2017 at 9:55 pm
  
  Hey Alex – it’s to do with a feature that may see the light of day in a future release. Right now, consider it undocumented and of no use. Thanks
  
  Reply
vinu says:

April 25, 2017 at 5:51 am

Hi Pul,

Very useful post. Thanks for sharing your knowledge.

I am trying to understand the page structure of Clustered index Column store table. Any article on Column store?

Thanks

Reply
1. Paul Randal says:
  
  April 25, 2017 at 8:03 am
  
  No – there’s nothing I know of that documents that.
  
  Reply
ZeroPerfect says:

September 28, 2017 at 12:46 am

hello,Paul！
very glad to study this post,I have some questions belows. I ask these questiones is want to help me to recovery data when the dbcc can not or the MFT in NFTS is corrupt~.
1.m_typeFlagBits ，you said :This is mostly unused. For data and index pages it will always be 4. For all other pages it will always be 0 . But I check my database and tables, I find that in some heap tables ,the data page’s m_typeFlagBits is 0x04, but in clustered table, I find some index and data page(leaf page)has m_typeFlagBits 0x00 in my some tables, and I also find some index and data page has m_typeFlagBits 0x04. so I do not know when the bytes will be 0x04 and when it will be 0x00?
2.another question， just user page header（not use DBCC in some error environment), I can not get metadata:indexid， so I can not judge a index page is an cluster index or non cluster index,because its m_indexid is always 256 except the system table page, so I want to know how to solve this question? is there any algorithm to calculate the metadata:indexid or partitionid or objectid? not by look up the system table sys.sysrowsets sys.sysschobjs because in some case, the system table’s page is been corrupt!

Reply
1. Paul Randal says:
  
  September 28, 2017 at 8:40 am
  
  1) Yeah – I’ve updated that with the real meaning (in the early blog days I didn’t want to write too much about the internals that weren’t documented)
  2) You can find the algorithm at https://www.sqlskills.com/blogs/paul/inside-the-storage-engine-how-are-allocation-unit-ids-calculated/ and write some code to reverse it from a raw reading of the page. However, how can you get the page header if you’re not using DBCC PAGE?
  
  Reply
  1. ZeroPerfect says:
    
    September 28, 2017 at 6:39 pm
    
    thank you for your reply!
    1)the website you give me is just show how to use m_indexid and m_objid to calculate allocunits id, that is ok, but I want to know how to get the metadata:Indexid (its real indexid) because I find in all user tables, the m_indexid in page header is always 256, whether this page is a clustered page or non clustered page or heap, so I want to know how to use information in page header to calculate the real indexid. I know the regular way is to use allocunitsid to find container id in sys.sysallocunits, and then use container id(partition id) to find the real indexid in sys.sysrowsets. but I think it must have an algorithm to calculate it. can you tell me?
    2. you ask me how to get the page header if not using DBCC PAGE, I will write some code to scaning pages in the disk, every 8k, I will use some key byte to judge whether this 8k is a sqlserver page, I will use (HeaderVersion,type,TypeFlagBits(that is why I ask you yesterday),and slot number/slot point in the tail of page) to help me judge.
    3. my purpose is to recovery the SQL server data when the NTFS’ metadata is corruption, such as the database1.MDF’s MFT has been damaged, so you can not find the database1.MDF in the windows disk, I will write some code to scan the disk ,and find page’s belong to database1.MDF and rebuilt the MFT file to recovery SQL Server data. SO I want to get more bytes-level information in the disk
    
    Reply
    1. Paul Randal says:
      
      September 28, 2017 at 6:49 pm
      
      1) No, there’s no algorithm. You need to use the allocation unit ID and do the lookups in metadata. This is what DBCC CHECKDB and DBCC PAGE (both of which I wrote) have to do when it works out what the index ID is.
      2) Eh? Why do that when you can use DBCC PAGE? How are you going to check that the page is allocated or not? A deallocated page will still have the correct-looking bytes, but won’t be a valid page.
      3) Why not just buy a product that has all that code already? You need a whole level more of information to be able to tell whether a page is allocated.
      
      It’s fun as an intellectual exercise, but a huge amount of work to get it right, and to work when the database/disk is corrupt. And the whole thing is mitigated if you have backups. I strongly encourage you to focus your enthusiasm doing something else around SQL Server.
      
      Reply
      1. ZeroPerfect says:
        
        September 28, 2017 at 11:37 pm
        
        thank you for reply!
        I know this job is sometime useless when there are backups. But my job is always need to recovery data in case of no backups, and there are servreral tools in the Internet such as OrcaMDF（http://improve.dk/sql-server-corruption-recovery-when-all-else-fails/）。
        I know how to check that the page is allocated or not is a big problem to me, can I do it like this below:
        1.I scan all the IAM page first(get m_objid)
        2.Then I use the m_objid to find all other page that have the same m_objid,
        3.Then I check these page id in IAM page’s slot 2(the extent bitmap) to decide whether the page is allocate?
        4.I can also use the sys.sysallocunits information ,because this table keep the firstIAM, firstPG, ROOT PAGE. and I find this table’s first page id always be 16, so I can find the 16 page, then find next page by its next-page point in the page header to gather all the pages belong to the sys.sysallocunis. （In some way, I can do this to find other system table such as sys.sysschobjs/sys.sysrowsets~)
Paul Randal says:

September 29, 2017 at 2:16 am

No – there are all kinds of other cases you’d have to deal with to make sure that a page is allocated. I’m not going to spend my time on these with you I’m afraid, as you can ignore them if you have backups. I’m you’re trying to build a commercial tool, good luck to you, as I said, there are a myriad cases that you’re going to find it next to impossible to deal with, and you’ll need to work them all out for yourselves.

Reply
1. ZeroPerfect says:
  
  September 30, 2017 at 12:13 am
  
  thank you ! I will spend some time to do this , maybe it is hard for me!
  Very glad to communicate with you!
  
  Reply
zeroperfect says:

November 3, 2017 at 12:54 am

Hello,Paul!
I have a question that about the relationship below
to a page, the slotcnt*2 + m_freeCnt+m_freeData=8192 ? I found this is ok for most page, but there is some page do not suit this calcuate, such as page 13, belong to a system table sysobjvalues, its slotcnt:16 m_freeCnt:2693 m_freeData=5535, slotcnt*2 + m_freeCnt+m_freeData>8192, why?

Reply
1. Paul Randal says:
  
  November 3, 2017 at 12:53 pm
  
  m_freeCnt is not always maintained.
  
  Reply
Thiago says:

December 27, 2017 at 7:24 am

Hi Paul! Thanks for the very informative article.

Do you know why some pages read from disk (not the data returned by dbcc page) appears to have some extra bits set on the first entry in the slot array (I mean, the one at page[8190])?

I’ve got some examples of it in a Northwind.mdf database, examples:
Slot array entry value 6001 but the record is at 0x60, 6002 record at 0x60, and some weird ones, such as 09ec with record at 0989 (SlotCnt = 1).

It appears to happen only in the first slot array entry.

Thank you!

Reply
1. Paul Randal says:
  
  December 28, 2017 at 11:36 am
  
  Never seen that before – maybe an artifact of an older database (like Northwind)?
  
  Reply
Wilson Mateus says:

April 26, 2018 at 8:49 am

Hi Paul, how are you?

I have a question of filling a data page in a Heap Table.

A page has 143 slots used.

Metadata: ObjectId = 1365579903 m_prevPage = (0: 0) m_nextPage = (0: 0)
pminlen = 20 m_slotCnt = 143 m_freeCnt = 374
m_freeData = 7532 m_reservedCnt = 0 m_lsn = (36: 2328: 2)
m_xactReserved = 0 m_xdesId = (0: 0) m_ghostRecCnt = 0
m_tornBits = 0 DB Frag ID = 1

The size of the record is always 52 bytes fixed.

Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 52
Memory Dump @ 0x0000006B2FA5A060

And “m_freeCnt” has the value of 374. That means 374 bytes free.

But when I run INSERT on this Heap Table, SQL SERVER allocates a new page.

If I have a fixed 52-byte record, does not SQL Server allocate 155 slots?

The Heap Table has no index and has only 4 columns (INT, INT, DATETIME, and VARCHAR (50))

I did not understand what happened.

Thank you.

Reply
1. Wilson Mateus says:
  
  April 26, 2018 at 8:52 am
  
  I am using :
  Microsoft SQL Server 2017. (RTM) – 14.0.1000.169 (X64) Aug 22 2017 17:04:49 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2012 R2 Datacenter 6.3 (Build 9600 : (Hypervisor)
  
  Reply
2. Paul Randal says:
  
  April 26, 2018 at 8:55 am
  
  It’s because of the free-space search that occurs to find a heap page to insert into. The page is deemed to be almost full, and so it doesn’t make use of the last few bytes on the page. One more reason why heaps are generally less efficient that clustered indexes. Thanks
  
  Reply
Wolney Marconi Maia says:

September 19, 2018 at 11:52 am

Hi Paul …

When we perform a full backup, is the “Differential Changed Map” (which contains information about extensions modified since the last BACKUP DATABASE statement) fully cleaned?

Thank you!!

Wolney

Reply
1. Paul Randal says:
  
  September 19, 2018 at 12:03 pm
  
  Yes, except for extents that contain the per-4GB allocation bitmaps, and extents that contain some pages from critical Storage-Engine system tables.
  
  Reply
DW says:

June 5, 2019 at 7:36 pm

Hi Paul, Thanks for the great info. I’m trying to figure out a difference in docs I’ve found regarding LOB data. As you state above, pages have 8096 bytes available for data, though a single record can only be 8060 bytes. What I’m looking at is LOB data (varchar(max), varbinary(max), etc.) and using the UPDATE .WRITE syntax to do appends in the most efficient block size (from an application using buffered streams). MS docs on that state “For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.” I’m wondering why 8040 instead of 8060 or 8096? Do LOB pages have a different overhead? Thanks.

Reply
1. Paul Randal says:
  
  June 6, 2019 at 12:33 pm
  
  All text records have a 14-byte versioning tag added immediately, to avoid having to split up the values if versioning is turned on and there’s no space on the page for the versioning tag, plus there’s some overhead in the text record itself too.
  
  Reply
Paul says:

July 23, 2019 at 11:08 pm

Hello Paul, thank you for the excellent article.
Could you tell me why,u when reading data from
a heap with one page of data, i get one logical read as expected, and when i read the same page from a clustered index, i get two logical reads.
My test.

Create table dbo.t(id int)
INSERT dbo.t VALUES(1)

With a heap:
set statistics io on
select * from dbo.t

Table ‘t’.Scan count 1, logical reads 1.

With a clustered index:
create clustered index cl_ind on dbo.t(id)

Table ‘t’. Scan count 1, logical reads 2.
I tested on the version of SQL Server 2016(SP1).

Thank you very much.

Reply
1. Paul Randal says:
  
  July 31, 2019 at 1:35 pm
  
  It’s because it reads the index page twice – once being for readahead, but isn’t reported as such.
  
  Reply
Mehdi says:

April 26, 2021 at 11:52 am

Hi Paul!
Thank you for sharing your knowledge!
I have two questions:
What is bsort_r_nextbP?
How does one page find its next page? By file number and page number or by physical location?

Reply
1. Paul Randal says:
  
  April 26, 2021 at 12:12 pm
  
  1) It’s not documented – to do with the internals of sort so I’m not explaining it.
  2) Depends on whether it’s a page in an index or a heap – page pointers or IAM+PFS, respectively.
  
  Reply
Mehdi says:

April 26, 2021 at 6:29 pm

Thanks Paul!

Reply
Mehdi says:

March 28, 2022 at 3:46 pm

Hi Paul!
Why is there SGAM in SQL server 2016 and newer versions?

Reply
1. Paul Randal says:
  
  March 29, 2022 at 1:16 pm
  
  Because IAM pages are always taken from mixed extents.
  
  Reply

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Inside the Storage Engine: Anatomy of a page

Posted Under:

80 thoughts on “Inside the Storage Engine: Anatomy of a page”

Leave a Reply Cancel reply

Other articles

The SQL Server Transaction Log, Part 4: Log Records

SQL101: Application data consistency checking

SQL101: Phishing attacks

SQL101: Top Ten SQL Server Performance Tuning Best Practices

The Curious Case of… no buffer pool memory and no OS memory available

The SQL Server Transaction Log, Part 3: The Circular Nature of the Log

Imagine feeling confident enough to handle whatever your database throws at you.

Inside the Storage Engine: Anatomy of a page

Check It Out!

SQLskills Complete Short Course Bundle

Posted Under:

80 thoughts on “Inside the Storage Engine: Anatomy of a page”

Leave a Reply Cancel reply

Other articles

The SQL Server Transaction Log, Part 4: Log Records

SQL101: Application data consistency checking

SQL101: Phishing attacks

SQL101: Top Ten SQL Server Performance Tuning Best Practices

The Curious Case of… no buffer pool memory and no OS memory available

The SQL Server Transaction Log, Part 3: The Circular Nature of the Log

Imagine feeling confident enough to handle whatever your database throws at you.