Inside the Storage Engine: IAM pages, IAM chains, and allocation units

This is a combo from some previously posted material, with some more DBCC PAGE output thrown in.

IAM pages

An IAM (Index Allocation Map) page tracks approximately 4GB worth of space in a single file, aligned on a 4GB boundary. These 4GB chunks are called ‘GAM intervals’. An IAM page tracks which extents within that specific GAM interval belongs to a single entity (I’m chosing my words carefully here and not using any word that has SQL Server connotations like ‘object’).

An IAM page can only track the space for a single GAM interval in a single file so If the database has multiple files, or some files are more then 4GB, and the entity has space allocated from multiple files or multiple GAM intervals within a file, then you can see how multiple IAM pages are needed for each entity to track all the space that its using. If an entity requires multiple IAM pages to track all its extents, then they IAM page have to be linked together. That’s where an IAM chain comes in. More on these below.

Each IAM page has two records, an IAM page header and a bitmap. Let’s look at one with DBCC PAGE. I’m using the database from the page split post. Doing a DBCC IND on the table we created gives us:

SEQA3.jpg

By looking at the PageType column, we can see that there’s an IAM page (a page with type 10 – see the post on Anatomy of a page for more details) with page ID (1:152):

DBCC TRACEON (3604);

GO

DBCC PAGE (‘pagesplittest’, 1, 152, 3);

GO

m_pageId = (1:152)                   m_headerVersion = 1                  m_type = 10
m_typeFlagBits = 0x0                 m_level = 0                          m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 68     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594042384384
Metadata: PartitionId = 72057594038386688                                 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 90                         m_slotCnt = 2                        m_freeCnt = 6
m_freeData = 8182                    m_reservedCnt = 0                    m_lsn = (18:116:13)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = -1947725876

Allocation Status

GAM (1:2) = ALLOCATED                SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x70 IAM_PG MIXED_EXT ALLOCATED   0_PCT_FULL                  DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

IAM: Header @0x620CC064 Slot 0, Offset 96

sequenceNumber = 0                   status = 0x0                         objectId = 0
indexId = 0                          page_count = 0                       start_pg = (1:0)
IAM: Single Page Allocations @0x620CC08E

Slot 0 = (1:143)                     Slot 1 = (1:153)                     Slot 2 = (1:154)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)
IAM: Extent Alloc Status Slot 1 @0x620CC0C2

(1:0)        – (1:272)      = NOT ALLOCATED

Some things to note about the page header itself:

  • The page has type 10, as we’d expect
  • The previous and next page pointers are NULL, because there aren’t any other IAM pages in this IAM chain
  • The slot count is 2 – one for the IAM header record and one for the bitmap itself
  • The page is almost entirely full

The IAM page header has the following fields:

  • sequenceNumber
    • This is the position of the IAM page in the IAM chain. This increases by one for each page added to the IAM chain.
  • status
    • This is unused.
  • objectId
  • indexId
    • On SQL Server 2000 and before, these are the object  and index IDS that the IAM page is part of. On SQL Server 2005 and later they are unused.
  • page_count
    • This is unused  – it used to be the number of page IDs that are being tracked in the single page allocation array.
  • start_pg
    • This is the GAM interval that the page maps. It stores the first page ID in the mapped interval.
  • Single Page Allocations array
    • These are the pages that have been allocated from mixed extents. This array is only used in the first IAM page in the chain (as the whole IAM chain only need to track at most 8 single-page allocations).

The bitmap occupies the rest of the IAM page and has a bit for each extent in the GAM interval. The bit is set if the extent is allocated to the entity, and clear if it is not. Obviously two IAM pages that map the same GAM interval for different entities cannot both have the same bit set – this is checked by DBCC CHECKDB. In the output from DBCC PAGE above, you can see that there are no extents allocated to the table. You’ll notice that the output only goes up to the extent starting at page 272 in the file – this is because the data file is only that big. I inserted a bunch more rows into the table and did another DBCC PAGE of the IAM page. This time the DBCC PAGE output contains:

IAM: Single Page Allocations @0x620CC08E

Slot 0 = (1:143)                     Slot 1 = (1:153)                     Slot 2 = (1:154)
Slot 3 = (1:155)                     Slot 4 = (1:156)                     Slot 5 = (1:157)
Slot 6 = (1:158)                     Slot 7 = (1:159)

IAM: Extent Alloc Status Slot 1 @0x620CC0C2

(1:0)        – (1:152)      = NOT ALLOCATED
(1:160)      – (1:296)      =     ALLOCATED
(1:304)      – (1:400)      = NOT ALLOCATED

You can see that the entire single-page allocation array is full and then allocations switched to dedicated extents. The first available extent must have been the one starting at page 160 and all extents up to an including the one starting at page 296 are now allocated. Note also that the file must have grown because the output now goes up to page 400 in the file.

A couple more things to note about IAM pages:

  • There are themselves single-page allocations from mixed extents and are not tracked anywhere
  • They can be allocated from any file to track extents in any other file

IAM chains

If we continued to grow the file and fill up the table then eventually we’d need another IAM page to map the next GAM interval. This is where an IAM chain comes in. It’s a linked-list of IAM pages that track the space allocated to a single entity. The linked-list is not sorted at all – IAM pages are appended to it in the order that they’re needed. The IAM pages within the list are numbered, again, in the order that they were appended to the list.

Definition of ‘entity’ – what uses an IAM chain? This is vastly different in SQL Server 2000 and 2005.

In SQL Server 2000, a single IAM chain is used for each:

  • Heap or clustered index
    • A table can only have one or the other, not both. These have index IDs of 0 and 1 respectively.
  • Non-clustered index
    • These have index IDs from 2 to 250 (i.e. you can only have 249 of them)
  • Table’s complete LOB storage
    • For LOB columns (text, ntext, image) in the heap or clustered index. This is sometimes called the ‘text index’ and has a fixed index ID of 255.

This gives a maximum of 251 IAM chains per object in SQL Server 2000 and before. I usually generalize and say that in SQL Server 2000, there’s one IAM chain per index (which fits nicely if you remember that IAM stands for Index Allocation Map).

Allocation units (SQL Server 2005 and later)

Now from SQL Server 2005 onward, things have changed a lot. IAM chains and IAM pages are exactly the same, but what they correspond to is different. A table can now have up to 45 million IAM chains! There are now three things that IAM chains map space allocations for:

  1. heaps and b-trees (a b-tree is the internal structure used to store an index)
  2.  LOB data
  3. row-overflow data

and we now call these units of space allocation tracking allocation units. The internal names for these three types of allocation unit are (respectively):

  1. hobt allocation unit (Heap Or BTree, pronounced ‘hobbit’, yes, as in Lord Of The Rings)
  2. LOB allocation unit
  3. SLOB allocation unit (Small-LOB or Short-LOB)

and the external names are, respectively:

  1. IN_ROW_DATA allocation unit
  2. LOB_DATA allocation unit
  3. ROW_OVERFLOW_DATA allocation unit

They couldn’t really continue to be called IAM chains, because they’re no longer tracking space allocation for an index. However, they’re chain of IAM pages is still called an IAM chain, and the unit of tracking is now called an allocation unit. Apart from that, there’s no difference.

Let’s have a quick look at three new features in SQL Server 2005 that made these changes necessary and boosted the number of potential IAM chains per table.

Included Columns
This is the ability for non-clustered indexes to include non-key columns at the leaf-level. This is useful for three reasons:

  1. Iit allows a non-clustered index to truly cover a query where the query results include more than 16 columns or the combination of column lengths in the query results is greater than 900 bytes (remember that a non-clustered index key is limited to 16 columns and 900 bytes).
  2. It allows columns to be include in the non-clustered index that have data types that cannot be part of an index key (e.g. varchar(max) or XML).
  3. It allows a non-clustered index to cover a query without having to have all the columns included in the index key. As the index key is included in rows at all levels of the b-tree, this allows the index to be smaller.

An example of space saving: imagine a 100 million row index, with a key length of 900 bytes, but only the first two integer keys are really needed as the index key, the other 4 fixed-length columns could be stored in the index as included columns. With the 900 byte key, 8 rows can fit per database page (i.e. the fanout is 8). This means there will be 12500000 pages at the leaf level, 1562500 pages at the next level up in the b-tree and so on, giving a total of 12500000 + 1562500 + 195313 + 24415 + 3052 + 382 + 48 + 6 + 1 = 14285717 pages (including 1785717 to store the upper levels of the b-tree).

If we go with the included columns method then the key size shrinks to 8 bytes, and with the row overhead we can get the row length in the upper levels of the b-tree down to 15 bytes (giving a fanout of approx. 537). Note that the fanout at the leaf-level is still going to be 8,  because the amount of data stored in each row at the leaf-level is the same. So, this means there will be 12500000 pages at the leaf level, 23278 pages at the next level up and so on, giving a total of 12500000 + 23278 + 44 + 1 = 12523323 pages (including 23323 to store the upper levels of the b-tree). Compared to the full-size 900-byte key, this is a 12% saving of 1762394 pages, or 13.6GB! Obviously this is a contrived case but you can see how the savings can occur.

The main reason for adding this feature it to enable true covering queries. A covering query is one where the query optimizer knows it can get all the query results from the non-clustered index and so the query can be satisfied without having to incur the extra IOs of looking up data in the base table – a significant performance saving.

Now that non-clustered indexes can have included columns, and those columns can be LOB data types (but only the new ones in SQL Server 2005 – varchar(max), nvarchar(max), varbinary(max), and XML). This means that having a single LOB allocation unit (as in the case of the single text index in SQL Server 2000) isn’t possible any more because each index may have its own set of LOBs. Now, you may ask why there isn’t just a single set of LOBs with multiple references from various indexes plus the base table. We considered that during SQL Server 2005 development but it would have made things a lot more complicated.

So, with this new feature, each index needs two allocation units – one for the data or index rows (the hobt allocation unit) and one for any LOB data.

Large Rows

One of the things that has plagued schema designers for a long time is the 8060 byte limit on table row sizes so this restriction was removed in SQL Server 2005. The way this is done is to allow variable-length columns (e.g. varchar, sqlvariant) to get pushed off-row when the row size gets too big to fit on a single page.

But where do these column values get pushed to? They’re effectively turned into mini LOB columns. The column value in the row is replaced with a 24-byte (and possibly 36, 48, 60, or 72-byte) pointer to the off-row column value, which is stored as if its a LOB value in a seperate allocation unit – the row-overflow (or SLOB) allocation unit. These values are stored in text pages in exactly the same way as regular LOB values are, just using a separate allocation unit. The SLOB allocation unit is only created when the first column value is pushed off-row.

This feature works for non-clustered indexes too – if you consider the ability to have included columns in non-clustered indexes then you could easily have non-clustered index rows that won’t fit on a page. It would have been short-sighted of to get rid of the 900-byte limit and replace it with an 8060-byte limit by not extending the row-overflow feature to non-clustered indexes too.

Now with the addition of this new feature, each index can have up to three allocation units – hobt, LOB, and SLOB. Even with this, that only makes a maximum of 750 IAM chains per table for SQL Server 2005 (remember an IAM chain now maps the storage allocations for an allocation unit, so 250 indexes * 3 allocation units = 750 IAM chains) and 3000 IAM chains per table for SQL Server 2008 onward (where the number of indexes per table can be 1000). But I mentioned 45 million IAM chains per table earlier – where do all the rest come from?

Partitioning

Partitioning gives us the 15,000x multiplier. Partitioning  allows tables and indexes to be split into a series of ranges, with each range stored separately (most commonly in seperate filegroups). Partitioning is a topic for a separate post.

If each range or partition of the table or index is stored seperately, then each is going to need its own hobt allocation unit. Of course, the LOB values associated with each partition need to be stored with it, and so each partition also needs a LOB allocation unit. Also, the row-overflow feature is per-row, and so rows in each partition will overflow into SLOB allocation units just as for un-partitioned tables and indexes. Thus each partition of a table or index can have up to three allocation units (and hence three IAM chains).

Still, where does that 15,000 come in? Each table or index can have up to 15,000 partitions from SQL Server 2008 SP2 onward (1,000 partitions before that). This gives us 1000 indexes x 15,000 partitions x 3 allocation units = 45 millionIAM chains. Realistically this probably won’t happen, but it is possible.

10 thoughts on “Inside the Storage Engine: IAM pages, IAM chains, and allocation units

  1. Does the 1000 partitions limit for each table/index still exist with SQL Server 2008? do you have any resource where this is mentioned?

    many thanks for your fantastic blog

    Abdel

    1. I thought that in SQL server 2008 there is a switch to give you up to 15,000 partitions? Or is this only in r2 and above?

  2. Thank for a nice blog. Can you please tell me if an IAM page tracks more than one object?

    I have done some analysis on this and to me it looks as if the single page slot array in IAM pages tracks allocations to a specific object. Hence, each object has its own IAM page/Chain.

    Secondly, does the IAM page tell us if the extent allocated to the object is a mixed extend or a dedicated one?

    Thanks

    1. You’re welcome.

      IAM pages always apply to only one allocation unit (portion of a single index or object). The pages in the single page slot array come from mixed extents always, the bits in the main IAM bitmap are for dedicated extents always.

  3. Hi,Paul,
    I want to ask you a question, the m_indexid in system table’s page is not always its real indexid? because I find that in object 5,that is the sys.sysrowsets table , this object has a clustered index, so its page’s real index id must be 1, but I find in page 18,its m_object id is 5,but its m_indexid is 0, so I do not know whether the m_index id is not always its real index_id(in user table ,the m_indexid is always 0x100,but in system table, it sometime equals its real indexid,but sometime it is not. why?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.