Monday, April 07, 2008

Now we're back from Iceland and I have a week to catch up with some content development before the MVP Summit next week and then SQL Connections the following week.

One of the things I struggled with earlier in the year while writing a SQL Server 2008 training course for Microsoft was how to get FILESTREAM to work with partitioning. There wasn't (and still isn't) any information in Books Online that I could find so I had to play around to figure it out.

I should say that the CTP-6/February CTP version of Books Online *does* have a bunch of code examples around using FILESTREAM, so I'm not going to write a blog post about that. Look in the Getting Started with FILESTREAM Storage section (or paste this link into the Books Online URL: window).

Back to partitioning - first I created a test database:

CREATE DATABASE FileStreamTestDB
ON PRIMARY
   (NAME = FileStreamTestDB_data,
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
   (NAME = FileStreamTestDBDocuments,
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\Documents')
LOG ON
   (NAME = 'FileStreamTestDB_log',
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_log.ldf');
GO

Then I tried the obvious, knowing that I can't partition on the ROWGUIDCOL:

CREATE PARTITION FUNCTION MyPartFunction (INT) AS RANGE RIGHT FOR VALUES (1000, 2000);

CREATE PARTITION SCHEME MyPartScheme AS PARTITION MyPartFunction ALL TO ([PRIMARY]);

CREATE TABLE FileStreamTest (
   TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   Customer INT,
   [Name] VARCHAR (25),
   [Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer);
GO

Partition scheme 'MyPartScheme' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'MyPartScheme'.
Msg 1921, Level 16, State 4, Line 8
Invalid filegroup 'default' specified.

Eventually I worked out that you have to define a separate partitioning scheme just for FILESTREAM data. This is because the regular data is stored on non-FILESTREAM filegroups, so trying to use the regular partitioning scheme for FILESTREAM would mean telling the Engine to store the FILESTREAM data in non-FILESTREAM filegroups. Clearly a non-starter. Ok - try again with a separate partitioning scheme (the prior MyPartFunction partition function and MyPartScheme partition scheme already exist now remember):

CREATE PARTITION SCHEME MyFSPartScheme AS PARTITION MyPartFunction ALL TO ([FileStreamFileGroup]);

CREATE TABLE FileStreamTest (
   TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   Customer INT,
   [Name] VARCHAR (25),
   [Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

Partition scheme 'MyFSPartScheme' has been created successfully. 'FileStreamFileGroup' is marked as the next used filegroup in partition scheme 'MyFSPartScheme'.
Msg 1908, Level 16, State 1, Line 1
Column 'Customer' is partitioning column of the index 'UQ__FileStreamTest__03317E3D'. Partition columns for a unique index must be a subset of the index key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

Hmm - I can't partition on Customer because there's already a unique index over TestId - UNLESS I specifically set the unique index on TestId to be non-partitioned by setting a filegroup for it:

CREATE TABLE FileStreamTest (
   TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
   Customer INT,
   [Name] VARCHAR (25),
   [Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

That works! Now - the BIG issue with this setup is that switching partitions won't work while the unaligned index is enabled. So how to disable it? First we need to find out what it's called:

SELECT * FROM sys.indexes WHERE [object_id] = OBJECT_ID ('FileStreamTest');
GO

And then disable it:

ALTER INDEX UQ__FileStre__8CC33161060DEAE8 ON FileStreamTest DISABLE;

Now you can do partition switching. Here's the catch - to re-enable the index you need to REBUILD it, which is a size-of-data operation! The upshot of all this is that partitioning can be made to work with FILESTREAM data, but partition switching is no longer a metadata-only operation.

Hopefully this will be addressed for V2.

Monday, April 07, 2008 3:07:33 PM (Pacific Standard Time, UTC-08:00)
Wednesday, March 12, 2008

A couple of weeks ago I blogged about the three tracks of the SQL Server 2008 JumpStart course that SQLskills.com taught internally for Microsoft and some MVPs - see here for details. Well, the content is now available to download! Note that this was based on CTP-5 (November 2007 CTP) and there have been *lots* of behavioral changes since then (with more planned for CTP-6 Refresh and RTM), but if you want a high-level overview of a bunch of the features (albeit in bullet-point summaries on slides) then this is a good place to start.

For me, what's *REALLY* cool is that the site also has a downloadable VPC plus lab manuals for all of the AlwaysOn High-Availability hands-on labs that SQLskills.com wrote. We originally wrote these labs for SQL Server 2005 and I updated them all for CTP-5. The VPC has a long lab on each of the following:

  • Database Snapshots
  • Data Recovery and Preventative Techniques
  • Instant Initialization
  • Peer-to-Peer Replication (including the new Topology Wizard I blogged about here)
  • Table and Index Partitioning
  • Snapshot Isolation
  • Online Operations
  • Database Mirroring (including a demo I wrote of Automatic Page Repair, described here)
  • Service Oriented Database Architecture

There is some great depth in each of these - Kimberly blogged more info about the exercises in each lab here. There's also another VPC image with some higher-level labs on a variety of 2008 features, written by a number of different people - including some labs on Policy-Based Management and Performance Data Collection that Kimberly wrote.

So - where can you get these from? Go to http://sqlserver2008jumpstart.microsofttraining.com/ and hit the Download link on the right-hand side. Register and then you can get to the materials. The AlwaysOn VPC image is Collection 2 at the bottom of the page, and you'll see all the slide decks as you scroll down the page.

Enjoy!

Wednesday, March 12, 2008 9:23:23 AM (Pacific Standard Time, UTC-08:00)
Thursday, February 14, 2008

Phew - last week Kimberly and I spent 3 days teaching the ins-and-outs of SQL Server 2008 for DBAs/IT-Pros to about 130 Microsoft SQL Server experts and MVPs (like Kalen Delaney, Adam Machanic and Ron Talmage). This was the (95% complete) Beta delivery of a course we've been developing for the last six months for Microsoft that they'll use to train their SQL experts around the world on the new release. It's been very interesting watching the features develop through the CTPs (especially since I left the fold last August) - and making demos work on pre-release builds of the CTPs.

Teaching the course was a *blast* - the thing I love about teaching a really geeky crowd is the plethora of great questions and opportunities for going deep with explanations. Our team actually wrote and delivered the concurrently presented Developer and BI tracks as well. As you can see from the list below (and this is just the features a DBA needs to use/know about), SQL Server 2008 isn't a dot release of Yukon at all, as some people have suggested. Over the three days we covered:

  • Database Mirroring (D)
  • Backup Compression
  • Peer-to-Peer Replication (D)
  • Transparent Data Encryption (D)
  • Extensible (Off-Box) Key Management
  • All Actions Audited (D)
  • Policy-Based Management
  • Resource Governor (D)
  • Extended Events (D)
  • Spatial Indexes
  • Integrated Full-Text Search
  • Sparse Columns (D)
  • Filtered Indexes
  • Change Tracking
  • Change Data Capture (D)
  • FILESTREAM (D)
  • Performance Data Collection
  • Query Optimizer Enhancements
  • Data Compression (D)
  • Service Broker
  • Partition-Level Lock Escalation (D)

The features marked with a (D) are ones I demo'd during the course (Kimberly demo'd a bunch of the others - especially the tools features). Some of the demos were challenging to make work in time as we only got a pre-CTP6 build mid-January just before we headed off to China.

So why am I posting this? Well, a bunch of these features are in CTP-6, which should be just around the corner, and I have some easy-to-understand demos of them that I'll be posting here over the next month or so. Also, if this course sounds interesting, Kimberly and I will be teaching it in various configurations over the next year - starting with SQL Connections in April, a soon-to-be-announced class in Iceland in March, and the ITPro portion of TechEd in June.

Watch this space starting next week (today's the last day of six straight weeks of teaching for us so this weekend's a break :-))

Thursday, February 14, 2008 4:38:30 PM (Pacific Standard Time, UTC-08:00)
Monday, January 14, 2008

Back in October 2007 I blogged about partition-level lock escalation in SQL Server 2008 (see here) and I promised to do a follow-up once CTP-5 came out with syntax etc. So here it is.

A brief recap - lock escalation in SQL Server 2005 and before only allowed table-level lock escalation. If you have a partitioned table with queries going against different partitions, then table-level escalation is a pain because the whole table is suddenly locked and concurrent queries against distinct partitions can't run. SQL Server 2008 gives the ability to escalate to a partition lock, which won't affect the queries on the other partitions.

The lock escalation policy can only be set with ALTER TABLE after a table has been created, and the policy can only be set at the table level. The syntax is

ALTER TABLE TableName SET (LOCK_ESCALATION = TABLE | AUTO | DISABLE);

The options mean:

  • TABLE - escalation will always be to the table level. This is the default.
  • AUTO - escalation will be to the partition level if the table is partitioned; otherwise it will be to the table level
  • DISABLE - escalation will be disabled. This does not guarantee that it will NEVER occur - there are some cases where it is necessary (Books Online gives the example of scanning a heap in the SERIALIZABLE isolation level)

The only way I could find to check what the escalation policy for a table is set to is to use the sys.tables catalog view:

SELECT lock_escalation_desc FROM sys.tables WHERE name = 'TableName';

Let's try it out. Here's a script that creates a database with an example table with 3 partitions. The partition ranges are negative infinity to 7999, 8000 to 15999, 16000 to positive infinity.

CREATE DATABASE LockEscalationTest;
GO

USE LockEscalationTest;
GO

-- Create three partitions: -7999, 8000-15999, 16000+
CREATE PARTITION FUNCTION MyPartitionFunction (INT) AS RANGE RIGHT FOR VALUES (8000, 16000);
GO

CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionFunction
ALL TO ([PRIMARY]);
GO

-- Create a partitioned table
CREATE TABLE MyPartitionedTable (c1 INT);
GO

CREATE CLUSTERED INDEX MPT_Clust ON MyPartitionedTable (c1)
ON MyPartitionScheme (c1);
GO

-- Fill the table
SET NOCOUNT ON;
GO

DECLARE @a INT = 1;
WHILE (@a < 17000)
BEGIN
INSERT INTO MyPartitionedTable VALUES (@a);
SELECT @a = @a + 1;
END;
GO
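
To double-check which partition a given c1 value lands in, here is a minimal sketch in Python (not T-SQL, so it can be run without a server). It assumes the RANGE RIGHT rule that each boundary value belongs to the partition on its right; the function name partition_number is made up for illustration.

```python
from bisect import bisect_right

# Boundary values from MyPartitionFunction: RANGE RIGHT FOR VALUES (8000, 16000)
BOUNDARIES = [8000, 16000]


def partition_number(value, boundaries=BOUNDARIES):
    """Return the 1-based partition number for value under RANGE RIGHT.

    With RANGE RIGHT, a boundary value is the first value of the partition
    to its right, so we count the boundaries that are <= value.
    """
    return bisect_right(boundaries, value) + 1


if __name__ == "__main__":
    for v in (1, 7499, 7999, 8000, 15999, 16000, 16999):
        print(v, "-> partition", partition_number(v))
```

Under this model every value below 7500 falls in partition 1, and every value strictly between 8100 and 15900 falls in partition 2, which is what the UPDATE statements in the rest of this post rely on.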

Now I'm going to explicitly set the escalation to TABLE and start a transaction that should cause lock escalation.

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = TABLE);
GO

BEGIN TRAN
UPDATE MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

We should be able to see the locks being held:

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    X              LOCK           GRANT

Just as we expected - an X table lock. Any query against the table from another connection will now block. Now I'll roll back that transaction, set the escalation to partition-level and try again.

ROLLBACK TRAN;
GO

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = AUTO);
GO

BEGIN TRAN
UPDATE MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable');
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Excellent - the object lock is now IX rather than X, and the X lock is at the partition (HOBT) level for partition 1 (the HOBT lock's resource_associated_entity_id, 72057594039042048, matches the partition_id of partition 1). (For an explanation of HOBTs, see my post Inside The Storage Engine: IAM pages, IAM chains, and allocation units.) So now we should be able to do something with another partition - let's see if we can cause another partition-level X lock in another connection:

USE LockEscalationTest;
GO

BEGIN TRAN
UPDATE MyPartitionedTable SET c1 = c1 WHERE c1 > 8100 AND c1 < 15900;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable');
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039107584             X              LOCK           GRANT
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Now we have two partition X locks, for partitions 1 and 2 (as expected - match the HOBT lock resource IDs against the partition_id column above), plus two table-level IX locks (one for each connection, as expected). Very cool!

Now I'm going to force a deadlock - by having each connection try to read a row from the other locked partition:

Connection 1:

SELECT * FROM MyPartitionedTable WHERE c1 = 8500;
GO

Connection 2:

SELECT * FROM MyPartitionedTable WHERE c1 = 100;
GO

Connection 2 succeeds but on connection 1 we get (as expected):

(local)\SQLDEV01(SQLHAVPC\Administrator): Msg 1205, Level 13, State 18, Line 1
Transaction (Process ID 51) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This illustrates a potential problem with this new mechanism - applications that used to rely on the blocking nature of X table locks may now exhibit deadlocks if partition-level escalation is turned on in production without any testing. In fact, this mode was specifically chosen NOT to be the default setting for new tables because some trial workloads exhibited deadlocks during testing. Don't just turn it on in production without testing - as with any other option or feature.
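
The deadlock above can be pictured as a cycle in a wait-for graph. The sketch below is a simplified Python model, not how the lock manager is actually implemented; the dictionary layout and the find_cycle name are assumptions made for illustration.

```python
def find_cycle(waits_for):
    """Return a list of sessions forming a wait cycle, or None.

    waits_for maps each blocked session to the single session it waits on.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waiters until it ends or repeats.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


# Who holds the X lock on each partition (from the sys.dm_tran_locks output).
holders = {"partition 1": "connection 1", "partition 2": "connection 2"}

# Which partition each connection's SELECT needs to read.
requests = {"connection 1": "partition 2",   # c1 = 8500
            "connection 2": "partition 1"}   # c1 = 100

# A connection waits on whoever holds the partition it wants to read.
waits_for = {conn: holders[part] for conn, part in requests.items()
             if holders[part] != conn}

if __name__ == "__main__":
    print(find_cycle(waits_for))
```

With table-level escalation the second connection's UPDATE simply waits behind the first connection's table X lock, so the graph has a single edge and no cycle. That is the blocking behavior some applications have come to rely on.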

Monday, January 14, 2008 2:10:24 PM (Pacific Standard Time, UTC-08:00)
Tuesday, December 04, 2007

A quickie today to get back into the swing of things.

In Kimberly's whitepaper on partitioning she discusses the 'sliding window' scenario (where you switch in and out partitions of data into an existing table - see this previous post for a few more details). She recommends that the constraints are extended rather than dropped and recreated - which I totally agree with. I had a question about why this is a best practice, and is it more efficient than dropping and recreating the constraints?

Let's create a little example to illustrate all these points. A simple table called Sales with a couple of indexes and 100000 rows of data.

CREATE TABLE Sales (SalesID INT IDENTITY, SalesDate DATETIME);
GO

CREATE CLUSTERED INDEX Sales_CL ON Sales (SalesID);
CREATE NONCLUSTERED INDEX Sales_NCL ON Sales (SalesDate);
GO

SET NOCOUNT ON;
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 100000)
BEGIN
   INSERT INTO Sales VALUES (GETDATE ());
   SELECT @a = @a + 1;
END;
GO

Now I want to create two constraints - for the lower and upper bounds of the sales date. I could do this using a single constraint with both conditions or two constraints with a single condition each. For simplicity I'll use one constraint, but first I want to see how expensive the operation is, so I'm going to turn on STATISTICS IO - this is a very cool feature that gives the IO costs of a query after it's completed.

SET STATISTICS IO ON;
GO

ALTER TABLE Sales WITH CHECK ADD CONSTRAINT [CK_Sales_SalesDate] CHECK (YEAR (SalesDate) >= 2005 AND YEAR (SalesDate) < 2008);
GO

Table 'Sales'. Scan count 1, logical reads 399, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The 399 logical reads are for the table scan that's done to ensure that the constraint is valid for the data currently in the table.

So - the first question is why not drop/create the constraint to update it? Well, what if invalid data is entered into the table between dropping and recreating the constraint?

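ALTER TABLE Sales DROP CONSTRAINT [CK_Sales_SalesDate];
GO

-- Illustrative stand-in: an out-of-range row arrives while no constraint is in place
INSERT INTO Sales VALUES ('20100101');
GO

ALTER TABLE Sales WITH CHECK ADD CONSTRAINT [CK_Sales_SalesDate] CHECK (YEAR (SalesDate) >= 2005 AND YEAR (SalesDate) < 2009);
GO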

Msg 547, Level 16, State 0, Line 1

The ALTER TABLE statement conflicted with the CHECK constraint "CK_Sales_SalesDate". The conflict occurred in database "ConstraintTest", table "dbo.Sales", column 'SalesDate'.

The constraint can't be recreated and you have to find the invalid data and get rid of it - which may not be as easy as it sounds depending on your schema and business logic. So, the best practice is always to update a constraint. Well, you can't update an existing constraint but you can create a new constraint with different conditions and then drop the old constraint:

ALTER TABLE Sales WITH CHECK ADD CONSTRAINT [CK_Sales_SalesDate2] CHECK (YEAR (SalesDate) >= 2005 AND YEAR (SalesDate) < 2009);
ALTER TABLE Sales DROP CONSTRAINT [CK_Sales_SalesDate];
GO

Table 'Sales'. Scan count 1, logical reads 399, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The second question is which way is more efficient? The answer is neither. Dropping and recreating the constraint will obviously do a table scan again, but so does adding the new constraint - even though there's a trusted constraint in place already which guarantees that adding the new constraint can't possibly fail!!! Hopefully in the future the smarts will be built into the SQL Engine to recognize this and not do the unnecessary table scan (this process is known as interval subsumption - according to my geeky wife :-))
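
As a rough illustration of the check the engine could make, here is a minimal sketch in Python. It models each constraint as a half-open range of years and tests whether the old range sits entirely inside the new one. The function name and the tuple representation are assumptions for illustration, not anything SQL Server exposes.

```python
def subsumes(new_range, old_range):
    """True if every value allowed by old_range is also allowed by new_range.

    Ranges are half-open (low, high) tuples meaning low <= year < high.
    """
    new_low, new_high = new_range
    old_low, old_high = old_range
    return new_low <= old_low and old_high <= new_high


old_constraint = (2005, 2008)   # YEAR (SalesDate) >= 2005 AND YEAR (SalesDate) < 2008
new_constraint = (2005, 2009)   # YEAR (SalesDate) >= 2005 AND YEAR (SalesDate) < 2009

if __name__ == "__main__":
    # If this prints True, the trusted old constraint already guarantees the
    # new one holds, so in principle no table scan would be needed.
    print(subsumes(new_constraint, old_constraint))
```

Going the other way (narrowing the range) is not subsumed, so a scan would still be required in that case.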

Tuesday, December 04, 2007 4:51:14 PM (Pacific Standard Time, UTC-08:00)
Wednesday, October 31, 2007

This is a quick answer to a question I was sent today by someone who'd read Kimberly's partitioning whitepaper - Partitioned Tables and Indexes in SQL Server 2005 - and is implementing a "sliding-window" scenario. (This is a mechanism to allow fast insertion and deletion of significant portions of data into/from a partitioned production table. Insertion is done by taking a table and making it a new partition of the production table - called switching-in. Deletion is done by removing a partition from the production table and making it into a stand-alone table - called switching-out.)

The question is - what indexes are required on the staging table to prevent the ALTER TABLE ... SWITCH PARTITION statement from failing with a message like that below:

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'PartitionTest.dbo.StagingTable' for the index 'NC_Birthday' in target table 'PartitionTest.dbo.ProductionTable'.

The answer is that the staging table has to have the exact same indexes - clustered and non-clustered - as the production table. I asked Kimberly if it has to have the same constraints too - the answer is yes, plus the staging table has to have a trusted constraint on it such that SQL Server can tell (without checking all the data in the staging table) that all the data satisfies the partitioning function for the partition that you're switching-in (i.e. the partition that the staging table will become in the production table). If it doesn't, the switching-in will fail with the following error:

Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'PartitionTest.dbo.StagingTable' allow values that are not allowed by range defined by partition 4 on target table 'PartitionTest.dbo.ProductionTable'.

One thing that confuses people is that SQL Server does not create the target table for you when doing a switch-out of a partition. The target table has to exist and have the exact same schema as the production table. Also, it has to be completely empty - otherwise you'll get an error like:

Msg 4905, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. The target table 'PartitionTest.dbo.StagingTable' must be empty.

The must-be-empty requirement also holds for switching-in operations - the partition that will be created has to be empty otherwise a similar 4904 error results.

Hope this helps!

Wednesday, October 31, 2007 6:05:57 PM (Pacific Standard Time, UTC-08:00)
Wednesday, October 03, 2007

This is a combo from some previously posted material, with some more DBCC PAGE output thrown in.

IAM pages

An IAM (Index Allocation Map) page tracks approximately 4GB worth of space in a single file, aligned on a 4GB boundary. These 4GB chunks are called 'GAM intervals'. An IAM page tracks which extents within that specific GAM interval belong to a single entity (I'm choosing my words carefully here and not using any word that has SQL Server connotations like 'object').

An IAM page can only track the space for a single GAM interval in a single file. So if the database has multiple files, or some files are more than 4GB, and the entity has space allocated from multiple files or multiple GAM intervals within a file, then multiple IAM pages are needed for that entity to track all the space that it's using. If an entity requires multiple IAM pages to track all its extents, then the IAM pages have to be linked together. That's where an IAM chain comes in. More on these below.

Each IAM page has two records, an IAM page header and a bitmap. Let's look at one with DBCC PAGE. I'm using the database from the page split post. Doing a DBCC IND on the table we created gives us:

[Image SEQA3.jpg: DBCC IND output for the table]

By looking at the PageType column, we can see that there's an IAM page (a page with type 10 - see the post on Anatomy of a page for more details) with page ID (1:152):

DBCC TRACEON (3604);
GO

DBCC PAGE ('pagesplittest', 1, 152, 3);
GO

m_pageId = (1:152)                   m_headerVersion = 1                  m_type = 10
m_typeFlagBits = 0x0                 m_level = 0                          m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 68     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594042384384
Metadata: PartitionId = 72057594038386688                                 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 90                         m_slotCnt = 2                        m_freeCnt = 6
m_freeData = 8182                    m_reservedCnt = 0                    m_lsn = (18:116:13)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = -1947725876

Allocation Status

GAM (1:2) = ALLOCATED                SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x70 IAM_PG MIXED_EXT ALLOCATED   0_PCT_FULL                  DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

IAM: Header @0x620CC064 Slot 0, Offset 96

sequenceNumber = 0                   status = 0x0                         objectId = 0
indexId = 0                          page_count = 0                       start_pg = (1:0)


IAM: Single Page Allocations @0x620CC08E

Slot 0 = (1:143)                     Slot 1 = (1:153)                     Slot 2 = (1:154)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)


IAM: Extent Alloc Status Slot 1 @0x620CC0C2

(1:0)        - (1:272)      = NOT ALLOCATED

Some things to note about the page header itself:

  • The page has type 10, as we'd expect
  • The previous and next page pointers are NULL, because there aren't any other IAM pages in this IAM chain
  • The slot count is 2 - one for the IAM header record and one for the bitmap itself
  • The page is almost entirely full

The IAM page header has the following fields:

  • sequenceNumber
    • This is the position of the IAM page in the IAM chain. This increases by one for each page added to the IAM chain.
  • status
    • This is unused.
  • objectId
  • indexId
    • On SQL Server 2000 and before, these are the object and index IDs that the IAM page is part of. On SQL Server 2005 and later they are unused.
  • page_count
    • This is unused  - it used to be the number of page IDs that are being tracked in the single page allocation array.
  • start_pg
    • This is the GAM interval that the page maps. It stores the first page ID in the mapped interval.
  • Single Page Allocations array
    • These are the pages that have been allocated from mixed extents. This array is only used in the first IAM page in the chain (as the whole IAM chain only needs to track at most 8 single-page allocations).

The bitmap occupies the rest of the IAM page and has a bit for each extent in the GAM interval. The bit is set if the extent is allocated to the entity, and clear if it is not. Obviously two IAM pages that map the same GAM interval for different entities cannot both have the same bit set - this is checked by DBCC CHECKDB. In the output from DBCC PAGE above, you can see that there are no extents allocated to the table. You'll notice that the output only goes up to the extent starting at page 272 in the file - this is because the data file is only that big. I inserted a bunch more rows into the table and did another DBCC PAGE of the IAM page. This time the DBCC PAGE output contains:

IAM: Single Page Allocations @0x620CC08E

Slot 0 = (1:143)                     Slot 1 = (1:153)                     Slot 2 = (1:154)
Slot 3 = (1:155)                     Slot 4 = (1:156)                     Slot 5 = (1:157)
Slot 6 = (1:158)                     Slot 7 = (1:159)


IAM: Extent Alloc Status Slot 1 @0x620CC0C2

(1:0)        - (1:152)      = NOT ALLOCATED
(1:160)      - (1:296)      =     ALLOCATED
(1:304)      - (1:400)      = NOT ALLOCATED

You can see that the entire single-page allocation array is full and then allocations switched to dedicated extents. The first available extent must have been the one starting at page 160, and all extents up to and including the one starting at page 296 are now allocated. Note also that the file must have grown because the output now goes up to page 400 in the file.
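
To make the bitmap-to-output mapping concrete, here is a small Python sketch that turns a list of per-extent bits into the kind of ranges DBCC PAGE prints. It assumes 8 pages per extent and that each range is labelled by the first page of its first and last extent, as in the output above. The function name extent_ranges is made up for illustration.

```python
PAGES_PER_EXTENT = 8


def extent_ranges(bits, file_id=1):
    """Collapse per-extent allocation bits into DBCC PAGE-style range lines.

    bits[i] is True if the extent starting at page i * 8 is allocated to the
    entity. Each run of equal bits is reported by the starting page of its
    first and last extent.
    """
    runs = []
    start = 0
    for i in range(1, len(bits) + 1):
        # Close the current run at the end of the list or when the bit flips.
        if i == len(bits) or bits[i] != bits[start]:
            runs.append((start * PAGES_PER_EXTENT,
                         (i - 1) * PAGES_PER_EXTENT,
                         bits[start]))
            start = i
    return [f"({file_id}:{first}) - ({file_id}:{last}) = "
            f"{'ALLOCATED' if allocated else 'NOT ALLOCATED'}"
            for first, last, allocated in runs]


# Rebuild the bitmap from the second DBCC PAGE output: extents starting at
# pages 0-152 are clear, 160-296 are set, 304-400 are clear.
bits = [160 <= page <= 296 for page in range(0, 401, PAGES_PER_EXTENT)]

if __name__ == "__main__":
    for line in extent_ranges(bits):
        print(line)
```

In this model the set bits cover 18 extents, the ones starting at pages 160 through 296.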

A couple more things to note about IAM pages:

  • They are themselves single-page allocations from mixed extents and are not tracked anywhere
  • They can be allocated from any file to track extents in any other file

IAM chains

If we continued to grow the file and fill up the table then eventually we'd need another IAM page to map the next GAM interval. This is where an IAM chain comes in. It's a linked-list of IAM pages that track the space allocated to a single entity. The linked-list is not sorted at all - IAM pages are appended to it in the order that they're needed. The IAM pages within the list are numbered, again, in the order that they were appended to the list.

Definition of 'entity' - what uses an IAM chain? This is vastly different in SQL Server 2000 and 2005.

In SQL Server 2000, a single IAM chain is used for each:

  • Heap or clustered index
    • A table can only have one or the other, not both. These have index IDs of 0 and 1 respectively.
  • Non-clustered index
    • These have index IDs from 2 to 250 (i.e. you can only have 249 of them)
  • Table's complete LOB storage
    • For LOB columns (text, ntext, image) in the heap or clustered index. This is sometimes called the 'text index' and has a fixed index ID of 255.

This gives a maximum of 251 IAM chains per object in SQL Server 2000 and before. I usually generalize and say that in SQL Server 2000, there's one IAM chain per index (which fits nicely if you remember that IAM stands for Index Allocation Map).

Allocation units (SQL Server 2005 and later)

Now in SQL Server 2005 and later, things have changed a lot. IAM chains and IAM pages are exactly the same, but what they correspond to is different. A table can now have up to 750000 IAM chains! There are now three things that IAM chains map space allocations for:

  1. heaps and b-trees (a b-tree is the internal structure used to store an index)
  2.  LOB data
  3. row-overflow data

and we now call these units of space allocation tracking allocation units. The internal names for these three types of allocation unit are (respectively):

  1. hobt allocation unit (Heap Or B-Tree, pronounced 'hobbit', yes, as in Lord Of The Rings)
  2. LOB allocation unit
  3. SLOB allocation unit (Small-LOB or Short-LOB)

and the external names are, respectively:

  1. IN_ROW_DATA allocation unit
  2. LOB_DATA allocation unit
  3. ROW_OVERFLOW_DATA allocation unit

They couldn't really continue to be called IAM chains, because they're no longer tracking space allocation for an index. However, their chain of IAM pages is still called an IAM chain, and the unit of tracking is now called an allocation unit. Apart from that, there's no difference.

Let's have a quick look at three new features in SQL Server 2005 that made these changes necessary and boosted the number of potential IAM chains per table.

Included Columns
This is the ability for non-clustered indexes to include non-key columns at the leaf-level. This is useful for three reasons:

  1. It allows a non-clustered index to truly cover a query where the query results include more than 16 columns or the combination of column lengths in the query results is greater than 900 bytes (remember that a non-clustered index key is limited to 16 columns and 900 bytes).
  2. It allows columns to be included in the non-clustered index that have data types that cannot be part of an index key (e.g. varchar(max) or XML).
  3. It allows a non-clustered index to cover a query without having to have all the columns included in the index key. As the index key is included in rows at all levels of the b-tree, this allows the index to be smaller.

An example of space saving: imagine a 100 million row index with a key length of 900 bytes, where only the first two integer columns are really needed as the index key and the other 4 fixed-length columns could be stored in the index as included columns. With the 900 byte key, 8 rows can fit per database page (i.e. the fanout is 8). This means there will be 12500000 pages at the leaf level, 1562500 pages at the next level up in the b-tree and so on, giving a total of 12500000 + 1562500 + 195313 + 24415 + 3052 + 382 + 48 + 6 + 1 = 14285717 pages (including 1785717 to store the upper levels of the b-tree).

If we go with the included columns method then the key size shrinks to 8 bytes, and with the row overhead we can get the row length in the upper levels of the b-tree down to 15 bytes (giving a fanout of approx. 537). Note that the fanout at the leaf-level is still going to be 8, because the amount of data stored in each row at the leaf-level is the same. So, this means there will be 12500000 pages at the leaf level, 23278 pages at the next level up and so on, giving a total of 12500000 + 23278 + 44 + 1 = 12523323 pages (including 23323 to store the upper levels of the b-tree). Compared to the full-size 900-byte key, this is a 12% saving of 1762394 pages, or about 13.4GB with 8KB pages! Obviously this is a contrived case but you can see how the savings can occur.
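If you want to check the page counts in the last two paragraphs, the level-by-level arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function name btree_level_pages is made up for this example, and it assumes every page is completely full and that each level needs the ceiling of (pages below / fanout) pages.

```python
def btree_level_pages(rows, leaf_fanout, upper_fanout):
    """Pages per b-tree level, leaf level first, ending at the single root page.

    Assumption for this sketch: every page is packed full.
    """
    levels = [-(-rows // leaf_fanout)]               # ceiling division for the leaf level
    while levels[-1] > 1:
        levels.append(-(-levels[-1] // upper_fanout))
    return levels

ROWS = 100_000_000

# 900-byte key everywhere: fanout of 8 at the leaf level and in the upper levels.
wide_key = btree_level_pages(ROWS, leaf_fanout=8, upper_fanout=8)
# 8-byte key plus included columns: leaf fanout still 8, upper-level fanout approx. 537.
narrow_key = btree_level_pages(ROWS, leaf_fanout=8, upper_fanout=537)

print(wide_key)                         # [12500000, 1562500, 195313, 24415, 3052, 382, 48, 6, 1]
print(narrow_key)                       # [12500000, 23278, 44, 1]
print(sum(wide_key), sum(narrow_key))   # 14285717 12523323
print(sum(wide_key) - sum(narrow_key))  # 1762394
```

The two leaf levels are identical, so the whole saving comes from the upper levels of the b-tree.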

The main reason for adding this feature is to enable true covering queries. A covering query is one where the query optimizer knows it can get all the query results from the non-clustered index and so the query can be satisfied without having to incur the extra IOs of looking up data in the base table - a significant performance saving.

Non-clustered indexes can now have included columns, and those columns can be LOB data types (but only the new ones in SQL Server 2005 - varchar(max), nvarchar(max), varbinary(max), and XML). This means that having a single LOB allocation unit (as in the case of the single text index in SQL Server 2000) isn't possible any more because each index may have its own set of LOBs. Now, you may ask why there isn't just a single set of LOBs with multiple references from various indexes plus the base table. We considered that during SQL Server 2005 development but it would have made things a lot more complicated.

So, with this new feature, each index needs two allocation units - one for the data or index rows (the hobt allocation unit) and one for any LOB data.

Large Rows

One of the things that has plagued schema designers for a long time is the 8060-byte limit on table row sizes, so this restriction was removed in SQL Server 2005. The way this is done is to allow variable-length columns (e.g. varchar, sql_variant) to be pushed off-row when the row gets too big to fit on a single page.

But where do these column values get pushed to? They're effectively turned into mini LOB columns. The column value in the row is replaced with a 16-byte pointer to the off-row column value, which is stored as if it's a LOB value in a separate allocation unit - the row-overflow (or SLOB) allocation unit. These values are stored in text pages in exactly the same way as regular LOB values are, just using a separate allocation unit. The SLOB allocation unit is only created when the first column value is pushed off-row.

This feature works for non-clustered indexes too - if you consider the ability to have included columns in non-clustered indexes then you could easily have non-clustered index rows that won't fit on a page. It would have been short-sighted of us to get rid of the 900-byte limit and replace it with an 8060-byte limit by not extending the row-overflow feature to non-clustered indexes too.

Now with the addition of this new feature, each index can have up to three allocation units - hobt, LOB, and SLOB. Even with this, that only makes a maximum of 750 IAM chains per table (remember an IAM chain now maps the storage allocations for an allocation unit, so 250 indexes * 3 allocation units = 750 IAM chains). But I mentioned 750 thousand IAM chains per table earlier - where do all the rest come from?

Partitioning

This is what gives us the 1000x multiplier. As you may already know, partitioning is the new feature that allows tables and indexes to be split into a series of ranges, with each range stored separately (most commonly in separate filegroups). Partitioning is a topic for a separate post.

If each range or partition of the table or index is stored separately, then each is going to need its own hobt allocation unit. Of course, the LOB values associated with each partition need to be stored with it, and so each partition also needs a LOB allocation unit. Also, the row-overflow feature is per-row, and so rows in each partition will overflow into SLOB allocation units just as for un-partitioned tables and indexes. Thus each partition of a table or index can have up to three allocation units (and hence three IAM chains).

Still, where does that 1000 come in? Each table or index can have up to 1000 partitions. This gives us 250 indexes x 1000 partitions x 3 allocation units = 750000 IAM chains. Realistically this probably won't happen, but it is possible.
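The multiplication is easy to reproduce. Here is a minimal Python sketch of it, using the limits quoted in this post; the constant and function names are invented for illustration and are not SQL Server identifiers (apart from the three allocation unit type names).

```python
ALLOCATION_UNIT_TYPES = ("IN_ROW_DATA", "LOB_DATA", "ROW_OVERFLOW_DATA")
MAX_INDEXES = 250      # 1 heap or clustered index + 249 non-clustered indexes
MAX_PARTITIONS = 1000  # per table or index, as stated above

def max_iam_chains(indexes, partitions, unit_types=len(ALLOCATION_UNIT_TYPES)):
    """Upper bound on IAM chains: one per allocation unit, per partition, per index."""
    return indexes * partitions * unit_types

print(max_iam_chains(MAX_INDEXES, 1))               # 750 (un-partitioned table)
print(max_iam_chains(MAX_INDEXES, MAX_PARTITIONS))  # 750000
```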

Wednesday, October 03, 2007 4:42:23 PM (Pacific Standard Time, UTC-08:00)

In SQL Server 2005, queries over partitioned tables use a single thread per partition. This can cause performance problems under certain circumstances:

  1. On systems with many CPUs, if the table schema has fewer partitions than there are CPUs, then not all the CPUs will be used to process the query. Some examples:
    1. On a 32-way box, a query over a 12-partition table (e.g. a sales table partitioned by month) will only use 12 threads (one on each of 12 CPUs). This means 20 CPUs are potentially idle.
    2. On a 4-way box, a query over a 12-partition table that only accesses a single partition will use only one thread (on one CPU). This means 3 CPUs are potentially idle.
  2. On tables that have skewed data such that one partition is much larger than another, the length of time the query takes to complete will be bounded by the single thread processing the largest partition.
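The idle-CPU numbers in the examples above fall out of a one-line rule. The Python sketch below is a deliberately simplified model: it assumes one thread per CPU and nothing else running on the box, and the function names are made up for this example.

```python
def threads_used(cpus, partitions_accessed):
    """One thread per accessed partition, capped by the number of CPUs."""
    return min(cpus, partitions_accessed)

def idle_cpus(cpus, partitions_accessed):
    """CPUs left with no thread working on the query."""
    return cpus - threads_used(cpus, partitions_accessed)

print(idle_cpus(32, 12))  # 20 (32-way box, all 12 partitions accessed)
print(idle_cpus(4, 1))    # 3  (4-way box, only one partition accessed)
```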

As part of the set of improvements in SQL Server 2008 for data warehousing there will be an option to change the threading behavior for queries over partitioned tables. The new, alternative model is that all available threads process part of each partition and then move on to the next partition. This allows all available CPUs to take part in processing the query, which should lead to a drop in the query completion time.
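To get a feel for why this helps with skewed data, here is a rough Python comparison of the two models. It is an idealized sketch and not how SQL Server actually schedules work: it assumes the data is already in memory, that cost is proportional to rows scanned, that rows within a partition split evenly across threads, and that there are at least as many threads as partitions. The function names and the row counts are hypothetical.

```python
def one_thread_per_partition(partition_rows):
    """Rows scanned by the busiest thread when each partition gets its own thread."""
    return max(partition_rows)

def all_threads_per_partition(partition_rows, threads):
    """Rows scanned by each thread when all threads share every partition in turn."""
    return sum(-(-rows // threads) for rows in partition_rows)  # ceiling division

# Hypothetical 12-partition table where one partition is much larger than the rest.
partition_rows = [1_000_000] * 11 + [12_000_000]
threads = 32

print(one_thread_per_partition(partition_rows))            # 12000000
print(all_threads_per_partition(partition_rows, threads))  # 718750
```

Under these assumptions the first model is bounded by the largest partition, while the second spreads the total row count across all the threads.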

The only time this model won't work is if the data is not in the buffer pool and is not spread out evenly across the available drives. For example, if an entire partition is stored on a single drive, then multiple threads will be scanning different portions of the drive, causing the disk head to thrash and IO throughput to drop sharply compared with a single thread driving the IO. For this reason, the option to use the new model will be off by default, to avoid surprising people with sudden bad performance after upgrading.

This should be available in the next CTP and then I'll post again with some example datasets and queries to see what the potential benefits and drawbacks are.

Wednesday, October 03, 2007 2:40:26 PM (Pacific Standard Time, UTC-08:00)
