This week I'm going to post a bunch of info on the basic structures used to store data and track allocations in SQL Server. A bunch of this was posted back when I started blogging at TechEd 2006 but I want to consolidate/clarify info and add more about using DBCC PAGE to examine the various structures.

So, what are records? At the simplest level, a record is the physical storage associated with a table or index row. Of course, it gets much more complicated than that...

Data records

  • Data records are stored on data pages.
  • Data records store rows from a heap or the leaf level of a clustered index.
  • A data record always stores all columns from a table row - either by-value or by-reference.
    • If any columns are for LOB data types (text, ntext, image, and the new LOB types in SQL Server 2005 - varchar(max), nvarchar(max), varbinary(max), XML), then there's a pointer stored in the data record which points to a text record on a different page (the root of a loose tree that stores the LOB value). Exceptions to this are when the schema has been set to store LOB columns 'in-row' when possible. This is when a LOB value is small enough to fit within the size limits of a data record. This is a performance benefit as selecting the LOB column does not require an extra IO to read the text record.
    • In SQL Server 2005, non-LOB variable length columns (e.g. varchar, sqlvariant) may also be stored 'off-row' as part of the row-overflow feature of having table rows longer than 8060 bytes. In this case the storage format is the same as for LOB values - a pointer in the data record pointing to a text record.
  • There is a difference to how the columns are laid out between heaps and clustered indexes - I'll cover that in a later post.

Forwarded/Forwarding records

  • These are technically data records and are only present in a heap.
  • A forwarded record is a data record in a heap that was updated and was too large to fit in-place on its original page and so has been moved to another page. It contains a back-pointer to the forwarding record.
  • A forwarding record is left in its place and points to the new location of the record. It's sometimes known as a forwarding-stub, as all it contains is the location of the real data record.
  • This is done to avoid having to update any non-clustered index records that point back directly to the original physical location of the record.
  • Although this optimizes non-clustered index maintenance during updates, it can cause additional IOs during SELECTs. This is because the non-clustered index record points to the old location of the index, so an extra IO might be needed to read the real location of the data row. This is fuel for the heap vs clustered index debate, in favor of clustered indexes.

Index records

  • Index records are stored on index pages.
  • There are two types of index records (which differ only in what columns they store):
    1. Those that store non-clustered index rows at the leaf level of a non-clustered index
    2. Those that comprise the b-tree that make up clustered and non-clustered indexes (i.e. in index pages above the leaf level of a clustered or non-clustered index)
  • I'll explain more about the differences between these in a later post as it can be quite complicated (especially the differences between SQL Server 2000 and 2005) and is worth doing in separate posts.
  • Index records typically do not contain all the column values in a table (although some do - called covering indexes).
  • In SQL Server 2005, non-clustered index records can include LOB values as included columns (with the storage details exactly the same as for data records) and also can have row-overflow data that is pushed off-row (again, in exactly the same way as for data records).

Text records

  • Text records are stored on text pages.
  • There are various types of text records that comprise the tree structure that stores LOB values, stored on two types of text page. I'll explain how they work and are linked together in a future post.
  • They are also used to store variable-length column values that have been pushed out of data or index records as part of the row-overflow capability.

Ghost records

  • These are records that have been logically deleted but not physically deleted from a page. The reasons for this are complicated, but basically having ghost records simplfies key-range locking and transaction rollback.
  • The record is marked with a bit that indicates it's a ghost record and cannot be physically deleted until the transaction that caused it to be ghosted commits. Once this is done, it is deleted by an asynchronous background proces (called the ghost-cleanup task) or it is converted back to a real record by an insert of a record with the exact same set of keys.

Other record types

  • There are also records that are used to store various allocation bitmaps, intermediate results of sort operations, and file and database metadata (e.g. in the per-file fileheader page and database boot page). Again, I'll go into these in later posts (there's a big queue of posts building up :-))

Record structure

All records have the same structure, regardless of their type and use, but the number and type of columns will be different. For instance, a data record from a table with a complex schema may have hundreds of columns of various types whereas an allocation bitmap record will have a single column, filling up the whole page.

The record structure is as follows:

  • record header
    • 4 bytes long
    • two bytes of record metadata (record type)
    • two bytes pointing forward in the record to the NULL bitmap
  • fixed length portion of the record, containing the columns storing data types that have fixed lengths (e.g. bigint, char(10), datetime)
  • NULL bitmap
    • two bytes for count of columns in the record
    • variable number of bytes to store one bit per column in the record, regardless of whether the column is nullable or not (this is different and simpler than SQL Server 2000 which had one bit per nullable column only)
    • this allows an optimization when reading columns that are NULL
  • variable-length column offset array
    • two bytes for the count of variable-length columns
    • two bytes per variable length column, giving the offset to the end of the column value
  • versioning tag
    • this is in SQL Server 2005 only and is a 14-byte structure that contains a timestamp plus a pointer into the version store in tempdb

NULL bitmap optimization

So why is the NULL bitmap an optimization?

Firstly, having a null bitmap removes the need for storing special 'NULL' values for fixed-length datatypes. Without the null bitmap, how can you tell whether a column is NULL? For fixed-length columns you'd need to define a special 'NULL' value, which limits the effective range of the datatype being stored. For varchar columns, the value could be a zero-length empty string, so just checking the length doesn't work - you'd need the special value again. For all other variable-length data types you can just check the length. So, we nede the NULL bitmap.

Secondly, it saves CPU cycles. If there was no NULL bitmap, then there are extra instructions executed for fixed-length and variable-length columns.

For fixed-length:

  1. read in the stored column value (possibly taking a cpu data cache miss)
  2. load the pre-defined NULL value for that datatype (possibly taking a cpu data cache miss, but only for the first read in the case of a multiple row select)
  3. do a comparison between the two values

For variable-length:

  1. calculate the offset of the variable length array
  2. read the number of variable length columns (possibly taking a cpu data cache miss)
  3. calculate the position in the variable length offset array to read
  4. read the column offset from it (possibly taking a cpu data cache miss)
  5. read the next one too (possibly taking another cpu data cache miss, if the offset in step 4 was on the boundary of a cache line size)
  6. compare them to see if they're the same

But with a NULL bitmap, all you have to do is:

  1. read the NULL bitmap offset (possibly taking a cpu data cache miss)
  2. calculate the additional offset of the NULL bit you want to read
  3. read it (possibly taking a cpu data cache miss)

So, its about even for a lookup of a single fixed-length column, but for variable-length columns, and for multiple row selects, there's a clear advantage to having the NULL bitmap.

Using DBCC IND and DBCC PAGE to examine a row in detail

Let's create an example table to look at:

USE MASTER;

GO

IF DATABASEPROPERTY (N'recordanatomy', 'Version') > 0 DROP DATABASE recordanatomy;

GO

CREATE DATABASE recordanatomy;

GO

USE recordanatomy;

GO

CREATE TABLE example (

destination VARCHAR(100),

activity VARCHAR(100),

duration INT);

GO

INSERT INTO example VALUES ('Banff', 'sightseeing', 5);

INSERT INTO example VALUES ('Chicago', 'sailing', 4);

GO

And we can use DBCC IND again to find the page to look at:

DBCC IND ('recordanatomy', 'example', 1);

GO

The output tells us the data page is (1:143) so we can dump it with DBCC PAGE, using option 3 to get a fully interpreted dump of each record.

DBCC TRACEON (3604);

GO

DBCC PAGE ('recordanatomy', 1, 143, 3);

GO

Remember we need the trace-flag to make the DBCC PAGE output go to the console instead of the error log. The output will contain something like the following:

Slot 0 Offset 0x60 Length 33

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5C76C060

00000000:   30000800 05000000 0300f802 00160021 †0..............!        
00000010:   0042616e 66667369 67687473 6565696e †.Banffsightseein        
00000020:   67†††††††††††††††††††††††††††††††††††g                       

Slot 0 Column 0 Offset 0x11 Length 5

destination = Banff                 

Slot 0 Column 1 Offset 0x16 Length 11

activity = sightseeing              

Slot 0 Column 2 Offset 0x4 Length 4

duration = 5                        

Let's use the record structure I listed above to go through this record and see how things are stored.

  • Byte 0 is the TagA byte of the record metadata.
    • Its 0x30, which corresponds to 0x10 (bit 4) and 0x20 (bit 5). Bit 4 means the record has a NULL bitmap and bit 5 means the record has variable length columns. If 0x40 (bit 6) was also set, that would indicate that the record has a versioning tag. If 0x80 (bit 7) was also set, that would indicate that byte 1 has a value in it.
    • Bits 1-3 of byte 0 give the record type. The possible values are:
      • 0 = primary record. A data record in a heap that hasn't been forwarded or a data record at the leaf level of a clustered index.
      • 1 = forwarded record
      • 2 = forwarding record
      • 3 = index record
      • 4 = blob fragment
      • 5 = ghost index record
      • 6 = ghost data record
      • 7 = ghost version record. A special 15-byte record containing a single byte record header plus a 14-byte versioning tag that is used in some circumstances (like ghosting a versioned blob record)
    • In our example, none of these bits are set which means the record is a primary record. If the record was an index record, byte 0 would have the value 0x36. Remember that the record type starts on bit 1, not bit 0, and so the record type value from the enumeration above needs to be shifted left a bit (multiplied by two) to get its value in the byte.
  • Byte 1 is the TagB byte of the record metadata. It can either be 0x00 or 0x01. If it is 0x01, that means the record type is ghost forwarded record. In this case its 0x00, which is what we expect given the TagA byte value.
  • Bytes 2 and 3 are the offset of the NULL bitmap in the record. This is 0x0008 (DBCC PAGE presents multi-byte values in hex dumps as least-significant byte first). This means that there's a 4-byte fixed length portion of the record starting at byte 4. We expect this because we know the table schema.
  • Bytes 4 to 7 are the fixed length portion. Again, because we know the table schema, we know to interpret these bytes as a 4-byte integer. Without that knowledge, you'd have to guess. The value therefore is 0x00000005, which is what we'd expect to see as the value of the duration column.
  • Bytes 8 and 9 are the count of columns in the record. This is 0x0003 which is correct. Given that there are only 3 columns, the NULL bitmap of one bit per column will fit in a single byte.
  • Byte 10 is the NULL bitmap. The value is 0xF8. We need to convert it to binary to make sense of the value. 0xF8 = 11111000. This makes sense - bits 0-2 represent columns 1-3 and they're all 0, meaning the columns aren't NULL. Bits 3-7 represent non-existent columns and they're set to 1 for clarity.
  • Bytes 11 and 12 are the count of variable length columns in the record. That value is 0x0002, which we again know to be correct. This means there will be two two-byte entries in the variable length column offset array. These will be bytes 13-14 and 15-16, having values of 0x0016 and 0x0021 respectively. Remember that NULL bitmap entries point to the end of the column value - this is done so that we know how long each column is without having to store their length as well.
  • So, the final offset is bytes 15 and 16, which means the offset of the start of the first variable length column must be byte 17 (or 0x11 in hex), which agrees with the DBCC PAGE dump. The offset of the end of the first variable length column is 0x0016, so the first value is from byte 17 to byte 21 inclusive. This value is 0x42616E6666. We know from the table metadata that this is the first varchar column, destination. Converting to ASCII gives us the column value 'Banff'. Using similar logic, the second value is from byte 22 to byte 32 inclusive and has the value 'sightseeing'. Both of these match the data we're expecting.

And that's it.

Some of the features of SQL Server 2008 will introduce changes to the record structure - more on those when the features are available in CTPs.

This was a question from the MSDN Disaster Recovery forum I started while I was at Microsoft.

I have a 600 gig database that has a mirror.  I need to move the databases from local drives to a SAN.  Can anyone recommend a document that lists the steps to go through to move both the principle and mirror to the SAN with no down time? or minimal down time?

As far as I know, there isn't any such document so I had a crack at coming up with a list of operations. Here's what I had:

  1. Take a full backup of the principal on node A
  2. Restore it on the SAN on node B using WITH NORECOVERY, remembering to use WITH MOVE to place the files correctly, and with a different database name than the current mirror
  3. Take the required log backup on the principal and restore on the database copy on the SAN on node B
  4. Break the mirroring partnership
  5. Drop the current mirror database on node B
  6. Rename the database on the SAN on node B to be the mirror database -- THIS DOESN"T WORK!
  7. Setup the mirroring partnership to point to the newly restored database on the SAN on node B
  8. Start mirroring and the new mirror will catch-up
  9. Failover to the mirror on node B, which becomes the new principal
  10. Follow the same procedure to move the new mirror on node A onto its SAN
  11. Failback if you want to

And I promised to try it out to make sure I had it right so in this blog post I'm going to walk through the steps of doing this. It turns out that the steps above are slightly incorrect. Step 6 above doesn't work because the database is in recovery (so is inaccessible) and there's a short-cut when moving the database on the first node to avoid having to take and copy more backups. Let's see how it works and I'll post the corrected sequence at the end.

As I did in yesterday's mirroring post, I'm going to use the TicketSalesDB database from our Always-On DVDs. It's only a few hundred MB instead of 600GB but the principal is the same (no pun intended :-)). I've got mirroring running between two nodes, SQLDEV01 (the principal) and SQLDEV02 (the mirror), both of which are running 2005 SP2 and I've got a simulated workload inserting rows into the database. I don't actually have a SAN laying around so I'm cheating and I have directories called C:\SQLDEV01SAN and C:\SQLDEV02SAN instead. It's the location change that's the interesting part, not where the actual location is.

Step 1

On SQLDEV01, take a full backup and a log backup:

BACKUP DATABASE TicketSalesDB TO DISK = 'C:\SQLskills\TicketSalesDB.BAK' WITH INIT;

GO

BACKUP LOG TicketSalesDB TO DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH INIT;

GO

Step 2

On SQLDEV01, break the mirroring partnership:

ALTER DATABASE TicketSalesDB SET PARTNER OFF;

GO

And just check that it's gone:

SELECT mirroring_state_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

which returns:

NULL

Step 3

On SQLDEV02, drop the mirror database - this wouldn't work unless mirroring was no longer running:

DROP DATABASE TicketSalesDB;

GO

Step 4

Copy the backups to SQLDEV02 and restore them on the SAN and remembering to use WITH NORECOVERY:

RESTORE DATABASE TicketSalesDB FROM DISK='C:\SQLskills\TicketSalesDB.bak'

WITH

MOVE 'TicketSalesDBData' TO 'C:\SQLDEV02SAN\TicketSalesDBData.MDF',

MOVE 'TicketSalesFG2005Q1' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q1.NDF',

MOVE 'TicketSalesFG2005Q2' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q2.NDF',

MOVE 'TicketSalesFG2005Q3' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q3.NDF',

MOVE 'TicketSalesFG2005Q4' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q4.NDF',

MOVE 'TicketSalesDBLog' TO 'C:\SQLDEV02SAN\TicketSalesDBLog.LDF',

NORECOVERY;

GO

RESTORE LOG TicketSalesDB FROM DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH NORECOVERY;

GO

Step 5

On SQLDEV02, set the mirroring partner to be SQLDEV01:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV01:5091';

GO

Step 6

On SQLDEV01, start mirroring:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV02:5092';

GO

And check that it's running:

SELECT mirroring_state_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

This time it returns:

SYNCHRONIZED

Step 7

Now we need to failover so that we can move the database on SQLDEV01 onto its SAN. Before we do that, let's make sure that SQLDEV01 is the principal:

SELECT mirroring_role_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

which returns:

PRINCIPAL

Now force the failover:

ALTER DATABASE TicketSalesDB SET PARTNER FAILOVER;

GO

And query the DMV again to make sure. This time the mirroring_state_desc returned is:

MIRROR

Excellent!

Now, I did all of this while my workload was running and it automatically failed over to SQLDEV02, with the database now hosted on the SAN. To do the same move on SQLDEV01, we don't need to go through the backup and copy process again - we can just use the original backups we took in step 1.

Step 8

We need to break the mirroring partnership again, this time executing on SQLDEV02, the new principal:

ALTER DATABASE TicketSalesDB SET PARTNER OFF;

GO

On SQLDEV01, we can now drop the database and restore the original backups onto the SAN:

DROP DATABASE TicketSalesDB;

GO

RESTORE DATABASE TicketSalesDB FROM DISK='C:\SQLskills\TicketSalesDB.bak'

WITH

MOVE 'TicketSalesDBData' TO 'C:\SQLDEV01SAN\TicketSalesDBData.MDF',

MOVE 'TicketSalesFG2005Q1' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q1.NDF',

MOVE 'TicketSalesFG2005Q2' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q2.NDF',

MOVE 'TicketSalesFG2005Q3' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q3.NDF',

MOVE 'TicketSalesFG2005Q4' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q4.NDF',

MOVE 'TicketSalesDBLog' TO 'C:\SQLDEV01SAN\TicketSalesDBLog.LDF',

NORECOVERY;

GO

RESTORE LOG TicketSalesDB FROM DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH NORECOVERY;

GO

And setup mirroring again. On SQLDEV01:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV02:5092';

GO

And on SQLDEV02:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV01:5091';

GO

And we're running again.

Step 9

Now all we need to do is fail the workload back to SQLDEV01 by executing this on SQLDEV02:

ALTER DATABASE TicketSalesDB SET PARTNER FAILOVER;

GO

Summary

So - the corrected sequence for moving a database while mirroring is running is the following:

  1. Take a full backup of the principal database on node A, and the required log backup
  2. Break the mirroring partnership
  3. Drop the current mirror database on node B
  4. Copy the backups to node B and restore it on the SAN on node B using WITH NORECOVERY, remembering to use WITH MOVE to place the files correctly
  5. Setup the mirroring partnership to point to the newly restored database on the SAN on node B
  6. Start mirroring and the new mirror will catch-up
  7. Failover to the mirror on node B, which becomes the new principal
  8. Follow the same procedure to move the new mirror on node A onto its SAN, but using the original backups from step 1
  9. Failback

Hope this helps.

One of the hottest features in SQL Server 2005 is database mirroring, and it's helped many companies implement successful and relatively inexpensive high-availability strategies. In SQL Server 2008, Database Mirroring has been enhanced in several ways - one of which is the ability to automatically repair corrupt pages!

This feature is based on the fact that the principal and mirror databases are exactly the same. So, if a page becomes corrupt on the principal, SQL Server should be able to read the page from the mirror and use it to fix the principal. Similarly, if a page becomes corrupt on the mirror, the page can be read from the principal to fix up the mirror. Pretty cool, eh?

Details

  • The feature works for pages that have 824 errors, 823 errors where the OS returns a CRC error while reading the page (to prevent resource issues triggering a page repair), and pages that have 829 errors (where the page is marked as restore pending). See my previous post here for more details on page errors.
  • Pages are fixed asynchronously.
    • If the page is corrupt in the principal, the query that hit the corrupt page will fail. Once discovered, a page is marked as being 829 until its fixed. This prevents an issue where a transient disk error could allow a subsequent update to change the page after it's been queued for being repaired, and then the page is overwritten with a copy from the mirror, losing the update. Nasty.
    • If the page is corrupt in the mirror (which is discovered when the page is read as part of the continual recovery of the log), the mirroring session is suspended. The mirror keeps track of all corrupt pages that need to be repaired with copies from the principal. Once all corrupt pages have been repaired, the mirroring session will be resumed automatically. This means that if a page is corrupt in both the mirror and the principal, manual intervention will be required to resolve the issue.
  • The feature is only available in Enterprise Edition. This means that if the principal is on Enterprise Edition and the mirror is on Standard Edition, then corrupt pages on the principal can repaired from the mirror but not the other way around.
  • There is a new DMV - sys.dm_db_mirroring_auto_page_repair - that allows you to track corrupt pages in mirrored databases 
    • It covers all mirrored databases on a server.
    • It provides info on the last 100 pages that were found in any mirrored database, as well as the status of the automatic page repair operation.
    • This DMV isn't yet included in the SQL Server 2008 Books Online available on TechNet but will have the following info:
      • Database ID the page is in
      • The Page ID, split into file and page-in-file
      • The error type - distinguishing between 823 errors, torn-page errors, page checksum failures, and all-other-824 errors
      • The status of the page repair operation
      • The time that the status was last updated
  • If a page repair fails for any reason (e.g. the mirroring partner couldn't supply the page) then the repair will be marked as failed. If the page is then hit again (by a query on the principal or a recovery operation on the mirror) then it will be re-queued for repair.
  • Not all pages can be repaired - the file header page, database boot page, and allocation bitmap pages (GAM, SGAM, PFS) cannot be repaired this way.

In Action

I've got a system with a few SQL Server 2008 instances running so I decided to give it a try. Here's what I did, using the TicketSalesDB from the Always-On DVDs and labs that SQLskills produces:

  • Changed the database to use page checksums
  • Rebuilt the clustered index of one of the tables (so the pages have page checksums on)
  • Took the initial backup of the database that's needed for mirroring (so that it has no corruptions in)
  • Used DBCC IND to find a page in the leaf level of the clustered index and corrupted the page
  • Setup mirroring (but using the backup I took before introducing the corruption - so the mirror database will have a clean copy of the page I corrupted)
  • Performed a query on the table with the corrupt page to force the page checksum failure and kick-off automatic page repair

And it worked! Going back into the instance with the principal database and querying the DMV gives:

C:\>sqlcmd /E /S.\KATMAI01
1> SELECT * FROM sys.dm_db_mirroring_auto_page_repair;
2> GO
database_id file_id     page_id              error_type page_status modification_time
----------- ----------- -------------------- ---------- ----------- -----------------------
          6           4                 4256         -1           5 2007-09-27 17:23:20.067

(1 rows affected)
1>

The page I corrupted was (4:4256) and page_status of 5 means the repair succeeded. Running the query again confirms that the corruption has been fixed. The page was also logged in the suspect_pages table in msdb:

1> SELECT * FROM msdb..suspect_pages;
2> GO
database_id file_id     page_id              event_type  error_count last_update_date
----------- ----------- -------------------- ----------- ----------- -----------------------
          6           4                 4256           5           1 2007-09-27 17:23:20.407

(1 rows affected)
1>

In Books Online (both 2005 and 2008), event_type of 5 means that the page was repaired.

I checked the SQL Server error log for the principal and this is what I found (the bolding is mine):

2007-09-27 17:17:10.41 spid25s     Database mirroring is active with database 'TicketSalesDB' as the principal copy. This is an informational message only. No user action is required.
2007-09-27 17:23:19.92 spid51      Error: 824, Severity: 24, State: 2.
2007-09-27 17:23:19.92 spid51      SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x55684fbe; actual: 0x16e84fbe). It occurred during a read of page (4:4256) in database ID 6 at offset 0x00000002140000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.6\MSSQL\Data\TicketSalesFG2005Q1.NDF'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
2007-09-27 17:23:19.96 spid26s     Database mirroring is attempting to repair physical page (4:4256) in database "TicketSalesDB" by requesting a copy from the partner.
2007-09-27 17:23:20.42 spid26s     Database mirroring successfully repaired physical page (4:4256) in database "TicketSalesDB" by obtaining a copy from the partner.

I also checked the error log for the mirror and there's nothing relevant in there. I guess the same info would be output to the mirror database's error log if the mirror became corrupted. It's reasonably simple to check this using similar steps as above, but corrupting the database before the initial backup is taken, fixing the corruption again after the backup and before mirroring starts (so the corruption is on the mirror but not the principal), and then triggering an update on the corrupt page. When the update is replayed on the mirror, the corruption will be hit and the page repaired in the same way. I'll check later and blog if there's any difference.

Summary

SQL Server 2008 introduces an enhancement to database mirroring that can bi-directionally pull pages between the principal and mirror databases to fix page corruptions. One word of caution I'd give is that this feature doesn't mean you can ignore these errors when they occur - you still need to do root-cause analysis on the corruption and take steps to prevent them happening again before a corruption occurs that automatic page repair cannot fix (as I mentioned above).

Nevertheless, this is a tremendously useful feature that's going to save a lot of downtime. Cool!

One of the great things about the blog engine we use is that it shows all the search engine queries that led to someone clicking through to the site. I've been looking through the logs to see what kind of problems people are having that end up here. In this occasional series, I'm going to pick a search engine query and blog about it - hopefully helping out people who have the problem in future.

First up is running out of transaction log space. This happens when the transaction log fills up to the point where it has to grow but either autogrow is not turned on or the volume on which the transaction log is placed has no more space for the file to grow. So what causes the transaction log to fill up in the first place? It could be a number of different things. The two most common ones I've seen are:

  • The database is in full recovery mode with normal processing, a full database backup has been taken but no log backups have been taken. This commonly happens when an application goes into production and someone decides to take a backup, without realizing the consequences. Log truncation is not automatic in such a case - a log backup is required to allow the log to truncate.
  • The database is in any recovery mode but there's a very long-running, uncommitted transaction that prevents log truncation. (Even in full recovery mode with regular log backups, all the log records for the long-running transaction are required in case it rolls-back - and SQL Server can't selectively truncate log records from the log for some transactions but not others.)

Let's have a look at both of these in SQL Server 2005 and example solutions.

The log backup case...

First thing to do is create a database with a very small transaction log that we can easily fill up.

USE master;
GO
IF DATABASEPROPERTY (N'tinylogtest', 'Version') > 0 DROP DATABASE tinylogtest;
GO

CREATE DATABASE tinylogtest ON (
   
NAME = fgt_mdf,
   
FILENAME = 'c:\tinylogtest\tinylogtest.mdf',
   
SIZE = 2MB)
LOG ON (
   
NAME = fgt_log,
   
FILENAME = 'c:\tinylogtest\tinylogtest.ldf',
   
SIZE = 512KB,
   
FILEGROWTH = 0);
GO

Notice that I've specifically turned auto-growth of the log file off by setting FILEGROWTH = 0. Now let's set the database into full recovery mode and take a full database backup.

ALTER DATABASE tinylogtest SET RECOVERY FULL;
GO
BACKUP DATABASE tinylogtest to DISK = 'C:\tinylogtest\dummybackup.bck';
GO

And then create a simple table and fill up the transaction log.

SET NOCOUNT ON;
GO
CREATE TABLE tinylogtest..testtable (c1 INT, c2 CHAR (3000));
GO
WHILE (1=1)
BEGIN
   
INSERT INTO tinylogtest..testtable VALUES (1, 'a')
END;

GO

And we get the following error:

Msg 9002, Level 17, State 2, Line 4
The transaction log for database 'tinylogtest' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

This is much better than you get on SQL Server 2000:

Msg 9002, Level 17, State 2, Line 4
The log file for database 'tinylogtest' is full.

This has no helpful information at all. So, looking at the 2005 message, it's telling us to look in the sys.databases table for more info. Let's do that:

SELECT log_reuse_wait_desc FROM sys.databases WHERE [name] = 'tinylogtest';
GO

And the result is:

LOG_BACKUP

That's pretty clear - take a log backup! Here are the various values this can take, from the Books Online entry for sys.databases:

  • NOTHING
  • CHECKPOINT
  • LOG_BACKUP
  • ACTIVE_BACKUP_OR_RESTORE
  • ACTIVE_TRANSACTION
  • DATABASE_MIRRORING
  • REPLICATION
  • DATABASE_SNAPSHOT_CREATION
  • LOG_SCAN
  • OTHER_TRANSIENT

Their meanings are explained in the Books Online entry Factors That Can Delay Log Truncation. Now if we take a log backup, the log can be truncated.

BACKUP LOG tinylogtest to DISK = 'C:\tinylogtest\dummybackup.bck';
GO

And checking the state in sys.databases again returns:

NOTHING

Simple - but it's amazing how many times I see this problem on newsgroups and forums.

The long-running transaction case...

For this test, assume I've run the T-SQL above to create the database, put it into full recovery mode, take the first full backup, and then create the table. Now in a second connection, I'll create a long-running transaction:

BEGIN TRANSACTION PAULSTRAN;
GO

INSERT INTO tinylogtest..testtable VALUES (1, 'a');
GO

Note that the transaction hasn't been committed or rolled back. In the original connection, I'll execute the WHILE loop to fill up the log again. When we check sys.databases we get back:

LOG_BACKUP

So we take a log backup and check the state again. This time we get:

ACTIVE_TRANSACTION

We need to use the DBCC OPENTRAN command to find out what the long running transaction is:

DBCC OPENTRAN ('tinylogtest');
GO

And we get back:

Transaction information for database 'tinylogtest'.

Oldest active transaction:
    SPID (server process ID): 54
    UID (user ID) : -1
    Name            : PAULSTRAN
    LSN               : (18:98:1)
    Start time      : Sep 23 2007  5:49:53:077PM
    SID               : 0x010500000000000515000000ae4da6eadcd59cf661d6bb58ed030000
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

From this info we can track down the transaction and commit it or roll it back. In our example, once we roll back the PAULSTRAN transaction, there's enough log space to insert one more record into the table, but then we run out of log space again. What's going on? Because the PAULSTRAN transaction we very close to the start of the transaction log, all of the log it was effectively holding hostage was backed up but couldn't be cleared. In this case we need to perform another log backup to allow the log to be cleared. This makes sense and is documented behavior.

Summary

Of course, there are other causes of the transaction log filling up and other things you can do apart from taking log backups or identifying long-running transactions - such as growing the log file or moving the log file to a volume with more free space. Here are some good resources you can use for further reading:

Hope this helps!

Quickie this morning - I was surfing some MVP blogs this morning and came across a series of posts by Allen White around using VB Script and SMO to automate regular maintenance jobs. His latest post contains code to regularly run consistency checks and more, and report any failures. Very nice - check it out!

My previous post on Backup Compression reminded me to post about installing the latest SQL Server 2008 CTP (here's a link to the download page).

I've installed SQL Server 2008 many times, both within Microsoft and since leaving in August, but I've never done it on a system that has SQL Server 2000 installed too. Last week I decided to create a VPC image with a few instances of SQL Server 2008 on as well as SQL Server 2005 - basically updating the VPC image from the Always-On DVD that contains all the Hands-On Labs that Kimberly's written over the last few years for SQL Server 2005.

One thing I'd heard is that SQL Server 2008 won't install side-by-side yet with SQL Server 2000, or any SQL Server 2000 tools. I'm not sure whether this will be the case at RTM, but it's fair enough for a pre-RTM CTP. So, I tried it and sure-enough it failed. I then used Add/Remove Programs to de-install SQL Server 2000 and thought this would do the trick. Nope - the install still failed, thinking I had one or more SQL Server 2000 instances in the VPC. Next I tried deleting the 2000-specific directories from the C:/Program Files/Microsoft SQL Server directory. Nope - the install still failed. Finally, I fired up regedt32.exe and deleted the 2000-specific registry keys. Then the install finally worked.

So - to save you a bunch of time figuring this out, using Add/Remove Programs doesn't de-install SQL Server 2000 well enough to satisfy the SQL Server 2008 CTP setup.

Over the next few months I'll be blogging a lot about new features that are coming in SQL Server 2008 for DBAs and ITPros.

First up is Backup Compression. This has been one of most heavily requested features since before I joined the SQL team back in 1999 - for me it's really cool that it's in this coming release.

It's widely recognized that database sizes are growing significantly, and that the growth is being forced by many factors, such as companies requiring more data to be available online for longer (e.g. to comply with government regulations) or an increasing amount of data being digitized for storage. Although the per/GB cost of storage is falling, the actual cost of keeping a database available is increasing. Here's why:

  • Prior to SQL Server 2008, a backup of a VLDB (or any sized database really) takes, on average, the same amount of disk space as the data in the database itself (remember that database files can be sized much larger than the amount of data they hold). A prudent backup strategy is to have multiple backups online, and multiple redundant copies of each backup. Thus, the amount of disk space required increases rather quickly - more than offsetting the drop in storage costs.
  • Keeping safe copies of all database backups means they have to be physically copied to a separate location. Increasing database sizes translates into more bandwidth required to copy larger database backups (or lower bandwidth links get saturated for longer and longer).
  • If you have a well designed storage strategy, then you're probably using RAID - so the more backups you need to store, and the higher level of RAID you use, the more drives you need to store all these backup copies.

In addition to costs, the elapsed times of backup and restore operations need to be considered. Backup and restore operations are essentially IO-bound. For a backup in SQL Server 2005 and before, the whole database has to be read and then written to a new location - with the total IO size of the writes to the backup equalling that of the reads from the database. The same is true for a restore, with reads from the backup equalling writes to the restored database. (As an aside, a restore operation also has the added CPU cost of having to examine each page as it's read from the backup to figure out where it should be written to in the restored database - this usually makes a restore operation take 10-20% more elapsed time than a backup.)

The perfect solution to these problems, at the expense of sometimes-considerable CPU time, is to compress the database as it's written into the backup. This reduces storage space per backup, required network bandwidth per backup copy, and elapsed time for backup and restore operations. The last point - reducing elapsed time for restore operations - is especially important in disaster recovery situations, where anything that can help reduce database downtime is a good thing.

[Edit: I've added a little here based on comments I received - thanks Brent and Marc!]

A less ideal solution I've seen is to take a regular backup and then compress it after-the-fact before copying to other locations. While this is a reasonable solution, it requires more disk space than compressing the database as its backed up, and it's a more complicated procedure. It also increases the time to take the backup, as the compression is done in a seperate step.

An alternative to compressing the backup at all is to make the backup location a compressed NTFS directory. While this achieves the compression goal, it doesn't permanently compress the backup so doesn't reduce the network bandwidth required to copy the backup or the space needed to archive the backup to tape.

Up until SQL Server 2008, the only solutions for compression-during-backup have come from third-party software vendors. Although these solutions do the job of aleviating the problems I've described above, there are two major roadblocks to their adoption that I've heard from SQL Server customers:

  1. You need to buy another software license as well as SQL Server - this can be pretty expensive for a large number of SQL Server installations. Management is also a headache, to ensure that all sites that may need to decompress the backup have the correct software installed.
  2. Some IT shops are Microsoft-only, which precludes the use of any software not supplied by Microsoft.

In SQL Server 2008, Microsoft will include a long-awaited and much-asked-for backup compression solution - eliminating the two roadblocks above. This is a fantastic first step improving backup/restore functionality - hopefully in the release after SQL Server 2008 we'll see further innovations that will allow encrypted backups, table-level restore, and easier validation of the database stored in a backup.

Some points to note:

  • Adhering to the principal-of-least-surprise, backup compression will be off by default, with very simple syntax to turn it on - directly with T-SQL or through the tools.
  • The compression algorithm used is proprietary to Microsoft and has yielded similar compression ratios to well-known third-party products.
  • During Microsoft's in-house testing on real -world customer databases, average compression ratios of 5:1 have been observed.
  • A backup set will not be able to contain both compressed and uncompressed backups.
  • None of the existing functionality will be altered by compression - e.g. the operation of WITH CHECKSUM or WITH CONTINUE_AFTER_ERROR (see here for more info on those options).
  • A restore operation will be able to tell automatically whether a given backup is compressed or not and just do the right thing.

Once backup compression is available in a public CTP, I'll blog some sample scripts and report on things like:

  • the compression ratios, elapsed time differences, and CPU usages for a few sample databases
  • the varoius configuration options available
  • any differences in the MSDB backup history tables or the output from RESTORE HEADERONLY/LABELONLY

In summary, backup bompression is a very exciting feature and should enable many customers to save money, either on additional software licenses or on storage/network costs, and time, especially in all-important disaster recovery situations.

Thanks

120x240_SQLConn_IBT.jpg120x240_SQLConn_IBT.jpg

 

 

The Fall conference season is on us again! Building on the great reaction to our co-presented Database Maintenance workshop at SQL Connections in Orlando, Kimberly and I are doing a *ton* of stuff at SQL Connections this Fall. The conference is in Las Vegas (where I've never been!) and runs from November 5th to 8th, with pre-con workshops on the 5th. We've got so much cool stuff to talk about that as well as doing a pre-con on the 5th, we're also doing a pre-pre-con on the 4th AND a post-con workshop on the 9th! And if that's not enough to help us lose our voices during the week, we're also doing 5 conference sessions between us! Tuesday 6th is Microsoft day and the session line-up looks great - lots of juicy details about SQL Server 2008 - as well as some best-practices sessions for those of you who are happy with SQL Server 2000 or 2005 for now.

 

 

Here's what we're doing:

Workshops

  • November 4th - Pre-pre-con: SPR301: SQL Server Database Maintenance: From Planning to Practice to Post-Mortem

No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems. The key to continued performance and smooth operations is a well thought-out and automated database maintenance plan. This full-day workshop has three sections: planning, practice, and post-mortem. Planning for database maintenance actually starts with database design, so one of the things covered will be how to avoid design choices that limit database maintenance or contribute to maintenance problems. We'll discuss a laundry-list of maintenance problems and then explore how to tell if you need to mitigate them, strategies and best-practices for doing so, and how to avoid having your mitigation choices cause unforeseen and undesirable side-effects. Topics covered will include database files (shrink, grow, virtual log files, log size/management), consistency checks and corruption detection, fragmentation, statistics, backup/restore (options, granularity, strategies) and recovery models. The workshop will vary between 200-400 level covering ALL the key concepts of maintenance features. Finally, we'll spotlight some real-world examples where people made good and bad choices and discuss how you can repeat or avoid them, respectively. If you're wondering how to bring your database back under control, and keep it there, then this full-day workshop will help you tame maintenance problems whether you're a full-time system administrator or a reluctant DBA.

  • November 5th - Pre-con: SPR303: SQL Server Disaster Recovery: From Planning to Practice to Post-Mortem

Every DBA's nightmare is having down time and data loss and not knowing how to recover. However, designing and implementing a successful disaster recovery strategy is easier said than done. It's about asking all the right questions and figuring out all the best answers for your situation. This full-day workshop has three sections: planning, practice and post-mortem. Planning is a critical part of disaster recovery, but the most-often disregarded. Topics we'll cover here include: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005? How do technology choice affect workload performance? Putting a well-thought out plan into practice requires even more planning and in this section we'll discuss technology implementation, building step-by-step recovery/operation guides for when disasters happen, and, most importantly, testing your plan by simulating real problems. In the final section, we'll spotlight some real-world examples where people made costly mistakes and show you how they could have been avoided with a little planning and practice. If you've ever had nightmares about disaster recovery (or actually had a disaster!) and been at a loss for what to do, then this full-day workshop will give you the direction and technical details you need for success!

  • November 9th - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

After a week of learning and watching demos - spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment setup with hands-on lab exercises to walk you through some of our most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and you're expected to bring a reasonable laptop configuration in order to participate: * Virtual Server or Virtual PC - already installed * At least 1GB of physical memory w/512MB dedicated to the VPC environment (2GB is preferred w/1GB dedicated to VPC) * 12 GB of physical disk space (20+ GB is preferred) * DVD Drive

Sessions

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB350: SQL Server Table Strategies - Designing for Performance and Availability

Often tables are designed based solely on the data that needs to be tracked (here's a column name, here's a data type - done!). Unfortunately, design does not usually take into account how the data is going to be used OR how SQL Server uses the data. Knowing the internals of table structures as well as the optmizations that come with good design will make your database truly scalable. Come to this session to learn some internals as well as various design strategies such as vertical and horizonal partitioning. Additionally, are there any other features that require changes in your design and thinking? For example, online index operations impact design because of the limitations that exist with partitioning and LOB columns. If you want to scale, you need to be here!

  • SDB347: SQL Server Indexing for Performance - Finding the Right Balance

In terms of performance tuning, there are few silver bullets. If I had to choose ONE area that improves performance the most (when designed appropriately!), it's indexing. However, indexing strategies depend on the data and even more so, the usage of the data. Come to this session to see what indexing strategies help the base table the most as well as how to optimize your worst performing queries.

  • SDB348: SQL Server Indexing Strategies - Are You Sure?

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of useless indexes? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to figure it out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide cover topics for discussion as well as the reason(s) for why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

So, serious amounts of info, tips, and tricks for you to geek-out on with us and take home to use. We hope to see a lot of you there!

Paul (and Kimberly)

On the Storage Engine blog last year I started two series that I got distracted from - one on CHECKDB and one on fragmentation. With the benefit of hindsight, I'm going to start 3 series on my new blog here - one on CHECKDB ('CHECKDB from every angle'), one on indexes ('Indexes from every angle'), and one on internals ('Inside the Storage Engine'). The first few posts of each will be updated reposts of a few from the previous blog, just for completeness.

I realize that I promised to finalize the DBCC whitepaper this summer - well, that's been delayed a little by the wedding and other stuff. Once I've done a few blog posts in this series, I'll tie everything up into a PDF and add a link to it. This post is a good start - 6500 words. Phew.

First up is describing just what CHECKDB does. As with all things related to DBCC, this topic has its share of misinformation. In this post I'll set the record straight by running through all the stages of CHECKDB in SQL Server 2000 and 2005. This was originally split into 5 posts but I'll bring them all together today, add some more info, and make things a little clearer. There are a lot of terms in this post that may be new to some of you - I'll cover them in the internals series over time.

1. Get a transactionally consistent view of the database

CHECKDB needs a consistent view of the database. Why? Well, usually its running on a live database with in-flight transactions. It needs to read and analyze the whole database but it can't do it instantaneously so it has to take steps to ensure that what it reads is transactionally consistent.

Here's an example. Consider a transaction to insert a record into a table that is a heap and has a non-clustered index, with a concurrent CHECKDB that doesn't enforce a consistent view. The table record is inserted first, and then the non-clustered index record is inserted (that's just the way the operations are split up in the database engine).  Because this hypothetical CHECKDB doesn't have a consistent view, it could read the record in the table but not that in the index, conclude that the non-clustered index is out of sync with the table and flag an 8951 missing-index-record error.

How could this happen? Depending on the order in which the pages are read by the CHECKDB process, the page on which the new non-clustered index record should go could be read before the page on which the new heap record should go. If the index page read happens before the insert transaction, and the table page read happens after the insert tra