On index key size, index depth, and performance

In my Insider newsletter a couple of weeks ago, I discussed how index fragmentation is often considered when designing indexes, but index depth often isn’t. In the newsletter I said I’d do a more comprehensive blog post with some data, so this is it.

Fanout and Index Depth

The index depth is determined by the fanout of the index. From the newsletter:

The fanout of an index measures, for a page at level x in an index, how many pages it references in the level below (nearer the leaf level). The higher the fanout is, the fewer the number of levels in the index.

The index key size impacts the size of the structure needed to reference it. Specifically, the index key is pushed up to all entries (and all levels) in the index as it’s used to allow navigation through the index from the root page down to the leaf level.

The larger the index key size, the fewer index records can be stored in an index page and so the lower the fanout. The lower the fanout is, the more levels are required in the index, depending on the number of pages at the leaf level.

For instance, if the fanout is 10 in an index, that means each index page can hold 10 index records, referencing 10 pages at the level below in the index. If the index has 10,000 pages at the leaf level, there needs to be 1,000 pages in the level above, then 100 pages, then 10 pages, and finally the root page. That’s a total of 5 levels.

For the same data, if the index fanout is changed to 100, and the index has 10,000 pages at the leaf level, the next level needs 100 pages, and then there’s the root page. That’s a total of only three levels.
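
To put that arithmetic in one place, here’s a rough sketch of the calculation – it’s an approximation of my own, not anything SQL Server exposes, and it ignores details like non-uniform page fullness:

-- Approximate index depth = 1 (for the leaf level) plus however many times
-- the fanout has to be applied to get down to a single root page
DECLARE @LeafPages FLOAT = 10000;
DECLARE @Fanout    FLOAT = 10;

SELECT 1 + CEILING (LOG (@LeafPages) / LOG (@Fanout)) AS [ApproxIndexDepth];
-- @Fanout = 10  returns 5 (matching the first example)
-- @Fanout = 100 returns 3 (matching the second example)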

I want to measure whether there’s a noticeable performance difference for single-row select operations based on the fanout, and hence index depth, of an index, by varying its key size. There won’t be any noticeable effect on scans, as a scan only involves a single traversal of the index, to find the starting point of the scan. (Ok, it’s a little more complicated than that for scans if any of the index leaf-level pages change while the scan is positioned on them, but that’s not relevant here.)

Test Description

The test I’m going to use is:

  • Create a table with 2 million records, each record being large enough that only one record can fit on each data page (I was going to do ten million rows, but that was just taking too long)
  • Drop any existing clustered index
  • Create a new clustered index with varying key size from 8 to 900 bytes (creating the index after populating the table guarantees the tightest space usage)
  • Ensure that all of the index is in memory (clear wait stats and make sure there are no page reads from disk during the next step – see the sketch after the test script below)
  • Time how long it takes to do a single row lookup of all 2 million rows (run 5 tests and average the times)

Here’s the code for my test:

USE [master];
GO

IF DATABASEPROPERTYEX (N'IndexDepthTest', N'Version') != 0
BEGIN
    ALTER DATABASE [IndexDepthTest] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [IndexDepthTest];
END
GO

CREATE DATABASE [IndexDepthTest] ON PRIMARY (
    NAME = N'IndexDepthTest_data',
    FILENAME = N'T:\IDT\IndexDepthTest_data.mdf',
    SIZE = 32768MB,
    FILEGROWTH = 256MB)
LOG ON (
    NAME = N'IndexDepthTest_log',
    FILENAME = N'N:\IDT\IndexDepthTest_log.ldf',
    SIZE = 2048MB,
    FILEGROWTH = 256MB);
GO

ALTER DATABASE [IndexDepthTest] SET RECOVERY SIMPLE;
GO

SET NOCOUNT ON;
GO

USE [IndexDepthTest];
GO

CREATE TABLE [DepthTest] (
    [c1] BIGINT IDENTITY,
    [c2] CHAR (8) DEFAULT 'c2',		-- to allow 16-byte key
    [c3] CHAR (92) DEFAULT 'c3',	-- to allow 100-byte key
    [c4] CHAR (300) DEFAULT 'c4',	-- to allow 400-byte key
    [c5] CHAR (500) DEFAULT 'c5',	-- to allow 900-byte key
    [c6] CHAR (4000) DEFAULT 'c6');	-- to force one row per leaf page

INSERT INTO [DepthTest] DEFAULT VALUES;
GO 2000000

-- Run one of the following sets of DROP/CREATE statements

-- No existing clustered index to drop
CREATE CLUSTERED INDEX [8ByteKey] ON [DepthTest] ([c1]);
GO

DROP INDEX [8ByteKey] ON [DepthTest];
GO
CREATE CLUSTERED INDEX [16ByteKey] ON [DepthTest] ([c1], [c2]);
GO

DROP INDEX [16ByteKey] ON [DepthTest];
GO
CREATE CLUSTERED INDEX [100ByteKey] ON [DepthTest] ([c1], [c3]);
GO

DROP INDEX [100ByteKey] ON [DepthTest];
GO
CREATE CLUSTERED INDEX [400ByteKey] ON [DepthTest] ([c1], [c3], [c4]);
GO

DROP INDEX [400ByteKey] ON [DepthTest];
GO
CREATE CLUSTERED INDEX [900ByteKey] ON [DepthTest] ([c1], [c3], [c4], [c5]);
GO

SELECT
    [index_depth],
    [index_level],
    [page_count],
    [record_count]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'IndexDepthTest'),
    OBJECT_ID (N'DepthTest'),
    1,
    0,
    'DETAILED');
GO

DECLARE @c INT = 0;
WHILE (@c != 5)
BEGIN
    DECLARE @t DATETIME = GETDATE ();
    DECLARE @a BIGINT = 0;
    DECLARE @b BIGINT;

    WHILE (@a != 2000000)
    BEGIN
        SELECT @b = [c1] FROM [DepthTest] WHERE [c1] = @a;
        SELECT @a = @a + 1;
    END;

    SELECT GETDATE () - @t;
    SELECT @c = @c + 1;
END;
GO
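
The script above doesn’t show step 4 of the test (making sure the whole index is in memory). Here’s a minimal sketch of how that check can be done – clear the aggregated wait statistics, run the select loop once to warm the cache, and then confirm that no PAGEIOLATCH waits (i.e. no physical reads) occurred:

-- Clear the aggregated wait statistics
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
GO

-- ...run the select loop once to warm the cache, then check for any read waits.
-- If this returns no rows, everything was served from memory.
SELECT [wait_type], [waiting_tasks_count], [wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE [wait_type] LIKE N'PAGEIOLATCH%'
    AND [waiting_tasks_count] > 0;
GO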

My test server is a Dell R720 with 16 physical cores (Intel E5-2670 @ 2.60 GHz), 64GB of memory, a Fusion-io/SanDisk 640GB SSD for storage, and I’m running the test on SQL Server 2012.

The test is designed both to make sure that the index is traversed all the way down to the leaf level (and the leaf record has to be accessed to check the existence of the value being selected and to retrieve it), and to make sure that all pages in the index are in memory.

I’ll walk through the steps for the 8-byte cluster key and then present the data for all the tests.

It took a few minutes to do the 2 million inserts, and then create the first clustered index. The results of the DMV call were:

index_depth index_level page_count           record_count
----------- ----------- -------------------- --------------------
4           0           2000000              2000000
4           1           4214                 2000000
4           2           16                   4214
4           3           1                    16

So with an index key size of 8 bytes, the index needs 4214 pages at level 1 in the index structure to hold references to all 2 million leaf-level pages. This means the fanout value is 2000000 / 4214, which is approximately 474.
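
As a quick check, you can derive the average fanout at each level straight from the same DMV output – it’s just each level’s record count divided by its page count:

SELECT
    [index_level],
    [page_count],
    [record_count] * 1.0 / [page_count] AS [avg_fanout]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'IndexDepthTest'),
    OBJECT_ID (N'DepthTest'),
    1,
    0,
    'DETAILED')
WHERE [index_level] > 0;
GO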

The times for the 2 million selects for the 8-byte cluster key were 21.983s, 21.94s, 21.973s, 21.967s, 21.963s, with an average of 21.9652s, and a per-select average of 10.98 microseconds.

Test Results

Running the test for each of my test key sizes produced the following results:

Key Size Index Depth Total Page Count Fanout Average Time for selects Rough time per select
-------- ----------- ---------------- ------ ------------------------ ---------------------
8        4           2004231          474    21.9652 secs             10.9826 microsecs
16       4           2006980          288    21.8122 secs             10.9061 microsecs
100      5           2028182          72     22.9522 secs             11.4976 microsecs
400      6           2111124          19     23.7482 secs             11.8741 microsecs
900      8           2285728          8      25.5732 secs             12.7866 microsecs

The results clearly show that there’s a performance penalty for index seeks when the index has more levels. At each level of the index during a seek, a binary search takes place, to find the right index record to use to navigate down to the next level lower in the index, and this binary search takes CPU time.

For each additional level in the index, my results show that it takes roughly 0.4 to 0.5 microseconds of extra time, and that’s pure CPU time as there were no page reads during the tests.

You might wonder why the per-select time for the 16-byte key index is less than for the 8-byte key index, even though they have the same depth of 4 in my test. That’s down to the binary search algorithm. On average, the number of comparisons needed for a binary search of x elements is log2(x). For the 8-byte index, the fanout (i.e. the number of records per page for the binary search) is 474, giving an average of 8.9 comparisons. For the 16-byte index, the fanout is 288, giving an average of 8.2 comparisons. That small difference accounts for the slight drop we see in the test time – it’s a tiny bit more efficient for a lower fanout with the same index depth. I’m not going to say that this means you’re better off with a GUID cluster key than a bigint – that’s a whole other discussion with much more to consider than just single-row select performance :-)
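
If you want to check those comparison counts yourself, the two-argument form of LOG (available from SQL Server 2012 onwards) makes it a one-liner:

SELECT LOG (474, 2) AS [AvgComparisons8ByteKey], LOG (288, 2) AS [AvgComparisons16ByteKey];
-- Returns approximately 8.9 and 8.2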

Summary

My results show that index depth does matter.

Index depth is determined by the number of rows in the index and the index key size. You can’t control the number of rows, but you can control the index key size. Where possible, the smaller you can keep the index key size, the smaller the index depth will be for the same number of records, and the faster an index traversal from root page to leaf level will be.

Even though we’re only talking about fractions of a microsecond, for workloads with huge numbers of single-row select operations, that all adds up, and especially so on older, slower processors where the difference will be more pronounced than in my tests. And these results also counter the argument that says “index depth doesn’t matter because it’s all in memory anyway”.

Bottom line – this is one more reason to keep your index keys as narrow as possible.

Btw, Kimberly goes into all this in much more detail in her excellent 4-hour Pluralsight course on SQL Server: Why Physical Database Design Matters.

Low priority locking wait types

[Edit 2016: Check out my new resource – a comprehensive library of all wait types and latch classes – see here.]

SQL Server 2014 (and Azure SQL Database V12) added some cool new functionality for online index operations to allow you to prevent long-term blocking because of the two blocking locks that online index operations require.

At the start of any online index operation, it acquires an S (share) table lock. This lock will be blocked until all transactions that are changing the table have committed, and while the lock is pending, it will block any transactions wanting to change the table in any way. The S lock is only held for a short amount of time, then dropped to an IS (Intent-Share) lock for the long duration of the operation. At the end of any online index operation, it acquires a SCH-M (schema modification) table lock, which you can think of as a super-exclusive lock. This lock will be blocked by any transaction accessing or changing the table, and while the lock is pending, it will block any transactions wanting to read or change the table in any way.
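
If you want to watch those table-level lock requests for yourself while an online rebuild is running, here’s a minimal sketch querying sys.dm_tran_locks from another connection (using the demo table I use below):

SELECT [request_session_id], [request_mode], [request_status]
FROM sys.dm_tran_locks
WHERE [resource_type] = N'OBJECT'
    AND [resource_database_id] = DB_ID ()
    AND [resource_associated_entity_id] = OBJECT_ID (N'NonSparseDocRepository');
GO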

The new syntax allows you to specify how long the online index operation will wait for each of these locks, and what to do when the timeout expires (nothing: NONE, kill the online index operation: SELF, or kill the blockers of the online index operation: BLOCKERS – see Books Online for more info). While the online index operation is blocked, it shows a different lock wait type than we’re used to seeing, and any lock requests are allowed to essentially jump over the online index operation in the lock pending queues – i.e. the online index operation waits with lower priority than everything else on the system.

To demonstrate this, I’ve got a table called NonSparseDocRepository, with a clustered index called NonSparse_CL, and 100,000 rows in the table.

First, I’ll kick off an online index rebuild of the clustered index, specifying a 1 minute wait, and to kill itself if the wait times out:

ALTER INDEX [NonSparse_CL] ON [NonSparseDocRepository] REBUILD
WITH (FILLFACTOR = 70, ONLINE = ON (
	WAIT_AT_LOW_PRIORITY (
		MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)
	)
);
GO

I let it run for ten seconds or so, to make sure it got past the initial table S lock required. Now, in another connection, I’ll start a transaction that takes an IX table lock, which will block the final SCH-M lock the online index operation requires:

BEGIN TRAN;
GO

UPDATE [NonSparseDocRepository]
SET [c4] = '1'
WHERE [DocID] = 1;
GO

And then I’ll wait until the drive light on my laptop goes off, which lets me know that the online index rebuild is stalled. If I look in sys.dm_os_waiting_tasks (using the script in this post), I’ll see the rebuild is blocked (script output heavily edited for clarity and brevity):

session_id exec_context_id scheduler_id wait_duration_ms wait_type                blocking_session_id resource_description
57         0               4            7786             LCK_M_SCH_M_LOW_PRIORITY 58                  objectlock

Look at the wait type: LCK_M_SCH_M_LOW_PRIORITY. The _LOW_PRIORITY suffix indicates that this is a special lock wait attributable to the online index operation being blocked.

This also neatly proves that the wait-at-low-priority feature applies to both the blocking locks that online index operations require, even if the first one isn’t blocked.

And eventually the online index operation fails, as follows:

Msg 1222, Level 16, State 56, Line 1
Lock request time out period exceeded.

If I leave that open transaction in the other connection (holding its IX table lock), and try the index rebuild again, with the exact same syntax, it’s immediately blocked and the sys.dm_os_waiting_tasks script shows:

session_id exec_context_id scheduler_id wait_duration_ms wait_type                blocking_session_id resource_description
57         0               4            8026             LCK_M_S_LOW_PRIORITY     58                  objectlock

This shows that the initial blocking lock is blocked, and is waiting at low priority.

So if either of these wait types show up during your regular wait statistics analysis, now you know what’s causing them.
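
And a minimal sketch for checking whether any of these low-priority lock waits have occurred since the wait statistics were last cleared:

SELECT [wait_type], [waiting_tasks_count], [wait_time_ms], [max_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE [wait_type] LIKE N'LCK[_]M[_]%[_]LOW[_]PRIORITY'
    AND [waiting_tasks_count] > 0;
GO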

Cool workshops at SQLintersection in October

This year’s Fall SQLintersection conference in October is fast approaching and we’ve got some excellent full-day workshops lined up for you:

  • Sunday 25th: Kimberly with Queries Gone Wrong: Statistics, Cardinality, Solutions
  • Monday 26th: Jon and Tim with Managing SQL Server for the Non-DBA
  • Monday 26th: Glenn with Analyzing and Improving I/O Subsystem Performance
  • Friday 30th: Me with Performance Troubleshooting with Waits and Latches
  • Friday 30th: Brent with Advanced SQL Server High Availability and Disaster Recovery

And then of course there are another 30+ workshops on all kinds of Developer and I.T. topics (even a hands-on Internet-of-Things workshop where you program micro-controllers against Azure IoT Services!).

You can get more details about this year’s Fall SQLintersection conference in Las Vegas, October 25-30 at this website. And don’t forget to use the discount code SQLSKILLS when you register to save $50.

We hope to see you there!

June 2016 London classes open for registration

I’ve just released our London classes for 2016 for registration!

We’ll be teaching the following classes:

  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • June 13-17 (US$200 discount for registering in 2015)
  • IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2
    • June 20-24 (US$200 discount for registering in 2015)

As you can see, we’re offering discounts off the early-bird price for all our 2016 classes if you register before the end of this year. The regular early-bird prices will apply from January 1st, 2016. If you’ve previously attended an Immersion Event, there’s a larger discount that’s always available to you whenever you register.

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

New SSIS Immersion Events with Andy Leonard and Tim Mitchell

We’re very excited to announce that we’re partnering with our friends at Linchpin People to offer two new Immersion Events on SSIS, both debuting in 2016 alongside our existing classes.

IESSIS1: Immersion Event on Learning SQL Server Integration Services

This is a 5-day course taught by Andy Leonard, with the following modules:

  1. Introduction
  2. Creating Your First SSIS package
  3. Introduction to the Data Flow Task
  4. Data Flow Task 201
  5. Data Flow Task 202
  6. The Control Flow
  7. Loop Containers
  8. Data Flow Task 301
  9. Data Flow Task 302
  10. Data Flow Task 303
  11. Event Handlers, Logging, and Configurations
  12. Security, Deployment, and Execution
  13. ETL Design Patterns
  14. Enterprise Execution Patterns

You can read through the detailed curriculum here.

The first offering of this new class will be in Chicago, at our usual location, from May 2-6, 2016 – all the details are here.

IESSIS2: Immersion Event on Advanced SQL Server Integration Services

This is a 4-day course taught by Andy Leonard and Tim Mitchell, with the following modules:

  1. SSIS Catalog
  2. SSIS Security
  3. Load Performance
  4. Data Flow Internals
  5. Testing Patterns
  6. SSIS Scripting
  7. Data Quality and SSIS
  8. Hacking the SSIS Catalog
  9. Advanced Logging
  10. Advanced SSIS Load Patterns
  11. Error Handling
  12. Data Edge Cases
  13. Business Intelligence Markup Language (Biml)

You can read through the detailed curriculum here.

The first offering of this new class will be in Bellevue, at our usual location, from September 19-22, 2016 – all the details are here.

Cool stuff – hope to see you there!

2016 Chicago and Bellevue classes open for registration

I’ve just released our first sets of US classes for 2016 for registration!

Our classes in April/May will be in Chicago, IL:

  • IE0: Immersion Event for Junior/Accidental DBAs
    • April 25-27 (US$120 discount for registering in 2015)
  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • April 25-29 (US$200 discount for registering in 2015)
  • IEBI: Immersion Event on Business Intelligence
    • April 25-29 (US$200 discount for registering in 2015)
  • IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2
    • May 2-6 (US$200 discount for registering in 2015)
  • **NEW** IESSIS1: Immersion Event on Learning SQL Server Integration Services
    • May 2-6 (US$200 discount for registering in 2015)
  • IEHADR: Immersion Event on High Availability and Disaster Recovery
    • May 9-13 (US$200 discount for registering in 2015)

Our classes in September will be in Bellevue, WA:

  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • September 12-16 (US$200 discount for registering in 2015)
  • IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2
    • September 19-23 (US$200 discount for registering in 2015)
  • **NEW** IESSIS2: Immersion Event on Advanced SQL Server Integration Services
    • September 19-22 (US$160 discount for registering in 2015)

We’ll likely have some US classes in the second half of 2016 (details in Spring 2016), and there will be classes in London in June that I’ll announce in the next few days.

As you can see, we’re offering discounts off the early-bird price for all our 2016 classes if you register before the end of this year. The regular early-bird prices will apply from January 1st, 2016. If you’ve previously attended an Immersion Event, there’s a larger discount that’s always available to you whenever you register.

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

“Attempt to read or write protected memory” error from SSMS for System.Data

I just spent a couple of hours fruitlessly trying to solve a problem and thought I’d blog the solution in case anyone else hits the issue.

A few months back I got a new laptop and installed SQL Server 2014 on it and everything worked fine. A few weeks ago I installed SQL Server 2008 R2, SQL Server 2012, and Visual Studio 2013. After that, any time I tried to use the Dedicated Admin Connection (DAC) through SSMS, I got this error:

[Screenshot: the “Attempt to read or write protected memory” error dialog, naming System.Data]

I did some research online and couldn’t find any solutions that worked. So I tried repairing the installation, installing 2014 SP1 CU2 (latest build at time of writing), and a bunch of other things – to no avail. I was just about to give up and post a request for help when I thought I’d do one more search online.

When I did that, I noticed that all the solutions I’d tried had revolved around the assembly name in the error being something to do with SQL Server. In my case, it was System.Data, which is a .NET assembly. I added that into my search and found a bunch of new hits. Lo and behold, buried in the comments on a Stack Overflow question, I found the solution.

Turns out the problem was because of an upgraded .NET version, and the solution was to run the following from a command line and then reboot:

netsh winsock reset

And after that SSMS worked perfectly.

Hope this post helps others find the answer quickly in future!

My upcoming sessions at PASS and SQLintersection

The two major Fall conferences (PASS Summit and SQLintersection) are coming up in October so I wanted to give you a heads-up about what I’ll be presenting.

The workshop I’ll be doing at both conferences is based on one of my favorite topics – wait statistics and using them for performance troubleshooting. I’m very passionate about helping people learn about wait statistics and how to use them *correctly* as an invaluable aid when troubleshooting performance problems in SQL Server. You may have seen one of my user group sessions over the years where I’ve talked about wait stats for 60-90 minutes, but here I’ll have an entire day to expand into techniques for further troubleshooting, rather than just evaluating the wait and latch statistics themselves.

The no-frills title of the workshop is: Performance Troubleshooting Using Waits and Latches

Here’s the abstract:

One of the first things you should check when investigating performance issues is wait and latch statistics, as these can often point you in the right direction for further analysis. Unfortunately, many people misinterpret what SQL Server is telling them and jump to conclusions about how to solve the problem – what is often called ‘knee-jerk performance tuning’. In this full-day workshop, you’ll learn how to investigate and interpret wait and latch statistics – practical techniques you can take home and start using immediately. You’ll also learn what a myriad of wait and latch types actually mean and how you can investigate lightly-documented types to gain insight into what causes them. Don’t waste time when it comes to performance troubleshooting; wait statistics are the fastest route to understanding your problems and this workshop will help you get there faster.

If you come along I promise you that you’ll learn a ton of really, really useful information and techniques – I’m tremendously looking forward to teaching it.

You can get more details about this year’s PASS Summit in Seattle, October 27-30 at this website.

You can get more details about this year’s Fall SQLintersection in Las Vegas, October 26-29 at this website.

Yes, unfortunately both conferences are unavoidably on the same week, which means some creative flying around the country on our part.

Whichever conference you go to, I’m looking forward to meeting you and I hope to see you in my workshop!

Cheers

Disaster recovery 101: fixing a broken boot page

One of the corruptions that can stymie all efforts at disaster recovery is a broken boot page. If the boot page can’t be processed, the database can’t be brought online or even put into emergency mode. I first demonstrated how to work around this in my session on Advanced Data Recovery Techniques at PASS in 2014, and here I’d like to walk through the steps of what to do.

First of all, I’ll create a broken boot page in a test database:

-- Drop old database
USE [master];
GO

IF DATABASEPROPERTYEX (N'Company', N'Version') != 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END

-- Create database and table
CREATE DATABASE [Company] ON PRIMARY (
    NAME = N'Company',
    FILENAME = N'D:\SQLskills\Company.mdf')
LOG ON (
    NAME = N'Company_log',
    FILENAME = N'D:\SQLskills\Company_log.ldf');
GO

USE [Company];
GO

CREATE TABLE [Sales] (
	[SalesID] INT IDENTITY,
	[CustomerID] INT DEFAULT CONVERT (INT, 100000 * RAND ()),
	[SalesDate] DATETIME DEFAULT GETDATE (),
	[SalesAmount] MONEY DEFAULT CONVERT (MONEY, 100 * RAND ()));

CREATE CLUSTERED INDEX [SalesCI] ON [Sales] ([SalesID]);
GO

-- Populate the table
SET NOCOUNT ON;
GO

INSERT INTO [Sales] DEFAULT VALUES;
GO 5000

-- Create some nonclustered indexes
CREATE NONCLUSTERED INDEX [SalesNCI_CustomerID] ON [Sales] ([CustomerID]);

CREATE NONCLUSTERED INDEX [SalesNCI_SalesDate_SalesAmount] ON [Sales] ([SalesDate]) INCLUDE ([SalesAmount]);
GO

-- Create a good backup
BACKUP DATABASE [Company] TO DISK = N'C:\SQLskills\OldCompany.bck'
WITH INIT;

-- And detach it
USE [master]
GO

EXEC sp_detach_db N'Company';
GO

Now I’ll corrupt it using a hex editor. The one I like to use is called HxD and it’s a freeware tool you can download from here.

[Screenshot: the detached Company.mdf data file open in HxD]

And then go to the offset of the boot page. It’s page 9 always, so the offset is 8192 x 9 = 73728.

[Screenshot: HxD’s go-to-offset dialog]

And make sure to select the ‘dec’ option to input the number in decimal, offset from beginning of the file:

[Screenshot: entering offset 73728 with the ‘dec’ option selected]

You’ll see the boot page contents, including the name of the database:

[Screenshot: the boot page contents in HxD, showing the database name]

Highlight all the lines down to the database name, and then right-click and select Fill selection…

[Screenshot: the highlighted bytes with the Fill selection… menu option]

And then select the default to fill that area with zeroes:

[Screenshot: the Fill selection dialog, set to fill with zero bytes]

Which will make it look like this:

[Screenshot: the zeroed-out boot page in HxD]

Then hit the disk icon to save the file. Ignore any security errors you get about the ownership of the backup file.

Throughout the rest of these steps, if you get “Access is denied” from SQL Server, you need to change the security in the directory you’re using so the SQL Server service account has the correct file permissions.

You will also see different messages through some of these steps depending on which version of SQL Server you’re using – I’m using SQL Server 2014.

Exit from HxD.

Now we’re ready to try to salvage this database.

First I’ll try to attach it:

USE [master];
GO

-- Try attaching it again
EXEC sp_attach_db @dbname = N'Company', 
    @filename1 = N'D:\SQLskills\Company.mdf', 
    @filename2 = N'D:\SQLskills\Company_log.ldf';
GO
Msg 1813, Level 16, State 2, Line 5
Could not open new database 'Company'. CREATE DATABASE is aborted.
Msg 824, Level 24, State 2, Line 5
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:9; actual 0:0). It occurred during a read of page (1:9) in database ID 6 at offset 0x00000000012000 in file 'D:\SQLskills\Company.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The attach failed. You can verify this yourself by looking in sys.databases for the database – it’s not there.
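
A quick way to confirm that – this should return no rows:

SELECT [name], [state_desc] FROM sys.databases WHERE [name] = N'Company';
GO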

Next I’ll try a hack-attach. I copy off the corrupt database files, and then create a dummy database and set it offline (to release SQL Server’s locks on the files):

CREATE DATABASE [Company] ON PRIMARY (
    NAME = N'Company',
    FILENAME = N'D:\SQLskills\Company.mdf')
LOG ON (
    NAME = N'Company_log',
    FILENAME = N'D:\SQLskills\Company_log.ldf');
GO

ALTER DATABASE [Company] SET OFFLINE;
GO

Then delete the dummy database files, and copy in the original corrupt database files. And then try to bring the database online again, completing the hack-attach:

ALTER DATABASE [Company] SET ONLINE;
GO
Msg 5181, Level 16, State 5, Line 33
Could not restart database "Company". Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 33
ALTER DATABASE statement failed.
Msg 824, Level 24, State 2, Line 33
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:9; actual 0:0). It occurred during a read of page (1:9) in database ID 6 at offset 0x00000000012000 in file 'D:\SQLskills\Company.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Hmmm… but did it work?

SELECT DATABASEPROPERTYEX (N'Company', N'STATUS');
GO
SUSPECT

Yes! Now let’s try doing an emergency-mode repair:

ALTER DATABASE [Company] SET EMERGENCY;
GO
ALTER DATABASE [Company] SET SINGLE_USER;
GO
Msg 824, Level 24, State 2, Line 43
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:9; actual 0:0). It occurred during a read of page (1:9) in database ID 6 at offset 0x00000000012000 in file 'D:\SQLskills\Company.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Nope – it’s not going to work because the broken boot page won’t allow the database to be accessed at all.

Now we’ll fix it, again using a hex editor.

First off I’ll set the database offline again, copy off the broken files, and drop the database so it’s gone from SQL Server.

ALTER DATABASE [Company] SET OFFLINE;
GO

-- ***** Copy off the corrupt files

DROP DATABASE [Company];
GO

Now I’ll restore an older copy of the database and set it offline so I can open the files with HxD:

RESTORE DATABASE [Company] FROM
DISK = N'C:\SQLskills\OldCompany.bck'
WITH REPLACE;
GO

ALTER DATABASE [Company] SET OFFLINE;
GO

In HxD, I then open the restored copy of the database AND the corrupt database, and go to the boot page offset in both, just as I did in the setup phase above:

[Screenshot: the restored copy and the corrupt data file open side-by-side in HxD, both at the boot page offset]

And you can see that the boot page is intact in the restored copy.

Next, I’ll highlight from 12000 (hexadecimal) down to, but not including, 14000, and then right-click and hit Copy to copy the whole 8192 bytes. (0x12000 is 73728 in decimal – the boot page offset again – and 0x14000 minus 0x12000 is exactly 8192 bytes, one page.) These offsets are the same in every database.

[Screenshot: the 8192 bytes of the intact boot page highlighted in the restored copy]

Then go to the corrupt file, at offset 12000 (same in every database), right-click and select Paste Write to overwrite the broken boot page:

[Screenshot: the Paste Write option over the broken boot page in the corrupt file]

And you’ll see everything between 12000 and 14000 (same in every database) go red:

[Screenshot: the overwritten boot page bytes shown in red]

Now save the file and exit HxD.

Delete the restored files, and rename the corrupt files to their correct names. You’ll notice that HxD created a backup copy of the file we just changed – you can ignore it.

ALTER DATABASE [Company] SET ONLINE;
GO
Msg 5181, Level 16, State 5, Line 79
Could not restart database "Company". Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 79
ALTER DATABASE statement failed.
Msg 9003, Level 20, State 9, Line 79
The log scan number (45:16:0) passed to log scan in database 'Company' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Msg 3414, Level 21, State 1, Line 79
An error occurred during recovery, preventing the database 'Company' (6:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

Good – the 824 message is gone, but now you can see we have another issue: all the LSNs in the boot page are incorrect, as we’re now using an older boot page that doesn’t match the more recent transaction log. Emergency mode and/or emergency-mode repair is necessary to either access the data or repair the corrupt database.

ALTER DATABASE [Company] SET EMERGENCY;
GO
ALTER DATABASE [Company] SET SINGLE_USER;
GO
DBCC CHECKDB (N'Company', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO
Warning: The log for database 'Company' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

In this case there were no other corruptions, so all emergency-mode repair had to do was rebuild the transaction log.

Now we can get in to the database and access the data.

Note: the data is likely to be transactionally inconsistent. If you continue to use the database in production following this procedure, you do so entirely at your own risk.

If you don’t have a backup of the original database, you can use any database as a source for a good boot page – just make sure it has the same name as the one you’re trying to fix. Bear in mind that the further away the source boot page is from a recent backup of the original database, the more trouble you’ll have trying to get crash recovery to work.
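
If you do have to manufacture a source for a good boot page, a minimal sketch is simply to create an empty database with the same name and detach it so its data file can be opened in the hex editor (do this on another instance, or before copying the corrupt files back into place, so the names don’t collide):

-- Create a throwaway database with the SAME name as the corrupt one, so its
-- boot page contains the right database name, then detach it so the .mdf
-- can be opened in HxD and used as the source for the copy/paste above.
CREATE DATABASE [Company];
GO
EXEC sp_detach_db N'Company';
GO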

And there you have it – no longer do broken boot pages have to curtail data recovery efforts.

Enjoy!

T-SQL Tuesday #67 – monitoring log activity with Extended Events

[T-SQL Tuesday logo]

On the second Tuesday of each month, many people in the SQL Server community join together to all blog on the same topic – a cool idea from Adam Machanic many years ago.

This month’s topic is Extended Events, hosted by Jes Borland (b | t) – you can see her original post here.

Yesterday in class I was discussing monitoring transaction log activity with a student, to show that when a transaction commits, there is a write to the transaction log file. This is easy to do with Extended Events.

I’m going to use the file_write_completed event to track writes occurring and the transaction_log event to watch log records being generated so we can see transactions committing.

First of all I’ll setup my simple scenario:

USE [master];
GO

IF DATABASEPROPERTYEX (N'Test', N'Version') > 0
BEGIN
	ALTER DATABASE [Test] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Test];
END
GO

CREATE DATABASE [Test] ON PRIMARY (
    NAME = N'Test_data',
    FILENAME = N'D:\SQLskills\Test_data.mdf')
LOG ON (
    NAME = N'Test_log',
    FILENAME = N'C:\SQLskills\Test_log.ldf',
    SIZE = 1MB,
    FILEGROWTH = 0MB);
GO

USE [Test];
GO

CREATE TABLE TestTable (
	c1 INT IDENTITY,
	c2 CHAR (1000) DEFAULT 'a');
GO

INSERT INTO [TestTable] DEFAULT VALUES;
GO

And here’s the simple Extended Event session:

-- Drop the session if it exists. 
IF EXISTS (
	SELECT * FROM sys.server_event_sessions
		WHERE [name] = N'MonitorLog')
    DROP EVENT SESSION [MonitorLog] ON SERVER
GO

-- Create the event session
CREATE EVENT SESSION [MonitorLog] ON SERVER
	ADD EVENT [sqlserver].[file_write_completed],
	ADD EVENT [sqlserver].[transaction_log]
	ADD TARGET [package0].[ring_buffer]
		WITH (MAX_MEMORY = 50 MB, MAX_DISPATCH_LATENCY = 1 SECONDS)
GO

-- Start the session
ALTER EVENT SESSION [MonitorLog] ON SERVER
STATE = START;
GO

I ran both of those scripts, then executed the INSERT a few times.

Now if I go to Object Explorer, I can use the Live Data Viewer (in SQL Server 2012 onwards). Find the session, right-click on it and select Watch Live Data.

[Screenshot: the Watch Live Data menu option in Object Explorer]

Choose the following columns to view (right click on any column name and select Choose Columns…): name, timestamp, database_id, file_id, size, operation.

And then run another INSERT (you might have to run two to make the first set of data show in the Viewer) and you’ll see data like below.

[Screenshot: the Live Data Viewer showing file_write_completed and transaction_log events]

And then you can play around and watch stuff happening.
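
If you’d rather pull the captured events with T-SQL instead of the Live Data Viewer, here’s a minimal sketch that grabs the raw XML from the session’s ring buffer target (you can then shred it with XQuery):

SELECT CAST ([t].[target_data] AS XML) AS [RingBufferContents]
FROM sys.dm_xe_sessions [s]
JOIN sys.dm_xe_session_targets [t]
    ON [s].[address] = [t].[event_session_address]
WHERE [s].[name] = N'MonitorLog'
    AND [t].[target_name] = N'ring_buffer';
GO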

Enjoy! (and check out the other posts from this T-SQL Tuesday)