Thursday, June 26, 2008

One problem (the only one!) of going on vacation with Kimberly is that it can be hard to banish SQL Server completely from conversation. Over breakfast this morning we were discussing the pros and cons of advising someone to use sp_attach_single_file_db as a way to shrink an out-of-control transaction log - with careful guidance it can be done, but there's a lot of scope for misuse and getting into trouble.

One problem with being on vacation in general is that your mind wanders away from the normal bounds of rational thought (well, at least mine does...). While discussing the merits of shrinking transaction logs I was cutting up my eggs and mused aloud on how much easier it was to divide an egg in half when it was scrambled compared to when it was raw - you can get a nice Euclidean straight edge. After that Kimberly had nothing else to say about transaction logs :-)

Then I wondered how far away we are from the mainland (we're on Maui for a week, then on a live-aboard dive boat out of Kona - the Kona Aggressor - for another week). Luckily the waitress brought the breakfast check so I spent 5 minutes doing the a² = b² + c² calculation (where a is our flight length from Seattle, b is the distance south from Seattle, and c is the distance from the mainland). Figuring about 2700 miles for the flight, and 2000 miles south of Seattle (and no doubt convincing everyone around us that I needed to use long multiplication, scientific notation, long division, and geometric figures to calculate the tip on the breakfast check), I came up with roughly 1800 miles as the distance of Hawaii from the mainland. In reality, the distance is about 1625 miles - not bad!
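For anyone who wants to check the back-of-the-check arithmetic, here is the calculation written out. It treats the three distances as a flat right-angled triangle with the flight as the hypotenuse, which is only a rough approximation over these distances, and uses the estimates of 2700 and 2000 miles from the paragraph above.

```latex
\begin{align*}
a^2 &= b^2 + c^2 \\
c   &= \sqrt{a^2 - b^2} \\
    &= \sqrt{2700^2 - 2000^2} \\
    &= \sqrt{7{,}290{,}000 - 4{,}000{,}000} \\
    &= \sqrt{3{,}290{,}000} \approx 1814 \text{ miles}
\end{align*}
```

That rounds to the "roughly 1800 miles" quoted above.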

This is my first trip to Hawaii (and Kimberly's fourth, but first to Maui) - it's a very cool place. On Tuesday we took a long helicopter tour around the island (courtesy of Blue Hawaiian Helicopters) which gave us some stunning views of the volcanic scenery (we're doing a similar tour of the Big Island after the dive trip). Today we're going to drive to the top of the 10,000-foot volcano to watch the sunset and do some bird-watching. Here are a few photos:

 

 

Ok - back to vacation...

Thursday, June 26, 2008 2:14:03 PM (Pacific Standard Time, UTC-08:00)
Monday, June 23, 2008

As well as the usual round of conferences later this year, we've also organized some public classes in the UK after lots of requests. In between these two classes we'll be hopping over to Dublin to do a launch seminar for Microsoft on SQL Server 2008 - more details on that as they become available.

The UK classes are organized with our UK partners SQLKnowHow.com. We haven't taught in the UK since a one-day seminar we did with Tony Rogerson (one of the founders of SQLKnowHow) back in March last year so this is pretty exciting (and the Edinburgh class will be at my old alma mater, The University of Edinburgh). The complete line-up is below - register now to avoid disappointment as the classes are filling up fast.

Best Practices in Performance and Availability for SQL Server 2005/2008

  • When: 1st to 3rd September, 2008
  • Where: Hatfield, Hertfordshire
  • Who: Paul and Kimberly
  • How much: See here for details, discounts, and early-bird specials
  • What:

    This class has three primary goals (for almost all topics/modules): planning, practice/implementation and post-mortem - with the largest emphasis on designing/implementing the RIGHT solution. Questions that you must ask are: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005/2008? How does your technology/choice affect workload performance?

    Only after an in-depth plan is developed should you move on to actual implementation. So what are the areas that you need to consider?

    • Architecting for Availability
    • Architecting for Performance
    • Maintaining Performance and Availability

    And just to be clear, this is not a high-level class on planning. This is an intense, in-depth class encompassing structures, internals, technologies and solutions. Planning is a critical part of performance, high-availability, database maintenance and disaster recovery - but it is the part most often disregarded.

    Performance tuning spans many areas within SQL Server from database creation to database design to the code you execute (ad-hoc or procedural). A single magic bullet does not exist (indexing is the closest thing to a magic bullet for some queries). However, to achieve a truly scalable and reliable database it takes a variety of best practices - from database creation (including file structure and placement) to table design and creation (using vertical and horizontal partitioning techniques) to system architecture (including disaster recovery planning and implementation) to ongoing maintenance. Whether you're trying to achieve high performance for a few users or scale to support thousands, there are numerous areas that you can tune to improve performance - proactively. But, how do you make this a reality?

    SQL Server 2005 and 2008 provide a variety of options to help keep your database more available. However, even in the event of a disaster, are you sure you know the best path for recovery - with the least amount of downtime and/or data loss? Putting a well-thought-out plan into practice requires a thorough understanding of the technologies, their pitfalls and the effects of many technologies when combined. In terms of architecture, we will start by discussing the most important part of designing an available solution - requirements. Then we'll show how to use requirements to drive a technology decision - not the other way around, which happens so often and results in an inadequate implementation.

    No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems - and possibly data loss and/or downtime. The key to availability and performance is well thought-out and automated database maintenance. The final part of the course will discuss maintenance strategies required to keep your carefully designed system available and performing well, plus a primer on recovering from disasters.

    If you're planning, or already manage, an enterprise system and want better performance and availability - then this is the place to be!

    Module List:

      1. Foundations - SQL Server structures and algorithms
      2. Architecting for Availability
      3. Architecting for Performance
      4. Maintaining Performance and Availability

Indexing for Performance in SQL Server 2000/2005/2008

  • When: 8th to 9th September, 2008
  • Where: Edinburgh
  • Who: Paul and Kimberly
  • How much: See here for details, discounts, and early-bird specials
  • What:

    There are many areas of performance tuning in SQL Server: database design, application design, hardware/software configuration, and many more. But none is as important as indexing. Creating the "right indexes" is the most important thing you can do for performance and scalability. Is proper indexing something your application is missing? Do you realize the impact of your clustering key, which forces the base structure of your tables to be either ordered or unordered? If ordered is chosen, by what type of column(s) should the data be ordered? Is the decision solely based on query performance or are there other factors?

    Whether your system is 24x7 or a small system just trying to set up for future growth and improved performance, this course is for you! We will cover the often-overlooked impacts of poorly chosen clustered indexes, where/why clustered indexes help the most and how the type of table and the type/frequency of your queries affect your decisions. Additionally, once the internals, statistics and base table structures have been defined, we will talk about indexing strategies for search arguments (including SQL Server 2008 Filtered Indexes), joins, aggregations and appropriate uses for indexed views. Finally, we'll discuss index maintenance as well as how to evaluate your indexing strategy over time to make sure it remains appropriate as your data and workload change.

    If you want better performance and excellent insight into the wide range of indexing strategies - as well as how things work internally, this is the place to be!

    Course Modules

    1. Index Internals
    2. Statistics
    3. Indexing Strategies, Part I: SARGs and Joins
    4. Indexing Strategies, Part II: Aggregations and Indexed Views
    5. Index Maintenance
    6. Is Your Indexing Strategy Working?
Monday, June 23, 2008 5:07:12 AM (Pacific Standard Time, UTC-08:00)
Thursday, June 19, 2008

TechEd US is done for another year! As I mentioned before, we did a lot of stuff but still found time to chill by the pool a few times in the Speaker Hotel. This was my first US TechEd since leaving Microsoft last year so it was quite interesting seeing the organizational side of things from the outside. I was particularly pleased that my new Surviving Corruption - From Detection To Resolution session clinched a prestigious top-10 rating (#6) for the whole conference - look out for it at all the other conferences I'll be at this year (next post today...)

Edit: Forgot to say - thanks to all those in the Olympia, WA User Group who came out yesterday to see us present the Surviving Corruption session!

We've already started posting scripts from our session demos (see the Past Conferences page) and I'm blogging detailed walkthroughs of my demos from the corruption session in my CHECKDB From Every Angle series. The online panel we did hasn't been released yet on the TechEd Online site - I'll blog when it is.

Now we're off for a couple of weeks of real vacation - flying, diving, bird-watching, and best of all, not working!

I'll leave you with my usual conference wrap-up... thanks to Carlos Santillana for the photos!

Thursday, June 19, 2008 2:30:20 PM (Pacific Standard Time, UTC-08:00)
Wednesday, June 11, 2008

Today I presented my brand new session Surviving Corruption: From Detection to Recovery at TechEd. I had a lot of fun putting together the demos, presenting the session, and talking to people afterwards. During the session, I promised to blog each of the demos so that everyone can run through them - here's the first one.

On SQL 2000, it was pretty easy to get into the system tables and manually change them - all you had to do was:

EXEC sp_configure 'allow updates', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

And then you could insert, update, and delete whatever you wanted in all the system tables, including the critical three - sysindexes, sysobjects, and syscolumns. The problem was that sometimes people actually did this and messed things up - for instance, by manually deleting an object from sysobjects, but leaving around all the other info about the object - such as indexes and columns. DBCC CHECKCATALOG in SQL 2000 would find this, but DBCC CHECKDB would not - as it didn't run the DBCC CHECKCATALOG code - and most people do not run DBCC CHECKCATALOG at all. Many times now, I've seen databases upgraded to 2005 and suddenly DBCC CHECKDB is reporting metadata corruption errors - all because someone had manually changed the system tables on 2000, and I changed DBCC CHECKDB in 2005 to include the DBCC CHECKCATALOG checks.

This demo is all about that. I created a 2000 database, manually deleted a row in sysobjects and then upgraded the database to 2005. The corrupt database is available as a backup in a zip file - DemoCorruptMetadata.zip. If you unzip it into a folder C:\SQLskills then you can restore it using:

RESTORE DATABASE DemoCorruptMetadata FROM DISK = 'C:\SQLskills\DemoCorruptMetadata.bak'
WITH MOVE 'DemoCorruptMetadata' TO 'C:\SQLskills\DemoCorruptMetadata.mdf',
   MOVE 'DemoCorruptMetadata_log' TO 'C:\SQLskills\DemoCorruptMetadata_log.ldf',
   REPLACE;
GO

So what does the corruption look like on 2005?

DBCC CHECKDB (DemoCorruptMetadata) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=1) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=2) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
CHECKDB found 0 allocation errors and 2 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 2 consistency errors in database 'DemoCorruptMetadata'.

This is what we expect. Notice that there's no recommended repair level at the end of the output - this is because CHECKDB can't repair metadata corruptions. We can't fix this with a backup - unless we have a backup from 2000 from before the manual delete in the system tables. To fix this we'd need to go back to 2000, fix the corruption, and then upgrade again - usually not feasible.

Instead, we're going to fix it by manually altering the system tables in 2005 - something that's purportedly not possible. First let's see what tables there are that could include column information (remembering that the system catalogs were completely rewritten between 2000 and 2005):

SELECT [name] FROM DemoCorruptMetadata.sys.objects WHERE [name] LIKE '%col%';
GO

name
------------------
sysrowsetcolumns
syshobtcolumns
syscolpars
sysiscols

I know that sysrowsetcolumns and syshobtcolumns are involved at low levels of the Storage Engine and don't contain relational metadata, so let's try syscolpars. I want to see what columns there are to see if one of them looks like an object ID, and another looks like a column ID. This query will just return the table columns, with no rows (because the condition 1=0 is always false):

SELECT * FROM DemoCorruptMetadata.sys.syscolpars WHERE 1 = 0;
GO

Msg 208, Level 16, State 1, Line 1
Invalid object name 'DemoCorruptMetadata.sys.syscolpars'.

I can't bind to internal system tables in 2005. But - I can bind to internal system tables using the Dedicated Administrator Connection (or DAC for short). This is documented in Books Online at http://msdn.microsoft.com/en-us/library/ms179503.aspx. You can get to the DAC through SQLCMD using the /A switch. So - assuming I'm now connected through the DAC, I'll try that command again:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> SELECT * FROM sys.syscolpars WHERE 1=0;
2> GO
id          number colid       name

xtype utype       length prec scale collationid status      maxinrow xmlns
 dflt        chk         idtval

----------- ------ ----------- -------------------------------------------------
-------------------------------------------------------------------------------
----- ----------- ------ ---- ----- ----------- ----------- -------- -----------
 ----------- ----------- -------------------------------------------------------
-----------

(0 rows affected)
1>

This looks like the table. Now I'll query against it using the object ID from the original corruption message:

1> SELECT colid, name FROM sys.syscolpars WHERE id = 1977058079;
2> GO
colid       name
----------- --------------------------------------------------------------------
------------------------------------------------------------
          1 SalesID
          2 CustomerID
(2 rows affected)
1>

Cool. So I'll try deleting the orphaned columns:

1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO
Msg 259, Level 16, State 1, Server ROADRUNNERPR, Line 1
Ad hoc updates to system catalogs are not allowed.
1>

Hmm. And it doesn't help if I set 'allow updates' to 1, or try putting the database into single-user mode.

There IS a way though. You can put the SERVER into single-user mode, then connect with the DAC and you can then update the system tables. This particular twist on using the DAC isn't documented anywhere except in an MSDN forum thread answered by someone from Microsoft (see here).

BEWARE (if I could put little flashing lights around this too then I would...) that this is undocumented and unsupported - misuse will lead to unrepairable corruption of your databases.

The sequence of events to follow is:

  • make a backup of the database just in case something goes wrong
  • shut down the server
  • go to the binaries directory (e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn) and start the server in single-user mode using 'sqlservr -m'
  • connect back in using SQLCMD /A, and run the delete again. This time it will work, but will give a warning about metadata cache consistency:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO

(2 rows affected)
Warning: System table ID 41 has been updated directly in database ID 12 and cache coherence may not have been maintained. SQL Server should be restarted.
1>

  • The system table has been updated, but the in-memory cache of metadata is now out-of-sync with the system tables. So, shut down the server again as the message suggests and restart it normally
  • run CHECKDB again and you'll see the corruption has been fixed.

Hope this helps some of you. Watch this space for the next demo from TechEd of repairing corruption when no backup is available.

Wednesday, June 11, 2008 5:42:23 PM (Pacific Standard Time, UTC-08:00)
Monday, June 09, 2008

(I'm actually on-stage here at TechEd doing the DAT track pre-con with Kimberly - she's on now until lunch so I'm catching up on forum problems...)

Here's a question that came up on one of the SQLServerCentral.com corruption forums I monitor that I think is worth blogging about. To paraphrase:

I have a bunch of corruptions in a database, that look like they've been there for a while. Repair is my only option - it works but I'd like to know what data is being deleted. How can I do that? Here are some of the errors:

Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168576) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168577) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168578) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168579) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168580) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168581) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168582) could not be processed. See other errors for details.
Server: Msg 8976, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168576) was not seen in the scan although its parent (1:165809) and previous (1:168575) refer to it. Check any previous errors.
Server: Msg 8978, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168583) is missing a reference from previous page (1:168582). Possible chain linkage problem.

This is a clustered index that CHECKDB will repair by deleting pages at the leaf-level - essentially deleting a bunch of records. The pages look to be trashed (there were a bunch more errors that I didn't include here that said the page headers were all corrupted - looked like the IO subsystem trashed a whole 64KB chunk of the disk) so there's nothing much else you can do. As the table has a clustered index, you can use the error messages to find the pages on either 'logical' side of the pages being deleted - and hence figure out the range of records that have been deleted.

The errors show that pages 168576 through 168582 in file 1 are corrupt. There are also errors that say the previous page of 168576 is 168575, and the next page of 168582 is 168583. If you do a DBCC PAGE of these two pages, you can find the lower and upper bound of the clustered index key values that have been lost. Think of three ranges:

  • the lower range of records that are intact, logically before the corrupt pages in the index
  • the range of records that will be deleted by repair
  • the upper range of records that are intact, logically after the corrupt pages in the index

To find the upper bound of the lower range:

DBCC TRACEON (3604); -- allows the output to come to the console
DBCC PAGE ('dbname', 1, 168575, 3);
GO

The key value in the slot at the end of the output is the upper bound of the lower range that's intact.

Then do:

DBCC PAGE ('dbname', 1, 168583, 3);
GO

The key value in the slot at the beginning of the output is the lower bound of the upper range that's intact.

Everything in the middle will be deleted. You could also try a DBCC PAGE on the corrupt pages themselves too - you might be able to see some data in them.
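Once you have the two boundary key values, one way to put them to use is sketched below. It assumes you have some older copy of the data to compare against (for example, an old backup restored under a different name); the names OldCopy, dbo.Sales and SalesID and the two key values are all hypothetical placeholders, and the key is assumed to be a single integer column.

```sql
-- Find out which table the errors refer to (run in the damaged database).
SELECT OBJECT_NAME(645577338) AS DamagedTable;
GO

-- Hypothetical placeholders: substitute the key values read from the DBCC PAGE output.
DECLARE @LastKeyBefore int;   -- last slot on page (1:168575)
DECLARE @FirstKeyAfter int;   -- first slot on page (1:168583)
SET @LastKeyBefore = 41250;
SET @FirstKeyAfter = 41900;

-- Rows strictly between the two bounds are the ones repair will remove.
-- OldCopy is an assumed older restored copy of the database.
SELECT *
FROM OldCopy.dbo.Sales
WHERE SalesID > @LastKeyBefore
  AND SalesID < @FirstKeyAfter
ORDER BY SalesID;
GO
```

An older copy will obviously not contain rows inserted after it was taken, so treat the result as a starting point rather than a complete list.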

I'll be blogging a bunch more about repair after my corruption session this week at TechEd - watch this space!

Monday, June 09, 2008 7:54:32 AM (Pacific Standard Time, UTC-08:00)
Sunday, June 08, 2008

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these is true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.
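If you want to see this for yourself on 2005 or later, a simple approach is to compare how full each file in the filegroup is before and after a rebuild. The query below is a minimal sketch of that check; run it in the database in question. It only reports per-file size and used space, it does not show the proportional fill calculation itself.

```sql
-- size and FILEPROPERTY(..., 'SpaceUsed') are both in 8KB pages, so / 128.0 gives MB.
-- Log files do not belong to a filegroup, so the join leaves them out.
SELECT
    fg.name AS FilegroupName,
    df.name AS LogicalFileName,
    df.size / 128.0 AS SizeMB,
    FILEPROPERTY(df.name, 'SpaceUsed') / 128.0 AS UsedMB
FROM sys.database_files AS df
JOIN sys.filegroups AS fg
    ON fg.data_space_id = df.data_space_id
ORDER BY fg.name, df.file_id;
GO
```

After a rebuild in a multi-file filegroup you should see the used space change in all of the files, not just one.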

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery model decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery model while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery model while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery model, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In the FULL recovery model, take a log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.
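The three steps above can be sketched in T-SQL as follows. This uses the 2005+ ALTER INDEX syntax; the database name SalesDB, the table dbo.Sales and the backup file paths are hypothetical placeholders, so substitute your own.

```sql
-- 1. Still in the FULL recovery model: take a log backup right before the switch.
BACKUP LOG SalesDB TO DISK = N'C:\SQLskills\SalesDB_log_before.trn';
GO

-- 2. Switch to BULK_LOGGED and do the index rebuild.
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;
GO
USE SalesDB;
GO
ALTER INDEX ALL ON dbo.Sales REBUILD;
GO

-- 3. Switch back to FULL and immediately take another log backup.
--    This backup is the one that contains the minimally-logged operation.
ALTER DATABASE SalesDB SET RECOVERY FULL;
GO
BACKUP LOG SalesDB TO DISK = N'C:\SQLskills\SalesDB_log_after.trn';
GO
```

With this sequence, the only log backup you can't do point-in-time recovery into is the second one.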

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer. Online index operations need to take two very short-term table locks: an S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of the operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get its lock, the query will time out.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, check out this whitepaper on Online Index Operations in SQL Server 2005.
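If you suspect you're hitting this, one way to look at the lock queue is through sys.dm_tran_locks while the rebuild is starting or finishing. The sketch below assumes a hypothetical table dbo.Sales with an index called IX_Sales_CustomerID; run the first statement in one session and the query in another, in the same database.

```sql
-- Session 1: the online rebuild (hypothetical index and table names).
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales REBUILD WITH (ONLINE = ON);
GO

-- Session 2: show table-level lock requests on the same table.
-- request_status is GRANT for locks that are held and WAIT for requests still queued.
SELECT
    request_session_id,
    resource_type,
    request_mode,
    request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
  AND resource_type = N'OBJECT'
  AND resource_associated_entity_id = OBJECT_ID(N'dbo.Sales')
ORDER BY request_status, request_session_id;
GO
```

A session showing request_mode S or Sch-M with request_status WAIT, with other sessions waiting alongside it, is the pattern described above.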

Sunday, June 08, 2008 9:12:56 AM (Pacific Standard Time, UTC-08:00)
Thursday, June 05, 2008

That time has rolled around again and we're flying down to Orlando for TechEd US tomorrow - my first US TechEd since I left Microsoft. We're doing a lot of stuff this year - here's our schedule if you're going to be there:

Monday

  • Full day pre-con seminar: SQL Server 2008 Overview for DBAs

Tuesday

  • 13.15 - 14.30 (Room N230) DAT354 Are Your Indexing Strategies Working?
  • 15.00 - 16.00 (TechEd Online Stage) Panel: Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy
  • 16.00 - 18.00 DAT track booth

Wednesday

  • 10.15 - 11.30 (Room N220D) DAT375 Corruption Survival Techniques: From Detection to Recovery
  • 11.30 - 14.45 DAT track booth
  • 15.00 - 16.00 Blogger's Lounge

Thursday

  • 10.15 - 11.30 (Room S230E) DAT363 Essential Database Maintenance
  • 11.45 - 13.00 Speaker Idol judging
  • 14.30 - 18.00 DAT track booth

Hopefully a bunch of you will stop by and say hi - I'm looking forward to seeing some familiar faces and some new ones! I'll try to blog while I'm there on questions I get and I've got some cool demos for the corruption session that I'll be blogging about over the summer.

See you next week...

Thursday, June 05, 2008 1:26:54 PM (Pacific Standard Time, UTC-08:00)
