The Euclidian Geometry of Eggs


One problem (the only one!) of going on vacation with Kimberly is that can be hard to banish SQL Server completely from conversation. Over breakfast this morning we were discussing the pros and cons of advising someone to use sp_attach_single_file_db as a way to shrink an out-of-control transaction log – with careful guidance it can be done, but there’s a lot of scope for misuse and getting into trouble.


One problem with being on vacation in general is that your mind wanders away from the normal bounds of rational thought (well, at least mine does…) While discussing the merits of shrinking transaction logs I was cutting up my eggs and mused aloud on how much easier it was to divide an egg in half when it was scrambled compared to when it was raw – you can get a nice Euclidian straight edge. After that Kimberly had nothing else to say about transaction logs :-)


Then I wondered how far away we are from the mainland (we’re on Maui for a week, then on a live-aboard dive boat out of Kona – the Kona Aggressor – for another week). Luckily the waitress brought the breakfast check so I spent 5 minutes doing the a2 = b2 + c2 calculation (where a was our flight length from Seattle, b is the distance south from Seattle, and c is the distance from the mainland). Figuring about 2700 miles for the flight, and 2000 miles south of Seattle (and no-doubt convincing everyone around us that I needed to use long multiplication, scientific notation, long division, and geometric figures to calculate the tip on the breakfast check), I came up with roughly 1800 miles as the distance of Hawaii from the mainland. In reality, the distance is about 1625 miles – not bad!


This is my first trip to Hawaii (and Kimberly’s fourth, but first to Maui) – it’s a very cool place. On Tuesday we took a long helicopter tour around the island (courtesy of Blue Hawaiian Helicopters) which gave us some stunning views of the volcanic scenery (we’re doing a similar tour of the Big Island after the dive trip). Today we’re going to drive to the top of the 10000 foot volcano to watch the sunset and do some bird-watching. Here are a few photos:


 


 


Ok – back to vacation…

Another year, another TechEd over…


TechEd US is done for another year! As I mentioned before, we did a lot of stuff but still found time to chill by the pool a few times in the Speaker Hotel. This was my first US TechEd since leaving Microsoft last year so it was quite interesting seeing the organizational side of things from the outside. I was particularly pleased that my new Surviving Corruption – From Detection To Resolution session clinched a prestigious top-10 rating (#6) for the whole conference – look out for it at all the other conferences I’ll be at this year (next post today…)


Edit: Forgot to say – thanks to all those in the Olympia, WA User Group who came out yesterday to see us present the Surviving Corruption session!


We’ve already started posting scripts from our session demos (see the Past Conferences page) and I’m blogging detailed walkthroughs of my demos from the corruption session in my CHECKDB From Every Angle series. The online panel we did hasn’t been released yet on the TechEd Online site – I’ll blog when it is.


Now we’re off for a couple of weeks of real vacation – flying, diving, bird-watching, and best of all, not working!


I’ll leave you with my usual conference wrap-up… thanks to Carlos Santillana for the photos!


Using the Dedicated Admin Connection to fix Msg 8992: corrupt system tables

Today I presented my brand new session Surviving Corruption: From Detection to Recovery at TechEd. I had a lot of fun putting together the demos, presenting the session, and talking to people afterwards. During the session, I promised to blog each of the demos so that everyone can run through them – here's the first one.

On SQL 2000, it was pretty easy to get into the system tables and manually change them – all you had to do was:

EXEC sp_configure 'allow updates', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

And then you could insert, update, and delete whatever you wanted in the all the system tables, including the critical three – sysindexes, sysobjects, and syscolumns. The problem was that sometimes people actually did this and messed things up – for instance, by manually deleting an object from sysobjects, but leaving around all the other info about the object – such as indexes and columns. DBCC CHECKCATALOG in SQL 2000 would find this, but DBCC CHECKDB would not – as it didn't run the DBCC CHECKCATALOG code – any most people do not run DBCC CHECKCATALOG at all. Many times now, I've seen databases upgraded to 2005 and suddenly DBCC CHECKDB is reporting metadata corruption errors – all because someone had manually changed the system tables on 2000, and I changed DBCC CHECKDB in 2005 to include the DBCC CHECKCATALOG checks.

This demo is all about that. I created a 2000 database, manually deleted a row in sysobjects and then upgraded the database to 2005. The corrupt database is available in a zip file – DemoCorruptMetadata.zip. If you unzip it into a folder C:\SQLskills then you can attach it using:

RESTORE DATABASE DemoCorruptMetadata FROM DISK = 'C:\SQLskills\DemoCorruptMetadata.bak'
   
WITH MOVE 'DemoCorruptMetadata' TO 'C:\SQLskills\DemoCorruptMetadata.mdf',
   
MOVE 'DemoCorruptMetadata_log' TO 'C:\SQLskills\DemoCorruptMetadata_log.ldf',
   
REPLACE;
GO

So what does the corruption look like on 2005?

DBCC CHECKDB (DemoCorruptMetadata) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=1) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=2) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
CHECKDB found 0 allocation errors and 2 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 2 consistency errors in database 'DemoCorruptMetadata'.

This is what we expect. Notice that there's no recommended repair level at the end of the output – this is because CHECKDB can't repair metadata corruptions. We can't fix this with a backup – unless we have a backup from 2000 from before the manual delete in the system tables. To fix this we'd need to go back to 2000, fix the corruption, and then upgrade again – usually not feasible.

Instead, we're going to fix it by manually altering the system tables in 2005 – something that's purportedly not possible. First let's see what tables there are that could include column information (remembering that the system catalogs were completely rewritten between 2000 and 2005):

SELECT [name] FROM DemoCorruptMetadata.sys.objects WHERE [name] LIKE '%col%';
GO

name
——————
sysrowsetcolumns
syshobtcolumns
syscolpars
sysiscols

I know that sysrowsetcolumns and syshobtcolumns are involved at low-levels of the Storage Engine and don't contain relational metadata, so let's try syscolpars. I want to see what columns there are to see if one of the looks like an object ID, and another looks like a column ID. This query will just return the table columns, with no rows (because the condition 1=0 is always false:

SELECT * FROM DemoCorruptMetadata.sys.syscolpars WHERE 1 = 0;
GO

Msg 208, Level 16, State 1, Line 1
Invalid object name 'DemoCorruptMetadata.sys.syscolpars'.

I can't bind to internal system tables in 2005. But – I can bind to internal system tables using the Dedicated Admind Connection (or DAC for short). This is documented in Books Online at http://msdn.microsoft.com/en-us/library/ms179503.aspx. You can get to the DAC through SQLCMD using the /A switch. So – assuming I'm now connected through the DAC, I'll try that command again:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> SELECT * FROM sys.syscolpars WHERE 1=0;
2> GO
id          number colid       name

xtype utype       length prec scale collationid status      maxinrow xmlns
 dflt        chk         idtval

———– —— ———– ————————————————-
——————————————————————————-
—– ———– —— —- —– ———– ———– ——– ———–
 ———– ———– ——————————————————-
———–

(0 rows affected)
1>

This looks like the table. Now I'll query against it using the object ID from the original corruption message:

1> SELECT colid, name FROM sys.syscolpars WHERE id = 1977058079;
2> GO
colid       name
———– ——————————————————————–
————————————————————
          1 SalesID
         
2 CustomerID
(2 rows affected)
1>

Cool. So I'll try deleting the orphaned columns:

1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO
Msg 259, Level 16, State 1, Server ROADRUNNERPR, Line 1
Ad hoc updates to system catalogs are not allowed.
1>

Hmm. And it doesn't help if I set 'allow updates' to 1, or try putting the database into single-user mode.

There IS a way though. You can put the SERVER into single-user mode, then connect with the DAC and you can then update the system tables. This particular twist on using the DAC isn't documented anywhere except in an MSDN forum thread answered by someone from Microsoft (see here).

BEWARE (if I could put little flashing lights around this too then I would…) that this is undocumented and unsupported – misuse will lead to unrepairable corruption of your databases.

The sequence of events to follow is:

  • make a backup of the database just in case something goes wrong
  • shutdown the server
  • go to the binaries directory (e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn) and start the server in single-user mode using 'sqlservr -m'
  • connect back in using SQLCMD /A, and run the deleta again. This time it will work, but will give an error about metadata cache consistency:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO

(2 rows affected)
Warning: System table ID 41 has been updated directly in database ID 12 and cache coherence may not have been maintained. SQL Server should be restarted.
1>

  • The system table has been updated, but the in-memory cache of metadata is now out-of-sync with the system tables. So, shutdown the server again as the message suggests and restart it normally
  • run CHECKDB again and you'll see the corruption has been fixed.

 

Hope this helps some of you. Watch this space for the next demo from TechEd of repairing corruption when no backup is available.