Just noticed this come up on my Facebook news feed - how to code assignment statements when programming (not in T-SQL):

  1. if (foo == 4) then ...
  2. if (4 == foo) then ...

#1 is the most natural way to write the conditional expression, but has a *massive* potential for introducing bugs that are very, very hard to find. If the second '=' is omitted, the conditional expression suddenly becomes an assignment statement and evaluates to true. Nasty.

#2 is the best way to avoid such problems, even though it seems unnatural. If the second '=' is omitted, it becomes an assignment statement but the compiler will barf, because you can't assign a value to a constant. (Some languages and data types may be immune to this - so this isn't a blanket statement - it's certainly a problem in C++.)

#1 was the cause of several bugs that took me a while to find while working on the Storage Engine code - all C++. I always use #2.

PS if (foo = 4) won't compile in C# if foo is an int, but you can come up with weird cases that do compile and break. Who uses managed code anyway?!? ;-) (ok, no hate mail please)

Categories:
General

In February this year we stopped off in Bangkok on the way to and from teaching in Hyderabad. On one of the days we did some sight-seeing, including the Imperial Palace in Bangkok. I took a bunch of photos of the wall paintings in the outer courtyard and I've posted an album up on Facebook (easier than a big, bloated blog post here). You can get to the album at http://www.facebook.com/album.php?aid=2020899&id=1293146061&l=5c4c559fa5 (even if you're not on Facebook).

Here's an example:

Enjoy!

(And it's official - this is blog post 39 this month, making this my most prolific month yet for blogging. An arguably dubious achievement...)

In this week's survey, I'm interested in your views on the best way to store large-value character data. I'll report on the results next week (around 4/3/09).

Thanks!

Categories:
LOB data | Surveys

Last week's survey was on what method you use to run consistency checks (see here for the survey). Here are the results as of 3/27/09 - again, very encouraging:

As you can see, 70% of respondents run DBCC CHECKDB on the production server, either with PHYSICAL_ONLY or without. For those running without it, be aware that using PHYSICAL_ONLY turns DBCC CHECKDB from a CPU-bound process into an I/O-bound process, and makes it run (potentially) magnitudes faster. It will still evaluate page checksums and torn-page protection, just skipping the higher-level logical checks.

I'm surprised to see so many people using a completely separate system to run consistency checks (restoring a backup and running a CHECKDB on the restored backup) - I've been a proponent of this method for a few years now, but I didn't think it had caught on so much (assuming a reasonably representative sample of readers responded to the survey). This method allows the entire consistency checking workload to be offloaded, and completely validates the backups used - but has the downside of requiring extra disk space on another server to restore the backup (I wish they'd build the system I got a patent for - to consistency check the database inside a backup without restoring it - see here).

Three of these choices I threw in to see if anyone was doing them so I could explain why they're not good methods to use.

  • "Don't run any consistency checks at all". I'm sure I don't need to labor the point with this one - you need to run consistency checks as well as having some kind of page protection turned on. If you don't proactively check for corruption, when it does occur it will likely be more widespread, and harder to recover within your data-loss and recovery time objectives than if you'd discovered it earlier. I've written lots about this in the Corruption and CHECKDB From Every Angle categories.
  • "Run DBCC CHECKDB on a database snapshot on a mirror database". I was asked about this several times while here at SQL Connections too. Database mirroring works by shipping transaction log records between the principal and mirror databases, NOT by shipping database pages - so if a page gets corrupted on the principal, the corruption will not be transferred to the mirror. This means that DBCC CHECKDB on a snapshot of the mirror does not reflect the consistency state of the principal database at all. Saying that, it is however, possible for corruption in the principal to affect the mirror. Imagine a column value is corrupted on disk by the principal's I/O subsystem. If that value is then read and used to calculate another value, which is then persisted - that calculated value is also 'corrupt' and will be reflected in the mirror database. Kind of an insidious, second-order corruption effect.
  • "Use BACKUP WITH CHECKSUM to validate page checksums, no DBCCs". Another option that sounds feasible, but in fact isn't for a couple of reasons. Firstly, BACKUP WITH CHECKSUM will stop when it finds a bad checksum, whereas DBCC CHECKDB will continue reading the rest of the database and tell you everything that's wrong with it. Secondly, BACKUP WITH CHECKSUM will only check those pages that HAVE page checkums, whereas DBCC CHECKDB will consistency check everything, regardless of whether a page has a checksum or not. For a database upgraded from an earlier version, where not every page will have a page checksum since you enabled them after upgrading, this is a critical point.

To summarize, this week's results were great, with almost 90% of respondents running some kind of consistency check. If you're not running any, for whatever reason, rethink your decision - there's always a way to run some kind of consistency checks and give yourself more peace of mind. See CHECKDB From Every Angle: Consistency Checking Options for a VLDB for more info.

Next post - this week's survey!

Jack Li, one of the Senior Escalation Engineers in Product Support, just posted details of an interesting case over on the CSS blog - his article is here. It talks about index builds and rebuilds, but the issue is the same for both, so I'll just talk about rebuilds.

The jist of the problem is that index rebuilds can parallelize, but sometimes they don't parallelize vey well. Each thread gets a certain range of the index to rebuild, using the existing index statistics to divide the ranges equally between the threads. If there's massive data skew, then one thread can end up doing the majority of the work, leading to a long run-time. The case in Jack's post involved a 250 million row index where 150 million rows had the same (NULL) key value. This range has to be processed by a single thread - a single value can't be divided between two+ threads.

Now, this is understandable behavior by the database engine, but it relies on the statistics being up-to-date. That's a bit of a catch-22 - rebuilding an index updates the statistics, but if the statistics aren't up-to-date then the index rebuild might parallelize badly! I guess the solution is that if you know that you have massive data skew in your large indexes, update statistics BEFORE doing an index rebuild. And given what I've been hearing this week at SQL Connections about how badly statistics keep biting people, I'm leaning towards a different recommendation for those people who have lots of perf trouble caused by statistics and the potential for skewed data - rebuild all your statistics regularly, and only rebuild/reorganize fragmented indexes. Statistics just cause so many problems it seems.

Thanks

PS Kimberly has a lot more info about statistics over on her blog - I'm just starting to venture into that mine-field

Just saw this on a forum - running REPAIR_ALLOW_DATA_LOSS on msdb to cope with corruption. Yes, this will work but it's certainly not safe and something I'd only recommend as a last resort.

Here's why: msdb stores your backup history and your Agent jobs. REPAIR_ALLOW_DATA_LOSS works by usually "delete what's broken and fix up all the links" (quoting myself). If you run repair on msdb, you need to pay close attention to what was deleted by repair, as it could lead to big problems later on. Some examples:

  1. What if repair deletes a record which just happened to store the Agent job that runs the transaction log backups for your main production database? Suddenly your log isn't being backed up and you don't know about it. The log starts to grow and eventually runs out of space. The database stops and your application is down until you figure out what's wrong.
  2. What if repair deletes a record which just happened to store the details of a log backup of the production database? Your have a disaster and run the script that looks through the backup history tables and auto-generates RESTORE statements to get you up and running again with up-to-the minute recovery. Because of the missing record, there's a missing RESTORE LOG statement in the middle of the restores of the log-backup-chain for the production database. So the restores fail, and you're down until you figure out what's wrong.

Bottom line, it's not safe to run repair on msdb - proceed with extreme caution. Not that repair won't work, or will cause corruption, but that it's far better to have backups of msdb that you can restore from.

(Yes, Kimberly's lecturing again...)

Here's something that I've seen crop up a lot recently on corruption forums:

Server: Msg 602, Level 21, State 50, Line 1
Could not find row in sysindexes for database ID 10, object ID 1, index ID 1. Run DBCC CHECKTABLE on sysindexes.

This isn't corruption - it comes from trying to attach a 2005 database to a 2000 server. If you try to restore a 2005 database on a 2000 server, you'll see:

Server: Msg 3169, Level 16, State 1, Line 1
The backed-up database has on-disk structure version 611. The server supports version 539 and cannot restore or upgrade this database.
Server: Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

This gives a more useful error. Basically, SQL Server is not up-level compatible in terms of the database physical structures. A SQL 2000 server cannot understand the new structures that are in a SQL 2005 database. There's a lot of confusion about this, and why setting database compatibility level is not the same as the physical version of the database. My blog post Search Engine Q&A #13: Difference between database version and database compatibility level has more details.

Trying to do the same thing with a 2008 database on a 2005 server is a bit better. For the attach we get:

Msg 1813, Level 16, State 2, Line 1
Could not open new database 'Dbmaint2008'. CREATE DATABASE is aborted.
Msg 948, Level 20, State 1, Line 1
The database 'Dbmaint2008' cannot be opened because it is version 655. This server supports version 612 and earlier. A downgrade path is not supported.

(Note that 611 is the physical version for all SQL Server 2005 builds, but if VARDECIMAL is enabled, the version gets bumped by one to 612 - long story...)

But the restore error is still a little cryptic:

Msg 3241, Level 16, State 7, Line 1
The media family on device 'c:\sqlskills\dbmaint2008.bck' is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Bottom line - you can't attach a higher-version database to a lower-version server.

(Continuing my habit of blogging while Kimberly's presenting - at least I'm not on stage this time...)

In early versions of SQL Server, it was sometimes necessary to 'pin' the pages from a heavily used table in the buffer pool to help performance. In later versions, the buffer pool became very sophisticated and does an excellent job of keeping the right data in memory (basically using an LRU algorithm). I saw a blog post today which mentioned that you can get into trouble using DBCC PINTABLE if you pin a table in memory and then the table grows and grows, and ends up taking up such a large proportion of the database that it causes massive performance problems - absolutely true. The blog post also mentioned that DBCC PINTABLE was deprecated in SQL Server 2005.

That's also true - it's deprecated, but in SQL Server 2005 it actually does NOTHING at all. I personally removed all the code - so now it just returns a success message.

Categories:
DBCC | Performance Tuning

I'm very pleased, and deeply honored, to announce that I've been made a Microsoft Regional Director. This is one of a very small group of people (about 120 worldwide) who Microsoft sees as influencers of, and liaisons between, the Microsoft community at large. Apart from me, the other SQL MVPs who are also RDs are Kimberly and Greg Low.

I looked around for a good description of what RDs do and what the RD program is all about and found two blog posts:

And you can get the complete list of RDs (I'm not listed there yet) at the RD site The Region.

Time for a few drinks tonight!

Categories:
General | Personal

After teaching some of the MCM-SharePoint class last week, one of the attendees pointed me at a blog post about measuring churn in SharePoint databases. The poster gave code to measure how large full backups are, which really only measures how much data there is in the database, not whether existing data has changed. If two successive full database backups are the same size, there's no way to tell how much changed - and of course, you need to take a full backup to be able to tell whether the size changed.

A while ago I wrote a script that would enable SQL database DBAs to tell how big the next differential backup will be. A differential backup contains everything that's changed since the last full database backup, so every new and everything changed. Although this still doesn't show whether a single piece of existing content changed multiple times, it can still show whether existing content changed at least once. Better still, you don't need to take any kind of backup to run this script.

So, to get an idea of the churn rate of your content databases, check out this script - see New script: How much of the database has changed since the last full backup?.

Enjoy!

PS If you're using SharePoint and find this useful, please let me know and if there's any other scripts that would be useful - Kimberly and I are both starting to get more into SharePoint admin from a SQL perspective.

While I was teaching the MCM-Database class last week, we were discussing fragmentation and the effect of a high-order GUID key on an index. Without going into too many details, having a random GUID - as generated from the NEWID() - function is bad, but having one generated by NEWSEQUENTIALID() isn't anyway near so bad (I'll discuss the details more in the fragmentation series I'm starting). As part of the demo, we wanted to change the column default for the leading key of a table from NEWID() to NEWSEQUENTIALID() - problem was that none of us could remember the exact syntax, so we worked it out together. I thought it would make an interesting post, so here it is.

First off, here's my table with a poor clustered index key:

CREATE TABLE BadKeyTable (
    c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
    c2 SMALLDATETIME DEFAULT GETDATE (),
    c3 CHAR (400) DEFAULT 'a',
    c4 VARCHAR(MAX) DEFAULT 'b');
GO
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
GO

INSERT INTO BadKeyTable DEFAULT VALUES;
GO

(And you'll notice that I've given up doing nice colors in the T-SQL I post - it's too time consuming). The default we're interested in is in bold above. To change the default, we first need to find the constraint name so we can drop it. There are two queries you can use:

SELECT [name] FROM sys.objects
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

or

SELECT [name] FROM sys.default_constraints
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

The second is obviously the more supported way, as the first will return the names of all sub-objects of this table - all constraints, and all internal tables, such as XML indexes. The second query returns:

name
-------------------------------
DF__BadKeyTable__c1__7C8480AE
DF__BadKeyTable__c2__7D78A4E7
DF__BadKeyTable__c3__7E6CC920
DF__BadKeyTable__c4__7F60ED59

The constraint we're interested in is the one for the first column - DF__BadKeyTable__c1__7C8480AE. Now we need to drop the constraint and then add the new one as there's no way to simply alter the constraint in-place. We do that using:

ALTER TABLE BadKeyTable DROP CONSTRAINT DF__BadKeyTable__c1__7C8480AE;
GO

ALTER TABLE BadKeyTable ADD CONSTRAINT DF__BadKeyTable__c1
DEFAULT NEWSEQUENTIALID() FOR c1;
GO

And we're done.

A thread cropped up on SQLServerCentral involving IAM chain corruption (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units for details of IAM chains). The error from DBCC CHECKDB was:

Server: Msg 2576, Level 16, State 1, Line 1
IAM page (0:0) is pointed to by the previous pointer of IAM page (1:394336) object ID 229575856 index ID 4 but was not detected in the scan.

and there was some discussion of what the error meant, and why the initial page ID in the error of (0:0) means something special. There was a further question of how the errors would differ if the IAM page header was partially zero'd out by an I/O subsystem error.

We're on-stage here at SQL Connections doing a pre-con and I'm not on until this afternoon so I can bang out a quick blog post! I'm going to create a small database and show the difference between the two cases. The error above was from a SQL 2000 database, but the behavior is the same on SQL 2005 and 2008. Here's the script to create the database.

CREATE DATABASE CorruptIAMEXample;
GO
USE CorruptIAMExample;
GO

CREATE TABLE test (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX test_cl on test (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO test DEFAULT VALUES;
GO 1000

After corrupting the database, I can reproduce the error above by running DBCC CHECKDB on it:

Msg 2576, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (0:0) is pointed to by the previous pointer of IAM page (1:153) in object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594042384384 (type Unknown), but it was not detected in the scan.

The error is saying that the first IAM page in the IAM chain (page 1:153) does not have a reference from metadata. The sysallocunits system table contains a link to the first IAM page, the root page, and the first page. You can see these by querying the sys.system_internals_allocation_units catalog view, or you can see this blog post  - Inside The Storage Engine: sp_AllocationMetadata - putting undocumented system catalog views to work. I corrupted the sysallocunits table so that the link to the first IAM page of the test table was removed. The clue is the first page ID in the error - if it's a (0:0), that's the missing metadata case. For SQL 2000, this can happen if someone manually updates the sysindexes table.

Now what about the other case, where the IAM page itself has a corrupt header? I recreated the database again and corrupted the header of the first IAM page of the test table. Here's the output from DBCC CHECKDB:

Msg 2575, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (1:153) is pointed to by the next pointer of IAM page (0:0) in object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data), but it was not detected in the scan.
Msg 7965, Level 16, State 2, Line 1
Table error: Could not check object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data) due to invalid allocation (IAM) page(s).
Msg 8928, Level 16, State 6, Line 1
Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown): Page (1:153) could not be processed. See other errors for details.

and a whole bunch of

Msg 8905, Level 16, State 1, Line 1
Extent (1:216) in database ID 21 is marked allocated in the GAM, but no SGAM or IAM has allocated it.

errors. This is because the IAM page no longer looks like an IAM page and so DBCC CHECKDB can't process it as such. If I zero out the IAM page completely, DBCC CHECKDB returns basically the same errors as above. 

I've created a zipped backup of the SQL 2005 database in each case:

You'll need to do a RESTORE FILELISTONLY and then maybe move the files when you restore. Have fun!

In this week's survey, I'd like to know *how* you run consistency checks, not how often. I'll report on the results next week (around 3/27/09)

Thanks!

Last week’s survey was on what kind of regular index maintenance you perform (see here for the survey) as a way of kicking off a new series I’m writing around index maintenance. Here are the results as of 3/21/09 – I find them very encouraging:

As you can see, about 2/5 of respondents are performing some form of analysis-based fragmentation removal (answers 5+6), which I consider the best way to perform index maintenance, if you’re willing to invest the time involved to set it up. It allows the least amount of work to be performed, for the most targeted performance gains – and so is especially appropriate for 24x7 systems where there’s a minimal or non-existent maintenance window.

The next best option is to do all rebuilds or all defrags based on a fragmentation threshold (answers 3+4), which about 1/5 of respondents do. This also allows work to be limited, but by choosing only a single method of removing fragmentation, there are pros and cons. Now, the survey was limited to a number of questions so I couldn’t explore what the threshold is that people are using (e.g. logical fragmentation, page density, extent fragmentation, or something else). Some measures are good to use and some not so good, and I’ll be exploring the various counters and ways of determining fragmentation as the series progresses.

Either of the options to operate on all indexes regardless of fragmentation (options 1+2) can lead to lots of wasted resources (disk space, transaction log space, I/Os, CPU) by operating on indexes that are not fragmented in the first place, or for which fragmentation removal has no benefit for workload performance. About 1/3 of respondents do this. This isn’t surprising to me as rebuild-all-the-indexes-every-night/week is a very common index maintenance plan for “involuntary DBAs” who know that index maintenance is important, but don’t have the knowledge or training to implement a more sophisticated maintenance plan. This growing size of this group of people is one of the main reasons I’m going to write this series.

Doing absolutely nothing for index maintenance, which about 1/10 do, is usually not a good idea, as indexes in a database that’s not read-only commonly become fragmented over time. However, these people may know they don’t suffer from fragmentation issues, or that removing fragmentation has no effect on workload performance. However, my suspicion (based on what I see in the field) is that some of these respondents don’t realize the benefits of performing index maintenance.

You may be surprised to hear that I don’t consider doing nothing to be the worst choice. That dubious honor goes to answer 7 – doing any kind of index maintenance followed by a database shrink operation – as 3 respondents do. A post-maintenance shrink operation may well undo some of the benefits of the maintenance by introducing massive amounts of index fragmentation – see my blog post Auto-shrink - turn it OFF! for details of how bad this can get.

I’m actually pretty encouraged by these results though. Compared to how things were around 1999 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, these results show that knowledge in the field (or at least in the group that reads my blog and responded) has vastly improved. My aim for the forthcoming series about index fragmentation and maintenance is to increase knowledge a lot more broadly.

Next up - this week's survey. Thanks for reading!

Way back at the start of me blogging here I wrote a comprehensive description of ghost records and the ghost cleanup process - see Inside the Storage Engine: Ghost cleanup in depth. A question came up in the class I'm teaching this week that's worth answering in a blog post - do ghost records occur in heaps? The answer is no, not during normal processing.

When snapshot isolation is enabled, deletes from a heap are ghosted, as part of the overall versioning process, which can lead to some interesting side-effects. A versioned record has an extra 14-bytes tagged on the end, so a heap record that suddenly becomes versioned is 14-bytes longer - which may mean it doesn't fit on the page any longer. This could lead to it being moved, resulting in a forwarding/forwarded record pair - just because the record was deleted! Now, the page has to be full for this time happen, and the Storage Engine will take steps to avoid this happening for rows less than 32 bytes long - but that's getting a little too deep.

Anyway, I digress. I want to show you the difference between deleting from a clustered index and from a heap. I'm going to create two such tables, then delete row from each and roll it back.

CREATE TABLE t1 (c1 CHAR (10));
CREATE CLUSTERED INDEX t1c1 on t1 (c1);
GO

CREATE TABLE t2 (c1 CHAR (10));
GO

INSERT INTO t1 VALUES ('PAUL');
INSERT INTO t1 VALUES ('KIMBERLY');

INSERT INTO t2 VALUES ('PAUL');
INSERT INTO t2 VALUES ('KIMBERLY');
GO

-- prevent random background transactions
ALTER DATABASE GhostTest SET AUTO_CREATE_STATISTICS OFF;
GO

BEGIN TRAN DelFromClust;
DELETE FROM t1 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

BEGIN TRAN DelFromHeap;
DELETE FROM t2 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

SELECT * FROM ::fn_dblog (null, null);
GO

Here's a portion of the results from looking in the transaction log. The line of code where I turn off auto-update stats is just to prevent the auto-create transactions from cluttering up my view of the transaction log.

The first (highlighted) transaction is for the delete/rollback in the clustered index. You can clearly see that the third column shows a log context of ghosting for the LOP_DELETE_ROWS log record, plus the setting of the 'this page has at least one ghost record' in the PFS byte for that page.

The second (unhighlighted) transaction is for the delete/rollback in the heap. Here you can see that it just does a straight delete.

If you look at the data page contents before the rollback in both cases, for the clustered index you'll still be able to see the deleted (ghosted) record, and for the heap you'll see the deleted record really is deleted.

Hope this helps.

I'm teaching the Microsoft Certified Masters - Database qualification this week here in Redmond, and in part of day one I discuss the FILESTREAM directory structure. I was asked the question where do the directory name GUIDs come from? so I started digging around in the system tables while Kimberly was lecturing. Take a look at my previous blog post (FILESTREAM directory structure) from last week to see the database schema I'm working with. I recreated it again and wrote some queries to find where the GUIDs are stored, as they have to be stored in the database somewhere.

Here's the query to find all the FILESTREAM directory names, for both levels of directory (and you have to run this through the DAC as it's accessing undocumented, hidden system tables):

SELECT o.name AS [Table],
    cp.name AS [Column],
    p.partition_number AS [Partition],
    r.rsguid AS [Rowset GUID],
    rs.colguid AS [Column GUID]
FROM sys.sysrowsets r
    CROSS APPLY sys.sysrscols rs
    JOIN sys.partitions p ON rs.rsid = p.partition_id
    JOIN sys.objects o ON o.object_id = p.object_id
    JOIN sys.syscolpars cp ON cp.colid = rs.rscolid
WHERE rs.colguid IS NOT NULL AND o.object_id = cp.id
AND r.rsguid IS NOT NULL AND r.rowsetid = rs.rsid;
GO

See below for a screenshot of this using my scenario.

You can see that I'm connected through the admin connection in SSMS and that the top-level directory name is derived from the rowset GUID, with the column-level directory name is derived from the column GUID. Note that some parts are byte-reversed, but they're definitely the right GUIDs.

Enjoy!

There's a long-running discussion with people tagging each other to post advice for people new to SQL Server, about what they know now and wished they'd known ealier in their lives/careers - lot's of SQL MVPs and other luminaries have been doing it and I've been tagged now by my good friend Ward Pond - here's his entry.

The idea is to give two pieces of advice to help people out. My two came to mind almost instantly.

The first is a tenet I live by - there's no fate but what you make. It's actually a quote from the Terminator 2 movie and it basically means that nothing happens to you unless you make it, and you're responsible for your own life. This applies equally to life and to your career.

If you're not in an optimal place in your life for whatever reason (happiness, job, city, partner), then it's up to you to change it. And you should have the confidence to try. Sometimes you might try and fail, but at least you can say to yourself that you've tried. I've changed jobs, cities, and partners a few times each and (luckily for me) it always worked out. Sometimes the change was hard to make, sometimes it wasn't. But I knew it was up to me if I wanted a change so I had no choice but to make it happen or adapt to the current situation.

For your career, and this is where I'm touching on SQL Server, there are some situations that you may get into that are entirely of your own creation. Someone breaks into your server because you didn't check security. The company goes under because the sales database was lost and you didn't provision backups. You get a bonus because you tuned the indexing strategy so performance could handle the big sales weekend. Whatever. I believe that everything that happens to you is from your own making (apart from 'random' things like car accidents). You may say that something happened in your job that you had no control over - but you took that job... It's an interesting way to look at life, but that's what I know now.

The second thing is purely about SQL Server. If you're a DBA, some day in your career you will encounter database corruption and you will be responsible for clearing it up. Don't stick your head in the sand and fool yourself that it won't happen to you. It will. Be prepared. Take backups. Check your backups. Come up with an HA strategy. Practice restoring. And so on. I've been involved in hundreds, maybe more than a thousand, of corruption cases while at Microsoft and beyond. The over-riding take-away for me from that (ongoing) experience? People are unprepared and don't know what to do. That's what I know now.

I hope these are helpful and I've fulfilled my obligation.

I hereby tag: Greg Linwood, Jason Massie, and Bob Beauchemin.

PS Kimberly was tagged - her post is here.

Categories:
General

Yes, it's that time again (well a little bit earlier than usual). With the Spring SQL Server Connections show coming up next week, its time to start planning for the Fall 2009 show. Once again Kimberly and I will be Co-Chairs of the Conference and we'd like to invite you to submit abstracts for the Fall 2009 SQL Connections conference, to be held in Las Vegas, November 9th - 13th, 2009.

The conference will focus heavily on SQL Server 2008. Abstracts are still welcome on best practices for SQL Server 2005 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 we'll consider any level - especially depending on the newness of the release.

For submitting session abstracts, please use this URL: 
http://www.deeptraining.com/devconnections/abstracts

The tool will be open from Monday 3/16  to midnight EST April 5th, after which we won't accept any abstracts. If you have used this site before and have forgotten your password, you can have your secret word mailed to you. It's better to do this then to create a new account.

Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. Please do not submit abstracts for sessions that you have previously presented at the Fall show.

What you will get if selected:
- $500 per conference talk. (Additional compensation for pre/post conference workshops.)
- Coach airfare and hotel stay paid by the conference
- Free admission to all of the co-located conferences
- Speaker party
- The adoration of attendees
- etc.
 
Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny *new* abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

Categories:
Conferences

I'm about to start a new series of post about index fragmentation and removing it. For this week's survey, I'd like to know what index maintenance you do to address fragmentation (in any of its forms) - I'll report on it in a week.

Thanks

Last week I kicked off the first weekly survey - on whether you validate your backups or not (see here for the survey). The results are very interesting (as of 3/13/09):

As you can see, almost 25% of respondents never validate their backups! And a further 25% only validate them occasionally, with 30% doing some kind of regular checks, and only a handful checking all the time.

While these results may seem shocking to you, based on what I've heard when teaching, they're pretty normal. There are lots of reasons why DBAs may choose not to validate backups as often as they should, including:

  • Not enough time to restore the backups to check them
  • Not enough disk space
  • Not part of the day-to-day operations guide
  • Don't see why it's important

Kimberly and I have a saying (well, to be fair, Kimberly coined it): you don't have a backup until you've restored it. You don't know whether the backup you just took was corrupt or not and will actually work in a disaster recovery situation.

Can you ever get a guarantee? No. Here's an analogy, taken from a very old post of mine. Consider Paul, who works for the Seattle Police Department in traffic control. Paul's in a control room somewhere in the city with a large bank of monitors connected to various traffic cameras. Paul's job is to cycle through the cameras every 1/2 hour, looking for traffic accidents. At the end of the 1/2 hour cycle, if Paul han;t seen any accidents then he knows that there are no accidents in the city.

Ah - but hold on. Does Paul really know that? No. All Paul knows is that at the point he looked at a particular camera, there was not an accident at that spot in the city. The very instant he switches to another camera feed, an accident could happen at a spot covered by the previous camera.

The same is true for validating backups. As soon as you've validated a backup, it could then be corrupted by the I/O subsystem, but at least you know that it was valid at some point. But what if that happens, I hear you ask? Well, then you need to have multiple copies of your backups, and you should not rely on backups as the only method of disaster recovery. A good high-availability solution includes as many technologies as you need to mitigate all risks - and backups are just one of those technologies. You're going to have to have some kind of redundant system too that you can fall back on (or mayb even immediately failover to, depending on your particular disaster recovery plan). But, saying that, you can't rely on the redundant server either - if it goes wrong, you'll need your backups.

So - whichever way you look at it, validating backups is a really good practice to get into so you don't get bitten when it comes to the crunch. When I teach, I've got many stories of customers losing data, business, time, and money (and DBAs losing their jobs) because the backups didn't work or were destroyed along with the data. Here's one for you (simplified, and no I won't divulge names etc). Major US investment firm decides to provision new hardware, so takes a backup of the database storing all the 401k accounts for all their customers (private and corporate), flattens the hardware, and goes to restore the backup. The backup is corrupt - on SQL 2000, where there's no RESTORE ... WITH CONTINUE_AFTER_ERROR. What happened? Well, the SQL team and Product Support had to get involved to help get the data back, but people in the firm lost their jobs and it cost a lot of time and money to recover the data. If only they'd had multiple copies of the backup, and tested their backup before removing the database (or better yet, restored the database on the new hardware before flattening the old hardware). They learned a costly lesson, but they did change their practices after that.

Unfortunately this is so often the way - people don't realize they need to validate backups or have an HA plan UNTIL they have a disaster. Then suddenly its the top priority at the company. Being proactive can save a lot of grief, and make you look good when disaster strikes.

Backups can be unusable for a number of reasons, including:

  • The full backup is corrupt, because something in the I/O subsystem corrupted it.
  • A backup in the log backup chain is corrupt, meaning restore cannot continue past that point in the chain.
  • All backups following a full backup are written to the same backup set, but the WITH INIT clause is used accidentally on all backups, meaning the only backup present in the backup set is the last one taken.
  • An out-of-band backup was taken without using the WITH COPY_ONLY clause and the log backup chain was broken (see BACKUP WITH COPY_ONLY - how to avoid breaking the backup chain).
  • The backups worked but the database contained corruption before it was backed up (kind of a separate issue).

The only ones that are out of your control are the first two, but they can be mitigated by having multiples copies of backups. All of these though, can be avoided at disaster recovery time by reguarly restoring your backups as a test of what you'd do if there was a real disaster. You might be surprised what you'd find out...

This is more of an editorial style post than a deep technical or example script post - I'm going to start doing more of these around the weekly surveys. Next post - this week's survey.

Thanks!

Theme design by Nukeation based on Jelle Druyts