This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!
If you’ve been following along with our Accidental DBA series, you’ll know that the posts for the last week covered topics related to one of the most important tasks (if not the most important) for a DBA: backups. I consider consistency checks, often referred to as CHECKDB, as one of the next most important tasks for a DBA. And if you’ve been a DBA for a while, and if you know how much I love statistics, you might wonder why fragmentation and statistics take third place. Well, I can fix fragmentation and out-of-date/inaccurate statistics at any point. I can’t always “fix” corruption. But let’s take a step back and start at the beginning.
What are consistency checks?
A consistency check in SQL Server verifies the logical and physical integrity of the objects in a database. A check of the entire database is accomplished with the DBCC CHECKDB command, but there are other variations that can be used to selectively check objects in the database: DBCC CHECKALLOC, DBCC CHECKCATALOG, DBCC CHECKTABLE and DBCC CHECKFILEGROUP. Each command performs a specific set of validation commands, and it’s easy to think that to in order to perform a complete check of the database you need to execute all of them. This is not correct.
When you execute CHECKDB, it runs CHECKALLOC, CHECKTABLE for every table and view (system and user) in the database, and CHECKCATALOG. It also includes some additional checks, such as those for Service Broker, which do not exist in any other command. CHECKDB is the most comprehensive check and is the easiest way to verify the integrity of the database in one shot. You can read an in-depth description of what it does from Paul, it’s author, here.
CHECKFILEGROUP runs CHECKALLOC and then CHECKTABLE for every table in the specified filegroup. If you have a VLDB (Very Large DataBase) you may opt to run CHECKFILEGROUP for different filegroups on different days, and run CHECKCATALOG another day, to break up the work.
How often should I run Consistency Checks?
If you can run a consistency check every day for your database, I recommend that you do so. But it’s quite common that a daily execution of CHECKDB doesn’t fit into your maintenance window – see Paul’s post on how often most people do run checks. In that case, I recommend you run your checks once a week. And if CHECKDB for your entire database doesn’t complete in your weekly maintenance window, then you have to figure out what’s possible within the time-frame available. I mentioned VLDBs earlier, and Paul has a nice post on options for breaking up checks for large database. You will have to determine out what works best for your system – there isn’t a one-size-fits-all solution. You may need to get creative, which is one of the fun aspects of being DBA. But don’t avoid running consistency checks simply because you have a large database or a small maintenance window.
Why do I need to run consistency checks?
Consistency checks are critical because hardware fails and accidents happen. The majority of database corruption occurs because of issues with the I/O subsystem, as Paul mentions here. Most of the time these are events that are out of your control, and all you can do is be prepared. If you haven’t experienced database corruption yet in your career, consider yourself lucky, but don’t think you’re exempt. It’s much more common that many DBAs realize and you should expect that it’s going to occur in one of your databases, on a day that you have meetings booked back-to-back, need to leave early, and while every other DBA is on vacation.
What if I find corruption?
If you encounter database corruption, the first thing to do is run DBCC CHECKDB and let it finish. Realize that a DBCC command isn’t the only way to find corruption – if a page checksum comes up as invalid as part of a normal operation, SQL Server will generate an error. If a page cannot be read from disk, SQL Server will generate an error. However it’s encountered, make sure that CHECKDB has completed and once you have the output from it, start to analyze it (it’s a good idea to save a copy of the output). Output from CHECKDB is not immediately intuitive. If you need help reviewing it, post to one of the MSDN or StackOverflow forums, or use the #sqlhelp hashtag on Twitter.
Understand exactly what you’re facing in terms of corruption before you take your next step, which is deciding whether you’re going to run repair or restore from backup. This decision depends on numerous factors, and this is where your disaster recovery run-book comes into play. Two important considerations are how much data you might lose (and CHECKDB won’t tell you what data you will lose if you run repair, you’ll have to go back and try to figure that afterwards) and how long the system will be unavailable – either during repair or restore. This is not an easy decision. If you decide to repair, make certain you take a full backup of the database first. You always want a copy of the database, just in case. I would also recommend that if you decide to run repair, run it against a copy of the database first, so you can see what it does. This may also help you understand how much data you would lose. Finally, after you’ve either run repair or restored from backup, run CHECKDB again. You need to confirm that the database no longer has integrity issues.
Please understand that I have greatly simplified the steps to go through if you find corruption. For a deeper understanding of what you need to consider when you find corruption, and options for recovering, I recommend a session that Paul did a few years ago on Corruption Survival Techniques, as what he discussed still holds true today.
What about CHECKIDENT and CHECKCONSTRAINTS?
There are two additional DBCC validation commands: DBCC CHECKIDENT and DBCC CHECKCONSTRAINTS. These commands are not part of the normal check process. I blogged about CHECKIDENT here, and you use this command to check and re-seed values for an identity column. CHECKCONSTRAINTS is a command to verify that data in a column or table adheres to the defined constraints. This command should be run any time you run CHECKDB with the REPAIR_ALLOW_DATA_LOSS option. Repair in DBCC will fix corruption, and it doesn’t take constraints into consideration; it just alters data structures as needed so that data can be read and modified. As such, after running repair, constraint violations can exist, and you need to run CHECKCONSTRAINTS for the entire database to find them.
What’s next?
We’ve only scratched the surface of consistency checking. This is a topic worthy of hours of discussion – not just in the how and why, but also what to do when corruption exists. If you plan on attending our Immersion Event for the Accidental DBA, and want to get a jump on the material, I recommend reading through the posts to which I’ve linked throughout, and also going through Paul’s CHECKDB From Every Angle blog category, starting with the oldest post and working your way forward. Hopefully your experience with database corruption will be limited to testing and what you hear about from colleagues…but don’t bet on it 🙂
Our online training (Pluralsight) courses that can help you with this topic: