This is a question that came up yesterday in out Disaster Recovery class so I’m typing it up in between attending sessions at Microsoft Day here at SQL Connections. It’s an interesting experience watching all the MS speakers walking around in the distinctive blue shirts and no longer having to wear one myself.
The question is the following – why does DBCC CHECKDB terminate with an out-of-space error in SS2005? A customer had a 500GB database spread over 17 LUNs on a SAN, with each LUN having only 5GB free. There’s a heavy concurrent workload running while CHECKDB is running and very often it doesn’t compelte but instead stops with an error. What’s going on?
The reason for this is the way that CHECKDB gets a transactionally consistent view of the database. In a nutshell, it creates a internal database snapshot of the database and then checks the database snapshot (you can read more about this in my previous post detailing all the steps of CHECKDB). A database snapshot needs to have one snapshot file for each file in the source database. In the case of CHECKDB, the snapshot files are created as alternate-streams of the existing database files – i.e. stored on the same disk volume – and you have no control over this. This means any changes to the database while CHECKDB is running will cause these alternate streams to grow.
If there’s a significant concurrent workload while CHECKDB is running, then these alternate streams can get very big very quickly. In the case described in the question, the workload caused one of the alternate streams to take up all available space on the LUN and then it ran out of space. When this happens the snapshot is no longer valid and so CHECKDB has to stop.
There are two solutions to this. The obvious first one is to run CHECKDB in a period with low concurrent workload. That’s not feasible for many 24×7 shops so the better solution is to create your own database snapshot (so you can control the placement of it) and then run CHECKDB on that. This is no different than having CHECKDB create its own snapshot.
There’s another reason that CHECKDB may stop with an out-of-space error. CHECKDB has to validate the contents of the whole database. As such it needs to store info about things it has seen at one point during database scannig so that it can match that info against things it sees at later points during database scanning. These bits of info are stored in an in-memory worktable. In many cases though, the amount of info being stored exceeds the available memory and the worktable needs to spill over to tempdb. (For example, running CHECKDB against a 1TB database on a machine with 16GB of memory – the amount of info that CHECKDB needs to store the intermediate pieces of info will likely exceed the memory available to SQL Server). So – if tempdb isn’t big enough to store the worktable, it has to grow. If it can’t grow, then CHECKDB will fail because it needs the worktable to operate.
You can check in advance how much (estimated) space will be needed by CHECKDB by running it using the WITH ESTIMATEONLY option and then making sure that tempdb is sized accordingly to accomodate the CHECKDB requirements AND the regular tempdb space requirements. See Capacity Planning for tempdb in Books Online for more info.