This is a question that came up yesterday in our Disaster Recovery class so I’m typing it up in between attending sessions at Microsoft Day here at the conference. It’s an interesting experience watching all the Microsoft speakers walking around in the distinctive blue shirts and no longer having to wear one myself.
The question is the following – why does DBCC CHECKDB terminate with an out-of-space error in SQL Server 2005? A customer had a 500GB database spread over 17 LUNs on a SAN, with each LUN having only 5GB free. There’s a heavy concurrent workload running while DBCC CHECKDB is running and very often it doesn’t complete but instead stops with an error. What’s going on?
The reason for this is the way that DBCC CHECKDB gets a transactionally consistent view of the database. In a nutshell, it creates a internal database snapshot of the database and then checks the database snapshot (you can read more about this in my previous post detailing all the steps of DBCC CHECKDB). A database snapshot needs to have one snapshot file for each file in the source database. In the case of DBCC CHECKDB, the snapshot files are created in the same place as the existing database files – i.e. stored on the same disk volume – and you have no control over this. This means any changes to the database while DBCC CHECKDB is running will cause these snapshot files to grow.
If there’s a significant concurrent workload while DBCC CHECKDB is running, then these snapshot files can get very big very quickly. In the case described in the question, the workload caused one of the snapshto files to take up all available space on the LUN and then it ran out of space. When this happens the snapshot is no longer valid and so DBCC CHECKDB has to stop.
There are two solutions to this. The obvious first one is to run DBCC CHECKDB in a period with low concurrent workload. That’s not feasible for many 24×7 shops so the better solution is to create your own database snapshot (so you can control the placement of it) and then run DBCC CHECKDB on that. This is no different than having DBCC CHECKDB create its own snapshot.
There’s another reason that DBCC CHECKDB may stop with an out-of-space error. DBCC CHECKDB has to validate the contents of the whole database. As such it needs to store info about things it has seen at one point during database scanning so that it can match that info against things it sees at later points during database scanning. These bits of info are stored in an in-memory worktable. In many cases though, the amount of info being stored exceeds the available memory and the worktable needs to spill over to tempdb. (For example, running DBCC CHECKDB against a 1TB database on a machine with 16GB of memory – the amount of info that DBCC CHECKDB needs to store the intermediate pieces of info will likely exceed the memory available to SQL Server). So – if tempdb isn’t big enough to store the worktable, it has to grow. If it can’t grow, then DBCC CHECKDB will fail because it needs the worktable to operate.
You can check in advance how much (estimated) space will be needed by DBCC CHECKDB by running it using the WITH ESTIMATEONLY option (broken in 2008 R2, fixed in 2012 SP2 and 2014) and then making sure that tempdb is sized accordingly to accommodate the DBCC CHECKDB requirements AND the regular tempdb space requirements. See Capacity Planning for tempdb in Books Online for more info.