Every so often I’ll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.
Physical corruption
This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:
- Problem with the I/O subsystem (99.8% of all cases I’ve ever seen – only 3 nines as I’d estimate I’ve seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack – including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground…)
- Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
- SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
- Deliberate introduction of corruption using a hex editor or other means.
Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans – software bugs.
Logical corruption
This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:
:-) Okay…
- Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn’t implement a constraint properly. Or the application designer doesn’t cope with a transaction roll-back properly. You get the idea.
- Accidental update/delete. Someone deletes or updates some data incorrectly.
- SQL Server bug. See above.
- DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I’ve blogged about and mentioned when lecturing, if you run repair, it doesn’t take into account any inherent or explicit constraints on the data.
The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that’s still not the application causing physical corruption, it’s SQL Server.
So – on to the myths.
- Can an application cause physical corruption? No.
- Can stopping a shrink operation cause corruption of any kind? No.
- Can stopping an index rebuild cause corruption of any kind? No.
- Can running DBCC CHECKDB without repair cause corruption of any kind? No.
- Can creating a database snapshot cause corruption of any kind? No.
Hope this helps.
5 Responses to Search Engine Q&A #26: Myths around causing corruption
[...] Myths around what can and cannot cause corruption [...]
[...] Lastly, there's a pervasive myth that interrupting a long-running operation (like shrink, index rebuild, bulk load) can cause corruption. No. Unless there's a corruption bug in SQL Server (which happens sometimes, but rarely), nothing you can do from T-SQL can cause corruption. I wrote a detailed blog post on this a couple of years ago – see Search Engine Q&A #26: Myths around causing corruption. [...]
[...] Basically you need to run regular consistency checks. There's a myth that you don't need to run consistency checks – this was perpetuated by various marketing folks when SQL 7.0 shipped, because SQL 6.5 used to cause allocation corruptions and the rewrite for 7.0 removed all the corruption problems. Now, of course there have been bugs in SQL Server that cause problems, but they account for a tiny fraction of the corruptions out there. Nearly all corruptions are caused by something going wrong in the I/O subsystem – and you can't predict when that will or won't happen. Jim Gray once likened the disk heads in a hard drive as akin to a 747 flying at 500mph 1/4 inch above the ground – scary stuff. You DO need to run consistency checks, because corruptions do happen. You can read more about the causes of corruptions in this blog post: Search Engine Q&A #26: Myths around causing corruption. [...]
[...] Paul’s blog post on Myths around causing corruption – so that you can get better insight into where/why the actual corruptions are [...]
[...] And while we're on the subject, check out this post: Search Engine Q&A #26: Myths around causing corruption. [...]