Every so often I’ll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.
This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:
- Problem with the I/O subsystem (99.8% of all cases I’ve ever seen – only 3 nines as I’d estimate I’ve seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack – including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground…)
- Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
- SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
- Deliberate introduction of corruption using a hex editor or other means.
Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans – software bugs.
This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:
- Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn’t implement a constraint properly. Or the application designer doesn’t cope with a transaction roll-back properly. You get the idea.
- Accidental update/delete. Someone deletes or updates some data incorrectly.
- SQL Server bug. See above.
- DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I’ve blogged about and mentioned when lecturing, if you run repair, it doesn’t take into account any inherent or explicit constraints on the data.
The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that’s still not the application causing physical corruption, it’s SQL Server.
So – on to the myths.
- Can an application cause physical corruption? No.
- Can stopping a shrink operation cause corruption of any kind? No.
- Can stopping an index rebuild cause corruption of any kind? No.
- Can running DBCC CHECKDB without repair cause corruption of any kind? No.
- Can creating a database snapshot cause corruption of any kind? No.
Hope this helps.