This is a quick post to let you know about a bug that a few people are hitting when running DBCC CHECKDB. The symptoms are a series of errors in the error log plus SQL Server forcibly shuts itself down and restarts. I’ve heard of people hitting the bug on SQL Server 2014 and SQL Server 2012 SP1.
[Update 2/24/15] Microsoft has confirmed that it’s a bug in 2014 and 2012 and they’re planning a fix for the next CU of both.
The error log symptoms look something like I show below:
2014-11-18 09:04:15.69 spid64 The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x00001EE6FD8000 in file 'C:\SQLskills\\Company_file2.ndf:MSSQL_DBCC23'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. 2014-11-18 09:04:15.72 spid64 Error: 3314, Severity: 17, State: 3. 2014-11-18 09:04:15.72 spid64 During undoing of a logged operation in database 'Company', an error occurred at log record ID (887654:3321:14). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database. 2014-11-18 09:04:34.38 spid64 Error: 831, Severity: 20, State: 1. 2014-11-18 09:04:34.38 spid64 Unable to deallocate a kept page. 2014-11-18 09:04:34.40 spid64 Error: 3449, Severity: 21, State: 1. 2014-11-18 09:04:34.40 spid64 SQL Server must shut down in order to recover a database (database ID 23). The database is either a user database that could not be shut down or a system database. Restart SQL Server. If the database fails to recover after another startup, repair or restore the database.
The 665 error is from the snapshot file that DBCC CHECKDB creates hitting an NTFS limitation on the number of file fragments in a sparse file. This causes the snapshot creation to fail. The failure causes the undo of a log record in the snapshot to fail (remember that a database snapshot undergoes crash recovery to make it transactionally consistent). This failure then leads to SQL Server thinking it has to forcibly restart to recover the snapshot database, which is should never do for a snapshot – and that’s the bug.
I’ll update this post as soon as I hear about the builds that the fix is in.
This is a rare bug to hit, but it’s a regression (from builds people are reporting), and you can help yourself to avoid it by:
- Creating your own database snapshot, on a volume without file-system free space fragmentation, and running DBCC CHECKDB against the snapshot
- OR, trying to run DBCC CHECKDB when there isn’t a significant amount of change occurring in the database, so the database snapshot doesn’t have to become very large
- OR, using the backup-copy-restore-check method of running DBCC CHECKDB on a restored backup of the database on another server