The Curious Case of… transactions rolling back during DBCC CHECKDB

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

 

Continuing the database snapshot theme from the previous Curious Case post, I had another question from someone who was concerned about transactions rolling back during DBCC CHECKDB. They’d just noticed the messages in the error log saying that when DBCC CHECKDB was executed, it was causing transactions to roll back in the database – and how could that possibly be allowed to happen? They said they panicked and stopped all DBCC CHECKDB executions.

 

There’s no need to panic. The problem is actually a bug in the database snapshot code that’s been there since SQL Server 2005, where it reports the wrong database name.

Let’s try this on SQL Server 2019. First I’ll create a simple database, cycle the error log, and start a transaction:

USE [master];
GO

IF DATABASEPROPERTYEX (N'Company_Snapshot', N'Version') > 0
BEGIN
	DROP DATABASE [Company_Snapshot];
END
GO
IF DATABASEPROPERTYEX (N'Company', N'Version') > 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END
GO

-- Create a database
CREATE DATABASE [Company];
GO

USE [Company];
GO

CREATE TABLE [t1] ([c1] int);
GO

EXEC sp_cycle_errorlog;
GO

BEGIN TRAN
INSERT INTO [t1] VALUES (1);
GO

Now in a separate window, I’ll create a database snapshot an examine the error log:

CREATE DATABASE [Company_Snapshot]
ON (
	NAME = N'Company',
	FILENAME = N'C:\SQLskills\CompanyData.mdfss')
AS SNAPSHOT OF [Company];
GO
EXEC xp_readerrorlog;
GO

And the pertinent lines from the error log are:

2020-01-15 13:29:30.740 spid58       1 transactions rolled back in database 'Company_Snapshot' (17:0). This is an informational message only. No user action is required.
2020-01-15 13:29:30.740 spid58       Recovery is writing a checkpoint in database 'Company_Snapshot' (17). This is an informational message only. No user action is required.

This is correct; the transaction is being rolled back in the context of the database snapshot, to make it transactionally consistent.

Now let’s try a DBCC CHECKDB:

DBCC CHECKDB (N'Company');
GO
EXEC xp_readerrorlog;
GO

And the pertinent lines from the error log are:

2020-01-15 13:31:52.710 spid58       1 transactions rolled back in database 'Company' (21:0). This is an informational message only. No user action is required.
2020-01-15 13:31:52.900 spid58       DBCC CHECKDB (Company) executed by CRINAN\Paul found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds.  Internal database snapshot has split point LSN = 00000024:00000087:0001 and first LSN = 00000024:0000003c:0001.

And this is incorrect, and the cause of the concern.

The ‘bug’ is that the name for the internal database snapshot is chosen (by the Database Manager) to be the same as for the real database, and DBCC CHECKDB has no control over that, so it looks like the rollbacks are happening in the real database. But if you look at the database ID in the message, you’ll see that it’s not the same as for the real Company database (which is 5 in this case). So, if you’re ever concerned by a message like this, just look up the database ID of the database you’re running DBCC CHECKDB against and you’ll see that things are fine.

The Curious Case of… a truncated log file after a revert from snapshot

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I had an email question last week from someone who was very concerned to see a truncated log file after a revert from snapshot. They thought they’d come across a SQL Server bug and wanted to know if I’d seen this before.

Well, the answer is ‘yes’, because that’s how SQL Server has behaved since SQL Server 2005 introduced database snapshots. Whenever you perform a revert to snapshot, the log file is discarded and a new one created with two 256KB VLFs.

This behavior is by design, but understandably surprises people. The idea is that the log does have to be reset during a revert, because the database is being wound backwards in time physically, rather than by rolling back transactions. You’d think that SQL Server would recreate the log at the size it was previously, but the thinking was (when we built the feature) that if the previous log size was very large, recreating it would potentially take a long time, as the log must be zero-initialized on first allocation in the file system. So, it just whacks the log down to two small VLFs, which means after a revert you’ll need to manually grow the log again.

Here’s an example using SQL Server 2019:

USE [master];
GO

IF DATABASEPROPERTYEX (N'SalesDB_Snapshot', N'Version') > 0
	DROP DATABASE [SalesDB_Snapshot];
GO

RESTORE DATABASE SalesDB
	FROM DISK = N'D:\SQLskills\DemoBackups\SalesDB2014.bak'
WITH STATS = 10, REPLACE;
GO

-- Create the snapshot
CREATE DATABASE [SalesDB_Snapshot]
ON (
	NAME = N'SalesDBData',
	FILENAME = N'C:\SQLskills\SalesDBData.mdfss')
AS SNAPSHOT OF [SalesDB];
GO

Now, looking at the log file size (using the old DBCC commands instead of the new DMVs so people playing with this on older versions can follow along) and some of the fields removed from the results for clarity:

DBCC LOGINFO (N'SalesDB');
DBCC SQLPERF (LOGSPACE);
GO
RecoveryUnitId FileId      FileSize             
-------------- ----------- -------------------- 
0              2           10420224             
0              2           10420224             
0              2           10420224             
0              2           10674176             

Database Name Log Size (MB) Log Space Used (%) Status
------------- ------------- ------------------ -----------
master        1.992188      43.77451           0
tempdb        7.992188      6.042278           0
model         19.61719      6.586021           0
SalesDB       39.99219      18.78174           0

Four VLFs of about 10MB each in the SalesDB database. Now I’ll revert to the snapshot and check again:

RESTORE DATABASE [SalesDB]
FROM DATABASE_SNAPSHOT = N'SalesDB_Snapshot';
GO

DBCC LOGINFO (N'SalesDB');
DBCC SQLPERF (LOGSPACE);
GO
RecoveryUnitId FileId      FileSize             
-------------- ----------- -------------------- 
0              2           253952
0              2           253952           

Database Name Log Size (MB) Log Space Used (%) Status
------------- ------------- ------------------ -----------
master        1.992188      43.77451           0
tempdb        7.992188      6.042278           0
model         19.61719      6.586021           0
SalesDB       0.484375      60.28226           0

You can see that the log has been reset to two 256KB VLFs.

Bottom line: it’s expected behavior you need to be aware of. It’s also a neat way to reset the VLFs in a log if someone created the initial log to be enormous and you want to resize it to be a lot smaller, as there’s no other way of reducing the size of the first couple of VLFs.

The Curious Case of… emergency-mode repair

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I had a blog comment question a few days ago that asked why emergency-mode repair requires the database to be in EMERGENCY mode as well as SINGLE_USER mode.

All repair operations that DBCC CHECKDB (and related commands) performs require the database to be in single-user mode so there’s a guarantee that nothing can be changing while the checks and repairs are done. But that doesn’t change the behavior of what repair does – that needs emergency mode too.

Emergency mode is internally known as ‘bypass recovery’ mode, and is used when a database hasn’t had crash recovery run on it, because of corruption issues or a missing or damaged log file. It allows access to the database in the unrecovered state, which may mean the database has inconsistencies in its structures and/or data because of in-flight transactions that have not been rolled back (as recovery couldn’t run).

When a database is in emergency mode, it’s not possible to create a database snapshot (as that involves running recovery) and so a DBCC CHECKDB without a repair option simply treats the database as if it’s read-only, and runs the consistency checks. When a repair option *is* specified (and only REPAIR_ALLOW_DATA_LOSS is permitted) in emergency mode, that tells DBCC CHECKDB that the drastic, last-resort emergency mode repair should be performed. This will:

  • Run as much crash recovery as possible, skipping errors, unlike real crash recovery which will fail when it encounters an error
  • Delete the transaction log
  • Create a new transaction log
  • Run a regular REPAIR_ALLOW_DATA_LOSS check and repair
  • Bring the database online if possible, albeit maybe with corruptions still in the database

So emergency mode isn’t single-user mode, and vice versa – they’re both required for an emergency-mode repair to run.

You can read more about emergency-mode repair in my blog post EMERGENCY-mode repair: the very, very last resort.