A SQL Server DBA myth a day: (14/30) clearing the log zeroes out log records

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

Today is the big day – 8am PST I’m presenting my DBA Mythbusters session at SQL Connections for the first time. I’ll hopefully do it as a longer Spotlight Session at SQL PASS this Fall too.

So a little quickie today as I’m doing 4 sessions in a row.

Myth #14: clearing the log zeroes out log records.

FALSE

The transaction log is *always* zero initialized when first created, manually grown, or auto-grown. Do not confuse this with the process of clearing the log during regular operations. That simply means that one or more VLFs (Virtual Log Files) are marked as inactive and able to be overwritten. When log clearing occurs, nothing in the log is zeroed or overwritten. ‘Clearing the log’ is a very confusing misnomer. It means exactly the same thing as ‘truncating the log’, which is another unfortunate misnomer, because the size of the log doesn’t change at all.
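
If you want to see the VLF states for yourself, the (undocumented but widely used) DBCC LOGINFO command returns one row per VLF – for example, against the LogClearTest database created in the script below:

-- Show the VLFs in the log and their status.
-- Status = 2 means the VLF is active; Status = 0 means it has been
-- marked inactive (i.e. 'cleared') and can be reused - its contents
-- are not zeroed or erased in any way.
USE [LogClearTest];
GO
DBCC LOGINFO;
GO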

You can read more about zero initialization of the log in my blog post Search Engine Q&A #24: Why can’t the transaction log use instant initialization? and about how log clearing works in my TechNet Magazine article from February 2009: Understanding Logging and Recovery in SQL Server.

You can prove this to yourself using trace flag 3004. Turning it on will let you see when SQL Server is doing file zeroing operations (as I described in A SQL Server DBA myth a day: (3/30) instant file initialization can be controlled from within SQL Server). Turn it on and watch for messages appearing during the day – you shouldn’t see any unless the log grows.

Here’s a script to show you what I mean:

DBCC TRACEON (3004, 3605);
GO

-- Create database and put in SIMPLE recovery model so the log will clear on checkpoint
CREATE DATABASE [LogClearTest] ON PRIMARY (
    NAME = N'LogClearTest_data', FILENAME = N'D:\SQLskills\LogClearTest_data.mdf')
LOG ON (
    NAME = N'LogClearTest_log', FILENAME = N'D:\SQLskills\LogClearTest_log.ldf',
    SIZE = 20MB);
GO

-- Error log mark 1
ALTER DATABASE [LogClearTest] SET RECOVERY SIMPLE;
GO

USE [LogClearTest];
GO

-- Create table and fill with 10MB - so 10MB in the log
CREATE TABLE [t1] ([c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'a');
GO
INSERT INTO [t1] DEFAULT VALUES;
GO 1280

-- Clear the log
CHECKPOINT;
GO

-- Error log mark 2
ALTER DATABASE [LogClearTest] SET RECOVERY SIMPLE;
GO

And in the error log I see:

2010-04-13 13:20:27.55 spid53      DBCC TRACEON 3004, server process ID (SPID) 53. This is an informational message only; no user action is required.
2010-04-13 13:20:27.55 spid53      DBCC TRACEON 3605, server process ID (SPID) 53. This is an informational message only; no user action is required.
2010-04-13 13:20:27.63 spid53      Zeroing D:\SQLskills\LogClearTest_log.ldf from page 0 to 2560 (0x0 to 0x1400000)
2010-04-13 13:20:28.01 spid53      Zeroing completed on D:\SQLskills\LogClearTest_log.ldf
2010-04-13 13:20:28.11 spid53      Starting up database 'LogClearTest'.
2010-04-13 13:20:28.12 spid53      FixupLogTail() zeroing D:\SQLskills\LogClearTest_log.ldf from 0x5000 to 0x6000.
2010-04-13 13:20:28.12 spid53      Zeroing D:\SQLskills\LogClearTest_log.ldf from page 3 to 63 (0x6000 to 0x7e000)
2010-04-13 13:20:28.14 spid53      Zeroing completed on D:\SQLskills\LogClearTest_log.ldf
2010-04-13 13:20:28.16 spid53      Setting database option RECOVERY to SIMPLE for database LogClearTest.
2010-04-13 13:20:29.49 spid53      Setting database option RECOVERY to SIMPLE for database LogClearTest.

The two ALTER DATABASE commands serve as markers in the error log. There’s clearly no zeroing occurring from the CHECKPOINT between the two ALTER DATABASE markers. To prove it to yourself further, you can add calls to DBCC SQLPERF (LOGSPACE) before and after the CHECKPOINT to show that the log clears when the checkpoint occurs (watch the value in the Log Space Used (%) column decrease).
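
Something along these lines works (just wrapping the CHECKPOINT from the script above):

-- How full is the log before clearing it?
DBCC SQLPERF (LOGSPACE);
GO

-- Clear the log
CHECKPOINT;
GO

-- And afterwards - the Log Space Used (%) value for LogClearTest drops,
-- but the Log Size (MB) value stays exactly the same
DBCC SQLPERF (LOGSPACE);
GO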

Now it’s session time here in Vegas! (And there’s no-one to blame but yourself if you’re the Conference Chair and you’ve got an 8am session! :-)

A SQL Server DBA myth a day: (3/30) instant file initialization can be controlled from within SQL Server

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

A bit of a shorter one for today as we’re flying home from Boston to Seattle – home for a week then off again to SQL Connections in Las Vegas. It’s all go…

Myth #3: Instant file initialization can be a) enabled and b) disabled from within SQL Server.

a) FALSE and b) TRUE, respectively.

Instant file initialization is a little-known feature of SQL Server 2005 onwards that allows data files (only, not log files) to skip the usual zero-initialization process. It’s a fabulous way to reduce downtime when a disaster occurs and you have to restore a database from scratch – as the new data files that are created don’t spend (potentially) hours being zeroed before the actual restore operation can take place.

I’ve done a blog post about instant file initialization misconceptions before (see Misconceptions around instant initialization) but that didn’t cover this aspect of the feature.

You *cannot* enable it from within SQL Server. SQL Server does a one-time check at startup to see whether the SQL Server service account possesses the appropriate Windows permission (Perform Volume Maintenance Tasks, a.k.a. SE_MANAGE_VOLUME_NAME), and if it does, instant file initialization is enabled for that instance. Kimberly’s excellent blog post Instant Initialization – What, Why, and How has the details on how to enable the feature (and a lot more besides).

You *can* check from within SQL Server to see whether it’s enabled. Enable trace flag 3004 (and 3605 to force the output to the error log) and then create a database. In the error log you’ll see messages indicating that the log file is being zero initialized. If instant file initialization is NOT enabled, you’ll also see messages about the data file being zero initialized.
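
For example, something along these lines (the IFICheck database name is just a placeholder, and sp_readerrorlog is an undocumented but handy way to search the error log):

-- Send trace flag messages to the error log and enable zeroing messages
DBCC TRACEON (3004, 3605);
GO

-- Create a throwaway database
CREATE DATABASE [IFICheck];
GO

-- Search the current error log for zeroing messages. With instant file
-- initialization enabled you'll only see 'Zeroing ...' messages for the
-- log file; without it you'll see them for the data file too.
EXEC sp_readerrorlog 0, 1, N'Zeroing';
GO

-- Clean up
DROP DATABASE [IFICheck];
DBCC TRACEOFF (3004, 3605);
GO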

You *can* disable instant file initialization from within SQL Server, albeit only temporarily. Turning on trace flag 1806 will disable instant file initialization while the trace flag is enabled. To turn it off permanently, you’ll need to remove the security permission from the SQL Server service account.
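
For example (the -1 makes the trace flag apply instance-wide, so data file creations and growths go back to being fully zero initialized until you turn it off again):

-- Temporarily disable instant file initialization for the whole instance
DBCC TRACEON (1806, -1);
GO

-- ... create or grow data files here and they'll be zero initialized ...

-- Put things back to normal (assuming the service account still has the
-- Perform Volume Maintenance Tasks privilege)
DBCC TRACEOFF (1806, -1);
GO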

These two trace flags were first documented in the SQL Server Premier Field Engineer Blog by MCMs Cindy Gross (Twitter) and Denzil Ribeiro – see their post How and Why to Enable Instant File Initialization.

If you’re able to – turn this feature on!

Importance of data file size management

In last week's survey I asked how you manage the size of your database *data* files – see here for the survey. Here are the results as of 6/24/09.

The 'other' values are as follows:

  1. 5 x 'manual file growths and a custom mom alter to when the datafiles are 98% full. autogrow set to fixed amount in case we miss the mom e-mail'
  2. 1 x 'auto-grow with a procedure to keep the log file size at 20% relative to the total data file size'
  3. 1 x 'Create with very large initial file size, set auto-growth to %'
  4. 1 x 'Lots of white space in files. Auto-grow to a fixed size (in case of emergency).'
  5. 1 x 'set initial size for 1 yr usage, monitor size, manual grow, autogrow percentage – send alert if it does grow'
  6. 1 x 'set initial size for 2 year growth, capture growth stats daily, monitor physical disk space daily'
  7. 1 x 'Set to autogrow by fixed size to cater for emergencies, otherwise maintain 80-90% free space by daily reporting then manual off-peak size increase if necessary.'

As I mentioned in the survey itself, this is just about database *data* files. I covered log file size management in a previous survey – see Importance of proper transaction log size management.

There are really four parts to data file size management, which I'll discuss below.

The first thing I'll say is that if you're able to, enable instant file initialization on 2005+ – as it will vastly reduce the time required to create data files, grow data files, and restore backups (if the restore has to create data files). We're talking minutes/hours down to single-digit seconds. See Misconceptions around instant file initialization. If you're not one of the minuscule fraction of a percent of customers who have volumes shared between secure file servers and SQL Server instances, turn this on ASAP. Most DBAs don't know about this feature, but everyone I teach it to or show it to is amazed and then goes and turns it on. No-brainer.

Initial data file sizing. This can be tricky. Without getting into the how-to-calculate-the-database-size quagmire, I'll simply say that you should provision as much space as you can, based upon your sizing estimates. Don't just provision for the here-and-now – if you're able to, provision space for the next year's worth of anticipated growth – to prevent auto-growth having to kick in at all. I rewrote all the Books Online pages for 2005 (links are to the 2008 BOL) on Estimating the Size of Heaps, Clustered Indexes, and Nonclustered Indexes and in the blog post Tool for estimating the size of a database I link to a tool someone put together that codified all my formulas. You can also get sizing tools from hardware vendors.
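
For example, something like this – the name, paths, and sizes here are purely illustrative, not a recommendation:

-- Provision the data and log files up-front rather than relying on auto-growth
CREATE DATABASE [SalesDB] ON PRIMARY (
    NAME = N'SalesDB_data', FILENAME = N'D:\SQLskills\SalesDB_data.mdf',
    SIZE = 100GB, FILEGROWTH = 5GB)
LOG ON (
    NAME = N'SalesDB_log', FILENAME = N'D:\SQLskills\SalesDB_log.ldf',
    SIZE = 10GB, FILEGROWTH = 1GB);
GO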

Data file growth. If you're able to, auto-grow should ALWAYS be turned on, as an emergency measure in case your monitoring fails – you don't want the database to need to grow, be unable to, and then stop with the application offline. However, you shouldn't *rely* on auto-grow – it's just for emergencies. The auto-growth default for data files used to be 10% for 2000 and before, but then changed to 1MB from 2005 onwards (the log file default auto-growth remained at 10%). Neither of these is a good choice. A percentage-based auto-growth means that as your files get bigger, so does the auto-growth, and potentially the time it takes if you don't have instant file initialization enabled. A 1MB auto-growth is just nonsensical. Your auto-growth should be set to a fixed size – but what that should be is a great big 'it depends'. You need to decide whether you want the auto-growth to be a quick stop-gap, or whether the auto-growth will replace manual growth after monitoring. You also need to consider how long the auto-growth will take, especially without instant file initialization. I can't give any guidance here as to what a good number is, but I'd probably settle on something around 10% (fixed), with the percentage steadily falling as the database size grows. It's very important that you have alerts set up so you can tell when auto-growth does occur, so you can then take any necessary action to grow it even more or tweak your settings.
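
Changing the growth increment is simple – for example (again, the names and numbers are just placeholders, not recommendations):

-- Change the data file auto-growth from the 1MB default to a fixed amount
ALTER DATABASE [SalesDB]
    MODIFY FILE (NAME = N'SalesDB_data', FILEGROWTH = 512MB);
GO

-- Give the log file a fixed growth increment too
ALTER DATABASE [SalesDB]
    MODIFY FILE (NAME = N'SalesDB_log', FILEGROWTH = 256MB);
GO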

'Other' response #2 is interesting. There's been a 'best-practice' around for a while that the log file should be sized to be a percentage of the data file size. It's totally unfounded and in most cases bears no relation to reality. The vast majority of the time, the size of the log is *NOT* dependent on the data file sizes in any way. Imagine a 10TB database – would you provision a 2TB log? Of course not. Now, I can see special cases where the operations performed on the tables in the database might affect a fixed portion of the largest table in a single batch, and that could generate enough log (with reserved space too) to equal 20% of the data file size – but that's a pretty nonsensical special case, to be honest. You shouldn't use 'set the log as a percentage of the data file' as a guideline.

Data file size/usage monitoring. There's a growing movement towards monitoring the data file usage and manually growing the files as they approach 100% full – avoiding auto-growth altogether, but still having it enabled for emergencies. In my book, this is the best way to go as you have all the control over what happens and, more importantly, when it happens – especially without instant file initialization. There are some quirks here though. SCOM, for instance, has logic that disables file size and usage monitoring if you enable auto-grow. It assumes that if you enable auto-grow then you're not interested in monitoring. I happened to have one of the SCOM devs in the last maintenance class I taught on the Redmond MS campus, and he's going to try to get that logic fixed.
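
Here's the kind of simple check you could schedule or hang an alert off (run in the context of the database you care about; the 128 divisor converts 8KB pages to MB):

-- How full is each file in the current database?
SELECT
    [name] AS [FileName],
    [size] / 128 AS [SizeMB],
    FILEPROPERTY ([name], 'SpaceUsed') / 128 AS [UsedMB],
    100.0 * FILEPROPERTY ([name], 'SpaceUsed') / [size] AS [PercentUsed]
FROM sys.database_files;
GO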

Data file shrinking. Just this morning I wrote a long blog post about this – see Why you should not shrink your data files. Running data file shrink causes index fragmentation, uses lots of resources, and the vast majority of the time when people use it, it's unnecessary and wasteful. It should NEVER be part of a regular maintenance plan – as you get into the shrink-grow-shrink-grow cycle, which becomes a zero-sum game with a ton of transaction log being generated. Think about this – if you run a shrink, which is fully logged, then all that log has to be backed up, log-shipped, sent to the database mirror, scanned by the replication log reader agent, and so on. And then the database will probably auto-grow again through normal operation, or some part of the maintenance job that rebuilds indexes. And then you shrink it again. And the cycle continues…
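
If you want to see the fragmentation effect for yourself, check a table's leaf-level fragmentation before and after shrinking its data file – something like this (the database and table names are placeholders):

-- Check logical fragmentation of the table's indexes before and after
-- running DBCC SHRINKFILE on the data file
SELECT
    [index_id],
    [avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'MyDB'), OBJECT_ID (N'MyDB.dbo.MyTable'), NULL, NULL, 'LIMITED');
GO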

Bottom line – make sure you size the data files with some free space, have auto-growth set appropriately, have instant file initialization enabled if you can, monitor file sizes and usage, alert on auto-grows, and don't use shrink. And don't use shrink. Really.

Next up – this week's survey!