Multiple log files and why they’re bad

About a month ago I kicked off a survey (see here) with some code to run that figures out how many log files your databases have. There are all kinds of misconceptions about transaction logs, how to configure them, and how they work, and I’m amazed at the misinformation I continue to see published. For instance, a few weeks back I skimmed through a video that stated in multiple places that the default transaction log growth rate is 10MB – it’s not, it’s 10% and has been since SQL Server 2005.
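
If you want to see what your own log files are actually set to, a quick look at sys.master_files shows the growth setting for every log file on the instance. This is just a sketch: growth is a percentage when is_percent_growth is 1, otherwise a number of 8KB pages, which I’ve converted to MB here.

-- Current auto-growth setting for every transaction log file on the instance
SELECT
    DB_NAME (database_id) AS DatabaseName,
    name AS LogicalFileName,
    CASE WHEN is_percent_growth = 1
        THEN CAST (growth AS VARCHAR (10)) + '%'
        ELSE CAST (growth * 8 / 1024 AS VARCHAR (10)) + ' MB'
    END AS GrowthSetting
FROM sys.master_files
WHERE type_desc = 'LOG'
ORDER BY DatabaseName;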

I got data back from 1300 SQL Server instances across the world from 75 people (thanks!), with an average of 18 databases per instance including system databases (which jumped to an average of 29 per instance if I included 4 anomalous instances with many thousands of databases each).

Out of all those instances, only 32 had databases with more than one log file:

  • 23 instances with one database with two log files
  • 3 instances with two databases with two log files
  • 1 instance each with 3, 4, 8, 9, and 27 databases with two log files
  • 2 instances with one database with four log files

So 2.5% of instances surveyed had at least one database with more than one log file.

I think I’m pleased about that as I expected the number to be higher, but I also suspect I’m equating poorly configured VLF totals with general log mis-configuration, and a higher percentage of systems out there have transaction logs with too many VLFs. Let’s settle for pleased :-)

But why do I care? And why should you care? Although it seems like there’s nothing damaging about having multiple log files, I don’t think that’s true.

Firstly, having multiple log files implies that the first one ran out of space, and because the second one still exists, the first one might still be pretty large (maybe the second one is large too!). I don’t really care about that for crash recovery (which is bounded by how much un-recovered log there is), or for the performance of HA technologies like database mirroring, Availability Groups, or replication, which are bounded by transaction log generation rate, not size.

What I care about is performing a restore during disaster recovery. If the log files don’t exist, they must be created and zero-initialized, and twice if you also restore a differential backup, as both the full and differential restores zero out the log. If the first log file is as big as it can be, and there’s a second log file still there, that’s potentially a lot of log file to zero-initialize, which translates into more downtime during disaster recovery.
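
If you want to watch the zeroing happen, trace flags 3004 and 3605 make SQL Server write file-zeroing messages to the error log. This is purely a diagnostic sketch, not something to leave enabled on a production system: turn the trace flags off again when you’re done.

-- Write file-creation/zeroing diagnostics to the error log
DBCC TRACEON (3004, 3605, -1);
GO

-- ...perform the restore you're interested in, then read the error log...
EXEC xp_readerrorlog;
GO

-- ...and turn the diagnostics off again
DBCC TRACEOFF (3004, 3605, -1);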

I would much rather that all the log files apart from the first one are removed once they’re no longer needed: wait until all the active VLFs (marked with status 2 in the output from DBCC LOGINFO – see here) are in the first log file, remove the extra file with an ALTER DATABASE, and then reduce the remaining log file to a reasonable size.
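
Here’s a minimal sketch of that clean-up, assuming a database called MyDB whose first log file has the logical name MyDB_log and whose extra log file is MyDB_log2 (all three names are hypothetical), and assuming the first log file is file ID 2, which is the usual case. Note that DBCC LOGINFO is undocumented and its column layout varies by version (RecoveryUnitId only exists from SQL Server 2012 onward), so adjust the temp table to match your output.

USE MyDB;
GO

-- Step 1: check where the active VLFs (status = 2) are
CREATE TABLE #LogInfo (
    RecoveryUnitId  INT,            -- remove this column on pre-2012 versions
    FileId          INT,
    FileSize        BIGINT,
    StartOffset     BIGINT,
    FSeqNo          BIGINT,
    Status          INT,
    Parity          INT,
    CreateLSN       NUMERIC (25, 0));

INSERT INTO #LogInfo EXEC ('DBCC LOGINFO');

-- If this returns no rows, all the active VLFs are in the first log file
-- and the extra file can be removed
SELECT FileId, COUNT (*) AS ActiveVLFs
FROM #LogInfo
WHERE Status = 2 AND FileId <> 2
GROUP BY FileId;

DROP TABLE #LogInfo;

-- Step 2: remove the extra file (this only succeeds once it contains no
-- active log), then shrink the remaining log file and grow it back to a
-- sensible fixed size (8GB is just an example)
ALTER DATABASE MyDB REMOVE FILE MyDB_log2;
DBCC SHRINKFILE (MyDB_log, 0);
ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 8GB);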

Which brings me to the second thing I care about: why was the extra log file needed in the first place? Why did the transaction log run out of space, necessitating creating another log file? That’s the only explanation I can think of for having more than one log file as there is no performance gain from multiple log files – SQL Server will write to them sequentially, never in parallel (Jonathan demonstrates this neatly with Extended Events here.)

(I like to draw a parallel here with page splits. People fixate on the fact that they’ve got fragmentation, not the massive performance issue that created the fragmentation in the first place – page splits themselves!)

I blogged about the Importance of proper transaction log file size management more than three years ago (and here five years back), and many others have blogged about it too, but it’s still one of the most common problems I see. Log growth can easily be monitored using the Log Growths performance counter in the Databases performance object, and I’m sure someone’s written code that watches for the counter being incremented and alerts the DBA.
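
In fact, the counter is exposed through sys.dm_os_performance_counters, so a simple query gets you most of the way there. A sketch (the value is a cumulative count of growths, so it tells you that growth has happened, not when):

-- Databases whose transaction logs have auto-grown
SELECT
    RTRIM (instance_name) AS DatabaseName,
    cntr_value AS LogGrowths
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log Growths'
    AND object_name LIKE '%:Databases%'
    AND instance_name <> '_Total'
    AND cntr_value > 0
ORDER BY cntr_value DESC;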

For someone who’s a DBA, there’s no excuse for having out-of-control transaction logs IMHO, but for involuntary DBAs and those who administer systems where SQL Server sits hidden for the most part (e.g. SharePoint), I can understand not knowing.

But now you do. Get reading these articles and get rid of those extra log files, and the need for them! Use the code in the original survey (see the link at the top) to see whether you’ve got any extra log files kicking around.
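
The survey code gives you the full picture, but if you just want a quick check for multiple log files, a minimal sketch against sys.master_files does the job:

-- Databases on this instance with more than one log file
SELECT
    DB_NAME (database_id) AS DatabaseName,
    COUNT (*) AS LogFileCount
FROM sys.master_files
WHERE type_desc = 'LOG'
GROUP BY database_id
HAVING COUNT (*) > 1;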

Enjoy!

PS For a general overview of logging, recovery, and the transaction log, see the TechNet Magazine article I wrote back in 2009.

All SQLskills 2013 Immersion Events open for registration!

All of our 2013 public classes are now open for registration!

Based on requests from people, attendee ratings of the hotels we used this year, and the ease of using hotels we know, we’re using the same locations again. This means we cover both sides of the US, central US, and Europe.

Please know that these classes are final as the hotel contracts are signed, and the classes will not be cancelled or moved for any reason, nor will the dates change.

  • February 4-8, 2013: Internals and Performance (IE1) in Tampa, FL – USA
  • February 11-15, 2013: Performance Tuning (IE2) in Tampa, FL – USA
  • April 29-May 3, 2013: Internals and Performance (IE1) in Chicago, IL – USA
  • April 29-May 3, 2013: Immersion Event for Business Intelligence (IEBI) in Chicago, IL – USA (co-located but in a different training room. Attendance is for one event or the other; these cannot be combined for one attendee where they move back/forth.)
  • May 6-10, 2013: Performance Tuning (IE2) in Chicago, IL – USA
  • May 13-17, 2013: High Availability & Disaster Recovery (IE3) in Chicago, IL – USA
  • May 13-17, 2013: Immersion Event for Developers (IEDev) in Chicago, IL – USA (co-located but in a different training room. Attendance is for one event or the other; these cannot be combined for one attendee where they move back/forth.)
  • May 20-24, 2013: Development Support (IE4) in Chicago, IL – USA
  • June 3-7, 2013: Internals and Performance (IE1) in London – UK
  • June 10-14, 2013: Performance Tuning (IE2) in London – UK
  • June 17-21, 2013: High Availability & Disaster Recovery (IE3) in London – UK
  • June 24-28, 2013: Development Support (IE4) in London – UK
  • September 16-20, 2013: Internals and Performance (IE1) in Bellevue, WA – USA
  • September 23-27, 2013: Performance Tuning (IE2) in Bellevue, WA – USA

One thing to note is that the course prices have increased slightly for 2013, reflecting increasing food, logistics, travel, and accommodation costs. We kept our prices the same for the last three years but now we have to raise them a little.

For US classes, the new early-bird price is US$3,295 and the full price is US$3,795. However, for all registrations received before January 1, 2013, and for all past attendees in the 12 months prior to registration, we will only charge the 2012 early-bird price of US$2,995 – super-early-bird! – so get your registrations in early!

For UK classes, the new early-bird price is US$3,795 and the full price is US$4,295. There is a similar super-early-bird and past-attendee price equal to the 2012 UK early-bird price of US$3,495 – so again, get your registrations in early!

See here for the main Immersion Event Calendar page that allows you to drill through to each class for more details and registration links.

So, that’s it for now. You can’t get deeper and more comprehensive SQL Server training than we provide, anywhere in the world!

We hope to see you soon!

CHECKDB internals: what is the BlobEater?

Several times over the last month, I've been asked about the query that drives DBCC CHECKDB and other consistency checking commands, which has a variable called BlobEater in it. In this post I'd like to explain what the query is doing.

In my previous post in the CHECKDB From Every Angle series, How does DBCC CHECKDB WITH ESTIMATEONLY work?, I explained how DBCC CHECKDB uses ‘facts’: little bits of information that describe something DBCC CHECKDB has noted about a data file page. Once all the pages for an entire table (or set of tables, if batching is enabled – see that same blog post I mentioned above) have been processed, all the facts are aggregated together and they should all cancel each other out. When there are extra facts (e.g. two pages in an index B-tree point to the same page at a lower level), or missing facts (e.g. a LOB fragment doesn’t have any other LOB fragment or data/index record pointing to it), then DBCC CHECKDB can tell there’s a corruption.

As DBCC CHECKDB is generating all these facts from essentially random pages in the database (it reads the pages in a table in physical order, not logical order), there has to be some sorting of the facts before aggregation can take place. This is all driven using the query processor. Each thread in DBCC CHECKDB reads pages, generates facts, and gives them to the query processor to sort and aggregate. Once all reading has finished, the facts are then given back to parallel threads inside DBCC CHECKDB to figure out whether corruptions are present.

A picture to show this mechanism looks as follows:

[Figure: DBCC CHECKDB threads read pages and generate facts, hand them to the query processor to sort and aggregate, and then receive the aggregated facts back to check for corruptions.]

If you're doing any tracing or profiling while DBCC CHECKDB is running, you'll see the query below:

DECLARE @BlobEater VARBINARY(8000);
SELECT @BlobEater = CheckIndex(ROWSET_COLUMN_FACT_BLOB)
FROM <memory address of fact rowset>
GROUP BY ROWSET_COLUMN_FACT_KEY
>> WITH ORDER BY
     ROWSET_COLUMN_FACT_KEY,
     ROWSET_COLUMN_SLOT_ID,
     ROWSET_COLUMN_COMBINED_ID,
     ROWSET_COLUMN_FACT_BLOB
OPTION(ORDER GROUP);
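
If you’d like to try catching it yourself without running a full Profiler trace, one lightweight option is an Extended Events session filtered to the session running DBCC CHECKDB. This is only a sketch: the session name is made up, 55 is a placeholder for the real session ID, and whether the internal statement surfaces depends on the version and the events you pick.

-- CaptureCheckDB is a made-up session name; replace 55 with the session_id
-- that is actually running DBCC CHECKDB
CREATE EVENT SESSION CaptureCheckDB ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text)
    WHERE (sqlserver.session_id = 55))
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION CaptureCheckDB ON SERVER STATE = START;
GO

-- ...run DBCC CHECKDB, examine the ring_buffer target through
-- sys.dm_xe_session_targets, then stop and drop the session
ALTER EVENT SESSION CaptureCheckDB ON SERVER STATE = STOP;
DROP EVENT SESSION CaptureCheckDB ON SERVER;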

The explanation of the parts of this query is documented in the Inside SQL Server 2008 and forthcoming Inside SQL Server 2012 books, and I've quoted it below from my DBCC Internals chapter:

This query brings the query processor and the DBCC CHECKDB code together to perform the fact-generation, fact-sorting, fact-storing, and fact-aggregation algorithms. The parts of the query are as follows:

  • @BlobEater This is a dummy variable with no purpose other than to consume any output from the CheckIndex function (there should never be any, but the syntax requires it).
  • CheckIndex (ROWSET_COLUMN_FACT_BLOB) This is the custom aggregation function inside DBCC CHECKDB that the query processor calls with sorted and grouped facts as part of the overall fact aggregation algorithm.
  • <memory address of fact rowset> This is the memory address of the OLEDB rowset that DBCC CHECKDB provides to the query processor. The query processor queries this rowset for rows (containing the generated facts) as part of the overall fact generation algorithm.
  • GROUP BY ROWSET_COLUMN_FACT_KEY This triggers the aggregation in the query processor.
  • >> WITH ORDER BY <column list> This is internal-only syntax that provides ordered aggregation to the aggregation step. As I explained earlier, the DBCC CHECKDB aggregation code is based on the assumption that the order of the aggregated stream of facts from the query processor is forced (that is, it requires that the sort order of the keys within each group is the order of the four keys in the query).
  • OPTION(ORDER GROUP) This is a Query Optimizer hint that forces stream aggregation. It forces the Query Optimizer to sort on the grouping columns and avoid hash aggregation.

And there you have it. There are quite a few pieces of query processor syntax that are only callable from inside the Engine itself, and several pieces that are only callable from DBCC, for instance to force an index rebuild to read the heap/clustered index rather than reading from the existing index.

Hope this is interesting!