This is a quickie in response to a blog comment from my previous post on instant initialization: How to tell if you have instant initialization enabled? The comment was:
I must say, I love instant initialization. It's awesome. But I always wondered why it's not available for the log file. I assume there's a technical reason… but what is it? Does it depend on having the rest of the file be zeroed out? Doesn't it already know where it's start and stop points are anyways, since the log is circular?
I couldn't remember the exact answer so I discussed with Peter Byrne on the Storage Engine dev team and now I have the answer to share. There is a lot of metadata kicking around in the Storage Engine about the transaction log (mostly in the boot page – see my post Search Engine Q&A #20: Boot pages, and boot page corruption), including where to start reading the log during crash recovery. However, there's nothing that can be used after a crash occurs to determine where the active transaction log ends (i.e. where should crash recovery stop processing log records).
The way this is done is to have each log sector have parity bits stamped on it. When the log is first created, it is zero-initialized (with zero being an illegal log sector parity value). As the log is written, each sector has parity bits in it. When the end of the log is reached, and it wraps around to the start of the log file, the parity bits are flipped, so that overwritten log sectors have the opposite parity from when they were last written. When a crash occurs, log sectors are read and processed until a log sector with an out-of-sequence parity is found.
This entire process will not work if there's already random data in the space used by the log file – some of the random data could just look like a valid set of parity bits and cause the recovery system to try to process a log sector full of garbage, leading to a suspect database, at best.
So – it's not just a "there wasn't time" – there really is a good, architectural reason why instant initialization cannot be done with the transaction log.
13 Responses to Search Engine Q&A #24: Why can’t the transaction log use instant initialization?
Thanks for the followup :)
Bummer though – log resizing can be expensive, even though I try to keep the logs large enough to hold more than enough data during the busiest of times on my server.
Josh.
Paul,
Thanks for the clarification.
I wonder if the log initialize process is as fast as possible. Since the entire log initialize process must completed before the database is made available for use, possible performance enhancements might include issuing very large (multi-page) log write requests of the pre-initialized pages – like a DB restore, using async overlapped I/O, etc. Perhaps these approaches are already being done.
These issues don’t happen often, but any improvement that doesn’t compromise integrity would be beneficial.
Scott R.
[...] initialization is a little-known feature of SQL Server 2005 onwards that allows data files (only, not log files) to skip the usual zero initialization process that takes place. It's a fabulous way to reduce [...]
[...] Why the transaction log can't be instant initialized [...]
[...] – What, Why and How? In Recovery… | Misconceptions around instant file initialization In Recovery… | Search Engine Q&A #24: Why can’t the transaction log use instant initiali… Tibor Karaszi : Do you have Instant File [...]
[...] file to the size it used to be, waiting for the zero initialization of the log file to occur (see Search Engine Q&A #24: Why can't the transaction log use instant initialization?), or if you don't realize this behavior has occurred, your log will grow itself and your [...]
[...] That took 30 seconds to run, performing the zero initialization of the log file (which must *always* occur – see Search Engine Q&A #24: Why can't the transaction log use instant initialization?). [...]
[...] Switch to SIMPLE when it fills up, shrink the log, then switch back to FULL. This is worse than 2, 3, or 5. If you shrink the log, then it's going to grow again – possibly causing VLF fragmentation (see Transaction Log VLFs – too many or too few?), and definitely causing your workload to pause while the log grows, as the log can't use instant initialization (see Search Engine Q&A #24: Why can't the transaction log use instant initialization?). [...]
[...] can read more about zero initialization of the log in my blog post Search Engine Q&A #24: Why can't the transaction log use instant initialization? and about how log clearing works in my TechNet Magazine article from February 2009 Understanding [...]
[...] to log files too. No it doesn't. Log files cannot be instant initializated – my blog post Search Engine Q&A #24: Why can't the transaction log use instant initialization? explains why [...]
Scott R.: NTFS does not normally allow an application to extend a file without either writing data to every cluster, or zeroing out the file contents. This is to prevent the application from being able to read back what had previously been written by another application. So, when you try to extend a file by writing at the new end point, Windows will synchronously write zeroes to every location in between the previous end point and the new end point, even if you asked for asynchronous writes. I presume, but haven’t tested, that any reads in the new area are blocked until it completes.
When the user running the application has the ‘perform volume maintenance tasks’ right, and the application calls SetFileValidData, instead Windows says, ‘this user is trusted not to abuse this’ and doesn’t zero the file at all. But that means the file contents, when read back, are in an unknown state, they have whatever was in those disk blocks before they were allocated to the file. SQL Server is relying on the contents to make sense.
It would be great if Windows could zero the file asynchronously. However, to be useful, it would also have to support reading from an unwritten chunk (just return zeroes) and writing to part of the file that hasn’t yet been zeroed (OK, but you then have to keep track so you don’t zero areas that have subsequently been written to). That means NTFS itself would have to be able to recover from a partially-zeroed, partially-written-to file.
NTFS does of course support sparse files, which are only allocated as you write to the file. SQL Server uses them for snapshots. However, in a sparse file, only the blocks that have been written are present on disk, and NTFS decides where to allocate them when you write, so they can end up badly fragmented.
One more clarification, in the case of SQL Server, when a new portion of a file needs to be zero initialized, it’s SQL Server that does it instead of NTFS as SQL Server can do it more efficiently.
[...] [...]