This is a quickie in response to a blog comment from my previous post on instant initialization: How to tell if you have instant initialization enabled? The comment was:
I must say, I love instant initialization. It’s awesome. But I always wondered why it’s not available for the log file. I assume there’s a technical reason… but what is it? Does it depend on having the rest of the file be zeroed out? Doesn’t it already know where its start and stop points are anyway, since the log is circular?
There is a lot of metadata kicking around in the Storage Engine about the transaction log (mostly in the boot page – see my post Search Engine Q&A #20: Boot pages, and boot page corruption), including where to start reading the log during crash recovery. However, there’s nothing persisted about where the end of the log is, so when a crash occurs SQL Server has to determine where the active transaction log ends (i.e. where should crash recovery stop processing log records).
The way this is done is to stamp parity bits on each log sector. When the log is first created, it is zero-initialized (zero being an illegal log-sector parity value). As the log is written, each 512-byte sector in a VLF has parity bits stamped on it (initially 64). When the end of the log file is reached and the log wraps around to the start of the file, the initial VLF is reused and the parity bits are flipped to 128, so that overwritten log sectors have the new parity bits (and for each successive reuse of a VLF the parity bits flip back and forth between 64 and 128). When a crash occurs, log sectors are read and processed until a sector is found that has the ‘old’ parity bit setting for that VLF.
This entire process will not work if there’s already random data in the space used by the log file – some of the random data could just look like a valid set of parity bits and cause the recovery system to try to process a log sector full of garbage, leading to a suspect database, at best.
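The scanning mechanism described above can be illustrated with a small simulation. This is not SQL Server’s actual code — just a toy model, assuming a VLF represented as a list of per-sector parity values (0 for never-written, zero-initialized space; 64 and 128 as the two alternating parity generations):

```python
# Toy model of end-of-log detection via sector parity bits.
# 0 = never-written (zero-initialized) space; 64/128 = the two
# alternating parity generations. Illustration only, not SQL Server code.

def find_end_of_log(sectors, current_parity):
    """Return the index of the first sector whose parity does not match
    the current generation - that is where crash recovery stops reading."""
    for i, parity in enumerate(sectors):
        if parity != current_parity:
            return i
    return len(sectors)

# A freshly zero-initialized VLF, partially written with parity 64;
# the zero-initialized tail cannot be mistaken for valid log:
vlf = [64, 64, 64, 0, 0, 0]
assert find_end_of_log(vlf, 64) == 3

# After the log wraps, the VLF is overwritten with parity 128; a crash
# mid-overwrite leaves 'old' 64-parity sectors at the tail:
vlf = [128, 128, 64, 64, 64, 64]
assert find_end_of_log(vlf, 128) == 2

# With instant initialization, the space would instead contain random
# garbage - a garbage byte that happened to equal 64 or 128 would make
# the scan read a sector of junk as if it were valid log.
```

The second case shows why the alternating generations are needed: without them, a stale-but-valid-looking sector from the previous pass through the VLF could not be distinguished from the current one.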
All new portions of the transaction log must be zero-initialized, but only once, when the space is first added to the log file.
9 thoughts on “Search Engine Q&A #24: Why can’t the transaction log use instant initialization?”
Thanks for the followup :)
Bummer though – log resizing can be expensive, even though I try to keep the logs large enough to hold more than enough data during the busiest of times on my server.
Thanks for the clarification.
I wonder if the log initialization process is as fast as possible. Since the entire log initialization process must complete before the database is made available for use, possible performance enhancements might include issuing very large (multi-page) write requests of the pre-initialized pages – like a database restore – using async overlapped I/O, etc. Perhaps these approaches are already being used.
These issues don’t happen often, but any improvement that doesn’t compromise integrity would be beneficial.
Scott R.: NTFS does not normally allow an application to extend a file without either writing data to every cluster, or zeroing out the file contents. This is to prevent the application from being able to read back what had previously been written by another application. So, when you try to extend a file by writing at the new end point, Windows will synchronously write zeroes to every location in between the previous end point and the new end point, even if you asked for asynchronous writes. I presume, but haven’t tested, that any reads in the new area are blocked until it completes.
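The zero-fill guarantee described above is easy to observe from any program: both NTFS and POSIX filesystems promise that the gap between the old end of a file and a write past it reads back as zero bytes, never as stale disk contents. A minimal demonstration (the filesystem may satisfy this with a sparse hole rather than physical writes, but reads still return zeroes either way):

```python
# Extending a file past its end exposes zeroes, not whatever data
# previously occupied those disk blocks.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as f:
    f.write(b"AB")        # file is now 2 bytes long
    f.seek(1024 * 1024)   # seek far past the end of the file
    f.write(b"Z")         # this write forces the file to grow

with open(path, "rb") as f:
    data = f.read()

assert data[:2] == b"AB"                              # original contents intact
assert data[2:-1] == b"\x00" * (1024 * 1024 - 2)      # the gap reads as zeroes
assert data[-1:] == b"Z"                              # the new write is there
os.remove(path)
```

Whether the zeroes cost synchronous physical writes (as Scott describes for non-sparse NTFS files) or are provided lazily from a hole, the security property is the same: an application can never read back another application’s old data.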
When the user running the application has the ‘perform volume maintenance tasks’ right, and the application calls SetFileValidData, instead Windows says, ‘this user is trusted not to abuse this’ and doesn’t zero the file at all. But that means the file contents, when read back, are in an unknown state, they have whatever was in those disk blocks before they were allocated to the file. SQL Server is relying on the contents to make sense.
It would be great if Windows could zero the file asynchronously. However, to be useful, it would also have to support reading from an unwritten chunk (just return zeroes) and writing to part of the file that hasn’t yet been zeroed (OK, but you then have to keep track so you don’t zero areas that have subsequently been written to). That means NTFS itself would have to be able to recover from a partially-zeroed, partially-written-to file.
NTFS does of course support sparse files, which are only allocated as you write to the file. SQL Server uses them for snapshots. However, in a sparse file, only the blocks that have been written are present on disk, and NTFS decides where to allocate them when you write, so they can end up badly fragmented.
One more clarification: in the case of SQL Server, when a new portion of a file needs to be zero-initialized, it’s SQL Server that does it rather than NTFS, as SQL Server can do it more efficiently.
Stupid question, but for log files, does the file get instant-initialized and then zeroed out manually as well, or does SQL Server specifically not use the instant file initialization API at all for log files?
There is no concept of instant-initialized and then zeroed – log files just get zeroed.
Hi Paul, when you say the parity bits are flipped, do you mean that every time a sector is overwritten it gets parity zero? Also, how exactly does out-of-sequence parity work to stop crash recovery at that point? Do you mean that out-of-sequence parity is simply finding a zero parity following a parity greater than zero?
I haven’t blogged about how all that mechanism works, but I do cover it in-depth in my Pluralsight course on Logging and Recovery.
Paul, when we shrink any data file, each GAM page is checked to find the max allocated extent