(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)
And it’s an auspicious day to be posting, as SQLskills/SYSolutions turns 24 today on 7/24/19!
I had an email from someone last week asking why the transaction log has to be zero-initialized and a I realized I’ve never written about this before, so here goes. (Note: I do explain and demonstrate this in my Pluralsight course on SQL Server: Logging, Recovery, and the Transaction Log.)
It’s all to do with crash recovery. SQL Server knows where crash recovery has to start for a database, but not where it ends – i.e. SQL Server does not persist the ‘most recent LSN’ for a database anywhere. This means it has to work out where the end of the log is (and by end, I mean the most recent log record persisted on disk, not the physical end of the log file).
Some background before the explanation:
- The log is split up internally into chunks called virtual log files, or more commonly just VLFs.
- The first time a VLF is activated and used, all used sections of it are stamped with parity bits 64 (the mechanism for this is not important)
- Eventually the VLF will be marked inactive, and eventually reused. The second time a VLF is activated, all used sections of it are stamped with parity bits 128
- And then 64 again
- And then 128 again
- Ad infinitum…
Why 64 and 128 as the alternating parity bits you may ask? Why not? is my response. I can’t think of a reason to use a different bit pattern pair.
Back to the question at hand…
The most common case when crash recovery happens is that the log has wrapped around a few times and so the various VLFs have been activated and deactivated a few times. Crash recovery goes to where it must start: either the most recent checkpoint, or the start of the oldest active transaction at the time the most recent checkpoint happened. It follows the sequence of active VLFs until it comes to a point where a section of a VLF has the wrong parity bits. This means a VLF is active and says all valid sections should have parity bits X, and crash recovery finds an old section of the VLF from its previous use that has parity bits Y. That’s the end of the log. (Yes, when a VLF is reactivated, it is not zeroed out, because the overwriting of the old sections with new sections with new parity bits works instead.)
The much rarer case is when the log *hasn’t* yet wrapped around and not all the VLFs in the log have been used. In that case, crash recovery proceeds from the start until it find a section of an active VLF that is full of zeroes. And that’s the end of the log in that case.
New physical portions of the log file have to zero-initialized as the previous bits and bytes on disk might just happen to look like a section of a VLF with the ‘correct’ parity bits, causing crash recovery to try to use it and most likely crash SQL Server. It’s highly improbable, but there’s a very small possibility.
Hence the log cannot use instant file initialization.
PS: note that in SQL Server 2016, the ‘zeroing’ pattern changed from 0x00 (hexadecimal zero) to 0xc0, for reasons unrelated to what we’re discussing here.
13 thoughts on “The Curious Case of… why the transaction log has to be zero-initialized”
Paul, thanks for the clear and concise description. I warn about the absence of zero initialization any time I talk about the transaction log. Now I’m better equipped to answer why.
And congratulations again on the anniversary. The work you and Kimberly do at SQL Skills has helped fuel my passion for SQL Server. It is good to be passionate about what you do for a living.
You’re welcome, and thanks!
Very interesting, thanks for the post! How is using a parity bit safe enough to keep SQL Server from using an incorrect portion by mistake and crashing, though — can’t the parity bit have the correct value accidentally?
I’ve never known it to happen, even with disk corruption. There’s obviously a small potential for it, but it’s very small. And there are other checks that happen under the covers to make sure portions of the log aren’t corrupt.
Ah, I see, that makes sense. Thanks!
Have they started using 0xc0 because that pattern has a smaller chance of occurring by accident compared to all zeros?
No – they’re just trying to fool the I/O vendors who look for writes of 0x00. Of course, those vendors aren’t going to notice the change at all, are they? :-)
I know it might not be 100% appropriate place to post, but I have been struggling to find the answer to this and thought you might know.
In looking at Transaction Log internals, I am unsure exactly what goes into the log. For example if I update an int field from the value 1 to value 2 for 1000 rows what is specifically recorded.
This article (https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-guides/jj835093(v=sql.110)?redirectedfrom=MSDN) states: Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data.
Does it make a decision on this?
Yes – it will have the before and after values for each row, with one log record per row updated (for the example you describe). Quite a few examples of looking at the transaction log internals in my Pluralsight course: https://app.pluralsight.com/library/courses/sqlserver-logging/table-of-contents
We are trying to investigate issues with our long recovery times after failovers. The error logs suggest that most of the time is spent on the redo phase, but we do not have any long running transactions. However, the db does have 700 vlfs, most of them are very small. We understand this wouldn’t affect the redo phase, however, we still wanted to rule it out. In cleaning it up, we notice our parity bits looked a bit off. We have multiple instances of a stream of 128 bits or 64 bits, followed broken by a single or a stream of the opposing bit (example below) . If there are multiple instances of this, how can sql server determine the correct vlf to start analysis from?
The starting point for crash recovery is based off the logged contents of the most recent checkpoint, the LSN of which is stored in the boot page always. And it’s totally fine and expected to have parity bit patterns like that.
Thanks for the reply, a follow up question. The last two vlf’s have a zero parity bit on our problem db. From what I understand, that means the log hasn’t cycled over. However, we generate quite a bit of log daily and we haven’t failed over or restarted the server in months. We would expect the log to cycle through the vlf’s daily. Is there anything else that might cause the parity bit to remain zero, or the vlf to not be used? And would that mean “crash recovery proceeds from the start” and the start would be from the last LSN stored in the bootpage?
Send me the output from DBCC LOGINFO for that database in email please. And see https://www.sqlskills.com/blogs/paul/how-do-checkpoints-work-and-what-gets-logged/