Inside the Storage Engine: More on the circular nature of the log

 

Every so often (well, ok, very frequently) my internal list of ‘I must blog about that’ gets too big and I have an almost uncontrollable urge. Luckily, these urges end up with me blogging several times, otherwise who knows what kind of trouble I’d get into? :-)

First up today is some pretty deep internals about how the transaction log works in a specific situation. This has come up a few times in the last few internals/maintenance classes I’ve taught in the transaction log module so I wanted to get a blog post out there that proves my answer is correct. The question is easier to frame with a picture, borrowed from my TechNet Magazine article from February (see here).

The image shows the circular nature of the transaction log, and how VLFs can be overwritten once the log has wrapped around, as long as they are no longer active (red = active log/VLF, blue = inactive log/VLF). If none of this makes sense to you, go read the TechNet Magazine article and it should become clear.

The question is: what happens if log records continue being written to VLFs 1 and 2 in the picture, and eventually the end of VLF 2 has been reached, but VLF 3 is still active, and can’t be overwritten? Does the database just stop?

Let’s find out.

I’m going to create a database and then engineer the situation above.

CREATE DATABASE LogTestDB ON PRIMARY (
NAME = LogTestDB_data,
FILENAME = N'C:\SQLskills\LogTestDB_data.mdf')
LOG ON (
NAME = 'LogTestDB_log',
FILENAME = N'C:\SQLskills\LogTestDB_log.ldf',
SIZE = 5MB);
GO

 

The database is in the SIMPLE recovery model. Well, to be completely honest, it’s in the FULL recovery model, but there hasn’t been a full database backup yet, so as far as transaction log behavior is concerned, it behaves as if it was in SIMPLE (the log can be cleared by a CHECKPOINT) – again, if this makes no sense, go read the TechNet Magazine article. 
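If you want to confirm that a database is still in this pseudo-SIMPLE state, a quick check (just a sketch, not part of the demo script) is to compare the recovery model in sys.databases with the backup history in msdb. If the recovery model is FULL but there is no full database backup recorded, checkpoints can still clear the log:

-- Recovery model, current log reuse wait, and the most recent full backup (if any)
SELECT
    d.name,
    d.recovery_model_desc,
    d.log_reuse_wait_desc,
    [LastFullBackup] = (
        SELECT MAX (b.backup_finish_date)
        FROM msdb.dbo.backupset AS b
        WHERE b.database_name = d.name
        AND b.type = 'D')   -- 'D' = full database backup
FROM sys.databases AS d
WHERE d.name = N'LogTestDB';
GO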

Now let's see how many VLFs it has using the DBCC LOGINFO command (and this is the *only* way to see how many VLFs there are):

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       0        0        0      0
2        1245184    2498560       0        0        0      0
2        1499136    3743744       0        0        0      0

We’ve got 4 VLFs. The Status column tells us whether the VLF is active or not. A status of 2 means the VLF is active, 0 means it’s not (Edit 8/30/18: a status of 4 is possible on an AG secondary where a VLF exists on the primary but the log file growth hasn’t been replayed on the secondary yet). The sequence number (FSeqNo) is the logical order of the VLFs within the log. The FileSize is in bytes (so each VLF is about 1.25 MB). Right now there’s only one active VLF.
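If you'd rather query the output than eyeball it, here's a quick sketch (not part of the original demo) that captures DBCC LOGINFO into a temp table and counts the active VLFs. The column list matches the output shown in this post; note that SQL Server 2012 and later add a RecoveryUnitId column at the front of the output, so you'd need to add that column to the table.

CREATE TABLE #VLFs (
    FileId      INT,
    FileSize    BIGINT,
    StartOffset BIGINT,
    FSeqNo      INT,
    Status      INT,
    Parity      INT,
    CreateLSN   NUMERIC (25, 0));

INSERT INTO #VLFs
EXEC ('DBCC LOGINFO (''LogTestDB'')');

-- Status = 2 means active, 0 means inactive
SELECT
    COUNT (*) AS [TotalVLFs],
    SUM (CASE WHEN Status = 2 THEN 1 ELSE 0 END) AS [ActiveVLFs]
FROM #VLFs;

DROP TABLE #VLFs;
GO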

Now I’ll engineer the situation in the image above. I’m going to fill the log so that VLFs 1, 2, and 3 are active. Then I’m going to start an explicit transaction that will hold VLF 3 and onwards active. Then I’ll continue filling the log so it wraps around and starts to fill up VLF 1 again.

USE LogTestDB;
GO
CREATE TABLE BigRows (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO

SET NOCOUNT ON;
INSERT INTO BigRows DEFAULT VALUES;
GO 300

I’ve filled up VLFs 1 and 2, and started on 3. Let’s check with DBCC LOGINFO:

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       32       2        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

As you can see, the first 3 VLFs now have a status of 2 (active), and they’re in sequence. Now I’ll create an explicit transaction that will prevent VLF 3 and onwards being cleared.

BEGIN TRAN
INSERT INTO BigRows DEFAULT VALUES;
GO

If I explicitly do a CHECKPOINT now, VLFs 1 and 2 will clear:

CHECKPOINT
GO

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          31       0        64     0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

Now I’ll carry on filling up the BigRows table so the log wraps around and starts filling up VLFs 1 and 2 again.

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          35       2        128    0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0

You can see that the log has wrapped around now, but VLFs 3 and 4 are still active. Look at the sequence numbers of the active VLFs… the active log is VLF 3 then 4 then 1, with sequence numbers 33 then 34 then 35. Now if I carry on filling up the table, what’s going to happen when the log bumps up against VLF 3 that is still active? Will it stop or will it grow?

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ------------------
2        1245184    8192          35       2        128    0
2        1245184    1253376       36       2        128    0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0
2        253952     5242880       37       2        64     36000000049300052
2        270336     5496832       38       2        64     36000000049300052
2        253952     5767168       0        0        0      36000000107500066
2        335872     6021120       0        0        0      36000000107500066
2        253952     6356992       0        0        0      36000000190700020
2        401408     6610944       0        0        0      36000000190700020
2        253952     7012352       0        0        0      37000000037300040
2        466944     7266304       0        0        0      37000000037300040

The answer is that it grew, and kind of skipped the active VLFs! Look at the sequence numbers. The new sequence of the active log is VLF 3 then 4 then 1 then 2 then 5 then 6, as you can see from the sequence numbers. Once the active transaction I created commits or rolls back, VLFs 3, 4, 1, and 2 can clear and then the ‘normal’ sequencing of VLFs in the log will resume.
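You can confirm the physical growth (a quick extra check, not in the original script) by looking at the size of the log file, which started out at 5MB:

USE LogTestDB;
GO
-- size is in 8KB pages, so convert to MB
SELECT [name], type_desc, size * 8 / 1024 AS [SizeMB]
FROM sys.database_files
WHERE type_desc = N'LOG';
GO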

Every VLF has a small header which contains the sequence number of the VLF within the transaction log, so the log can kind of do contortions to work around active VLFs in the middle of the log. Very cool.
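In fact, you can see the VLF sequence number at work in every LSN: the part of an LSN before the first colon is the sequence number (in hex) of the VLF containing that log record. A quick peek with the undocumented fn_dblog function, purely for illustration:

USE LogTestDB;
GO
-- e.g. an LSN starting 00000023: is in the VLF with sequence number 0x23 = 35
SELECT TOP (5) [Current LSN], [Operation]
FROM fn_dblog (NULL, NULL);
GO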

Ok – that was fun – now I feel better!

PS In the last couple of DBCC LOGINFO dumps, where the log has wrapped around, you can see that the parity bits for the log blocks in the VLFs have changed, as I explained in a previous post, Search Engine Q&A #24: Why can’t the transaction log use instant initialization?

26 thoughts on “Inside the Storage Engine: More on the circular nature of the log”

  1. Thanks for this post Paul… I now have a better understanding of VLFs and how the circular nature of the log works!

  2. Hi Paul. I’ve read this post and your article in TechNet Magazine and I have one question: I’ve read about write-ahead logging in Books Online and it says that all log records are written to a log cache before going to disk in the log files. When are the records stored in the log cache written to the log files? Is it at checkpoints, just before the data? If I have a crash, will the records in the log cache be lost if they haven’t been written to disk yet?

    Regards

    1. Andre – log records are written out:
      1) whenever a transaction commits or aborts
      2) whenever a data page is written to disk, all log records up to and including the one that last changed the page being written must be written to disk first
      3) whenever a log block hits 60K in size

      Log blocks are copied to the log cache and then async flushed to disk. This is what Books Online means.

      Data pages cannot be written to disk until the log that affects them is on disk first. Transactions cannot be acknowledged as committed or rolled back unless all their log records are on disk first.

      Make sense?
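      You can see the first of those conditions for yourself with a rough sketch like this (not from the original thread), watching the physical writes to the log file (file_id 2 in the demo database) with sys.dm_io_virtual_file_stats; the second snapshot should show at least one more write, forced by the COMMIT:

      -- Snapshot the log file write counters, commit a small transaction, snapshot again
      SELECT num_of_writes, num_of_bytes_written
      FROM sys.dm_io_virtual_file_stats (DB_ID (N'LogTestDB'), 2);

      BEGIN TRAN;
      INSERT INTO LogTestDB.dbo.BigRows DEFAULT VALUES;
      COMMIT TRAN;

      SELECT num_of_writes, num_of_bytes_written
      FROM sys.dm_io_virtual_file_stats (DB_ID (N'LogTestDB'), 2);
      GO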

  3. Paul – Are the log records that are truncated usable in any way? I have a situation where I need to send a database to a third party to troubleshoot a program. I’ve done the following –

    1) Updated sensitive data so it would not be meaningful
    2) Ran a full database backup
    3) Ran a transaction log backup (just so I could truncate and shrink the log file)
    4) Shrunk the log file just to make it smaller
    5) Ran another full DB backup (which contains the smaller log file) which I had planned to send off

    If I understand correctly it’s possible at least some of the log records from the updates are still in the log file. It’s just not clear to me whether they could be used to restore the sensitive data.

    1. Yes, they could be, but it’s very hard to do. However, what you’ve done has likely removed the pertinent log records from the full backup you’re going to send (step 5). You’d need the full + log backups from steps 2 and 3. Getting data from those log records requires someone or some program that can understand the log records themselves.

  4. Hi Paul, although this is an old post, people do occasionally still seem to be posing questions. And as I’m currently working my way through your ‘Logging, Recovery, and the Transaction Log’ PluralSight course I thought I’d pose another one as I’m having difficulty understanding the ‘flipping’ of parity bits when log sectors are over-written and how this aids crash recovery.

    I’ve read your earlier post, ‘Why can’t the transaction log use instant initialization?’, which says ‘When the end of the log is reached, and it wraps around to the start of the log file, the parity bits are flipped, so that overwritten log sectors have the opposite parity from when they were last written. When a crash occurs, log sectors are read and processed until a log sector with an out-of-sequence parity is found.’ However, I’m having a problem understanding why a log sector that is currently part of the log chain doesn’t also represent an out-of-sequence parity. Let me try to expand, in part using one of your examples from the course and this and other posts.

    A database log file of 5 MB has 4 approximately equal VLFs. At this point VLF 1 is the active VLF and has a parity of 64.

    I then fill VLFs 1 and 2, begin a transaction which will hold VLF 3 and onwards active and then execute a checkpoint. At this point VLF 3 is the active VLF and has a parity of 64. VLFs 1 and 2 still have a parity of 64 but their status is now 0.

    I continue to fill VLF 3, then VLF 4, and wrap around to fill what now becomes VLF 5. At this point VLF 5 has been re-activated (status 2) and now has a parity of 128 and is the end of the log chain. VLF 2 still has a parity of 64 but is inactive (status zero).

    If there is a failure at this point, crash recovery will follow the log chain round from the transaction begun in VLF 3, through VLF 4 and back round into VLF 5. If I’ve understood things correctly, when it gets to VLF 2 it encounters an out-of-sequence parity, since VLF 5 has parity 128 and VLF 2 has parity 64, and stops at that point. However here, finally, is my problem. Why does the boundary between VLF 4 and VLF 5 not represent an out-of-sequence parity as well, since VLF 4 has parity 64 and VLF 5 parity 128? Why does crash recovery not stop at this point?

    I’m sure the answer is a very simple one however I’m obviously missing some vital clue and just can’t see it at the moment.

    Best Regards,

    Gordon Feeney

    1. Hey Gordon – it’s because the out-of-sequence isn’t based on the current parity bit setting for the VLF (i.e. the 64 -> 128 boundary between 4 and 5). It’s based on finding a log block within a VLF that has the wrong parity bits given the current parity bits setting for that VLF. It’s an intra-VLF thing, not an inter-VLF thing. Make sense?

  5. Hi Paul. With regard to my previous post regarding parity bits I think I might have seen the light. Parity bits are set for each block within a VLF of course, rather than the VLF as a whole, so during crash recovery the first block within a VLF to have a different parity to those blocks already processed represents ‘out-of-sequence’ parity and so crash recovery stops. Does that sound correct?

    In my previous example VLF 2 would have been ignored anyway as it wasn’t part of the log sequence.

    Gordon.

      1. Thanks to Gordon for his comments, and Paul for your replies. I’m still not 100% clear on this though. When recovery looks at the first block in a VLF, how does it know what parity to expect? Paul, you earlier wrote “the current parity bits setting for that VLF”. If there is a VLF-wide setting, why does recovery need to find a 512-byte block? Wouldn’t the first single byte having the wrong parity bit be sufficient to detect the end of the log? (Or are we saying there is always 1 parity bit per 512 bytes? Under what scenarios will the system detect the ‘wrong’ parity bit for a block of fewer than 512 bytes?) On the other hand, Paul, you wrote that it is “correct” what Gordon said with “during crash recovery the first block within a VLF to have a different parity to those blocks already processed represents ‘out-of-sequence’” – but in this case, how does recovery determine whether the first block in a VLF has the correct parity? There are no prior blocks it can be compared with. I am going to guess there’s an intra-VLF parity setting as described, and so muse on my first questions around “why 512 byte block?” Thanks again. Also arrived here via the same course Gordon watched.

          1. Thanks Paul,

            Still a few questions around this. If the header specifies the parity bit, why does the system need to search for 512 continuous bytes? Or is it the case there is 1 parity bit per block and the minimum block size is 512? Or does each 512 byte portion within a block contain one parity bit? In other words, is it the same to say “it detects a (single) parity bit that is the opposite of what’s expected” (ie. regardless of the fact the minimum block size is a 512 byte portion)?

            Also, in the same course, Module 10, initial log backup, the example shows 3 VLFs that have been used, then 2 that are being held active. The first three are labelled “inactive” but earlier in the course it was said that the only thing that can mark a VLF as inactive is log clearing, which occurs only as a result of a checkpoint (simple recovery model) or log backup clearing (bulk/full) “no exceptions”. How then are the first 3 VLFs “marked inactive” before the first log backup?

            Lastly – about that 60kb limit on log blocks – I wonder if sometime in the deep dark past (pre version 1, when Sybase designed logging?) a VLF was fixed at 128kb with an 8kb header and a requirement for there to be at least 2 log blocks per VLF, resulting in the 60kb maximum? Can only speculate, but the 2x60k + 8k header numbers sit nicely.

            Cheers.

  6. What would the consequences be if a database was set to full backup, and had multiple agents running backups? I stumbled across this situation, i.e. a SQL maintenance plan was established that performed a nightly full @ midnight, with incremental backups on the hour. Then a second service (using a 3rd party agent tool like Barracuda) was also backing up to a cloud service using a nightly full (sometime in the wee morning hours) with incrementals running every 4 hrs. I would assume trying to restore would pose some issues since the log sequences would be out of sync. I would not think a recovery would be very reliable?

    1. Full backups have no effect on log backups or the log backup chain. Any one of the full backups taken in your scenario can be used as the basis for a restore sequence.

      Btw – there’s no such thing as an incremental data backup in SQL Server. Maybe you mean a differential backup? They’re not incremental. Successive differential backups get larger and larger until the next full backup. If you mean a log backup, they’re not usually called incremental backups.
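      To illustrate, any one of the fulls plus the unbroken chain of log backups taken after it forms a valid restore sequence; something like this sketch (file names are made up):

      RESTORE DATABASE MyDB FROM DISK = N'C:\Backups\MyDB_Full.bak' WITH NORECOVERY;
      RESTORE LOG MyDB FROM DISK = N'C:\Backups\MyDB_Log1.trn' WITH NORECOVERY;
      RESTORE LOG MyDB FROM DISK = N'C:\Backups\MyDB_Log2.trn' WITH NORECOVERY;
      RESTORE DATABASE MyDB WITH RECOVERY;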

      1. Thanks Paul. Yes, of course – full backup nightly with log backups running hourly, sorry; slip of the fingers. The part I’m confused with is if I have 2 sets/schedules of backups running full nightly (within minutes of each other for the sake of argument) with some version of log backup running (hourly or every 4 hours or every 23 minutes) – full backups don’t affect the logs but log backups do. So ignoring the full backups in this scenario: if set A runs log backups every hour and set B runs a log backup every 4 hours, is the sequence complete for either set? In other words, if I only had access to one set, would a restore be complete (or even successful)? Using 1 set, wouldn’t applying the transaction log backups be ‘out of sequence’?

  7. Hi Paul,
    Was wondering if you could please suggest a way to resolve this issue I just am into:
    I posted on Stack Overflow but didn’t see any solutions.

    The server had SQL Server 2014 Datacenter RTM on Windows 2012 Standard Edition for about 2 months, and we had migrated the database from SQL 2008 R2.

    The database had transactional replication and was being replicated to another server. The Distributor and Subscriber are on their own servers.

    Tried applying the Service Pack 1 on Sunday morning which brought the entire system to a halt and asked for Windows-Activate Code Key.

    These were in error logs earlier before SP 1 was applied:
    - Log was full due to an active transaction, as it had gone into recovery due to an ALTER (recovery took very long)
    - Tried log backups but a checkpoint was not happening
    - Backup terminated, so I could not do a backup

    The Service Pack was being applied due to the following issues:

    1) Checkpoint didn’t occur

    2) Transaction log was full due to Active transaction

    3) SQL Server takes long time to open the databases after the recovery phase when the number of databases or database files or both is large in SQL Server 2012 or SQL Server 2014

    4) You cannot set a SQL Server 2012 or 2014 database to single-user mode if the database is read-only

    5) SQL Server crashes when the log file of tempdb database is full in SQL Server 2012 or SQL Server 2014. Here the log file for the AHS_DataStore was full due to “Active transaction” – SQL Server 2014 Service Pack 1

    6) Rollback causes database to go to suspect mode in SQL Server 2012 or SQL Server 2014 – the rollback caused it to go into recovery; it was an ALTER on a table of IDs (max of INT) – should have found a better way.

    Finally, had to reinstall the OS yesterday, SQL 2014 again, applied Service pack 1, will be adding the Cumulative updates till 3 for SP1 today.

    Is there any way for me to establish all the replication settings as it was before. I do have all the baks of the System DBs from the previous install.

    OR how do I go back to just attaching the MDF file without the log, as it was really huge? The backup happened 1 week back, on 11/3/2015, and I have all the log backups from then.

    Would appreciate any suggestions or workarounds that would bring the DB back up soon.

    Thanks

    1. Replication: if you restore all the databases and logs and bring them and the system databases to the same point in time, replication should be there as it was before.

      Your second question seems confused. You can either attach the database without the log or you can restore from backups – you can’t attach without the log and then make use of log backups.

  8. @Chris – 1) After the VLF header, the VLF has 512-byte blocks, each of which has a byte that holds the parity bits. 2) it should really read ‘no active transactions’. I made it a lot clearer in the more recent course on backups. 3) No – there’s a calculation that’s done to come up with the size, but the explanation of the calculation has been lost AFAIK.

  9. I am experiencing a strange issue. We have a 4-node Availability Group with 2 nodes in asynchronous mode, and one of those is a read-only node.

    My log files are sized at 71 GB and the space used is 3 GB. I am taking regular log backups every 15 minutes. The application just went live and so far I have noticed the log space used is just increasing.

    My point is, aren’t the log backups supposed to clear the log file and make it available for reuse?
    There is no open transaction, the log_reuse_wait_desc shows as ‘NOTHING’, but DBCC LOGINFO is showing a couple of VLFs with a status of 2. Even if I take a log backup it still stays at 2.
    The low_water_mark_for_ghosts is increasing also. The VLFs have a space of 4 GB. Will the VLF status change so the space can be reused when the log fills up? I have restarted the SQL service on the readable secondary just to see if there is any process that is holding the log from being reused, but to no avail.

    Is there any workaround? Please advise, as I don’t want to have to shrink the log files.

  10. Hi,
    I was wondering why StartOffset and FSeqNo do not have the same order. This disorder seems to appear when autogrow has occurred. Do you have any information about that?
    Is it possible to realign StartOffset and FSeqNo to make the log file more sequential?
    Regards

    1. They’re completely orthogonal and VLF sequence numbers don’t have to follow the physical order of VLFs, which is why the sequence number exists in the first place.
