Lazy log truncation or why VLFs might stay at status 2 after log clearing

 Earlier this year I was sent an interesting question about why the person was seeing lots of VLFs in the log with status = 2 (which means ‘active’) after clearing (also known as ‘truncating’) the log and log_reuse_wait_desc showed NOTHING.

I did some digging around and all I could find was an old blog post from 2013 that shows the behavior and mentions that this happens with mirroring and Availability Groups. I hadn’t heard of this behavior before but I guessed at the reason, and confirmed with the SQL Server team.

When an AG secondary is going to be added, at that point in time, the maximum LSN (the Log Sequence Number of the most recent log record) present in the restored copy of the database that will be the secondary must be part of the ‘active’ log on the AG primary (i.e. that LSN must be in a VLF on the primary that has status = 2). If that’s not the case, you need to restore another log backup on what will be the new secondary, and try the AG joining process again. Repeat until it works. You can see how for a very busy system, generating lots of log records and with frequent log backups (which clear the log on the primary), catching up the secondary enough to allow it to join the AG might be difficult, or necessitate temporarily stopping log backups on the primary (possibly opening up a window for increased data loss in a disaster).

To make this whole process easier, when a database is an AG primary, when log clearing occurs, the VLFs don’t go to status = 0; they remain ‘active’ with status = 2. So how does this help? Well, the fact that lots more VLFs are ‘active’ on the AG primary means that it’s more likely that the maximum LSN of a new secondary is still part of the ‘active’ log on the primary, and the AG-joining succeeds without having to repeat the restore-retry-the-join over and over.

(Note: the log manager knows that these VLFs are really ‘fake active’ and can reuse them as if they were ‘inactive’ if the log wraps around (see this post for an explanation) so there’s no interruption to regular operations on the AG primary.)

It’s a clever little mechanism that someone thought of to make a DBA’s life a bit easier and AGs less problematic to set up.

And now you know – log clearing won’t *always* set VLF statuses to zero.

10 thoughts on “Lazy log truncation or why VLFs might stay at status 2 after log clearing

    1. You’d need to look at all open transaction begin LSNs, oldest unreplicated txn begin LSN, last log backup LSN, and so on. Not practical and no point as it has no performance impact whatsoever.

  1. Paul,

    What is the impact of all that on filestream? My understanding is this will impact deletion of files, as filestream garbage collector will not remove deleted files unless VLF with that operation is marked as inactive. Not clear how to address that…

    Thank you,

  2. Hi Paul,

    While I appreciate the the reasons behind this technique, I’m in a position where all my VLFs are in status 2. I know most of them are “fake active” because the active VLF wraps around as you’d expect, and the log neither grows or complains about being full. However the issue is with all VLFs remaining as 2, I can never shrink the transaction log! Is there any known solution to this predicament?

    Thanks,
    Chris.

    1. Jonathan said this:

      “As long as the following conditions are met, the log will shrink:

      All secondary replicas are synchronized
      The log has been backed up
      The replication log reader has processed the log records for CDC/replication if required
      All Secondary replicas have processed redo – if a secondary is blocked during redo by a SCH_M lock being blocked by a SCH_S due to readonly workload it can’t clear the log records till processed.

      Usually in an AG I do a shrink file to remove the inactive portion at the end, take a log backup to clear anything and get it to wrap around, then another shrink and it works just fine. Sometimes I have to do an extra log backup followed by a shrink to get it to fully reduce. I don’t ever use DBCC LOGINFO for that type of operation.”

      1. This has proved invaluable. My general experience is that there is so much that goes on under the hood and which DBAs have to solve – and yet so often have to resort to discussion forums, blogs and support sites to figure it out. That is a real shame.

        Regardless – to add: on SQL Server 2016 having an AG, performing a shrink operation on the log file (where >95% of VLFs showed status 2) led to almost all VLFs being set to zero – which then propagated through to the other replicas’ log files too.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.