In my previous posts on FILESTREAM I discussed the directory structure of the FILESTREAM data container and how to map the directories to database tables and columns. In this post I’m going to explain how and when the FILESTREAM garbage collection process works as that doesn’t seem to be documented anywhere (even in the FILESTREAM whitepaper I wrote for MS – it wasn’t supposed to be that low-level). There seems to be a lot of confusion about how updates of FILESTREAM data work, and when the old versions of the FILESTREAM files are removed. I’m going to explain how it all works and then show you by example.
The basic behavior that is non-intuitive is that there’s no such thing as a partial update of FILESTREAM data. If you have 10MB of data stored in a FILESTREAM column (and hence have a 10MB FILESTREAM file), then updating even a single byte of it will result in a whole new 10MB FILESTREAM file. Anything that relies on having an up-to-date version of the database (e.g. log backups, log-shipping, replication) will pick up the entire new 10MB FILESTREAM file. Every time an update is made to that data, a new 10MB FILESTREAM file is created and then subsequently backed-up, replicated, etc. This can lead to unexpectedly large log backups, or network traffic between replication nodes.
Once you realize that new versions of the FILESTREAM files are going to be created, the obvious follow-on question is: when do the old versions get removed? The answer is: it depends!
The old versions are removed by a process called garbage collection – in much the same way that memory garbage collection runs for managed code and deallocates object instantiations that are no longer referenced by any variables. The key point is that nothing needs the object instantiation any more; otherwise the memory garbage collection would be corrupting the run-time memory of the managed code application. The same principle applies for FILESTREAM garbage collection – the old versions of the FILESTREAM files cannot be removed until they are no longer needed.
But what does ‘no longer needed’ mean for FILESTREAM files? Well, it’s kind of the same as for transaction log records. An old version of a FILESTREAM file is no longer needed if the transaction that created it has committed or rolled back, AND there are no other technologies that must read it, like a log backup (when running in the FULL or BULK_LOGGED recovery models), or the transactional replication log reader. In fact, the transaction log VLF containing the log record of the creation of the FILESTREAM data file must be switched to inactive before the FILESTREAM file can be garbage collected. Note that I don’t mention database mirroring – in SQL 2008 database mirroring and FILESTREAM cannot be used together.
Once the old FILESTREAM file is no longer needed, it is available for garbage collection. How does the garbage collection process know which FILESTREAM files to physically delete? The answer is that when the file is no longer needed, an entry is made in a special table called a ‘tombstone’ table. The garbage collection process scans the tombstone tables and removes only the FILESTREAM files with an entry in the tombstone table. You can read more about the tombstone tables in this blog post from the CSS blog.
So when does the garbage collection process actually run? It can’t be part of log backups, because in the SIMPLE recovery model, you can’t take log backups. The answer is that it runs as part of the database checkpoint process. This is what causes some confusion – an old FILESTREAM file will not be removed until after it is no longer needed AND a checkpoint runs.
Now let’s see this stuff in action. I’m going to create a database with FILESTREAM data in and then play around with transactions, log backups, and checkpoints to show you garbage collection working.
CREATE DATABASE [FileStreamTestDB] ON PRIMARY (NAME = [FileStreamTestDB_data], FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_data.mdf'), FILEGROUP [FileStreamFileGroup] CONTAINS FILESTREAM (NAME = [FileStreamTestDBDocuments], FILENAME = N'C:\Metro Demos\FileStreamTestDB\Documents') LOG ON (NAME = [FileStreamTestDB_log], FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_log.ldf'); GO USE [FileStreamTestDB]; GO CREATE TABLE [FileStreamTest1] ( [DocId] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE, [DocName] VARCHAR (25), [Document] VARBINARY(MAX) FILESTREAM); GO
Now I’m going to put the database into the FULL recovery model and take a full database backup – which means I must now take log backups to manage the size of the transaction log. It also means that a FILESTREAM file cannot be removed until it has been backed up.
ALTER DATABASE [FileStreamTestDB] SET RECOVERY FULL; GO BACKUP DATABASE [FileStreamTestDB] TO DISK = N'C:\SQLskills\FSTDB.bak'; GO
Now I’m going to create some FILESTREAM data.
INSERT INTO [FileStreamTest1] VALUES ( NEWID (), 'Paul Randal', CAST ('SQLskills.com' AS VARBINARY(MAX))); GO
Looking in the FILESTREAM data container I created, I have the following file:
Remember from the previous blog posts, that the FILESTREAM file filenames are the database log sequence number at the time they were created. Now I’ll update the value in an implicit transaction (no BEGIN TRAN and COMMIT TRAN).
UPDATE [FileStreamTest1] SET [Document] = CAST (REPLICATE ('Paul', 2000) AS VARBINARY(MAX)) WHERE [DocName] LIKE '%Randal%'; GO
And we now have the following files:
The new file is the 8KB file and the old FILESTREAM value is the 1KB file. If I try doing an explicit CHECKPOINT, nothing changes as the old file is still required as it hasn’t yet been backed up. Now I’ll do a log backup.
BACKUP LOG [FileStreamTestDB] TO DISK = 'C:\SQLskills\FSTB_log.bak'; GO
And the files are all still there. Although the first 1KB file is no longer needed, a checkpoint hasn’t occurred yet, so garbage collection hasn’t run. Now running an explicit CHECKPOINT, the directory still contains the two files. What happened? The transaction log VLF containing the log record for the creation of the FILESTREAM file is still active, so the file is still needed. I have to do *another* log backup and checkpoint before garbage collection kicks in (as that will cause the log to cycle, when there’s nothing happening in the database and no active transactions) and the directory view changes to:
The alternative would have been to generate more log records, spilling into the next transaction log VLF, then do another log backup which would mark the ‘creation’ VLF inactive, and then the next checkpoint would run garbage collection on the file. This, of course, would be the normal course of events in a production database.
So, don’t get confused if you update a FILESTREAM file, then do a log backup and checkpoint and nothing happens. Remember the transaction log has to have progressed enough for the ‘creation’ VLF to be inactive too. You can prove this to yourself by creating an explicit transaction at the same time as the FILESTREAM update (in another, implicit transaction). No matter how many times you backup the log and checkpoint the database, the garbage collection will not run until the explicit transaction is committed or rolled back, and then another log backup and checkpoint is run.
I’ll leave it as a fun exercise for you to play around with updates in explicit transactions and various backup scenarios to see when garbage collection can and cannot remove old files, but now you know exactly how it works.