A SQL Server DBA myth a day: (18/30) FILESTREAM storage, garbage collection, and more

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

Phew – starting to get tiring having to make sure I get out a blog post in the morning before doing everything else in the day in case I forget… another multi-mythbusting post for you today!

Myth #18: various FILESTREAM data myths.

All of them are FALSE

18a) FILESTREAM data can be stored remotely 

No. A FILESTREAM data container (the invented name for the NTFS directory structure that stores the FILESTREAM data) must adhere to the same locality rules as regular database data and log files – i.e. it must be placed on storage 'local' to the Windows server running SQL Server. FILESTREAM data can be *accessed* using a UNC path, as long as the client has contacted the local SQL Server and obtained the necessary transaction context to use when opening the FILESTREAM file.

18b) FILESTREAM data containers can be nested

No. Two FILESTREAM data containers for the same database may share a root directory, but data containers cannot be nested, and data containers from different databases cannot share a directory. I blogged an example script that shows this at Misconceptions around FILESTREAM storage.

18c) Partial updates to FILESTREAM files are supported

No. Any update to a FILESTREAM file creates an entirely new FILESTREAM file, which will be picked up by log backups. This is why FILESTREAM cannot be used with database mirroring – the amount of data to be pushed to the mirror would be prohibitive. Hopefully a future version of SQL Server will implement a differencing mechanism that will allow partial updates and hence database mirroring compatibility.

18d) FILESTREAM garbage collection occurs instantaneously

No. Garbage collection occurs once a FILESTREAM file is no longer required (usually meaning it's been backed up by a log backup) AND a further checkpoint occurs. This is very non-intuitive and leads many people to think that FILESTREAM garbage collection isn't working for them. I explained this in detail in my post FILESTREAM garbage collection.

18e) FILESTREAM directory and filenames cannot be determined

No. There is method to the seeming madness of the GUIDs and weird filenames. The actual FILESTREAM filenames are the character representation of the LSN of the log record that described the creation of the file. The table and column directory names are GUIDs that you can get from system tables. (To be entirely accurate, the table directories are actually 'rowset' directories as far as the Storage Engine is concerned – with a rowset being equal to the portion of a table in a single partition of a partitioned tabled).

I have two blog posts which crack open the various names and system tables and show you how to figure out what's what:

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.