The Curious Case of… how to find FILESTREAM info from an MDF

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

A couple of weeks ago a question came in to the MCM email distribution list where someone wanted to know how to find FILESTREAM info from an MDF. They were working with a client who needed to attach a database and all they had was the MDF – no backup, and no access to the database attached anywhere else. The attach was failing because the database had a FILESTREAM filegroup defined, and so the CREATE DATABASE … FOR ATTACH statement needed to specify the FILESTREAM filegroup name and a new location for it.

Given that there was no SQL Server access to the database or a backup, the only way to find the information was through examining the MDF directly.

Here are the steps I recommended:

  • Download and install the freeware hex editor HxD from here.
  • Open the MDF file in HxD.
  • Hit Search, change the data type to hex values and search for the string ‘6C 00 64 00 66 00’, which is the Unicode byte-reversed hex for ‘ldf’.
  • Hit F3 until it can’t find any more. Now you’re positioned in the list of file for the database.
  • Just below that will be pathnames of other files, so scroll down until you come to one that doesn’t have ldf, mdf, or ndf at the end of the pathname.
  • This is the pathname for the FILESTREAM data container, and above that is the filegroup name.

As an example, the image below shows a portion of the MDF for a test database I created:

C:\SQLskills\Documents is the pathname and FSWaitsDocuments is the logical name.

Bottom line: you never know when a little hex-editor spelunking will come in handy!

The Curious Case of… copying FILESTREAM files between tables

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

I was asked last week whether it’s possible to create a table with a FILESTREAM column and then populate that column by copying FILESTREAM files from another directory in the FILESTREAM data container.

The simple answer is no – you can’t use that method for copying FILESTREAM files between tables.

In the data container, each table with a FILESTREAM column (or partition of a partitioned table with a FILESTREAM column) has a directory with a GUID name, and then there’s a sub-directory for each FILESTREAM column that also have GUID names. Then the FILESTREAM files themselves have filenames in the format 8hexdigits-8hexdigits-4hexdigits. The hex digits are the Log Sequence Number (LSN) of the log record that created the FILESTREAM file, and the data record contains that LSN as part of the in-row portion of the FILESTREAM column. (You can read a full explanation of the storage format in my post FILESTREAM directory structure.)

So, simply copying the files from one directory to another wouldn’t work for two reasons:

  • The FILESTREAM column in the data record of the target table wouldn’t be populated
  • And even if you created empty FILESTREAM columns, the LSN in the column wouldn’t match the LSN of the copied-in files

And you can’t use SELECT-INTO either, because it will not create FILESTREAM files and instead copies them and stores them in the new table as varbinary(max) with a limit of 2GB.

Bottom line: You can use INSERT INTO <target table> SELECT * FROM <source table> and that *will* populate the FILESTREAM files, but that means reading the FILESTREAM data through SQL Server’s buffer pool, which may be very inefficient for large amounts of data. So your best bet may be to write a little application that will using NTFS streaming access to directly copy the FILESTREAM data from the source table to the target table.

A SQL Server DBA myth a day: (18/30) FILESTREAM storage, garbage collection, and more

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

Phew – starting to get tiring having to make sure I get out a blog post in the morning before doing everything else in the day in case I forget… another multi-mythbusting post for you today!

Myth #18: various FILESTREAM data myths.

All of them are FALSE

18a) FILESTREAM data can be stored remotely 

No. A FILESTREAM data container (the invented name for the NTFS directory structure that stores the FILESTREAM data) must adhere to the same locality rules as regular database data and log files – i.e. it must be placed on storage 'local' to the Windows server running SQL Server. FILESTREAM data can be *accessed* using a UNC path, as long as the client has contacted the local SQL Server and obtained the necessary transaction context to use when opening the FILESTREAM file.

18b) FILESTREAM data containers can be nested

No. Two FILESTREAM data containers for the same database may share a root directory, but data containers cannot be nested, and data containers from different databases cannot share a directory. I blogged an example script that shows this at Misconceptions around FILESTREAM storage.

18c) Partial updates to FILESTREAM files are supported

No. Any update to a FILESTREAM file creates an entirely new FILESTREAM file, which will be picked up by log backups. This is why FILESTREAM cannot be used with database mirroring – the amount of data to be pushed to the mirror would be prohibitive. Hopefully a future version of SQL Server will implement a differencing mechanism that will allow partial updates and hence database mirroring compatibility.

18d) FILESTREAM garbage collection occurs instantaneously

No. Garbage collection occurs once a FILESTREAM file is no longer required (usually meaning it's been backed up by a log backup) AND a further checkpoint occurs. This is very non-intuitive and leads many people to think that FILESTREAM garbage collection isn't working for them. I explained this in detail in my post FILESTREAM garbage collection.

18e) FILESTREAM directory and filenames cannot be determined

No. There is method to the seeming madness of the GUIDs and weird filenames. The actual FILESTREAM filenames are the character representation of the LSN of the log record that described the creation of the file. The table and column directory names are GUIDs that you can get from system tables. (To be entirely accurate, the table directories are actually 'rowset' directories as far as the Storage Engine is concerned – with a rowset being equal to the portion of a table in a single partition of a partitioned tabled).

I have two blog posts which crack open the various names and system tables and show you how to figure out what's what: