After writing the FILESTREAM whitepaper for Microsoft, I’ve had lots of questions about the structure of the FILESTREAM data container. The FILESTREAM data container is the technical term for the NTFS directory structure where all the FILESTREAM data is stored.
When you want to use FILESTREAM data, you first add a filegroup (during or after database creation):
ALTER DATABASE [FileStreamTestDB] ADD FILEGROUP [FileStreamGroup1] CONTAINS FILESTREAM; GO
And then add a ‘file’ to the filegroup:
ALTER DATABASE [FileStreamTestDB] ADD FILE ( NAME = [FSGroup1File], FILENAME = N'C:\Metro Labs\FileStreamTestDB\Documents') TO FILEGROUP [FileStreamGroup1]; GO
The ‘file’ is actually the pathname to what will become the root directory of the FILESTREAM data container. When it’s initially created, it will contain a single file, filestream.hdr, and a single directory $FSLOG. Filestream.hdr is a metadata file describing the data container and the $FSLOG directory is the FILESTREAM equivalent of the database transaction log. You can think of them as equivalent, although the FILESTREAM log has some interesting semantics.
The question I most often get is: are all the FILESTREAM files for a database stored in one gigantic directory? The answer is no.
The root directory of the data container contains one sub-directory for each table (or each partition of a partitioned table). Each of those directories contains a further sub-directory for each FILESTREAM column defined in the table. An example is below, with the screen shot taken after running the following code:
CREATE TABLE [FileStreamTest1] ( [DocId] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE, [DocName] VARCHAR (25), [Document] VARBINARY(MAX) FILESTREAM); GO CREATE TABLE [FileStreamTest2] ( [DocId] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE, [DocName] VARCHAR (25), [Document1] VARBINARY(MAX) FILESTREAM, [Document2] VARBINARY(MAX) FILESTREAM); GO INSERT INTO [FileStreamTest1] VALUES (NEWID (), 'Paul Randal', CAST ('SQLskills.com' AS VARBINARY(MAX))); INSERT INTO [FileStreamTest1] VALUES (NEWID (), 'Kimberly Tripp', CAST ('SQLskills.com' AS VARBINARY(MAX))); GO
This image shows the FILESTREAM data container for our database that has two tables with FILESTREAM columns, each with a single partition. The first table has a two FILESTREAM columns and the second has a single FILESTREAM column. The filenames of all these directories are GUIDs and I explain how to match directories to tables and columns in this post.
In the example, you can see two FILESTREAM files in a column-level directory. The FILESTREAM file names are actually the log-sequence number from the database transaction log at the time the files were created. You can correlate these by looking at the data with DBCC PAGE, but first finding the allocated pages using sp_AllocationMetadata (see this blog post):
EXEC sp_AllocationMetadata FileStreamTest1; GO
Object Name Index ID Alloc Unit ID Alloc Unit Type First Page Root Page First IAM Page --------------- -------- ----------------- --------------- ---------- --------- -------------- FileStreamTest1 0 72057594039697408 IN_ROW_DATA (1:169) (0:0) (1:170) FileStreamTest1 0 72057594039762944 ROW_OVERFLOW_DATA (0:0) (0:0) (0:0) FileStreamTest1 2 72057594039828480 IN_ROW_DATA (1:171) (1:171) (1:172)
Notice there’s a nonclustered index as well as the heap – that’s the index that’s enforcing the uniqueness constraint on the UNIQUEIDENTIFIER column. Now we can use DBCC PAGE to look at the first page of the heap, which will have out data records in:
DBCC TRACEON (3604); DBCC PAGE (FileStreamTestDB, 1, 169, 3); GO
(snip) Slot 0 Offset 0x60 Length 88 Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 88 Memory Dump @0x514EC060 00000000: 30001400 140d5047 2ca9f24f 874d35ca †0.....PG,©òOM5Ê 00000010: e9e77649 03000002 00280058 80506175 †éçvI.....(.X.Pau 00000020: 6c205261 6e64616c 03000000 00000080 †l Randal........ 00000030: 140d5047 2ca9f24f 874d35ca e9e77649 †..PG,©òOM5ÊéçvI 00000040: 01000000 68020000 00000000 17000000 †....h........... 00000050: 79000000 0c000000 †††††††††††††††††††y....... Slot 0 Column 1 Offset 0x4 Length 16 Length (physical) 16 DocId = 47500d14-a92c-4ff2-874d-35cae9e77649 Slot 0 Column 2 Offset 0x1d Length 11 Length (physical) 11 DocName = Paul Randal Document = [Filestream column] Slot 0 Column 3 Offset 0x28 Length 48 ColType = 3 FileId = -2147483648 UpdateSeq = 1 CreateLSN = 00000017:00000079:000c (23:121:12) TxFMiniVer = 0 XdesId = (0:616) (snip)
You can see that the CreateLSN above matches the filename of the first FILESTREAM file in the example image.
Hopefully this explains how the FILESTREAM files are stored – more on this in the next post where I’ll explain how updates and garbage collection are implemented.