After writing the FILESTREAM whitepaper for Microsoft, I've had lots of questions about the structure of the FILESTREAM data container. The FILESTREAM data container is the technical term for the NTFS directory structure where all the FILESTREAM data is stored.

When you want to use FILESTREAM data, you first add a filegroup (during or after database creation):

ALTER DATABASE FileStreamTestDB ADD FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM;
GO

And then add a 'file' to the filegroup:

ALTER DATABASE FileStreamTestDB ADD FILE (
     NAME = FSGroup1File, FILENAME = 'C:\Metro Labs\FileStreamTestDB\Documents')
TO FILEGROUP FileStreamGroup1;
GO

The 'file' is actually the pathname to what will become the root directory of the FILESTREAM data container. When it's initially created, it will contain a single file, filestream.hdr, and a single directory $FSLOG. Filestream.hdr is a metadata file describing the data container and the $FSLOG directory is the FILESTREAM equivalent of the database transaction log. You can think of them as equivalent, although the FILESTREAM log has some interesting semantics, which I'll cover in a separate post.

The question I most often get is: are all the FILESTREAM files for a database stored in one gigantic directory? The answer is no.

The root directory of the data container contains one sub-directory for each table (or each partition of a partitioned table). Each of those directories contains a further sub-directory for each FILESTREAM column defined in the table. An example is below, with the screen shot taken after running the following code:

CREATE TABLE FileStreamTest1 (
    DocId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
    DocName VARCHAR (25),
    Document VARBINARY(MAX) FILESTREAM);
GO

CREATE TABLE FileStreamTest2 (
    DocId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
    DocName VARCHAR (25),
    Document1 VARBINARY(MAX) FILESTREAM,
    Document2 VARBINARY(MAX) FILESTREAM);
GO

INSERT INTO FileStreamTest1 VALUES (NEWID (), 'Paul Randal', CAST ('SQLskills.com' AS VARBINARY(MAX)));
INSERT INTO FileStreamTest1 VALUES (NEWID (), 'Kimberly Tripp', CAST ('SQLskills.com' AS VARBINARY(MAX)));
GO

fsdirstructure2 FILESTREAM directory structure

This image shows the FILESTREAM data container for our database that has two tables with FILESTREAM columns, each with a single partition. The first table has a two FILESTREAM columns and the second has a single FILESTREAM column. The filenames of all these directories are GUIDs. In the example, you can see two FILESTREAM files in a column-level directory. The FILESTREAM file names are actually the log-sequence number from the database transaction log at the time the files were created. You can correlate these by looking at the data with DBCC PAGE, but first finding the allocated pages using sp_AllocationMetadata (see this blog post):

EXEC sp_AllocationMetadata FileStreamTest1;
GO

Object Name     Index ID Alloc Unit ID     Alloc Unit Type   First Page Root Page First IAM Page
————— ——– —————– —————   ———- ——— ————–
FileStreamTest1 0        72057594039697408 IN_ROW_DATA       (1:169)    (0:0)     (1:170)
FileStreamTest1 0        72057594039762944 ROW_OVERFLOW_DATA (0:0)      (0:0)     (0:0)
FileStreamTest1 2        72057594039828480 IN_ROW_DATA       (1:171)    (1:171)   (1:172)

(3 row(s) affected)

Notice there's a nonclustered index as well as the heap – that's the index that's enforcing the uniqueness constraint on the UNIQUEIDENTIFIER column. Now we can use DBCC PAGE to look at the first page of the heap, which will have out data records in:

DBCC TRACEON (3604);
DBCC PAGE (FileStreamTestDB, 1, 169, 3);
GO

<snip>

Slot 0 Offset 0×60 Length 88

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 88
Memory Dump @0x514EC060

00000000:   30001400 140d5047 2ca9f24f 874d35ca †0…..PG,©òO‡M5Ê
00000010:   e9e77649 03000002 00280058 80506175 †éçvI…..(.X.Pau
00000020:   6c205261 6e64616c 03000000 00000080 †l Randal……..
00000030:   140d5047 2ca9f24f 874d35ca e9e77649 †..PG,©òO‡M5ÊéçvI
00000040:   01000000 68020000 00000000 17000000 †….h………..
00000050:   79000000 0c000000 †††††††††††††††††††y…….

Slot 0 Column 1 Offset 0×4 Length 16 Length (physical) 16

DocId = 47500d14-a92c-4ff2-874d-35cae9e77649

Slot 0 Column 2 Offset 0x1d Length 11 Length (physical) 11

DocName = Paul Randal

Document = [Filestream column] Slot 0 Column 3 Offset 0×28 Length 48

ColType = 3                          FileId = -2147483648                 UpdateSeq = 1
CreateLSN = 00000017:00000079:000c (23:121:12)                            TxFMiniVer = 0
XdesId = (0:616)

<snip>

You can see that the CreateLSN I've highlighted above matches the filename of the first FILESTREAM file in the example image.

Hopefully this explains how the FILESTREAM files are stored – more on this in the next post where I'll show how updates and garbage collection are implemented.