Over the last few weeks I’ve had lots of questions about FILESTREAM performance and how to coax NTFS into scaling nicely. I just finished writing a 30-page whitepaper about FILESTREAM for the SQL Server team, which should be published before PASS in November (I’ll blog the link when I have it). Although my whitepaper isn’t strictly about performance, there is a long section about setting up your system to get high-performance from FILESTREAM. What I want to do in this blog post is give a bullet list of things to do that will help you get good performance. All of these are explained in more detail in the whitepaper.
Here you go, in no particular order:
- Make sure you’re storing the right-sized data in the right way. Jim Gray (et al) published a research paper a couple of years ago based called To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?. To summarize the findings, BLOBs smaller than 256-KB should be stored in a database, and 1-MB or larger should be stored in the file-system. For those in-between, “it depends” – my favorite answer. The upshot of this is that you won’t get good performance if you store lots of small BLOBs in FILESTREAM.
- Use an appropriate RAID level for the NTFS volume that will host the FILESTREAM data container (the NTFS directory structure corresponding to the FILESTREAM filegroup in the database). Don’t use RAID-5, for instance, for a write-intensive workload.
- Use an appropriate disk technology. SCSI will be usually be faster than SATA/IDE, but more expensive. This is because SCSI drives usually have higher rotational speeds, so lower latency and seek times.
- Whichever disk technology you choose, if SATA, ensure it supports NCQ, and if SCSI, ensure it supports CTQ. Both of these allow the drives to process multiple, interleaved IOs concurrently.
- Separate the data containers. Separate them from each other, and separate them from other database data and log files. This avoids contention for the disk heads.
- Defrag the NTFS volume if needed before setting up FILESTREAM, and periodically to maintain good scan performance
- Turn off 8.3 name generation on the NTFS volume. This is an order-N algorithm that has to check that the new name generated doesn’t collide with any existing names in the directory. This slows insert and update performance down *a lot*. Do this using the command line fsutil utility.
- Turn off tracking of last access time using fsutil.
- Set the NTFS cluster size appropriately. For BLOBs 1-MB or large, use a cluster size of 64-KB. This will help to reduce fragmentation.
- A partial update of FILESTREAM data creates a new file. Batch lots of small updates into one large update to reduce churn.
- When streaming the data back to the client, use an ~60-KB SMB buffer size (or multiples thereof). This is so that the buffers don’t get overly fragmented as TCP/IP buffer are 64-KB.
Hope this helps!