Many times I'm asked whether having multiple data files can lead to an improvement in performance. The answer, as with all things SQL (except concerning auto-shrink) is a big, fat "it depends." It depends on what you're using the database for, and the layout of the files on the IO subsystem, and the IO subsystem capabilities. I've heard examples of "yes" and I've heard examples of "no."
Just for kicks, I put some of my test hardware to use to do some experimentation. (You can get to all my other Benchmarking posts using this link.)
My setup for this series of tests is:
- Log file pre-sized to 8GB (to avoid log growth) on 8 x 1TB 7.2k SATA RAID-10 array, one iSCSI NIC, 128 KB stripe size
-
160GB database, variously setup as (all in the PRIMARY filegroup):
-
1 x 160GB file
-
2 x 80GB files
-
4 x 40GB files
-
8 x 20GB files
-
16 x 10GB files
-
-
16 connections inserting 100/16GB each, no other activity, all code executing on the server, no data transfer from clients
Each test was run 5 times and then the time-for-test calculated as the average of the 5 test runs, so the two tests together represent 50 test runs. Luckily I wrote a test harness that will tear down and setup the database automatically each time in the different configurations, so just double click a cmd file and then a day or so later I get an email saying the test has finished. Great when we're traveling!
Here are the test results:
As you can see, it's pretty clear that with both test setups, having more data files definitely does produce a performance improvement, but only up to a point.
Test 1: Data files on 8 x 300GB 15k SCSI RAID-10 array, two iSCSI NICs, 128KB stripe size
Test 2: Data files round-robin between two 4 x 300GB 15k SCSI RAID-10 array, one iSCSI NIC each, 128KB stripe size
In both cases, the performance increases up to eight data files, and then begins to decrease again with sixteen data files. The single data file case was bound to be slower on the SCSI array with fewer drives, and we see that in the results (left-most result in red).
In the best case, the eight-file case on two arrays was just over 6% faster than the single-file case on the single array. Hardly earth-shattering, but still a non-trivial gain.
Where's the gain coming from? I ran wait stats analysis for a few test variations – for example, between the eight data files test and the single data file test using two arrays, the cumulative wait stats were almost identical – 38/39% PAGELATCH_EX, 19/21% PAGELATCH_SH, 12/13% WRITELOG. The gain is mostly coming from the IO subsystem, but the SCSI arrays are still overloaded, as I showed in plenty of the previous benchmarking tests.
Now, this is a very contrived test, with a single operation in my workload – it's definitely NOT representative of a mixed-operation OLTP workload. However, I did see a gain from having multiple data files – and I believe I would have seen more gain had the SCSI array(s) not been pretty much maxed out already.
I've heard plenty of anecdotal evidence that adding a few more data files for user databases can lead to performance improvements, but your mileage is definitely going to vary. I'd be very interested to hear your observations in production as comments to this post (but please keep the comments constructive – don't give me a laundry-list of tests/settings you want me to try, or rant about real-life vs. lab tests).
Enjoy!
PS The next post on SSDs is forthcoming – just finishing up the (extensive) tests – and also the post on how you all have your log files configured, from the survey I did a while ago. Thanks for being patient!
15 Responses to Benchmarking: do multiple data files make a difference?
Great post! I read somewhere that NUMA servers get an IO thread per node. Do you think NUMA or server architecture in general would change these test results at all?
Hi, Thanks for the post. I have a question.
Why there is an increase in time for dual RAID when 160 GB single file is used?
@Preethi – because the 160GB file is being hosted by a RAID array with only 4 drives instead of 8, so the avg disk secs/write goes up
@Dave – not sure to be honest. Even on an SMP box, setting IO affinity might make a difference. I’ll maybe give it a try and report.
Thanks Paul. Its a simple truth, I didn’t see it first!
Thanks Paul. It is good to see a test and data that match the observations I have made in an environment where we implemented multiple data files and saw performance improvements.
Interesting to see this too: A single 160 GB file on single RAID works marginally better than two 80 GB files in dual RAID.
It’s been my experience that alignment between data files and cores is useful. So, if you have 8 cores, you should have at least 8 data files. If there are more paths to the data, either via HBAs or DAS storage connections, then there should be another entire set of data files matching the core count. But, if the data file to core count exceeds a 1:1 ratio, then a lot of context switches are going to occur, which slows the overall I/O.
My question related to this is, I’ve heard arguments both for and against having index and data on separate files gets a boost, because key lookups can be done in parallel to the index lookup (and I know Kim is going to read this…not everything has a covering index. ;)) So, can we get more out of the disk by splitting index and data to separate files?
The number of IO Completion threads is dependant on the NUMA configuration (both soft and hard NUMA). The number of Lazy Writer threads is dependant on the number of Hardware NUMA nodes on the box. And the number of IO Completion threads will depend on the number of Soft NUMA nodes, if configured. If not configurd, then it will be equal to the Hardware NUMA nodes.
Rob Dorr talks about this in his blog post: http://blogs.msdn.com/b/psssql/archive/2010/04/02/how-it-works-soft-numa-i-o-completion-thread-lazy-writer-workers-and-memory-nodes.aspx
Hi Paul,
Can you elaborate on how you performed the insert test?
Was this different files in the same filegroup but on different drives?
thanks
r
@Rich yes – different number of files but all in the primary filegroup.
@Don Thought I’d remembered to reply to this – sometimes you can get a gain by splitting indexes out – totally depends on the workload.
Hello Paul,
Thank you for sharing the very useful R & D information.
So by reading above data can we conclude that if the data file is between 12.5% to 25% of the Total Database size we can get comparatively good performance?
Sure, still it depends on types of transactions and other hardware and software configurations.
Appreciate your thoughts on this.
Thank you in advance.
I would really like to see these tests rerun in a RAID 5 scenario, the 8 vs 4 situation might favor the single array situation due to the IO improvement in increasing disks in a RAID 5 array.
[...] as well (see Paul's excellent post on why a RW filegroup should generally have 2-4 files here: Benchmarking: do multiple data files make a difference?). So, at the initial start I have three [...]
[...] got from messing around with multiple files for the sequential inserts case last time (see here and here for [...]