Search Engine Q&A #12: Should you create multiple files for a user DB on a multi-core box?

There’s been a very interesting discussion going on over at SQLServerCentral.com about whether to create multiple files for a user database because the server has multiple CPUs – see the thread here. I wrote a couple of long responses during the thread that I wanted to duplicate here as I think it’s of broad interest.

My first response was:

Doesn’t make sense to split any database into multiple files for per-CPU performance, except tempdb, which can suffer from multiple CPUs all trying to modify the same allocation bitmaps under high load with small temp tables being created/deleted (see http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx for details).

Now, saying that, there’s an exception – and that’s when a non-tempdb database has the same allocation bitmap contention issues, but that only happens on the very high-end with thousands of single-row inserts per second happening on each CPU. It’s pretty rare. I’ve never seen it but Kimberly has.

What you should be considering is splitting a database into filegroups to isolate critical tables in separate filegroups, or splitting a partitioned table across multiple filegroups. For SS2005, these aren’t for performance but for manageability (e.g. being able to restore a single filegroup while the rest of the database remains online, or setting several partitions of a table to be read-only). For SS2008, there are improvements to query plan generation over multiple partitions that could allow for better parallelism, but it’s still not optimal compared to non-partitioned data.
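As an illustration, here’s a minimal sketch (hypothetical database, table, and file names) of carving out a filegroup for a critical table and later marking it read-only:

ALTER DATABASE SalesDB ADD FILEGROUP CriticalFG;

ALTER DATABASE SalesDB ADD FILE
    (NAME = N'CriticalFG_1', FILENAME = N'D:\Data\SalesDB_CriticalFG_1.ndf')
    TO FILEGROUP CriticalFG;

-- Create the table (and optionally its nonclustered indexes) on the new filegroup
CREATE TABLE dbo.Orders (OrderID int PRIMARY KEY, OrderDate datetime) ON CriticalFG;

-- Manageability payoff: mark the filegroup read-only once its data is static
ALTER DATABASE SalesDB MODIFY FILEGROUP CriticalFG READ_ONLY;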

So – complex topic and these are simple guidelines. Hope they help.

This was followed by a discussion pointing out that various hardware vendors imply the opposite, as do several MS sources. So here was my latest response:

Hi folks,

What’s really not made clear in the MS literature is that the multiple-files-per-database advice really only applies to tempdb. Even for tempdb, on 2005 you don’t need one file per core – more like 1/4 to 1/2 as many files as there are cores.
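For example, a minimal sketch (hypothetical sizes and paths) of adding extra, equally-sized tempdb data files on a multi-core box:

ALTER DATABASE tempdb ADD FILE
    (NAME = N'tempdev2', FILENAME = N'T:\TempDB\tempdev2.ndf', SIZE = 1GB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE
    (NAME = N'tempdev3', FILENAME = N'T:\TempDB\tempdev3.ndf', SIZE = 1GB, FILEGROWTH = 256MB);

Keep the files the same size so the proportional-fill algorithm spreads allocations evenly across them.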

The tempdb problem is this – common workloads create and drop many worktables per second. The allocations that happen for a table are initially single pages (as opposed to whole extents). This means a search of the SGAM page for the 4GB interval needs to be done to find a mixed extent with a free page to allocate. Multiple CPUs all hammering this page cause contention on it and performance problems. Then a page needs to be allocated for the first IAM page – same thing happens. Then these pages need to be marked allocated in a PFS page – same thing happens. And then these pages need to be recorded in the sysindexes row for the table – more contention. On 2000 this was particularly bad – so T1118 plus multiple files was the solution, where SQL Server would round-robin the single-page allocations across the files in tempdb, alleviating the contention somewhat.
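For reference, the trace flag can be turned on at runtime like this, though it’s more commonly set as a -T1118 startup parameter so it survives restarts:

DBCC TRACEON (1118, -1);    -- -1 enables it globally, for all connections
DBCC TRACESTATUS (1118);    -- verify it's on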

In SQL Server 2005, we changed the temp table mechanism so that whenever a temp table is dropped, one data page, one IAM page, and the system table entries (no longer in sysindexes, but in a ‘hidden’ table called sys.allocation_units) are cached. When a new temp table is created, if there’s a cached ‘template temp table’ it is picked up and used without so much contention on the various allocation bitmaps. On a heavily loaded system there can still be contention, so you still need multiple files on an SMP box, just not so many. And you no longer need T1118 for user databases, but you still do for tempdb.
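To make the caching behavior concrete, here’s a minimal sketch (hypothetical names) of a temp table created inside a stored procedure that qualifies for caching – note that running DDL against the table after it’s created disqualifies it:

CREATE PROCEDURE dbo.DemoTempCaching
AS
BEGIN
    CREATE TABLE #Work (ID int NOT NULL, Payload varchar(100));
    -- CREATE INDEX IX_Work ON #Work (ID);    -- DDL after creation would prevent caching
    INSERT INTO #Work (ID, Payload) VALUES (1, 'test');
    SELECT ID, Payload FROM #Work;
END;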

So – this is more prevalent on tempdb, but CAN happen on a user database under extreme load on monster hardware. Testing should show whether this is happening to you – if not, don’t create multiple files for performance.
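One way to test is to look for PAGELATCH waits on the allocation bitmap pages – a minimal sketch against tempdb (database ID 2), where page 1 in each file is a PFS page, page 2 a GAM, and page 3 an SGAM, repeating at fixed intervals through the file:

SELECT session_id, wait_type, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
    AND resource_description LIKE '2:%';    -- format is dbid:fileid:pageid

Sustained waits on those pages indicate allocation bitmap contention.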

Now, in terms of what works for your particular vendor’s I/O scalability – that’s beyond me, and you may need to think about it if it’s something they recommend. However, I’d still take it with a pinch of salt and do your own testing. See the whitepaper at http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx for some testing/benchmarking info.

Hope this helps clarify a bit more – great discussion!

I’m interested to hear any other opinions on this – especially cases where you’ve found it necessary to create multiple files for performance.

Thanks!

7 thoughts on “Search Engine Q&A #12: Should you create multiple files for a user DB on a multi-core box?”

  1. Hi,

    In our company we are using multiple filegroups per database, distributing them on separate spindles for better performance. For example, nonclustered indexes are created in an IDX filegroup and placed on a physical device apart from the main data.

    This is not related to the present topic, is it? I take it that what you object to is splitting into multiple files to match the available CPUs.

    Can you please clarify?

  2. Hi Marios – yes, you’re talking about something different, which is fine as long as you’re actually seeing a performance boost. Thanks

  3. What about backup speed? If you have a DB with several data files, each located on a different partition or mount point, BACKUP DATABASE will read all of them in parallel with asynchronous reads, and interleave the buffers in the output stream(s). This results in significantly faster backups, depending on the setup of your storage/SAN of course. With a multi-TB database having 8 or 16 data files, I’ve seen a big difference compared to a single data file (see the sketch after these comments). Obviously, this is not a valid reason to create several data files for TempDB, since it does not need to (and cannot) be backed up! ;-)
    BTW, I thought this rule was also valid when SQL has to execute a table scan, for example? Doesn’t it read all files in parallel if they are located on different partitions or mount points?

  4. I’d love to see an update to this based on current SQL Server versions and modern hardware capabilities. I have an instance of SQL Server 2017 running on a server with 12 cores. It supports a number of applications with high transaction counts and high CPU usage, so I’m always trying to maximize I/O and CPU performance. I’m not an expert when it comes to the internal workings of the SQL Server engine and rely on pros like you to explain things. :-) I’d love to hear if this practice is still useful to the engine, or whether there are any performance drawbacks to the “one data file per core” practice?
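To illustrate the multi-device backup described in comment 3, here’s a minimal sketch (hypothetical database name and paths) – SQL Server reads the data files and writes the backup devices in parallel, so throughput depends heavily on the underlying storage layout:

BACKUP DATABASE SalesDB
TO  DISK = N'B:\Backups\SalesDB_1.bak',
    DISK = N'B:\Backups\SalesDB_2.bak',
    DISK = N'B:\Backups\SalesDB_3.bak',
    DISK = N'B:\Backups\SalesDB_4.bak'
WITH INIT, STATS = 10;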

