(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
This is a myth I hear over and over and over…
Myth #12: tempdb should always have one data file per processor core.
Sigh. This is one of the most frustrating myths because there’s so much ‘official’ information from Microsoft, and other blog posts that persists this myth.
One of the biggest confusion points is that the SQL CAT team recommends 1-to-1, but they’re coming from a purely scaling perspective, not from an overall perf perspective, and they’re dealing with big customers with top-notch servers and I/O subsystems. Most people are not.
There’s only one tempdb per instance, and lots of things use it, so it’s often a performance bottleneck. You guys know that already. But when does a performance problem merit creating extra tempdb data files?
When you see PAGELATCH_XX waits on tempdb, you’ve got contention for in-memory allocation bitmaps. When you see PAGEIOLATCH_XX waits on tempdb, you’ve got contention at the I/O subsystem level. You can think of a latch as kind of like a traditional lock, but much lighter weight, much more transitory, and used by the Storage Engine internally to control access to internal structures (like in-memory copies of database pages).
Check out my Wait Stats category for some neat scripts using the sys.dm_os_wait_stats DMV to show you what kind of wait is most prevalent on your server. If you see that it’s PAGELATCH_XX waits, you can use this script Robert Davis (twitter|blog). It uses the sys.dm_os_waiting_tasks DMV to break apart the wait resource and let you know what’s being waited on in tempdb.
If you’re seeing PAGELATCH_XX waits on tempdb, then you can mitigate it using trace flag 1118 (fully documented in KB 328551) and creating extra tempdb data files. I wrote a long blog post debunking some myths around this trace flag and why it’s still potentially required in all versions, including SQL Server 2012 – see Misconceptions around TF 1118.
On SQL Server 2000, the recommendation was one tempdb data file for each logical processor core. On all later versions, including SQL Server 2012, that recommendation persists, but because of some optimizations (see my blog post) you usually do not need one-to-one – you may be fine with the number of tempdb data files equal to 1/4 to 1/2 the number of logical processor cores – which is what everyone apart from the official Microsoft guidance recommends.
But there’s now even better guidance, and what I also recommend. At PASS in 2011, my good friend Bob Ward, who’s the top guy in SQL Product Support, espoused a new formula: if you have less than 8 cores, use #files = #cores. If you have more than 8 cores, use 8 files and if you’re seeing in-memory contention, add 4 more files at a time. This is documented Microsoft advice in KB article 2154845.
Now this is all one big generalization. I heard just last week of a customer who’s tempdb workload was so high that they had to use 64 tempdb data files on a system with 32 logical processor cores – and that was the only way for them to alleviate contention. Does this mean it’s a best practice? Absolutely not!
So, why is one-to-one not always a good idea? Too many tempdb data files can cause performance problems for another reason. If you have a workload that uses query plan operators that require lots of memory (e.g. sorts), the odds are that there won’t be enough memory on the server to accommodate the operation, and it will spill out to tempdb. If there are too many tempdb data files, then the writing out of the temporarily-spilled data can be really slowed down while the allocation system does round-robin allocation. The same thing can happen with very large temp tables in tempdb too.
Why would round-robin allocation cause things to slow down for memory-spills to tempdb with a large number of files? A couple of possibilities:
- Round-robin allocation is per filegroup, and you can only have one filegroup in tempdb. With 16, 32, or more files in tempdb, and very large allocations happening from just a few threads, the extra synchronization and work necessary to do the round-robin allocation (looking at the allocation weightings for each file and deciding whether to allocate or decrement the weighting, plus quite frequently recalculating the weightings for all files – every 8192 allocations) starts to add up and become noticeable. It’s very different from lots of threads doing lots of small allocations. It’s also very different from allocating from a single-file filegroup – which is optimized (obviously) to not do round-robin.
- Your tempdb data files are not the same size and so the auto-grow is only growing a single file (the algorithm is unfortunately broken), leading to skewed usage and an I/O hotspot.
- Having too many files can lead to essentially random IO patterns when the buffer pool needs to free up space through the lazywriter (tempdb checkpoints don’t flush data pages) for systems with not very large buffer pools but *lots* of tempdb data. If the I/O subsystem can’t handle the load across multiple files, it will start to slow down.
I really need to do a benchmarking blog post to show what I mean – but in the mean time, I’ve heard this from multiple customers who’ve created large numbers of tempdb files, and I know this from how the code works (my dev team owned the allocation code).
So you’re damned if you do and damned if you don’t, right? Potentially – yes, this creates a bit of a conundrum for you – how many tempdb data files should you have? Well, I can’t answer that for you – except to give you these guidelines based on talking to many clients and conference/class attendees. Be careful to only create multiple tempdb data files to alleviate contention that you’re experiencing – and not to push it to too many data files unless you really need to – and to be aware of the potential downfalls if you have to. You may have to make a careful balance of scalability vs. performance to avoid helping one workload and hindering another.
Hope this helps.
PS To address a comment that came in – no, the extra files don’t *have* to be on separate storage. If all you’re seeing it PAGELATCH_XX contention, separate storage makes no difference as the contention is on in-memory pages. For PAGEIOLATCH_XX waits, you most likely will need to use separate storage, but not necessarily – it may be that you need to move tempdb itself to different storage from other databases rather than just adding more tempdb data files. Analysis of what’s stored where will be necessary to pick the correct path to take.