(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
This is a myth I hear over and over and over…
Myth #12: tempdb should always have one data file per processor core.
FALSE
Sigh. This is one of the most frustrating myths because there’s so much ‘official’ information from Microsoft, and other blog posts that persists this myth.
One of the biggest confusion points is that the SQL CAT team recommends 1-to-1, but they’re coming from a purely scaling perspective, not from an overall perf perspective, and they’re dealing with big customers with top-notch servers and I/O subsystems. Most people are not.
There’s only one tempdb per instance, and lots of things use it, so it’s often a performance bottleneck. You guys know that already. But when does a performance problem merit creating extra tempdb data files?
When you see PAGELATCH_XX waits on tempdb, you’ve got contention for in-memory allocation bitmaps. When you see PAGEIOLATCH_XX waits on tempdb, you’ve got contention at the I/O subsystem level. You can think of a latch as kind of like a traditional lock, but much lighter weight, much more transitory, and used by the Storage Engine internally to control access to internal structures (like in-memory copies of database pages).
Check out my Wait Stats category for some neat scripts using the sys.dm_os_wait_stats DMV to show you what kind of wait is most prevalent on your server. If you see that it’s PAGELATCH_XX waits, you can use this script Robert Davis (twitter|blog). It uses the sys.dm_os_waiting_tasks DMV to break apart the wait resource and let you know what’s being waited on in tempdb.
If you’re seeing PAGELATCH_XX waits on tempdb, then you can mitigate it using trace flag 1118 (fully documented in KB 328551) and creating extra tempdb data files. I wrote a long blog post debunking some myths around this trace flag and why it’s still potentially required in all versions, including SQL Server 2012 – see Misconceptions around TF 1118.
On SQL Server 2000, the recommendation was one tempdb data file for each logical processor core. On all later versions, including SQL Server 2012, that recommendation persists, but because of some optimizations (see my blog post) you usually do not need one-to-one – you may be fine with the number of tempdb data files equal to 1/4 to 1/2 the number of logical processor cores – which is what everyone apart from the official Microsoft guidance recommends.
But there’s now even better guidance, and what I also recommend. At PASS in 2011, my good friend Bob Ward, who’s the top guy in SQL Product Support, espoused a new formula: if you have less than 8 cores, use #files = #cores. If you have more than 8 cores, use 8 files and if you’re seeing in-memory contention, add 4 more files at a time.
Now this is all one big generalization. I heard just last week of a customer who’s tempdb workload was so high that they had to use 64 tempdb data files on a system with 32 logical processor cores – and that was the only way for them to alleviate contention. Does this mean it’s a best practice? Absolutely not!
So, why is one-to-one not always a good idea? Too many tempdb data files can cause performance problems for another reason. If you have a workload that uses query plan operators that require lots of memory (e.g. sorts), the odds are that there won’t be enough memory on the server to accommodate the operation, and it will spill out to tempdb. If there are too many tempdb data files, then the writing out of the temporarily-spilled data can be really slowed down while the allocation system does round-robin allocation. The same thing can happen with very large temp tables in tempdb too.
Why would round-robin allocation cause things to slow down for memory-spills to tempdb with a large number of files? A couple of possibilities:
- Round-robin allocation is per filegroup, and you can only have one filegroup in tempdb. With 16, 32, or more files in tempdb, and very large allocations happening from just a few threads, the extra synchronization and work necessary to do the round-robin allocation (looking at the allocation weightings for each file and deciding whether to allocate or decrement the weighting, plus quite frequently recalculating the weightings for all files – every 8192 allocations) starts to add up and become noticeable. It’s very different from lots of threads doing lots of small allocations. It’s also very different from allocating from a single-file filegroup – which is optimized (obviously) to not do round-robin.
- Your tempdb data files are not the same size and so the auto-grow is only growing a single file (the algorithm is unfortunately broken), leading to skewed usage and an I/O hotspot.
- Having too many files can lead to essentially random IO patterns when the buffer pool needs to free up space through the lazywriter (tempdb checkpoints don’t flush data pages) for systems with not very large buffer pools but *lots* of tempdb data. If the I/O subsystem can’t handle the load across multiple files, it will start to slow down.
I really need to do a benchmarking blog post to show what I mean – but in the mean time, I’ve heard this from multiple customers who’ve created large numbers of tempdb files, and I know this from how the code works (my dev team owned the allocation code).
So you’re damned if you do and damned if you don’t, right? Potentially – yes, this creates a bit of a conundrum for you – how many tempdb data files should you have? Well, I can’t answer that for you – except to give you these guidelines based on talking to many clients and conference/class attendees. Be careful to only create multiple tempdb data files to alleviate contention that you’re experiencing – and not to push it to too many data files unless you really need to – and to be aware of the potential downfalls if you have to. You may have to make a careful balance of scalability vs. performance to avoid helping one workload and hindering another.
Hope this helps.
PS To address a comment that came in – no, the extra files don’t *have* to be on separate storage. If all you’re seeing it PAGELATCH_XX contention, separate storage makes no difference as the contention is on in-memory pages. For PAGEIOLATCH_XX waits, you most likely will need to use separate storage, but not necessarily – it may be that you need to move tempdb itself to different storage from other databases rather than just adding more tempdb data files. Analysis of what's stored where will be necessary to pick the correct path to take.
20 Responses to A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core
@Glenn Yeah, like the page at the SQLCat site which promotes 1/4 – 1/2 files per processor for *user* databases and insists on 1-1 for TempDB.
I have also heard that the extra tempdb data files need to be on separate physical spindles to realize any performance improvement. Simply increasing the number of files on the same volume will probably just contribute to disk queuing and thrashing.
Depends why you’re adding files. If you’re adding files just to alleviate contention on the allocation pages, they can be on the same drive. If you’re adding files because you’re bottlenecked on IO, then yes, they need to be separate.
I updated the post to debunk that myth too.
Ray: That’s not correct. The issue has to do with latch contention on certain types of pages (PFS, GAM, SGAM) that need to be written to when new data pages are allocated. This has nothing to do with disk contention so separate spindles are not required to see a benefit if the server is experiencing this issue.
Paul: You can also use my Who is Active script to find these issues…
http://tinyurl.com/WhoIsActive
(not sure if I’ve mentioned that script to you before? :-) )
Well Adam, it depends if its PAGELATCH or PAGEIOLATCH contention. You’re just talking about PAGELATCH contention. Yup – you’ve mentioned the script many times – when I get time, I’ll play with it. You’re one of many in the queue.
There are internal MS engineers on the query engine team who give out similar suggestions. Granted the proper answer is always "your mileage may vary" – however they did point out to us that tempdb had higher contention as of SQL Server 2005.
We are using Read Committed Snapshot. MS told us that row versioning will increase the contention on tempdb and that we should monitor it in order to determine what the appropriate sizing was. They suggested a starting point as 1 per Socket (not core). However it wasn’t a hard and fast allocation, just a strong recommendation to watch it because it can make a significant difference in performance.
Recommendations regarding moving to separate spindles has changed over the years. Define spindle! Modern SANs operate way differently than old style SCSI disks do. NetApp for instance – see how WAFL works. A filesystem block no longer maps to a disk block – you don’t know where it is going to write (okay – "at the end").
Thanks for the scripts. This will make it easier to determine what the correct allocation is.
Be careful – the statement "however they did point out to us that tempdb had higher contention as of SQL Server 2005" is IF you’re using new features that use tempdb (like snapshot isolation) AND your tempdb is not already setup to avoid contention. Snapshot isolation will increase the usage of tempdb, but depending on how much it is used (i.e. version generation rate from all databases, and version lookups from all databases), it might not cause any extra contention. It’s likely going to cause IO contention, not so much PAGELATCH contention, unless there’s a huge version generation rate. (My dev team built the version store code for tempdb.)
Thanks
It’s relatively for the SAN admins to create a separate T: drive for all our tempdb files (if more than one), and that’s typically a given for all our production servers. We typically have two or three as a default mainly because as a habit from our older servers that were SQL 2000 (yes, we’re migrating…). Two or three, SHOULD not be an issue as compared to one (but maybe twenty at the start would).
Andrew – I’ll add a PPS to the post…
This is definitely a persistent myth. You still find lots of Microsoft material that promotes this idea. It is good to see your explanation of the downside of having too many TempDB data files.
I was intrigued by your comment regarding round robin allocation. I can understand > 1 file adds an overhead but are you saying that the more files you have the greater the overhead of round robin? I thought it would just be a case of "next file please"…..
[...] small temp tables. I've discussed this many time – most recently in the post A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core. This post has lots of links to scripts where you can see if this is a problem for you, and [...]
[...] [...]
[...] latch contention is showing – page ID (2:1:1) – the first PFS page in tempdb. (See here for more on tempdb contention, and here for more on PFS [...]
[...] A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core [...]
[...] pages. The most well-known cases of these are the PFS, SGAM, and GAM contention that can occur in tempdb under certain workloads. To find out what page the contention is on, you’ll need to use the DMV [...]
[...] http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-1230-tempdb-should-always-have-one-d… [...]
Hi Paul,
But why do the Microsoft Baseline Configuration Analyzer gives this Error. Please explain and let me know if I can ignore this error.
Category: Performance
Source: localhost
Issue: This instance of SQL Server has only one tempdb data file
Impact: Under a heavy load, the tempdb database can become a single point of contention and affect concurrency and performance
Resolution: To reduce contention in the tempdb database, increase the number of tempdb data files, and configure the required startup trace flags.
I don’t understand – there’s nothing wrong with the error. What do you think is wrong with it?