(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
This is a myth I hear over and over and over…
Myth #12: tempdb should always have one data file per processor core.
Sigh. This is one of the most frustrating myths because there’s so much ‘official’ information from Microsoft, and other blog posts that persists this myth.
One of the biggest confusion points is that the SQL CAT team recommends 1-to-1, but they’re coming from a purely scaling perspective, not from an overall perf perspective, and they’re dealing with big customers with top-notch servers and I/O subsystems. Most people are not.
There’s only one tempdb per instance, and lots of things use it, so it’s often a performance bottleneck. You guys know that already. But when does a performance problem merit creating extra tempdb data files?
When you see PAGELATCH_XX waits on tempdb, you’ve got contention for in-memory allocation bitmaps. When you see PAGEIOLATCH_XX waits on tempdb, you’ve got contention at the I/O subsystem level. You can think of a latch as kind of like a traditional lock, but much lighter weight, much more transitory, and used by the Storage Engine internally to control access to internal structures (like in-memory copies of database pages).
Check out my Wait Stats category for some neat scripts using the sys.dm_os_wait_stats DMV to show you what kind of wait is most prevalent on your server. If you see that it’s PAGELATCH_XX waits, you can use this script Robert Davis (twitter|blog). It uses the sys.dm_os_waiting_tasks DMV to break apart the wait resource and let you know what’s being waited on in tempdb.
If you’re seeing PAGELATCH_XX waits on tempdb, then you can mitigate it using trace flag 1118 (fully documented in KB 328551) and creating extra tempdb data files. I wrote a long blog post debunking some myths around this trace flag and why it’s still potentially required in all versions, including SQL Server 2012 – see Misconceptions around TF 1118.
On SQL Server 2000, the recommendation was one tempdb data file for each logical processor core. On all later versions, including SQL Server 2012, that recommendation persists, but because of some optimizations (see my blog post) you usually do not need one-to-one – you may be fine with the number of tempdb data files equal to 1/4 to 1/2 the number of logical processor cores – which is what everyone apart from the official Microsoft guidance recommends.
But there’s now even better guidance, and what I also recommend. At PASS in 2011, my good friend Bob Ward, who’s the top guy in SQL Product Support, espoused a new formula: if you have less than 8 cores, use #files = #cores. If you have more than 8 cores, use 8 files and if you’re seeing in-memory contention, add 4 more files at a time. This is documented Microsoft advice in KB article 2154845.
Now this is all one big generalization. I heard just last week of a customer who’s tempdb workload was so high that they had to use 64 tempdb data files on a system with 32 logical processor cores – and that was the only way for them to alleviate contention. Does this mean it’s a best practice? Absolutely not!
So, why is one-to-one not always a good idea? Too many tempdb data files can cause performance problems for another reason. If you have a workload that uses query plan operators that require lots of memory (e.g. sorts), the odds are that there won’t be enough memory on the server to accommodate the operation, and it will spill out to tempdb. If there are too many tempdb data files, then the writing out of the temporarily-spilled data can be really slowed down while the allocation system does round-robin allocation. The same thing can happen with very large temp tables in tempdb too.
Why would round-robin allocation cause things to slow down for memory-spills to tempdb with a large number of files? A couple of possibilities:
- Round-robin allocation is per filegroup, and you can only have one filegroup in tempdb. With 16, 32, or more files in tempdb, and very large allocations happening from just a few threads, the extra synchronization and work necessary to do the round-robin allocation (looking at the allocation weightings for each file and deciding whether to allocate or decrement the weighting, plus quite frequently recalculating the weightings for all files – every 8192 allocations) starts to add up and become noticeable. It’s very different from lots of threads doing lots of small allocations. It’s also very different from allocating from a single-file filegroup – which is optimized (obviously) to not do round-robin.
- Your tempdb data files are not the same size and so the auto-grow is only growing a single file (the algorithm is unfortunately broken), leading to skewed usage and an I/O hotspot.
- Having too many files can lead to essentially random IO patterns when the buffer pool needs to free up space through the lazywriter (tempdb checkpoints don’t flush data pages) for systems with not very large buffer pools but *lots* of tempdb data. If the I/O subsystem can’t handle the load across multiple files, it will start to slow down.
I really need to do a benchmarking blog post to show what I mean – but in the mean time, I’ve heard this from multiple customers who’ve created large numbers of tempdb files, and I know this from how the code works (my dev team owned the allocation code).
So you’re damned if you do and damned if you don’t, right? Potentially – yes, this creates a bit of a conundrum for you – how many tempdb data files should you have? Well, I can’t answer that for you – except to give you these guidelines based on talking to many clients and conference/class attendees. Be careful to only create multiple tempdb data files to alleviate contention that you’re experiencing – and not to push it to too many data files unless you really need to – and to be aware of the potential downfalls if you have to. You may have to make a careful balance of scalability vs. performance to avoid helping one workload and hindering another.
Hope this helps.
PS To address a comment that came in – no, the extra files don’t *have* to be on separate storage. If all you’re seeing it PAGELATCH_XX contention, separate storage makes no difference as the contention is on in-memory pages. For PAGEIOLATCH_XX waits, you most likely will need to use separate storage, but not necessarily – it may be that you need to move tempdb itself to different storage from other databases rather than just adding more tempdb data files. Analysis of what’s stored where will be necessary to pick the correct path to take.
53 thoughts on “A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core”
@Glenn Yeah, like the page at the SQLCat site which promotes 1/4 – 1/2 files per processor for *user* databases and insists on 1-1 for TempDB.
I have also heard that the extra tempdb data files need to be on separate physical spindles to realize any performance improvement. Simply increasing the number of files on the same volume will probably just contribute to disk queuing and thrashing.
Depends why you’re adding files. If you’re adding files just to alleviate contention on the allocation pages, they can be on the same drive. If you’re adding files because you’re bottlenecked on IO, then yes, they need to be separate.
Ray: That’s not correct. The issue has to do with latch contention on certain types of pages (PFS, GAM, SGAM) that need to be written to when new data pages are allocated. This has nothing to do with disk contention so separate spindles are not required to see a benefit if the server is experiencing this issue.
Paul: You can also use my Who is Active script to find these issues…
(not sure if I’ve mentioned that script to you before? :-) )
There are internal MS engineers on the query engine team who give out similar suggestions. Granted the proper answer is always "your mileage may vary" – however they did point out to us that tempdb had higher contention as of SQL Server 2005.
We are using Read Committed Snapshot. MS told us that row versioning will increase the contention on tempdb and that we should monitor it in order to determine what the appropriate sizing was. They suggested a starting point as 1 per Socket (not core). However it wasn’t a hard and fast allocation, just a strong recommendation to watch it because it can make a significant difference in performance.
Recommendations regarding moving to separate spindles has changed over the years. Define spindle! Modern SANs operate way differently than old style SCSI disks do. NetApp for instance – see how WAFL works. A filesystem block no longer maps to a disk block – you don’t know where it is going to write (okay – "at the end").
Thanks for the scripts. This will make it easier to determine what the correct allocation is.
It’s relatively for the SAN admins to create a separate T: drive for all our tempdb files (if more than one), and that’s typically a given for all our production servers. We typically have two or three as a default mainly because as a habit from our older servers that were SQL 2000 (yes, we’re migrating…). Two or three, SHOULD not be an issue as compared to one (but maybe twenty at the start would).
This is definitely a persistent myth. You still find lots of Microsoft material that promotes this idea. It is good to see your explanation of the downside of having too many TempDB data files.
I updated the post to debunk that myth too.
Well Adam, it depends if its PAGELATCH or PAGEIOLATCH contention. You’re just talking about PAGELATCH contention. Yup – you’ve mentioned the script many times – when I get time, I’ll play with it. You’re one of many in the queue.
Be careful – the statement "however they did point out to us that tempdb had higher contention as of SQL Server 2005" is IF you’re using new features that use tempdb (like snapshot isolation) AND your tempdb is not already setup to avoid contention. Snapshot isolation will increase the usage of tempdb, but depending on how much it is used (i.e. version generation rate from all databases, and version lookups from all databases), it might not cause any extra contention. It’s likely going to cause IO contention, not so much PAGELATCH contention, unless there’s a huge version generation rate. (My dev team built the version store code for tempdb.)
I was intrigued by your comment regarding round robin allocation. I can understand > 1 file adds an overhead but are you saying that the more files you have the greater the overhead of round robin? I thought it would just be a case of "next file please"…..
Andrew – I’ll add a PPS to the post…
But why do the Microsoft Baseline Configuration Analyzer gives this Error. Please explain and let me know if I can ignore this error.
Issue: This instance of SQL Server has only one tempdb data file
Impact: Under a heavy load, the tempdb database can become a single point of contention and affect concurrency and performance
Resolution: To reduce contention in the tempdb database, increase the number of tempdb data files, and configure the required startup trace flags.
I don’t understand – there’s nothing wrong with the error. What do you think is wrong with it?
I think what Karthik means (and I also get the same suggestion/error when running the analyzer) is why is it debunked as a myth…, but when running the Microsoft Baseline Configuration Analyzer it comes back saying I should create more tempdb data files under the Category: Performance?
Because tempdb should NOT always have one data file per logical processor core – that was old advice from the 2000 time-frame. You need to be more careful about how many tempdb data files to create, as the post describes – not immediately going 1-1.
I believe the answer being sought is that the writers of the BPA application did not realize that this was a debunked myth and are still perpetuating it. BPA is a “configuration” analyzer, so it looks at your server’s configuration and compares it to its best-practice configuration, generating the “error” when your server has 1 tempdb file instead of 1 for each processor. It does no analysis of wait statistics to verify that this is really an “error” in your particular situation.
one “data files” for logical processor/core, but “data files” are only mdf+ndf or all database files (ldf too)?
Data files only.
Perfect. I have tried some different configurations for tempdb with SQL Server 2012: all tempdb files in a drive different from C, for example F, but in this situation SQL Server doesn’t start if drive F is not available for some reason. Then i moved tempdb’s files in C (with master, msdb, model) and i have added a secondary file to tempdb in F. In this situation SQL Server start if F is not available (and i think this is great!). In this last situation is preffered configure SAME options (size, filegrowth and max size) for mdf and ndf files?
Yes, same settings for all data files.
Never use the C driver for larger productive SQL Servers. And never use disk drives which go,randomly missing.
Above you mentioned “Your tempdb data files are not the same size and so the auto-grow is only growing a single file (the algorithm is unfortunately broken), leading to skewed usage and an I/O hotspot”
Does this mean that if one file is a different size than the others that only one file out of ALL the files assigned to tempdb will grow and the other files remain static OR is it that the server will only allow each file to grow to the size of the smallest configured file?
Only one file out of all tempdb files will grow.
“For PAGEIOLATCH_XX waits, you most likely will need to use separate storage, but not necessarily – it may be that you need to move tempdb itself to different storage from other databases rather than just adding more tempdb data files.”
I love using wait stats to reinforce what I’m seeing. You can also use sys.dm_io_virtual_file_stats to view read/write latencies for each file. If your average write stalls are 4000ms on each of your tempdb data files then you can create tempdb files until you blue in the face. The issue will often be the disk subsystem.
By the way, if you need some direction on wait stats, Pual’s pluralsight course on the topic is the best out to date.
Paul, would the same tempdb rule from Bob Ward hold true if the machine was Virtual?
How do the tempdb files get mapped to a specific core so each one is guaranteed its own etc?
They don’t. Allocations happen round-robin between the files so the total allocations are split evenly between the files (theoretically).
I was reading a whitepaper about TempDB and it stated there was no performance benefit for having more data files than CPUs. The excerpt:
“There will never be a need to have more than one data file per CPU.
The actual number of data files that SQL Server can use is the number
of concurrent processes using tempdb at the same time. All processes
ultimately run one task at a time on each CPU thread and there is only
one CPU thread per logical CPU. If every CPU is running a task and each
of those tasks is using tempdb, the number of threads hitting tempdb will
be the same as the number of logical CPUs. Your server will never have
more processes than number of CPUs hitting tempdb.”
You mention earlier that someone had 64 data files for 32 CPUs. I’m confused…
You’re (rightly) confused because Idera’s whitepaper is wrong. It’s assuming that each running task will be using a different file – which is not true.
Thanks for straightening that out sir, appreciate your time!
If we have multiple instances and using default setting (not configured Affinity mask) then core count would be same for every instance and I have to create same number of temp datafiles on every instance.
Just for an example in my new server, 2 CPU (Socket) of 12 Cores (24 Logical Processors)and 2 SQL instances then how many temp datafiles I have to create.
Please suggest. Thanks in advance.
There is no right answer for the number of data files – it depends on the contention you have.
Some of our customers have more than the number of Tempdb data files, than what you recommend in your blog post above. E.g., they may have 32 data files on a 32 proc server, or 24 data files on a 24 proc server, and so on. We have been trying to say that they should start with 8 and then add files in groups of 4, if there is contention. Regarding large number of Tempdb data files, you have mentioned above “I really need to do a benchmarking blog post to show what I mean”. By any chance have you gotten around to doing this? Do you have some data to show that a large number of Tempdb data files can cause performance problems? If so, this will be very helpful. Thank you!
Not yet – it’s on my long list of things to do.
Regarding: “If you have less than 8 cores, use #files = #cores. If you have more than 8 cores, use 8 files and if you’re seeing in-memory contention, add 4 more files at a time.”
I hate to pick nits but can you define core? We have just been rolling around in the mud working out default MAXDOP settings based on NUMA nodes and hyperthreading. Does your rule of thumb have a similar fudge factors for those scenarios? :-)
We are advised by the Cloud Data Center Sr. SQL DBA that the Temp DB data files should be split into 4 equal sizes each would be 14GB approx. (The total size of that particular partition is 100 GB). Only 47GBs are left as free. We yet to configure our databases over this new environment. I am okay with splitting the TempDB data files (8 Processors) into 4. But the initial size of each file is 14 GB. Is this advisable?
We use Standard Edition of SQL Server 2012.
Thanks & regards,
How long is a piece of string? There’s no right answer to how large tempdb should be. I’m guessing that the DBA is taking the steady-state size of tempdb on your system and splitting that size over multiple files.
This is exactly what the Sr. SQL DBA from Data Center replied “Splitting the files into 4 files is a standard Microsoft best practice. This helps performance by increasing the number of IO operations to the disk at any time given. We automatically grow out the files to about 60% of the disk (which is dedicated to tempdb), which also improves performance to IO. This is basically so that SQL is not waiting on IO, when it needs to extend the files. Both these practices help reduce IO contention and overall improves your SQL performance.”
If I have MaxDOP=1 (because Biztalk wants this setting), do I still need to align the number of datafiles for TempDb? (Server has 12 logical CPUs)
Tempdb contention isn’t to do with parallelism – it’s to do with concurrent connections making use of tempdb. So yes, if you have tempdb contention, you’ll need to have multiple tempdb data files to help mitigate it, regardless of MAXDOP.
Do we need to put Tempdb data (e.g H Drive) and log (e.g. G Drive) file on separate drives or one drive? is there any benefit if its on separate drives? I mean performance benefit.
It depends, on the IOPS demand of your workload on tempdb and the I/I capabilities of the I/O subsystem. There’s no simple yes/no that fits all scenarios.
Have you reached to any conclusion on TempDB size benchmark?
What tempdb size benchmark? Explain please.
I think he means benchmark regarding tempdb performance as a function of number of tempdb files.
Is this a best practice to configure temp files based on cores, even when we don’t notice any temp issues?
I want to question the validity of the Robert Davis script mentioned (or perhaps express my ignorance). In his blog, he describes the three page types, and mentions they begin on either page 1, 2 or 3 of a data file. His script subtracts 1, 2 or 3 from the page_id specified in the wait_resource and uses the result to work out if it is a a PFS, GAM or SGAM. So for example if the wait resource is 2:5:105144, his script will take 105144, subtract 1, and then divide by 8088, to see if it is a PFS. In other articles I don’t see any mention of having to subtract 1 from the page_id before dividing by 8088 to see if it is a PFS. Articles such as this one: https://support.microsoft.com/en-us/help/2154845/recommendations-to-reduce-allocation-contention-in-sql-server-tempdb-d which simply state “If a page is evenly divisible by 8088, it is a PFS page. For example, page 2:3:905856 is a PFS in file_id=3 in tempdb.”
On a SQL 2012 instance, I captured this data regularly over a few days, and wait resources such as 2:5:8088, 2:4:80880, 2:5:420576 that were blocking many sessions, were not identified as PFS. When I removed the subtraction from the script, it seemed to correctly identify the pages as PFS, GAM and SGAM.
Yes – the subtracting of one is wrong – and seems to have been added after he first posted the script as it used to work fine.
Hello! I found this very old thread, but it seems to be still monitored so I’m hopeful and posting our issue here.
I want to describe a problem we’ve been experiencing since moving from 2014 to 2019 just over a month ago, hopefully staying high-level. The symptom, very simply, is extreme GAM contention with the “tempdb metadata memory-optimized” flag turned on. (sometimes a hundred or more sessions waiting on a single GAM page)
I say “symptom” because our true problem seems to point to something else we have no control over, something fundamental to this discussion page – the round-robin algorithm.
I’ll try to describe this as best I can. First, let me say that our tempdb is multiple files (currently 32) all equally sized at 3994MB (one GAM per file). When a PAGELATCH_UP blocking chain starts up on, say, page #2 of file #12, and we come along and initiate a new session, I would expect that session to have a high chance (31 out of 32) of allocating from a different tempdb file and executing successfully. I wouldn’t expect it to always hit file #12. But that’s exactly what we’re seeing.
For a practical example, when we start getting a blocking chain in flight, we cannot run sp_whoisactive. It too gets blocked and becomes a victim in the blocking chain we’re trying to diagnose. Not just once, but over and over and over, different connections, different clients. It just won’t run, period.
We’re at an impass on how many tempdb files we should have, because we seem to always be getting bogged down on a single file each time. We’ve tried from 16 to 64 files, on a 64-CPU system (logical CPUs)
We’re already engaged with MS support, with no clear answer yet, but I figured that throwing a post in here couldn’t hurt just to cover some more bases.
Hey Randy – I’ve seen that once while playing with 2019 and now I can’t get it to repro. It *definitely* seems like a bug so I can’t offer any advice on a workaround.