Correctly adding data files to tempdb

It’s well known that one of the common performance issues that can affect tempdb is allocation bitmap contention. I discuss this, and ways to alleviate it, in these posts:

The current best advice around adding tempdb data files is enshrined in KB article 2154845. If you’re seeing tempdb allocation contention (see top blog post link above), then:

  • If your server has 8 or fewer logical cores (e.g. a one-CPU server with 4 physical cores and hyperthreading enabled has 8 logical cores), use # tempdb data files = # logical cores, equally sized
  • If your server has more than 8 logical cores, start with 8 tempdb data files, and add sets of four at a time, equally sized, until the contention is alleviated

There are three problems that people often face when adding tempdb data files: matching the size of existing files that are still growing, finding that adding a file doesn’t help with contention, and adding too many files.

Matching Existing File Sizes

This problem occurs when the existing tempdb data files are growing, and people find it hard to create additional files that match the size of the existing files.

There’s an easy method for doing this: don’t!

Don’t try to match the size of existing, growing files. Create the new files to be a bit larger than the existing files, then go back and increase the size of the existing files to match the size of the new files.

For example, say I have 4 tempdb data files sized at 6GB each, and they’re growing by 512MB every few minutes because of an ad hoc workload. If I decide to add 4 more files, I might add the four new files at 10GB each, and then go back and run ALTER DATABASE [tempdb] MODIFY FILE (NAME = N'DataFileX', SIZE = 10GB) for each of the 4 existing files. Problem solved.
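As a concrete T-SQL sketch of that sequence (the logical file names, paths, sizes, and growth settings below are made up for illustration, so substitute your own):

  -- Add the new data files, slightly larger than the existing ones
  ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp5', FILENAME = N'D:\TempDB\temp5.ndf', SIZE = 10GB, FILEGROWTH = 512MB);
  ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp6', FILENAME = N'D:\TempDB\temp6.ndf', SIZE = 10GB, FILEGROWTH = 512MB);
  -- (repeat for temp7 and temp8)

  -- Then grow the existing files to match the new ones
  ALTER DATABASE [tempdb] MODIFY FILE (NAME = N'tempdev', SIZE = 10GB);
  ALTER DATABASE [tempdb] MODIFY FILE (NAME = N'temp2', SIZE = 10GB);
  -- (repeat for the remaining existing files)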

But also see the last section below: you may want to limit the total amount of space taken up by all your tempdb files if the only reason for the extra files is to alleviate tempdb allocation contention.

Additionally, if you have one full data file, you may find that…

Adding a File Doesn’t Help

This is very frustrating when it happens to people because it gives the impression that adding tempdb data files does not help with allocation contention. However, there is a simple explanation for this phenomenon.

Consider the case where there is one tempdb data file. Obviously all the allocations have to come from that data file and with the right workload, allocation bitmap contention will result. After the server has been up for a while, and the workload has been running and using tempdb for a while, the single tempdb data file may become quite full.

Now let’s say that you decide to add one more tempdb data file. What happens to the allocations?

Allocation uses two algorithms: round-robin and proportional fill. The allocation system tries to allocate from each file in the filegroup in turn, but allocates proportionally more often from files that have proportionally more free space than the others in the filegroup.

In the case where one file is very full and the other file is very empty, the vast majority of the allocations will be from the new, empty file. This means that almost all the contention moves from the initially existing tempdb data file to the new one, without much alleviation of the overall contention.
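An easy way to see how skewed the allocations are likely to be is to check how full each tempdb data file is, using the sys.dm_db_file_space_usage DMV. A minimal sketch:

  USE [tempdb];
  GO
  -- Per-file fullness; the more lopsided the free space, the more lopsided proportional fill will be
  SELECT
      file_id,
      total_page_count / 128 AS total_mb,
      unallocated_extent_page_count / 128 AS free_mb
  FROM sys.dm_db_file_space_usage;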

If this happens to you, try adding some more data files so that the allocation system has multiple files that it will allocate from, spreading the contention over those files and leading to an overall drop in contention and increase in transaction throughput.

But beware of immediately…

Adding Too Many Data Files

This is the case where tempdb allocation contention is a problem and people immediately add a large number of additional files where fewer files would work just as well. The problem here is that additional disk space is used up for no real gain, which may or may not be significant in your environment, depending on the size of the files added.

Let’s do an experiment. Below is a screen shot of PerfMon measuring transactions per second in tempdb for a contrived workload that has 100 connections all repeatedly creating and truncating temp tables. It’s running on my laptop (8 logical cores) using SQL Server 2014 RTM CU3.

[PerfMon screen shot: Transactions/sec in tempdb with 1, then 2, then 8 data files]

For the first third of the trace, there’s a single data file. For the middle third of the trace, there are two equally-sized files. For the final third of the trace, there are 8 equally-sized files.
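For reference, each of the 100 connections was running a tight loop along these lines (a rough sketch of the kind of workload, not the exact test harness; the temp table definition is made up):

  SET NOCOUNT ON;
  WHILE 1 = 1
  BEGIN
      CREATE TABLE #t (c1 INT, c2 CHAR (100));
      INSERT INTO #t VALUES (1, 'a');
      TRUNCATE TABLE #t;
      DROP TABLE #t;
  END;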

Clearly there isn’t a big performance boost from having the additional 6 data files in the final third, but what’s the sweet spot?

Ideally you’d experiment with varying numbers of tempdb data files to find the sweet spot for your workload. However, that’s easier said than done, especially when you’re trying to standardize a tempdb configuration across multiple servers.

Here’s an example of a slightly different workload running under the same conditions on my laptop.

[PerfMon screen shot: Transactions/sec in tempdb with 1, 2, 4, 6, then 8 data files]

It starts with a single data file, then 2, 4, 6, and 8 (pausing PerfMon between each file addition). In this case, it’s clearly worth going to 8 data files. But would I make them all the same size as the initial data file?

No, not if the only reason I need the extra files is to alleviate the allocation bitmap contention. I’d lower the size of all the tempdb files, including the initial one, so I’m not taking up a huge amount of extra disk space for these files.

Just be aware that sometimes you don’t need to go all out and add a whole bunch of extra tempdb data files to get a performance boost.

Summary

The easiest way to alleviate tempdb allocation contention is to enable trace flag 1118 and to add more tempdb data files. Just be careful that you add the right number of files to help with the contention, that you make all the files the same size, and that you take into account the total size of all the data files you’ve created, possibly dialing them all down a bit.
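If you want to try trace flag 1118 (which makes all allocations use dedicated extents instead of mixed extents, reducing SGAM contention), a minimal sketch of enabling it:

  -- Enable trace flag 1118 for the running instance
  DBCC TRACEON (1118, -1);
  -- To persist it across restarts, add -T1118 as a startup parameter (e.g. via SQL Server Configuration Manager)

Note that on SQL Server 2016 and later, tempdb allocates from dedicated extents by default and this trace flag has no effect.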

Enjoy!

Shrinking tempdb no longer prohibited

For the longest time, the guidance around tempdb has been that if you shrink it on a live system, it could cause tempdb corruption.

A few months ago I was discussing this with my good friend Bob Ward from Product Support and neither of us could remember the last time we’d seen a case of tempdb corruption that had been caused by shrinking. So we both did some investigations, including looking through the internal bug databases, to find the root cause of the long-running advice.

The bottom line is that tempdb corruption hasn’t been a problem with shrink since early builds of SQL Server 2000. There was also some extensive testing done to verify this.

As such, the KB article that discusses shrinking tempdb has been updated and I got notification last night from the author that it’s been published.

KB 307487 (How to shrink the tempdb database in SQL Server) now explains that even though you may see messages from shrink that look like corruption, they’re not.

Remember though, shrinking should be a rare operation, whether data or log file shrinking – and never a regular operation.
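For reference, shrinking a tempdb data file is just a DBCC SHRINKFILE call. A minimal sketch, assuming the default primary data file logical name tempdev and a 1GB target:

  USE [tempdb];
  GO
  -- Shrink the primary tempdb data file to 1024 MB (the target size is specified in MB)
  DBCC SHRINKFILE (N'tempdev', 1024);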

Enjoy!

Are I/O latencies killing your performance?

(Check out our Pluralsight online training course: SQL Server: Improving Storage Subsystem Performance.)

In this post I explain some methods for investigating and reducing high tempdb and transaction log I/O latencies that can severely hamper the performance of your workload.

Back at the end of August I kicked off a survey asking you to run some code to calculate average I/O latencies and send me the results. I have to apologize for taking two months to get to this editorial as I kept putting off spending the time to collate all the results I was sent. I made up for it by putting in lots of useful information :-)

I received results from 1094 random servers around the world (thank you all!) hosting 25445 databases and it took me a while to pull all the results into SQL Server so I could aggregate the data. Here it is.

What is Good or Bad?

For everything we’re talking about, you have to consider two things:

  • What is good or bad for I/O latency?
  • Even if you have “bad” I/O latency, do you care?

Everyone has their idea of what constitutes good or bad I/O latency, and here’s my take:

  • Excellent: < 1ms
  • Very good: < 5ms
  • Good: 5 – 10ms
  • Poor: 10 – 20ms
  • Bad: 20 – 100ms
  • Shockingly bad: 100 – 500ms
  • WOW!: > 500ms

You may want to argue that my numbers are too high, too low, or wrong, based on what you’ve seen/experienced, and that’s fine, but this is my blog and these are my numbers :-) You may not be able to get to my Excellent or Very Good numbers unless you’re using the latest locally-attached flash storage and aren’t driving it hard with your workload. Remember these are just my guidelines, not any kind of rule that SQL Server requires.

Even if your I/O latency is in what I consider the “bad” category, it could be that your workload is performing within the acceptable bounds for your organization and your customers, and so you’re comfortable with that. I’m OK with this, as long as you are aware of your I/O latencies and you’ve consciously decided to accept them as they are. Being ignorant of your I/O latencies and just accepting the workload performance as it is, because that’s just what the performance is, is not acceptable to me.

On to the survey data…

Tempdb Data Files

For this data I worked out the average read and write latency over all tempdb data files for each instance. I didn’t see any instances where one tempdb data file had a huge latency compared to the others so I believe this is a valid approach.

[Chart: distribution of average tempdb data file read latencies across the surveyed instances]

The tempdb data file read latencies aren’t too bad, to be honest, with more than 93% of all the servers in the survey having read latencies less than 20ms.

[Chart: distribution of average tempdb data file write latencies across the surveyed instances]

This is very interesting – almost 42% of all the servers in the survey had average tempdb data file write latency of more than 20ms, and just over 12% of all servers had average tempdb data file write latency of more than half a second per write – that’s ridiculously high!

To be honest, I’m pretty shocked by these results, especially the relatively high proportion of servers with average tempdb write latencies of half a second or more.
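If you want to see where your own servers fall, a cut-down version of that kind of check against sys.dm_io_virtual_file_stats looks something like this (a minimal sketch of the idea, not a full script):

  -- Average read and write latency (ms) for each tempdb data and log file
  SELECT
      mf.name AS file_name,
      vfs.num_of_reads,
      1.0 * vfs.io_stall_read_ms / NULLIF (vfs.num_of_reads, 0) AS avg_read_latency_ms,
      vfs.num_of_writes,
      1.0 * vfs.io_stall_write_ms / NULLIF (vfs.num_of_writes, 0) AS avg_write_latency_ms
  FROM sys.dm_io_virtual_file_stats (DB_ID (N'tempdb'), NULL) AS vfs
  JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
      AND mf.file_id = vfs.file_id;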

So if you check your average I/O latency for tempdb (using a script such as the one I blogged here, using sys.dm_io_virtual_file_stats) and find that it’s really high on my (or your) scale, what can you do?

Well, there are four approaches I can think of:

  1. Don’t do any investigation and just immediately move tempdb to a faster I/O subsystem, such as two SSD cards in a RAID-1 configuration (remember, one SSD is RAID-0, and that’s not good enough because if tempdb is corrupt or unavailable, your instance shuts down!). You might think that this is the lazy approach, but if you *know* you can’t make any changes to workload, this may be your only option. I bumped into this scenario while working with a client that supplies casino software. The optimal solution was to change some SPs to reduce tempdb usage, but that was going to take 18 months to be approved by the Gaming Commission in the state where the client’s client was located. The only solution in the meantime? Pony up for a faster tempdb I/O subsystem.
  2. Investigate the I/O subsystem where tempdb is located, looking for things like (non-exhaustive list):
    • Pathing/network issues, such as having a 1Gb switch in the middle of a 4Gb path, or having mismatched jumbo frame settings in your iSCSI configuration, or where the network to the SAN is being saturated by something other than SQL Server I/O traffic.
    • Incorrect SAN settings, such as not having write-caching enabled for a write-heavy workload, incorrect queue depth compared to your SAN vendor’s recommendations, or tempdb stuck on slow storage on an auto-tiering SAN because of incorrect SAN training.
    • Multiple users of the portion of the I/O subsystem where tempdb is located, such as having tempdb lumped in with other volatile databases, or even having LUNs shared between SQL Server and other applications like Exchange or IIS.
    • Only having a single tempdb data file, and so missing out on the higher I/O subsystem performance usually gained from having multiple data files in a filegroup (note: don’t confuse this with adding more tempdb data files to reduce PAGELATCH_XX contention on in-memory allocation bitmaps).
  3. Try to reduce the usage of tempdb, by looking for the following (a non-exhaustive list; there’s also a quick query sketch after this list that shows what’s consuming tempdb space):
    • Spill warnings in query plans (hash, sort, or exchange) indicating that there was not enough query execution memory and an operator had to spill results to tempdb. See these blog posts for more information: here, here, here, and a blog post of mine explaining how to understand the data vs. log usage for a memory spill to tempdb here.
    • Incorrect, excessive usage of temp tables, such as *always* using a temp table when it may sometimes be better not to, pulling more columns or rows into a temp table than are actually required (e.g. SELECT * into the temp table from a user table, with no WHERE clause), or creating nonclustered indexes on a temp table that are not used. I’ve seen this over and over with client code.
    • Index rebuilds that use SORT_IN_TEMPDB.
    • Using one of the flavors of snapshot isolation and allowing long-running queries that cause the version store to grow very large.
    • There are lots of useful queries and other information in the whitepaper Working with tempdb in SQL Server 2005 (which is still applicable to all current versions).
  4. A combination of #2 and #3, and then maybe you just have to move to a faster I/O subsystem, as in #1.
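As a starting point for approach #3, it’s worth seeing what is actually consuming tempdb space (user objects, internal objects such as sort and hash spills, or the version store). A minimal sketch using sys.dm_db_file_space_usage:

  USE [tempdb];
  GO
  -- Breakdown of current tempdb space usage, in MB (128 pages = 1MB)
  SELECT
      SUM (user_object_reserved_page_count) / 128 AS user_objects_mb,
      SUM (internal_object_reserved_page_count) / 128 AS internal_objects_mb,
      SUM (version_store_reserved_page_count) / 128 AS version_store_mb,
      SUM (unallocated_extent_page_count) / 128 AS free_space_mb
  FROM sys.dm_db_file_space_usage;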

One other thing to consider is the risk of making a change to your code and/or the cost of the engineering effort (dev and test) to do so. It may be cheaper and less risky to move to a faster I/O subsystem. Your call. Another issue you may have is that the bad code is in a 3rd-party application that you have no control over. In that case you may have no choice except to throw hardware at the problem.

Transaction Log Files

For this data I considered each database separately rather than aggregating per instance.

[Chart: distribution of average transaction log file read latencies across the surveyed databases]

[Chart: distribution of average transaction log file write latencies across the surveyed databases]

For the transaction log, you really want the average write latency to be in the 0-5ms range, and it’s good to see more than 79% of transaction log files in the survey are achieving that. I would say that write latency for the transaction log is much more important than read latency, as write latency slows down transactions in your workload. That’s not to say that you should ignore high read latencies, as these slow down log readers (such as log backups, transactional replication, change data capture, asynchronous database mirroring/availability groups), but log read latencies don’t usually slow down your workload unless you have transactions that are rolling back (the only time that transactions will cause log reads) or you rely heavily on change data capture.
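A complementary check, if you suspect log write latency is slowing down transactions, is the cumulative WRITELOG wait statistics (a rough sketch; remember these numbers accumulate from instance startup, or from the last time wait stats were cleared):

  -- How long have transactions been waiting on log block flushes?
  SELECT
      wait_type,
      waiting_tasks_count,
      wait_time_ms,
      wait_time_ms / NULLIF (waiting_tasks_count, 0) AS avg_wait_ms
  FROM sys.dm_os_wait_stats
  WHERE wait_type = N'WRITELOG';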

So you’re focusing on write latency. Again, there are multiple approaches that are the same as #1-#4 above. What you’re looking for in approach #3 is different though. I’ve written several detailed posts about transaction log performance tuning (including reducing the amount of log generated and changing the log block flush pattern) for SQL Sentry’s SQLPerformance.com blog so rather than duplicate those, I’ll point you to them:

And then there’s my 8-hour Pluralsight online training class that covers SQL Server: Understanding Logging, Recovery, and the Transaction Log.

Summary

It’s really important that you pay attention to the read and write I/O latencies for critical components of your SQL Server environment: tempdb and transaction logs. I’ve given you a lot of info above on what you can do inside and outside SQL Server to reduce these latencies and increase your workload throughput.

Unless you have no choice, don’t just throw some SSDs into the mix without first figuring out whether there are some things you can do inside SQL Server to reduce the I/O load, and if you do throw in some SSDs, make sure that you’re using them to host whatever files are your biggest I/O bottleneck, and make sure you’re using at least two of them in a RAID-1 configuration to protect against failure.

I hope this has been a useful read – happy tuning!