As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.
After I left Microsoft in 2007, one of the first clients I worked with (who we’re still working with today) threw an interesting problem at me: “We can’t run stored proc X any more because it causes tempdb to fill the drive and then it fails.” Game on. I built some monitoring infrastructure into the proc using the DMV sys.dm_db_task_space_usage to figure out how much tempdb space was being used at various points and find the problem area.
It turned out to be problem *areas*, and in fact the proc was loaded with temporary table (I’ll just use the common contraction ‘temp table’ from now on) misuse, illustrating all three of the common temp table problems I’m going to describe below. Once I fixed that proc (reducing the tempdb usage from more than 60GB down to under 1GB, and the run time from many minutes to a few tens of seconds), I implemented some automated monitoring built around the sys.dm_db_task_space_usage DMV to identify procedures and ad hoc queries that were misusing temp tables. We’ve since used this monitoring at many other clients to identify temp table misuse.
In this post I’d like to describe the three main ways that temp tables are misused:
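As a rough illustration of the kind of check described above (this is a minimal sketch, not the actual monitoring code used at the client), a query like the following could be dropped in at various points in a proc to see how much tempdb space the current session's running tasks have allocated so far. It assumes the caller has permission to query the DMV; the column aliases are my own.

```sql
-- Sketch only: tempdb page allocations for the current session's active tasks.
-- sys.dm_db_task_space_usage reports counts in 8KB pages, hence the * 8 / 1024.0.
SELECT
    tsu.session_id,
    tsu.request_id,
    SUM(tsu.user_objects_alloc_page_count
        - tsu.user_objects_dealloc_page_count) * 8 / 1024.0 AS user_objects_net_mb,      -- temp tables, table variables
    SUM(tsu.internal_objects_alloc_page_count
        - tsu.internal_objects_dealloc_page_count) * 8 / 1024.0 AS internal_objects_net_mb -- sorts, spools, hash work tables
FROM sys.dm_db_task_space_usage AS tsu
WHERE tsu.session_id = @@SPID
GROUP BY tsu.session_id, tsu.request_id;
```

Capturing these numbers before and after each major statement in a proc gives a crude picture of which statements are responsible for most of the tempdb usage.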
- Over-population of temp tables
- Incorrect indexing on temp tables
- Using a temp table where none is required
Don’t get me wrong though – temp tables are great – when they’re used efficiently.
Over-Population of a Temp Table
This problem involves creating a temp table using something like a SELECT … INTO #temptable construct and pulling far more data into the temp table than is necessary.
The most common thing we see is pulling lots of user table columns into the temp table, where some of the columns are not used ever again in subsequent code. This is a HUGE waste of I/O and CPU resources (extracting the columns from the user table in the first place – and imagine the extra CPU involved if the source data is compressed!) and a big waste of tempdb space (storing the columns in the temp table). I’ve seen code pulling large varchar columns into a temp table that aren’t used, and with multi-million row datasets…
The other facet of over-population of temp tables is pulling in too many rows. For instance, if your code is interested in what happened over the last 12 months, you don’t need to pull in all the data from the last ten years. Not only will it be bloating the temp table, it will also drastically slow down the query operations. This was one of the biggest problems in the client scenario I described above.
The key to better performance is making sure your selection/projection is as focused as possible. To limit your selection, use an effective WHERE clause. To limit your projection, list only the necessary columns in your select list.
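To make that concrete, here is a small sketch contrasting an over-populated temp table with a focused one. The table dbo.Orders and its columns are hypothetical, invented purely for illustration.

```sql
-- Over-populated: every column and every row, whether needed or not.
-- (dbo.Orders is a hypothetical table used only for this example.)
SELECT *
INTO #AllOrders
FROM dbo.Orders;

-- Focused: only the columns the later code uses (projection),
-- and only the last 12 months of rows (selection).
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.TotalDue
INTO #RecentOrders
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(MONTH, -12, SYSDATETIME());
```

If the source table has wide varchar columns and ten years of history, the second form can be a small fraction of the size of the first.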
Incorrect Indexing on a Temp Table
This problem involves either creating indexes before populating the table (so that no statistics are generated) or creating a bunch of inappropriate indexes that are not used.
The most common example we see is creating a single-column nonclustered index for each of the temp table columns. Those are usually just taking up space for no use whatsoever. Temp tables *DO* need indexes (preferably after load) but as with any form of query tuning – only the RIGHT indexes. Consider creating permanent tables that mimic what’s going on in your temporary objects and then using the Database Engine Tuning Advisor (DTA) to see if it has recommendations. While DTA’s not perfect, it’s often WAY better than guessing. Kimberly has a great post in our Accidental DBA series that discusses indexing strategies – start there.
Also, don’t create any nonclustered indexes until the temp table has been populated, otherwise they won’t have any statistics, which will slow down query performance, possibly drastically.
Oh yes, and don’t create a clustered index for the temp table before populating it unless you know that the data being entered is already sorted to exactly match the cluster key you’ve chosen. If not, inserts into the temp table are going to cause index fragmentation, which will really increase the time it takes to populate the temp table. If you know the data is sorted and you create the clustered index first, there’s still no guarantee that the Storage Engine will feed the data into the temp table in the right order, so be careful. And if you go that route, remember that you’ll need to update the statistics of the clustered index after the temp table has been populated.
You need to be careful here because in some versions of SQL Server, changing the schema of a temp table in a stored proc can cause recompilation issues. Do some testing and pick the sequence of events that makes the most sense for performance in your situation.
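One possible sequence, following the advice in this section, is sketched below: create the temp table as a heap, populate it, and only then build the indexes so their statistics reflect the loaded data. The table dbo.Orders, the column names, and the index choices are all hypothetical; which indexes are the RIGHT ones depends entirely on the queries that follow, and you should test for recompilation effects on your version.

```sql
-- Sketch only: load first, index afterwards.
-- dbo.Orders and all column/index names are hypothetical.
CREATE TABLE #RecentOrders
(
    OrderID    int       NOT NULL,
    CustomerID int       NOT NULL,
    OrderDate  datetime2 NOT NULL,
    TotalDue   money     NOT NULL
);

INSERT INTO #RecentOrders (OrderID, CustomerID, OrderDate, TotalDue)
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.TotalDue
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(MONTH, -12, SYSDATETIME());

-- Indexes built after the load get statistics based on the actual rows.
CREATE CLUSTERED INDEX CIX_RecentOrders
    ON #RecentOrders (OrderID);

-- One targeted nonclustered index for a known later query,
-- rather than a single-column index on every column.
CREATE NONCLUSTERED INDEX IX_RecentOrders_CustomerID
    ON #RecentOrders (CustomerID) INCLUDE (TotalDue);
```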
Using a Temp Table Where None is Required
The SQL Server Query Optimizer is a fabulous beast and is very good at figuring out the most efficient way to execute most queries. If you choose to take some of the query operation and pre-calculate it into a temp table, sometimes you’re causing more harm than good. Any time you populate a temp table you’re forcing SQL Server to materialize the complete set of results of whatever query you ran to populate the temp table. This can really limit SQL Server’s ability to produce a pipeline of data flowing efficiently through a query plan and making use of parallelism and collapsing data flows when possible.
While it’s true that you might be able to do better than the optimizer sometimes, don’t expect that it’s the case all the time. Don’t just go straight to using temp tables; give the optimizer a chance. Also, make sure to retest your code/expectations around Service Packs and hot fixes, as these may have eliminated the need for temp tables as well.
A good way to test whether a temp table is actually a hindrance to performance is to take the code that populates the temp table, embed it as a derived table in the main query, and see if query performance improves.
It’s quite often the case that temp tables become an architectural standard in an environment because they proved useful long ago, and now everyone uses them without ever checking if they’re *really* good for all cases.
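Here is a rough sketch of that test. Version A materializes an intermediate result into a temp table; Version B embeds the same query as a derived table so the optimizer can plan the whole thing as one pipeline. The tables dbo.Orders and dbo.Customers and their columns are hypothetical.

```sql
-- Sketch only: compare elapsed time and I/O for the two versions.
SET STATISTICS TIME, IO ON;

-- Version A: intermediate result materialized in a temp table.
SELECT o.CustomerID, SUM(o.TotalDue) AS TotalSpend
INTO #CustomerSpend
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(MONTH, -12, SYSDATETIME())
GROUP BY o.CustomerID;

SELECT c.CustomerName, cs.TotalSpend
FROM dbo.Customers AS c
JOIN #CustomerSpend AS cs
    ON cs.CustomerID = c.CustomerID;

-- Version B: the same populating query embedded as a derived table.
SELECT c.CustomerName, cs.TotalSpend
FROM dbo.Customers AS c
JOIN (
    SELECT o.CustomerID, SUM(o.TotalDue) AS TotalSpend
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= DATEADD(MONTH, -12, SYSDATETIME())
    GROUP BY o.CustomerID
) AS cs
    ON cs.CustomerID = c.CustomerID;

SET STATISTICS TIME, IO OFF;
DROP TABLE #CustomerSpend;
```

When comparing, remember to count both statements of Version A together, and look at the actual execution plans as well as the timings.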
One other thing you can consider is replacing temp tables with In-Memory OLTP memory-optimized tables, in all Editions of SQL Server 2016 SP1 and later, and in Enterprise Edition of SQL Server 2014. That’s beyond the scope of this post, but you can read about it in this Books Online page on MSDN.
Always try to follow these guidelines when using a temp table:
- Determine if a temp table is the most efficient way to achieve the goal of the code you’re writing
- Limit the number of columns being pulled into the temp table
- Limit the number of rows being pulled into the temp table
- Create appropriate indexes for the temp table
Take a look at your current temp table usage. You may be surprised to find a lot of tempdb space and CPU resources being consumed by inappropriate temp table usage, population, and indexing.
Hope you found this helpful!
9 thoughts on “SQLskills SQL101: Temporary table misuse”
Thanks for sharing. Nice article.
As of SQL Server 2014 I tend to use inline creation of clustered indexes on temp tables. On a very busy system SQL Server can reuse these tables, rather than creating them again every time for every user.
Would this be a good approach then?
– Create table with inline clustered index
– populate the table (in the correct order of the clustered index)
– create non clustered indexes (if needed)
Yes – but make sure to update the clustered index stats after populating the table, only if that will benefit the subsequent queries.
I recently came to know about SQLskills and I found it very useful.
While going through your blog, in the “Incorrect Indexing on a Temp Table” section, you have recommended creating indexes on a #temp table after populating it, but as per my knowledge, in a stored proc any alteration of a #temp table will result in recompilation of the following statements in the stored procedure, irrespective of whether the statements use the #temp table. Please correct me if I am wrong.
You are right – you need to be careful to make sure that what you’re doing is optimal for your situation.
Thanks for the article, it was great. I have one issue where one of the vendors provided 6 scripts and they did as you mentioned above:
“This problem involves creating a temp table using something like a SELECT … INTO #temptable construct and pulling far more data into the temp table than is necessary.”
Now my question is: does writing into tempdb finally end up in the transaction log of the concerned database, making the TL of that database increase dramatically?
No – activity in tempdb can only affect tempdb’s transaction log.
We have a situation where we are trying to select a few records using 7 to 8 CTEs and insert these records into a temporary table using SELECT … INTO #temp. While inserting into the temporary table we have a few joins, unions, and CASE statements. When we just select the records without using SELECT … INTO #temp, the query executes within a few minutes as expected and the query plan has no parallelism, whereas when we try to insert the records into the #temp table it uses excessive parallelism with multiple threads and we get no result; it just keeps on executing. This instance has 16 cores (32 logical processors) and 8 tempdb files. We have noticed the CXPACKET wait type, and when using MAXDOP 1 the query runs successfully. Our question is: how can we avoid this parallelism without using MAXDOP 1 and still insert the records into the #temp table using SELECT … INTO #temp?
Please let me know if you need any further information.
It’s a cardinality estimation problem – the optimizer thinks there’s going to be more rows than there are. You’ll need to compare estimated and actual plans to see where the mis-estimation is. Also compare the plans with and without the temp table to see what the difference is. Hard to go into more depth in a comment like this. As a last resort you could just use a MAXDOP 1 query hint for that query.