SQLskills SQL101: Temporary table misuse

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

After I left Microsoft in 2007, one of the first clients I worked with (who we’re still working with today) threw an interesting problem at me: “We can’t run stored proc X any more because it causes tempdb to fill the drive and then it fails.” Game on. I built some monitoring infrastructure into the proc using the DMV sys.dm_db_task_space_usage to figure out how much tempdb space was being used at various points and find the problem area.

It turned out to be problem *areas*, and in fact the proc was loaded with temporary table (I’ll just use the common contraction ‘temp table’ from now on) misuse, illustrating all three of the common temp table problems I’m going to describe below. Once I fixed that proc (reducing the tempdb usage from more than 60GB down to under 1GB, and the run time from many minutes to a few tens of seconds), I implemented some automated monitoring built around the sys.dm_db_task_space_usage DMV to identify procedures and ad hoc queries that were misusing temp tables. We’ve since used this monitoring at many other clients to identify temp table misuse.

In this post I’d like to describe the three main ways that temp tables are misused:

  • Over-population of temp tables
  • Incorrect indexing on temp tables
  • Using a temp table where none is required

Don’t get me wrong though – temp tables are great – when they’re used efficiently.

Over-Population of a Temp Table

This problem involves creating a temp table using something like a SELECT … INTO #temptable construct and pulling far more data into the temp table than is necessary.

The most common thing we see is pulling lots of user table columns into the temp table, where some of the columns are not used ever again in subsequent code. This is a HUGE waste of I/O and CPU resources (extracting the columns from the user table in the first place – and imagine the extra CPU involved if the source data is compressed!) and a big waste of tempdb space (storing the columns in the temp table). I’ve seen code pulling large varchar columns into a temp table that aren’t used, and with multi-million row datasets…

The other facet of over-population of temp tables is pulling in too many rows. For instance, if your code is interested in what happened over the last 12 months, you don’t need to pull in all the data from the last ten years. Not only does this bloat the temp table, it also drastically slows down the query operations. This was one of the biggest problems in the client scenario I described above.

The key to better performance is making sure your selection/projection is as focused as possible. To limit your selection, use an effective WHERE clause. To limit your projection, list only the necessary columns in your select list.
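For example, here’s a minimal sketch using a hypothetical [dbo].[Sales] table (none of these names are from the client scenario): instead of something like SELECT * INTO #AllSales FROM [dbo].[Sales];, pull only the columns and rows the subsequent code actually touches:

-- Only the three columns used later, and only the last 12 months of rows
SELECT [SaleID], [CustomerID], [Amount]
INTO #RecentSales
FROM [dbo].[Sales]
WHERE [SaleDate] >= DATEADD (MONTH, -12, GETDATE ());
GO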

Incorrect Indexing on a Temp Table

This problem involves either creating indexes before populating the table (so that no statistics are generated) or creating a bunch of inappropriate indexes that are not used.

The most common example we see is creating a single-column nonclustered index for each of the temp table columns. Those indexes are usually just taking up space for no use whatsoever. Temp tables *DO* need indexes (preferably created after load) but, as with any form of query tuning, only the RIGHT indexes. Consider creating permanent tables that mimic what’s going on in your temporary objects and then using the Database Engine Tuning Advisor (DTA) to see if it has recommendations. While DTA’s not perfect, it’s often WAY better than guessing. Kimberly has a great post in our Accidental DBA series that discusses indexing strategies – start there.
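As a hedged sketch (continuing the hypothetical [dbo].[Sales] example), the permanent stand-in for the temp table could be as simple as:

-- A permanent table mimicking #RecentSales, purely so DTA can analyze the workload
SELECT [SaleID], [CustomerID], [Amount]
INTO [dbo].[RecentSales_Mimic]
FROM [dbo].[Sales]
WHERE [SaleDate] >= DATEADD (MONTH, -12, GETDATE ());
GO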

Also, don’t create any nonclustered indexes until the temp table has been populated, otherwise they won’t have any statistics, which will slow down query performance, possibly drastically.

Oh yes, and don’t create a clustered index for the temp table before populating it unless you know that the data being inserted is already sorted to exactly match the cluster key you’ve chosen. If not, inserts into the temp table are going to cause index fragmentation, which will really slow down the time it takes to populate the temp table. If you know the data is sorted and you create the clustered index first, there’s still no guarantee that the Storage Engine will feed the data into the temp table in the right order, so be careful. And if you go that route, remember that you’ll need to update the statistics on the clustered index after the temp table has been populated.
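Putting that together, a minimal sketch of the populate-then-index pattern (same hypothetical names as above) looks like this:

-- Populate the temp table first...
SELECT [SaleID], [CustomerID], [Amount]
INTO #RecentSales
FROM [dbo].[Sales]
WHERE [SaleDate] >= DATEADD (MONTH, -12, GETDATE ());
GO

-- ...then create only the index the subsequent code needs, which also
-- builds that index's statistics from the loaded data
CREATE NONCLUSTERED INDEX [IX_RecentSales_CustomerID]
ON #RecentSales ([CustomerID]);
GO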

You need to be careful here because in some versions of SQL Server, changing the schema of a temp table in a stored proc can cause recompilation issues. Do some testing and pick the sequence of events that makes the most sense for performance in your situation.

Using a Temp Table Where None is Required

The SQL Server Query Optimizer is a fabulous beast and is very good at figuring out the most efficient way to execute most queries. If you choose to take some of the query operation and pre-calculate it into a temp table, sometimes you’re causing more harm than good. Any time you populate a temp table you’re forcing SQL Server to materialize the complete set of results of whatever query you ran to populate the temp table. This can really limit SQL Server’s ability to produce a pipeline of data flowing efficiently through a query plan, making use of parallelism and collapsing data flows where possible.

While it’s true that you might sometimes be able to do better than the optimizer, don’t expect that to be the case all the time. Don’t just go straight to using temp tables; give the optimizer a chance – and make sure to retest your code/expectations after Service Packs and hotfixes, as these may have eliminated the need for the temp tables as well.

A good way to test whether a temp table is actually a hindrance to performance is to take the temp table creation code, embed it as a derived table in the main query, and see if query performance improves.
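As a hedged sketch (hypothetical tables again): instead of materializing an aggregate into a temp table and then joining to it, embed the same query as a derived table:

SELECT
    [c].[CustomerName],
    [agg].[TotalAmount]
FROM [dbo].[Customers] AS [c]
JOIN (
    -- This is the query that would otherwise have populated the temp table
    SELECT [CustomerID], SUM ([Amount]) AS [TotalAmount]
    FROM [dbo].[Sales]
    WHERE [SaleDate] >= DATEADD (MONTH, -12, GETDATE ())
    GROUP BY [CustomerID]
) AS [agg]
ON [agg].[CustomerID] = [c].[CustomerID];
GO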

It’s quite often the case that temp tables became an architectural standard in an environment because they proved useful long ago, and now everyone uses them without ever checking whether they’re *really* beneficial in all cases.

One other thing you can consider is replacing temp tables with In-Memory OLTP memory-optimized tables, in all Editions of SQL Server 2016 SP1 and later, and in Enterprise Edition of SQL Server 2014. That’s beyond the scope of this post, but you can read about it in this Books Online page on MSDN.
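To give a flavor of it, here’s a minimal sketch of one related pattern Microsoft documents for replacing temp tables: a memory-optimized table variable. This assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup, and the names are hypothetical:

-- One-time setup: a memory-optimized table type
CREATE TYPE [dbo].[RecentSalesType] AS TABLE (
    [SaleID]     INT NOT NULL PRIMARY KEY NONCLUSTERED,
    [CustomerID] INT NOT NULL,
    [Amount]     DECIMAL (18, 2) NOT NULL)
WITH (MEMORY_OPTIMIZED = ON);
GO

-- The table variable lives in memory rather than in tempdb
DECLARE @RecentSales [dbo].[RecentSalesType];

INSERT INTO @RecentSales ([SaleID], [CustomerID], [Amount])
SELECT [SaleID], [CustomerID], [Amount]
FROM [dbo].[Sales]
WHERE [SaleDate] >= DATEADD (MONTH, -12, GETDATE ());
GO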

Summary

Always try to follow these guidelines when using a temp table:

  • Determine if a temp table is the most efficient way to achieve the goal of the code you’re writing
  • Limit the number of columns being pulled into the temp table
  • Limit the number of rows being pulled into the temp table
  • Create appropriate indexes for the temp table

Take a look at your current temp table usage. You may be surprised to find a lot of tempdb space and CPU resources being consumed by inappropriate temp table usage, population, and indexing.

Hope you found this helpful!

Investigating the proportional fill algorithm

This is something that came up recently on the Microsoft Certified Master DL, and is something I discuss in our IEPTO1 class because of the performance implications of it, so I thought it would make an interesting post.

Allocation Algorithms

The SQL Server Storage Engine (SE) uses two algorithms when allocating extents from files in a filegroup: round robin and proportional fill.

Round robin means that the SE will try to allocate from each file in a filegroup in succession. For instance, for a database with two files in the primary filegroup (with file IDs 1 and 3, as 2 is always the log file), the SE will try to allocate from file 1 then file 3 then file 1 then file 3, and so on.

The twist in this mechanism is that the SE also has to consider how much free space is in each of the files in the filegroup, and allocate more extents from the file(s) with more free space. In other words, the SE will allocate proportionally more frequently from files in a filegroup with more free space. This twist is called proportional fill.

Proportional fill works by assigning a number to each file in the filegroup, called a ‘skip target’. You can think of this as an inverse weighting, where the higher the value is above 1, the more times that file will be skipped when going round the round robin loop. During the round robin, the skip target for a file is examined, and if it’s equal to 1, an allocation takes place. If the skip target is higher than 1, it’s decremented by 1 (to a minimum value of 1), no allocation takes place, and consideration moves to the next file in the filegroup.

(Note that there’s a further twist to this: when the -E startup parameter is used, each file with a skip target of 1 will be used for 64 consecutive extent allocations before the round robin loop progresses. This is documented in Books Online here and is useful for increasing the contiguity of index leaf levels for very large scans – think data warehouses.)

The skip target for each file is the integer result of (number of free extents in file with most free space) / (number of free extents in this file). The files in the filegroup with the least amount of free space will therefore have the highest skip targets, and there has to be at least one file in the filegroup with a skip target of 1, guaranteeing that each time round the round robin loop, at least one extent allocation takes place.
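As a rough, hedged sketch (this is not the SE’s internal code: it ignores the recalculation cadence, and it lumps all data files together rather than working filegroup by filegroup), you can approximate the skip targets for the current database’s data files like this:

-- size and SpaceUsed are in 8KB pages; divide by 8 to get extents
SELECT
    [file_id],
    [name],
    ([size] - FILEPROPERTY ([name], 'SpaceUsed')) / 8 AS [free_extents],
    -- Integer result of (most free extents) / (this file's free extents)
    MAX (([size] - FILEPROPERTY ([name], 'SpaceUsed')) / 8) OVER ()
        / NULLIF (([size] - FILEPROPERTY ([name], 'SpaceUsed')) / 8, 0)
        AS [approx_skip_target]
FROM sys.database_files
WHERE [type_desc] = N'ROWS';
GO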

The skip targets are recalculated whenever a file is added to or removed from a filegroup, or after at least 8192 extent allocations have taken place in the filegroup.

Investigating the Skip Targets

There’s an undocumented trace flag, 1165, that lets us see the skip targets whenever they’re recalculated; I believe the trace flag was added in SQL Server 2008. It also requires trace flag 3605 to be enabled, to allow the debugging info to be output to the error log.

Let’s try it out!

First I’ll turn on the trace flags, cycle the error log, create a small database, and look in the error log for pertinent information:

DBCC TRACEON (1165, 3605);
GO

EXEC sp_cycle_errorlog;
GO

USE [master];
GO

IF DATABASEPROPERTYEX (N'Company', N'Version') > 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END
GO

CREATE DATABASE [Company] ON PRIMARY (
    NAME = N'Company_data',
    FILENAME = N'D:\SQLskills\Company_data.mdf',
	SIZE = 5MB,
    FILEGROWTH = 1MB)
LOG ON (
    NAME = N'Company_log',
    FILENAME = N'D:\SQLskills\Company_log.ldf'
);

EXEC xp_readerrorlog;
GO
2016-10-04 11:38:33.830 spid56       Proportional Fill Recalculation Starting for DB Company with m_cAllocs -856331000.
2016-10-04 11:38:33.830 spid56       Proportional Fill Recalculation Completed for DB Company new m_cAllocs 8192, most free file is file 1.
2016-10-04 11:38:33.830 spid56       	File [Company_data] (1) has 44 free extents and skip target of 1. 

The m_cAllocs value is the threshold at which the skip targets will be recalculated (it’s the name of a class member of the C++ class inside the SE that implements filegroup management). In the first line of output it’s a seemingly random number, as the database has just been created and the counter hasn’t been initialized yet.

Now I’ll add another file with the same size:

ALTER DATABASE [Company] ADD FILE (
	NAME = N'SecondFile',
	FILENAME = N'D:\SQLskills\SecondFile.ndf',
	SIZE = 5MB,
    FILEGROWTH = 1MB);
GO

EXEC xp_readerrorlog;
GO
2016-10-04 11:41:27.880 spid56       Proportional Fill Recalculation Starting for DB Company with m_cAllocs 8192.
2016-10-04 11:41:27.880 spid56       Proportional Fill Recalculation Completed for DB Company new m_cAllocs 8192, most free file is file 3.
2016-10-04 11:41:27.880 spid56       	File [Company_data] (1) has 44 free extents and skip target of 1. 
2016-10-04 11:41:27.880 spid56       	File [SecondFile] (3) has 79 free extents and skip target of 1. 

Note that even though the two files have different numbers of extents, the integer result of 79 / 44 is 1, so the skip targets are both set to 1.

Now I’ll add a much larger file:

ALTER DATABASE [Company] ADD FILE (
	NAME = N'ThirdFile',
	FILENAME = N'D:\SQLskills\ThirdFile.ndf',
	SIZE = 250MB,
    FILEGROWTH = 1MB);
GO

EXEC xp_readerrorlog;
GO
2016-10-04 11:44:20.310 spid56       Proportional Fill Recalculation Starting for DB Company with m_cAllocs 8192.
2016-10-04 11:44:20.310 spid56       Proportional Fill Recalculation Completed for DB Company new m_cAllocs 8192, most free file is file 4.
2016-10-04 11:44:20.310 spid56       	File [Company_data] (1) has 44 free extents and skip target of 90. 
2016-10-04 11:44:20.310 spid56       	File [ThirdFile] (4) has 3995 free extents and skip target of 1. 
2016-10-04 11:44:20.310 spid56       	File [SecondFile] (3) has 79 free extents and skip target of 50. 

The file with the most free space is file ID 4, so the skip targets of the other files are set to (file 4’s free extents) / (free extents in the file). For example, the skip target for file 1 becomes the integer result of 3995 / 44 = 90.

Now I’ll create a table that can have only one row per page, and force more than 8192 extent allocations to take place (by inserting more than 8192 x 8 = 65,536 rows, as each row fills a page and each extent holds 8 pages). This will also mean the files will have autogrown and will have roughly equal numbers of free extents.

USE [Company];
GO

CREATE TABLE [BigRows] (
	[c1] INT IDENTITY,
	[c2] CHAR (8000) DEFAULT 'a');
GO

SET NOCOUNT ON;
GO

INSERT INTO [BigRows] DEFAULT VALUES;
GO 70000

EXEC xp_readerrorlog;
GO
2016-10-04 11:55:28.840 spid56       Proportional Fill Recalculation Starting for DB Company with m_cAllocs 8192.
2016-10-04 11:55:28.840 spid56       Proportional Fill Recalculation Completed for DB Company new m_cAllocs 8192, most free file is file 3.
2016-10-04 11:55:28.840 spid56       	File [Company_data] (1) has 0 free extents and skip target of 74. 
2016-10-04 11:55:28.840 spid56       	File [ThirdFile] (4) has 0 free extents and skip target of 74. 
2016-10-04 11:55:28.840 spid56       	File [SecondFile] (3) has 74 free extents and skip target of 1. 

We can see that all the files have filled up and autogrown, and by chance file ID 3 is now the one with the most free space.

Spinlock Contention

The skip targets for the files in a filegroup are protected by the FGCB_PRP_FILL spinlock, so this spinlock has to be acquired for each extent allocation, to determine which file to allocate from next. There’s an exception to this when all the files in a filegroup have roughly the same amount of free space (so they all have a skip target of 1). In that case, there’s no need to acquire the spinlock to check the skip targets.

This means that if you create a filegroup with differently sized files, the odds are that they will autogrow at different times, so the skip targets will not all be 1 and the spinlock has to be acquired for each extent allocation. Not a huge deal, but it’s still extra CPU cycles, and the possibility of spinlock contention occurring (for a database with a lot of insert activity), that you could avoid by making all the files in the filegroup the same size initially.
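As a minimal sketch (the file names and paths are hypothetical, reusing the demo database from above), that just means creating the filegroup with equally sized files and identical growth increments:

ALTER DATABASE [Company] ADD FILEGROUP [Data];
GO

-- Same SIZE and same FILEGROWTH for every file, so the skip targets
-- should all stay at 1
ALTER DATABASE [Company] ADD FILE (
    NAME = N'Data1',
    FILENAME = N'D:\SQLskills\Data1.ndf',
    SIZE = 1024MB,
    FILEGROWTH = 256MB),
(
    NAME = N'Data2',
    FILENAME = N'D:\SQLskills\Data2.ndf',
    SIZE = 1024MB,
    FILEGROWTH = 256MB)
TO FILEGROUP [Data];
GO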

If you want, you can watch the FGCB_PRP_FILL spinlock (and others) using the code from this blog post.
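As a quick hedged alternative, the sys.dm_os_spinlock_stats DMV (undocumented in versions before SQL Server 2016) exposes cumulative statistics for all spinlocks, including this one:

SELECT [name], [collisions], [spins], [spins_per_collision], [backoffs]
FROM sys.dm_os_spinlock_stats
WHERE [name] = N'FGCB_PRP_FILL';
GO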

Performance Implications

So when do you need to care about proportional fill?

One example is when trying to alleviate tempdb allocation bitmap contention. If you have a single tempdb data file, and huge PAGELATCH_UP contention on the first PFS page in that file (from a workload with many concurrent connections creating and dropping small temp tables), you might decide to add just one more data file to tempdb (which is not the correct solution). If that existing file is very full, and the new file isn’t, the skip target for the old file will be large and the skip target for the new file will be 1. This means that subsequent allocations in tempdb will be from the new file, moving all the PFS contention to the new file and not providing any contention relief at all! I discuss this case in my post on Correctly adding data file to tempdb.

The more common example is where a filegroup is full and someone adds another file to create space. In a similar way to the example above, subsequent allocations will come from the new file, meaning that when it’s time for a checkpoint operation, all the write activity will be on the new file (and its location in the I/O subsystem) rather than spread over multiple files (and multiple locations in the I/O subsystem). Depending on the characteristics of the I/O subsystem, this may or may not cause a degradation in performance.

Summary

Proportional fill is an algorithm that’s worth knowing about, so you don’t inadvertently cause a performance issue, and so that you can recognize a performance issue caused by a misconfiguration of file sizes in a filegroup. I don’t expect you to be using trace flag 1165, but if you’re interested, it’s a way to dig into the internals of the allocation system.

Enjoy!

Survey: tempdb file configuration (code to run)

I’m running this survey to help the SQL Server team at Microsoft, who would like to get a broad view of current tempdb configurations. I’ll editorialize the results as well in a week or two.

Feel free to run the code below any way you want, and also add a single preceding column to the result set (e.g. server name or number) if you want, but PLEASE do not add any *rows* of data apart from what I’ve asked for; otherwise it makes the data processing very time consuming, especially if you send results from hundreds of servers. I know people who do that are trying to be helpful, but I really don’t need any other data apart from what I’ve asked for.

You can send me results in email in a text file or spreadsheet, or leave a comment below. The code will work on SQL Server 2005 onwards.

Thanks!

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] LIKE N'#PSR_tracestatus%')
    DROP TABLE [#PSR_tracestatus];
GO

CREATE TABLE [#PSR_tracestatus] (
    [TraceFlag] INT, [Status] INT, [Global] INT, [Session] INT);

INSERT INTO #PSR_tracestatus EXEC ('DBCC TRACESTATUS (1117) WITH NO_INFOMSGS');
INSERT INTO #PSR_tracestatus EXEC ('DBCC TRACESTATUS (1118) WITH NO_INFOMSGS');

SELECT
	[os].[cores],
	(SELECT [Global] FROM #PSR_tracestatus WHERE [TraceFlag] = 1117) AS [1117],
	(SELECT [Global] FROM #PSR_tracestatus WHERE [TraceFlag] = 1118) AS [1118],
	[file_id], [type_desc], [size], [max_size], [growth], [is_percent_growth]
FROM
	tempdb.sys.database_files AS [df],
	(
		SELECT COUNT (*) AS [cores]
		FROM sys.dm_os_schedulers
		WHERE status = 'VISIBLE ONLINE'
	) AS [os];

DROP TABLE [#PSR_tracestatus];
GO