This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!
One of the most common performance problems that exists in SQL Server instances across the world is known as tempdb contention. What does that mean?
Tempdb contention refers to a bottleneck for threads trying to access allocation pages that are in-memory; it has nothing to do with I/O.
Consider the scenario of hundreds of concurrent queries that all create, use, and then drop small temporary tables (that by their very nature are always stored in tempdb). Each time a temp table is created, a data page must be allocated, plus an allocation metadata page to keep track of the data pages allocated to the table. This requires making a note in an allocation page (called a PFS page – see here for in-depth info) that those two pages have been allocated in the database. When the temp table is dropped, those pages are deallocated, and they must be marked as such in that PFS page again. Only one thread at a time can be changing the allocation page, making it a hotspot and slowing down the overall workload.
Back in SQL Server 2005, my dev team at Microsoft implemented a small cache of temp tables, to try to reduce this contention point, but it’s only a small cache, so it’s very common for this contention to be an issue, even today.
What’s really interesting though, is that many people don’t realize they have this problem – even seasoned DBAs. It’s really easy to figure out whether you have this kind of problem using the sys.dm_os_waiting_tasks DMV. If you run the query I have below, you’ll get an idea of where the various threads on your server are waiting, as Erin discussed earlier this month.
SELECT [owt].[session_id], [owt].[exec_context_id], [owt].[wait_duration_ms], [owt].[wait_type], [owt].[blocking_session_id], [owt].[resource_description], CASE [owt].[wait_type] WHEN N'CXPACKET' THEN RIGHT ([owt].[resource_description], CHARINDEX (N'=', REVERSE ([owt].[resource_description])) - 1) ELSE NULL END AS [Node ID], [es].[program_name], [est].text, [er].[database_id], [eqp].[query_plan], [er].[cpu_time] FROM sys.dm_os_waiting_tasks [owt] INNER JOIN sys.dm_exec_sessions [es] ON [owt].[session_id] = [es].[session_id] INNER JOIN sys.dm_exec_requests [er] ON [es].[session_id] = [er].[session_id] OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est] OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp] WHERE [es].[is_user_process] = 1 ORDER BY [owt].[session_id], [owt].[exec_context_id]; GO
Note that the [est].text line does not have text delimited – it throws off the plugin.
If you see a lot of lines of output where the wait_type is PAGELATCH_UP or PAGELATCH_EX, and the resource_description is 2:1:1 then that’s the PFS page (database ID 2 – tempdb, file ID 1, page ID 1), and if you see 2:1:3 then that’s another allocation page called an SGAM (more info here).
There are three things you can do to alleviate this kind of contention and increase the throughput of the overall workload:
- Stop using temp tables
- Enable trace flag 1118 as a start-up trace flag
- Create multiple tempdb data files
Ok – so #1 is much easier said than done, but it does solve that problem :-) Seriously though, you might find that temp tables are a design pattern in your environment because they made a query go faster once and then everyone started using them, whether they’re *really* needed or not for enhancing performance and throughput. That’s a whole other topic though and outside the scope of this post.
#2 prevents contention on the SGAM pages by slightly changing the allocation algorithm used. There is no downside from having this enabled, and I even say that all SQL Server instances across the world should have this trace flag enabled by default (and I said the same thing when I ran the dev team that owned the allocation code in the SQL Server Storage Engine).
#3 will help to remove the PFS page contention, by spreading the allocation workload over multiple files, thus reducing contention on the individual, per-file PFS pages. But how many data files should you create?
The best guidance I’ve seen is from a great friend of mine, Bob Ward, who’s the top Escalation Engineer in Microsoft SQL Product Support. Figure out the number of logical processor cores you have (e.g. two CPUS, with 4 physical cores each, plus hyperthreading enabled = 2 (cpus) x 4 (cores) x 2 (hyperthreading) = 16 logical cores. Then if you have less than 8 logical cores, create the same number of data files as logical cores. If you have more than 8 logical cores, create 8 data files and then add more in chunks of 4 if you still see PFS contention. Make sure all the tempdb data files are the same size too. (This advice is now official Microsoft guidance in KB article 2154845.)
Here’s an example screenshot of a 100-connection workload on tempdb.
On the left was when I had a single tempdb data file and I was seeing nearly all the connections waiting for PAGELATCH_UP on 2:1:1. After adding three more tempdb data files and stabilizing, the throughput jumped, as you can clearly see.
This has just been an overview of this problem, but as I said, it’s very common. You can get more information from the links below: