The Accidental DBA (Day 29 of 30): Troubleshooting Deadlocks

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Deadlocks occur in SQL Server whenever two processes attempt to acquire locks on the same resources in reverse order creating a persistently blocked condition, where neither of the sessions can continue to execute until the other session releases its locks.  In SQL Server, the Lock Monitor background task is responsible for detecting and resolving deadlocks whenever they occur (basically terminating the session that has performed the least amount of work so far), and the resulting 1205 error can be a sign of problems that require further evaluation.  In a third-party vendor application, it may not be possible to make the changes necessary to eliminate deadlocks, but you can still collect information about the deadlocks to assist the third-party vendor in analysis and possibly identifying a solution to the problem.

Collecting Information

Prior to SQL Server 2008, collecting deadlock information from SQL Server required enabling a trace flag, configuring a SQL Trace, Event Notifications, or using a WMI Alert.  Trace Flags 1222, 1205, or 1204 write the deadlock information as text into the ERRORLOG.  SQL Trace, Event Notifications and  WMI Alerts allow collection of the deadlock information as XML. Since the introduction of Extended Events and the new system_health event session in SQL Server 2008, deadlock information has been captured by default in SQL Server and no longer requires enabling additional data collection for analysis.

Analysis

The definitive source for understanding the output from trace flag 1222 is a series of blog posts written by Bart Duncan. His three-part series uses the output from trace flag 1222 to demonstrate how to read the XML deadlock graph information, starting with Deadlock Troubleshooting, Part 1. The same method of analysis applies to deadlock graphs collected by SQL Trace, Event Notifications, WMI, and even Extended Events.  The format of the deadlock graph defines the deadlock victim(s), each of the processes involved in the deadlock (within the process-list node), and the resources contributing to the deadlock (within the resource-list node).  The processes each have an assigned processid that is used to uniquely identify each of the processes and the resources being locked or requested by the process in the graph.  Within each of the process’ information, the execution tsql_stack will show the most recently deadlocked statement backwards to the start of the execution call stack.

One of the key areas of focus for deadlock analysis is the resource-list portion of the graph, which contains all the information about the resources involved and the lock types being held and requested by each of the processes.  This will also contain the index and object names, or the allocation unit associated with the object, which can be used to determine the name of the object and index.  Understanding the locking order between the processes is essential for deadlock troubleshooting.

In addition to viewing the raw XML or text information for the deadlock, it is also possible to view the information graphically as explained in my blog post Graphically Viewing Extended Events Deadlock Graphs.  The graphical view in Management Studio will not show all of the same details as the XML or text, but can be a fast start for understanding the type of deadlock and locking order.  It may be necessary to look at the text or XML for further information in some scenarios, or you can also open the graph graphically in SQL Sentry’s excellent Plan Explorer Pro to get the full output parsed as a table as well.

Possible solutions

There are a lot of potential solutions to prevent deadlocks occurring, and the correct one will depend on the specific deadlock condition that is occurring. In some deadlock scenarios an index change to cover one of the queries executing may be all that is necessary to prevent the deadlock condition from being possible.  In other scenarios, it may be necessary to change isolation levels, or use locking hints to force a blocking lock that is incompatible with other locks to prevent the deadlock condition from being encountered.  Proper analysis of the deadlock graph will help with determining the appropriate solution to the problem, but in most cases simple error handling logic in Transact-SQL or .NET application code to handle the 1205 error and attempt to resubmit the victim transaction can prevent end users from being negatively affected by deadlocks occurring.

Summary

Troubleshooting deadlocks in SQL Server starts off with first collecting the deadlock graph information using one of the available methods.  Extended Events in SQL Server 2008 collect the information by default and eliminate the need to enable further collection and then waiting for the deadlocks to reoccur to gather the information.  Full details of how to configure deadlock graph collection and analysis of specific scenarios can be found in my SimpleTalk article Handling Deadlocks in SQL Server and in my Pluralsight online training course SQL Server: Deadlock Analysis and Prevention.

10 thoughts on “The Accidental DBA (Day 29 of 30): Troubleshooting Deadlocks

  1. I always enable TF 1222 and/1204 to capture deadlock details.
    Now, good to know this information is captured by default starting SQL Server 2008+. A post explaining how to view and analyze the deadlock data captured by default in SQL Server will be really helpful!

    Thanks!
    Suresh Raavi.

  2. Just wanted to say thanks Jonathan for your work to help the community. I refer to your excellent posts regularly and you’ve taught me a lot. Thank you.

  3. Thank you Jonathan! Always great articles. Although I’m a biased Floridian. 🙂
    What kind of overhead does setting the 1222 (et al) trace flags have on the system?

    Doug – Sarasota

    1. Hey Doug,

      Different trace flags have different effects and potentially overheads that are specific to the trace flags implementation. For 1222, the only thing it does is enable deadlock graph generation to the ErrorLog when a deadlock actually occurs. So if there are no deadlocks occurring, there is no overhead for having it enabled. When a deadlock occurs, it has a slight overhead for writing the graph out to the ERRORLOG, but only for the lock monitor thread that detected the deadlock condition. I’ve personally had it enabled for months on busy servers with no visible impact to performance.

  4. Hi Jonathan,

    I noticed deadlocks when concurrent updates are happening on a table with unique non-clustered INCLUDE index and updates are happening on INCLUDE columns.. If I remove INCLUDE columns from the index, deadlocks are not repeating.

    Do you have any suggestions on this? It seems updating INCLUDE columns is resulting into deadlocks.

  5. I was fortunate to use an external tool named Performance Analysis to check for Blocks and Dead lock. this tool gives a graphical representation of the dead lock and the query details.

  6. As we continue migration from SQL2012 to 2016 we are increasingly seeing deadlocking on our databases. we set the compatibility level to 120 (SQL 2014). After this change we no longer see any deadlocks. Unfortunately, none of these solutions are acceptable long-term in our production environment as we are essentially forcing the use of old technology in the case of SQL 2014 compatibility level.

    Version is SQL 2016 SP2 CU7 build number 13.0.5337

    To briefly summarize, we are experiencing intra-query deadlocking on relatively simple read-only select queries. It seems that this was a bug that was fixed in one of the earlier CU’s for SQL 2016. We updated to the latest CU and are still experiencing the intra-query deadlocking bug.

    we tried playing around with different MAXDOP settings. The only setting that eliminated the deadlocks was MAXDOP = 1. This setting did not work for us in our production environment as it was causing longer run times and application timeouts. MAXDOP was changed back to original setting. Forcing single threaded execution is NOT a valid option to us. Second, we also looked into indexing and determined that there was no opportunity to add additional indexes. There were already indexes that covered the query. Lastly, these select queries that cause this deadlock are generated by a third party application so opportunities to tune these queries is very limited.

    1. The cardinality estimator changes in the version you upgraded to, so you probably have different execution plans as a result which is a known and well documented behavior and Microsoft recommends thorough testing before upgrading to the new CE in 2014+ in their documentation. Since you are on 2016 you can try setting the Legacy Cardinality Estimator database scoped configuration option which will put the database affected back to the 70 estimator and would give you the same behaviors as previously on 2012 and earlier. If the deadlock is truly intraquery parallelism then it usually is either a tuning or reducing parallelism as a workaround until the “bug” in query optimization that leads to the merging exchange deadlock between parallel subtasks is fixed.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Bitten by SSD Bit Rot

Back in 2012, I wrote a post titled Looking at External Disk Performance using USB 3.0 and eSATA with SSD, where I tested a number

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.