An XEvent a Day (11 of 31) – Targets Week – Using Multiple Targets to Debug Orphaned Transactions

Yesterday’s blog post Targets Week – etw_classic_sync_target covered the ETW integration that is built into Extended Events and how the etw_classic_sync_target can be used in conjunction with other ETW traces to provide troubleshooting at a level previously not possible with SQL Server.  In today’s post we’ll look at how to use multiple targets to simplify analysis of Event collection.

Why Multiple Targets?

You might ask why you would want to use multiple Targets in an Event Session with Extended Events?  The best answer is because each of the Targets handles the Event data in different ways, and by combining their uses, you can easily simplify the task of tracking down problems.  Two days ago I talked about the pair_matching Target and how it only retains Events that have not been matched.  Two years ago out of nowhere, one of the production databases I supported started having transaction log growth problems.  After growing the log for the second time in a day, I started looking at the database because something was obviously not right. 

The first place I looked was the sys.databases DMV and specifically the log_reuse_wait_desc column, which provides the reason that the transaction log for a database is not truncating.  When I queried this DMV, I found that the log_reuse_wait_desc was ACTIVE_TRANSACTION, meaning that the database had an open transaction.  So I ran DBCC OPENTRAN to get some more information about the open transaction (you can query the DMV’s for this information as well) and found that the transaction had been open for over 8 hours.  I queried sys.dm_exec_sessions for the session_id and found that the session was still being used and had submitted a request within the last minute, so it seemed to be an orphaned transaction.

SELECT 
    session_id,
    login_time, 
    last_request_start_time, 
    last_request_end_time
FROM sys.dm_exec_sessions
WHERE session_id = 76

After discussions with the Analyst for the application and the applications vendor, it was decided that the session should be killed forcing a ROLLBACK of the transaction and allowing the log to be truncate.  (I’ll discuss why this might prove be problematic later in this post)  This resolved the problem, at least until the next day when the database began running out of space in the transaction log again, and once again had an open transaction that had been open for hours on a session that was sill being used by the application.  What was really interesting was there was no correlation between the previous days open transactions begin time, and the begin time of the second occurrence of the problem, so it seemed to be a completely random occurrence which was not going to be easy to troubleshoot. 

I created a server side trace and tried over the next two days to figure out what the issue actually was, but didn’t make much head way until I expanded the trace to have the statement starting and completed events along with the Errors and Warnings Events in the trace.  When the problem reoccurred, I was able to read through the trace files using filtering to minimize the Trace Events captured down to the specific spid that held the open transaction and the events that occurred five minutes before and after the transaction_begin_time for the open transaction.  While looking at the event information I found an Attention Event and was able to deduce what had happened.

The application was a ASP.NET application, and the vendor used the CommandTimeout default which is 30 seconds.  What happened was that a process was invoked that called a stored procedure to archive information from a transactional table into an archive table inside of the database, and the number of rows being archived caused the stored procedures execution to exceeded 30 seconds resulting in a timeout in the ASP.NET application, and the application silently handled the exception by doing nothing.  The problem was that the stored procedure issued a BEGIN TRANSACTION before archiving the rows, and when the timeout occurred and the Attention Event was raised, the command aborted leaving the open transaction and creating the problem with the log not truncating. 

The connection was returned to the ASP.NET Connection Pool, and was constantly being reused by the application to do who knows what other operations, which is where killing the connection was potentially a very bad thing to do.  All of the activity performed by this session was performed under the open transaction, so by killing the session, all of the activity would be rolled back.  With no way to tell what exactly would be rolled back, killing the session should not be taken lightly.

For the remainder of this post I am going to show a repro of this particular problem and how to use Multiple Targets in Extended Events to simplify the troubleshooting this.

Setting Up the Demo

To setup a reproduction of this problem you will need two instances of SSMS open.  One of them will be used to connect to the SQL Server normally, and the other will be used to act like the ASP.NET application that originally had the problem.  To setup the second instance of SSMS to act like the ASP.NET application, we are going to set the Execution Timeout using the Options of the Connect to Database Engine window, and we are also going to add an Additional Connection Parameter to the connection to set the Application Name on the connection to “Some Poorly Written App” as shown in the below screenshots from SSMS.

image   image

For the remainder of this blog post I am going to refer to the two different instances of SSMS as Normal and PoorApp in the hopes that this prevents confusion. 

In the Normal SSMS we will create a database and some objects to support the repro of the problem:

CREATE DATABASE [MultiTargetDemo]
GO
USE [MultiTargetDemo]
GO
CREATE TABLE dbo.RandomObjectsArchive
(ArchiveID int identity primary key,
 TableName nvarchar(128),
 IndexName nvarchar(128),
 ColumnName nvarchar(128))
GO
CREATE TABLE dbo.TrackArchiveRunTimes
(RuntimeID int identity primary key,
 ArchiveRuntime datetime DEFAULT(CURRENT_TIMESTAMP))
GO
CREATE PROCEDURE dbo.GenerateRandomObjects
AS
BEGIN TRANSACTION
    INSERT INTO dbo.TrackArchiveRunTimes
    DEFAULT VALUES;

    INSERT INTO RandomObjectsArchive
        (TableName, IndexName, ColumnName)
    SELECT TOP 10000 a.name, i.name, c.name
    FROM sys.objects AS a
        CROSS JOIN sys.indexes AS i
        CROSS JOIN sys.columns AS c
        CROSS JOIN master.dbo.spt_values AS sv
    WHERE sv.type = 'P' 
      AND sv.number < 6 --Adjust to increase runtime
    ORDER BY NEWID() DESC
COMMIT TRANSACTION
GO
USE [master]
GO

The GenerateRandomObjects stored procedure Inserts a row into a tracking table that tracks when the stored procedure was executed, and then simulates a long running archive process by doing something that you should never do in production code.  The sv.number predicate in the query can be increased or decreased based on the performance of the system being tested against to ensure that the stored procedure runs longer than the Execution Timeout setting, which on my PoorApp SSMS instance was set to 10 seconds.  Increasing the value by 1 has an exponential impact on the performance degradation of the stored procedure, so any changes should be made incrementally to ensure that you don’t create a tempdb bloat problem with the Cartesian product of the query being executed.

Setting Up the Event Session

To troubleshoot this problem using Extended Events we will create an Event Session that captures the following Events:

sqlserver.database_transaction_begin

sqlserver.database_transaction_end

sqlserver.sql_statement_starting

sqlserver.sql_statement_completed

sqlserver.sp_statement_starting

sqlserver.sp_statement_completed

sqlserver.rpc_starting

sqlserver.rpc_completed

sqlserver.module_start

sqlserver.module_end

sqlserver.error_reported

We’ll add the following Actions to each of the Events:

sqlserver.session_id

sqlserver.database_id

sqlserver.tsql_stack

and add the sqlserver.sql_text Action to the starting Events so that we can track what is actually being executed.  Every Event in the Event Session will have a Predicate on the sqlserver.client_app_name so that the Event only fires for connections and requests from “Some Poorly Written App”. 

IF EXISTS(SELECT * 
         FROM sys.server_event_sessions 
         WHERE name='OrphanedTransactionHunter') 
    DROP EVENT SESSION [OrphanedTransactionHunter] ON SERVER; 
CREATE EVENT SESSION OrphanedTransactionHunter
ON SERVER
ADD EVENT sqlserver.database_transaction_begin
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.database_transaction_end
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.sql_statement_starting
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.sql_statement_completed
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.sp_statement_starting
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.sp_statement_completed
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.rpc_starting
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.rpc_completed
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.module_start
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.module_end
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
ADD EVENT sqlserver.error_reported
(    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
    WHERE (sqlserver.client_app_name = 'Some Poorly Written App'))
ADD TARGET package0.ring_buffer,
ADD TARGET package0.pair_matching
( SET begin_event = 'sqlserver.database_transaction_begin',
      begin_matching_actions = 'sqlserver.session_id',
      end_event = 'sqlserver.database_transaction_end',
      end_matching_actions = 'sqlserver.session_id',
      respond_to_memory_pressure = 1
)
WITH (MAX_DISPATCH_LATENCY=5 SECONDS, TRACK_CAUSALITY=ON)
GO

ALTER EVENT SESSION OrphanedTransactionHunter
ON SERVER
STATE=START
GO

Fired Events will be dispatched to two different Targets, the package0.ring_buffer to capture the Raw Data (in a true production environment, the package0.asynchronous_file_target would generally be a better Target for Raw Data capture of any volume), and the package0.pair_matching Target which has been configured to match on the sqlserver.database_transaction_begin/end Events based on the sqlserver.session_id Action.  To ensure that we can track the relationship between events, the Event Session will have TRACK_CAUSALITY set to ON, and to minimize the time it takes for Events to be dispatched for our test, the MAX_DISPATCH_LATENCY will be set to 5 seconds.

Putting It All Together

With the Event Session running, we can change over to our PoorApp SSMS instance and execute the GenerateRandomObjects stored procedure inside of the MultiTargetDemo database.

EXECUTE MultiTargetDemo.dbo.GenerateRandomObjects

When this executes, the command will timeout and leave the transaction open, simulating the original problem exactly.  Once the query times out, switch back to the Normal SSMS Instance and in a new window execute the stored procedure again and allow it complete its execution.  Since the default timeout of 0 is used in the Normal SSMS Instance, the execution will not time out.  Then we can look at the sys.databases DMV and see that the log_reuse_wait_desc is ACTIVE_TRANSACTION.

SELECT log_reuse_wait_desc 
FROM sys.databases
WHERE database_id = DB_ID('MultiTargetDemo')

image

If we look at DBCC OPENTRAN for the MultiTargetDemo database we will see our orphaned transaction:

DBCC OPENTRAN([MultiTargetDemo])

image

As I mentioned earlier in this post, the transaction can also be seen in the DMV’s:

SELECT 
    dtst.session_id, 
    dtdt.database_id,
    dtst.transaction_id, 
    dtat.name,
    dtdt.database_transaction_begin_time
FROM sys.dm_tran_session_transactions AS dtst
JOIN sys.dm_tran_active_transactions AS dtat
    ON dtst.transaction_id = dtat.transaction_id
JOIN sys.dm_tran_database_transactions AS dtdt
    ON dtdt.transaction_id = dtst.transaction_id
WHERE database_id = DB_ID('MultiTargetDemo')

image

Now that we have our problem reproduced, lets look at how we can use the information captured by our Extended Event Session to track it back to the source of the problem.  First we’ll query the pair_matching Target to find out information about the sqlserver.database_transaction_begin Event that was unmatched.

-- Query the XML to get the Target Data
SELECT 
    n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
    n.value('(event/action[@name="session_id"]/value)[1]', 'int') as session_id,
    n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
    n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') as tsql_stack,    
    n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)') as attach_activity_id
FROM
(    SELECT td.query('.') as n
    FROM 
    (
        SELECT CAST(target_data AS XML) as target_data
        FROM sys.dm_xe_sessions AS s 
        JOIN sys.dm_xe_session_targets AS t 
            ON t.event_session_address = s.address
        WHERE s.name = 'OrphanedTransactionHunter'
          AND t.target_name = 'pair_matching'
    ) AS sub
    CROSS APPLY target_data.nodes('PairingTarget/event') AS q(td)
) as tab
-- We are interested in unmatched sqlserver.database_transaction_begin Events
WHERE n.value('(event/@name)[1]', 'varchar(50)') = 'database_transaction_begin'
ORDER BY session_id, activity_id
GO

image

From this we can see our orphaned transaction event, and find the attach_activity_id of that Event.  The attach_activity_id Action is added to the Events in an Event Session when TRACK_CAUSALITY is turned ON.  There are two pieces of information contained in the attach_activity_id Action, the activity Guid (the first 36 characters of the value) and the sequence number for the Event, the number following the Guid.  The Guid can be used to find related Events, and the sequence number can be used to determine the order that the Events occurred.  By using the Guid from the attach_activity_id Action from our first query, we can query the ring_buffer Target and parse out the specific Events we are interested in looking at further.

-- Query the XML to get the Target Data
SELECT 
    n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],  
    n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
    n.value('(event/action[@name="session_id"]/value)[1]', 'int') as [session_id],
    n.value('(event/data[@name="object_id"]/value)[1]', 'int') AS [object_id],
    n.value('(event/data[@name="object_type"]/value)[1]', 'nvarchar(128)') AS [object_type],
    n.value('(event/data[@name="object_name"]/value)[1]', 'nvarchar(128)') AS [object_name],
    n.value('(event/data[@name="error"]/value)[1]', 'int') AS [error],
    n.value('(event/data[@name="severity"]/value)[1]', 'int') AS [severity],
    n.value('(event/data[@name="state"]/value)[1]', 'int') AS [state],
    n.value('(event/data[@name="user_defined"]/value)[1]', 'bit') AS [user_defined],
    n.value('(event/data[@name="message"]/value)[1]', 'nvarchar(4000)') AS [message],
    n.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
    n.value('(event/data[@name="row_count"]/value)[1]', 'int') AS [row_count],
    n.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
    n.value('(event/data[@name="reads"]/value)[1]', 'int') AS [reads],
    n.value('(event/data[@name="writes"]/value)[1]', 'int') AS [writes],
    n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') AS [tsql_stack],
    n.value('(event/data[@name="offset"]/value)[1]', 'int') AS [offset],
    n.value('(event/data[@name="offset_end"]/value)[1]', 'int') AS [offset_end],
    n.value('(event/data[@name="nest_level"]/value)[1]', 'int') AS [nest_level],           
     n.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text],
    CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id,
    CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) AS event_sequence
FROM
(    SELECT td.query('.') as n
    FROM 
    (
        SELECT CAST(target_data AS XML) as target_data
        FROM sys.dm_xe_sessions AS s 
        JOIN sys.dm_xe_session_targets AS t 
            ON t.event_session_address = s.address
        WHERE s.name = 'OrphanedTransactionHunter'
          AND t.target_name = 'ring_buffer'
    ) AS sub
    CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
) as tab
-- We are interested in unmatched sqlserver.database_transaction_begin Events
WHERE SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) = 'ADCF379A-4BCA-41BA-9B08-4C2265894392'
ORDER BY session_id, event_sequence
GO

 

With a little bit more work, we can reduce the XML parsing to only the important data elements that we need, and we can parse the tsql_stack Action to retrieve the related Event level statement_text from the sys.dm_exec_sql_text() DMF, since the sql_text Action did not have the intended information.

-- Query the XML to get the Target Data
SELECT 
    event_name,
    timestamp,
    database_id, 
    OBJECT_NAME(st.objectid, st.dbid) AS ObjectName,
    SUBSTRING(st.text, (tsql_stack.value('(/frame/@offsetStart)[1]', 'int')/2)+1, 
        ((CASE tsql_stack.value('(/frame/@offsetEnd)[1]', 'int')
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE tsql_stack.value('(/frame/@offsetEnd)[1]', 'int')
            END - tsql_stack.value('(/frame/@offsetStart)[1]', 'int'))/2) + 1) AS statement_text,
    duration,
    activity_id,
    event_sequence
FROM
(
SELECT 
    n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],  
    n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
    n.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
    CAST(n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') AS XML) AS [tsql_stack],
    CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id,
    CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) AS event_sequence
FROM
(    SELECT td.query('.') as n
    FROM 
    (
        SELECT CAST(target_data AS XML) as target_data
        FROM sys.dm_xe_sessions AS s 
        JOIN sys.dm_xe_session_targets AS t 
            ON t.event_session_address = s.address
        WHERE s.name = 'OrphanedTransactionHunter'
          AND t.target_name = 'ring_buffer'
    ) AS sub
    CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
) as tab
) as tab2
CROSS APPLY sys.dm_exec_sql_text(tsql_stack.value('xs:hexBinary(substring((/frame/@handle)[1], 3))', 'varbinary(max)')) AS st
-- We are interested in Events in activity_id sequence of the orphaned transaction only
WHERE activity_id = 'ADCF379A-4BCA-41BA-9B08-4C2265894392'
ORDER BY session_id, event_sequence
GO

image

The highlighted value shows that the stored procedure execution ended at the Execution Timeout limit that was set for the PoorApp SSMS Instance.  Beyond that we can track each of the statements and see that when the execution ended, it was in the INSERT INTO RandomObjectsArchive statement in the stored procedure allowing us to target our efforts at troubleshooting to the specific problem in a short amount of time.

3 thoughts on “An XEvent a Day (11 of 31) – Targets Week – Using Multiple Targets to Debug Orphaned Transactions

  1. I am sort of confuzzled as to why this would be happening. It is true that .NET recycles connections through the use of a pool. However every time a connection is taken out of the pool for reuse, the connection manager executes an (internal) procedure called sp_reset_connection which, among other things, terminates whatever open transactions were left from the previous time the connection was in use. So why would the .NET scenario lead to an orphaned transaction situation?

    1. It is possible that the connection is not reused for some time, e.g. when there is an abundant number of connections in the pool and the pool is configured not to shrink the pool. Depending on whatever happens on the server it is possible that the connection is not reused for some time and then sp_reset_connection is not called.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.