Execution Plan Sanitizer v2

Almost two years ago I blogged about Sanitizing Execution Plans using PowerShell. The original intent behind this idea was to be able to provide completely sanitized execution plans that could be shared outside of your business environment without the risk of sharing any design elements that may be considered Intellectual Property. Since writing that post I’ve had many people contact me with suggestions for the sanitized output, and so I’ve made a bunch of minor changes to the code to fix various output issues and suggestions.

Fast forward two years and I believe this idea is even more useful today, especially since you can now upload execution plans using SQL Sentry Plan Explorer to the SQLPerformance.com website to get assistance from the community in diagnosing performance problems or query plan regressions. When I found out about this feature being included in Plan Explorer, my first question was whether the ability to sanitize or obfuscate the plan would be included, and I found that it wasn’t part of the initial launch plans but may eventually be part of the product. To be perfectly honest, the number of total users that worry about schema information in an execution plan is a minority so from a business standpoint I totally understand it not being in the first release of this functionality. We’ve already seen a few plans posted to the SQLPerformance.com site show up on the #sqlhelp tag on Twitter, so I know that the new upload feature is being used.

Earlier today I was asked if I realized that the script was broken and didn’t sanitize the output XML properly. Unfortunately I never took the time to upload the changes I’ve made over the last two years, so ran my local script through some plans and then asked my good friend Aaron Bertrand (Blog|Twitter) to test out a new version of it for me. He sent me back some feedback which I was able to quickly implement and now there is a v2 version of the script that should handle most of the plan sanitizing issues that existed in the initial version of the PowerShell script.

2013_2_27_ExecutionPlanSanitizer

What plan_handle is Extended Events sqlserver.plan_handle action returning?

The topic for today’s post comes from a forums question and a subsequent Connect feedback item where someone noted that the plan_handle being returned by Extended Events using the sqlserver.plan_handle action was not available in the plan cache even when queried immediately following completion of the event that should have produced the plan cache entry.  To setup the appropriate context of this post, let’s first take a look at the original repro for this problem.

-- Create the Event Session
IF EXISTS(SELECT * 
          FROM sys.server_event_sessions 
          WHERE name='SQLStmtEvents')
    DROP EVENT SESSION SQLStmtEvents 
    ON SERVER;
GO
CREATE EVENT SESSION SQLStmtEvents
ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    ACTION (sqlserver.client_app_name,
            sqlserver.plan_handle,
            sqlserver.sql_text,
            sqlserver.tsql_stack,
            package0.callstack,
            sqlserver.request_id)
--Change this to match the AdventureWorks, 
--AdventureWorks2008 or AdventureWorks2008 DB_ID()
WHERE sqlserver.database_id=5 
)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY=5SECONDS, TRACK_CAUSALITY=ON)
GO
 
-- Start the Event Session
ALTER EVENT SESSION SQLStmtEvents 
ON SERVER 
STATE = START;
GO
 
-- Change database contexts and insert one row
USE AdventureWorks2008;
GO
INSERT INTO [dbo].[ErrorLog]([ErrorTime],[UserName],[ErrorNumber],[ErrorSeverity],[ErrorState],[ErrorProcedure],[ErrorLine],[ErrorMessage])
VALUES(getdate(),SYSTEM_USER,-1,-1,-1,'ErrorProcedure',-1,'An error occurred')
GO
 
-- Retrieve the Event Data from the Event Session Target
SELECT
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    CAST(event_data.value('(event/action[@name="plan_handle"]/value)[1]', 'varchar(max)') AS XML) as plan_handle,
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS sql_text
FROM(    SELECT evnt.query('.') AS event_data
        FROM
        (    SELECT CAST(target_data AS xml) AS TargetData
            FROM sys.dm_xe_sessions AS s
            JOIN sys.dm_xe_session_targets AS t
                ON s.address = t.event_session_address
            WHERE s.name = 'SQLStmtEvents'
              AND t.target_name = 'ring_buffer'
        ) AS tab
        CROSS APPLY TargetData.nodes ('RingBufferTarget/event') AS split(evnt) 
     ) AS evts(event_data)
     
-- Use the plan_handle from one of the Events to get the query plan
DECLARE @plan_handle varbinary(64) = 0x06000800DD8D6D0840015585000000000000000000000000
SELECT * 
FROM sys.dm_exec_query_plan(@plan_handle)

It doesn’t matter how many times you run the INSERT statement in the above repro, you won’t get a plan_handle back from the Event Session that you can query from the plan cache using sys.dm_exec_query_plan().  However, if you were to search the plan cache XML by parsing it, you would find a cached execution plan for the INSERT statement:

SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans cp 
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text like '%INSERT%INTO%ErrorLog%'
-- Prevent caching this statement
OPTION(RECOMPILE);

The output of this will be similar to:

image

If you look at the text output you will notice that this isn’t exactly the statement that was executed, so what happened here exactly?  When an adhoc statement is submitted to SQL Server, the optimizer will auto parameterize the inline literals with variables to compile the query plan for execution, and this is what is cached in by SQL Server in the case of this INSERT statement.  You can always tell that a plan was auto parameterized by the way that the variables are listed inside of the statement text for the plan.  The topic of plan caching and auto parameterization are covered in detail in the MSDN whitepaper Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005.

The problem in this case is not that Extended Events is returning incorrect information for the plan_handle; the handle that is being returned is valid at the point that the event fired for the adhoc request against SQL Server, the adhoc plan is not cached by the Database Engine, and instead the auto parameterized plan is what is actually cached.  The caching of the auto parameterized plan is actually an optimization inside of SQL Server in this case since the plan for the inline literals is not likely to be reused by the optimizer for subsequent adhoc SQL Statements performing the exact same INSERT operation but with different literals for the values being inserted.  This was discussed at some depth on the SQL Programmability & API Development Team Blog post: 4.0 Query Parameterization.  Under SQL Server 2005 RTM and SQL Server 2005 SP1, there was a significant problem with plan cache bloat related to single use adhoc query plans and a significant change was made to the caching behavior of SQL Server 2005 in SP2 that have been carried forward in the product as detailed in the SQL Programmability & API Development Team Blog post: 3.0 Changes in Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2.

If we change the original repro for this problem to use a parameterized query the plan_handle returned by the sqlserver.plan_handle action will return the same plan_handle that is cached by SQL Server as show in the following modification of the original repro for this problem:

-- Create the Event Session
IF EXISTS(SELECT * 
          FROM sys.server_event_sessions 
          WHERE name='SQLStmtEvents')
    DROP EVENT SESSION SQLStmtEvents 
    ON SERVER;
GO
CREATE EVENT SESSION SQLStmtEvents
ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    ACTION (sqlserver.client_app_name,
            sqlserver.plan_handle,
            sqlserver.sql_text,
            sqlserver.tsql_stack,
            package0.callstack,
            sqlserver.request_id)
--Change this to match the AdventureWorks, 
--AdventureWorks2008 or AdventureWorks2008 DB_ID()
WHERE sqlserver.database_id=5 
)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY=5SECONDS, TRACK_CAUSALITY=ON)
GO
 
-- Start the Event Session
ALTER EVENT SESSION SQLStmtEvents 
ON SERVER 
STATE = START;
GO
 
-- Change database contexts and insert one row parameterized
USE AdventureWorks2008R2;
GO
DECLARE @ErrorTime datetime = GETDATE(),
        @UserName sysname = SYSTEM_USER,
        @ErrorNumber int = -1,
        @ErrorSeverity int = -1,
        @ErrorState int = -1,
        @ErrorProcedure nvarchar(126) = 'ErrorProcedure',
        @ErrorLine int = 10, 
        @ErrorMessage nvarchar(4000) = 'An error occured'
        
INSERT INTO [dbo].[ErrorLog]
    ([ErrorTime],[UserName],[ErrorNumber],[ErrorSeverity],[ErrorState],
     [ErrorProcedure],[ErrorLine],[ErrorMessage])
VALUES(@ErrorTime,@UserName,@ErrorNumber,@ErrorSeverity,@ErrorState,
        @ErrorProcedure,@ErrorLine,@ErrorMessage)
GO 5
 
-- Retrieve the Event Data from the Event Session Target
SELECT event_name,
    plan_handle,
    sql_text,
    query_plan
FROM
(
SELECT
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    CAST(event_data.value('(event/action[@name="plan_handle"]/value)[1]', 'varchar(max)') AS XML) as plan_handle,
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS sql_text
FROM(    SELECT evnt.query('.') AS event_data
        FROM
        (    SELECT CAST(target_data AS xml) AS TargetData
            FROM sys.dm_xe_sessions AS s
            JOIN sys.dm_xe_session_targets AS t
                ON s.address = t.event_session_address
            WHERE s.name = 'SQLStmtEvents'
              AND t.target_name = 'ring_buffer'
        ) AS tab
        CROSS APPLY TargetData.nodes ('RingBufferTarget/event') AS split(evnt) 
     ) AS evts(event_data)
) AS tab
CROSS APPLY sys.dm_exec_query_plan(plan_handle.value('xs:hexBinary(substring((plan/@handle)[1], 3))', 'varbinary(max)')) as qp

The output of this execution will result in something similar to the following:

image

Note that the plan_handle is the same for every execution of the INSERT operation, and the plan is readily available from the plan cache because the statement executed was parameterized explicitly in this case by the declaration of variables.  This is yet another case for why explicit parameterization of queries is so incredibly important in SQL Server and why adhoc statements, despite the coding simplicity for application developers is a bad design pattern.

One item that you might notice that is different about the parameterized version of the repro for this is that I used the GO 5 operation to execute the parameterized version of the INSERT operation multiple times.  The reason behind this is that when I ran the initial test of the parameterized version of the INSERT tonight, the first plan was not cached by the Database Engine.  You might ask why this was the case, and there is a really good reason.  As a part of the SQLskills Immersion Training I take part in as an instructor, I had the ‘optimize for adhoc workloads’ sp_configure option set on my laptop, and this causes a plan stub to be generated for adhoc batches, which the above parameterize query actually is, for the first execution, and then the full plan is cached on the second execution only.  To mitigate against the second repro being a problem for someone I decided to have it execute the INSERT five consecutive times to force caching of the plan to show that parameterization does impact the actual plan handle returned by sqlserver.plan_handle action.

Sanitizing Execution Plans using Powershell

A few weeks ago I tweeted a couple of images showing a really nasty execution plan in SQL Sentry’s free Plan Explorer tool being displayed on a 65 inch touch screen LCD monitor.

Large Plan Explorer Plan Large Plan Explorer Plan2

 

I’ve been using Plan Explorer to look at execution plans since its early beta’s, when it was known as Plan Viewer, after seeing a demo of Performance Advisor v6 by Greg Gonzalez (Blog|Twitter).  During the beta I provided a lot of feed back to Greg and I pushed the tool to its limits a couple of times with some of the nasty execution plans I had.  In the case of the plan shown above, Management Studio falls to its knees trying to manage the plan, and is everything except useful when the plan is open, but Plan Explorer handles it without a hitch.

After posting these images and getting some comments, I began to realize how useful this type of plan could be in doing demo’s of the tool, but one of the problems with an execution plan is that it contains a lot of very specific information about the database structures and code that could be considered intellectual property.  It is possible to retain the integrity of the plan structure, and what it shows, while removing all of the object and statement information from the plan by replacing them with meaningless values.  During the beta’s I did this a couple of times when I came across an issue, or thought of a useful feature, but it was always a painstaking manual process.  This time I set out to do it with Powershell by parsing the XML Showplan and performing replace operations on the XML nodes. 

Loading the XML Showplan from a .sqlplan file is relatively straightforward, and I’ve done quite a bit of plan shredding in the past, so I thought this would be a very simple task to complete.  It turns out that I had a bit to learn about processing XML in .NET and by extension Powershell.  The first problem I encountered was how to use the ShowPlan XmlNamespace so that I could shred the document to find the elements of interest.  This is accomplished using the XmlNamespaceManager in .NET.

#Setup the XmlNamespaceManager and add the ShowPlan Namespace to it.

$nsMgr = new-object 'System.Xml.XmlNamespaceManager' $xml.NameTable;

$nsMgr.AddNamespace("sm", "http://schemas.microsoft.com/sqlserver/2004/07/showplan");

The SelectNodes() method of the .NET XmlDocument is overloaded and accepts an XPath and XmlNamespaceManager to determine the nodes to select from the XmlDocument.  With the correct nodes selected replacing the information as easy as setting a new value for the the element of interest.

#Replace the StatementText for all statements

$xml.SelectNodes("//sm:StmtSimple", $nsMgr) | % {

$_.StatementText = "–Statement text stripped by ExecutionPlanSanitizer Powershell Script (http://www.sqlskills.com/blogs/jonathan)" } 

The attached Powershell script replaces the statements, database names, object names, index names, non-dbo schema names,  and column names of the input file and then writes the new plan to an output file that still conforms to the schema and can be shared without sharing intellectual property.

ExecutionPlanSanitizer.ps1 (3.26 kb)