SQL 101: Parallelism Inhibitors – Scalar User Defined Functions

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

At the spring SQLintersection conference in Orlando, one of the attendees asked me why a query wouldn't go parallel even though the plan's cost was in the hundreds. There are a number of reasons why a query might execute serially, but one of the first things that comes to mind is scalar user defined functions. Developers love scalar user defined functions in SQL Server because they compartmentalize code and make it easy to reuse, a common goal in object-oriented programming, but they are also a performance anti-pattern for SQL Server, as I'll demonstrate in this post.
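When a high-cost query stays serial, it's worth ruling out the instance-level settings that gate parallelism before looking at the query itself. A quick check (nothing here is specific to the example that follows):

-- Instance settings that commonly control parallelism: a plan can only go
-- parallel if its cost exceeds 'cost threshold for parallelism' and
-- 'max degree of parallelism' is not set to 1
SELECT [name], value_in_use
FROM sys.configurations
WHERE [name] IN (N'cost threshold for parallelism', N'max degree of parallelism');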

For the purposes of this post, I'm using the WideWorldImporters Standard Edition example database, and I'm going to recreate a scalar user defined function that I recently saw in a client engagement, where it was used to format the output of a column strictly for client presentation purposes. Using the [Warehouse].[ColdRoomTemperatures_Archive] table, which stores temperature readings at 5-second intervals, as the basis for my example report, we are going to calculate the temperature delta between samples. If the delta is greater than zero, we want it shown with a + sign before the value, and if it is below zero, with a - sign. An example query to provide the basic data set for the report would be:

USE [WideWorldImporters]
GO
SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    b.Temperature - a.Temperature AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom;

For the sake of argument, the scenario here is purely a presentation-layer concern, and the case can be made that the formatting should be handled by the presentation/application tier. However, there could be scenarios where the same data is needed in more than one place (applications, reports, API feeds, etc.), so for consistency the formatting is determined to be required in the SQL output. This is where a developer might write a function to handle the formatting, so the code can be reused anywhere temperature deltas are output and everything formats them exactly the same way.

IF OBJECT_ID('dbo.GetFormatedTemperatureDelta') IS NOT NULL
    DROP FUNCTION dbo.GetFormatedTemperatureDelta;
GO
CREATE FUNCTION dbo.GetFormatedTemperatureDelta
(@StartingTemperature decimal(10,2), @EndingTemperature decimal(10,2))
RETURNS VARCHAR(10)
AS 
BEGIN
    DECLARE @Result VARCHAR(10);
    SET @Result =    CASE 
                        WHEN @StartingTemperature-@EndingTemperature > 0 
                            THEN '+'+CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        WHEN @StartingTemperature-@EndingTemperature < 0 
                            THEN CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        ELSE '0.00' 
                    END;    
    RETURN(@Result);
END
GO
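In isolation the function does exactly what was asked. A quick sanity check with made-up values (these aren't from the example data):

-- Illustrative inputs only; expected outputs shown in the comments
SELECT dbo.GetFormatedTemperatureDelta(20.00, 18.50) AS PositiveDelta, -- '+1.50'
       dbo.GetFormatedTemperatureDelta(18.50, 20.00) AS NegativeDelta, -- '-1.50'
       dbo.GetFormatedTemperatureDelta(20.00, 20.00) AS ZeroDelta;     -- '0.00'

Applied to the report query, the function replaces the inline subtraction: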

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    dbo.GetFormatedTemperatureDelta(b.Temperature, a.Temperature) AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom;
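To compare the two versions I'm relying on STATISTICS TIME, which has to be enabled for the session before running either query:

-- Report per-statement CPU and elapsed time in the Messages output
SET STATISTICS TIME ON;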

As soon as this formatting is applied to the reporting query using the function, performance may or may not be noticeably impacted. For the current data being requested by the report, the CPU time for both queries is just over one second, as shown by the STATISTICS TIME output for each (first without the function, then with it):

SQL Server Execution Times:
CPU time = 1155 ms,  elapsed time = 643 ms.

SQL Server Execution Times:
CPU time = 1250 ms,  elapsed time = 1680 ms.

However, notice the difference in elapsed time between the original statement and the statement using the user defined function: the version with the function took over 2.5 times as long. This problem gets worse as the data set grows, but what we don't see here is the impact the function has on the way the query executes. Comparing the execution plans before and after the function was added:

[Image: Plan Without Function]

[Image: Plan With Function]
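You don't have to eyeball the graphical plans to see this. On SQL Server 2012 and later, a serial plan's XML records the optimizer's reason in a NonParallelPlanReason attribute; for scalar UDFs this typically reports CouldNotGenerateValidParallelPlan. A minimal sketch against the plan cache, assuming the UDF query's plan is still cached:

-- Pull the NonParallelPlanReason attribute from the root of the cached plan XML
SELECT st.[text],
       qp.query_plan.value(
           'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            (//p:QueryPlan/@NonParallelPlanReason)[1]', 'varchar(200)') AS NonParallelPlanReason
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE N'%GetFormatedTemperatureDelta%'
  AND st.[text] NOT LIKE N'%dm_exec_cached_plans%'; -- exclude this query itself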

As the two plans show, the query executed in parallel before the function was added and serially after. If we look at what happens during each statement's execution with Extended Events, using the sqlserver.sql_statement_completed and sqlserver.module_end events, filtered to my specific session_id and with TRACK_CAUSALITY=ON for the session so related events can be tied together, we'll find that the function executes once for every row returned by the query, turning our set-based operation into a RBAR (row-by-agonizing-row) operation.

CREATE EVENT SESSION [TrackFunctions] ON SERVER 
ADD EVENT sqlserver.module_end(
    WHERE ([sqlserver].[session_id]=(85))),
ADD EVENT sqlserver.sql_statement_completed(
    WHERE ([sqlserver].[session_id]=(85)))
WITH (TRACK_CAUSALITY=ON)
GO
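The session definition above hard-codes session_id 85 from my test connection; swap in your own value (SELECT @@SPID) before creating it. Then start the session, run the two queries, and stop it:

-- Start collecting, run the queries with and without the function, then stop
ALTER EVENT SESSION [TrackFunctions] ON SERVER STATE = START;
GO
ALTER EVENT SESSION [TrackFunctions] ON SERVER STATE = STOP;
GO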

[Image: Extended Event Session Results (grouped by activity_id)]

We could change the function to do nothing at all and the impact would be exactly the same. A query with a scalar user defined function will always run serially and row by row, with one exception: a SQLCLR scalar function that performs no data access can still allow parallelism. However, for logic this simple, SQLCLR wouldn't be required or recommended just to regain parallel query execution. The easier fix is to simply inline the code in the original query:

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    CASE WHEN b.Temperature - a.Temperature > 0 THEN '+' + CAST(b.Temperature - a.Temperature AS VARCHAR)
         WHEN b.Temperature - a.Temperature < 0 THEN CAST(b.Temperature - a.Temperature AS VARCHAR)
         ELSE '0.00'
    END AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom
GO

SQL Server Execution Times:
CPU time = 1233 ms,  elapsed time = 676 ms.

[Image: Plan with inline expression]

Here we have the fast execution time and a parallel plan back, but we are sacrificing the reuse of the code in other places where we might want to encapsulate the same logic. That doesn't have to be the case, though: since this logic can be expressed inline in the query, it can also be written as an inline table-valued function.

IF OBJECT_ID('dbo.GetFormatedTemperatureDelta_tvf') IS NOT NULL
    DROP FUNCTION dbo.GetFormatedTemperatureDelta_tvf;
GO
CREATE FUNCTION dbo.GetFormatedTemperatureDelta_tvf
(@StartingTemperature decimal(10,2), @EndingTemperature decimal(10,2))
RETURNS TABLE
AS 
RETURN (SELECT Delta = CASE 
                        WHEN @StartingTemperature-@EndingTemperature > 0 
                            THEN '+'+CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        WHEN @StartingTemperature-@EndingTemperature < 0 
                            THEN CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        ELSE '0.00' 
                    END)
GO

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
CROSS APPLY dbo.GetFormatedTemperatureDelta_tvf(b.Temperature, a.Temperature)
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom
GO

SQL Server Execution Times:
CPU time = 1251 ms,  elapsed time = 729 ms.

[Image: Plan With TVF]

Here we get the best of both worlds: a parallel execution plan, and the ability to reuse the logic in other places where we need the formatted output. If you look closely at the plan and its Compute Scalar operator, you will find it is identical to the plan with the inline code shown above. As a consultant, I've converted a lot of scalar user defined functions to inline table-valued functions to resolve performance issues and improve code scalability for clients. This is a very common problem to see in engagements, and understanding the performance impact of scalar user defined functions is important for writing fast, scalable T-SQL code.

Tracking Compiles with Extended Events

This post comes from a question by Tara Kizer on DBA StackExchange related to Compilations/Sec in PerfMon and using Extended Events to track the compilations happening in SQL Server for one of their clients. There are a number of events in Extended Events that relate to plans in SQL Server, and Tara was using the sqlserver.query_pre_execution_showplan event as a way of tracking compiles. When I read the post on StackExchange, this stood out to me as not being the correct event for tracking an actual compilation; I would have selected the query_post_compilation_showplan event instead. These two events have similar descriptions, but they differ by one specific word:

query_pre_execution_showplan

Occurs after a SQL statement is compiled. This event returns an XML representation of the estimated query plan that is generated when the query is optimized. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.

query_post_compilation_showplan

Occurs after a SQL statement is compiled. This event returns an XML representation of the estimated query plan that is generated when the query is compiled. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.

NOTE: Both of these event descriptions state “Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.” There are only a handful of events in Extended Events that include this warning in their description and each can introduce significant performance impacts to production workloads, so using them in live workloads is not something I’d recommend. (query_post_execution_showplan impact post)

The events aren’t exactly the same in description and occur at different times using the repro example from the StackExchange post. Using a much larger event session definition that includes additional events, it is easy to see where compilations actually are happening.  The event session definition used for proofing this is:

CREATE EVENT SESSION [Compiles] ON SERVER
ADD EVENT sqlserver.query_post_compilation_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.query_pre_execution_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_hit(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_insert(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_miss(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_remove(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_statement_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_batch_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_recompile(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.uncached_sql_batch_statistics(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0)))
WITH (MAX_MEMORY=4096 KB,
      EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY=30 SECONDS,
      MAX_EVENT_SIZE=0 KB,
      MEMORY_PARTITION_MODE=NONE,
      TRACK_CAUSALITY=ON,
      STARTUP_STATE=OFF)
GO
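Since the session defines no target, the events can be viewed with Watch Live Data in Management Studio while it runs. Start it before executing the repro:

-- Start the session; stop it the same way with STATE = STOP when finished
ALTER EVENT SESSION [Compiles] ON SERVER STATE = START;
GO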

Using part of the repro example from the StackExchange post, with slight changes to clear the plan cache and to break the steps down into individual batches and then individual statements, we can show how the plans are used from cache and where compiles actually occur.

CREATE TABLE t1 (transaction_id int, Column2 varchar(100));
CREATE TABLE t2 (Column1 varchar(max), Column2 varchar(100));
GO

CREATE TRIGGER t2_ins
ON t2
AFTER INSERT
AS

INSERT INTO t1
SELECT (SELECT TOP 1 transaction_id FROM sys.dm_tran_active_transactions), Column2
FROM inserted;
GO

DBCC FREEPROCCACHE
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
INSERT INTO t2 VALUES ('row2', 'value2');
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
INSERT INTO t2 VALUES ('row2', 'value2');
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
GO

INSERT INTO t2 VALUES ('row2', 'value2');
GO

Here you can see the first compilations happening for the INSERT statements, which are auto-parameterized as prepared plans (the green box). The trigger is compiled (the red box) and its plan is inserted into the cache, as shown by the sp_cache_insert event. Then, in the orange box, the trigger execution gets a cache hit and reuses the trigger plan for the second INSERT statement in the batch, so the trigger isn't compiling for every execution of the INSERT command, and the plan does get reused, as the sp_cache_hit event for the trigger shows. The uncached_sql_batch_statistics event also fires for the initial batch, since it doesn't have a plan in cache and has to compile.
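The cache activity can also be cross-checked from the plan cache DMVs; a sketch that isn't part of the original repro:

-- The auto-parameterized INSERT shows objtype = 'Prepared', while the
-- submitted batches show objtype = 'Adhoc'
SELECT cp.objtype, cp.usecounts, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE N'%INSERT INTO t2%'
  AND st.[text] NOT LIKE N'%dm_exec_cached_plans%'; -- exclude this query itself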

If we run the two INSERT statements individually again after the first execution, the trigger doesn't compile again, as shown in the events below:

Here the first statement gets a cache hit for the prepared, auto-parameterized version of the statement in cache, but a miss for the adhoc batch that was submitted. The trigger gets a cache hit and doesn't compile again, as shown in the red block of events. The green block of events repeats this behavior for the second INSERT statement run as a separate batch. However, in every case you still see the query_pre_execution_showplan event firing, which I can only attribute to the 'optimized' vs. 'compiled' difference in the event descriptions; the trigger isn't compiling for every execution, as this series of events shows. The query_post_compilation_showplan event only fires when a plan is actually being compiled for the statements and/or the trigger.
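To finish, a cleanup sketch for the repro objects and the event session:

DROP TABLE t2; -- dropping the table also drops the t2_ins trigger
DROP TABLE t1;
DROP EVENT SESSION [Compiles] ON SERVER;
GO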

SQL Sentry Plan Explorer Completely Free!

Just in case you missed the announcement by Greg Gonzalez at SQL Sentry this morning, Plan Explorer Pro and what would have become Plan Explorer Ultimate are now completely free.

[Image: plan-explorer-blog-post]

Since SQL Sentry first released Plan Explorer in late 2010, I have been a huge fan, and it is one of the tools I use most heavily in consulting work. I recommend it to clients almost every day and routinely demonstrate the different editions during our Immersion Events and at SQL Saturdays. Where the previous Free edition simplified analysis of execution plans, the features carried over from the previous Pro edition, and from what would have become the Ultimate edition, make this hands down the best tool for performance tuning and execution plan analysis. The new Index Analysis tab, with the statistics histogram graphed out, will make analyzing plan skew due to parameter sensitivity much easier. This is a feature I've been waiting months to see live and can't wait to make more use of in client engagements.

[Image: index-analysis]

Add to that the ability to analyze deadlock graphs, collect execution wait statistics with Extended Events, keep query execution plan history with notes, and use advanced plan layout and filtering options. Now it is all COMPLETELY FREE! As if that's not good enough, you can also attend a free query tuning seminar with Aaron Bertrand on Friday, September 9th, 11:00AM-12:00PM EST, where he will demonstrate how to use the new tool more effectively.

Free Seminar on Query Tuning