Common Table Expressions (CTEs), window functions, and views are all common things in SQL Server development these days. When CTEs and window functions were introduced in SQL Server 2005, a lot of developers jumped in and began using them to solve problems and simplify code. While these tools can be a great benefit in SQL Server, they can also be the cause of significant performance problems in certain scenarios. I recently engaged in some performance tuning with a client after a code change was deployed to their production server that resulted in CPU usage of nearly 100% across all 32 logical processors. The resulting performance problems in their application made this a critical situation for the client. The fix required a few hours rewriting a couple of the modified stored procedures to reduce the CPU usage back down to the normal range of 10-20%.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb Common Table Expressions (CTEs), Window Functions, and Views   image thumb1
CPU Usage Initially CPU Usage After Rewrites

The root cause of the problem was something I consider to be a coding anti-pattern, using a CTE with a window function inside of a view.  As we’ll see in this post, mixing the three of these together can result in significant performance problems from the need to materialize the intermediate results of the view before filtering can be applied.

Setting up The Problem

To demonstrate the problem I’ll first create a new table in AdventureWorks2012 to show the impact on performance as the number of rows increases in our test table. This table will be based on the HumanResources.EmployeePayHistory table and can be generated with the script attached to this post.

  • To create the table with 316 rows, comment out the second INSERT statement entirely
  • To create the table with 11,916 rows, change the GO for the second INSERT to be GO 40
  • To create the table with 116,316 rows, change the GO for the second INSERT to be GO 400
  • To create the table with 1,160,316, change the GO for the second INSERT to be GO 4000

With my table created, I’ll create a view to return the current pay value for the employees as follows:

CREATE VIEW dbo.EmployeeCurrentPayData
AS

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1;

GO

Then I’ll create a stored procedure for my application to be able to get the current pay value for a specific employee by the BusinessEntityID for the employee.

CREATE PROCEDURE dbo.GetEmployeeCurrentPay_View (@BusinessEntityID INT)
AS
BEGIN

SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM dbo.EmployeeCurrentPayData
WHERE BusinessEntityID = @BusinessEntityID;

END
GO

If I run the stored procedure for any employee’s BusinessEntityID the results will always be a single row.

DECLARE @BusinessEntityID INT = 250;
EXECUTE dbo.GetEmployeeCurrentPay_View @BusinessEntityID;

Common Table Expressions (CTEs), Window Functions, and Views   image thumb2

For a table with 316 rows of pay history, the execution plan will have the following shape:

Common Table Expressions (CTEs), Window Functions, and Views   image thumb3

The execution plan shows that the tables were all scanned and the CTE was computed before the filter predicate on BusinessEntityID was applied. I can see this in the tool-tip for the Filter operator and by the number of input rows versus output rows for the Filter.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb4

With a small number of rows, this doesn’t show up as being a significant problem as shown by the STATISTICS IO and STATISTICS TIME output for the execution.

SQL Server parse and compile time:
   CPU time = 14 ms, elapsed time = 14 ms.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 897, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

The total execution time was just 4ms with 316 rows. If I scale up our test table to 11,916 rows, the plan changes to using a Merge Join with the EmployeePayHistoryEnlarged table, but I can still see that the CTE was fully computed before the filtering on BusinessEntityID was applied.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb5

Performance wise, the stored procedure executes in 85ms and only uses 31ms of CPU time as shown by the execution statistics.

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 9 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 86, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 897, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 85 ms.
 

The real problems start to occur when the number of rows is increased to 116,316. The increased cost of computing the CTE entirely results in an execution plan that uses parallelism.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb6

Testing this stored procedure on its own wouldn’t make you think that there was a performance problem lying in wait for a production system. Looking at the execution statistics, the procedure executed in 240ms, but the important item to look at is the shift in proportion of CPU time. Since the query used parallelism, I now have significantly more CPU time, by a factor of 3 in the 4vCPU VM I tested this on, which is the result of multiple threads processing the execution.

SQL Server parse and compile time:
   CPU time = 16 ms, elapsed time = 16 ms.

Table 'Employee'. Scan count 25, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 25, logical reads 319, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 25, logical reads 949, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 643 ms,  elapsed time = 240 ms.

The problem with this specific procedure is that it doesn’t just get executed once, it gets executed thousands of times every couple of seconds, so the cumulative effect of the CPU time required to compute the CTE completely becomes significant enough to run a 32-core server at nearly 100% CPU utilization as shown in the images at the top of the post.

The problem only gets worse as the number of rows in the table increases, as shown by the execution plan and execution statistics for 1,160,316 rows.

Note: Depending on the hardware that the tests are run on, the plan might be different for these tests. For example the first plan below was generated with 4 vCPUs while the second plan using the Hash Join and Bitmap optimization was generated on the same server but with the -Pn startup parameter to start SQLOS with 24 logical schedulers, as showed in my post SQL Server and Soft NUMA. In either case parallelism is used and the CPU time is roughly three times the execution time.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb7

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 290, logical reads 7416, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 923, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 5, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 4336 ms,  elapsed time = 1788 ms.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb8

SQL Server parse and compile time:
   CPU time = 23 ms, elapsed time = 23 ms.

Table 'Employee'. Scan count 25, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 25, logical reads 319, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 290, logical reads 7656, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 6209 ms,  elapsed time = 2235 ms.

How do you fix this?

Fixing this problem is actually very simple; don’t use the VIEW and move the WHERE clause for the BusinessEntityID column up into the CTE so that it can be used to filter the tables before computation occurs.

CREATE PROCEDURE dbo.GetEmployeeCurrentPay_NoView (@BusinessEntityID INT)
AS
BEGIN

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID
WHERE per.BusinessEntityID = @BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1;

END
GO

By pushing the WHERE clause up into the CTE, with 316 rows we get an execution plan that shows only a couple of rows are being returned from the tables during execution.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb9

If I look at the tool-tip for the Clustered Index Seek on the EmployeePayHistoryEnlarged table, I’ll see the Seek Predicate being applied to the BusinessEnityID column to filter the results.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb10

The execution statistics for the rewritten procedure also shows the reduction in logical reads and the execution time.

SQL Server parse and compile time:
   CPU time = 4 ms, elapsed time = 4 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

I also get plan stability with this rewrite, regardless of the number of rows I have in our EmployeePayHistoryEnlarged table. At 1,160,316 rows, the execution plan is still the same, except that I can see more rows being returned from EmployeePayHistoryEnlarged to be able to identify the most recent pay rate.

Common Table Expressions (CTEs), Window Functions, and Views   image thumb11

The execution statistics also shows the reduction in logical reads, and the lower CPU time and execution time for the procedure.

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 152 ms.

What was Actually the Problem?

Since I have three different coding aspects at play here, you are probably wondering what exactly was the root problem? If you aren’t wondering that, you aren’t alone, I didn’t either, but after doing a review of the article, my colleague Joe Sack pointed out that it was unclear where the problem actually was. I have to agree with Joe; so far I haven’t really shown definitively where the problem is. Is it the fact that I have a CTE within a view? Let’s find out. First I’ll rewrite a new view that only contains a CTE, but still provides the same result as the original test query, using a sub-query to determine the MAX(RateChangeDate) from the EmployeePayHistoryEnlarged table for the BusinessEntityID and PayFrequency.

/* Drop the view before creating it */
IF OBJECT_ID(N'EmployeeCurrentPayData_NoWF') IS NOT NULL
BEGIN
    DROP VIEW dbo.EmployeeCurrentPayData_NoWF;
END
GO

/* Create the problem view for the tests */
CREATE VIEW dbo.EmployeeCurrentPayData_NoWF
AS
WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID
WHERE RateChangeDate = (SELECT MAX(RateChangeDate)
                        FROM HumanResources.EmployeePayHistoryEnlarged AS pay2
                        WHERE pay.BusinessEntityID = pay2.BusinessEntityID
                          AND pay.PayFrequency = pay.PayFrequency)
)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges;
GO

/* Drop the stored procedure using the view before creating it */
IF OBJECT_ID(N'GetEmployeeCurrentPay_ViewNoWF') IS NOT NULL
BEGIN
    DROP PROCEDURE dbo.GetEmployeeCurrentPay_ViewNoWF;
END
GO

/* Create the stored procedure that uses the view to tes the problem */
CREATE PROCEDURE dbo.GetEmployeeCurrentPay_ViewNoWF (@BusinessEntityID INT)
AS
BEGIN
    SELECT
BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
    FROM dbo.EmployeeCurrentPayData_NoWF
    WHERE BusinessEntityID = @BusinessEntityID;
END
GO

Now a simple test by running the new procedure and reviewing the execution plan for 1,160,316 rows:

Common Table Expressions (CTEs), Window Functions, and Views   image thumb12

SQL Server parse and compile time:
   CPU time = 12 ms, elapsed time = 12 ms.

Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 102 ms.

I think it is pretty apparent that using a CTE in a view isn’t the problem. Since I now know that the issue isn’t the CTE within the view, how about the window function inside of the CTE? To test this, I can take the view definition, add a filter to the query against the CTE for the BusinessEntityID, and look at the resulting plan and performance:

DECLARE @BusinessEntityID INT = 250;

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1
  AND BusinessEntityID = @BusinessEntityID;

Common Table Expressions (CTEs), Window Functions, and Views   image thumb13

Since I have the same plan for 1,160,316 rows as my original problem, it’s clear that the issue is the fact that the CTE has to compute the entire result set before it can filter on the BusinessEntityID. So knowing this, why would I even mention views in this post? It’s quite simple, the reason I generally see problems like this occur is that the developer/DBA wanted to simplify the code across multiple objects rather than have the same CTE fully expanded in each of the objects that needed to access the data in this manner. This is the same reason that user defined functions get used as much as they do despite being slower than set-based solutions that require more code in each of the objects that would have used the function for simplicity.

I could test a window function inside a view, but I’ve already shown above that the problem is actually related to the window function inside of the CTE and where the predicate evaluation actually is being applied.

Summary

While using a CTE inside of a view isn’t the actual performance problem, and instead the window function inside of the CTE with filtering after computation is the cause of the performance issue. The desire to use a view to simplify code typically leads to the coding practice which results in the performance problem occurring. By changing the coding pattern to remove the view, and then pushing the WHERE clause up into the CTE definition, only the rows required will be read and computed, reducing the I/O strain on the server, CPU usage, and providing a much more scalability to your code.

This is the first in a series of examples I’ll be blogging about showing some of the problems I come across while solving performance issues for our clients.