SQL Saturday #192 Content Uploaded

I know that it is somewhat late, but I uploaded my slide decks and demos from SQL Saturday #192 in Tampa, FL to the SQL Saturday website earlier today. For simplicity in finding the information, the links to each of the sessions are below:

Performance Tuning by Digging into the Plan Cache

Successfully Virtualizing SQL Server in Production

Free Tools for the DBA

For those that attended the sessions, once again thank you for your time and I hope you gained something from the content that was presented.  Next up is SQL Saturday #202 in Edinburgh, UK where I’ll be co-presenting a Precon with Joe Sack on SQL Server 2012 New Features.  Hope to see you there if you are local.

Execution Plan Sanitizer v2

Almost two years ago I blogged about Sanitizing Execution Plans using PowerShell. The original intent behind this idea was to be able to provide completely sanitized execution plans that could be shared outside of your business environment without the risk of sharing any design elements that may be considered Intellectual Property. Since writing that post I’ve had many people contact me with suggestions for the sanitized output, and so I’ve made a bunch of minor changes to the code to fix various output issues and suggestions.

Fast forward two years and I believe this idea is even more useful today, especially since you can now upload execution plans using SQL Sentry Plan Explorer to the SQLPerformance.com website to get assistance from the community in diagnosing performance problems or query plan regressions. When I found out about this feature being included in Plan Explorer, my first question was whether the ability to sanitize or obfuscate the plan would be included; it wasn't part of the initial launch plans, but it may eventually be part of the product. To be perfectly honest, users who worry about schema information in an execution plan are a minority, so from a business standpoint I totally understand it not being in the first release of this functionality. We've already seen a few plans posted to the SQLPerformance.com site show up on the #sqlhelp tag on Twitter, so I know that the new upload feature is being used.

Earlier today I was asked if I realized that the script was broken and didn't sanitize the output XML properly. Unfortunately I never took the time to upload the changes I've made over the last two years, so I ran my local script through some plans and then asked my good friend Aaron Bertrand (Blog|Twitter) to test out a new version of it for me. He sent me back some feedback, which I was able to quickly implement, and now there is a v2 version of the script that should handle most of the plan sanitizing issues that existed in the initial version of the PowerShell script.

2013_2_27_ExecutionPlanSanitizer
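
If you want to test the script and don't have a saved .sqlplan file handy, a query along these lines (just a sketch using the standard plan cache DMVs) will pull plan XML straight out of the plan cache to feed to it:

-- Sketch: grab some plan XML from the plan cache to run through the sanitizer
SELECT TOP (10)
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan IS NOT NULL;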

Graphically Viewing Extended Events Deadlock Graphs

I’ve previously blogged about the changes to the xml_deadlock_report in Extended Events to support multi-victim deadlock analysis.  One of the side effects of this change was that the XML document for the deadlock report had to be modified to be able to accommodate multiple victim processes in the graph.  As a result of this, the deadlock graphs that are provided by Extended Events in SQL Server 2008 cannot be saved and opened graphically inside of SQL Server Management Studio.  If you attempt to open an Extended Events deadlock graph you will get an error similar to the following:

[Figure: error dialog shown when opening an Extended Events deadlock graph (.xdl) in Management Studio]

Failed to initialize deadlock control.
        There is an error in XML document (1, 2).
                <deadlock xmlns=""> was not expected
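
As an aside, to get the deadlock XML out of Extended Events in the first place, a query along these lines (a sketch against the system_health session's ring_buffer target) returns the deadlock element so it can be saved to an .xdl file:

-- Sketch: pull xml_deadlock_report events from the system_health session's ring_buffer target
SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM
(   SELECT XEvent.query('.') AS XEvent
    FROM
    (   SELECT CAST(target_data AS XML) AS TargetData
        FROM sys.dm_xe_session_targets AS st
        INNER JOIN sys.dm_xe_sessions AS s
            ON s.address = st.event_session_address
        WHERE s.name = N'system_health'
          AND st.target_name = N'ring_buffer'
    ) AS Data
    CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (XEvent)
) AS src;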

Until recently there wasn't anything you could do about this error, but today there are two options for viewing the deadlock graph in graphical format: SQL Sentry Plan Explorer Pro and SQL Server 2012 Management Studio.

Plan Explorer Pro

One of the new Pro features added to SQL Sentry Plan Explorer at the beginning of November 2012 was the ability to open deadlock graph .xdl files in Plan Explorer.  The display is exactly the same display you get from Performance Advisor for deadlocks proactively collected during its monitoring, which really simplifies the analysis of the deadlock graph information.  While I was beta testing the changes ahead of the public release, I mentioned that the XML from the deadlock graphs generated by Extended Events had a different output to support multi-victim deadlock analysis, and Brooke (the developer for Plan Explorer at SQL Sentry) asked for an example.  A few hours later I had another beta build that handled the multi-victim deadlock graph format provided by Extended Events, which is shown below.

[Figure: multi-victim Extended Events deadlock graph rendered in SQL Sentry Plan Explorer Pro]

SQL Server 2012 Management Studio

I didn’t actually realize this until I was recording the next Insider Demo Video for our SQLskills Insiders Newsletter, but SQL Server 2012 Management Studio has also been updated to handle the XML format output by Extended Events, and it also understands the multi-victim deadlock information.

[Figure: the same Extended Events deadlock graph rendered in SQL Server 2012 Management Studio]

However, if you compare the relative amount of information provided here, even if you use the hover tips on the individual processes, there is still a significant amount of information that is missing.  For example, you don’t get the isolation levels of the processes in the graphical display at all, which can be important for troubleshooting specific types of deadlocks.
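
If you need those details, they are still available in the deadlock XML itself. A quick XQuery sketch like the following (assuming the saved deadlock XML has been loaded into a variable) pulls the isolation level and wait resource for each process:

-- Sketch: read process-level details out of a deadlock graph's XML
DECLARE @deadlock XML = N'<deadlock>...</deadlock>'; -- paste the saved deadlock XML here
SELECT
    p.value('@id', 'varchar(50)') AS process_id,
    p.value('@isolationlevel', 'varchar(100)') AS isolation_level,
    p.value('@waitresource', 'varchar(200)') AS wait_resource
FROM @deadlock.nodes('/deadlock/process-list/process') AS d(p);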

If you haven’t given SQL Sentry Plan Explorer Pro a spin, I’d really recommend it.  I had the opportunity to take the next Beta release of Plan Explorer Pro for a spin last night, and there are some really awesome new features coming that will make plan analysis faster and easier than you’ve ever experienced.

Common Table Expressions (CTEs), Window Functions, and Views

Common Table Expressions (CTEs), window functions, and views are all common things in SQL Server development these days. When CTEs and window functions were introduced in SQL Server 2005, a lot of developers jumped in and began using them to solve problems and simplify code. While these tools can be a great benefit in SQL Server, they can also be the cause of significant performance problems in certain scenarios. I recently engaged in some performance tuning with a client after a code change was deployed to their production server that resulted in CPU usage of nearly 100% across all 32 logical processors. The resulting performance problems in their application made this a critical situation for the client. The fix required a few hours of rewriting a couple of the modified stored procedures to reduce the CPU usage back down to the normal range of 10-20%.

[Figure: CPU usage initially (near 100%) and CPU usage after the rewrites (back in the 10-20% range)]

The root cause of the problem was something I consider to be a coding anti-pattern, using a CTE with a window function inside of a view.  As we’ll see in this post, mixing the three of these together can result in significant performance problems from the need to materialize the intermediate results of the view before filtering can be applied.

Setting up The Problem

To demonstrate the problem I'll first create a new table in AdventureWorks2012 to show the impact on performance as the number of rows increases in our test table. This table will be based on the HumanResources.EmployeePayHistory table and can be generated with the script attached to this post (a rough sketch of what that script does is shown after the list below).

  • To create the table with 316 rows, comment out the second INSERT statement entirely
  • To create the table with 11,916 rows, change the GO for the second INSERT to be GO 40
  • To create the table with 116,316 rows, change the GO for the second INSERT to be GO 400
  • To create the table with 1,160,316 rows, change the GO for the second INSERT to be GO 4000
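
The attached script is the authoritative version; what follows is only a rough sketch of the same idea (assumptions: the enlarged table mirrors HumanResources.EmployeePayHistory, and the second INSERT adds 290 rows, one per employee, on each GO iteration):

IF OBJECT_ID(N'HumanResources.EmployeePayHistoryEnlarged') IS NOT NULL
    DROP TABLE HumanResources.EmployeePayHistoryEnlarged;
GO

-- First INSERT equivalent: seed the enlarged table with the 316 rows from the base table
SELECT BusinessEntityID, RateChangeDate, Rate, PayFrequency, ModifiedDate
INTO HumanResources.EmployeePayHistoryEnlarged
FROM HumanResources.EmployeePayHistory;
GO

CREATE CLUSTERED INDEX IX_EmployeePayHistoryEnlarged
    ON HumanResources.EmployeePayHistoryEnlarged (BusinessEntityID, RateChangeDate);
GO

-- Second INSERT: each execution adds one additional (older) pay history row per employee,
-- so GO 40 yields 11,916 rows, GO 400 yields 116,316 rows, and GO 4000 yields 1,160,316 rows
INSERT INTO HumanResources.EmployeePayHistoryEnlarged
    (BusinessEntityID, RateChangeDate, Rate, PayFrequency, ModifiedDate)
SELECT eph.BusinessEntityID,
       DATEADD(DAY, -c.cnt, eph.RateChangeDate),
       eph.Rate,
       eph.PayFrequency,
       GETDATE()
FROM HumanResources.EmployeePayHistory AS eph
CROSS APPLY (SELECT COUNT(*) AS cnt
             FROM HumanResources.EmployeePayHistoryEnlarged AS e
             WHERE e.BusinessEntityID = eph.BusinessEntityID) AS c
WHERE eph.RateChangeDate = (SELECT MAX(RateChangeDate)
                            FROM HumanResources.EmployeePayHistory AS x
                            WHERE x.BusinessEntityID = eph.BusinessEntityID);
GO 40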

With my table created, I’ll create a view to return the current pay value for the employees as follows:

CREATE VIEW dbo.EmployeeCurrentPayData
AS

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1;

GO

Then I’ll create a stored procedure for my application to be able to get the current pay value for a specific employee by the BusinessEntityID for the employee.

CREATE PROCEDURE dbo.GetEmployeeCurrentPay_View (@BusinessEntityID INT)
AS
BEGIN

SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM dbo.EmployeeCurrentPayData
WHERE BusinessEntityID = @BusinessEntityID;

END
GO

If I run the stored procedure for any employee’s BusinessEntityID the results will always be a single row.

DECLARE @BusinessEntityID INT = 250;
EXECUTE dbo.GetEmployeeCurrentPay_View @BusinessEntityID;

[Figure: single-row result set returned by the stored procedure]

For a table with 316 rows of pay history, the execution plan will have the following shape:

[Figure: execution plan shape for the 316-row table]

The execution plan shows that the tables were all scanned and the CTE was computed before the filter predicate on BusinessEntityID was applied. I can see this in the tool-tip for the Filter operator and by the number of input rows versus output rows for the Filter.

[Figure: Filter operator tool-tip showing the predicate on BusinessEntityID]

With a small number of rows, this doesn’t show up as being a significant problem as shown by the STATISTICS IO and STATISTICS TIME output for the execution.
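
The statistics output shown below was captured with the session options turned on before executing the procedure, along these lines:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

DECLARE @BusinessEntityID INT = 250;
EXECUTE dbo.GetEmployeeCurrentPay_View @BusinessEntityID;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO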

SQL Server parse and compile time:
   CPU time = 14 ms, elapsed time = 14 ms.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 897, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

The total execution time was just 4ms with 316 rows. If I scale up our test table to 11,916 rows, the plan changes to using a Merge Join with the EmployeePayHistoryEnlarged table, but I can still see that the CTE was fully computed before the filtering on BusinessEntityID was applied.

[Figure: execution plan for 11,916 rows, now using a Merge Join against EmployeePayHistoryEnlarged]

Performance-wise, the stored procedure executes in 85ms and only uses 31ms of CPU time, as shown by the execution statistics.

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 9 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 86, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 897, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 85 ms.
 

The real problems start to occur when the number of rows is increased to 116,316. The increased cost of computing the CTE entirely results in an execution plan that uses parallelism.

[Figure: parallel execution plan for 116,316 rows]

Testing this stored procedure on its own wouldn't make you think that there was a performance problem lying in wait for a production system. Looking at the execution statistics, the procedure executed in 240ms, but the important item to look at is the shift in the proportion of CPU time. Since the query used parallelism, I now have significantly more CPU time, by a factor of 3 on the 4 vCPU VM I tested this on, which is the result of multiple threads processing the execution.

SQL Server parse and compile time:
   CPU time = 16 ms, elapsed time = 16 ms.

Table 'Employee'. Scan count 25, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 25, logical reads 319, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 25, logical reads 949, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 643 ms,  elapsed time = 240 ms.

The problem with this specific procedure is that it doesn't just get executed once; it gets executed thousands of times every couple of seconds, so the cumulative effect of the CPU time required to fully compute the CTE becomes significant enough to run a 32-core server at nearly 100% CPU utilization, as shown in the images at the top of the post.

The problem only gets worse as the number of rows in the table increases, as shown by the execution plan and execution statistics for 1,160,316 rows.

Note: Depending on the hardware that the tests are run on, the plan might be different for these tests. For example, the first plan below was generated with 4 vCPUs, while the second plan, using the Hash Join and Bitmap optimization, was generated on the same server but with the -Pn startup parameter used to start SQLOS with 24 logical schedulers, as shown in my post SQL Server and Soft NUMA. In either case parallelism is used and the CPU time is roughly three times the elapsed execution time.

[Figure: parallel execution plan for 1,160,316 rows (4 vCPUs)]

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 290, logical reads 7416, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 923, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 5, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 4336 ms,  elapsed time = 1788 ms.

[Figure: parallel execution plan for 1,160,316 rows using the Hash Join and Bitmap optimization (24 schedulers)]

SQL Server parse and compile time:
   CPU time = 23 ms, elapsed time = 23 ms.

Table 'Employee'. Scan count 25, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 25, logical reads 319, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 290, logical reads 7656, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 6209 ms,  elapsed time = 2235 ms.

How do you fix this?

Fixing this problem is actually very simple: don't use the view, and move the WHERE clause for the BusinessEntityID column up into the CTE so that it can be used to filter the tables before the computation occurs.

CREATE PROCEDURE dbo.GetEmployeeCurrentPay_NoView (@BusinessEntityID INT)
AS
BEGIN

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID
WHERE per.BusinessEntityID = @BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1;

END
GO

By pushing the WHERE clause up into the CTE, with 316 rows we get an execution plan that shows only a couple of rows are being returned from the tables during execution.

[Figure: execution plan for the rewritten procedure with 316 rows]

If I look at the tool-tip for the Clustered Index Seek on the EmployeePayHistoryEnlarged table, I'll see the Seek Predicate being applied to the BusinessEntityID column to filter the results.

[Figure: Clustered Index Seek tool-tip showing the Seek Predicate on BusinessEntityID]

The execution statistics for the rewritten procedure also show the reduction in logical reads and the execution time.

SQL Server parse and compile time:
   CPU time = 4 ms, elapsed time = 4 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

I also get plan stability with this rewrite, regardless of the number of rows in the EmployeePayHistoryEnlarged table. At 1,160,316 rows, the execution plan is still the same, except that I can see more rows being returned from EmployeePayHistoryEnlarged to be able to identify the most recent pay rate.

[Figure: execution plan for the rewritten procedure at 1,160,316 rows]

The execution statistics also show the reduction in logical reads, and the lower CPU time and execution time for the procedure.

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 152 ms.

What was Actually the Problem?

Since I have three different coding aspects at play here, you are probably wondering what exactly the root problem was. If you aren't wondering that, you aren't alone; I wasn't either, but after doing a review of the article my colleague Joe Sack pointed out that it was unclear where the problem actually was. I have to agree with Joe; so far I haven't really shown definitively where the problem is. Is it the fact that I have a CTE within a view? Let's find out. First I'll create a new view that only contains a CTE, but still provides the same result as the original test query, using a sub-query to determine the MAX(RateChangeDate) from the EmployeePayHistoryEnlarged table for the BusinessEntityID and PayFrequency.

/* Drop the view before creating it */
IF OBJECT_ID(N'EmployeeCurrentPayData_NoWF') IS NOT NULL
BEGIN
    DROP VIEW dbo.EmployeeCurrentPayData_NoWF;
END
GO

/* Create the problem view for the tests */
CREATE VIEW dbo.EmployeeCurrentPayData_NoWF
AS
WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID
WHERE pay.RateChangeDate = (SELECT MAX(RateChangeDate)
                            FROM HumanResources.EmployeePayHistoryEnlarged AS pay2
                            WHERE pay.BusinessEntityID = pay2.BusinessEntityID
                              AND pay.PayFrequency = pay2.PayFrequency)
)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges;
GO

/* Drop the stored procedure using the view before creating it */
IF OBJECT_ID(N'GetEmployeeCurrentPay_ViewNoWF') IS NOT NULL
BEGIN
    DROP PROCEDURE dbo.GetEmployeeCurrentPay_ViewNoWF;
END
GO

/* Create the stored procedure that uses the view to test the problem */
CREATE PROCEDURE dbo.GetEmployeeCurrentPay_ViewNoWF (@BusinessEntityID INT)
AS
BEGIN
    SELECT
        BusinessEntityID,
        FirstName,
        LastName,
        MiddleName,
        JobTitle,
        Rate,
        PayFrequency
    FROM dbo.EmployeeCurrentPayData_NoWF
    WHERE BusinessEntityID = @BusinessEntityID;
END
GO

Now a simple test by running the new procedure and reviewing the execution plan for 1,160,316 rows:

[Figure: execution plan for the CTE-only view (no window function) at 1,160,316 rows]

SQL Server parse and compile time:
   CPU time = 12 ms, elapsed time = 12 ms.

Table 'Employee'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeePayHistoryEnlarged'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 102 ms.

I think it is pretty apparent that using a CTE in a view isn’t the problem. Since I now know that the issue isn’t the CTE within the view, how about the window function inside of the CTE? To test this, I can take the view definition, add a filter to the query against the CTE for the BusinessEntityID, and look at the resulting plan and performance:

DECLARE @BusinessEntityID INT = 250;

WITH PayChanges AS(
SELECT
    per.BusinessEntityID,
    per.FirstName,
    per.LastName,
    per.MiddleName,
    emp.JobTitle,
    pay.Rate,
    pay.PayFrequency,
    ROW_NUMBER() OVER (PARTITION BY per.BusinessEntityID, JobTitle, PayFrequency ORDER BY pay.RateChangeDate DESC) AS RevisionID
FROM Person.Person AS per
INNER JOIN HumanResources.Employee AS emp
    ON per.BusinessEntityID = emp.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistoryEnlarged AS pay
    ON emp.BusinessEntityID = pay.BusinessEntityID)
SELECT
    BusinessEntityID,
    FirstName,
    LastName,
    MiddleName,
    JobTitle,
    Rate,
    PayFrequency
FROM PayChanges
WHERE RevisionID = 1
  AND BusinessEntityID = @BusinessEntityID;

[Figure: execution plan for the standalone CTE query with the window function at 1,160,316 rows]

Since I get the same plan for 1,160,316 rows as my original problem, it's clear that the issue is the fact that the CTE has to compute the entire result set before it can filter on the BusinessEntityID. So knowing this, why would I even mention views in this post? It's quite simple: the reason I generally see problems like this occur is that the developer/DBA wanted to simplify the code across multiple objects rather than have the same CTE fully expanded in each of the objects that needed to access the data in this manner. This is the same reason that user-defined functions get used as much as they do, despite being slower than set-based solutions that require more code in each of the objects that would have used the function for simplicity.

I could test a window function inside a view, but I’ve already shown above that the problem is actually related to the window function inside of the CTE and where the predicate evaluation actually is being applied.

Summary

Using a CTE inside of a view isn't the actual performance problem; the window function inside of the CTE, with filtering applied after the computation, is the cause of the performance issue. The desire to use a view to simplify code is what typically leads to the coding pattern that results in the performance problem. By changing the coding pattern to remove the view and push the WHERE clause up into the CTE definition, only the rows required are read and computed, reducing the I/O and CPU usage on the server and providing much better scalability for your code.

This is the first in a series of examples I’ll be blogging about showing some of the problems I come across while solving performance issues for our clients.

Performance Tuning SQL Server on Windows Server 2012

Properly configuring Windows and SQL Server to get the best performance from your server hardware is an important task for database administrators.  There is a lot of information available online with different recommendations about how to configure your servers for the best performance.  The challenge is knowing what recommendations are correct and what advice you should follow when setting up a new server or performance tuning the configuration of an existing server.

Last week I received an email from a member of the community who had attended a user group meeting where the presenter was talking about configuration options for SQL Server on Windows Server 2012.  The source of the information being presented is actually a Microsoft whitepaper titled Performance Tuning Guidelines for Windows Server 2012, which has a section titled “Performance Tuning for OLTP Workloads”.  While the document mentions the TPC-E benchmark from the Transaction Processing Performance Council, what is not made clear in this document is that the recommendations in this section DO NOT apply to the majority of SQL Server installations that exist in the world; they are strictly for servers trying to achieve a high score on the benchmark.  This section also existed in the whitepaper Performance Tuning Guidelines for Windows Server 2008 R2, but under the title “Performance Tuning for TPC-E Workload”, which is a more accurate indication of what it applies to.

If you want to know why this paper doesn’t apply to most workloads, read on.  If you want to know how to configure a SQL Server with general best practices, jump down to the “How should I really configure SQL?” section at the end of this post.

Tuner Specials

The “SQL Server Tunings for OLTP Workloads” section in the whitepaper has a lot of items that I refer to as tuner specials because they exist for benchmarks like TPC-C and TPC-E, and they are documented because the benchmarks require full disclosure of the configuration before the results can be published.  Most of the tuner specials can be found in KB article 920093.  The startup configuration options mentioned in the whitepaper include:

  • -x : Disable SQL Server perfmon counters
  • -T661: Disable the ghost record removal process
  • -T834: Use Microsoft Windows large-page allocations for the buffer pool
  • -T652: Disable page-prefetching scans
  • -T8744: Disallow prefetch for ranges

Of these, the only one that applies to general systems running SQL Server is -T834, to use large-page allocations on servers with large amounts of RAM installed.  Even this trace flag has some important considerations around its usage, since the buffer pool has to allocate its total size at startup from contiguous memory.  If a contiguous allocation is not possible, the instance tries to allocate a smaller value until it finds a contiguous memory region to allocate from. This can significantly increase the instance startup time and is explained in further detail by Bob Ward in his blog post SQL Server and Large Pages Explained.
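
If you do enable the trace flag, you can verify whether the instance actually received large-page allocations with a quick check of the process memory DMV (a sketch; requires VIEW SERVER STATE permission):

-- Check whether large-page allocations are actually in use for the instance
SELECT large_page_allocations_kb,
       locked_page_allocations_kb,
       physical_memory_in_use_kb
FROM sys.dm_os_process_memory;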

While there are some workloads where using all these startup options can be beneficial, they don’t apply to general SQL Server workloads.

Priority Boost?

The most interesting thing I found in the whitepaper was the recommendation to set the ‘priority boost’ value to 1 in sp_configure.  This goes against Microsoft’s own recommendations for this configuration option.  When I am doing a health check of a server for a client, this configuration option is a big, red flashing light; when I see it set to 1 it usually means that I’m going to find a bunch more incorrectly configured options.  Additionally, this option has been marked as deprecated since SQL Server 2008 R2 was released and will be removed from future releases of the product.  This option is used for TPC-E to gain a slight advantage on a server where all unnecessary background services have been disabled, SQL Server is the only thing running, and the goal is strictly to obtain the best benchmark result.
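
If you want to check (or reset) this option on one of your servers, something along these lines works; ‘priority boost’ is an advanced option, so ‘show advanced options’ has to be enabled first:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'priority boost';        -- a run_value of 1 means it has been enabled
EXEC sp_configure 'priority boost', 0;     -- reset to the default
RECONFIGURE;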

How should I really configure SQL?

My colleague Glenn Berry (Blog | Twitter) knows a lot about performance tuning SQL Server hardware configurations and about how to provision and configure a new SQL Server instance, which he shared in his three part series on SimpleTalk (Part 1 | Part 2 | Part 3) last year, and in a new Pluralsight online course this year.  If you’re looking for a guide to setting up a new SQL Server, I’d start off there.  I published my own checklist for new server configuration back in 2010, but upon reviewing some of the recommendations I think I’ll update that with a revised checklist in a future post (for example, I’d leave hyper-threading turned on today, and set ‘max degree of parallelism’ based on the NUMA configuration and workload).

We’ve been helping a lot of clients with upgrading to SQL Server 2012 on Windows Server 2012 – if you need help, drop us a line.

Migrating Legacy LOB Data Types to Current Ones – A Big Gotcha!

Recently, Erin (Blog|Twitter) and I encountered a bug in SQL Server that can affect some scenarios in a very negative manner.  When working with a database that was designed in SQL Server 2000 using the legacy LOB data types, text, ntext, and image, it is possible to encounter a problem when the ‘text in row’ table option has been configured and the data types for the columns were later migrated to the newer varchar(max), nvarchar(max), or varbinary(max) data types in SQL Server.  If you have a table that fits the specific scenario in question, and you try to analyze the table for data compression using the sp_estimate_data_compression_savings stored procedure, the end result is a hung session with a Schema Modification lock being held in tempdb that requires a restart of the instance to clear.  NOT GOOD!!!!!

We were able to reproduce this problem in SQL Server and we’ve subsequently filed a bug internally with Microsoft as well.  The good thing is that there is a workaround, which I’ll cover later in this blog post, if you happen to run into this specific scenario and bug.

To start off with, if you create a table using varchar(max) and try to use sp_tableoption to set ‘text in row’ for the table it will error out:

IF OBJECT_ID(N'dbo.Table_1') IS NOT NULL
    DROP TABLE dbo.Table_1;
GO

CREATE TABLE dbo.Table_1
(
    RowID int NOT NULL,
    TestColumn varchar(max) NULL,
    OtherColumn varchar(50) NULL,
    ChangeDate datetime NOT NULL
);
GO

EXECUTE sp_tableoption 'Table_1', 'text in row', '7000';

 

Msg 2599, Level 16, State 1, Procedure sp_tableoption, Line 102
Cannot switch to in row text in table "Table_1".

This is expected behavior by the Engine since the ‘text in row’ option only applies to the legacy data types and cannot be used with the newer LOB data types in SQL Server 2005+. For these, you instead must use the ‘large value types out of row’ option.
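
For the (max) types, the equivalent behavior is controlled with a separate table option, along these lines:

-- For varchar(max)/nvarchar(max)/varbinary(max), out-of-row storage is a 0/1 option, not a byte limit
EXECUTE sp_tableoption 'dbo.Table_1', 'large value types out of row', 1;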

However, if you have a table that was created in SQL 2000 or before, and used text, image, or ntext columns, and the ‘text in row option’ was set:

IF OBJECT_ID(N'dbo.Table_1') IS NOT NULL
    DROP TABLE dbo.Table_1;
GO

CREATE TABLE dbo.Table_1
(
    RowID int NOT NULL,
    TestColumn text NULL,
    OtherColumn varchar(50) NULL,
    ChangeDate datetime NOT NULL
);
GO

EXECUTE sp_tableoption 'Table_1', 'text in row', '7000';

and then later you upgraded to 2005+ and changed the column data type to varchar(max) or one of the other LOB data types:

ALTER TABLE dbo.Table_1
ALTER COLUMN TestColumn varchar(max);

The ‘text in row’ value remains configured for the table.

SELECT text_in_row_limit
FROM sys.tables
WHERE name = 'Table_1';

If you attempt to estimate the compression for this table it will generate an error because the compression code creates a copy of the table in tempdb and tries to call sp_tableoption to set the option on the temp table used for sampling.

WARNING: Running the below code will result in a locked session that requires restarting the SQL instance to clear! DO NOT DO THIS ON A PRODUCTION SQL SERVER!

EXECUTE sp_estimate_data_compression_savings 'dbo','Table_1', 0, 1, 'row'

This command will result in error 2599; however, the stored procedure doesn’t handle the error correctly for some reason, and the session ends up waiting on a SCH-M lock in tempdb that is blocked by itself and never goes away.  The proc will actually complete and return error 2599 on SQL Server 2008 R2 SP1, but it results in a stuck session on SQL Server 2008 R2 RTM, SQL Server 2008 R2 SP2, and all versions of SQL Server 2012 that we’ve tested so far (we are waiting for further information from Microsoft at this point about why this might occur, so this may be updated in the near future, but I wouldn’t rely on this behavior currently).
In addition to this, you can’t clear the ‘text in row’ value from the table because executing sp_tableoption will return the above error.  The only workaround that I’ve found so far for this issue is to add a dummy column to the table with one of the legacy types, turn off the option, and then drop the column.

ALTER TABLE Table_1 ADD [dummy] text;
EXECUTE sp_tableoption 'Table_1', 'text in row', 'OFF';
ALTER TABLE Table_1 DROP COLUMN [dummy];

Once you do this, sp_estimate_data_compression_savings can estimate the compression savings. If you have migrated data types from the legacy LOB types to the current LOB types, you should check for this scenario before attempting to run sp_estimate_data_compression_savings against the tables that changed.
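
A quick way to find candidate tables is to look for anything that still has a ‘text in row’ limit set but no longer contains any of the legacy LOB types; a sketch:

-- Sketch: tables with 'text in row' still set but no remaining text/ntext/image columns
SELECT t.name, t.text_in_row_limit
FROM sys.tables AS t
WHERE t.text_in_row_limit > 0
  AND NOT EXISTS (SELECT 1
                  FROM sys.columns AS c
                  INNER JOIN sys.types AS ty
                      ON ty.user_type_id = c.user_type_id
                  WHERE c.object_id = t.object_id
                    AND ty.name IN (N'text', N'ntext', N'image'));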

I’ve done a bit of testing and I don’t think that this problem is prevalent in SQL Server instances in the field, for a couple of reasons.  Firstly, it requires that you are attempting to estimate the compression savings for a table affected by the problem, and in my experience data compression is not widely used.  Secondly, it requires that you migrated from one of the legacy LOB data types to a current one through explicit ALTER TABLE DDL (the SSMS UI will generate a script to create a new table, copy all the data into it, drop the existing table’s indexes and constraints, drop the old table, rename the new table to the old table name, and then rebuild all the necessary constraints and indexes, so the table option won’t be in effect after a designer-based change).

You really need the perfect storm of conditions to hit this specific problem inside of SQL Server, but it is still something to be wary of.

Hopefully you won’t hit it, but I’m sure someone else will at some point.

CPU Ready Time in VMware and How to Interpret its Real Meaning

(If you’re having persistent problems with your virtual machine configurations and SQL Server performance, Jonathan can help you fix it – fast! Click here for details.)

In the last month I have had to explain how to interpret CPU Ready Time information for SQL Server VMs running on VMware to a number of people. The first time was on Twitter, and the topic is really too big for a 140-character discussion, but I went ahead and gave it a shot. It is rare in my experience for SQL professionals to know anything about virtualization under VMware or Hyper-V, which is why we spend half a day, or more, on virtualization during IE3: High Availability & Disaster Recovery. Based on my experience, I was surprised to see the question being asked on #sqlhelp at all. It turns out that Idera Diagnostics Manager has been updated to include information about VM performance that is collected from Virtual Center if you have it installed in your data center.

Now, overall this is a good thing, because DBAs have additional information that they previously didn’t have access to unless they had negotiated access to Virtual Center with their VM administrator. My experience consulting is that few DBAs have access to, or even know about, Virtual Center, even though it is one of the critical tools for troubleshooting performance problems. Another thing that most DBAs don’t know is that their VM administrator can configure roles and read-only security access to allow anyone to see performance and configuration information inside of Virtual Center without being able to make changes, so there really is no good reason for a DBA to not have access to the performance data available in Virtual Center.

Now that DBAs have access to the data, it is important to know what is being displayed and what it really means. Unfortunately, the way the information is being presented makes it easy to misinterpret what is being shown, so the end result is confusion for DBAs that don’t know a lot about VMware, or what the information they are looking at actually means. An example of the VM CPU Ready information from Diagnostics Manager is shown in the chart in Figure 1.

Figure 1 – VM CPU Ready graph

Looking at this chart on the report, what would your first interpretation of it be? So far, 3 out of 3 people, including the one on Twitter, have thought that there was a problem with this VM. The reality of the situation is that there is absolutely nothing wrong with this VM; it is doing just fine. It just so happens that the scale of this graph makes it easy to jump to the interpretation that it is showing a percentage value, and we all know that anything performance related that is over 80% can’t be good. It doesn’t say that, but it doesn’t provide any context for the information’s meaning either. To make matters even more confusing, CPU Ready in VMware is available as a summation value, which happens to be what is shown here in milliseconds, and as a percentage (RDY% in esxtop) of time spent waiting, with work to do, to be scheduled by the hypervisor. We can confirm that the information presented is the summation value and not the percentage by looking at the real-time information available in Virtual Center for the same server, as shown in the chart in Figure 2.

Figure 2 – CPU Ready real-time summation from Virtual Center

So now that we know it’s a summation and represented in milliseconds, what exactly does that tell us? Unfortunately, it actually doesn’t tell us anything on its own.

What is CPU Ready Time and why do we even care?

CPU Ready Time is the time that the VM waits in a ready-to-run state (meaning it has work to do) to be scheduled on one or more of the physical CPUs by the hypervisor. It is generally normal for VMs to have small values for CPU Ready Time accumulating even if the hypervisor is not over subscribed or under heavy activity, it’s just the nature of shared scheduling in virtualization. For SMP VMs with multiple vCPUs the amount of ready time will generally be higher than for VMs with fewer vCPUs since it requires more resources to schedule/co-schedule the VM when necessary and each of the vCPUs accumulates the time separately.

At what point does CPU Ready Time start to affect performance?

To be honest, it always has some minimal effect, but it really depends on a lot of different factors, for example which CPU Ready value you are looking at and where you are getting the information. If you are looking at raw RDY% values from esxtop, the value has a completely different meaning than the summation values that are available from Virtual Center. Inside of Virtual Center, the level of summation you are looking at when reading the values also affects the meaning that the value has, and you have to perform calculations to convert the summation into a percentage to know the effect, as documented in the VMware Knowledge Base. In this case, for a real-time summation, the data point is actually a 20 second summation of ready time accumulation.

At one point VMware had a recommendation that anything over 5% ready time per vCPU was something to monitor. In my experience, for an SMP SQL VM, anything over 5% per vCPU is typically a warning level and anything over 10% per vCPU is critical. The reason this specifically says per vCPU is that each vCPU contributes 100% to the VM’s scheduling total, so a 4 vCPU VM would have a scheduling total of 400%. A 10% CPU Ready on a 4 vCPU VM only equates to 2.5% per vCPU. If this isn’t already complex enough, it gets worse.

This makes providing a general recommendation impossible for this counter, because it depends on a lot of different factors. For example, if the VM has had a CPU Limit placed on it, whenever the VM exceeds its allocated limit it will accumulate CPU Ready time while it waits to be allowed to execute again. If the CPU Limit is being enforced under business SLAs or a chargeback system, the VM could easily have high CPU Ready values that fit what is required for the configuration. Using the formula from the KB article to convert a summation value to percent, if we round the average of 81.767 down to 80 for simple math, this results in:

(80 / (20s * 1000)) * 100 = 0.4% CPU ready

Four tenths of a percent CPU Ready time, which is not going to negatively impact the performance of the VM. The example VM shown above, also has 8 vCPUs allocated to it, so after taking this into account, it really only has 0.05% per vCPU, well below the older recommended value.

What scenarios cause high CPU Ready times?

While there are a number of scenarios where high CPU Ready times can occur, there are generally two common scenarios that I see when I am consulting. The most common reason tends to be host over subscription, where the ratio of allocated vCPUs to physical CPUs is simply too high. While ESX 5 supports a maximum of 25 vCPUs per physical CPU, this is definitely a case where just because you can doesn’t mean you should. As always, your mileage may vary based on your specific VM workloads, but typically I start to see problems when a host is in the range of 2-2.5X over subscribed for server workloads.

The second common scenario that I see where CPU Ready times are high is when a larger SMP VM for SQL Server, for example one with 4-8 vCPUs, is running on a host that has a lot of smaller VMs with 1-2 vCPUs for application servers. Depending on the number of physical processors and the total number of vCPUs allocated on the host, the larger resource allocation for the SQL Server VM results in it having to wait longer for the hypervisor to preempt the necessary physical CPUs to schedule/co-schedule the workload. Often in cases where this occurs, after asking some questions I find that the number of vCPUs for the SQL Server was increased from four to eight due to performance problems for the VM. Unfortunately, if CPU Ready time was the original problem, increasing the vCPUs doesn’t improve performance; it generally makes things worse.

What do I do if this is actually a problem?

If you have gone through the information and you can see that CPU Ready is really a problem for your VMs, there are a couple of different things that can be done. The correct one depends on your virtual infrastructure. If the problem is purely host over subscription of vCPUs to pCPUs, start off by evaluating whether the VMs need the number of configured vCPUs to determine if any of them can be reduced to lower the ratio. If this can’t be done, the only real answer is to add additional hosts to allow the load to be balanced better and reduce the over subscription rates. If the problem is specific to the larger SMP VMs for applications like SQL Server, evaluate whether you can consolidate the larger VMs onto one or more hosts and move the smaller VMs to the other hosts, separating the VMs based on their sizes. This has worked well for a number of clients that I have worked with where they truly needed eight or more vCPUs for their workload.

Summary

Understanding the data that you are looking at and what it actually means is critical to making the right decisions about what is happening in a virtualized environment. CPU Ready time specifically requires a good understanding of what the value is actually showing and how it relates to the configuration of the VM, the other VMs on the host, and the physical host resources. If you are looking at summation data for CPU Ready time, converting it to a CPU Ready percent value is what gives the data the proper meaning for understanding whether or not it is actually a problem. However, keep in mind that other configuration options like CPU Limits can affect the accumulated CPU Ready time and must be checked as well.

Whenever I am performing a health check of a SQL Server VM on VMware, I make sure that I get screenshots of the CPU Ready information from Virtual Center for each of the summation levels available so that I can determine whether or not it is affecting the performance of the VM, but I am always careful to calculate, using the correct formula, what the percentage value actually works out to, and to review the rest of the VM configuration before making any conclusions.  In the worst case I’ve seen, for one client the CPU Ready time was roughly 63% per vCPU, and you could visibly see the VM freeze while moving the mouse in an RDP session.  Reviewing the configuration showed that the VM had 8 vCPUs on a host with 8 physical CPUs that was also running 10 other VMs with a total of 14 additional vCPUs.  Moving that VM back down to 2 vCPUs was instant relief for their biggest bottleneck, and then we started talking about hardware changes to fit their increased virtualization usage.  If you’d like expert assistance with implementing, configuring, troubleshooting, or understanding SQL Server on VMware, we have a number of different services to fit your needs.

Clustering SQL Server on Virtual Machines (Round 2)

Recently there was a lengthy discussion on the #sqlhelp hash tag on Twitter about clustering SQL Server on VMs and whether or not that is a good idea. Two years ago I first blogged about this same topic in my post, Some Thoughts on Clustering SQL Server Virtual Machines. If you haven’t read that post, I recommend reading it before continuing with this one because it gives a lot of background that I won’t be rehashing as a part of this post. However, a lot has changed in VMware and Hyper-V since I wrote that original post, and those changes really affect the recommendations that I would make today.

As I stated in the Twitter discussion, we have a number of clients at SQLskills that run WSFC clusters across VMs for SQL Server HA, and few have problems with the clusters related to them being VMs. There are some additional considerations that have to be made when working with VMs for the cluster nodes; a big one is that you should plan for those nodes to be on different hosts so that a hardware failure of the host doesn’t take out both of your cluster nodes, which defeats the purpose of having a SQL cluster entirely. There is also the additional layer of the hypervisor and resource management that plays into having a cluster on VMs, but with proper planning and management of the VM infrastructure this won’t be a problem; it’s just another layer you have to consider if you do happen to have a problem.

In response to the discussion, Chuck Boyce Jr (Blog|Twitter) wrote a blog post that provided his opinion, which was not to do it, and that started a separate discussion later on Twitter. The biggest problem Chuck points out is rooted in problems with inter-team communication within an IT shop. To be honest, Chuck’s point is not an incorrect one; I see this issue all the time, but it’s not specific to VMs. If you work in a shop that has problems with communication between DBAs, Windows administrators, VM administrators, the networking team, and any other IT resource in the business, the simple fact is those problems can be just as bad for a physical implementation of a SQL cluster as they might be for a VM implementation. The solution is to fix the communication problem and find ways to make the “team” cooperate better when problems arise, not to avoid merging technologies in the hope of preventing problems that will still occur in a physical implementation as well.

Am I saying that clustering VMs for SQL Server is for every shop? No, certainly not. There are plenty of places where clustering isn’t the best solution overall. However, with virtualization, depending on the infrastructure, the other SQL HA options that exist might not be a better decision than they would be in a physical world either. One of the biggest things to think about is where the VMs are ultimately going to be stored. If the answer is a shared SAN, then options like Database Mirroring and Log Shipping don’t really provide you with the same advantages that they do in a physical implementation, the big one being that you generally have a second copy of the database on completely different storage. Yes, I know that you could have two SQL Servers connected to the same SAN physically that use Database Mirroring, and my response to that would be that a cluster probably makes more sense because the SAN is your single point of failure in either configuration.

If you are new to clustering SQL Server, I wouldn’t recommend that you start out with VMs for your failover cluster. The odds are that you also don’t have a lot of VM experience, and if there is a problem you aren’t going to be able to troubleshoot it as effectively because you have two new technologies that you have to try to dig through. If you are comfortable with clustering SQL Server, adding virtualization as a new technology to the mix is really not that big of a deal. You just need to read the configuration guides and whitepapers for how to set up the VMs (usually your VM administrator is going to have to do this, so it’s a good area to break the ice with them and work together to open the lines of communication) to get a supported WSFC implementation, and then finally install SQL Server and manage it like you would any other SQL Server failover cluster.

Where else would I recommend not implementing a cluster on VMs? iSCSI SANs that only offer 1Gb/s connectivity over Ethernet, simply because you are likely to run into I/O limitations quickly, and to build the cluster you have to use the software initiator for iSCSI so there is a CPU overhead associated with the configuration. Generally the host has limited ports so you end up sharing the networking between normal traffic and the iSCSI traffic which can be problematic as well. Does that mean it’s not possible? No – I have a number of clients that have these setups and they work fine for their workloads, but it’s not a configuration I would recommend if we were planning a new setup from the ground up.

The big thing that I work through with clients when they are considering whether to cluster VMs for SQL Server is the business requirements for availability and whether or not those can be met without having to leverage one of the SQL HA options. With the changes in VMware ESX 5 and Hyper-V 2012, you can scale VMs considerably, and both platforms allow for virtualized NUMA configurations inside of the guest VM for scalability, so the performance and sizing considerations I had two years ago are no longer primary concerns for the implementation to me. If we need 16 vCPUs and 64GB RAM for the nodes, with the correct host configuration we can easily do that, and we can do it without performance problems while using Standard Edition licensing if we plan the infrastructure correctly.

In my previous post on this topic I linked to a number of VMware papers, and in the post prior to that one I linked to even more papers that include best practice considerations for configuration and sizing of the VMs, how to configure the VMs for clustering, and many other topics. Newer versions of these documents exist for ESX 5 and a number also exist for Hyper-V as well. I recommend that anyone looking at running SQL in a VM, whether as a clustered instance or not, spend some time reading through the papers about the hypervisor you want to run the VM on so you understand how it works, the best practices for running SQL Server on that hypervisor, and what to look for while troubleshooting problems should they occur.

In the end, Microsoft supports SQL Server failover clustering on SVVP-certified hypervisors, so there isn’t a hard reason not to objectively consider whether VMs might be an appropriate fit for your business requirements.  When I teach about virtualization in our IE3: High Availability & Disaster Recovery class, most of the perceptions at the start of the virtualization module are negative towards SQL Server on VMs, often from past experiences of failed implementations.  By the end of the demos for the module, most of the opinions have changed, and in a lot of cases attendees have been able to communicate correctly with their VM administrator and get a problem fixed while I am still performing demos of specific problems and their causes.  In the last year I’ve set up a number of SQL Server clusters on VMs for clients where it was the best fit for their needs.  If you would like assistance with reviewing the infrastructure and business requirements, and determining the best configuration for your needs, I’d be happy to work with you as well.

New Article: Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing

My latest article on SQLPerformance.com was published yesterday and highlights the potential for performance problems while using the default configuration of SQL Server 2012 using the Enterprise Server+CAL upgrade licensing under Software Assurance.

http://www.sqlperformance.com/2012/11/system-configuration/2012-cal-problems

Faking Hardware NUMA on a SMP Machine for Learning and Demos

I’ve blogged about NUMA a couple of times in the past and we cover it extensively in our Immersion Events, specifically when discussing SQLOS and memory and scheduling management.  One of the challenges with this is that laptops are SMP machines, so unless you have a server with hardware NUMA it is impossible to show the differences in how SQLOS boots up with or without NUMA, at least that’s what I used to think.  It turns out that in Windows 7 and Windows Server 2008 R2, the support for more than 64 processors and processor groups allows you to fake a hardware NUMA configuration at the OS level for testing purposes.

To show how this works I am going to use one of my test VMs running in VMware Workstation 8 on my Dell M6700 laptop that I use regularly for teaching and presentation demos.  For the purposes of this post I have reconfigured the VM to have four CPUs with two cores each and 16GB RAM.

SMP Configuration (Default)

Under the default configuration, the VM has a single memory node presented by Windows to SQLOS, so it starts up with a single NUMA node and all of the logical processors assigned to that node, as shown in the Server Properties dialog below.

[Figure: Server Properties dialog showing a single NUMA node containing all of the logical processors]

We can also see this information from the DMVs with the following query:

SELECT
    mn.processor_group
    , mn.memory_node_id
    , n.node_id AS cpu_node_id
    , n.online_scheduler_count
    , n.cpu_affinity_mask
    , n.online_scheduler_mask
FROM sys.dm_os_memory_nodes AS mn
INNER JOIN sys.dm_os_nodes AS n
    ON mn.memory_node_id = n.memory_node_id;

[Figure: query results showing a single memory node and CPU node]

NUMA Configuration (bcdedit)

A fake NUMA configuration can be created by setting the groupsize value in the Boot Configuration Data (BCD) using BCDEdit to the number of CPUs that should be grouped into each processor group. The appropriate values are 1 to 64 in powers of 2. This isn’t documented in the standard BCDEdit documentation in the Books Online, but it is documented in the Windows Drivers development section.

BCDEdit /set (Windows Drivers)

To use BCDEdit you have to be running from an elevated command prompt as a local administrator.  For the purposes of this post I am going to use a groupsize of 2.

[Figure: elevated command prompt running bcdedit /set groupsize 2]

After changing the BCD, it is necessary to restart the machine for the configuration changes to take effect.

Disclaimer:  This is not something you should do on a production SQL Server and this is only something that should be used for testing and/or learning about SQLOS on a non-NUMA hardware machine.

After rebooting the server, if we look at the Server Properties dialog again, we will see four NUMA nodes, each with two logical processors assigned to it.

[Figure: Server Properties dialog showing four NUMA nodes, each with two logical processors]

Looking at the DMVs in SQL Server will also show the changes to how SQLOS started up and configured the instance.

[Figure: query results showing four memory nodes and CPU nodes]

Additionally, using the Buffer Node performance counters in PerfMon, we can see memory usage information per node:

[Figure: PerfMon Buffer Node counters showing memory usage per node]
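
The same per-node information is also exposed through the performance counter DMV if you would rather query it from T-SQL; a sketch:

-- Buffer Node counters per NUMA node from sys.dm_os_performance_counters
SELECT counter_name, instance_name AS node_id, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Node%'
  AND counter_name IN (N'Database pages', N'Page life expectancy');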

Keep in mind this is all fake NUMA information, but you can really learn a lot about SQL Server and test its behavior under NUMA using this method.