I’ve been asked questions about these two features a number of times in the last year where the specific concepts for them have been severely confused.  Since I have used the same information each time I’ve answered these questions I figured it would be worth blogging about here as a reference as well.

Event Notifications were introduced in SQL Server 2005 and offer the ability to collect a very specific subset of SQL Trace (http://msdn.microsoft.com/en-us/library/ms190655.aspx) and DDL Events (http://msdn.microsoft.com/en-us/library/bb522542.aspx) through a Service Broker service and queue.  The use of Service Broker allows you to write stored procedures that can process the events asynchronously when they are placed on the queue through a functionality of Service Broker known as queue activation.  You can also make use of remote servicing that allows the event data to be sent from one SQL Server to another for processing which is very useful in larger environments where you want to centralize all of the events for reporting.

Extended Events were introduced in SQL Server 2008 and use entirely different architecture inside of the database engine for providing events.  Extended Events contain a number of events that are not available by any other means in SQL Server, but there are a number of events in SQL Trace that are not available in Extended Events in SQL Server 2008; for example the blocked process report is not available by Extended Events.  In SQL Server 2012 RTM, all of the events from SQL Trace have been implemented in Extended Events, and Extended Events will replace SQL Trace entirely in a future release of SQL Server since SQL Trace is a newly deprecated feature in 2012 (deprecated = slated for removal in 2-3 product life cycles, it does not mean removed entirely). 

The key difference between the two features is that Event Notifications allow automated processing of the events asynchronously through Service Broker.  There is no similar mechanism for Extended Events currently; even in SQL Server 2012 where Event Notifications still piggy back SQL Trace for event generation.  If you are interested in building an alerting infrastructure I would still use Event Notifications depending on the information that you are interested in gathering.  To build an alerting solution on top of Extended Events would require consistently polling the event session targets and tracking which events have already been seen by the solution in SQL Server 2008, and in SQL Server 2012 it would require the same kind of polling solution unless you wrote a .NET application to leverage the streaming API for Extended Events to harvest the events in real time.  Depending on the event frequency and target being used either of these solutions could lead to missed alerts or an expensive polling operation that reads the same information repeatedly.

Either feature can safely be used in a highly transactional system if the appropriate considerations are made in the implementation, and either feature can lead to performance impacts as well.  It all depends on the events being collected and how frequently they get fired.  The events that are available in Event Notifications are a generally safe subset of the events from SQL Trace and generally won't cause performance problems, but your mileage may very based on the frequency of the events firing.  As always you should test the event selection and monitor for impact after implementing a change to the system.

I hope this helps anyone interested in these two features.

Traditionally questions about how much memory SQL Server needs were aimed at how to appropriately set the 'max server memory' sp_configure option in SQL Server, and in my book the recommendation that I make is to reserve 1 GB of RAM for the OS, 1 GB for each 4 GB of RAM installed from 4–16 GB, and then 1 GB for every 8 GB RAM installed above 16 GB RAM.  This has typically worked out well for servers that are dedicated to SQL Server.  You can also get much more technical with determining where to set 'max server memory' by working out the specific memory requirements for the OS, other applications, the SQL Server thread stack, and other multipage allocators.  Typically this would be  ((Total system memory) – (memory for thread stack) – (OS memory requirements ~ 2-4GB) – (memory for other applications) - (memory for multipage allocations; SQLCLR, linked servers, etc)), where the memory for thread stack = ((max worker threads) *(stack size)) and the stack size is 512KB for x86 systems, 2MB for x64 systems and 4MB for IA64 systems.  The value for 'max worker threads' can be found in the max_worker_count column of sys.dm_os_sys_info.  However, the assumption with either of these methods is that you want SQL Server to use everything that is available on the machine, unless you've made reservations in the calculations for other applications.

As more shops move towards virtualizing SQL Servers in their environment this question is more and more geared towards determining what is the minimum amount of memory that a SQL Server will need to run as a VM.  Unfortunately there is no way to calculate out what the ideal amount of RAM for a given instance of SQL Server might actually be since SQL Server is designed to cache data in the buffer pool, and it will typically use as much memory as you can give it.  One of the things to keep in mind when you are looking at reducing the RAM allocated to a SQL Server instance is that you will eventually get to a point where the lower memory gets traded off for higher disk I/O access in the environments.

If you need to figure out the ideal configuration for SQL Server memory in an environment that has been over provisioned the best way to try to go about doing this is start off with a baseline of the environment and the current performance metrics.  Counters to begin monitoring would include:

  • SQL Server:Buffer Manager\Page Life Expectancy
  • SQL Server:Buffer Manager\Page reads/sec
  • Physical Disk\Disk Reads/sec

Typically if the environment has excess memory for the buffer pool, the Page Life Expectancy value will continue to increase by a value of one every second and it won't typically drop off under the workload because all of the data pages end up being cached.  At the same time, the number of SQL Server:Buffer Manager\Page reads/sec will be low after the cache ramp up occurs which will also correspond to a low value for Physical Disk\Disk Reads/sec. 

Once you have your baseline for the environment, make a change to the sp_configure 'max server memory' option to reduce the size of the buffer pool by 1GB and then monitor the impact to the performance counters after things stabilize from the initial cache flushing that may typically occur when RECONFIGURE is run in the environment.  If the level of Page Life Expectancy remains acceptable for your environment (keeping in mind that a fixed target of >= 300 is ridiculous for servers with large amounts of RAM installed), and the number of SQL Server:Buffer Manager\Page reads/sec is within what the disk I/O subsystem can support without performance degradation, repeat the process of reducing the sp_configure value for 'max server memory' by 1GB and continuing to monitor the impact to the environment. 

While teaching last week at our Tampa Immersion Event, I mentioned the support for SMB as a storage in SQL Server 2012, which is documented in the following Books Online topics.

Install SQL Server with SMB fileshare as a storage option
Hardware and Software Requirements for Installing SQL Server 2012 : Storage Types for Data Files

Someone asked the question about whether SMB could be used for the shared storage for a failover cluster in SQL Server 2012, and I had to go do a little research, but I found that it was a supported configuration.

AlwaysOn Failover Cluster Instances (SQL Server)
Before Installing Failover Clustering

Last year, Kevin Farlee from the Storage Engine Team blogged about the changes in the SMB stack that make hosting SQL Server databases over SMB 2.2 a feasible prospect in his blog post SQL Databases on File Shares - It's time to reconsider the scenario.  Additionally the SQLCAT team, actually Kevin Cox specifically, blogged about the use of SMB for hosting database files in the Top 10 Hidden Gems in SQL Server 2008R2 post.  Now that this is a viable alternative to expensive SAN’s it is not surprising to see that it is also a viable storage option for SQL Server Failover Clustering in SQL Server 2012. 

(Note: SAN infrastructures provide a lot of additional benefits that a single server hosting a file share doesn’t, so before you run out and place a mission critical SQL Server database on a SMB file share, make sure that you understand the implications of doing so, and the trade offs that you are making in the process.)

Building the Failover Cluster in Windows

Building the Failover Cluster in Windows is essentially the same as building a standard Failover Cluster for SQL Server, with the exception that you have to skip the Disk checks in the Cluster Validation Tool, and then override the failed checks to actually form the cluster.  To do this, select the option to Run only tests I select on the Validate a Configuration Wizard.

image

Then collapse all of the parent nodes in the treeview and uncheck the Storage checks and run the cluster validation.

image

When the validation tests complete, review the report and make sure that you have a supported configuration for failover clustering.

image

Then click the No option on the Validation Warning page and create the cluster by providing a Cluster Name and IP Address.

image

image

image

Configuring the File Share for Failover Clustering Support

The first step in setting up a Failover Cluster instance for SQL Server that uses SMB for its shared storage is to configure the File Share server to support the clustered instance being able to connect to it with the appropriate permissions to manage the SQL Server databases.  Since there is no shared storage in the environment, there is an additional requirement, if you are using a 2 node Failover Cluster, that the File Share will need to be the witness in the quorum configuration using Node and File Share Majority.  The requirements for this are documented in the Exchange Books Online (I tried to find a SQL Server Books Online or Windows Server Failover Clustering Books Online entry that provided the correct details and couldn’t) http://technet.microsoft.com/en-us/library/bb676490(EXCHG.80).aspx.  Essentially, you need to have a separate File Share for the quorum, and a separate File Share for the SQL Server instance to use for data storage. 

Note: I didn’t initially set my environment up this way, and due to the time it took to create the screenshots for this blog post before filling in the text, I chose to not go back and correct this error, but as a best practice, you should have separate shares for the quorum and SQL Server instance installation if you choose to use SMB for a failover cluster.  This is documented in the Windows Server Failover Clustering Books Online. http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx

The first thing you want to do is Provision a new Share using the Provision Share context menu item from the Share and Storage Management node on the File Server.

image

Then you need to provide a path for the storage of the share.  If you are actually setting this up, perform all these steps twice to create two separate shares with the necessary permissions.

image

Then you want to manually edit the NTFS Permissions for the File Share being created.

image

For the Quorum Share, you need to provide the Failover Cluster Virtual Computer Object (VCO) account Full Control of the Share in NTFS.  The VCO is the ClusterName followed by a $ in Active Directory, and you will need to click the Objects button to add in the Computers object for the search to find the account.  For the installation share, the SQL Server Service Account will require Full Control permissions in NTFS.

image

The next step is to provide a name for the File Share(s) that you are creating which will be the path to the share for configuring your quorum and then your Failover Cluster installation.

image

I am skipping over the SMB Settings window and going to the SMB Permissions where you will need to customize the share permissions for the specific Users that need to access the share to minimize security issues in the environment.

image

Just like with NTFS, for the Quorum Share, you need to provide the Failover Cluster Virtual Computer Object (VCO) account Full Control of the Share.  For the installation share, the SQL Server Service Account will require Full Control permissions over the Share.

image

Then create the share and you are ready to configure the quorum settings for the failover cluster.

Configuring Quorum Settings

To configure the Node and File Share Majority quorum for the cluster, required only if you have an even number of voters in the configuration which is typical for 2 node clusters, click the Configure Cluster Quorum Settings menu item from the failover clusters context menu.

image

Pick the Node and File Share Majority radio button.

image

Then Browse for the shared folder and type in the name of the File Server that is hosting the SMB share for the quorum.

image

If all the permissions are configured correctly, you will have a successful configuration of the quorum in the environment.

image

Installing the Failover Cluster Nodes

For the interests of brevity, I am going to skip showing all of the standard Failover Cluster installation setup screens and only show the ones that matter for the configuration using SMB as the shared storage.  For the instance features in this blog post, the Database Engine, Client Connectivity, Client Connectivity Backwards Compatibility, and Management Tools have been selected for a minimum install in the environment.   Everything is exactly the same as it would be in a standard SQL Server Failover Clustered instance installation until you get to the Cluster Disk Selection page of the installation.  Here there will be nothing shown in the environment, and what is interesting in comparison to SQL Server 2008R2 and previous is that you still have the option to click Next.

image

The next page of interest is the the Database Engine Configuration page, where you will provide the UNC path to the SMB share for the Data root directory, and if you have multiple shares that target different physical disk arrays in the File Server for the instance you could also specify those UNC paths to achieve physical isolation of the I/O for logs, data files, tempdb and backups following best practices.

image

Notice the warnings about the file server being specified as the data directory.  If you double click on one of the warnings it will produce a dialog box similar to the following box.

image

If you don’t get the warnings, don’t worry, you will still get the above warning dialog box as soon as you click Next on the page.

After this last warning Setup will proceed as normal only it will install into the File Share everything that would have been on the shared disk in a standard configuration with a SAN.  Once the first node is setup, you can then proceed to run setup on the additional nodes and they will work just like a standard failover cluster configuration would against a SAN.

Summary

Once all of the setup completes, you can validate the location of the database files using the sys.master_files DMV in SQL Server.

image

The SMB storage option lets you get past the 25 instance limitation for SQL Server when using shared disks with drive letters assigned, since each instance requires a separate drive letter.  Using SMB up to 50 instances can be installed in a failover cluster, which provides a significant increase in the number of instances you can get over using a SAN.  The improvements in the SMB stack make this a much more viable solution to expensive shared storage implementations using SAN hardware, but as I pointed out earlier, there are still considerations that should be made about the level of redundancy and other benefits that are provided by SAN implementations before you determine that a SMB solution for shared storage in a failover cluster is appropriate for your specific environment.

Yesterday I was asked by email about a problem that someone encountered associated with a SQL Server Failover Cluster configuration that I have run into a number of times myself, and I have had questions about repeatedly in the past.  The problem is that during the SQL Server Setup Validation of the environment, a warning is raised stating that the Network Binding Order is incorrect for the environment.  If you click on the specific Setup Validation Warning you will get a box like the following:

image

What this warning is telling you, is that you have multiple network interfaces configured on the server, and the default binding order for the interfaces places the current Domain access interface in a position that is not the first interface for the server.  This can be changed by adjusting the network binding order for the server in the Network Properties.  To make this change, open up the Network and Sharing Center by clicking on the network connection on the system tray and then clicking the Open Network and Sharing Center link.

image

When the Network and Sharing Center opens, click on the Change adapter settings link on the left hand side.

image

Then when the Network Connections window opens, you have to press the Alt key to open up the window context menu so that you can then click on Advanced and Advanced Settings.

image

When the Advanced Settings window opens, you will be on the Adapters and Bindings tab, which allows you to change the binding order for the network interfaces on the server.

image

If you click on the appropriate interface for the domain connection, you can move it up to the first position in the binding order which will eliminate the warning in SQL Server Setup.

image

If changing the binding order does not resolve the warning, there may be a disabled or ghost network adapter in the system as discussed in the following KB article. (http://support.microsoft.com//kb/955963)

For the most part I have been relatively quiet about the coming changes in SQL Server 2012 with regards to Extended Events.  Primarily this has been to allow the new features of the product to become fully baked to ensure that the information would continue to be applicable as the product lifecycle progressed, and there have been a number of major changes that have made this decision a really good one.  With SQL Server 2012 in it’s RC0 phase, and based on the responses I have seen to a number of feedback items for bugs on Microsoft SQL Server Connect, like the recent one for the template issue I blogged about on my blog post, Workaround for Bug in Activity Tracking Event Session Template in 2012 RC0, I’ve decided to go ahead and start a new series of posts that outline the new features of Extended Events in SQL Server 2012.  I can’t think of a better way to start off a series on the new features in SQL Server 2012 for Extended Events than the new SQL Server Management Studio User Interface for Extended Events in a couple of blog posts.

For this initial post, we’ll take a look at some of the features that replace existing SQL Profiler functionality that most DBA’s tend to use in their day to day operations.  To start off this topic, the first thing you need to know is that there is a new node for Extended Events under the Management folder in Object Explorer for SQL Server 2012.

image

If you happen to be connecting to a SQL Server 2008 server using SSMS from SQL Server 2012, this node will not exist.  This is due to the fact that UI for SSMS in SQL Server 2012 are not backwards compatible with SQL Server 2008, even though Extended Events exist in SQL Server 2008.  At some point when SQL Server 2012 actually releases to manufacturing (RTM), I will release an update to my SSMS Addin for Extended Events that back ports compatibility for SQL Server 2008 to Management Studio 2012, allowing full integration between SSMS 2012 and SQL Server 2008.

Within the scope of SQL Server 2012, the UI provides a lot of new functionality that should simplify the implementation and usage of Extended Events for most DBA’s.  One of the best enhancements is the ability to create an event session using the New Session Wizard to define an event session based with the least number of steps possible, possibly using an template for the event session, or manually defining a custom configuration that is applied to all of the events configured for the session.

image

By opening the New Session Wizard, immediately a Introductory page is presented that can be bypassed by selecting the option to Do not show this page again:

image

Since this page will typically slow down the creation of an event session for use, I would typically check this option before clicking on Next. The Set Session Properties page will allow you to specify a name for the session as well as whether or not the event session will startup automatically when SQL Server starts.  This can be very useful for troubleshooting an infrequent problem that does not predictably occur and you need to ensure that the session data is collected whenever the problem next occurs.  Some examples of where this might be applied will be shown in future posts in this series.

image

The New Session Wizard provides the ability to create the new event session based on a previously created template, or one of the templates provided by default with SQL Server 2012.  Keep in mind that in the RC0 build, and unfortunately the RTM release of SQL Server 2012, the Activity Tracking template has a bug in the XML definition that will cause this UI to error out.  This was documented by Mike Wachal, the PM for Extended Events at Microsoft on his blog post Activity Tracking event session template is broken, but a fix for the problem in the XML is available in my blog post Workaround for Bug in Activity Tracking Event Session Template in 2012 RC0.  After replacing the broken template with the one attached to my previous blog post we can select it in the UI.

image

The alternative to using an existing template is to create a blank event session by selecting the option to Do not use a template.  If a template is selected, once you click Next the events from the template will be displayed in the Select Events To Capture page, which also displays the events available in Extended Events and information about the data returned by the events.  The event library can be searched a number of ways to simplify finding the correct events for the session.  The most common way to search, once you start learning the events that are available, would be to start typing the event name in the textbox (green circle below) and the results will dynamically begin filtering out the events that don’t match the search text.  If you click on a specific event in the table, the event description and information about the columns returned by the event will be displayed (purple box below).

image

However, if you don’t know the specific events that you want, but you know the general category that the events apply to you can make use of other search options in the UI as well.  Events in Extended Events are broken down by two attributes, a Category (Keyword in the DMVs) and a Channel, that make them compatible with Event Tracing for Windows (ETW).  The Category is similar to the trace category that the existing Trace events have and can be used for logically grouping events similar to the way SQL Profiler groups events in previous versions of SQL Server.  The Channel aligns with the channels you would see with ETW.  By default one of the Channels is excluded in the UI, the Debug Channel.  Debug events are focused towards internal debugging tasks and are not of general purpose use by most DBAs.  These events tend to be counting in nature, or can fire incredibly frequently. If you want to see the Debug events in the UI, you can click the drop down and check the checkbox for the Debug Channel and they will be available.

image

Additionally you can filter out specific categories by clicking the Category dropdown and unchecking specific Category names from the selection.  To add an Event to the session, you can double click on the event, or you can select multiple events using Ctrl or Shift + click on the event names and then clicking the arrow that points to the right.  Alternately, you can also remove events using a double click or by highlighting the event and clicking the arrow that points to the left.

image

After adding events to the session, the next page allows you to specify the Global Fields, known as actions in Extended Events, that you want added to each of the events in the session.  If you look at the columns being returned by the events in Extended Events, there are significantly fewer data elements being returned at the individual event level.  Many of the trace columns map to the Global Fields (actions) in Extended Events and can be added as needed to the events.  The goal was to minimize the size of the firing events allowing additional information to be added as needed to maximize the performance of Extended Events firing.

image

After selecting the global fields to add to the Event Session and clicking Next, the Set Session Event Filters page is displayed where you can define filtering (known as predicates) on the events in the session.  Any filters that were configured as a part of the template will be displayed in the upper table, while new filters that will be applied to all of the events in the session can be added to the bottom table.  New filters can only be created on the columns that are available for all of the events in the event session, which typically there won’t be any if using multiple events, or on the global fields available to Extended Events.

image

This is a very restrictive functionality of the New Session Wizard that was put in place to provide a parity for session creation to what most users would expect from SQL Server Profiler.  I’ll show more about how this is not the best thing when we look at the other parts of the new UI in SSMS in other posts.  Once the filters have been created, the next step is to define the event storage.  The New Session Wizard restricts you to the event_file and ring_buffer targets, which are going to be the most commonly used targets by most DBAs since they retain the full event data and do not apply additional processing to the events.  For an event session that is going to be collecting data for a long period of time, or data that generates at a fast rate, the event_file target should be used, and similar to SQL Trace you can setup the maximum file size and whether or not file rollover should occur.  If the event session is going to be collecting data for a short period of time or where the event predicates will restrict the session to only collecting a small amount of data, the ring_buffer target will generally be a good choice.

image

Once the data storage has been configured the session can be created by clicking Finish, or you can click Next to get to the Summary page which will allow you to review all the configured options for the Event Session and Script the session definition for further changes to the DDL if necessary.

image

Once the Event Session is created, the last page provides you the opportunity to start the event session and to open the Live Data Viewer for the event session which is similar to the SQL Server Profiler view from SQL Trace.

image

In the next blog post I’ll show the New Session dialog which is not a wizard based implementation and why it provides a much more robust method of creating an event session in SQL Server 2012.

My latest article on Simple-Talk was published this morning.  In this article I dig into the actual meaning of one of the performance counters I often see mentioned on the forums, but in a completely incorrect context.

Great SQL Server Debates: Buffer Cache Hit Ratio

After a recent discussion about Lock Pages in Memory on Twitter, I wrote an article that talks about the history of Lock Pages in Memory and the differing opinions on the appropriate usage of Lock Pages in Memory for 64 bit instances of SQL Server. 

Great SQL Server Debates: Lock Pages in Memory

Thanks to Brent Ozar (Blog|Twitter) and Glenn Berry (Blog|Twitter) for their part in discussing this on twitter, by email, and for reviewing the article before I submitted it.

Just over a year ago I blogged about the enhancements that were made to the sqlserver.page_split Event in SQL Server 2012 to make it easier to identify what the splitting object was and the type of split that was being performed.  Sadly what I discovered writing that post was that even with the extra information about the split type, the event didn’t give you enough information to really focus on the problematic splits that lead to fragmentation and page density issues in the database.  I didn’t do a whole lot with this again until recently when a question was posted by Ami Levin (Blog | Twitter) on the MVP email list that commented that the page_split event was broken in SQL Server 2012 based on a presentation he’d seen by Guy Glantser (Blog | Twitter).

Let me start off by saying, the event isn’t broken, it tracks page splits, but it doesn’t differentiate between an end page split that occurs for an ever increasing index, versus a mid-page split for a random index that leads to fragmentation and page density issues in the database.  Both of these are technically splits inside the storage engine, even if we as DBA’s don’t really care about the end-page split for a increasing key value like an IDENTITY column in the database.  I had Ami pass my information along to the presenter and we traded a few emails on the subject of tracking splits with the specific focus on trying to pull out the mid-page, fragmenting splits.  While going through things for the third time, it dawned on me that this is incredibly simple, based one of the demo’s that was sent to me.  Just over a year ago, I also blogged about tracking transaction log activity in SQL Server 2012 using the sqlserver.transaction_log event, which can be used to track mid-page splits in a database.

Last year when I wrote about the sqlserver.transaction_log event, there were 10 columns output by the event in CTP1, but as of RC0, the events output has changed and only 9 columns are output by the event.

SELECT 
    oc.name, 
    oc.type_name, 
    oc.description
FROM sys.dm_xe_packages AS p
INNER JOIN sys.dm_xe_objects AS o
    ON p.guid = o.package_guid
INNER JOIN sys.dm_xe_object_columns AS oc
    ON oc.object_name = o.name
        AND oc.object_package_guid = o.package_guid
WHERE o.name = 'transaction_log'
  AND oc.column_type = 'data';

image

For the purposes of identifying the mid-page splits, we want to look at the operation column that is output by the event, which contains the specific operation being logged.  In the case of a mid-page split occurring, the operation will be a LOP_DELETE_SPLIT, which marks the delete of rows from a page as a result of the split.  To build our event session, we are going to need the map_key for the LOP_DELETE_SPLIT log_op map.  This can be obtained from the sys.dm_xe_map_values DMV:

SELECT *
FROM sys.dm_xe_map_values
WHERE name = 'log_op'
  AND map_value = 'LOP_DELETE_SPLIT';

With the map_key value, we have a couple of ways to collect the information with our targets.  We could collect everything into an event_file, but that doesn’t really make sense for this event.  Instead the best target for this type of information is the histogram target which will bucket our results based on how we configure the target and tell us how frequently the event fires based on our bucketing criteria.  If we don’t know anything about the server in question, we can start off with a very general event session that has a predicate on the operation only, and then aggregate the information in the histogram target based on the database_id to find the databases that have the most mid-page splits occurring in them in the instance.

-- If the Event Session exists DROP it
IF EXISTS (SELECT 1 
            FROM sys.server_event_sessions 
            WHERE name = 'SQLskills_TrackPageSplits')
    DROP EVENT SESSION [SQLskills_TrackPageSplits] ON SERVER

-- Create the Event Session to track LOP_DELETE_SPLIT transaction_log operations in the server
CREATE EVENT SESSION [SQLskills_TrackPageSplits]
ON    SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT 
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0, -- Event Column
        source = 'database_id');
GO
        
-- Start the Event Session
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

This event session will allow you to track the worst splitting database on the server, and the event data can be parsed out of the histogram target.  To demonstrate this, we can create a database that has tables and indexes prone to mid-page splits and run a default workload to test the event session:

USE [master];
GO
-- Drop the PageSplits database if it exists
IF DB_ID('PageSplits') IS NOT NULL
BEGIN
    ALTER DATABASE PageSplits SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE PageSplits;
END
GO
-- Create the database
CREATE DATABASE PageSplits
GO
USE [PageSplits]
GO
-- Create a bad splitting clustered index table
CREATE TABLE BadSplitsPK
( ROWID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
  ColVal INT NOT NULL DEFAULT (RAND()*1000),
  ChangeDate DATETIME2 NOT NULL DEFAULT CURRENT_TIMESTAMP);
GO
--  This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_BadSplitsPK_ColVal ON BadSplitsPK (ColVal);
GO
--  This index should end-split based on the DEFAULT column value
CREATE INDEX IX_BadSplitsPK_ChangeDate ON BadSplitsPK (ChangeDate);
GO
-- Create a table with an increasing clustered index
CREATE TABLE EndSplitsPK
( ROWID INT IDENTITY NOT NULL PRIMARY KEY,
  ColVal INT NOT NULL DEFAULT (RAND()*1000),
  ChangeDate DATETIME2 NOT NULL DEFAULT DATEADD(mi, RAND()*-1000, CURRENT_TIMESTAMP));
GO
--  This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_EndSplitsPK_ChangeDate ON EndSplitsPK (ChangeDate);
GO
-- Insert the default values repeatedly into the tables
WHILE 1=1
BEGIN
    INSERT INTO dbo.BadSplitsPK DEFAULT VALUES;
    INSERT INTO dbo.EndSplitsPK DEFAULT VALUES;
    WAITFOR DELAY '00:00:00.005';
END
GO

If we startup this workload and allow it to run for a couple of minutes, we can then query the histogram target for our session to find the database that has the mid-page splits occurring.

-- Query the target data to identify the worst splitting database_id
SELECT 
    n.value('(value)[1]', 'bigint') AS database_id,
    DB_NAME(n.value('(value)[1]', 'bigint')) AS database_name,
    n.value('(@count)[1]', 'bigint') AS split_count
FROM
(SELECT CAST(target_data as XML) target_data
 FROM sys.dm_xe_sessions AS s 
 JOIN sys.dm_xe_session_targets t
     ON s.address = t.event_session_address
 WHERE s.name = 'SQLskills_TrackPageSplits'
  AND t.target_name = 'histogram' ) as tab
CROSS APPLY target_data.nodes('HistogramTarget/Slot') as q(n)

image

With the database_id of the worst splitting database, we can then change our event session configuration to only look at this database, and then change our histogram target configuration to bucket on the alloc_unit_id so that we can then track down the worst splitting indexes in the database experiencing the worst mid-page splits.

-- Drop the Event Session so we can recreate it 
-- to focus on the highest splitting database
DROP EVENT SESSION [SQLskills_TrackPageSplits] 
ON SERVER

-- Create the Event Session to track LOP_DELETE_SPLIT transaction_log operations in the server
CREATE EVENT SESSION [SQLskills_TrackPageSplits]
ON    SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT 
      AND database_id = 8 -- CHANGE THIS BASED ON TOP SPLITTING DATABASE!
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0, -- Event Column
        source = 'alloc_unit_id');
GO

-- Start the Event Session Again
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

With the new event session definition, we can now rerun our problematic workload for a 2 minute period and look at the worst splitting indexes based on the alloc_unit_id’s that are in the histogram target:

 

-- Query Target Data to get the top splitting objects in the database:
SELECT
    o.name AS table_name,
    i.name AS index_name,
    tab.split_count,
    i.fill_factor
FROM (    SELECT 
            n.value('(value)[1]', 'bigint') AS alloc_unit_id,
            n.value('(@count)[1]', 'bigint') AS split_count
        FROM
        (SELECT CAST(target_data as XML) target_data
         FROM sys.dm_xe_sessions AS s 
         JOIN sys.dm_xe_session_targets t
             ON s.address = t.event_session_address
         WHERE s.name = 'SQLskills_TrackPageSplits'
          AND t.target_name = 'histogram' ) as tab
        CROSS APPLY target_data.nodes('HistogramTarget/Slot') as q(n)
) AS tab
JOIN sys.allocation_units AS au
    ON tab.alloc_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id = p.partition_id
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
        AND p.index_id = i.index_id
JOIN sys.objects AS o
    ON p.object_id = o.object_id
WHERE o.is_ms_shipped = 0;

image

With this information we can now go back and change our FillFactor specifications and retest/monitor the impact to determine whether we’ve had the appropriate reduction in mid-page splits to accommodate the time between our index rebuild operations:

-- Change FillFactor based on split occurences
ALTER INDEX PK__BadSplit__97BD02EB726FCA55 ON BadSplitsPK REBUILD WITH (FILLFACTOR=70)
ALTER INDEX IX_BadSplitsPK_ColVal ON BadSplitsPK REBUILD WITH (FILLFACTOR=70)
ALTER INDEX IX_EndSplitsPK_ChangeDate ON EndSplitsPK REBUILD WITH (FILLFACTOR=80)
GO

-- Stop the Event Session to clear the target
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=STOP;
GO

-- Start the Event Session Again
ALTER EVENT SESSION [SQLskills_TrackPageSplits]
ON SERVER
STATE=START;
GO

With the reset performed we can again start up our workload generation and begin monitoring the effect of the FillFactor specifications on the indexes with our code.  After another 2 minute period, the following splits were noted.

image

With this information we can go back and again attempt to tune our FillFactor values for the worst splitting indexes and rinse/repeat until we determine the best FillFactor for each of the indexes to minimize splits.  This is an incredibly powerful tool for the DBA moving into SQL Server 2012, and will definitely change how we perform index fragmentation analysis and troubleshoot problems with excessive log generation in SQL Server 2012 onwards.

Cheers!

This afternoon, Orson Weston (Twitter), asked how to find the difference between two binary sets of schema options for replication on the #SQLHelp hash tag on twitter.  The valid values for the @schema_options parameter in replication are documented in the BOL Topic for sp_addarticle (http://msdn.microsoft.com/en-us/library/ms173857.aspx).  However, just having a table of values doesn’t really help you figure out what is different between two binary @schema_options values without doing some bitwise operations on the values.  To do this, we can create a table variable to hold the valid values and descriptions for the schema options and then use the & (Bitwise AND) operator to find which options are set for each of the binary values.

DECLARE @options TABLE
(Value VARBINARY(8), [Description] VARCHAR(1000))

INSERT INTO @options (Value, Description)
VALUES 
    (0x00, 'Disables scripting by the Snapshot Agent and uses creation_script.'),
    (0x01, 'Generates the object creation script (CREATE TABLE, CREATE PROCEDURE, and so on). This value is the default for stored procedure articles.'),
    (0x02, 'Generates the stored procedures that propagate changes for the article, if defined.'),
    (0x04, 'Identity columns are scripted using the IDENTITY property.'),
    (0x08, 'Replicate timestamp columns. If not set, timestamp columns are replicated as binary.'),
    (0x10, 'Generates a corresponding clustered index. Even if this option is not set, indexes related to primary keys and unique constraints are generated if they are already defined on a published table.'),
    (0x20, 'Converts user-defined data types (UDT) to base data types at the Subscriber. This option cannot be used when there is a CHECK or DEFAULT constraint on a UDT column, if a UDT column is part of the primary key, or if a computed column references a UDT column. Not supported for Oracle Publishers.'),
    (0x40, 'Generates corresponding nonclustered indexes. Even if this option is not set, indexes related to primary keys and unique constraints are generated if they are already defined on a published table.'),
    (0x80, 'Replicates primary key constraints. Any indexes related to the constraint are also replicated, even if options 0x10 and 0x40 are not enabled.'),
    (0x100, 'Replicates user triggers on a table article, if defined. Not supported for Oracle Publishers.'),
    (0x200, 'Replicates foreign key constraints. If the referenced table is not part of a publication, all foreign key constraints on a published table are not replicated. Not supported for Oracle Publishers.'),
    (0x400, 'Replicates check constraints. Not supported for Oracle Publishers.'),
    (0x800, 'Replicates defaults. Not supported for Oracle Publishers.'),
    (0x1000, 'Replicates column-level collation.'),
    (0x2000, 'Replicates extended properties associated with the published article source object. Not supported for Oracle Publishers.'),
    (0x4000, 'Replicates UNIQUE constraints. Any indexes related to the constraint are also replicated, even if options 0x10 and 0x40 are not enabled.'),
    (0x8000, 'This option is not valid for SQL Server 2005 Publishers.'),
    (0x10000, 'Replicates CHECK constraints as NOT FOR REPLICATION so that the constraints are not enforced during synchronization.'),
    (0x20000, 'Replicates FOREIGN KEY constraints as NOT FOR REPLICATION so that the constraints are not enforced during synchronization.'),
    (0x40000, 'Replicates filegroups associated with a partitioned table or index.'),
    (0x80000, 'Replicates the partition scheme for a partitioned table.'),
    (0x100000, 'Replicates the partition scheme for a partitioned index.'),
    (0x200000, 'Replicates table statistics.'),
    (0x400000, 'Default Bindings'),
    (0x800000, 'Rule Bindings'),
    (0x1000000, 'Full-text index'),
    (0x2000000, 'XML schema collections bound to xml columns are not replicated.'),
    (0x4000000, 'Replicates indexes on xml columns.'),
    (0x8000000, 'Create any schemas not already present on the subscriber.'),
    (0x10000000, 'Converts xml columns to ntext on the Subscriber.'),
    (0x20000000, 'Converts large object data types (nvarchar(max), varchar(max), and varbinary(max)) introduced in SQL Server 2005 to data types that are supported on SQL Server 2000. For information about how these types are mapped, see the "Mapping New Data Types for Earlier Versions" section in Using Multiple Versions of SQL Server in a Replication Topology.'),
    (0x40000000, 'Replicate permissions.'),
    (0x80000000, 'Attempt to drop dependencies to any objects that are not part of the publication.'),
    (0x100000000, 'Use this option to replicate the FILESTREAM attribute if it is specified on varbinary(max) columns. Do not specify this option if you are replicating tables to SQL Server 2005 Subscribers. Replicating tables that have FILESTREAM columns to SQL Server 2000 Subscribers is not supported, regardless of how this schema option is set. '),
    (0x200000000, 'Converts date and time data types (date, time, datetimeoffset, and datetime2) introduced in SQL Server 2008 to data types that are supported on earlier versions of SQL Server. For information about how these types are mapped, see the "Mapping New Data Types for Earlier Versions" section in Using Multiple Versions of SQL Server in a Replication Topology.'),
    (0x400000000, 'Replicates the compression option for data and indexes. For more information, see Creating Compressed Tables and Indexes.'),
    (0x800000000, 'Set this option to store FILESTREAM data on its own filegroup at the Subscriber. If this option is not set, FILESTREAM data is stored on the default filegroup. Replication does not create filegroups, therefore, if you set this option, you must create the filegroup before you apply the snapshot at the Subscriber. For more information about how to create objects before you apply the snapshot, see Executing Scripts Before and After the Snapshot Is Applied.'),
    (0x1000000000, 'Converts common language runtime (CLR) user-defined types (UDTs) that are larger than 8000 bytes to varbinary(max) so that columns of type UDT can be replicated to Subscribers that are running SQL Server 2005.'),
    (0x2000000000, 'Converts the hierarchyid data type to varbinary(max) so that columns of type hierarchyid can be replicated to Subscribers that are running SQL Server 2005. For more information about how to use hierarchyid columns in replicated tables, see hierarchyid (Transact-SQL).'),
    (0x4000000000, 'Replicates any filtered indexes on the table. For more information about filtered indexes, see Filtered Index Design Guidelines.'),
    (0x8000000000, 'Converts the geography and geometry data types to varbinary(max) so that columns of these types can be replicated to Subscribers that are running SQL Server 2005.'),
    (0x10000000000, 'Replicates indexes on columns of type geography and geometry.'),
    (0x20000000000, 'Replicates the SPARSE attribute for columns. For more information about this attribute, see Using Sparse Columns.')

DECLARE @schema_option VARBINARY(8) = 0x000000000807509F;

SELECT CONVERT(INT, @schema_option, 1) & CONVERT(INT,value,1), *
FROM @options
WHERE CONVERT(INT, @schema_option, 1) & CONVERT(INT,value,1) <> 0;

SET @schema_option = 0x000000000803509F;

SELECT CONVERT(INT, @schema_option, 1) & CONVERT(INT,value,1), *
FROM @options
WHERE CONVERT(INT, @schema_option, 1) & CONVERT(INT,value,1) <> 0;

Using Orson’s two values we can see that the difference between the two is, the first value includes the “Replicates filegroups associated with a partitioned table or index.” option where the second value does not.

In the first post in this blog series on using SQL Server 2012 Distributed Replay, Installing and Configuring SQL Server 2012 Distributed Replay, we looked at how to configure a Distributed Replay environment using multiple clients and a dedicated replay controller.  In this post we’ll actually make use of the previously configured servers to perform a distributed replay using a random workload that has been generated against the AdventureWorks2008R2 database installed on our Replay SQL Server.

Collecting the Replay Trace Data

For the purposes of generating a random workload against AdventureWorks2008R2, I created a workload generator that can be found on my blog post The AdventureWorks2008R2 Books Online Random Workload Generator.  I used this with 2 different PowerShell Windows from SQL2012-DRU1 and SQL2012-DRU2 to run a random workload across multiple sessions against the SQL2012-DB1 server.  To capture the trace data required for performing the replay, SQL Server Profiler was used along with the TSQL_Replay template to create the capture.

image

For production systems, the best way to go about capturing a Replay Trace is to script the trace definition to a file, and then create the trace as a server side trace that is writing to a trace file on local disks for the server.  This has a significantly lower impact that tracing directly from Profiler, which uses the rowset provider for Trace.  With the replay trace running, and the workload generating events I waited for the trace to collect around 80000 rows of data and then shutdown the trace so that I could access the trace file to copy it from the SQL2012-DB1 server to the SQL2012-DRU server where the Distributed Replay Controller is installed.

Preprocessing the Trace File(s)

At the point that I went to perform the preprocessing of the trace file for replay, I realized a difference in my environment using multiple servers to build this blog series versus my original setup using a single server for learning how to use Distributed Replay.  In order to preprocess the trace file for replay, you have to have the Management Tools Basic installed on the server that will be used for preprocessing the trace data.  If you have been following this blog series to learn how to use Distributed Replay, you will need to run Setup on the SQL2012-DRU server to add this feature before it can be used for pre-processing the trace file.  This is necessary to administer Distributed Replay.

image

Once the Management Tools Basic have been installed the server will have to be restarted and then it is possible to make use of the DReplay.Exe executable to administer the Distributed Replay components on the controller server. The DReplay executable has multiple options that can be discovered by using a –? from the command line as follows:

C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn>dreplay -?
Info DReplay    Usage:
DReplay.exe {preprocess|replay|status|cancel} [options] [-?]}

Verbs:
preprocess Apply filters and prepare trace data for intermediate file on controller.
replay     Transfer the dispatch files to the clients, launch and synchronize replay.
status     Query and display the current status of the controller.
cancel     Cancel the current operation on the controller.
-?         Display the command syntax summary.

Options:
dreplay preprocess [-m controller] -i input_trace_file -d controller_working_dir [-c config_file] [-f status_interval]
dreplay replay [-m controller] -d controller_working_dir [-o] [-s target_server] -w clients [-c config_file] [-f status_interval]
dreplay status [-m controller] [-f status_interval]
dreplay cancel [-m controller] [-q]
Run dreplay <verb> -? for detailed help on each verb.

To perform the preprocessing, you will need to do a couple of different steps.  The first thing you need to do is edit any options that you want to set for the pre-processing by editing the DReplay.Exe.Preproces.config file in the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path on the server.  There are two configuration files for DReplay.Exe as highlighted below.  At this time make sure that you are only editing the Preprocess.config file.

image

The DReplay.Exe.Preproces.config file contains a schema defined XML document that controls the configuration of the preprocessing.  In general the options set for preprocessing should not need to be changed but if you want to include system sessions as a part of the replay, you can change the options in the XML, which is listed below.

<?xml version="1.0" encoding="utf-8"?>
<Options>
    <PreprocessModifiers>
        <IncSystemSession>No</IncSystemSession>
        <MaxIdleTime>-1</MaxIdleTime>
    </PreprocessModifiers>
</Options>

To preprocess the trace data, open a new command prompt window and change directories to the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path.  The trace file has been copied onto the SQL2012-DRU server as C:\DReplay\SQL2012_ReplayTrace.trc.  To preprocess this file first start the “SQL Server Distributed Replay Controller” service by using NET START:

NET START "SQL Server Distributed Replay Controller"

Then execute the following command from within the Binn path to actually preprocess the trace file and output:

dreplay preprocess -i "C:\DReplay\SQL2012_ReplayTrace.trc" -d "C:\DReplay"

This will process the trace file and output the working files for performing the Distributed Replay to the C:\DReplay path.  Below is a screenshot of the full window for preprocessing the trace file.

image

Note: The dreplay executable can be called from any path within the server because the Binn path is a part of the Path Environmental variables.  However, the executable has to be called from within the Binn folder to access the necessary .config files and .xsd schema files for the configuration.  If you want to be able to run this executable from another location on the server, you will need to copy the .config and .xsd files out of the Binn folder to the folder that you want to be able to run dreplay within for it to work.

Performing the Replay

The first step in performing the replay is to start the “SQL Server Distributed Replay Client” service on each of the replay clients using NET START.

NET START "SQL Server Distributed Replay Client"

You will want to verify that each of the clients was able to successfully connect to the controller in the logs as shown in the previous post in this series.  Once this has been done, your environment is almost ready for replay.  For the purposes of this blog series, a SELECT only workload has been generated for replay against AdventureWorks2008R2.  However, in most environments you won’t have a SELECT only workload, so you will have to plan for and prepare your replay environment using a BACKUP/RESTORE of the production database from a point within the captured workload so that the database can be replayed against without having problems associated with Primary Key constraint violations during the replay.

If you want to change any of the parameters associated with the replay operation, you can edit the DReplay.Exe.Replay.config file in the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path.  The default contents of the configuration file are shown below:

<?xml version="1.0" encoding="utf-8"?>
<Options>
    <ReplayOptions>
        <Server></Server>
        <SequencingMode>stress</SequencingMode>
        <ConnectTimeScale>100</ConnectTimeScale>
        <ThinkTimeScale>100</ThinkTimeScale>
        <HealthmonInterval>60</HealthmonInterval>
        <QueryTimeout>3600</QueryTimeout>
        <ThreadsPerClient>255</ThreadsPerClient>
        <EnableConnectionPooling>No</EnableConnectionPooling>
        <StressScaleGranularity>SPID</StressScaleGranularity>
    </ReplayOptions>
    <OutputOptions>
        <ResultTrace>
            <RecordRowCount>Yes</RecordRowCount>
            <RecordResultSet>No</RecordResultSet>
        </ResultTrace>
    </OutputOptions>
</Options>

Before performing the actual replay, make sure that the account being used to run the SQL Server Distributed Replay Client service has been granted appropriate access to the target SQL Server and database to be able to perform the replay operations.  Once this has been done replay can be performed using the command line options for DReplay.Exe by providing the appropriate switches, or you can alternately provide the –c command line switch to specify the configuration file that should be used for performing the replay.  If you change any of the default values listed above in the DReplay.Exe.Replay.config file, you will need to specify the –c command line switch for those to take effect.  To perform a replay with the defaults, the following command line execution can be run:

dreplay replay -s "SQL2012-DB1" -d "C:\DReplay" -w "SQL2012-DRU1, SQL2012-DRU2"

Once this is executed, the Distributed Replay Controller will take read in the preprocessed replay file, and then synchronize the replay across all of the clients specified with the –w command line parameter.  While the replay operation occurs, the command window for the controller will output periodic updates about the current status of the replay process.

image

The frequency of the status updates can be controlled using the –f command line switch to specify the number of seconds between each of the updates.  Each of the status updates will provide information about each of the clients including the total number of events that have been replayed, the success rate of the replay operations per client, as well as an estimate for the total amount of time remaining to complete the replay operation.  When the replay completes the total elapsed time and pass rate for the events is output.

image

In the next and final post in this series, we’ll look at some of the common problems with using Distributed Replay and how to resolve them, including manually configuring the Controller and add additional Client Service accounts to the environment after Setup has been completed.

This morning Paul received an email from a member of the community that was asking if it is safe to run SQL Server inside of a VM and whether we would suggest running databases in virtual machines or not.  The root of the question being asked was an article that the person had read back in January titled SQL Server and VMware: A Potentially Fatal Combination.  This article is an amazing example of how misinformation on the internet can lead to confusion and an incorrect decisions about what options are feasible for a given environment or not.  Most people that read my blog know that I have been running SQL Server virtualized on VMware in production environments since late 2004/early 2005.  In all that time, not once have I ever experienced a data corruption issue associated with SQL Server running inside of a virtual machine, even on my laptop where the machine has powered off due to loss of battery life and the machines experienced a hard crash during operation.  Why?

Analyzing the Source of the Misinformation

Lets start out at the top of the article and debunk the misinformation that is contained in it about running SQL Server on VMware.  The first thing is that the article makes some very broad claims about VMware based on the use of VMware Player, which is akin to Microsoft Virtual PC for Windows XP Mode in Windows 7.  This is not a server class hypervisor, nor is it in any way representative of VMware’s main products ESX/ESXi or vSphere.  This is like comparing SQL Server to Microsoft Access, they aren’t even in the same class of functionality or features.

The next claim in the article is that the database is at risk due to hypervisor caching of the writes being performed by SQL Server, which uses the FILE_FLAG_WRITE_THROUGH flag when calling the CreateFile function to open the database, instructing the system to write through any cache directly to disk.  The article claims that VMware Player caches I/O operations and can result in lost writes.  Unfortunately, the example provided is a machine running on Windows Vista which as a hosted platform always uses unbuffered I/O for virtual machines, so it is not caching the writes being performed.  You don’t have to take my word for this, it is documented in the Storage IO crash consistency with VMware products KB article on their site:

For hosted products, write handling depends on the host operating system.

On Linux hosts, VMware does not use unbuffered IO, because it is not safe or supported across all the Linux versions that VMware supports. So currently, VMware hosted products on Linux hosts always use buffered IO.

On Windows hosts, VMware hosted products use unbuffered IO by default.

Unfortunately the entire foundation of the article is based on misinformation that can be proven by reading VMware’s documentation.  The VMware ESX hypervisors DO NOT cache I/O under any circumstance, which is also covered in the same KB article:

VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers. 

If this is the case, you might be wondering how the database in the article became corrupt? Unfortunately it is impossible to know because the necessary parameters surrounding the test configuration are not provided in the article.  Instead the article provides a Disclaimer that tells you the specifics are “rather unimportant”

Disclaimer:
In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

I disagree that this kind of information is unimportant, especially when making claims that SQL Server will suffer database corruption issues if running under VMware. 

Setting up a Repeatable Test Scenario

For scientific purposes, I decided to take one of my standard SQL Server 2008R2 VMs running inside of my Dell M6500 laptop and run a series of repeatable tests using VMware Workstation 7 on Windows 7.  The VM is configured with 2 vCPUs and 2GB RAM, and like the article a thumb drive will be used to store the database data and log files.  To setup the environment for the tests, the following script will be used:

-- Create a "Safe" database on the VMDK stored on my laptop SSD
CREATE DATABASE [VMSafe]
ON PRIMARY
( NAME = N'VMSafe',
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe.mdf',
  SIZE = 6144KB,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N'VMSafe_log',
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe_log.ldf',
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

-- Create a "Test" database on a removable thumb drive
CREATE DATABASE [VMTest]
ON PRIMARY
( NAME = N'VMTest',
  FILENAME = N'E:\SQLskills\VMTest.mdf',
  SIZE = 6144KB ,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N'VMTest_log',
  FILENAME = N'E:\SQLskills\VMTest_log.ldf' ,
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

-- Create the baseline table in the "Safe" database
USE [VMSafe];
GO
IF OBJECT_ID('dbo.TestTable') IS NOT NULL
BEGIN
    DROP TABLE dbo.TestTable;
END
GO

CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT('ABC123'),
Col3 NCHAR(400) DEFAULT('789XYZ'))
GO
DECLARE @StartTime DATETIME = CURRENT_TIMESTAMP
DECLARE @loop INT = 1
BEGIN TRANSACTION
WHILE @loop <=100000
BEGIN
    INSERT INTO TestTable (Col1) VALUES (@loop);
    SET @loop = @loop + 1;
END
COMMIT TRANSACTION
SELECT DATEDIFF(ms, @StartTime, CURRENT_TIMESTAMP) AS Table1_Time
GO
UPDATE TestTable
SET Col1 = RowID;
GO

-- Create a duplicate of the table in the "Test" database
USE [VMTest]
GO
CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT('ABC123'),
Col3 NCHAR(400) DEFAULT('789XYZ'))
GO
SET IDENTITY_INSERT dbo.TestTable ON
INSERT INTO TestTable (RowID, Col1, Col2, Col3)
SELECT RowID, Col1, Col2, Col3
FROM VMSafe.dbo.TestTable
SET IDENTITY_INSERT dbo.TestTable OFF
GO

-- Issue CHECKPOINT to flush dirty buffer pages to disk
CHECKPOINT
GO

Now that we have our databases created and identical data in them so that we can validate the consistency of the environment in the event of a forced crash by removing the USB thumbdrive from the computer, lets issue a UPDATE against all the rows in the VMTest databases to prefix them with a 1_ for the Col2 column.

image

With the UPDATE complete, pull out the USB thumbdrive and then issue another CHECKPOINT operation to force a flush of dirty buffers to the disk.  This will raise a 9001 error for the VMTest database since the log is not available to write the CHECKPOINT log records to.

image

At this point we have a crashed database, so we can plug in our USB thumbdrive and restart the SQL Server instance to allow crash recovery to run for the databases.  Immediately upon instance restart we will see:

image

OMG!  It has the same status as the article, that must mean we have a problem!  NO! The In Recovery status tells you that crash recovery is running for the database.  If the database is In Recovery you should not set it to Emergency Mode as the article states.  Instead open up the ErrorLog for the instance and look at what is happening with the recovery, the database engine logs informational messages about the progression of crash recovery for the databases while it is running.

image

image

In this case, within a few minutes of opening the database the estimated time for crash recovery to be performed is already at 86 seconds.  The only thing you have to do here is be patient and allow the Database Engine to perform the Undo/Redo operations the way it is designed to.

image

image

All it took was 103 seconds for crash recovery to be performed and the database is available online and in a consistent manner.  The log records describing the transaction were hardened in the transaction log before the commit of the update operation occurred ensuring that the database could be recovered in a consistent state.

I have run this series of tests a number of different ways and the database always starts up in a crash consistent state.  Feel free to test this yourself by pulling the USB thumbdrive out while the UPDATE is running, and it will force a rollback of the transaction during crash recovery.

Summary

In Summary, there is nothing inherently unsafe about running SQL Server on VMware.  Microsoft has supported SQL Server on VMware for nearly three years at this point as a validated Server Virtualization Validation Program (SVVP) configuration (http://support.microsoft.com/?id=956893).  This article is further evidence of the danger of misinformation on the internet, and the need to validate information online before making decisions based on the information.  A SQL Server running as a virtual machine in a SVVP validated configuration is going to be as safe as a SQL Server running on physical hardware.

Just over two years ago, right before PASS Summit 2009, I got the insane idea after finishing Chapter 3 for the SQL Server 2008 Internals and Troubleshooting book that I wanted to write another book based on my experiences answering questions on the MSDN forums that would be based on the top 10 most frequently asked questions that I had seen over the years repeatedly.  At the time I had no idea how to contact a publisher (the Wrox book was handled entirely by Christian Bolton and I was honored to be asked to contribute Chapter 3 to it). I didn’t want to write this book for profit, and I really liked the way that Redgate and SimpleTalk publishing gave away free copies of the books that they published at events as well as online eBooks that could be downloaded for free.  At PASS I asked Grant Fritchey how he got hooked up with SimpleTalk to publish his book on Execution Plans, and he introduced me to the editor at Redgate, Tony Davis, who really liked the idea behind the book, and within weeks of PASS finishing, I was fast at work outlining chapters and writing the beginnings of what would ultimately become Troubleshooting SQL Server: A Guide for the Accidental DBA.

While the title of this says that it is a Guide for the Accidental DBA, the the book is far more in-depth than just covering the basics. The first chapter in the book provides a methodology for troubleshooting problems in SQL Server that I use every day while working on client machines, and that I have used time and again over the last 6 years that I have been a DBA for SQL Server.  One of the key points that we try to make clear in this chapter is that a problem is rarely identified correctly based on a single piece of information. Appropriate identification of the root cause of a problem is critical in applying the appropriate fix, and this chapter covers how to look at all the available information that SQL Server provides to find the root cause of a problem as fast as possible.   A slightly modified for web preview of the first chapter is available on the SimpleTalk website as an article titled A Performance Troubleshooting Methodology for SQL Server.  This chapter sets up the content for the remainder of the book, and the remaining chapters target specific problem areas that we see repeatedly on the various forums online.

  • High Disk I/O –RAID misconfiguration, inadequate I/O throughput, poor workload distribution, SAN issues, disk partition misalignment and more
  • High CPU usage –insufficient memory, poorly written queries, inadequate indexing, inappropriate configuration option settings, and so on
  • Memory mismanagement – the advent of 64-bit SQL Server removes the memory allocation "shackles" placed on its 32-bit predecessors, but issues arising from incorrect memory configuration are still common
  • Missing indexes – arguably the number one cause of wasteful resource usage in SQL Server
  • Blocking – caused mainly by poorly designed databases that lack proper keys and indexing, and applications that apply needlessly restrictive transaction isolation levels
  • Deadlocking – covering the Bookmark Lookup deadlock, the Serializable Range Scan deadlock, the Cascading Constraint deadlock and more
  • Full transaction logs – lack of log backups, hefty index maintenance operations, long running transaction, problems with replication and mirroring environments, and more.
  • Accidentally-lost data – "oops, wrong database!" Let's hope you've got backups!

This book would have never made it to print if it weren’t for the amazing efforts of the other people involved in the project.  The technical editor for the book was Gail Shaw, I will never understand how she was able to work this into her busy schedule.  When I had to pick a technical editor for the book, the depth of content made it incredibly hard to choose the right person to work with.  Luckily Gail agreed to take on the task and I am forever indebted to her for all of the ideas, additional content, and the countless hours she has worked over the last month and a half while we sprinted to the finish to have the book in print by PASS 2011. The coauthor of the book, Ted Krueger, agreed to step in to help by writing two of the chapters last year after I had fallen behind on the writing repeatedly, and made it possible for the book to be copy complete in time for PASS.  Last but absolutely not least is the editor Tony Davis, who has an amazing amount of patience and has probably spent more time on phone calls trying to work through copy and technical edits to provide clarity to the information being presented. 

In the last 48 hours I have seen two different people having the exact same problem so I thought that I would go about blogging some code I’ve had lying around for a while and been meaning to blog that would help them with finding the root cause of their problems.  In both cases, the question started because the person asking it noticed that Page Life Expectancy was below 300, which is based on dated information published by Microsoft before I began working with SQL Server, and consequently is something I’ll discuss a little at the end of this blog post.  In the first case I started asking questions about the amount of memory installed, and what was using the data cache in the server, (see Paul’s blog post Performance issues from wasted buffer pool memory).  In the second case this information had already been presented and as a matter of the troubleshooting of both of the problems it was noted that a single index was using a majority of the data pages in the buffer cache on the servers and the question became, “How do I find out what is using this index?”  and a solution that would help them with figuring out what queries are using a specific index in a database. 

Another Case for Querying the Plan Cache?

In the past I have written a number of scripts that can be used to find interesting information in the plan cache of SQL Server and this is just another one of those.  I originally wrote this back at the end of March when my good friend Thomas LaRock (Blog|Twitter) sent me an email asking if I would mind him providing my contact information to someone that had a question about Extended Events. What the person wanted to do was identify all the queries that were using a specific index using Extended Events, but it turned out this isn’t possible in Extended Events.  Bummer…  However I could offer an alternate solution that involved querying the plan cache using XQuery and then monitoring what was found over time to determine a majority of the queries that use a specific index. 

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @IndexName AS NVARCHAR(128) = 'PK__TestTabl__FFEE74517ABC33CD';

-- Make sure the name passed is appropriately quoted
IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
--Handle the case where the left or right was quoted manually but not the opposite side
IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '['+@IndexName;
IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

-- Dig into the plan cache and find all plans using this index
;WITH XMLNAMESPACES
   (DEFAULT '
http://schemas.microsoft.com/sqlserver/2004/07/showplan')   
SELECT
stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
cp.plan_handle,
query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj)
OPTION(MAXDOP 1, RECOMPILE);


Now I have been asked this before so I will go ahead and defend it ahead of time.  I use transaction isolation level read uncommitted when querying the plan cache just to make sure I don’t cause any problems because this can be a long running query depending on the size of the plan cache and I don’t care if I get some phantom plan as a result of being read uncommitted.  I also use MAXDOP 1 for the query because it tends to be a CPU resource hog, and a lot of times I am running this on a production system or telling someone to run this on a production system and it has the potential for impacting performance.  If your server is considerably beefy hardware wise, it probably won’t matter, and for most people it doesn’t negatively impact their environment, but like anything else, I apply a rule of “First do no harm” to something that might be run against a production system.

Where have I used this code since originally writing it back in March?  All over the place, but generally when working with index tuning so I can find what is using a specific index in the system to evaluate what impact changing that index or removing it might have to the environment.  It is also useful for troubleshooting the problem that a specific index is using all of the buffer pool and causing buffer pool flushing to occur on a constant basis resulting in a low Page Life Expectancy for the server.  Using this query you can easily find the plans and statements that have been using this index and begin working to review the query stats information to find out where the problem is in the code, design, or indexing to try and improve the situation.

What’s Wrong about Page Life Expectancy >= 300?

Aside from being a performance metric that was recommended by Microsoft that has never been updated to reflect the changes that have occurred in hardware over the last 12 years, there is nothing wrong with this recommendation (i.e. sarcasm… :-)).  The problem with this fixed value is that it was determined when servers generally had 4GB of RAM installed in them, and servers with 16GB+ of RAM installed in them were extremely expensive and fairly rare. 

This performance counter tells you the number of seconds, at the current point in time, a page will remain in memory without being referenced (so a value of 300 means your buffer pool is flushing every 5 minutes).  So 10 years ago when you were reading anywhere from 1.7GB up to 12GB of data (depending on your server’s memory) from disk into the buffer cache every 5 minutes it was a sign of memory pressure on the server and something you needed to investigate. 

Fast forward to today, where it is not uncommon for a SQL Servers to have anywhere from 48-144GB+ RAM installed in them.  These RAM values equate to 32-132GB of buffer cache size depending on the ‘max server memory’ sp_configure option setting and the amount of memory being used for the plan cache, but you probably get the point by now.  If reading 1.7GB-12GB of data every 5 minutes was bad, how bad would it have to be to read 32GB-132GB of data from disk every 5 minutes consistently? 

Today the value 300 is ridiculously small as a threshold for when to start worrying about buffer pool pressure.

Now from my own experiences as a DBA and over the past 6 months as a consultant, the I/O subsystem is one of the most undersized components in a majority of SQL Server implementations, so the last thing I want to be doing is hammering the disks in my server because I relied on a performance counter that was out of date years ago but never updated.  For the last 3-4 years I have relied on the amount of memory being used by the data cache in SQL Server to determine when Page Life Expectancy was a sign of impending problems.  This means I replaced the old 300 threshold with a more reasonable value of (DataCacheSizeInGB/4GB *300) as a basis for when to begin investigating things. 

Keep in mind that short drops in Page Life Expectancy can be the result of a large query executing and are not a sign of problems in the system (though it may be a sign that you have a tuning opportunity).

Summary: don’t use the value 300 as any kind of Page Life Expectancy threshold. Anyone that continues to recommend doing so is doing you a disservice. Use an adaptive formula like (DataCacheSizeInGB/4GB*300). Even better – monitor the steady-state value of PLE and react when it dips *and stays* below your steady-state value.

Run the Cluster Validation Report

At this point, I have to admit that as a part of trying to streamline the configuration of this environment, I made a mistake in the first post when I added the Failover Clustering feature to the template VM that was cloned to create the two cluster node VMs.  To proceed with setting up a supported cluster configuration, the Failover Clustering feature will have to be removed from each of the cluster nodes, the nodes will have to be restarted, and then the feature will have to have the Failover Clustering feature added back to them in order for the Cluster Validation Report to succeed.  If you don’t perform this step, the Network configuration and Active Directory configuration of the servers will fail in the Cluster Validation Report.

UPDATED 09/20/2011 

NOTE FROM EMAILS: You may have to disconnect the NAT adapters, or "unplug the cable from them" in VirtualBox to pass the Cluster Validation Report.  For some reason it seems like some people are hitting duplicate IP Address issues with the DHCP applied IP Address from VirtualBox for the NAT connection.  If you get a failed CVR, disconnect the NAT adapters, run the CVR Tests again and it should pass.

After removing the Failover Clustering feature, rebooting, and then adding the Failover Clustering feature back to both of the cluster nodes, the cluster nodes will be ready to build the failover cluster.  The first step is to run the Cluster Validation Report to ensure that the current configuration is supported for building a Windows Server Failover Cluster.  To do this, open the Failover Cluster Manager from one of the two VMs from Start | Administrative Tools | Failover Cluster Manager. When the Failover Cluster Manager opens click on the Validate a Configuration link in the right hand corner of the Actions pane, or under the Management pane.

image_thumb[58]

When the Validate Configuration Wizard opens, click Next and then add both of the cluster nodes to the wizard on the Select Servers or a Cluster screen.

image_thumb[59]

Click Next to and then leave the option to Run all tests (Recommended) selected and click Next, and Next again to begin running the tests.

image_thumb[60]image_thumb[61]

When the validation tests complete, as long as the Summary screen doesn’t show a Failure, the cluster can be created based on the configuration of the nodes.  The validation report summary should show that a warning icon, and to see the specific tests that resulted in the validation warnings, you can click on the View Report button on the Summary screen.  Otherwise, click on the Finish button to close the Cluster Validation Wizard.

image_thumb[62]

Creating the Cluster

To build the cluster, click on the Create a Cluster link in the Failover Cluster Manager, and then click Next to get to the Select Servers or a Cluster screen, then add both of the cluster nodes to the wizard and click Next.  On the Access Point for Administering the Cluster screen, type SQL2K8R2CLSTR in the Cluster Name box, and then uncheck the iSCSI networks, 192.168.31.0/24 and 192.168.32.0/24, from the Network list, and then add the IP address 192.168.81.5 as the cluster network IP address to the 192.168.81.0/24 network, then click Next.

image_thumb[66]

On the Confirmation screen click Next, and the Windows Server Failover Cluster will be built.

image_thumb[67]image_thumb[68]

On the Summary screen click Finish to close the Create Cluster Wizard.

Validating the Quorum Configuration

When the cluster is formed, the default configuration should be to use Node and Disk Majority based on the number of cluster nodes being used, in this case 2 cluster nodes, and the disk witness configuration should be for the smallest shared disk, in this case Drive Q to be configured as the disk witness.  However, this configuration should be validated before the cluster is used to install SQL Server Failover Clustering in the environment.  To validate the Quorum settings, in Failover Cluster Manager, right-click on the cluster and expand More Actions and click on the Configure Cluster Quorum Settings menu option, or click on More Actions in the Actions pane and select the Configure Cluster Quorum Settings option to setup the Quorum Configuration for the Windows Server Failover Cluster.

image

In the Configure Cluster Quorum Wizard, click Next, and then validate that the option for Node and Disk Majority (recommended for your current number of nodes), and then click Next.

image

On the Configure Storage Witness screen, ensure that the Witness Disk is set to the 1GB shared disk on Drive Q.

image

If this matches the environment, click Cancel to close the dialog.  If it doesn’t change the Witness Disk to the 1GB disk on Drive Q and then click Next, then click Next again, and finally Finish to complete the reconfiguration of the Quorum settings.

Installing SQL Server Failover Clustering

Unfortunately as a part of building this blog series, I have also been using VirtualBox for the first time because it was free, and I wanted to focus this blog series on using things that are free.  However, when I went to build the failover cluster for SQL Server I quickly learned that not everything is equal in VirtualBox that I have always assumed would be based on years of working with SQL Server Failover Cluster configurations in VMware Workstation.  There happens to be a bug associated with the installation of the VirtualBox Guest Tools that prevents you from setting up a failover cluster when the tools are installed. If you attempt to install SQL as a failover cluster a series of errors like the following will occur with the primary failure being the following error:

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance due to the following error: 'The network address is invalid.'

imageimageimage

To work around this, the VirtualBox tools have to be removed from the cluster node VMs, but before we do this, we need to copy the installation media into the VMs using a Shared Folder from the host.  If you recall from the first post in this series, I recommended that you download SQL Server 2008 R2 Evaluation Edition, and once you have downloaded the SQLFULL_x64_ENU.exe file, it will need to be extracted to a folder on the host operating system which we will then enable as a shared folder in the two guests VMs that we are setting up as our cluster nodes.  In this case I chose to create a folder named SQL2K8R2_Setup under my Downloads folder where the file was downloaded.

imageimage

Once the files have been extracted, logon to the SQL2K8R2CLSTR1 VM and then open the VM settings from Machine | Settings menu item. 

image

Click on the Shared Folders item on the left hand side and then click on the Add button on the far right to add a shared folder to the VM.  Then browse to the path of the local folder or type in the path in the Folder Path box and provide a Folder Name, then click the Read-only checkbox and then click OK and OK again to return the VM.

imageimage

Inside the VM click Start | Computer to open the Explorer window for the local computer, then click on the Network icon on the left sidebar and then double-click on the VBOXSVR computer object to access the SQL2K8R2_Setup shared folder.  Now create a new folder name SQL2L8R2_Setup on the desktop of both of the cluster nodes and copy the contents of the shared folder into the Desktop folder.  Repeat these steps for both of the Cluster nodes.  Once this is complete, uninstall the VirtualBox tools from Start | Control Panel | Programs | Uninstall a Program and then reboot both of the nodes to continue.

For the purposes of this blog series, which was originally only planned to be three posts long but has been extended into infinity based on the number of tasks that could be accomplished in the configuration of an environment for SQL Server and learning inside of that environment, we are going to install a non-slipstream RTM install of SQL Server 2008 R2 into the test environment as a failover cluster.  The reason for doing this is that it will allow a future post to talk about rolling updates of the cluster nodes and how to best accomplish patching in a clustered environment with minimal downtime.  Open the SQL2K8R2_Setup folder on the Desktop that we previously created, then double-click on the setup application in the folder to begin SQL Server Setup on the cluster node.

imageimage

When the Open File – Security Warning dialog opens click on Run to begin Setup for SQL Server.

image

When the SQL Server Installation Center opens up, click on the Installation link on the left hand side and then click on the New SQL Server failover cluster installation link on the right hand side to launch the failover cluster setup.  If another Open File – Security Warning dialog opens click Run again.

image

On the Setup Support Rules screen click OK.  Then on the Product Key screen, leave the option set to Specify a free edition, and the dropdown set to Evaluation and then click Next.

imageimage

On the License Terms screen click the checkbox for I accept the license terms. and then click Next.  Then click Install on the Setup Support Files screen.

imageimage

When the Setup Support Rules checks complete, review the list of items checked and as long as there are only Warnings, click Next.  We will handle most of these warnings in later blog posts in this series, for example configuring MSDTC as a cluster resource, and reviewing the Network Binding order on the servers.

image

On the Features Selection screen click the Select All button and then uncheck the checkboxes for Analysis Services and Reporting Services, then click Next.

image

On the Instance Configuration screen type SQL2K8R2FC in the SQL Server Network Name text box, and then click Next.

image

Click Next on the Disk Space Requirements screen, then click Next on the Cluster Resource Group screen.  On the Cluster Disk Selection screen ensure the the checkbox next to Cluster Disk 1 is checked and then click Next.

image

On the Cluster Network Configuration screen, uncheck the DHCP checkbox and then type 192.168.81.6 in the Address box and then click Next.

image

For the Cluster Security Policy screen leave the option to Use service SIDs (recommended) selected and then click Next.

image

Switch to the Domain Controller VM and then logon and open the Active Directory Users and Computers snapin from Start | Administrative Tools | Active Directory Users and Computers.  Expand SQLskills.demos and then right-click on Users and create a new user from New | User.  Type SQL Server Services into the Full name textbox, and then type SQLServiceAcct in the User logon name textbox. Name the user SQLServiceAcct

image

Type pass@word1 in both the Password and Confirm password textboxes on the New Object – User screen, and then uncheck the User must change password at next logon checkbox, and check the User cannot change password and Password never expires checkboxes.  Click Finish on the subsequent screen to complete creating the new user.

image

Switch back to the SQL2K8R2CLSTR1 VM and type the SQLskills\SQLServiceAcct in the Account Name for the SQL Server Agent and SQL Server Database Engine services.  Then type in pass@word1 for the Password for both of the services and click next.

image

On the Database Engine Configuration screen, click the radio button for Mixed Mode (SQL Server authentication and Windows authentication) and then type pass@word1 into the Enter password and Confirm password textboxes.  Then click on the Add Current User button, and click Next.

Check the checkbox on the Error Reporting screen and click Next.  Click Next on the Cluster Installation Rules screen, and then click Install on the Ready to Install screen to begin the failover cluster installation.

Once the failover cluster setup on SQL2K8R2CLSTR1 completes reinstall the VirtualBox tools on the VM and let it reboot.  Then logon to the SQL2K8R2CLSTR2 VM and then open the SQL2K8R2_Setup folder on the Desktop that we previously created, then double-click on the setup application in the folder to begin SQL Server Setup on the cluster node.  When the Open File – Security Warning dialog opens click on Run to begin Setup for SQL Server.

image

When the SQL Server Installation Center opens up, click on the Installation link on the left hand side and then click on the Add node to a SQL Server failover cluster link on the right hand side to launch the failover cluster setup. Complete the initial setup steps by selecting Evaluation edition, accepting the License Agreement and installing the SQL Support Files.  Click Next on the Setup Support Rules screen and then on the Cluster Node Configuration screen ensure the the MSSQLSERVER instance is selected in the SQL Server instance name dropdown box, then click Next.

image

Type the password for the Service Accounts in the Password box of the Service Accounts screen and then click Next.

image

Check the checkbox on the Error Reporting screen and click Next.  Click Next on the Cluster Installation Rules screen, and then click Install on the Ready to Install screen to begin the failover cluster installation.

When the installation completes, you will have a fully functional SQL Server 2008 R2 failover cluster setup in your free test environment.

To continue the build out of our Completely FREE Playground for SQL Server, we will begin by setting up an Active Directory Domain Controller and a separate iSCSI Virtual SAN, which are both requirements for setting up a SQL Server Failover Cluster.  To make things easier to follow, this post will break these two tasks up into separate sections, with each section focused on the specifics of the setup involved with each of the servers.

Active Directory Domain Controller

To begin the setup of the Active Directory Domain Controller, the first step is to create a Clone of the “base” VM that was created in the previous blog post.  To do this, right click on the VM and select the Clone option from the context menu.

image

To create the Clone, provide a common name to the VM and specify the option to “Reinitialize the MAC address of all network cards” on the first screen of the “Clone a virtual machine” wizard in VirtualBox.

image

When creating a clone of an existing VM, there are two options available; “Full Clone” which copies the existing virtual hard disk file entirely, and “Linked Clone” which creates a snapshot of the existing virtual machine and creates a differential virtual hard disk for the clone.  If you are cloning a VM on a laptop, or on a system with limited space, a “Linked Clone” can save space by reusing the base VMs virtual hard disk for the Windows OS, rather than having duplicated copies of the base OS installation.  However, when a “Linked Clone” is created, the base VM is required for the clone to be able to start up.  If the base VM is moved or lost, all of the “Linked Clones” will become unusable instantaneously.  This is the trade off between the two options; reduced storage vs. viability.

image

Once I create a “Linked Clone” for the Domain Controller, I open the Settings for the VM and remove Adapter 3 and 4 from the Network settings of the VM.  The Active Directory Domain Controller will not have iSCSI targets configured on it, and does not need the iSCSI network adapters configured for it, so it is best to remove those adapters from the VM.  It would also be possible to remove the previously configured Features from Windows Server 2008 R2 from the Active Directory VM if you chose to do so, those features being .NET Framework 3.5.1, Failover Clustering, and Multipath I/O, but it does not matter for this particular setup.  The next step in setting up the Active Directory Domain Controller is to accept the License Terms and login/set a password for the local Administrator account.  Since this is going to be a Domain Controller, we will need to setup a static IP Address for the Domain network adapter.  To do this, click on the Configure networking link on the Initial Configuration Tasks window that pops up immediately after you login to Windows the first time.  Then open the properties dialog for the connection that is not using NAT and setup on the Domain Internal Network.  To determine which network connection is the Domain network in the VM, you can open the VM settings (Machine>Settings) and simulate unplugging the network cable from the vNIC by toggling the Cable connected checkbox.

image  image

For this playground setup, I am going to use 192.168.81.x for the Domain network, and 192.168.31.x for the iSCSI network.  In IPv4 TCP/IP Properties, set the IP Address to 192.168.81.1, the Subnet mask to 255.255.255.0, and the Preferred DNS server to 127.0.0.1.

image

Now the VM needs to be renamed so that it has the appropriate naming convention to match your locally setup environment.  For the purposes of this blog series, the AD DC will be named SQLskills-DC in the environment.  Once the servers name is changed the VM will require a reboot.

Once the reboot for the rename operation completes, the VM is ready to be configured as an Active Directory Domain Controller.  For the purposes of having a playground environment, a very basic configuration of Active Directory can be done, to minimize the steps required and keep things simple. The following steps will provide you with a fully function Active Directory Domain Controller in your playground, but there is no consideration for high availability of the Domain Controller, backups, best practices for configuration, or anything else that would qualify you to be an Active Directory Domain Administrator beyond the confines of the playground environment that is being built.  To begin the configuration, the Active Directory Domain Services role will need to be added to the server.

image

After the ADDS role has been installed, run dcpromo to begin the configuration of the AD Domain on this server.

image image

Click Next and then Next again to get to the Deployment Configuration screen.  Choose the option to create a new domain in a new forest and click Next.

image

Provide a Fully Qualified Domain Name (FQDN) for the new domain and then click Next.  You don’t have to provide an actual domain for the FQDN.  In this series I am using SQLskills.Demos as the FQDN for the domain.

image

The Domain Functional Level and Forest Functional Level defaults of Windows Server 2003 can be used within the sandbox environment.  Click Next on both of these screens.  After the DNS Configuration check completes, the wizard will recommend additional options for the Domain Controller, including the option to configure it as a DNS server. 

image

Leave the boxes checked on the screen and click Next.  If you’ve followed this guide up to this point, there should be a Static IP Assignment warning that pops up due to the dynamically assigned IP address for the NAT connection.  Click Yes on this box to continue, and then click Yes on the pop up that tells you the delegation could not be created for the parent zone in DNS.

image image

For the playground environment, there is no need to change the database and log file locations for AD.  Click Next on this Wizard window, provide a password for the Directory Services Restore Mode Administrator account and click Next again.

imageimage

At this point everything is set to complete the configuration of Active Directory and the information is displayed in a final summary screen.  Clicking Next will begin the configuration of the Active Directory Domain on the server.

image  image

Once the server reboots, it will be a fully functional Domain Controller for your playground environment.

iSCSI Virtual SAN

To facilitate building failover cluster configurations in our playground we are going to need a virtual SAN to provide the shared storage between the failover cluster nodes.  One of the requirements for the virtual SAN is that it must support SCSI-3 Persistent Reservations, which are required for failover clustering using Windows Server 2008/2008R2.  There are a number of free virtual SAN products available online, but for the purposes of this setup I chose to use the Microsoft Windows Server 2008 R2 iSCSI Target 3.3 which is available as a free download.  This iSCSI target meets all of the requirements for failover clustering support, and can be configured in a few minutes.

The steps to clone the template VM for the iSCSI Virtual SAN are exactly the same as they were for creating the Domain Controller, with the exception that all of the vNICs will be left configured for the VM.  Once Windows starts up, the first step is to configure the networking for the Domain network so that the VM can be joined into the playground domain.  To do this, first identify the Domain network connection in Windows by disconnecting the cable in the vNIC settings as previously shown.  Then change the IPv4 TCP/IP properties for the connection and set the IP address to 192.168.81.2, the subnet mask to 255.255.255.0 and then the Preferred DNS server to 192.168.81.1 (the IP address of the Domain Controller).

image

Once this has been done, the server name can be changed and the server can be joined to the domain at the same time.  Once the domain information has been set, the server will attempt to connect to the domain and will request credentials for an account with permissions to join the server to the domain.

image image

After the server joins the domain, it will require a reboot to complete.  When the server restarts, you have to use the Switch User option to change the user that you login with from being the local Administrator account to login with a domain based account.

Note: At this point I would recommend creating a separate user account in the Domain that you use to work on the servers in the playground.  It is never a good practice to use the domain administrator account to work with SQL Server.

Once you’ve logged into the server using a domain account, download the Microsoft iSCSI Software Target 3.3 for Windows Server 2008 R2 and extract the contents to a location on the server.

image

Installing the iSCSI Target onto the server is a very straightforward process that is basically a series of check boxes for options like accepting the licensing agreement, whether to join the Customer Experience Improvement Program, and whether to use Microsoft Update or not.

imageimageimageimageimageimage

Once the installation completes, an entry for the iSCSI Target will be added to the Start Menu and the target MMC can be opened up to begin configuring the iSCSI Targets for the environment.

image

The next post in this series will go through the configuration options for the iSCSI Target and how to create a new LUN to support configuration of a SQL Server Failover Cluster inside of the playground environment.

Steve Jones (Blog|Twitter) asked a question on the #sqlhelp hash tag tonight that I found interesting so I thought I’d play around with Powershell for a few minutes and see if I could beat Aaron Nelson (Blog|Twitter) to the punch with a Powershell answer for once.  Steve’s question was:

@wayoutwest: “Anyone have a PSH script that can add a login, run sp_helprevlogin, pipe the output as a new batch to a 2nd server? #sqlhelp #powershell”

Initially I thought, well that should be easy enough, but then I realized, if you are creating a new login, you don’t really need sp_helprevlogin to create the login on the second server, you just need the new login SID from the first server.  We already have the new login name and password, the only thing sp_helprevlogin gives us that we don’t already have in this case is the SID for the new login.  Ostensibly, the intent behind Steve’s request is to have the login created on both servers with the same SID so that a restore of the database from the first server to the second server doesn’t result in orphaned database users.  A good place for this would be database mirroring where the logins are created on the principle server and the mirror needs an identical login with the same SID for the application to work correctly after a failover occurs.

We can leverage SMO to do this with Powershell by loading the Microsoft.SqlServer.SMO assembly into our environment. 

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null

Then we create and set a number of strings for the principle and mirror server names, and the new login’s name and password. 

[string]$PrincipleServer = ".\2008R2Lab1";
[string]$MirrorServer = ".\2008R2Lab2";
[string]$LoginName = "TestLogin";
[string]$Password = "pass@word1";

Then we create a Server object to both of the SQL instances:

# Create SMO Connections to both SQL Instances
$PriSvr = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $PrincipleServer
$MirSvr = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $MirrorServer

For development and testing purposes we can check the principle server to determine if the login already exists and if so drop it before we create it:

# Check if the login exists on the Principle server and Drop if it does
if ($PriSvr.Logins.Contains($LoginName))
{ $PriSvr.Logins[$LoginName].Drop(); }

To create the new login, we create a new Login object passing the Server object for the principle server and the new logins name into the object constructor.  Then we set the LoginType for the new login object to be a SQL Login and finally we call Create() on the object and pass in the password for the new login to be created.

# Create a new login for the Principle Server
$NewLogin = New-Object ('Microsoft.SqlServer.Management.Smo.Login') $PriSvr, $LoginName
# Specify that this is a SQL Login
$NewLogin.LoginType = [Microsoft.SqlServer.Management.Smo.LoginType]::SqlLogin;
# Create the login on the Principle Server
$NewLogin.Create($Password);

After creating our login, we will need to refresh the Logins Collection on the principle server to get the new login back with its SID, and then we can grab the login into a local variable for easier access later.

# Refresh the login collection to get the login back with SID
$PriSvr.Logins.Refresh();
# Get a Login object for the Principle Server Login we just created
$PriLogin = $PriSvr.Logins[$LoginName]

Same as before, on the Mirror Server we can check if the new login exists and drop it if it does for development and testing purposes:

# Check if the login exists on the Mirror server and Drop if it does
if ($MirSvr.Logins.Contains($LoginName))
{    $MirSvr.Logins[$LoginName].Drop(); }

To create the new login on the mirror server, we follow all the same steps as we did on the principle, except we add one additional step to call set_SID() and provide the SID from the login on the principle server, in this case using the $PriLogin variable we stored the login in for easier access.  Once we set the SID we can call Create() for the login and pass in the password to create the login with.

# Create a new login for the Mirror Server
$NewLogin = New-Object ('Microsoft.SqlServer.Management.Smo.Login') $MirSvr, $LoginName
# Specify that this is a SQL Login
$NewLogin.LoginType = [Microsoft.SqlServer.Management.Smo.LoginType]::SqlLogin;
# Assign the SID to this login from the Principle Server Login
$NewLogin.set_Sid($PriLogin.get_Sid());
# Create the Login on the Mirror Server
$NewLogin.Create($Password);

To validate that this all worked as we expected it to, we can open up SSMS and create a new query window that uses SQLCMD mode, allowing us to connect to both instances and query the sys.server_principals DMV for the TestLogin login to validate we have the same SID.

:CONNECT .\Lab1
SELECT *
FROM sys.server_principals
WHERE name = 'TestLogin';
GO
:CONNECT .\Lab2
SELECT *
FROM sys.server_principals
WHERE name = 'TestLogin';
GO

image

As you can see in the above image, the SID’s are identical which means that the database users will properly map to their server login in the event of a restore from one instance to another, or a database failover when using database mirroring.

The complete script for this is attached to this blog post.

Create Mirrored Login.ps1 (2.13 kb)

Lately I have been doing a lot of work on SQL Server’s that have had 24 or more processor cores installed in them. Many of these systems utilize hardware non-uniform memory architectures, or NUMA, while a few of them were not. This led me into a good bit of additional research into the differences between the two configurations. In this blog post, we’ll look at what NUMA is, and why it matters, and then in a separate blog post we’ll look at what this means to SQL Server and the differences between hard and soft NUMA configurations specifically for SQL.

Only a few years ago NUMA was synonymous with specialized hardware configurations that were designed to scale beyond the processor and RAM limitations of traditional Symmetric Multi-Processing (SMP) systems. The first NUMA system I learned anything about was the IBM x3950 series of servers that were used in a PASS Keynote Demo in 2009. The server being used in the demo was actually four x3950 servers that were interconnected as hardware NUMA nodes using the ScaleExpander Option Kit sold by IBM. Each of the hardware NUMA nodes was a very basic 4-way SMP system with its own processors, memory and I/O devices that could have also operated as a stand alone server. In a SMP configuration, all of the processors have equal access to the memory and I/O in the system as shown in the following diagram.

image

While SMP systems made it possible to scale up the number of processors available inside a single server, as more processors were added to the system the processor bus became a limitation to the overall system performance. In the past systems like the IBM x3950 series of servers were able to scale up the number of processors and amount of RAM available in a single solution by interconnecting multiple SMP systems together with a high speed interconnect that allowed two or more systems to function as if they were a single system. However, one of the problems associated with connecting multiple nodes with an interconnect was the memory access between the processors in one node to the memory in another node was not uniform; hence the name Non-Uniform Memory Architecture. The interconnect between the two systems introduced latency for the memory access across nodes. The traditional concept of NUMA using multiple interconnected SMP systems as NUMA nodes would conceptually resemble the following:

image

Technically this is more complicated, but the general concept is that each of the nodes has access to the memory available in the total system through the interconnect, but at a slight penalty performance wise. For this reason special considerations had to be taken when writing software to run on NUMA configured systems and the Windows Server OS and SQL Server both have NUMA optimizations built into their code base to optimize memory locality. While in the past most people that were using hardware NUMA configurations would have known about it, recent changes in processor architectures have made hardware NUMA configurations much more common place and most people I’ve talked to don’t even realize that their new server utilizes hardware NUMA. 

AMD Hyper-Transport (HT)

Where the older SMP architecture had a separate memory controller in the Northbridge of the motherboard, newer systems have an integrated memory controller built into the processor itself, and each processor has its own memory bank. The first processors to introduce an integrated memory controller were the AMD Opteron series of processors in early 2003. Since the introduction of the Opteron series, other AMD processor lines have included an integrated memory controller as well. Just as with traditional NUMA configurations, the hardware NUMA created by each processor having an integrated memory controller requires an interconnect to exist between the processors to allow for memory access by the other processors in the system. AMD processors share memory access through Hyper-Transport (HT) links between the processors.

image

The HT interconnects between processors allows for the remote memory access in one CPU’s memory bank by the processor cores in a different CPU, but there is a cost for accessing this remote data. To determine the most recent copy of the data probes must be sent out to all of the CPU’s to determine if the current CPU has the most recent version of the data, and this increases latency. To combat this, AMD introduced HT Assist in the Istanbul line of processors which reduces the traffic by tracking data in the processor caches to help direct the processors to the correct location of the most recent copy of the data. This reduces the bus traffic and increases the efficiency of the CPU. 

Intel Quick-Path Interconnect (QPI)

As the number of cores on a single processor increased, Intel followed AMD and introduced an integrated memory controller on its Nehalem-EX series of processors that share memory access through Intel’s Quick-Path Interconnect (QPI). The Intel QPI interconnects the processors with each other in a similar manner to the AMD HT interconnect. However, the QPI snoop based cache forwarding implementation can return remote data in as little as 2 hops. (It should be noted that this in no way is an endorsement of Intel over AMD as there may be similar optimizations in AMD processors that I have not yet read about.) 

image

Intel offers a really detailed demo of their QPI architecture on their site that really helps understanding the differences between SMP and NUMA configurations (http://www.intel.com/technology/quickpath/demo/demo.htm).

Node Interleaving

On newer server systems based on processors that have integrated memory controllers, the option still exists to run the server in an SMP like configuration known as Node Interleaving. This option is generally configurable in the system Bios and causes the system to be treated as if it were a traditional SMP system by the Operating System and any applications that are NUMA aware and optimized. When node interleaving is enabled, the system becomes a Sufficiently Uniform Memory Architecture (SUMA) configuration where the memory is broken into 4KB addressable regions and mapped from each of the nodes in a round robin fashion so that the memory address space is distributed across the nodes. When the system is configured for NUMA, the memory from each node is mapped into a single sequential block of memory address space for all of the memory in each node. For certain workloads, node interleaving can be beneficial to performance since the latency between the hardware nodes of a system using processors with integrated memory controllers is small. However, for applications that are NUMA aware and that have NUMA optimizations, node interleaving should remain disabled to allow the application to make optimized use of the hardware resources.

Why does this matter?

I’ve been working with hardware NUMA servers, using the newer processors with integrated memory controllers, for the last two years, and not once did I really have to worry about whether the server was using NUMA or not. Microsoft has done a lot of work in how they built Windows Server and SQL Server to make working with NUMA configurations relatively pain free. However, recently I ran into some issues while working with client systems that made me have to look deeper into NUMA to gain an understanding of how it affected SQL Server, and what options were available for configuring SQL Server for NUMA. I’ll discuss the specifics of these in my next blog post in a few days.

The first SQL Rally was held last week in Orlando, FL, and I had the honor of being selected for one of the spotlight sessions by the community in the DBA track.  SQL Rally was a different experience from the regular PASS Summit; it wasn’t anywhere as big as the normal summit, but it was larger than most of the SQL Saturday events that I have attended.  If I had to make a comparison, I would say that SQL Rally was more on par with the experience I had attending SQL Bits 7 in York, UK last October, which seems to be right about where PASS wanted the experience to be.  I always enjoy attending events, large or small, where people are passionate about SQL Server. 

Since the event I have had a number of requests for the presentation materials and demos I used in my Deadlocking for Mere Mortals presentation on Friday afternoon.  Below is the session abstract, and attached to this blog post is a copy of the slides and demo’s for the presentation. 

Title:  Deadlocking for Mere Mortals
Speaker: 
Jonathan Kehayias
Category:  Summit Spotlight
Level: 200

Abstract:
While troubleshooting deadlocking in SQL Server has gotten easier in SQL Server 2005 and 2008, it continues to be a constant source of questions in the forums online. This session will look at the most common deadlocks asked about on the forums, and how to troubleshoot them using the various methods available to DBA’s in SQL Server 2005 and 2008; including Trace Flags, SQL Trace, Event Notifications, and Extended Events.

Session Goals

  • Understand why deadlocks occur in SQL Server
  • Understand how to capture deadlock graphs in various SQL Server versions.
  • Understand how to read the deadlock graph to determine the specific cause and how to mitigate against the deadlock.

SQL Rally 2010 - DBA200 - Deadlocking for Mere Mortals.zip (964.07 kb)

This blog post came from a question on the MSDN Forums asking how to change the owner of all SQL Agent Jobs on a server without having to do it manually.  I had to accomplish a similar task in the past, where I needed to scan a list of servers for jobs that were owned by a specific AD account and then change the owner to a different account after an employee left the company I was working for.  I did a quick edit to that script and created the following script which will change the Job owner of every job on a server:

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo') | out-null

$InstanceName = "."
$NewOwnerLoginName = "sa"

$smosvr = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $InstanceName
$agent = $smosvr.JobServer;

$agent.Jobs | % {
    $_.set_OwnerLoginName($NewOwnerLoginName);
    $_.Alter();
    }

This script is really simple and uses SMO to connect to an instance and then iterates the Jobs in the JobServer changing the OwnerLoginName to the specified name and commiting the changes by calling the Alter() method of the Job object in SMO.

Yesterday I received an email asking me a question related to my Using a Certificate Signed Stored Procedure to Execute sp_send_dbmail on SQL Server Central that intrigued me enough that I had to put a couple of minutes into actually figuring out what the problem the being encountered was.  In the article, I show how to create a stored procedure in a user database that can be used to call sp_send_dbmail without the calling user being in the DatabaseMailUserRole role in msdb, and there are actually a number of uses for this implementation beyond the example that is provided in the article.  I use this same method to allow Service Broker activation procedures to send me emails through Database Mail all the time, and there are numerous other cases where you might need to create a stored procedure in one database that executes code in a separate database, where the calling user should not have access to the separate database. 

I don’t know the exact use case for the problem that was posed in the email question, but the basic tenet of the question was that there was a stored procedure in one database that needed to be called from multiple databases and the intent was to use the same certificate to sign stored procedures in multiple databases to call this centralized stored procedure without giving the users access to the central database.  Is that confusing to you?  I confused me initially when I read the email, and it took a second pass for me to actually understand the problem being encountered and then build out a repro of the issue to begin looking at what the problem might be.  To keep things simple and non-confusing if that is possible, lets first create two databases, two procedures and build out the basic framework from the article for two user stored procedures.

-- Create the calling database
CREATE DATABASE a;
GO
-- Create the target database
CREATE DATABASE b;
GO
-- Switch to the target database
USE b
GO
-- Create a table and insert 10 rows into the table
CREATE TABLE SomeTable (RowID int identity primary key)
GO
INSERT INTO SomeTable DEFAULT VALUES
GO 10
-- Create the target stored procedure that selects from the table
-- in the target database.
CREATE PROCEDURE Select_SomeTable
AS
SELECT * FROM SomeTable
GO
-- Switch to the calling database
USE a
GO
-- Create a stored procedure that calls the stored procedure
-- in the target database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create a certificate to sign the calling stored procedures with
CREATE CERTIFICATE [SigningCertificate]
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd'
WITH SUBJECT = 'Certificate for signing a Stored Procedure';
GO
-- Backup certificate so it can be create in master database
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Add Certificate to Master Database
USE [master]
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Create a login from the certificate
CREATE LOGIN [SigningLogin]
FROM CERTIFICATE [SigningCertificate];
GO
-- The Login must have Authenticate Sever access
-- per http://msdn.microsoft.com/en-us/library/ms190785.aspx
GRANT AUTHENTICATE SERVER TO [SigningLogin]
GO
-- Create a user in database b for the Login
USE b
GO
CREATE USER [SigningLogin] FROM LOGIN [SigningLogin]
GO
-- Grant EXECUTE on the target stored procedure to the 
-- certificate based login
GRANT EXECUTE ON [dbo].[Select_SomeTable] TO [SigningLogin]
GO
-- Switch to the calling database
USE a
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create a test login to test that the certificate signed procedure
-- can properly execute the target procedure without the login having
-- access to the target database or target procedure directly
USE master;
GO
CREATE LOGIN [testuser] WITH PASSWORD=N't3stp@$$', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO
-- Create the test user for the test login in the calling database and
-- grant it EXECUTE privileges on the calling stored procedure
USE [a]
GO
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE a.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

In the above code, database “a” has a stored procedure that calls a stored procedure in database “b” and the “testuser” login only has access to database “a” and EXECUTE privileges on the stored procedure in database “a”.  The certificate from database “a” is backed up and created in the “master” database so that a certificate login can be created from the certificate which has the EXECUTE privilege on the stored procedure in database “b”.  The “testuser” login does not have access to database “b” or any of the objects inside of the database but because the stored procedure in database “a” is signed by the certificate and a login has been created in master from the certificate that has access to EXECUTE the stored procedure in database “b”, the “testuser” login can execute the stored procedure in database “a” and get access to the information in database “b” without having to have explicit access to database “b” or the stored procedure contained in that database.

At this point everything is essentially in line with the information contained in the SQL Server Central article I wrote, and it works exactly as shown in the article, even though it is all being applied to user databases and objects.  Now let’s try to extend this functionality to a third database as a caller of the stored procedure in database “b” by creating a new database, the same stored procedure that was created in database “a”, and then creating the certificate in the new database from the previous backup from database “a” and signing the stored procedure using the same certificate.

-- Create a third database to create another calling database
-- and stored procedure to the target database and stored procedure
CREATE DATABASE c;
GO
-- Create the calling stored procedure in the third database
USE c
GO
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Create the certificate from in the third database from the 
-- previous certificate backup to allow signing of the procedure
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate.cer';
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO

This all works great, right up to the point that we try to sign the stored procedure using the certificate we created from our backup from database “a”.  When we try to sign the stored procedure in database “c” we get the following error back from SQL Server:

Msg 15556, Level 16, State 1, Line 2
Cannot decrypt or encrypt using the specified certificate, either because it has no private key or because the password provided for the private key is incorrect.

So what exactly happened here, the process worked the first time, but now we can’t duplicate it for additional databases that need to call this stored procedure?  The problem is that when we backed up the certificate from database “a” we only backed up the certificate, we didn’t backup the certificate’s private key to a separate file, so we can’t use this same certificate in additional databases to sign stored procedures unless we go back and backup the private key from the certificate as well.

To fix the problem, we first have to drop the stored procedure in database “c” and then drop the certificate without its private key from the database as well.  Then we will need to take a new backup of the certificate from database “a” and specify the WITH PRIVATE KEY clause to backup the private key for the certificate to a separate private key file so that we can recreate the certificate in database “c” with the correct private key to sign the stored procedure.  Once we have done this, we can recreate our stored procedure and sign it with the certificate, then create the database user for the “testuser” login and grant the database user the EXECUTE privilege on the stored procedure in database “c”, and test the configuration by executing the stored procedure in database “c” to ensure that we get the results from database “b”.

-- Fix the problem
USE [c]
GO
-- Drop the calling procedure with the invalid signature
-- Create the calling stored procedure in the third database
DROP PROCEDURE ExecuteDatabaseB_Select_SomeTable
GO
-- Drop the certificate without the private key
DROP CERTIFICATE [SigningCertificate]
GO
-- Backup the certificate with its private key so it can be used in
-- additional databases to sign stored procedures calling the target
-- stored procedure.
USE a
GO
BACKUP CERTIFICATE [SigningCertificate]
TO FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Create the certificate in our new database with the private key
-- from the new certificate backup with the private key.
USE c
GO
CREATE CERTIFICATE [SigningCertificate]
FROM FILE = 'c:\SQLskills\SigningCertificate_WithKey.cer'
WITH PRIVATE KEY (
FILE = 'c:\SQLskills\SigningCertificate_WithKey.pvk',
ENCRYPTION BY PASSWORD = '$tr0ngp@$$w0rd',
DECRYPTION BY PASSWORD = '$tr0ngp@$$w0rd');
GO
-- Recreate the calling stored procedure in the third database
CREATE PROCEDURE ExecuteDatabaseB_Select_SomeTable
WITH EXECUTE AS OWNER
AS
EXECUTE b.dbo.Select_SomeTable
GO
-- Sign the procedure with the certificate's private key
ADD SIGNATURE TO OBJECT::[ExecuteDatabaseB_Select_SomeTable]
BY CERTIFICATE [SigningCertificate] 
WITH PASSWORD = '$tr0ngp@$$w0rd';
GO
-- Create the database user for the testuser login and grant it
-- the EXECUTE privilege on the calling stored procedure
CREATE USER [testuser] FOR LOGIN [testuser]
GO
GRANT EXECUTE ON [dbo].[ExecuteDatabaseB_Select_SomeTable] TO [testuser]
GO
-- Switch to the test login context
EXECUTE AS LOGIN = 'testuser'
EXECUTE c.dbo.ExecuteDatabaseB_Select_SomeTable
GO
-- Revert the context back to our sysadmin login
REVERT
GO

This all works as expected and we’ve resolved our problem.  This was a really interesting problem because I had never actually anticipated that someone would use the same certificate in multiple databases to sign multiple stored procedures that perform cross database operations without granting the calling users access to the target database.  This repro shows that this is technically possible if you backup the private key for the certificate and restore the private key in the additional databases with the certificate.

See you on the playground!

Two weeks ago at SQL Connections in Orlando, FL, I got to participate in a session that Paul and Kimberly do at the end of the conference titled, “Follow the Rabbit.”  The premise of the session is that Paul and Kimberly throw a big list of topics up on the screen and anyone in the audience can ask any question they have about one of those topics and Paul and Kimberly will try to answer it.  I wasn’t the only person to end up participating in this session answering questions, Maciej Pilecki another MVP who also recently passed the Microsoft Certified Masters Exam for SQL Server 2008 answered a number of questions as well.  One of my favorite questions that was asked during this session was “Does index fragmentation matter with SSD’s anymore?”  Paul’s answer to the question was very practical and dealt with the wasted space utilization that excessive index fragmentation can cause in a database that uses a uniqueidentifier column with newid() as the primary key, and given the cost per gigabyte for SSD storage this could be quite significant.  Paul pointed out a couple of other points that escape me, primarily because my mind immediately started churning ideas about how to test the impact that index fragmentation actually has on a database.

It should be no surprise that this blog post is going to use Extended Events, it has after all been my favorite feature in SQL Server since it was first released in an early CTP for SQL Server 2008.  One of the sessions that I presented at SQL Connections last month was on Extended Events and I made some pretty heavy use of the I/O related events as a part of that session.  I also made heavy usage of the I/O related events back in December for a couple of my XEvent a Day blog posts, so I knew that I could really get some detailed information back from the system about what kind of impact fragmentation really had on a database, and since I have a SSD in my laptop, which while not enterprise class is more than worthy of performing the tests on, I thought I’d give a whirl at figuring out how much impact fragmentation really had.

When I began working on this problem, I started out using Windows Server 2008R2 and SQL Server 2008R2, primarily because it is my standard VM platform for the Immersion Training, and for presentations and it was immediately available. I originally intended to use the I/O size information from the file_read_completed event to aggregate the I/O sizes being read from disk, and I soon learned that this was not available in SQL Server 2008 as I had originally thought.  I knew I had used it in the past so I went back to my XEvent a Day blog series and found An XEvent a Day (29 of 31) – The Future – Looking at Database Startup in Denali

Bummer!

I happened to have my old Denali CTP1 VM that I used to create that blog post on a external hard disk so I decided to make use of it for the investigation.  I had originally hoped to be able to create a reproducible demo that anyone could use in their SQL Server 2008 environment to see the impact of fragmentation, but since it is not possible this post will be based on the information available through SQL Server Denali instead.

The first thing that we will need is a database with two tables that have identical data in them; one fragmented and the other not. To create this, we’ll use a uniqueidentifier with newid() as the primary key for the first table and a uniqueidentifier with newsequentialid() as the primary key for the second table. Then we will load approximately 1000 pages worth of data into the first table and then copy that data into the second table, and rebuild the indexes on the second table to remove any residual fragmentation from it.

CREATE DATABASE FragmentationTest
GO
USE FragmentationTest
GO
CREATE TABLE GuidHighFragmentation
(UniqueID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
 FirstName nvarchar(50) NOT NULL,
 LastName nvarchar(50) NOT NULL)
GO
 
CREATE NONCLUSTERED INDEX IX_GuidHighFragmentation_LastName
ON GuidHighFragmentation(LastName)
GO
 
CREATE TABLE GuidLowFragmentation
(UniqueID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
 FirstName nvarchar(50) NOT NULL,
 LastName nvarchar(50) NOT NULL)
GO
 
CREATE NONCLUSTERED INDEX IX_GuidLowFragmentation_LastName
ON GuidLowFragmentation(LastName)
GO
 
INSERT INTO GuidHighFragmentation (FirstName, LastName)
SELECT TOP 1000
    a.name, b.name
FROM master.dbo.spt_values AS a
CROSS JOIN master.dbo.spt_values AS b
WHERE a.name IS NOT NULL 
    AND b.name IS NOT NULL
ORDER BY NEWID()
GO 70
 
INSERT INTO GuidLowFragmentation (FirstName, LastName)
SELECT FirstName, LastName
FROM GuidHighFragmentation
GO
 
ALTER INDEX ALL ON GuidLowFragmentation REBUILD

With our tables built, we can validate the fragmentation information by querying the sys.dm_index_physical_stats() DMF:

SELECT 
    OBJECT_NAME(ps.object_id),
    i.name,
    ps.index_id,
    ps.index_depth,
    avg_fragmentation_in_percent,
    fragment_count,
    page_count,
    avg_page_space_used_in_percent,
    record_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(), 
        NULL, 
        NULL, 
        NULL, 
        'DETAILED') AS ps
JOIN sys.indexes AS i
    ON ps.object_id = i.object_id
        AND ps.index_id = i.index_id
WHERE index_level = 0

image

Next we will create our event session to capture the I/O events that are related to physical reads from disk.  The event session is being created using dynamic SQL so that each of the events has a predicate on the sqlserver.database_id for the current database.

-- Create an Event Session to investigate our IO operations
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='FragmentationEffect')
    DROP EVENT SESSION [FragmentationEffect] ON SERVER;
DECLARE @sqlcmd nvarchar(4000) = '
CREATE EVENT SESSION FragmentationEffect
ON SERVER
ADD EVENT sqlserver.sql_statement_starting
( ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.sql_statement_completed
( ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.file_read
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_read_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.physical_page_read
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_requested
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlos.async_io_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+'))--,
ADD TARGET package0.asynchronous_file_target(
     SET filename=''C:\SQLskills\EE_FragmentationEffect.xel'',
         metadatafile=''C:\SQLskills\EE_FragmentationEffect.xem'')
WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, 
      TRACK_CAUSALITY = ON, MAX_DISPATCH_LATENCY=5SECONDS)'
EXEC (@sqlcmd)
GO

This event session is configured to have MAX_MEMORY set at 8MB to account for the partitioning of the memory buffers and to ensure that there is ample buffer space for the session to try and mitigate against the potential for event loss.  The EVENT_RETENTION_MODE for the session can not be configured for NO_EVENT_LOSS since the sqlserver.physical_page_read event is a part of the event session.  TRACK_CAUSALITY is turned ON for the event session so that correlation between events can be made to tie the I/O operations back to the statement that generated them, and the MAX_DISPATCH_LATENCY has been set at five seconds because I am impatient and don’t want to have to wait the default of thirty seconds for the events to be dispatched to the targets while doing a demo.

Prior to starting the event session, a manual CHECKPOINT will be issued against the database to flush any dirty pages from the buffer cache to disk, and then the buffer cache will be cleared using DBCC DROPCLEANBUFFERS to ensure that the test statements have to physically read the data from disk into cache.

-- Issue checkpoint to flush dirty buffers to disk
CHECKPOINT
GO
 
-- Clear the Buffer Cache to force reads from Disk 
DBCC DROPCLEANBUFFERS 
GO 

With this completed, we can now start our event session and run the same query against each table to force a scan of all the pages in the table from disk, then stop the event session so that our target file only has events associated with this specific test in it.

-- Start the Event Session
ALTER EVENT SESSION FragmentationEffect
ON SERVER
STATE=START
GO
 
-- Aggregate the data from both tables 
SELECT LastName, COUNT(*)
FROM GuidLowFragmentation
GROUP BY LastName
GO        
SELECT LastName, COUNT(*)
FROM GuidHighFragmentation
GROUP BY LastName
GO
 
-- Wait for the events to dispatch to the target
WAITFOR DELAY '00:00:10'
GO
 
-- Stop the Event Session
ALTER EVENT SESSION FragmentationEffect
ON SERVER
STATE=STOP
GO

Now that we have our data, we need to read it from the file and break down the event data into columns to simplify analysis of the information.  To do this, we’ll first load the event data as XML into a staging table.  Trying to parse the XML while reading from the file through the DMF is incredibly slow by comparison.  Then we can shred the XML into a tabular format based on the events being collected.

-- Drop Results tables if they exist
IF OBJECT_ID('FragmentationEffectResults') IS NOT NULL
    DROP TABLE FragmentationEffectResults 
GO
IF OBJECT_ID('FragmentationEffectResultsParsed') IS NOT NULL
    DROP TABLE FragmentationEffectResultsParsed 
GO
 
-- Create results table to load data from XE files
CREATE TABLE FragmentationEffectResults
(RowID int identity primary key, event_data XML)
GO
 
-- Load the event data from the file target
INSERT INTO FragmentationEffectResults(event_data)
SELECT
    CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\SQLskills\EE_FragmentationEffect*.xel', 
                                     'C:\SQLskills\EE_FragmentationEffect*.xem', 
                                     null, null)
GO
 
-- Parse the event data
SELECT 
    RowID,
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
    COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
             event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
    event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(4000)') AS [mode],
    event_data.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(4000)') AS [file_handle],
    event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint') AS [offset],
    event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') AS [page_id],
    event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
    event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int') AS [file_group_id],
    event_data.value('(event/data[@name="size"]/value)[1]', 'bigint') AS [size],
    event_data.value('(event/data[@name="wait_type"]/text)[1]', 'nvarchar(100)') AS [wait_type],
    event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS [duration],
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS [sql_text],
    event_data.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
    event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint') AS [reads],
    event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint') AS [writes],
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
    CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
INTO FragmentationEffectResultsParsed
FROM FragmentationEffectResults
ORDER BY Rowid

This is where the real fun begins.  There are too many events in results set to be useful in raw form, so what I decided to do was to locate the activity_id from TRACK_CAUSALITY for the sqlserver.sql_statement_starting events for both of the tables, and store those into variables for use in querying off the results set.  The first thing I looked at was how many times did each of the events actually fire for each of the tables.

DECLARE @FragmentationHighActivityID varchar(50),
        @FragmentationLowActivityID varchar(50)
        
SELECT @FragmentationHighActivityID = activity_id
FROM FragmentationEffectResultsParsed
WHERE event_name = 'sql_statement_starting'
  AND sql_text LIKE '%GuidHighFragmentation%'
 
SELECT @FragmentationLowActivityID = activity_id
FROM FragmentationEffectResultsParsed
WHERE event_name = 'sql_statement_starting'
  AND sql_text LIKE '%GuidLowFragmentation%'
 
-- Aggregate the results by the event name and file_id
SELECT event_name, 
    CASE activity_id
        WHEN @FragmentationHighActivityID THEN 'GuidHighFragmentation'
        WHEN @FragmentationLowActivityID THEN 'GuidLowFragmentation'
        ELSE 'UNKNOWN'
    END as QueryTable, 
    COUNT(*) AS occurences
FROM FragmentationEffectResultsParsed
WHERE activity_id IN (@FragmentationHighActivityID, @FragmentationLowActivityID)
  AND event_name IN ('async_io_completed', 'async_io_requested', 
        'file_read', 'file_read_completed', 'physical_page_read')
GROUP BY event_name, activity_id
ORDER BY activity_id, event_name

image

Impressive!  SQL Server had to do ten times the I/O operations against the fragmented table than it did against the non-fragmented table.  You might notice in the output that the number of physical_page_read events is larger than the number of actual pages in the indexes for the table.  Yes and no, I restricted the output of sys.dm_db_index_physical_stats to just level 0 of the indexes, or the leaf level.  There are two additional levels in the nonclustered indexes being scanned but they only account for nine additional pages for the fragmented index and five additional pages for the non-fragmented index.  The other pages are from the system metadata tables which have to be read by the engine as well, which can be confirmed by reading the parsed event data from the dbo.FragmentationEffectResultsParsed table.

Where the impact really gets interesting is when we take a look at the I/O sizes for file_read_completed events and aggregate the results based on the table and the I/O size being read from disk using the same activity_id’s from the previous query for aggregation.

-- Aggregate the file_read_completed events by the IO size and table
SELECT
    CASE activity_id
        WHEN @FragmentationHighActivityID THEN 'GuidHighFragmentation'
        WHEN @FragmentationLowActivityID THEN 'GuidLowFragmentation'
        ELSE 'UNKNOWN'
    END as QueryTable, 
    size/1024 as read_size_kb, 
    size/1024/8 as read_size_pages, 
    COUNT(*) AS occurences
FROM FragmentationEffectResultsParsed
WHERE activity_id IN (@FragmentationHighActivityID, @FragmentationLowActivityID)
  AND event_name IN ('file_read_completed')
GROUP BY database_id, file_id, size, activity_id
ORDER BY QueryTable, size desc
GO

image

I suspected that fragmentation would impact the ability of the database engine to do large sequential reads from the table index while scanning it, but I was not expecting the impact to be as bad as it actually was.  I ran this test numerous times to isolate out if it was some kind of a fluke and the results are consistently repeatable within a relatively low margin of change.  The smallest read size for the non-fragmented index was 64KB where the fragmented table did a majority of its I/O using 64KB reads or smaller.

Now I know what you are thinking, the non-fragmented index in this case is smaller than the fragmented one so it should do less total I/O.  You’d be right to think that, but keep in mind that they both contain the same number of records, demonstrating the wasted space that Paul talked about in the session at SQL Connections.  However, not to worry, we can fix the non-fragmented table and add additional records to it to push the index size over the size of the fragmented one in page count.

-- Add additional data to bring page count higher than the GuidHighFragmentation table!
INSERT INTO GuidLowFragmentation (FirstName, LastName)
SELECT TOP 10000
    a.name, b.name
FROM master.dbo.spt_values AS a
CROSS JOIN master.dbo.spt_values AS b
WHERE a.name IS NOT NULL 
    AND b.name IS NOT NULL
ORDER BY NEWID()
GO 10
 
-- Rebuild the indexes to remove any fragmentation
ALTER INDEX ALL ON GuidLowFragmentation REBUILD
GO

If we go back and rerun our query against sys.dm_db_index_physical_stats the non-fragmented index will have an additional 100000 records and just over 1000 pages.

image

Now we can delete the files from our original test, execute CHECKPOINT and flush the buffer cache and rerun the tests to see what affect more than double the index pages has on the non-fragmented indexes I/O operations.

image

For the event counts, even with more pages in the index, the non-fragmented table still incurs significantly less I/O operations.

image

For the I/O size, the smallest IO performed for the non-fragmented table is still 64KB, and there was a increase in reads at 256KB and 512KB sizes. 

While SSD’s might reduce the IO latency for random read operations, they don’t negate the need to continue to adhere to proper design principals for primary keys, fill factor on indexes, and index maintenance.  The actual runtimes of the two tests were nearly identical, and often flipped back and forth between which one took a few milliseconds longer than the other to complete for this demo.  However, keep in mind that these tables were less than 16MB data total until the non-fragmented table had additional records added to it.  Even at larger data sizes the performance of the two scans will be close enough that the average end user wouldn’t notice the difference, but under the covers the number of I/O operations being performed is significantly different.  One thing to consider is that an SSD while faster still has an upper limit to the number of I/O operations per second (IOPS) that it can perform.  Your average database might not be able to push that limit even with heavy fragmentation, but with the cost of SSD’s still at a premium I wouldn’t waste any of what was available if I had them in my server.

So to wrap up, the answer to the question “Does index fragmentation matter with SSD’s?”  It Depends on if you care about wasting space and wasting I/O more than anything else, but YES it still matters and it still has an impact.

One note about this post:

The demo for this post works with SQL Server 2008 and 2008 R2 with the exception that you don’t get the I/O size back from the file_read_completed event.  You can still see the impact that fragmentation has on the number of I/O operations being performed using the demo exactly as provided from this blog post with SQL Server 2008 and 2008 R2.  I chose to go the Denali route because the impact to the I/O size is quite interesting IMO.

See you on the playground!

When Microsoft announced that changes were being made to the Microsoft Certified Masters program for SQL Server 2008 last year I was initially pretty bummed out. I had been hoping to be able to attend one of the onsite MCM rotations at Microsoft. I wasn’t looking forward to the expense associated with that, but I was really interested in the training that was a part of the program more than anything else. When the new MCM exams became available at PASS Summit, I immediately signed up to take the written exam, initially on November 19th, and then due to some technical problems in Orlando, I ended up having to reschedule for January 21st. I found out last week that I passed the written portion of the certification, and immediately scheduled the Lab Exam for this morning, and I found out this afternoon that I had passed the Lab Exam. There has been a lot of discussion about the MCM exams and what someone might have to do to be able to pass them, so in this post I’d like to share my thoughts based on my experiences.

The Written Exam

Out of the two exams you have to take I’d say that the written exam was the worst of the two. Not because it was more difficult in level of content, but because it is multi-guess and those types of exams stress me out. When you arrive at the Prometric testing center, the only two items you should take into the building with you are 2 forms of ID and your car keys. Speaking of the two forms of ID, make sure that you read the requirements well; only one has to have a photo on it, but both have to have your name and a signature. When you walk into the testing center, they check your two ID’s, your keys go into a locker, you have to turn all your pockets inside out to prove there is nothing in them before you can go back to the testing center where they validate your information again, and then told you that you can put put your pockets back where they belong. From there, the test is like any other Microsoft written exam you’ve ever taken; only more intimidating. 

I didn’t feel that the questions asked were overly difficult; they fit inside of the concepts that I was expecting to be on the exam. To be honest I’ve worked harder problems on the MSDN Forums before, but there I don’t have a list of possible answers to choose from and I can always ask for more information before deciding what the correct way for someone to fix a problem is.  The problem with multiple-guess technology exams is that there are more incorrect answers than there are correct ones, and at least to me a lot of times multiple answers seem like they are correct answers and that gets me second guessing myself. I left the written exam uncertain about whether I had passed it or not, and I was feeling like I might have ended up on the wrong side of things.

If you plan to take the written exam, read everything that’s on the SQL Server 2008 MCM Pre-reading and the MCM: SQL Server online training lists, and understand the core concepts behind the subjects. When that list was originally published, I was surprised at how little of the material I hadn’t already read at some point in the past.  A lot of it was read trying to understand a problem someone had asked about on the forums.  Someone today asked me if I’d blog about how I studied and prepared for the exams. I’d have to say that my studying and preparation began four years ago when I started answering questions on the forums, and has continued ever since. People do all kinds of stuff with SQL Server that I would never dream of trying or doing.  They also get to upgrade and play with newer hardware and configurations faster than I ever will be able to, so when something new pops up as a problem, for instance the Power Management default in Windows Server 2008 R2 being Balanced cutting CPU speed in half, I’ve generally seen it by the time I am making similar changes in my own environments.

The Lab Exam

The Lab Exam for the MCM was AWESOME! I really excel at hands on work. And, generally you know when/if you got a question right by the results of the operations being performed. Taking the Lab Exam this morning was like having a week’s worth of the random stuff that pops up at work to make it a bad day, crammed into five and half hours. The exam is pretty much yours to run as you see fit, and I bounced all over the place the first half hour until I had resolved a order of priority for the questions based on what topics I knew the best to what I knew the least.  You do have access to the Books Online inside of the lab environment, but unless it’s just a quick look up and you know exactly what you are going after, it is going to do you absolutely no good.  You actually have to know SQL Server at a 400+ level and you have to know what you are doing from past hands-on experience so that it’s almost second nature.

I spent the last few nights moonlighting trying to watch as many of the MCM: Microsoft SQL Server 2008 Microsoft readiness videos as I could, and I even watched most of them at 1.4 to 1.6 times the regular speed.  If you think Paul Randal is hard to keep up with when he talks normally, give that a shot, you’ll have a whole new perspective the next time you listen to him speak in real life.  I actually don’t recommend that you watch the training videos at a higher speed if you are trying to actually learn from them. I did this because I found that I already knew a majority of the content and I was just trying to catch the stuff I didn’t know in a compressed amount of time.  Watching the videos alone won’t get you past the exam; you have to be able to apply the information covered in the videos and have real practical experience.  Also, you can’t know everything. If I had practical experience in every aspect of SQL Server, the last hour of my lab exam wouldn’t have been quite as stressful as it was. But, don’t worry; you don’t have to get every answer right to pass. Microsoft knows that even masters don’t know everything too!

The Database Compression feature in SQL Server 2008 Enterprise Edition can provide some significant reductions in storage requirements for SQL Server databases, and in the right implementations and scenarios performance improvements as well.  There isn’t really a whole lot of information about the operations of database compression that is documented as being available in the DMV’s or SQL Trace.  Paul Randal pointed out on Twitter today that sys.dm_db_index_operational_stats() provides the page_compression_attempt_count and page_compression_success_count available.  Beyond that the only other documented information for monitoring Data Compression are the Page Compression Attempts/sec and Pages Compressed/sec Performance Counters of the SQL Server:Access Methods object in Perfmon (http://msdn.microsoft.com/en-us/library/cc280449.aspx). 

There is one thing in common about the documented methods of monitoring Data Compression, and that is they all only deal with Page compression, and not Row compression, and in Extended Events we find the same commonality as there are no Row compression Events in Extended Events.  There are two Page compression Events in Extended Events; sqlserver.page_compression_attempt_failed and sqlserver.page_compression_tracing.  These two Events can be used to track Page compression operations at multiple levels, including database, object, index, and even down to the individual page. The sqlserver.page_compression_tracing Event provides Start and End tracing of Page compression operations inside of the Database Engine and returns the database_id, index_id, rowset_id, page_id, and duration of the compression operation.  The sqlserver.page_compression_attempt_failed is really poorly named, and doesn’t provide information about failures in the sense that something broke, but provides information for why a page compression attempt did not actually change the compression of the data in the page.  It also returns the database_id, index_id, rowset_id, and page_id for the compression attempt, and it also includes a failure_reason column which correlates to the page_compression_failure_reason Map Value.

-- Get the payload information for the Events 
SELECT 
    object_name, 
    column_id, 
    name, 
    type_name
FROM sys.dm_xe_object_columns
WHERE object_name IN ('page_compression_tracing', 
                      'page_compression_attempt_failed')
  AND column_type = 'data'

image

To demonstrate how these Events function, I am going to use the LineItem table from the TPC-H Benchmark that was created by Quest Benchmark Factory using Level 2 for the table sizing, which makes the table just at 1.8GB in size.  All of the indexes on the table will be rebuilt using PAGE compression, and then 10,000 rows will be added to the table.  To setup the environment, first load the TPC-H LineItem table with the appropriate seed of data, this can be done with the free trial version of Benchmark Factory.  Then rebuild all of the indexes on the LineItem table using PAGE compression, and review the PAGE compression statistics from sys.dm_db_index_operational_stats for the database and object.

USE [TPCH]
GO
-- Rebuild the indexes with Page compression 
ALTER INDEX ALL ON dbo.H_Lineitem REBUILD WITH (DATA_COMPRESSION = PAGE)
GO
-- Look at the compression information in sys.dm_db_index_operational_stats
SELECT 
    database_id, 
    object_id, 
    index_id, 
    page_compression_attempt_count, 
    page_compression_success_count,
    (page_compression_attempt_count - page_compression_success_count) as page_compression_failure_count
FROM sys.dm_db_index_operational_stats(db_id('TPCH'), object_id('H_Lineitem'), null, null)
GO

image

Once the table and its indexes have been rebuilt using PAGE compression, we can then create our Event Session, start it, and add 10,000 rows to the LineItem table.  After we add the rows, we can then check the page compression statistics in sys.dm_db_index_operational_stats, and drop our Event Session from the server.

-- Create an Event Session to Track the Failed attempts
CREATE EVENT SESSION PageCompressionTracing
ON SERVER
ADD EVENT sqlserver.page_compression_attempt_failed,
ADD EVENT sqlserver.page_compression_tracing
ADD TARGET package0.asynchronous_file_target(
     SET filename='C:\SQLBlog\PageCompressionTracing.xel',
         metadatafile='C:\SQLBlog\PageCompressionTracing.xem')
WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=5SECONDS)
GO
-- Start the Event Session
ALTER EVENT SESSION PageCompressionTracing
ON SERVER
STATE=START
GO
-- Insert 10000 rows into the H_Lineitem table
INSERT INTO H_Lineitem
    (l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, 
     l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, 
     l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, 
     l_comment)
SELECT TOP 10000 
     l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, 
     l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, 
     l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, 
     l_comment
FROM H_Lineitem
GO
-- Look at the compression information in sys.dm_db_index_operational_stats
SELECT 
    database_id, 
    object_id, 
    index_id, 
    page_compression_attempt_count, 
    page_compression_success_count,
    (page_compression_attempt_count - page_compression_success_count) as page_compression_failure_count
FROM sys.dm_db_index_operational_stats(db_id('TPCH'), object_id('H_Lineitem'), null, null)
GO
-- Drop the Event Session
DROP EVENT SESSION PageCompressionTracing
ON SERVER
GO

image

Now we can parse the Events that were captured by our Event Session and compare the information presented by sys.dm_db_index_operational_stats() with what was collected by Extended Events.

-- Create our result Analysis database
CREATE DATABASE [PageCompTestResults]
GO
USE [PageCompTestResults]
GO
-- Create intermediate temp table for raw event data
CREATE TABLE RawEventData
(Rowid int identity primary key, event_data xml)
GO
-- Read the file data into intermediate temp table
INSERT INTO RawEventData (event_data)
SELECT
    CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\PageCompressionTracing*.xel', 
                                     'C:\SQLBlog\PageCompressionTracing*.xem', 
                                     null, null)
GO
-- Fetch the Event Data from the Event Session Target
SELECT 
    RowID,
    event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
    COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
             event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
    event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
    event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') AS [page_id],
    event_data.value('(event/data[@name="rowset_id"]/value)[1]', 'bigint') AS [rowset_id],
    event_data.value('(event/data[@name="failure_reason"]/text)[1]', 'nvarchar(150)') AS [failure_reason],
    event_data.value('(event/action[@name="system_thread_id"]/value)[1]', 'int') AS [system_thread_id],
    event_data.value('(event/action[@name="scheduler_id"]/value)[1]', 'int') AS [scheduler_id],
    event_data.value('(event/action[@name="cpu_id"]/value)[1]', 'int') AS [cpu_id]
INTO ParsedResults
FROM RawEventData
GO

After parsing out the data, we can begin to really leverage the information we’ve gathered.  If we join the ParsedResults table to sys.partitions for our TPCH database by rowset_id = hobt_id, we can get the object_id and index_id and aggregate the failure reasons up to the object and index level.

SELECT 
    pr.database_id, 
    p.object_id, 
    p.index_id,
    failure_reason, 
    COUNT(*) as failure_count
FROM TPCH.sys.partitions p
JOIN ParsedResults pr
    ON pr.rowset_id = p.hobt_id
WHERE event_name = 'page_compression_attempt_failed'
GROUP BY     pr.database_id, 
    p.object_id, 
    p.index_id,
    failure_reason
GO
-- Look at the compression information in sys.dm_db_index_operational_stats
SELECT 
    database_id, 
    object_id, 
    index_id, 
    page_compression_attempt_count, 
    page_compression_success_count,
    (page_compression_attempt_count - page_compression_success_count) as page_compression_failure_count
FROM sys.dm_db_index_operational_stats(db_id('TPCH'), object_id('H_Lineitem'), null, null)
GO

image

With this we can se that the Extended Events sqlserver.page_compression_attempt_failed Event tracks failures and attempts that are not counted in sys.dm_db_index_operational_stats().  The PageModCountBelowThreshold failure isn’t really a failed attempt at compression.  This reason shows that the page was evaluated for recalculation, and the modified counter for the page hadn’t passed the internal threshold for recalculation so the actual compression operation wasn’t performed.  If we look at the sqlserver.page_compression_tracing Event information, we can see how the numbers begin to come together to match what is output by sys.dm_db_index_operational_stats().

SELECT 
    pr.database_id, 
    p.object_id, 
    p.index_id,
    COUNT(*) as attempt_count
FROM TPCH.sys.partitions p
JOIN ParsedResults pr
    ON pr.rowset_id = p.hobt_id
WHERE event_name = 'page_compression_tracing'
  AND opcode = 'Begin'
GROUP BY     pr.database_id, 
    p.object_id, 
    p.index_id
GO
-- Look at the compression information in sys.dm_db_index_operational_stats
SELECT 
    database_id, 
    object_id, 
    index_id, 
    page_compression_attempt_count, 
    page_compression_success_count,
    (page_compression_attempt_count - page_compression_success_count) as page_compression_failure_count
FROM sys.dm_db_index_operational_stats(db_id('TPCH'), object_id('H_Lineitem'), null, null)
GO

image

We have 193 attempts by this Event, and we have 72 PageModCountBelowThreshold failures, matching our actual attempts of 121 from the DMF.  We can then subtract out the other failures and get the 93 successful operations matching the DMF as well.

On my blog post Virtualizing SQL on VMware Reference List, Oscar Zamora (Blog | Twitter) asked the following question in a comment:

As a virtualized instance has the benefit of "failing over" to another physical box, would you consider clustering a virtualized instance?

The answer to this question more than I want to write up in a comment, so I decided instead to blog my response.  Since at least ESX 3, VMware has provided a number of high availability features in their enterprise server virtualization product.  A detailed listing of the HA features available in VSphere 4 can be found in the vSphere Availability Guide.
In this post I’ll talk about the most popular ones and the ones that VMware marketing uses to try to convince people that VMware HA solves all of their High Availability needs.

High Availability and DRS Clusters

ESX hosts can be clustered together providing high availability from the hardware failure of a host for the guests running inside of the cluster.  If a host fails, the VM’s that were running on that host fail as well, but the cluster detects this and powers the VM’s up on other hosts inside of the cluster automatically, bringing the VM’s back online and restoring application serviceability.  Other features such as Server vMotion and Storage vMotion allow movement of the VM’s and storage dynamically to allow for hardware maintenance and upgrades with little to no downtime to the applications running on the VM’s. In addition to this the VMware Tools installed inside of the guests can provide monitoring of the guest to detect Operating System failures and lock ups inside of the VM and automatically restart the VM.

VMware High Availability: Easily Deliver High Availability for All of Your Virtual Machines
VMware High Availability: Concepts, Implementation, and Best Practices

Fault Tolerance

VSphere 4 introduced a new high availability feature for VM guests called Fault Tolerance.  Fault Tolerance creates a synchronized Secondary virtual machine on another host in the high availability cluster that is lock stepped with the Primary VM.  In the event of a host failure, guests that have Fault Tolerance enabled immediately failover to their Secondary in a manner that is similar to vMotion preventing application downtime from occurring.  When this occurs a new Secondary is created on another host inside of the cluster and synchronized with the new primary maintaining the fault tolerance of the guest inside of the environment.

VMware vSphere™ 4 Fault Tolerance: Architecture and Performance
VMware® Fault Tolerance Recommendations and Considerations on VMware vSphere™ 4
Protecting Mission-Critical Workloads with VMware Fault Tolerance

What does this all mean to SQL Server?

These features are really great features provided by virtualization, but that doesn’t make them the solution to all of your High Availability needs.  SQL Server is often considered and treated by server administrators as just another application server, especially when it comes to virtualization.  However, SQL Server is not just another application and SQL Server provides its own High Availability options, like clustering, that may be more appropriate based on your environmental requirements.  There are specific reasons that the above features may not be acceptable HA features for SQL Server. 

Fault Tolerance is currently limited to single vCPU guest VM’s only, so unless your SQL Server VM’s are all single vCPU, that’s not going to help you out.  This leaves you with VMware HA and the potential for failure with automatic restart on another host.  However, what none of the VMware features provides is minimization of downtimes associated with planned Windows Updates of the guest VM’s or the application of SQL Server Service Packs.  If you have the ability to take periodically planned downtimes of one to two hours for patching the basic HA features of VMware will probably meet your needs.  This may be an acceptable configuration in your environment, and if it is, I won’t fault you for deciding that it meets your high availability needs.  I have VM’s in production that are protected first, by good database backups, and then by VMware HA. 

However, I also have systems that have minimal downtime requirements, and because of this, relying on VMware HA as my primary HA solution doesn’t provide the level of availability required for those systems.  This leads to SQL Server clustering, which is supported in virtualization if it meets specific requirements as documented on the Support policy for Microsoft SQL Server products that are running in a hardware virtualization environment and in Bob Wards blog post SQL Server Support Policy for Failover Clustering and Virtualization gets an update.  This brings us to the question that started this discussion: “Would I consider clustering a virtualized instance?”  Maybe, if the host environment had the resources to support it, but using VM’s for my cluster nodes wouldn’t be my first choice for clustering for a couple of reasons.

First if a database has a minimal downtime requirement it probably also has a minimum performance requirement that is coupled with it.  While you can, and should, setup reservations for the resources allocated to a SQL Server VM, in NUMA enabled hosts like newer Nahalem systems, the maximum recommended size of a single VM is the resources available in a single NUMA node.  That means if you have a quad socket quad core server with 128GB RAM with 4 NUMA nodes, on per socket, each node would have 4 cores and 32GB RAM, making the largest VM 4 vCPU and 32GB RAM, unless you trade off the NUMA optimizations in ESX and memory locality.  In addition to this, you have to be careful where your SQL Server VM cluster nodes exist inside of the VMware host cluster to actually maintain the high availability of the SQL Server cluster.  If both of the SQL Server VM cluster nodes exist on the same physical host, and that host fails, both of the SQL Server VM cluster nodes are going to fail as well.  On top of these reasons, configuration of the VM guests for MSCS is not a trivial process as shown by the 36 page whitepaper Setup for Failover Clustering and Microsoft Cluster Service.

In my current environment we have multiple clustered SQL Server instances, and none of them are virtualized and at the current time we are planning to build additional clustered SQL Server instances to migrate databases that have high availability and minimal downtime requirements to.  Despite having a dedicated VMware host cluster for our SQL VM environment comprised of very powerful hardware, there isn’t any added benefit to building the clusters inside of virtual machines.  Your environment may be different and clustering inside of virtual machines may make sense for your specific requirements, but in the end it is the database downtime requirements that should determine whether or not the database should be clustered, not that VMware provides built in High Availability, because that only covers one aspect of minimizing downtime.

While working on a problem today I happened to think about what the impact to startup might be for a really large tempdb transaction log file.  Its fairly common knowledge that data files in SQL Server 2005+ on Windows Server 2003+ can be instant initialized, but the transaction log files can not.  If this is news to you see the following blog posts:

Kimberly L. Tripp | Instant Initialization - What, Why and How?
In Recovery... | Misconceptions around instant file initialization
In Recovery… | Search Engine Q&A #24: Why can't the transaction log use instant initialization?
Tibor Karaszi : Do you have Instant File Initialization?

The thought occurred to me today that despite having log files 8GB in size for tempdb, I’ve never really noticed that it takes that long for SQL Server to startup.  So I jumped on twitter and shot a tweet out to Paul Randal (Blog | Twitter) and I also included the #sqlhelp hash tag to see what others in the community thought.  I got a couple of comments, one linking me to Paul Randal’s blog post, another saying test it, and another saying the transaction log is always zero initialized.

When I got home tonight I was still thinking about this and as I went to go test it, I remembered that I had read a blog post once about a trace flag that would output information about zero file initialization.  A quick search and I found the post on the Premier Field Engineers Blog.  So I jumped on a test system I have and added the –T3004 and –T3605 trace flags to the startup parameters.  Since the PFE blog provided the following disclaimer about using these trace flags so will I.

WARNING: These trace flags should be used under the guidance of Microsoft SQL Server support. They are used in this post for discussion purposes only and may not be supported in future versions.

After setting the flags I checked tempdb and it was currently setup with a 4GB transaction log file.  With this information in hand I restarted the instance and once it was online I opened the ErrorLog to look at what –T3004 could tell me about tempdb log file initialization.

2010-05-13 18:42:13.52 spid12s     Clearing tempdb database.
<……….skipped content…………>
2010-05-13 18:42:30.93 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 18:42:30.93 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 18:42:30.93 spid12s     Starting up database 'tempdb'.
2010-05-13 18:42:30.96 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 18:42:30.96 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 18:42:30.96 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

Hmm, so the log file is zeroed for tempdb, but it doesn’t take a rocket scientist to notice that the page counts being zeroed out (17-2=15 total pages) don’t add up to 4GB of space.  To check this, I created a new user database with a 8GB data file, and a 4GB log file to see the output for zeroing out a 4GB transaction log.

2010-05-13 18:45:42.61 spid54      Zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from page 0 to 524288 (0x0 to 0x100000000)
2010-05-13 18:46:20.92 spid54      Zeroing completed on L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf
2010-05-13 18:46:36.35 spid54      Starting up database 'ZeroLog'.
2010-05-13 18:46:36.36 spid54      FixupLogTail(progress) zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from 0x5000 to 0x6000.
2010-05-13 18:46:36.36 spid54      Zeroing L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 18:46:36.37 spid54      Zeroing completed on L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\ZeroLog_log.ldf

So a user database with a 4GB log file will zero out 524288 pages which is 4GB of space.  I got a private message on twitter about the topic from Remus Rusanu (Blog | Twitter), telling me that the entire file isn’t initialized at startup, but if you use ALTER DATABASE to grow the size of the tempdb log, the space you grow by will be zero initialized entirely, so to test that I grew the log file out to 8GB in size:

ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'templog', SIZE = 8388608KB )

When I ran this, it returned completed immediately, so I jumped over to the ErrorLog to pull the results and was momentarily confused by what I got back. 

2010-05-13 19:20:24.02 spid57      Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 524288 to 524319 (0x100000000 to 0x10003e000)
2010-05-13 19:20:24.02 spid57      Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

It only initialized 31 pages??  Can that be correct?  So I went back and checked my log files actual size and I remembered this little problem mentioned by Kimberly Tripp on her blog post Kimberly L. Tripp | Transaction Log VLFs - too many or too few? that occurs when you grow the file in 4GB increments.  So I reran the ALTER DATABASE statement and this time it took a minute for the command to complete.  Much better, now I know we did some zeroing out of the file, and the ErrorLog output proved it:

2010-05-13 19:21:17.47 spid57      Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 524319 to 1048576 (0x10003e000 to 0x200000000)
2010-05-13 19:21:49.54 spid57      Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

Much better, so now the begging question is, why isn’t the log file zero initialized in its entirety at startup?  I can’t speak with complete certainty on this, but I would guess that it has to do with the fact that the tempdb transaction log is never used for crash recovery, so it doesn’t really matter that the space isn’t zero initialized.  Paul Randal explains in his blog post, In Recovery… | Search Engine Q&A #24: Why can't the transaction log use instant initialization?, how the parity bits are used during crash recovery to identify where recovery should stop processing log records.  Perhaps full zero initialization is skipped for the tempdb log at startup because the log is never used for crash recovery, but that doesn’t explain why the log, when grown, does perform full zero initialization, unless it is due to the way that the log can wrap around, for example based on the last image above (VLF Usage After Log Reuse), if the log space continues to be used without truncation when the log gets back to FSeqNo 29, the log will have to grow causing the allocation to become non-sequential since FSeqNo 30-35 are still active.  Maybe someone else will explain the reason why the tempdb log has to be zero initialized when grown but not at startup better.

EDIT:

After posting this, I jumped back over to twitter and saw some interesting comments from Brent Ozar (Blog | Twitter) that made me go back and test the impact of deleting the tempdb files from disk and then starting the instance up.  When I did this, the entire file was zero initialized

2010-05-13 20:21:06.21 spid12s     Clearing tempdb database.
2010-05-13 20:21:06.23 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 0 to 1048576 (0x0 to 0x200000000)
<……….skipped content…………>
2010-05-13 20:22:22.81 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 20:22:22.84 spid12s     Starting up database 'tempdb'.
2010-05-13 20:22:22.85 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 20:22:22.85 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 20:22:22.86 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

So the answer is YES the tempdb transaction log is completely initialized when it is first physically created, but after that, its not zero initialized entirely as the instance starts up.  There is a definite difference in startup times on my server when I deleted the files. 

With Files

2010-05-13 18:42:12.54 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 18:42:31.04 spid8s      Recovery is complete. This is an informational message only. No user action is required.

Without Files

2010-05-13 20:21:05.49 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 20:22:22.97 spid9s      Recovery is complete. This is an informational message only. No user action is required.

When the files preexist, instance startup only took 19 seconds, without the files it took 1 minute and 17 seconds.  This actually makes the zero initialization during log growth make sense.  I’d say a fair bit of my misunderstanding of this is the way tempdb is often referred to be as being recreated at restart.  Its not actually recreated based on these tests, but it is cleared, as show by the “Clearing tempdb database.” log entries.  Interesting stuff.

ANOTHER EDIT:

So after thinking about this some more, curiosity got the best of me and I wanted to know what would happen if the size of the tempdb log file on disk was different from the size configured for tempdb.  To test this, I used ALTER DATABASE to change the size of the log file from 8GB to 1GB and restarted SQL Server.  This reduced the size of the log file on startup from 8GB to 1GB and the log showed that it didn’t zero initialize the entire file:

2010-05-13 21:20:45.92 spid12s     Clearing tempdb database.

<……….skipped content…………>
2010-05-13 21:21:02.59 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 21:21:02.59 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:21:02.62 spid12s     Starting up database 'tempdb'.
2010-05-13 21:21:02.63 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 21:21:02.63 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 21:21:02.64 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

In addition to this, the startup time was fast, taking only 17 seconds to complete recovery of the instance.

2010-05-13 21:20:45.23 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 21:21:02.78 spid9s      Recovery is complete. This is an informational message only. No user action is required.

So I shut down SQL Server and renamed the log file to templog_small.ldf and then restarted SQL.  Once recovery completed, I grew the transaction log back to 8GB and once again shutdown SQL Server.  Then I renamed the current 8GB log file to templog_big.ldf and renamed templog_small.ldf to templog.ldf, replacing the 8GB log file with a 1GB log file.  SQL Server recognized the change during startup, and once again zero initialized the entire log.

2010-05-13 21:25:11.72 spid12s     Clearing tempdb database.
2010-05-13 21:25:11.73 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 0 to 1048576 (0x0 to 0x200000000)
<……….skipped content…………>
2010-05-13 21:26:17.95 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:26:28.05 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 2 to 17 (0x4000 to 0x22000)
2010-05-13 21:26:28.05 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf
2010-05-13 21:26:28.07 spid12s     Starting up database 'tempdb'.
2010-05-13 21:26:28.09 spid12s     FixupLogTail(progress) zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from 0x5000 to 0x6000.
2010-05-13 21:26:28.09 spid12s     Zeroing T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf from page 3 to 483 (0x6000 to 0x3c6000)
2010-05-13 21:26:28.10 spid12s     Zeroing completed on T:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Data\templog.ldf

This once again impacted the time required to recover the instance.

2010-05-13 21:25:11.01 Server      Microsoft SQL Server 2008 (SP1) - 10.0.2757.0 (X64)
2010-05-13 21:26:28.23 spid9s      Recovery is complete. This is an informational message only. No user action is required.

So that leaves me asking, does SQL Server really recreate tempdb from the model database every time it starts?  According to KB Article 307487 :

“When SQL Server starts, the tempdb is re-created by using a copy of the model database and is reset to its last configured size.”

It would certainly seem as if this statement is wrong based on testing.  What I find the most interesting is if I reverse the process and replace a 1GB log file with a previously created 8GB log file, SQL Server doesn’t zero initialize the 8GB file, it just shrinks it back to 1GB.  It only performs the zero initialization as a part of having to grow the preexisting log file during instance startup.

A good while ago I was asked if it was possible to consistently produce a parallel plan by Tom LaRock(blog/twitter), and I recalled from reading Grant Fritchey’s(blog/twitter) book on Execution Plans that it was possible to do this by adjusting the ‘cost threshold for parallelism’ sp_configure option from the default of five to a value of one which will make even simple queries run in parallel if you have multiple logical processors. 

This got me think about this option and wondering if the default is still applicable in today’s servers and database environments.  If you think about when the default values for SQL Server were actually created, it was a long time ago and in today’s large commodity hardware servers some of them aren’t the best for SQL any longer.  For example, most people migrating from an x86 SQL Server to a x64 server learn really quick that you have to set Max Server Memory on a 64 bit server, something that you could get away with not setting on older x86 hardware even though it was still a recommended practice. 

Max Degree of Parallelism is another hot one these days with newer Xeon Quad Core and Nahelem six core processors and reduced server costs, it is easy to slam sixteen or twenty-four cores into a relatively cheap server that has 64-128GB of RAM, a configuration that only a few years ago was specialty hardware that was cost prohibitive for most shops.  These kinds of servers have quickly become the answer to performance problems that aren’t necessarily hardware related but instead poor design.  What you get is a person with no idea how SQL Server works, with a really powerful server that is full of performance problems because it is trying to run with the default settings which aren’t appropriate for this size of hardware.

In Bob Ward’s session on wait stats at PASS, one thing he said a couple of times is that the answer to CXPACKET waits, isn’t to reduce the ‘max degree of parallelism’ sp_configure option for the hole server.  I have to admit that I am guilty of recommending this solution hundreds of times before PASS and in most cases I continue to recommend this solution to people who have ho idea what they are doing  because:

A.  Most seem to be to lazy to read the referenced whitepapers provided to them for troubleshooting.
B.  They have no idea what they are doing.
C.  They want a quick fix that gets their manager off their back.

One thing that Tom’s question lead me to thinking about was whether or not adjusting the ‘cost threshold for parallelism’ up from five to a number that still allows large queries that don’t execute frequently to work with parallelism, while minimizing the number of smaller queries that execute more frequently from running in parallel.  To help with determining where this option might be set, I wrote a query to search the plan cache for existing parallel plans and see the cost associations to current plans that executed parallel.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; 
WITH XMLNAMESPACES   
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')  
SELECT  
     query_plan AS CompleteQueryPlan, 
     n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText, 
     n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel, 
     n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost, 
     n.query('.') AS ParallelSubTreeXML,  
     ecp.usecounts, 
     ecp.size_in_bytes 
FROM sys.dm_exec_cached_plans AS ecp 
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp 
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) 
WHERE  n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1 

I look at the high use count plans, and see if there is a missing index associated with those queries that is driving the cost up.  If I can tune the high execution queries to reduce their cost, I have a win either way.  However, if you run this query, you will note that there are some really high cost queries that you may not get below the five value.  If you can fix the high use plans to reduce their cost, and then increase the ‘cost threshold for parallelism’ based on the cost of your larger queries that may benefit from parallelism, having a couple of low use count plans that use parallelism doesn't have as much of an impact to the server overall, at least based on my own personal experiences.

Theme design by Nukeation based on Jelle Druyts