The Accidental DBA (Day 21 of 30): Essential PerfMon counters

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

So far this week, Erin has covered the importance of having a baseline of your SQL Server performance, some of the tools for collecting and analyzing the information collected, and Kimberly has talked about some of the important DMVs. In this post we’ll take a look at some of the most common performance counters for SQL Server, what they mean, and which ones are the most important for routine monitoring to identify problems with SQL Server.

PAL Template

Erin talked about the PAL tool earlier this week and mentioned the template that is provided by the tool for creating a Data Collector Set in Performance Monitor to capture SQL Server-specific performance counters. You can see how to do this in the article that Erin linked to, Free Tools for the DBA: PAL Tool, if you haven't already looked at this. As a DBA, I loved the PAL tool for its simplicity and the time it saved me configuring performance counter collections on each of my SQL Servers, and then the time it saved me analyzing the data that was collected.

However, the PAL tool has its limitations, and one of those limitations is that it applies fixed thresholds to certain performance counters, using thresholds that Microsoft published as far back as SQL Server 2000 and that have not seen major updates since. Until just last week, the SQL Server template applied a fixed threshold of 300 seconds to the analysis of Page Life Expectancy, which doesn't reflect the growth in server memory sizes that has occurred since that value was first published. Now the template analyzes Page Life Expectancy based on the value you specify as being normal for the server, or a default of 700 if no value is defined.

Essential Counters

While the PAL template for SQL Server is very comprehensive, I also maintain a short list of performance counters that I use for spot checking server performance periodically. To simplify covering each of the counters, why it's important, and what to look for, we'll look at counters for CPU, memory, and disk separately, starting with CPU.

CPU Usage

  • Processor
    • %Processor Time
    • %Privileged Time
  • Process (sqlservr.exe)
    • %Processor Time
    • %Privileged Time

The Processor\%Processor Time counter is probably the most familiar counter in Performance Monitor and gives us information about the total CPU usage for the server we are looking at, unless it is a virtual machine. For virtual machines (VMs), the Processor\%Processor Time counter is not a reliable measure of actual CPU usage for the server; instead it shows the percentage of the allocated CPU resources that the VM is actually using at a given point in time. High values for %Processor Time in a VM need to be checked against the host performance counters for the VM to determine whether the VM is contending for physical CPU allocations with other guest VMs on the same host. In addition to the Processor counter, we also want to collect the Process (sqlservr.exe)\%Processor Time counter to determine whether a high CPU condition is actually the result of SQL Server utilization or of another application that might be running on the server. For most SQL Server workloads the average processor usage over time should be fairly consistent, and unexplained increases from the baseline trend should be investigated further. Values consistently greater than 80% should also be investigated to determine whether the workload is exceeding the current hardware capabilities or whether problems are increasing the overall CPU usage for the instance.
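
If you want to spot-check recent CPU usage from inside SQL Server rather than from Performance Monitor, the scheduler monitor ring buffer can be queried for the SQL Server process and system idle percentages. This is a commonly used query pattern rather than a documented interface, so treat it as a sketch; the XML record layout is assumed from SQL Server 2008 and later behavior.

-- Approximate recent CPU usage (SQL Server process vs. system idle) from the
-- scheduler monitor ring buffer; the record layout is not formally documented
DECLARE @ts_now BIGINT;
SELECT @ts_now = cpu_ticks / (cpu_ticks / ms_ticks) FROM sys.dm_os_sys_info;

SELECT TOP (30)
    DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS EventTime,
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLProcessUtilization,
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS SystemIdle
FROM
(   SELECT [timestamp], CONVERT(XML, record) AS record
    FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
      AND record LIKE N'%<SystemHealth>%') AS rb
ORDER BY EventTime DESC;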

Memory Usage

  • Memory
    • Available Mbytes
  • SQL Server:Buffer Manager
    • Lazy writes/sec
    • Page life expectancy
    • Page reads/sec
    • Page writes/sec
  • SQL Server:Memory Manager
    • Total Server Memory (KB)
    • Target Server Memory (KB)

The first counter that I start with when looking at memory usage on a SQL Server is the Memory\Available MBytes counter. I start with this counter to determine whether Windows is experiencing memory pressure, which could affect the SQL Server performance counter values. The Memory\Available MBytes counter should ideally be above 150-300MB for the server, which leaves memory available for other applications to run without pushing Windows into a low memory condition. When the Available MBytes counter drops below 64MB, on most servers Windows signals a low memory notification that SQL Server monitors for, and the SQLOS (the ‘operating system’ inside of SQL Server) will reduce memory usage in response to that notification.

After confirming that Windows has available memory to prevent SQL Server from shrinking memory usage, the next counters I generally look at are Buffer Manager\Page life expectancy and Lazy writes/sec.  The Page life expectancy (PLE) should be generally consistent on average, but may fluctuate with changes in the server workload, with lower values at peak periods.  In general, the more memory allocated to a SQL Server instance, the higher I’d like to see this counter’s normal value.  For example, a server with 230GB RAM allocated to the SQL Server buffer pool and a PLE of 300 would equate roughly to 785MB/sec of I/O activity to maintain the page churn inside of the buffer pool.  While it might be possible for the I/O subsystem to keep up with this demand, this represents a significant amount of page churn in the buffer pool and may be a sign of missing indexes, implicit conversions due to mismatched data types, and all kinds of other problems that can be fixed if identified as the root cause.

If you find a low PLE for the server, look at the other performance counters in the Buffer Manager category for correlation of overall memory pressure inside of SQL Server. If Lazy writes/sec is consistently non-zero with a low PLE and elevated values for Page reads/sec and Page writes/sec, the server is experiencing buffer pool contention and you will need to troubleshoot the problem further. If the Memory\Available MBytes counter was fluctuating and under the 64MB low memory threshold, looking at the Memory Manager\Total Server Memory (KB) and Target Server Memory (KB) counters will tell you whether that has resulted in SQL Server reducing the size of the buffer pool in response. On a stable system Total Server Memory (KB) will be lower than Target Server Memory (KB) during the initial buffer pool ramp up, but then remain equal under normal operation unless a low memory condition occurs. If the server is a VM running on VMware, check for memory ballooning by the host using the VM Memory\Memory Ballooned (MB) counter; non-zero values suggest the host may be causing the low memory condition.
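
Most of the SQL Server-specific counters above can also be read from inside the instance through the sys.dm_os_performance_counters DMV, which is handy for a quick spot check when you don't have a Performance Monitor collection running. A minimal sketch is below; note that on a named instance the object_name values are prefixed with MSSQL$InstanceName rather than SQLServer.

-- Spot check buffer pool and memory manager counters from inside SQL Server
SELECT RTRIM([object_name]) AS [object_name],
       RTRIM(counter_name)  AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy', N'Lazy writes/sec',
                       N'Page reads/sec', N'Page writes/sec',
                       N'Total Server Memory (KB)', N'Target Server Memory (KB)');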

Disk Usage

  • Physical Disk
    • Avg. Disk sec/Read
    • Avg. Disk Bytes/Read
    • Avg. Disk sec/Write
    • Avg. Disk Bytes/Write
  • Paging File
    • %Usage
  • SQL Server:Access Methods
    • Forwarded Records/sec
    • Full Scans/sec
    • Index Searches/sec

The Avg. Disk sec/Read and Avg. Disk sec/Write counters provide the current average latency for each of the disks on the server. Latency is one of the most important metrics for SQL Server I/O performance, but it should be compared to the size of the I/Os being performed before deciding whether a specific value indicates a problem. As the size of the I/O operations increases, so does the latency for the operation, so it would be normal to see higher latency values for a reporting workload doing large table scans vs. a transactional workload with smaller I/O operations. The Microsoft general recommendations for I/O latency for SQL Server are:

  • < 8ms: excellent
  • < 12ms: good
  • < 20ms: fair
  • > 20ms: poor

Over the last two years consulting, only a few of the servers that I’ve looked at during health checks meet the <20ms latency numbers for data and log files. Most servers tend to fall into the <30ms range for I/O latency per disk on average. SSDs are changing this rapidly; we’re seeing more and more disk configurations that include SSDs, and the result is very low I/O latency. When looking at the overall I/O for the server, reviewing the Access Methods counters for Full Scans/sec, Forwarded Records/sec, and Index Searches/sec can give us a clue to the type of workload that is generating the I/O.
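
The same latency information can be pulled per database file from inside SQL Server with sys.dm_io_virtual_file_stats, which is useful for correlating the Physical Disk counters with the specific data and log files driving the I/O. A minimal sketch (these are cumulative averages since the instance started, not a point-in-time sample):

-- Average read/write latency per database file since instance startup
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
       mf.physical_name,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS AvgReadLatency_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS AvgWriteLatency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
ORDER BY AvgReadLatency_ms DESC;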

Summary

An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.

For example, if we look at disk I/O and see high latency numbers with lower I/O sizes on average, we could infer that we have a disk bottleneck and need to improve our I/O performance. However, if we look at Buffer Manager\Page life expectancy and find that it is lower than our normal baseline for the server, and then see that Buffer Manager\Page reads/sec is higher than usual, the data would instead point to memory contention in the buffer pool, which drives additional I/O to keep up with the page churn occurring in the buffer pool.

Understanding the counters and how they relate makes it much easier to spot when problems are occurring and then pinpoint where the actual root of the problem might be.

The Accidental DBA (Day 16 of 30): General Security

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

SQL Server offers multiple layers of security configuration that allow for a defense-in-depth approach to securing business-critical data. A strong security implementation starts with the SQL Server installation and service account configuration, continues into the surface area configuration for SQL Server by minimizing the enabled options that may increase the risk of an attack against the instance, and finishes with securing individual logins and database-level user permissions following the principle of least privilege required to accomplish a task.

Installation and Service Accounts

A secure installation starts with properly selecting service accounts that the individual services will run under. Ideally, if the server is participating in an Active Directory domain, these will be domain accounts that are isolated per service and the accounts shouldn’t have additional permissions added to them except where necessary for the activity required. The service accounts should not be members of the Domain Admins, or the local server Administrators groups under any circumstances.

For most environments, the service account should be created as a low privilege user, and then during SQL Server setup the required permissions will be added on the local server to allow SQL Server to use the account to run the services. The only additional permissions that should be added in most cases are the Perform Volume Maintenance Tasks and Lock Pages in Memory user rights assignments, if Instant File Initialization and locked pages are going to be used by the instance, and then any specific file folder ACLs to allow the instance to access additional paths for data, transaction log, or writing backups to local disk locations. Additionally, if network shares are used for backups or for file transfers, e.g. SSIS packages reading files for data loading, specific permissions for those locations can be added for the account. Permissions on data, transaction log, and backup file locations should be restricted to the service account and administrator users only whenever possible to minimize the accessibility of these files to other users within the domain.

Surface Area Configuration

SQL Server 2008/R2 and 2012 by default take great strides at minimizing the surface area for attack through SQL Server during setup. This low surface area configuration was first introduced in SQL Server 2005 and has been improved with each release to minimize the risks associated with a default installation of SQL Server. Features such as xp_cmdshell, OLE Automation, and SQLCLR are disabled by default to minimize the potential security risks that could occur when these features are enabled. Before enabling any of these features, consider the implications that they might have to the overall security of the server. If the server will host third-party application databases that require these features, there may be little that you can do to prevent enabling these features, but it is always worthwhile to question the need for enabling these to ensure that you’ve properly documented the reasons behind having these features enabled.
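
A quick way to verify that these surface area options remain disabled is to check sys.configurations. The sketch below checks a few of the options mentioned above; add or remove names to match what your environment actually requires.

-- Verify that high-risk surface area options are still disabled (value_in_use = 0)
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'xp_cmdshell',
               N'Ole Automation Procedures',
               N'clr enabled',
               N'Ad Hoc Distributed Queries',
               N'Database Mail XPs')
ORDER BY name;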

As a part of the instance surface area, the Windows Firewall should be configured to allow TCP access to the ports necessary for SQL Server connectivity. For default instances this usually means enabling TCP ports 1433 and 1434 to allow user connections and connections to the Dedicated Administrator Connection. For named instances, the default configuration of the instance is to use dynamic ports; however, this should be changed for each named instance to fix its TCP listener port so that the appropriate Windows Firewall rules can be created. While it may be easier to disable the firewall, this does not minimize the attack surface for the server, and specific rules should be created instead of disabling the firewall entirely.

Controlling Server Access

An important consideration for overall security is controlling physical access to the server hardware, as well as controlling who can actually log in to the server OS. Physical security is incredibly important and should be thought out as a part of overall infrastructure security. Generally SQL Servers are installed in the corporate data center, but in smaller shops it is not uncommon for servers to exist in unsecured locations, for example under an administrator's desk or in other generally available locations within an office space. This can pose a significant risk of data loss in the event the server is physically stolen, or of downtime if someone inadvertently unplugs the server from power or the network. Servers that contain private information or data should always be locked inside a server closet or secured inside of a datacenter with access limited to a restricted set of employees only.

Windows access to SQL Servers should equally be restricted to only the administrative users that have a need to manage SQL Server or the OS it runs on. One important item to remember is that any local administrator on the server has the ability to access the database files, log files, and backup files, and, if so inclined, to gain access to SQL Server using easy-to-Google/Bing methods that add their account to the sysadmin fixed server role within SQL Server, even if they are not currently able to access SQL Server. The local administrators on a SQL Server should be limited to the Domain Admins group, which should itself contain the minimum number of users requiring that level of permissions, and, where appropriate for security controls, the database administrators. Other users should not have direct access to the server OS for a SQL Server, and the list of users that have access should be reviewed regularly to ensure that only those users requiring access actually have it.

Controlling SQL Server Permissions

Permissions within the SQL Server instance can be granted at multiple levels, from the server instance level to individual databases and even objects within a database. In general, server-level permissions should be restricted to administrators that need to manage the SQL Server instance. Most server-level permissions are granted through the use of one of the many fixed server roles that are a part of the default installation of SQL Server. Membership in the sysadmin, serveradmin, and securityadmin roles should be restricted to the actual administrators requiring permissions to manage SQL Server. Just like the Local Administrators group in Windows, these roles provide elevated permissions to control the instance, make configuration changes, create additional logins, and manage security within the instance. Application accounts should not be a member of these roles as a matter of least privilege, specifically the sysadmin role. However, certain applications like SharePoint require the securityadmin fixed server role and there is nothing you can do as an administrator to prevent this. The sysadmin role should be audited routinely, like the Local Administrators group in Windows, to track the logins that have administrative control over the instance.
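
Auditing these role memberships is a simple query against the server principal catalog views. The sketch below lists the members of the three roles discussed above; run it periodically, or extend it to all fixed server roles, as part of your security reviews.

-- List members of the most privileged fixed server roles
SELECT r.name AS RoleName,
       m.name AS MemberName,
       m.type_desc AS MemberType
FROM sys.server_role_members AS srm
INNER JOIN sys.server_principals AS r
    ON srm.role_principal_id = r.principal_id
INNER JOIN sys.server_principals AS m
    ON srm.member_principal_id = m.principal_id
WHERE r.name IN (N'sysadmin', N'serveradmin', N'securityadmin')
ORDER BY r.name, m.name;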

Application logins should continue the principle of least privilege by only having access to the database(s) and objects they require. While permissions to individual objects should be enforced as a general security best practice, the reality of most third-party applications is that the application login will need to be the dbo or a member of the db_owner database role for the databases used by the application. In cases like this, there will also be little that can be done as an administrator to change the security requirements for the application. However, in cases where a third-party application requires elevated permissions in the sysadmin fixed server role, understanding the requirements for this high level of access should be a top priority, and any application with this elevated level of access should be isolated from other databases on a separate instance to minimize the risk imposed.

Summary

Creating a secure SQL Server installation starts with pre-installation planning of the service accounts, to minimize the permissions the instance has to other resources within the network. After installing the instance, minimizing the surface area through proper firewall configuration and leaving configuration options that are disabled by default disabled unless absolutely necessary continues to build on the defense in depth concept for SQL Server. Auditing of the Local Administrators group in Windows and the sysadmin fixed server role should be a part of normal system security reviews to understand who has administrative access to the server or instance. Further information about SQL Server security can be found in Paul’s 2009 TechNet article Security: Common SQL Server Security Issues and Solutions.

The Accidental DBA (Day 14 of 30): Index Maintenance

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Like a high performance sports car, SQL Server databases require routine maintenance to maintain their best performance. Unless a database is read-only, one of the most common maintenance tasks that is required periodically is rebuilding or reorganizing the indexes in the database to remove fragmentation. Index fragmentation generally is categorized as one of the following two problems:

  • Logical scan fragmentation – physical page ordering not matching the index key order, due to page splits occurring during INSERT/UPDATE operations. This affects the performance of range scan operations and results in an increased number of smaller block I/O operations, reading as little as a single 8-KB page at a time.
  • Low page density – individual pages within the index have large amounts of free space due to page splits during INSERT/UPDATE operations or DELETE operations removing data from the table.

There are many ways to deal with index fragmentation that can be automated in SQL Server. Probably the most common method implemented by Accidental DBAs is the use of the built-in Database Maintenance Plans in SQL Server, using a plan built through the Maintenance Plan Wizard. However, better options exist: custom T-SQL scripts that do less work by first analyzing the index fragmentation and page density levels and then determining the best option for reducing the fragmentation with the least amount of work.

Database Maintenance Plans

Database Maintenance Plans exist in SQL Server to simplify the tasks required for general maintenance of the databases within an instance, and are generally very easy to configure. However, using the Maintenance Plan Wizard can lead to redundant tasks for index maintenance, and the tasks are executed against all of the indexes, regardless of whether fragmentation exists or not. A common pattern I’ve seen when consulting is a Reorganize Index Task immediately followed by a Rebuild Index Task in the same schedule, since this is what the default options in the wizard produce unless changes are made. While something is better than nothing, this unfortunately performs double the work by first reorganizing the indexes, which removes the fragmentation but may not be the most efficient method of doing so, and then rebuilding the indexes, which creates a copy of each index without fragmentation before dropping the original page allocations.

The Reorganize Index Task is best suited for indexes with low fragmentation levels whereas the Rebuild Index Task is best suited for indexes with high fragmentation.  However, the problem with both of these maintenance plan tasks is that neither of them actually check the fragmentation level before running the ALTER INDEX operation against the indexes.  If the task is configured to run for All Databases, which is very common for most maintenance plan usage, then the task will reorganize or rebuild all of the indexes in all of the databases, even if the indexes are not actually fragmented.  For the rebuild task, this can result in high I/O activity and transaction log generation that could be avoided through analysis-based index maintenance using T-SQL scripts.

Custom Scripts

The preferred method of performing index maintenance with the least amount of overhead involves first analyzing the index fragmentation levels and then deciding whether it is more efficient to reorganize the index using ALTER INDEX … REORGANIZE, or to rebuild the index using ALTER INDEX … REBUILD.  The index fragmentation level can be found using the sys.dm_db_index_physical_stats function in SQL Server, and the general recommendations in the Books Online are:

  • If an index has less than 1,000 pages, don’t bother removing fragmentation
  • If the index has:
    • less than 10% logical fragmentation, don’t do anything
    • between 10% and 30% logical fragmentation, reorganize it (using ALTER INDEX … REORGANIZE)
    • more than 30% logical fragmentation, rebuild it (using ALTER INDEX … REBUILD)
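
As a rough illustration of this analysis-based approach, the sketch below applies the Books Online thresholds to the current database using sys.dm_db_index_physical_stats. It only reports a suggested action; the freely available maintenance scripts mentioned below generate and execute the appropriate ALTER INDEX statements for you.

-- Suggest REORGANIZE vs. REBUILD per index in the current database,
-- using the general Books Online thresholds described above
SELECT OBJECT_NAME(ips.[object_id]) AS TableName,
       i.name AS IndexName,
       ips.page_count,
       ips.avg_fragmentation_in_percent,
       CASE
           WHEN ips.page_count < 1000 THEN N'leave alone'
           WHEN ips.avg_fragmentation_in_percent < 10 THEN N'leave alone'
           WHEN ips.avg_fragmentation_in_percent <= 30 THEN N'ALTER INDEX ... REORGANIZE'
           ELSE N'ALTER INDEX ... REBUILD'
       END AS SuggestedAction
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON ips.[object_id] = i.[object_id]
   AND ips.index_id = i.index_id
WHERE ips.index_id > 0   -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;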

Since this type of index maintenance requires custom coding, it is not quite as easy to implement as the built-in Maintenance Plans and might seem like more work than it’s worth. However, there are a number of freely available scripts that are easy to deploy, that automate all of this for you, and that can even be faster to implement than manually building a Maintenance Plan using the Maintenance Plan Wizard.

In a recent survey by Paul, Easy automation of SQL Server database maintenance, more than 40% of the almost 500 respondents use some or all of Ola Hallengren’s Maintenance Solution, and I always recommend these scripts to clients as a way to save them consulting time.  Ola provides an easy-to-deploy alternative to Database Maintenance Plans in a single script that creates all the customized stored procedures and SQL Agent jobs for index maintenance, statistics updates, consistency checks, and even database backups that follow recommended best practices for SQL Server.  I replaced my own scripts as a DBA close to five years ago with Ola’s scripts when they were first available and have been using them for SQL Server maintenance tasks ever since.  Paul’s blog post also points out other popular scripts from SQL Server MVPs Michelle Ufford and Tara Kizer as other options specifically for index maintenance.

Summary

Proper index maintenance is a critical task for maintaining the performance of SQL Server databases, and while there are multiple options for automating this common task, custom scripts provide the best method of minimizing the work to be done by performing the most efficient operation based on the actual fragmentation that exists.

The Accidental DBA (Day 12 of 30): Backups: VM Snapshots

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

In the final post in the Backup section of our Accidental DBA series, I’m going to take a look at backups using virtual machine (VM) snapshots, which are popular among VM administrators but may not be the right solution for your SQL Server recovery needs, depending on your RPO and RTO requirements.

VM Snapshot Backup Support

The “Support policy for Microsoft SQL Server products that are running in a hardware virtualization environment” Knowledge Base article includes information about the support of virtualization-aware backups for SQL Server. As long as the backup solution uses the volume shadow-copy service (VSS) to perform volume-based snapshots of the virtual machine hard disks, those backups are supported for recovery with SQL Server in a virtualized environment. Some examples of these tools include:

  • Hyper-V backup
  • Veeam
  • vRanger

There are also many other VM backup tools available; this is just a brief list of the ones that come immediately to my mind. Additionally, many of the traditional backup products like Symantec NetBackup also support VSS-based backups of virtual machine images as well. VSS integration is required to provide an application-consistent backup of the databases contained within the SQL Server instance. This ensures that disk I/O activity is quiesced (frozen) properly prior to the snapshot being created. Any snapshot functionality that does not use VSS may leave SQL Server in an inconsistent state, and this may include standard VM snapshots that can save the VM memory and device state to disk without quiescing through VSS, unless specifically configured to do so.

Limitations of VM Backups

VM snapshot-backup solutions are popular with VM and server administrators because they standardize the backup implementation across the enterprise and remove the need for application-specific backup configurations. While this is generally a benefit, there are other considerations that should be evaluated when it comes to relational database management systems (RDBMS) like SQL Server. I specifically say RDBMS here because these same considerations need to be applied to any RDBMS and not just SQL Server for backups. Below are a few of the considerations that you should keep in mind while evaluating whether to use VM snapshot backups for SQL Server.

Point-in-time Recovery?

One of the features provided by VM backup solutions is the ability to perform a point-in-time restore of the virtual machine image, or even the files contained within the VM. While it is true that you can restore to a point in time, that point is simply the last snapshot backup point for the VM being backed up. Depending on the frequency of the backups that are occurring, this might meet your business recovery requirements, but it doesn’t provide the same capabilities as native SQL Server backups, which provide the ability to restore to any point in time using a combination of the latest full backup of the database, and all of the transaction log backups since that full backup (using the full recovery model, or the bulk_logged recovery model with some restrictions). If you are not currently using the full recovery model for databases and also taking transaction log backups at regular intervals, the point-in-time recovery provided by VM snapshot backups can reduce the risk of data loss over only performing daily full or differential SQL Server backups of the database, depending on the snapshot backup interval configured for the VM.
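
For comparison, this is what point-in-time recovery with native backups looks like: restore the most recent full backup, then restore the log backups and stop at the desired time. A minimal sketch with hypothetical database and file names (in practice there is usually a sequence of log backups, and you may want STANDBY instead of NORECOVERY while locating the right stopping point):

-- Minimal sketch of a native point-in-time restore (hypothetical names/paths)
RESTORE DATABASE [SalesDB]
FROM DISK = N'D:\SQLBackups\SalesDB_full.bak'
WITH NORECOVERY, REPLACE;

RESTORE LOG [SalesDB]
FROM DISK = N'D:\SQLBackups\SalesDB_log_1.trn'
WITH STOPAT = N'2013-06-20T14:30:00', RECOVERY;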

Single Database Restore?

Depending on the tool being used for VM backups, it may or may not be possible to restore a single database from the VM backup without first having to restore the entire VM image to obtain access to the database files contained in it. Some tools do allow guest-OS file indexing of the VM backups, which allows individual files to be restored from the VM backup without having to restore the entire VM image. Other tools also offer the ability to mount a VM backup as a bootable VM to allow object-level recovery of individual tables through data transfers back to the production SQL Server VM.

Transaction Log Clearing?

For databases that use the full or bulk_logged recovery models, transaction log clearing only occurs when the log is backed up using SQL Server native backup commands. The VSS backup will not cause log clearing to occur inside of SQL Server on its own. Some of the VM backup tools offer options to perform transaction log clearing, but you need to be very careful with these options, especially if you have concurrent SQL Server backups being performed for the VM. The way certain tools clear the transaction log during a VM snapshot backup is to issue a subsequent command to the SQL Server, BACKUP LOG <database_name> TO DISK = 'NUL', which dumps the transaction log records to nowhere, essentially throwing them away completely and breaking the log backup chain for any native SQL Server backups being performed. If you are still using native SQL Server backups, it is recommended that the VM snapshot backups be configured to not truncate the transaction logs for the databases.
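
If the transaction log does need clearing, the safe way to do it is a normal native log backup to a real destination, which clears the log and keeps the restore chain intact. A minimal sketch with a hypothetical database name and path:

-- A native log backup clears the log and preserves the log backup chain,
-- unlike BACKUP LOG ... TO DISK = 'NUL' (hypothetical names/paths)
BACKUP LOG [SalesDB]
TO DISK = N'D:\SQLBackups\SalesDB_log.trn'
WITH CHECKSUM, INIT;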

Summary

While VM snapshot backups can provide a simplified method of backing up SQL Server VMs, there are some trade-offs that exist when using them that should be considered to ensure that you are going to be able to meet your business RPO and RTO requirements. The ability to restore a VM, database files, or even an individual object inside of a database to the point-in-time of the last snapshot backup can help minimize the data loss associated with a crash or accidental deletion of data inside of a database.

However, contrary to popular belief, it is not possible to restore to an intermediate point-in-time that exists between the VM snapshot backups, so if that level of recovery is needed or expected, you will still need to perform SQL Server native backups of the databases. Paul explained how to get near zero data loss using SQL Server backups in his post The Accidental DBA (Day 8 of 30): Backups: Planning a Recovery Strategy.

This level of recovery currently can only be accomplished through the use of SQL Server native backups.

The Accidental DBA (Day 5 of 30): Virtualization Considerations

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

So far this month, Glenn has done an awesome job talking about the hardware considerations for SQL Server in a physical implementation including CPU and memory considerations, disk configuration, and considerations for SSD usage. Glenn also covered installation and configuration best practices for SQL Server as well.

In today’s post I’m going to discuss specific considerations for running SQL Server in a virtualized configuration. One of the most important things to keep in mind is that the CPU, memory, and I/O requirements for SQL Server in a virtual world are generally no different than in the physical world, so I recommend reading Glenn’s earlier posts before reading this one.

Reasons for Virtualizing

There are a number of factors that motivate businesses to virtualize line-of-business applications like SQL Server, but in general the main factor is that utilizing virtualization is generally cheaper overall than having a number of physical servers. “Cheaper” can mean a lot of different things. For example, cheaper can mean fewer licenses, less physical rack space, lower power consumption, and lower cooling costs.

Cheaper can also mean less hardware is used, which can translate into performance problems for SQL Server depending on the configuration being used. I typically see three major problems over and over when consulting and dealing with SQL Server virtual machines (VMs):

  • Incorrect I/O sizing for the workload based on capacity and not performance requirements in IOPS (I/Os per second) or MB/sec transfer rates.
  • Overcommitment of memory under VMware where no memory reservations have been set for the SQL Server VM to prevent “ballooning”.
  • Overcommitment of host resources for CPU.

I’m going to explain each of these problems in a bit more detail to help you out if you’re an Accidental DBA with responsibility for SQL Server in a virtual environment.

VM I/O Considerations

Of all the problems that can affect SQL Server VM performance, the ones that I dread seeing most are I/O subsystem sizing problems, as these are the hardest to fix.

Typically when I come across SQL Server VMs that have these issues, it is the result of the I/O subsystem having been sized only on storage capacity, with a view to minimizing costs. Quite often this sizing technique results in a small number of high capacity, slow rotational speed disks, combined with 1-Gb/sec iSCSI for connectivity to the SAN.

In the worst case I’ve seen consulting, the underlying I/O subsystem was built using six 1-TB, 7200-RPM disks in a NetApp SAN. After running a round of SQLIO (a free tool from Microsoft that allows I/O subsystem performance testing) tests on that SAN, followed by another round of tests to validate the very poor results I was seeing, I decided to run comparison tests against a USB-2, 5400-RPM external disk that I used at the time for backups of important files on my laptops. The results of these SQLIO tests are shown below:

VM SAN vs USB

The vertical axis is MBs/second and the horizontal axis lists the I/O size being tested in KB. As you can see, my external USB drive came really close to this company’s entire SAN for performance!

For the NetApp SAN, it turned out that the company’s entire data center was virtualized on this same SAN. For application VMs this type of setup might work reasonably well, or at least you won’t see the effects of the poor I/O configuration as strongly as you will for a SQL Server workload (which is generally a much heavier I/O workload). Additionally, the SAN was connected using 1-Gb/sec iSCSI with only 4 network paths to the storage, and these paths were shared by three different VM hosts, servicing all of the I/O demands of all of the VMs.

There is no SAN magic in the world that can beat the physical limitations of an improperly sized I/O subsystem that has to service all of the I/O demands of a heavily virtualized environment.

SQL Server I/O requirements should be expressed in terms of performance and not just capacity, with the expectation that in order to obtain the best performance there will be unused capacity in the configuration. You also need to consider the throughput requirements in MB/sec, and whether or not the available network connections to the SAN will be able to meet those requirements for your workload. In general, each 1-Gb/sec connection will work out to between 90 and 110 MB/sec of throughput (nominally it is 128 MB/sec, but in my experience you’ll rarely achieve this), provided the underlying storage configuration can also meet that level of performance.

VM Memory Considerations

SQL Server is a memory-dependent application. In fact, large amounts of memory for SQL Server can often hide underlying I/O subsystem inadequacies by allowing more data to be cached in memory, thereby reducing the read I/O associated with the server workload.

However, one of the problems with running SQL Server in a VM is that the memory allocated to the VM might not be available if the host is over-subscribed (the sum of all the memory allocated to the various VMs is more than that available on the physical host).

VMware and Hyper-V have different approaches for VM memory allocations. VMware allows for over-provisioning of memory on a host by default, whereas Hyper-V performs admission checks at VM startup to ensure that the memory is going to be available for the VM. The exception to this in Hyper-V is when Dynamic Memory has been configured for the VM, which then performs the admission check for the startup memory configuration for the VM and allows the memory usage to grow to the maximum memory setting. Under Hyper-V, the VM also has a minimum memory configuration value which specifies the minimum size the VM will be allotted if memory pressure on the host occurs. Under VMware, the VM tools install a Balloon Driver, which is designed to reclaim memory from VMs if the host is over-provisioned and under memory pressure.

Since SQL Server is a very memory-dependent application, neither Dynamic Memory nor ballooning is generally desirable for high performance configurations. VMware does allow the configuration of static memory reservations to prevent ballooning of individual VMs, and when memory over-commit is in use for a host, the recommended best practice from VMware is to set memory reservations for low latency applications like SQL Server. Make sure that memory pressure is not causing a performance problem for your SQL Server VMs.

VM CPU Considerations

As a general rule, VMs should be configured with the lowest number of virtual CPUs (vCPUs) required to meet the application workload demands. For SQL Servers, you should also consider the impact that a reduced number of vCPUs might have on the query workload when parallel query execution is common. However, it is not uncommon for “wide” virtual machines (with a high number of vCPUs) to run into scheduling problems when mixed with other “narrow” virtual machines (with a small number of vCPUs) on the same host. The problem occurs because the physical CPUs are being shared in time slices with the other VMs that are also running on the same host.

While the virtual machine appears to the guest OS (the OS that’s running in the VM) as if it has 4 processors, the virtual processors might not be scheduled equally inside of the available time slices for the physical processors if there is concurrent activity in other VMs. To prevent problems associated with skew accumulating across the vCPU scheduling (i.e. vCPUs not being scheduled), the hypervisor (the process that manages the VMs running on the physical host) will periodically co-schedule the vCPU execution, requiring the same number of physical CPUs as vCPUs to be available so the wide VM can be scheduled and its workload can progress. VMware implemented a relaxed co-scheduling algorithm that has changed over time to only require co-scheduling of the vCPUs that have skewed, instead of requiring all vCPUs to co-schedule, but under parallel workloads in SQL Server this can still amount to full co-scheduling of the VM.

What all this means is that a SQL Server workload in a wide VM may stall periodically waiting for processors to become available.

This is a common occurrence that I see with SQL Server VMs: a specific workload is hitting performance problems with CPU, generally measured using the % Processor Time within the VM (which only represents the VM’s usage of the allocated resources and not actual physical processor usage), and the number of vCPUs is increased, which results in even more performance degradation.

When this occurs, reducing the number of vCPUs often improves performance. When co-scheduling issues exist, either as a factor of host over-commitment or of mixing “wide” virtual machines with smaller ones, this is usually the only option aside from adding another physical host to the overall configuration to try to balance out the workload. In VMware, CPU Ready time can often be a sign of CPU issues for VMs and is something that I generally track to determine if a VM is having scheduling problems inside of the hypervisor.

Summary

Virtualizing SQL Server workloads is becoming more common in many companies, and is even supported by Microsoft (see KB article 956893) if the host configuration meets all their documented requirements. With the appropriate considerations around workload sizing and hardware provisioning, SQL Server VMs can deliver the required performance for many line-of-business uses. However, when cost cutting is the driving reason behind server virtualization, the resulting configurations can be problematic for SQL Server performance. With the information I’ve provided in this post, you now know the three main areas that can cause performance problems in SQL Server VMs.

Querying the VMware vCenter Database (VCDB) for Performance and Configuration Information

In my previous post I showed how DBAs can be provided with read-only access to Virtual Center for monitoring VM performance in the data center. One thing that many administrators don’t realize or think about is that Virtual Center uses a database for storing information about the virtual data center, and the most common database platform used for this purpose is SQL Server. Virtual Center can use an existing SQL Server in the environment, or it can install SQL Server Express during setup.

As a consultant, I do a lot of reviews of SQL Servers running on VMware, and as part of that work I have to understand the underlying virtualization configuration – not just for SQL Server, but for the virtualized data center.  This includes needing information about the host configuration, other VMs running on the host, and performance metrics from the VM historical data that is stored in the vCenter database, which is named VIM_VCDB by default.

The VIM_VCDB database isn’t entirely documented, but a list of the views and their definitions can be found in the Using VirtualCenter Database Views technical note from VMware.  Using this information, and additional information collected using a server-side trace against the database while working in vCenter, I’ve compiled a series of Transact-SQL scripts for pulling information from the VIM_VCDB database in an environment.

The first time I ever demonstrated these scripts was last week during the Virtualization module of our IE3: High Availability and Disaster Recovery course in Chicago.  I’m also using these scripts in the Pluralsight course on SQL Server Virtualization that I am currently recording.  When I demonstrated these scripts last week, the first question the class asked was where can they get them.  We always provide the scripts from our demos to students, but I also promised to blog them as well this week.

Host Configuration

The VPXV_HOSTS view provides host-level information, and I commonly JOIN that view to the VPXV_VMS view to aggregate VM information per host as a part of an environment health check.

-- Host Configuration
SELECT
    vh.NAME AS HOST_NAME,
    HOST_MODEL,
    CPU_MODEL,
    CPU_COUNT,
    CPU_CORE_COUNT,
    CPU_HZ,
    CPU_THREAD_COUNT,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END) AS VM_VCPU_ACTIVE,
    MEM_SIZE,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END)*1./CPU_THREAD_COUNT AS THREAD_OVERCommit,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.NUM_VCPU ELSE 0 END)*1./CPU_CORE_COUNT AS CORE_OVERCommit,
    CAST(MEM_SIZE AS BIGINT)/1024/1024 AS MEM_SIZE_MB,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.MEM_SIZE_MB ELSE 0 END) AS VM_MEM_SIZE_MB,
    SUM(CASE WHEN vm.POWER_STATE = N'On' THEN vm.MEM_SIZE_MB ELSE 0 END)*1./(CAST(MEM_SIZE AS BIGINT)/1024/1024) AS MEM_OVERCommit,
    SUM(CAST(vm.MEMORY_OVERHEAD AS BIGINT)) AS VM_MEMORY_OVERHEAD,
    SUM(vm.MEM_SIZE_MB) AS VM_MEM_SIZE_MB_POTENTIAL,
    SUM(vm.NUM_VCPU) AS VM_VCPU_ALLOC_POTENTIAL,
    NIC_COUNT,
    HBA_COUNT,
    SUM(CASE WHEN vm.VMMWARE_TOOL = N'OK' THEN 1 ELSE 0 END) AS VM_TOOLS_OK,
    SUM(CASE WHEN vm.VMMWARE_TOOL = N'Old' THEN 1 ELSE 0 END) AS VM_TOOLS_OUT_OF_DATE,
    SUM(vm.NUM_VCPU) AS VM_VCPU_ALLOC
FROM dbo.VPXV_HOSTS AS vh
INNER JOIN dbo.VPXV_VMS AS vm
    ON vh.HOSTID = vm.HOSTID
GROUP BY vh.NAME, HOST_MODEL, CPU_MODEL, CPU_COUNT, CPU_CORE_COUNT, CPU_HZ,
    CPU_THREAD_COUNT, MEM_SIZE, NIC_COUNT, HBA_COUNT;

Performance Counter Information

Performance counters are maintained historically in the VIM_VCDB database at multiple roll-up levels. The VPXV_HIST_STAT_DAILY view has the daily roll-up values for each of the VMs and the VPXV_HIST_STAT_WEEKLY view has the weekly roll-up values.

Querying individual counter values and making decisions is not appropriate for all counters. An example of this is the CPU Ready counter which has to be calculated from the roll-up summation to determine if a problem actually exists (see CPU Ready Time in VMware and How to Interpret its Real Meaning). Two counters that I look at for the virtual data center and all of the VMs are CPU Ready and Memory Ballooned (vmmemctl).

CPU Ready Values

-- Daily %RDY values
SELECT
    vh.NAME AS HostName,
    vv.NAME AS GuestName,
    SAMPLE_TIME,
    SAMPLE_INTERVAL,
    (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 AS READY_PERCENT,
    CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5
                AND (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 < 10
            THEN N'WARN'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 10 THEN N'RED'
        ELSE N'OK'
    END AS CPURDY_State,
    STAT_VALUE AS CPUReady_Summation,
    NUM_VCPU
FROM dbo.VPXV_HIST_STAT_DAILY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR)
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
WHERE STAT_GROUP = N'cpu'
  AND STAT_NAME = N'ready'
  AND CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5
            AND (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 < 10
            THEN N'WARN'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 10 THEN N'RED'
        ELSE N'OK'
        END <> N'OK'
ORDER BY CPURDY_State, READY_PERCENT DESC;

-- Weekly %RDY values
SELECT
    vh.NAME AS HostName,
    vv.NAME AS GuestName,
    SAMPLE_TIME,
    SAMPLE_INTERVAL,
    (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 AS READY_PERCENT,
    CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5
                AND (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 < 10
            THEN N'WARN'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 10 THEN N'RED'
        ELSE N'OK'
    END AS CPURDY_State,
    STAT_VALUE AS CPUReady_Summation,
    NUM_VCPU
FROM dbo.VPXV_HIST_STAT_WEEKLY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR)
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
WHERE STAT_GROUP = N'cpu'
  AND STAT_NAME = N'ready'
  AND CASE
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 5
            AND (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 < 10
            THEN N'WARN'
        WHEN (STAT_VALUE/ (vv.NUM_VCPU * SAMPLE_INTERVAL * 1000)) * 100 > 10 THEN N'RED'
        ELSE N'OK'
        END <> N'OK'
ORDER BY CPURDY_State, READY_PERCENT DESC;

Memory Ballooning Counters

-- Weekly Memory Ballooned
SELECT
    vh.NAME AS HostName,
    vv.NAME AS VMName,
    Start,
    Finish,
    tab.SAMPLE_INTERVAL,
    MIN(STAT_VALUE)/1024. AS MinBallooned_MB,
    MAX(STAT_VALUE)/1024. AS MaxBallooned_MB,
    AVG(STAT_VALUE)/1024. AS AvgBallooned_MB,
    COUNT(*) * (tab.SAMPLE_INTERVAL/60) AS MinutesBallooned
FROM dbo.VPXV_HIST_STAT_WEEKLY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR)
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
CROSS JOIN (SELECT
                MIN(SAMPLE_TIME) AS Start,
                MAX(SAMPLE_TIME) AS Finish,
                SAMPLE_INTERVAL
            FROM dbo.VPXV_HIST_STAT_WEEKLY
            WHERE STAT_NAME = N'vmmemctl'
              AND STAT_VALUE > 0
            GROUP BY SAMPLE_INTERVAL) AS tab
WHERE STAT_NAME = N'vmmemctl'
  AND STAT_VALUE > 0
GROUP BY vh.Name, vv.Name, Start, Finish, tab.SAMPLE_INTERVAL
ORDER BY HostName, MinutesBallooned DESC;

-- Daily Memory Ballooned
SELECT
    vh.NAME AS HostName,
    vv.NAME AS VMName,
    Start,
    Finish,
    tab.SAMPLE_INTERVAL,
    MAX(STAT_VALUE)/1024. AS MaxBallooned_MB,
    AVG(STAT_VALUE)/1024. AS AvgBallooned_MB,
    COUNT(*) * (tab.SAMPLE_INTERVAL/60) AS MinutesBallooned
FROM dbo.VPXV_HIST_STAT_DAILY AS vhsd
INNER JOIN dbo.VPXV_VMS AS vv
    ON vhsd.ENTITY = N'vm-'+CAST(vv.VMID AS NVARCHAR)
INNER JOIN  dbo.VPXV_HOSTS AS vh
    ON vv.HOSTID = vh.HOSTID
CROSS JOIN (SELECT
                MIN(SAMPLE_TIME) AS Start,
                MAX(SAMPLE_TIME) AS Finish,
                SAMPLE_INTERVAL
            FROM dbo.VPXV_HIST_STAT_DAILY
            WHERE STAT_NAME = N'vmmemctl'
                AND STAT_VALUE > 0
            GROUP BY SAMPLE_INTERVAL) AS tab
WHERE STAT_NAME = N'vmmemctl'
  AND STAT_VALUE > 0
GROUP BY vh.Name, vv.Name, Start, Finish, tab.SAMPLE_INTERVAL
ORDER BY HostName, MinutesBallooned DESC;

Summary

The VIM_VCDB database contains a lot more information than I’ve covered in this post. These are simply examples of the types of information that you can query from the database as a part of troubleshooting problems or evaluating the configuration of a virtual data center. One of the important things about having access to this data is to always validate that you understand what the stored values actually represent before drawing conclusions from them.

Read-Only Access to Virtual Center for DBAs

Most environments using VMware for server virtualization are going to also have Virtual Center Server installed for administration and monitoring.  As a DBA, I always had access to Virtual Center with limited permissions.  It is incredibly easy for VM administrators to provide read-only access to Virtual Center to non-administrators so that they can monitor the performance counters for their servers.  In this post I’ll show how easy it is to configure permissions for Virtual Center, as a reference for DBAs to provide to VM administrators when requesting access.

Configuring Read-Only Access to Virtual Center

The VM administrator can provision accounts for access to vCenter in two ways: creating named accounts with passwords in Virtual Center directly, or through Active Directory authentication. The Permissions tab on any object in Virtual Center allows an administrator to provision access as shown below.

(Screenshot: the Permissions tab for an object in Virtual Center)

Right-clicking inside the window and choosing Add Permission will open the Assign Permissions dialog shown below.

(Screenshot: the Assign Permissions dialog)

Here an account can be selected from the Virtual Center users or through Active Directory, and then permissions can be assigned to the user or group to allow access into Virtual Center.  For simplicity, Virtual Center ships with a number of default roles that can be used to provide access with limited permissions.

Default Roles

The full definition of the default roles can be found on page 93 of the Datacenter Administrator Guide. While this guide is for vSphere 4.1, the same role configurations exist in the most recent version of Virtual Center Server.  The two key roles for DBAs are Read Only and Virtual Machine User.  As a DBA, I had Virtual Machine User access to all of my SQL Server VMs, which were grouped in a folder inside of Virtual Center to simplify permissions management.

Read Only

The Read Only role provides the bare minimum set of access to Virtual Center to allow monitoring performance information and viewing configuration information for a VM.  It provides the following level of access to Virtual Center:

  • View the state and details about the object
  • View all the tab panels in the vSphere Client except the Console tab
  • Cannot perform any actions through the menus and toolbars

Virtual Machine User

The Virtual Machine User role provides all of the access to vCenter that the Read Only role provides, but also provides the following additional permissions in vCenter:

  • Interact with a virtual machine’s console, insert media, and perform power operations
  • Does not grant privileges to make virtual hardware changes to the virtual machine
  • All privileges for the scheduled tasks privileges group
  • Selected privileges for the global items and virtual machine privileges groups
  • No privileges for the folder, datacenter, datastore, network, host, resource, alarms, sessions, performance, and permissions privileges groups

Summary

As a DBA, I would push for Virtual Machine User permissions inside of Virtual Center, but I’d want Read Only access as the bare minimum.  Configuring permissions is incredibly easy to do, and providing the ability to track performance and configuration information for VMs makes it easier to diagnose and track problems when they occur.  In the past, having the ability to control the VM power has prevented after-hours calls to VM administrators.  Additionally, having console access to the VM has allowed multiple team members to view the console remotely during troubleshooting rather than having to be in the office.  The fact that these permissions prevent configuration changes should negate any arguments against providing access to DBAs.

New Article – How expensive are column-side Implicit Conversions?

Last week I blogged about Implicit Conversions that cause Index Scans, and showed two charts covering where implicit conversions can result in an index scan operation instead of an index seek during query execution. As a part of writing that blog post I thought it would be interesting to also look at the performance overhead of column-side implicit conversions, given that I’ve seen their effects a number of times in the last few weeks. Today my article How expensive are column-side Implicit Conversions? went live on the SQLPerformance.com site; it uses Performance Monitor data on CPU usage and batch requests per second to show the performance overhead when column-side implicit conversions cause index scans during query execution.

Implicit Conversions that cause Index Scans

Implicit conversions seem to be a growing trend in the SQL Server performance tuning work that I’ve been engaged in recently, and I’ve blogged in the past about ways to identify when implicit conversions are occurring using the plan cache.

For those of you who don’t know, an implicit conversion occurs whenever values of two different data types are compared, and the conversion follows the rules of data type precedence.  The precedence establishes a hierarchy of the types, and the lower precedence data type is always implicitly converted up to the higher precedence type.  These conversions increase CPU usage for the operation, and when the conversion occurs on a table column it can also result in an index scan where an index seek would have been possible without the implicit conversion.
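As a quick illustration (using the dbo.TestImplicitConverts test table created below), filtering a varchar column with an nvarchar parameter forces the conversion onto the column side of the predicate, because nvarchar has higher precedence than varchar:

-- Hedged example: VarcharCol is varchar(50), the parameter is nvarchar(50).
-- nvarchar has higher precedence, so the predicate is effectively evaluated as
--   CONVERT_IMPLICIT(nvarchar(50), VarcharCol, 0) = @NVarcharValue
-- which, under the SQL collation tested below, turns the index seek into a scan.
DECLARE @NVarcharValue nvarchar(50) = N'Value';

SELECT RowID
FROM dbo.TestImplicitConverts
WHERE VarcharCol = @NVarcharValue;
GO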

For a long time I’ve wanted to map out, for the most common data types, whether a column-side implicit conversion results in an index seek or an index scan, and recently I finally got around to doing it.

Setting up the tests

To map out the implicit conversion effects I created two databases using different collations, one using SQL_Latin1_General_CP1_CI_AS and the other using Latin1_General_CI_AS, and then created the same test table in each database.
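As a point of reference, a minimal sketch of creating the two databases might look like the following (the database names are illustrative):

-- Two copies of the same schema, differing only in default collation
CREATE DATABASE ImplicitConvertsSQL COLLATE SQL_Latin1_General_CP1_CI_AS;
GO
CREATE DATABASE ImplicitConvertsWin COLLATE Latin1_General_CI_AS;
GO

The test table created in each database was: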

CREATE TABLE dbo.TestImplicitConverts
(
	RowID int NOT NULL IDENTITY (1, 1),
	BigIntCol bigint NOT NULL,
	BitCol bit NOT NULL,
	CharCol char(10) NOT NULL,
	DateCol date NOT NULL,
	DateTimeCol datetime NOT NULL,
	DateTime2Col datetime2(7) NOT NULL,
	DateTimeOffsetCol datetimeoffset(7) NOT NULL,
	DecimalCol decimal(10, 2) NOT NULL,
	FloatCol float(53) NOT NULL,
	IntCol int NOT NULL,
	MoneyCol money NOT NULL,
	NCharCol nchar(10) NOT NULL,
	NumericCol numeric(10, 2) NOT NULL,
	NVarchrCol nvarchar(50) NOT NULL,
	RealCol real NOT NULL,
	SmallDateTimeCol smalldatetime NOT NULL,
	SmallIntCol smallint NOT NULL,
	SmallMoneyCol smallmoney NOT NULL,
	TimeCol time(7) NOT NULL,
	TinyIntCol tinyint NOT NULL,
	GUIDCol uniqueidentifier NOT NULL,
	VarcharCol varchar(50) NOT NULL,
	CONSTRAINT PK_TestImplicitConverts PRIMARY KEY CLUSTERED (RowID)
);
GO

I then created a nonclustered index on each column in the test table so that a single-column query filtering on that column generates an execution plan with a single index seek when the matching data type is used for the filter.

-- Create nonclustered indexes on all columns to test implicit conversion effects
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_BigIntCol ON dbo.TestImplicitConverts (BigIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_BitCol ON dbo.TestImplicitConverts (BitCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_CharCol ON dbo.TestImplicitConverts (CharCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateCol ON dbo.TestImplicitConverts (DateCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTime2Col ON dbo.TestImplicitConverts (DateTime2Col);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTimeCol ON dbo.TestImplicitConverts (DateTimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DateTimeOffsetCol ON dbo.TestImplicitConverts (DateTimeOffsetCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_DecimalCol ON dbo.TestImplicitConverts (DecimalCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_FloatCol ON dbo.TestImplicitConverts (FloatCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_GUIDCol ON dbo.TestImplicitConverts (GUIDCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NVarcharCol ON dbo.TestImplicitConverts (NVarchrCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_RealCol ON dbo.TestImplicitConverts (RealCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallDateTimeCol ON dbo.TestImplicitConverts (SmallDateTimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallIntCol ON dbo.TestImplicitConverts (SmallIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_IntCol ON dbo.TestImplicitConverts (IntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_MoneyCol ON dbo.TestImplicitConverts (MoneyCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NCharCol ON dbo.TestImplicitConverts (NCharCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_NumericCol ON dbo.TestImplicitConverts (NumericCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_SmallMoneyCol ON dbo.TestImplicitConverts (SmallMoneyCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_TimeCol ON dbo.TestImplicitConverts (TimeCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_TinyIntCol ON dbo.TestImplicitConverts (TinyIntCol);
GO
CREATE NONCLUSTERED INDEX IX_TestImplicitConverts_VarcharCol ON dbo.TestImplicitConverts (VarcharCol);
GO

Finally, each of the tables was loaded with 467,000 rows of random data, and the indexes were rebuilt to remove any fragmentation before testing started.

INSERT INTO dbo.TestImplicitConverts
(	BigIntCol, BitCol, CharCol, DateCol, DateTimeCol, DateTime2Col, DateTimeOffsetCol,
	DecimalCol, FloatCol, IntCol, MoneyCol, NCharCol, NumericCol, NVarchrCol, RealCol,
	SmallDateTimeCol, SmallIntCol, SmallMoneyCol, TimeCol, TinyIntCol, GUIDCol, VarcharCol)
SELECT a.number, a.number%1, CAST(b.name AS CHAR(10)), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()), DATEADD(ms, -1*a.number, GETDATE()),
	a.number, a.number, a.number, a.number, CAST(b.name AS NCHAR(10)), a.number, b.name, a.number,
	DATEADD(ms, -1*a.number, GETDATE()), a.number, a.number, DATEADD(ms, -1*a.number, GETDATE()), a.number%255, NEWID(), b.name
FROM master.dbo.spt_values AS a
CROSS JOIN master.dbo.spt_values AS b
WHERE a.type = N'P'
  AND a.number < 1000
  AND b.name IS NOT NULL;
GO
ALTER INDEX ALL ON TestImplicitConverts REBUILD;
GO

The Test Harness

The test harness first connects to the instance of SQL Server, retrieves the column list for the TestImplicitConverts table from the database, and stores it in an ArrayList that is then used to iterate over the tests, generating a dynamic statement for each test while also using a different data type for each test.  The data types are taken from the SqlDbType enumeration in .NET, filtering out the types that are not being tested as a part of this investigation.  Each test is executed with SET STATISTICS XML ON so that the actual execution plan can be collected and processed to determine whether the implicit conversion caused a scan to occur.
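Before looking at the parsing, here is a simplified, hedged sketch of how one test might be generated and how the Showplan XML returned by SET STATISTICS XML ON could be captured into the $String variable used by the parsing code below ($connectionString, $columnName, $sqlDbType, and $testValue are illustrative placeholders; the full harness attached to this post differs in the details):

# Hedged sketch: generate one test statement and capture its Showplan XML into $String
$conn = New-Object System.Data.SqlClient.SqlConnection $connectionString;
$conn.Open();

$query = "SET STATISTICS XML ON; SELECT RowID FROM dbo.TestImplicitConverts WHERE $columnName = @value; SET STATISTICS XML OFF;";
$cmd = New-Object System.Data.SqlClient.SqlCommand($query, $conn);
$cmd.Parameters.Add("@value", [System.Enum]::Parse([System.Data.SqlDbType], $sqlDbType)) | Out-Null;
$cmd.Parameters["@value"].Value = $testValue;

# SET STATISTICS XML ON returns the actual plan as an additional result set;
# walk the result sets and keep the one containing the ShowPlanXML document
$reader = $cmd.ExecuteReader();
do
{
    while ($reader.Read())
    {
        if ($reader.FieldCount -eq 1 -and $reader.GetName(0) -like "*Showplan*")
        {
            $String = $reader.GetValue(0).ToString();
        }
    }
} while ($reader.NextResult());
$reader.Close();
$conn.Close();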

#Load the Showplan as a XML document for parsing
$xml = New-Object System.Xml.XmlDocument;
$xml.LoadXml($String);

#Setup the XmlNamespaceManager and add the ShowPlan Namespace to it.
$nsMgr = New-Object 'System.Xml.XmlNamespaceManager' $xml.NameTable;
$nsMgr.AddNamespace("sm", "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan&quot;);">http://schemas.microsoft.com/sqlserver/2004/07/showplan");</a>

#Search for a Index Scan or Clustered Index Scan operation that is an Implicit Conversion
$HasImplictConvert = $xml.SelectNodes("//sm:RelOp[@LogicalOp='Index Scan' or @LogicalOp='Clustered Index Scan']/sm:IndexScan/sm:Predicate//sm:Convert[@Implicit='1']/sm:ScalarOperator/sm:Identifier/sm:ColumnReference", $nsMgr).Count -gt 0;

Then the results of each test are output to the host as a comma-separated list that can be pasted into Excel and used to generate a chart of the implicit conversion effects.
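In the harness that output can be as simple as something along the following lines (again a hedged sketch with illustrative variable names):

# Emit one CSV row per test: the column tested, the data type used, and seek vs. scan
Write-Host "$columnName,$sqlDbType,$(if ($HasImplictConvert) { 'Scan' } else { 'Seek' })";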

Note:  The full test harness and scripts are attached to this post.

Results

The results of the tests are mapped out in the two charts below.  The green blocks show where an implicit conversion occurs but still results in an index seek operation.  The yellow blocks show where the implicit conversion causes an index scan to occur.  The red blocks show where an operand type clash occurs and implicit conversion is not supported by SQL Server for those data types, and the grey blocks are comparisons of the same data type, so no conversion is required.


Figure 1 – Implicit Conversions using SQL_Latin1_General_CP1_CI_AS

There is a difference between the SQL_Latin1_General_CP1_CI_AS collation and the Latin1_General_CI_AS collation for the char and varchar data types when they are converted to nchar or nvarchar.  Under the SQL collation the conversion results in an index scan, as shown above, but under the Windows collation the scan does not occur and an index seek is still used to execute the query.


Figure 2 – Implicit Conversions using Latin1_General_CI_AS

Conclusion

While implicit conversions occur in SQL Server to allow data to be evaluated across different data types, they can introduce performance problems for the specific data type conversions that result in an index scan during execution.  Good design practices and code reviews can easily prevent implicit conversion issues from ever occurring in your design or workload.
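As an example of the kind of fix a code review catches, simply declaring the parameter (or literal) with the column’s own data type removes the column-side conversion entirely; a hedged sketch against the test table above:

-- VarcharCol is varchar(50); a matching varchar parameter (or a non-Unicode literal
-- such as 'Value' rather than N'Value') leaves the column untouched and allows an index seek
DECLARE @TypedValue varchar(50) = 'Value';

SELECT RowID
FROM dbo.TestImplicitConverts
WHERE VarcharCol = @TypedValue;
GO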

Download scripts here!

Finding Key Lookups inside the Plan Cache

This is actually a blog post I thought I’d written more than two years ago, but this morning, when I went looking for it after receiving a question by email, I realized that I’ve never blogged about this before.  At PASS Summit 2010 I presented a session on performance tuning SQL Server by digging into the plan cache, and since then I have blogged a number of other plan cache parsing scripts.  One of the demo scripts I showed at PASS was a query to find plans and statements in the plan cache that perform a Key Lookup operation.  An example statement that performs a Key Lookup is shown below.

USE [AdventureWorks];
GO

SELECT NationalIDNumber, HireDate, MaritalStatus
FROM HumanResources.Employee
WHERE NationalIDNumber = N'14417807';
GO


Taking a look at the execution plan XML, we can see that there isn’t an actual Key Lookup operator; instead, the IndexScan element under the Clustered Index Seek RelOp has a Lookup attribute that is set to a value of 1, as shown in the XML snippet below:

<RelOp NodeId="3" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek" EstimateRows="1" EstimateIO="0.003125" EstimateCPU="0.0001581" AvgRowSize="17" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="290" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row">
  <OutputList>
    <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="MaritalStatus" />
    <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="HireDate" />
  </OutputList>
  <IndexScan Lookup="1" Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" ForceScan="0" NoExpandHint="0" Storage="RowStore">
    <DefinedValues>
      <DefinedValue>
        <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="MaritalStatus" />
      </DefinedValue>
      <DefinedValue>
        <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="HireDate" />
      </DefinedValue>
    </DefinedValues>
    <Object Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Index="[PK_Employee_EmployeeID]" TableReferenceId="-1" IndexKind="Clustered" />
    <SeekPredicates>
      <SeekPredicateNew>
        <SeekKeys>
          <Prefix ScanType="EQ">
            <RangeColumns>
              <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="EmployeeID" />
            </RangeColumns>
            <RangeExpressions>
              <ScalarOperator ScalarString="[AdventureWorks].[HumanResources].[Employee].[EmployeeID]">
                <Identifier>
                  <ColumnReference Database="[AdventureWorks]" Schema="[HumanResources]" Table="[Employee]" Column="EmployeeID" />
                </Identifier>
              </ScalarOperator>
            </RangeExpressions>
          </Prefix>
        </SeekKeys>
      </SeekPredicateNew>
    </SeekPredicates>
  </IndexScan>
</RelOp>

As a side note, ShowplanXML uses the IndexScan XML element for both seeks and scans.  An Index Seek is identified by the SeekPredicates child element, whereas an Index Scan has a Predicate child element instead. Another important note when dealing with ShowplanXML is that the attribute values in a cached execution plan differ from those in an Actual execution plan collected from SSMS.  In the above XML from the plan cache, the Lookup attribute has a value of 1.  However, in an Actual plan collected by SSMS, the value would be true, as shown in the Actual XML below:

<IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false">

When building an XML parser for querying the plan cache, the parser must check for a value of 1, not true, when looking for a Lookup attribute in the cached plan XML.  Given this information we can build a parser to search the plan cache and find the statements and columns associated with Key Lookups occurring.


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

WITH XMLNAMESPACES
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS sql_text,
    n.query('.'),
    i.value('(@PhysicalOp)[1]', 'VARCHAR(128)') AS PhysicalOp,
    i.value('(./IndexScan/Object/@Database)[1]', 'VARCHAR(128)') AS DatabaseName,
    i.value('(./IndexScan/Object/@Schema)[1]', 'VARCHAR(128)') AS SchemaName,
    i.value('(./IndexScan/Object/@Table)[1]', 'VARCHAR(128)') AS TableName,
    i.value('(./IndexScan/Object/@Index)[1]', 'VARCHAR(128)') as IndexName,
    i.query('.'),
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./OutputList/ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS output_columns,
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./IndexScan/SeekPredicates/SeekPredicateNew//ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS seek_columns,
    i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)') as Predicate,
	cp.usecounts,
    query_plan
FROM (  SELECT plan_handle, query_plan
        FROM (  SELECT DISTINCT plan_handle
                FROM sys.dm_exec_query_stats WITH(NOLOCK)) AS qs
        OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp
      ) as tab (plan_handle, query_plan)
INNER JOIN sys.dm_exec_cached_plans AS cp 
    ON tab.plan_handle = cp.plan_handle
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/*') AS q(n)
CROSS APPLY n.nodes('.//RelOp[IndexScan[@Lookup="1"] and IndexScan/Object[@Schema!="[sys]"]]') as s(i)
OPTION(RECOMPILE, MAXDOP 1);

The first CROSS APPLY above breaks out the individual statements so that we can tie the Key Lookup operation to the specific statement that is causing it to occur.  The second CROSS APPLY breaks out the actual IndexScan element in the XML and filters out any Key Lookups that happen in the ‘sys’ schema, since many of the system tables frequently incur Key Lookup operations.  Using the two XML fragments created by the CROSS APPLY operations we can extract the statement, the table and index the lookup operation occurs against, the output columns of the lookup, and the seek predicate used to perform the lookup.  With this information we can look at the indexing on our tables and determine whether changing a nonclustered index to make it covering is warranted to improve performance.
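For the Key Lookup example at the start of this post, one option would be adding a covering nonclustered index; a hedged sketch (the index name is illustrative):

-- Seek on NationalIDNumber with HireDate and MaritalStatus carried in the index
-- leaf level, so the lookup back to the clustered index is no longer needed
CREATE NONCLUSTERED INDEX IX_Employee_NationalIDNumber_Covering
ON HumanResources.Employee (NationalIDNumber)
INCLUDE (HireDate, MaritalStatus);
GO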

Hope this is helpful!