Using ‘dbghelp.dll’ version ‘4.0.5’ error fixed in SQL Server 2012 SP1 CU6

If you use Extended Events you may have noticed that the ERRORLOG file gets bloated with messages like:

Using ‘dbghelp.dll’ version ‘4.0.5’

every time you query sys.dm_xe_sessions or read a file using Transact-SQL and the sys.fn_xe_file_target_read_file() table-valued function. This issue can be especially problematic on SharePoint 2013 installations, where a timer job queries Extended Events every 15 seconds to monitor the SharePoint SQL Server instance.
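For reference, either of the following calls is enough to trigger the message on an unpatched build. This is a minimal sketch that reads the system_health session file target; the file name pattern assumes a default SQL Server 2012 installation:

SELECT name, create_time FROM sys.dm_xe_sessions;
GO
SELECT CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL);
GO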

This has been fixed in SQL Server 2012 SP1 CU6, which was just released today. The specifics of this “feature” are documented in the associated KB article (https://support.microsoft.com/kb/2878139).

ALTER DATABASE failed. The default collation of database ‘%.*ls’ cannot be set to %.*ls.

Last week I was working with a client on upgrading one of their systems from SQL Server 2000 to SQL Server 2012, while also changing the collation of the database and all of its table columns from Latin1_General_BIN to SQL_Latin1_General_CP1_CI_AS.  What started out as a straightforward upgrade actually became quite a challenge.  Because SQL Server doesn’t support direct upgrades from SQL Server 2000 to SQL Server 2012, I first upgraded to SQL Server 2008R2, and at that point I found metadata corruption.  We’ve seen and dealt with this before, so it was back to SQL Server 2000 to fix the orphaned entries, and then another upgrade attempt to SQL Server 2008R2.

At this point I had a corruption-free database and started running the scripts I had generated to migrate from Latin1_General_BIN to SQL_Latin1_General_CP1_CI_AS.  When I got to the point of changing the database default collation, I was dismayed to get the following error back from SQL Server:

Msg 1505, Level 16, State 1, Line 1
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name ‘dbo.sysschobjs’ and the index name ‘nc1’. The duplicate key value is (0, 1, person).
Msg 5072, Level 16, State 1, Line 1
ALTER DATABASE failed. The default collation of database ‘TestCollationChange’ cannot be set to SQL_Latin1_General_CP1_CI_AS.

Thinking about the previous metadata corruption, I was certain that there was still something wrong with the database, but I couldn’t find anything with CHECKDB or CHECKCATALOG.  It turns out there is nothing wrong with the database; there is something wrong with my expectations and assumptions.  To demonstrate this, consider the following example:

CREATE DATABASE [TestCollationChange]
ON  PRIMARY
( NAME = N'TestCollationChange', FILENAME = N'C:\SQLData\TestCollationChange.mdf')
LOG ON
( NAME = N'TestCollationChange_log', FILENAME = N'C:\SQLData\TestCollationChange_log.ldf')
COLLATE Latin1_General_BIN;
GO
USE [TestCollationChange];
GO
CREATE TABLE dbo.Person
(
    RowID int NOT NULL IDENTITY (1, 1),
    FirstName varchar(30) NOT NULL,
    LastName varchar(30) NOT NULL
);
GO
ALTER TABLE dbo.Person ADD CONSTRAINT PK_Person PRIMARY KEY CLUSTERED (RowID);
GO
CREATE TABLE dbo.person
(
    RowID int NOT NULL IDENTITY (1, 1),
    FirstName varchar(30) NOT NULL,
    LastName varchar(30) NOT NULL
);
GO
ALTER TABLE dbo.person ADD CONSTRAINT PK_person PRIMARY KEY CLUSTERED (RowID);
GO

Under the Latin1_General_BIN collation, this is a completely valid schema because case sensitivity is applied.  However, when you try to switch to a case-insensitive collation with:

ALTER DATABASE [TestCollationChange] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE [TestCollationChange] COLLATE SQL_Latin1_General_CP1_CI_AS;

these immediately become duplicate objects.  So where do we go from here?  First, the error message tells us that the object name is ‘person’, so you might consider doing a query against sys.objects:

SELECT * FROM sys.objects where name = 'person';

The only problem is that this will return only one row; remember, we are still in Latin1_General_BIN, so case sensitivity is being applied.  To get around this, we need to change our query to collate the name column using the new collation:

SELECT * FROM sys.objects where name COLLATE SQL_Latin1_General_CP1_CI_AS = 'person';

This will show us both of the objects, and it becomes immediately clear why we have a duplication issue: different cases. In the actual database I was working on, the duplicate objects were two stored procedures (actually four if you think about it), and the duplicates had object definitions similar to the following:

CREATE PROCEDURE [TestProcedure] AS
BEGIN
SELECT …….
-- Lots more logic, etc
END
GO

CREATE PROCEDURE [TESTProcedure] AS RETURN;
GO

I have no idea what the intent of the second procedure was, but after consulting with the client, it was determined that these duplicate stubs could be dropped, which then allowed the database collation to be changed to SQL_Latin1_General_CP1_CI_AS. This might not be a viable solution if the application actually relies on the case-sensitive nature of the naming convention, though I wouldn’t personally ever build a database with duplicate object names, regardless of the collation.
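If you want to find every potential collision before attempting the change, a sketch like the following, run while the database is still in the old collation, groups object names under the target collation; the same idea can be applied to sys.columns for column-level duplicates:

SELECT name COLLATE SQL_Latin1_General_CP1_CI_AS AS duplicate_name, COUNT(*) AS name_count
FROM sys.objects
GROUP BY name COLLATE SQL_Latin1_General_CP1_CI_AS
HAVING COUNT(*) > 1;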

Updated Availability Group Demonstrator

Since the first release of the SQLskills Availability Group Demonstrator, I’ve had a number of requests to add the ability to enable MultiSubnetFailover in the connection string and to specify a Timeout value. I made these changes months ago but never got around to actually releasing the newer build on my blog. However, after releasing the two add-ins yesterday, I took a few minutes to test the demonstrator application against my local Availability Group and then uploaded the latest build for download.
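As an illustration only (the listener and database names here are made up), these options map to the standard SqlClient connection-string keywords, so the demonstrator now builds strings along these lines:

Server=tcp:AGListener,1433;Database=SalesDB;Integrated Security=SSPI;MultiSubnetFailover=True;Connect Timeout=30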

Download: Availability Group Demonstrator

SQL Server 2012 Extended Events Add-in to Manage 2008/R2 Instances

Extended Events are a powerful new way of troubleshooting problems with SQL Server, and the addition of UI support in SQL Server 2012 Management Studio has helped increase the awareness and usage of this feature. One shortcoming is that the new UI for Extended Events only works for SQL Server 2012 instances, leaving administrators who manage multiple versions with no way to explore using Extended Events on their SQL Server 2008/R2 instances.  The Extended Event Manager Add-in that I wrote for SQL Server 2008/R2 is not compatible with SQL Server 2012, and because of the new UI I never planned on making it work in Management Studio 2012.  However, I’ve had a lot of requests to provide this backwards compatibility, and I’ve been reminded by Erin every time she presents on Extended Events how useful it would be if I made the Add-in work in Management Studio 2012.

Today, we’re releasing a new SQL Server 2012 Extended Events Add-in  to provide backwards compatibility with SQL Server 2008 and SQL Server 2008R2 for Extended Events by providing the following features:

  • View Extended Events Metadata for all Available Objects
  • View event sessions
  • Start/Stop event sessions
  • Create new event sessions
  • Alter event sessions
  • Drop event session
  • Script all operations
  • View target data for active event sessions
  • Configurable UI options

The SQL Server 2012 Extended Events Add-in is built on the code for the 2008 version of the add-in (available on Codeplex) with updates to resolve known bugs and allow integration in SQL Server 2012 Management Studio. A full walkthrough of the UIs in the add-in can be found on my blog post An XEvent a Day (12 of 31) – Using the Extended Events SSMS Addin.

Many thanks go out to Dan Taylor (@DBABullDog) and Paul Timmerman (@mnDBA) for beta testing the add-in over the last week, providing feedback and bug reports that were critical to getting the add-in modified for SQL Server 2012.  Also thanks to Erin (@erinstellato) for the constant reminders that this would be a useful feature.

Synchronize Availability Group Logins and Jobs

An important part of ensuring application functionality with Availability Groups is manually maintaining any uncontained objects across each of the replicas in the Availability Group. As more of our clients upgrade to SQL Server 2012 and implement Availability Groups, we’ve had to face the challenge of maintaining uncontained objects across replicas multiple times, and there hasn’t been a good solution to the problem. No one solution is a perfect fit for every client, and one of the challenges is the difference in change control policies that might exist.  Some environments might be agreeable to an SSIS package that runs nightly to copy all logins and jobs, whereas others require that a script be generated to accomplish the same tasks.

To solve this problem, we developed the SQL Server 2012 Availability Group Add-in for SQL Server 2012 Management Studio specifically targeted at enhancing the UIs for Availability Groups.  The add-in creates an additional menu in Object Explorer for the Availability Group node for easy access.

Screenshot: SQL Server 2012 AG Add-in menu

By clicking on the menu item, the Availability Group Synchronization UI form will open allowing you to easily step through the configuration of the different objects to synchronize and how.  Currently the add-in supports synchronizing:

  • User-defined Server Roles
  • Server Logins and Permissions
  • SQL Server Agent Jobs

The current output is a SQLCMD-mode script in a new query window, allowing an administrator to review all of the changes prior to switching SQL Server Management Studio to SQLCMD mode and executing the script.  This requires an additional step, but it also fulfills the requirements of many change control processes.  Based on feedback, we may evaluate adding the ability to synchronize automatically, instead of generating a script, in a future update to the add-in.
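To give a rough idea of the output (the replica and login names below are made up for illustration), the generated script uses SQLCMD-mode :CONNECT directives so that each block runs against the appropriate secondary replica, for example:

:CONNECT SecondaryReplica1
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'CONTOSO\AppServiceAccount')
    CREATE LOGIN [CONTOSO\AppServiceAccount] FROM WINDOWS;
GO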


This add-in is under continuous development to add additional objects to synchronize and additional reporting for configuration reviews across replicas. The latest build of the add-in can be downloaded from the SQL Server 2012 Availability Group Add-in page on our site.

The Accidental DBA (Day 29 of 30): Troubleshooting Deadlocks

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Deadlocks occur in SQL Server whenever two processes attempt to acquire locks on the same resources in reverse order, creating a persistently blocked condition where neither session can continue to execute until the other session releases its locks.  In SQL Server, the Lock Monitor background task is responsible for detecting and resolving deadlocks whenever they occur (basically terminating the session that has performed the least amount of work so far), and the resulting 1205 error can be a sign of problems that require further evaluation.  In a third-party vendor application, it may not be possible to make the changes necessary to eliminate deadlocks, but you can still collect information about the deadlocks to assist the third-party vendor in analyzing and possibly identifying a solution to the problem.

Collecting Information

Prior to SQL Server 2008, collecting deadlock information from SQL Server required enabling a trace flag, configuring a SQL Trace, using Event Notifications, or using a WMI Alert.  Trace flags 1222, 1205, or 1204 write the deadlock information as text into the ERRORLOG.  SQL Trace, Event Notifications, and WMI Alerts allow collection of the deadlock information as XML. Since the introduction of Extended Events and the new system_health event session in SQL Server 2008, deadlock information has been captured by default in SQL Server and no longer requires enabling additional data collection for analysis.
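As a hedged sketch, the following shows the trace flag approach and a query that pulls the xml_deadlock_report events already captured by the system_health file target in SQL Server 2012 (in SQL Server 2008/R2 the session only has a ring_buffer target, so you would read sys.dm_xe_session_targets instead):

-- Write deadlock information to the ERRORLOG (instance-wide)
DBCC TRACEON (1222, -1);
GO
-- Retrieve deadlock graphs captured by system_health (SQL Server 2012 file target)
SELECT XEventData.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT CAST(event_data AS xml) AS XEventData
    FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL)
) AS src
WHERE XEventData.value('(event/@name)[1]', 'varchar(50)') = N'xml_deadlock_report';
GO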

Analysis

The definitive source for understanding the output from trace flag 1222 is a series of blog posts written by Bart Duncan. His three-part series uses the output from trace flag 1222 to demonstrate how to read the XML deadlock graph information, starting with Deadlock Troubleshooting, Part 1. The same method of analysis applies to deadlock graphs collected by SQL Trace, Event Notifications, WMI, and even Extended Events.  The format of the deadlock graph defines the deadlock victim(s), each of the processes involved in the deadlock (within the process-list node), and the resources contributing to the deadlock (within the resource-list node).  Each process has an assigned processid that uniquely identifies it in the graph and ties it to the resources it holds locks on or is requesting.  Within each process’s information, the execution tsql_stack shows the statement that deadlocked first and then works backwards to the start of the execution call stack.

One of the key areas of focus for deadlock analysis is the resource-list portion of the graph, which contains all the information about the resources involved and the lock types being held and requested by each of the processes.  This will also contain the index and object names, or the allocation unit associated with the object, which can be used to determine the name of the object and index.  Understanding the locking order between the processes is essential for deadlock troubleshooting.

In addition to viewing the raw XML or text information for the deadlock, it is also possible to view the information graphically as explained in my blog post Graphically Viewing Extended Events Deadlock Graphs.  The graphical view in Management Studio will not show all of the same details as the XML or text, but can be a fast start for understanding the type of deadlock and locking order.  It may be necessary to look at the text or XML for further information in some scenarios, or you can also open the graph graphically in SQL Sentry’s excellent Plan Explorer Pro to get the full output parsed as a table as well.

Possible solutions

There are many potential solutions to prevent deadlocks from occurring, and the correct one depends on the specific deadlock condition. In some deadlock scenarios an index change to cover one of the queries involved may be all that is necessary to prevent the deadlock condition from being possible.  In other scenarios, it may be necessary to change isolation levels, or to use locking hints to force a blocking lock that is incompatible with other locks, to prevent the deadlock condition from being encountered.  Proper analysis of the deadlock graph will help with determining the appropriate solution, but in most cases simple error handling logic in Transact-SQL or .NET application code that handles the 1205 error and resubmits the victim transaction can prevent end users from being negatively affected by deadlocks.
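As a minimal illustration (the table name is hypothetical, and THROW requires SQL Server 2012; use RAISERROR on 2008/R2), retry logic for error 1205 in Transact-SQL can look something like this:

DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- the work that keeps being chosen as a deadlock victim, e.g.
        UPDATE dbo.Person SET LastName = 'Smith' WHERE RowID = 1;
        COMMIT TRANSACTION;
        SET @retries = 0;                   -- success, exit the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
        BEGIN
            SET @retries = @retries - 1;    -- deadlock victim, try again
            WAITFOR DELAY '00:00:00.100';
        END
        ELSE
            THROW;                          -- some other error, or out of retries
    END CATCH
END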

Summary

Troubleshooting deadlocks in SQL Server starts with collecting the deadlock graph information using one of the available methods.  Extended Events in SQL Server 2008 and later capture the information by default, eliminating the need to enable further collection and then wait for the deadlocks to recur to gather the information.  Full details of how to configure deadlock graph collection and analysis of specific scenarios can be found in my SimpleTalk article Handling Deadlocks in SQL Server and in my Pluralsight online training course SQL Server: Deadlock Analysis and Prevention.

The Accidental DBA (Day 24 of 30): Virtualization High Availability

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Virtualization has been popular for many years, and more and more businesses are moving low-latency line-of-business applications like SQL Server into virtual machines every day.  One of the common reasons that I’ve heard over the years for moving SQL Server to a virtual machine is that high availability is built-in.  Usually what this translates into is, “We don’t need to use SQL Server availability options because the VM already has HA.”  This may be the case for some scenarios but as the saying goes “there’s no such thing as a free lunch.”  In this post we’ll look at the high availability provided to virtual machines and the considerations that need to be taken into account when determining whether or not to implement SQL Server high availability while using virtual machines.

Basic Virtual Machine HA

The high availability provided through virtualization depends on the configuration of the host environment on which the VMs are running.  Typically for a high-availability configuration for virtualization, multiple host servers are clustered together using a shared-storage solution on a SAN, NFS, or NAS for the virtual machine hard disks.  This provides resilience against failure of one of the host servers by allowing the virtual machines to restart on one of the other hosts.  Both Hyper-V and VMware provide automated detection of guest failures in the event of a problem and will restart the VMs automatically on another host, provided that sufficient resources exist to meet any reservations configured for the individual VMs.

VMs also gain better availability over physical servers through features like Live Migration/vMotion and the ability to perform online storage migrations to move the virtual hard disks from one storage array to another available to the host(s).  This can be very useful for planned maintenance windows, SAN upgrades, or for balancing load across the host servers to maximize performance in response to performance problems. The VM tools that are installed in the guest to improve performance and integration with the host server can also monitor availability of the guest through regular ‘heartbeats’, allowing the host to determine that a VM has crashed, for example with a blue screen of death (BSOD), and automatically restart the guest VM in response.

VM Specific HA Features

In addition to the basic high availability provided by virtualization, there are VM-specific HA features offered by both VMware and Hyper-V for improving the availability of individual VMs.  VMware introduced a feature for VM guests called Fault Tolerance in vSphere 4 that creates a synchronized secondary virtual machine on another host in the high-availability cluster that is lock-stepped with the primary.  In the event of a host failure, guests that have Fault Tolerance enabled immediately fail over to their secondary in a manner that is similar to a vMotion operation, preventing application downtime.  At the same time, a new secondary VM is created on another host inside the cluster and synchronized with the new primary, maintaining the fault tolerance of the guest inside the environment. Unfortunately this is limited to a single virtual CPU, even in ESX 5.1, so it’s not likely to be used with SQL Server VMs.

Hyper-V does not currently provide an equivalent feature to VMware Fault Tolerance, even in Server 2012.  Hyper-V 2012 introduced Replicas, which provide disaster recovery through replication to a remote data center with manual failover, but it doesn’t provide automated failover in a manner similar to Fault Tolerance.

SQL Server Considerations

The primary consideration I ask about when it comes to SQL Server high availability on virtualization is whether or not it is acceptable to incur planned downtime associated with routine maintenance tasks like Windows Server OS patching and SQL Server patching with Service Packs or Cumulative Updates. If planned downtime for patching is acceptable, then the high availability provided by virtualization may meet your business requirements.  However, I would always recommend testing a host failure to determine the amount of time required to detect the failure and then restart the VM on another host, including the time required for Windows to boot and for SQL Server to perform crash recovery to make the databases available again.  This may take 3-5 minutes, or even longer depending on the environment, which may not fit within your downtime SLAs.

If planned downtime for applying server patches is not possible, you will need to pick a SQL Server availability option using the same considerations as you would for a physical server implementation.  Support for Failover Clustering of SQL Server on SVVP-certified platforms was introduced in 2008, and Database Mirroring and Availability Groups are also supported under server virtualization.  However, none of the SQL Server high availability options are supported in conjunction with Hyper-V Replicas, so there are additional limitations that need to be considered whenever you combine features on top of server virtualization.  One limitation that should always be factored into the decision to virtualize SQL Server and use SQL Server’s native high availability options is the added complexity of adding the virtualization layer to the configuration.

The Accidental DBA (Day 22 of 30): Determining a High-Availability Strategy

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

A lot of times when I talk to people about high availability for SQL Server, the first thing that they think of is a failover cluster. While failover clustering is one of the high-availability features provided by SQL Server, it’s not the only option available. Selecting the correct high-availability strategy should be a part of the initial planning of a SQL Server installation, but commonly I find that high-availability considerations only become important after a problem has occurred that resulted in downtime for SQL Server. When high availability is considered as an afterthought, the costs of implementing the chosen strategy may be higher than if it had been implemented initially. In this post we’ll take a look at the necessary considerations for choosing a high-availability strategy.

SQL Server offers many high-availability features such as database mirroring, failover clustered instances, availability groups, and log shipping. Even transactional replication can be used as a high-availability option.

Gather Requirements

The first step in the process of determining a high availability strategy for SQL Server is gathering business requirements to establish SLAs.  Paul already covered RTO and RPO in his blog post (Day 6 of 30): Backups: Understanding RTO and RPO, and these are the key requirements that need to be understood as a part of requirements gathering.  It is important to ensure that realistic expectations are set for the availability requirements for the solution.  Otherwise, when we get to the next step and begin evaluating our limitations we may need to come back to this first step again to reevaluate what is actually going to be possible.

Evaluate Limitations

Once we know the expected SLAs for the solution we can then begin to evaluate the limitations that exist to determine if we will be able to meet those SLAs or not. Limitations can generally be categorized as technical or non-technical.  For example, the budget is a non-technical limitation that is going to be important in picking technologies and determining if you can meet the SLA requirements.  There is no point in designing an entire solution around multiple data centers and SAN-based replication if the budget for high availability is only $10K.  Other important non-technical limitations to consider are the skill-set requirements for maintaining an implementation, space availability in the data center, power and cooling requirements, and even the time requirements for the implementation.  Any or all of these can be overcome if the budget supports expanding data-center space, additional training for employees, or even hiring experienced consultants to perform/help with the initial implementation when time constraints exist.

Technical limitations are constraints within SQL Server and the environment that affect which technologies can be used.  For example, if you can only use SQL Server Standard Edition in SQL Server 2012, the hardware will be limited to 16 cores and 64GB RAM, and you won’t be able to leverage newer features like availability groups.  Other technical factors such as the volume of transaction log generation, average transaction size, database recovery model, and whether or not you can modify the application will also limit the options available when you begin selecting technologies based on the limitations and requirements.

Selecting Technologies

Once you have the requirements and limitations sorted out, it is time to begin reviewing the available technologies to determine whether it’s actually going to be possible to meet the SLAs within the existing limitations.  It is not uncommon to find that the existing limitations will not support the business requirements and SLAs, and when this occurs it is important to present the limitations and explain that the SLAs won’t be achievable within the current constraints.  One of two things can happen at this point: the business requirements may be revised to work within the available limitations, or the requirements can be prioritized to determine the order of importance for the design.  The specifics of the availability options in SQL Server will be covered in the next few posts in this series, but it is critical to ensure all of the features required by the application are going to be met by the selected technology.  For example, if the database uses FILESTREAM, database mirroring won’t be a viable high-availability option; likewise for databases that don’t use the FULL recovery model, which is required for database mirroring and availability groups.
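A quick, hedged check of the per-database properties that gate these technologies can be run up front, for example:

-- Databases not in FULL recovery can't use database mirroring or availability groups
SELECT name, recovery_model_desc
FROM sys.databases
WHERE recovery_model_desc <> 'FULL';
GO
-- Databases containing FILESTREAM data (not supported by database mirroring)
SELECT DISTINCT DB_NAME(database_id) AS database_name
FROM sys.master_files
WHERE type_desc = 'FILESTREAM';
GO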

Testing, Validation, and Documentation

Testing and validation are generally a part of the initial implementation of a new high-availability strategy, but they should also become routine tasks to continuously validate that the current implementation continues to meet the business requirements and SLAs.  Documentation of the solution configuration, as well as the failover and recovery plan, is an important part of implementing any high-availability solution.  The documentation should ideally be written by a senior team member, but then tested by the most junior person to ensure that all of the necessary steps for performing a failover or recovering from a problem have been appropriately addressed in the documentation.  During ongoing validation of the configuration, documentation updates should be made for any new limitations that are discovered or as the configuration changes, so that it continues to meet the business SLAs.

Summary

Care must be taken when formulating a high-availability strategy to ensure that it meets the requirements while also working within the limitations.


The Accidental DBA (Day 21 of 30): Essential PerfMon counters

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

So far this week, Erin has covered the importance of having a baseline of your SQL Server performance and some of the tools for collecting and analyzing that information, and Kimberly has talked about some of the important DMVs. In this post we’ll take a look at some of the most common performance counters for SQL Server, what they mean, and which ones are the most important for routine monitoring to identify problems with SQL Server.

PAL Template

Erin talked about the PAL tool earlier this week and mentioned the template that is provided by the tool for creating a Data Collector Set in Performance Monitor to capture SQL Server-specific performance counters. You can see how to do this in the article that Erin linked to, Free Tools for the DBA: PAL Tool, if you haven’t already looked at it.  As a DBA, I loved the PAL tool for its simplicity and the time it saved me configuring performance counter collections on each of my SQL Servers, and then analyzing the data that was collected.

However, the PAL tool has its limitations; one of them is that it applies fixed thresholds to certain performance counters, using thresholds that Microsoft has published since SQL Server 2000 without major updates. Until just last week, the SQL Server template applied a fixed threshold of 300 to the analysis of Page Life Expectancy, which doesn’t reflect the growth in server memory sizes that has occurred since that value was first published.  Now the template analyzes Page Life Expectancy based on the value you specify as being normal for the server, or a default of 700 if no value is defined.

Essential Counters

While the PAL template for SQL Server is very comprehensive, I also maintain a short list of performance counters that I use for spot-checking server performance periodically.  To simplify covering each counter, why it’s important, and what to look for, we’ll look at the CPU, memory, and disk counters separately, starting with CPU.

CPU Usage

  • Processor
    • %Processor Time
    • %Privileged Time
  • Process (sqlservr.exe)
    • %Processor Time
    • %Privileged Time

The Processor\%Processor Time counter is probably the most familiar counter in Performance Monitor and gives us information about the total CPU usage for the server we are looking at, unless it is a virtual machine.  For virtual machines (VMs), the Processor\%Processor Time counter is not a reliable measure of actual CPU usage for the server; instead it shows the percentage of the allocated CPU resources that the VM is actually using at a given point in time.  High values for %Processor Time in a VM need to be checked against the host performance counters for the VM to determine whether the VM is contending for physical CPU allocations with other guest VMs on the same host. In addition to the Processor counter, we also want to collect the Process (sqlservr.exe)\%Processor Time counter to be able to determine whether a high CPU condition is actually the result of SQL Server utilization or of another application that might be running on the server.  For most SQL Server workloads the average processor usage over time should be fairly consistent, and unexplained increases from the baseline trend should be investigated further.  Values consistently greater than 80% should also be investigated to determine whether the workload is exceeding the current hardware capabilities or whether problems are increasing the overall CPU usage for the instance.
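A related spot check from inside SQL Server itself (a sketch based on the scheduler monitor ring buffer, which records roughly one sample per minute) shows SQL Server’s CPU use versus everything else on the box:

SELECT TOP (10)
       record.value('(./Record/@id)[1]', 'int') AS record_id,
       record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
       record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_cpu_pct
FROM (
    SELECT CONVERT(xml, record) AS record
    FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
      AND record LIKE N'%<SystemHealth>%'
) AS rb
ORDER BY record_id DESC;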

Memory Usage

  • Memory
    • Available Mbytes
  • SQL Server:Buffer Manager
    • Lazy writes/sec
    • Page life expectancy
    • Page reads/sec
    • Page writes/sec
  • SQL Server:Memory Manager
    • Total Server Memory (KB)
    • Target Server Memory (KB)

The first counter that I start with when looking at memory usage on a SQL Server is the Memory\Available MBytes counter.  I start with this counter to determine whether Windows is experiencing memory pressure, which could skew the SQL Server performance counter values.  The Memory\Available MBytes counter should ideally be above 150-300MB for the server, which leaves memory available for other applications to run without pushing Windows into a low memory condition.  When the Available MBytes counter drops below 64MB, Windows on most servers signals a low memory notification that SQL Server monitors for, and the SQLOS (the ‘operating system’ inside of SQL Server) will reduce memory usage in response to this notification.

After confirming that Windows has available memory to prevent SQL Server from shrinking memory usage, the next counters I generally look at are Buffer Manager\Page life expectancy and Lazy writes/sec.  The Page life expectancy (PLE) should be generally consistent on average, but may fluctuate with changes in the server workload, with lower values at peak periods.  In general, the more memory allocated to a SQL Server instance, the higher I’d like to see this counter’s normal value.  For example, a server with 230GB RAM allocated to the SQL Server buffer pool and a PLE of 300 would equate roughly to 785MB/sec of I/O activity to maintain the page churn inside of the buffer pool.  While it might be possible for the I/O subsystem to keep up with this demand, this represents a significant amount of page churn in the buffer pool and may be a sign of missing indexes, implicit conversions due to mismatched data types, and all kinds of other problems that can be fixed if identified as the root cause.
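The arithmetic behind that example is simply the buffer pool size divided by the PLE in seconds; a quick sanity-check calculation:

DECLARE @buffer_pool_gb numeric(10, 2) = 230, @ple_seconds int = 300;
SELECT (@buffer_pool_gb * 1024) / @ple_seconds AS approx_mb_per_sec;   -- ~785 MB/sec of page churn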

If you find a low PLE for the server, look at the other performance counters in the Buffer Manager category for correlation of overall memory pressure inside of SQL Server.  If Lazy writes/sec is consistently non-zero alongside a low PLE and elevated values for Page reads/sec and Page writes/sec, the server is experiencing buffer pool contention and you will need to troubleshoot this problem further.  If the Memory\Available MBytes counter was fluctuating and under the 64MB low memory threshold, looking at the Memory Manager\Total Server Memory (KB) and Target Server Memory (KB) counters will tell you whether that has resulted in SQL Server reducing the size of the buffer pool in response.  On a stable system Total Server Memory (KB) will be lower than Target Server Memory (KB) during the initial buffer pool ramp-up, but the two should then remain equal under normal operation unless a low memory condition occurs.  If the server is a VM running on VMware, check for memory ballooning by the host with the VM Memory\Memory Ballooned (MB) counter; non-zero values suggest that the host is reclaiming memory and causing the low memory condition.
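The SQL Server-specific counters in this list are also exposed through sys.dm_os_performance_counters, so a quick spot check doesn’t require Performance Monitor. A minimal sketch (the object_name prefix assumes a default instance; named instances use an MSSQL$InstanceName prefix):

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE (object_name = N'SQLServer:Buffer Manager'
       AND counter_name IN (N'Page life expectancy', N'Lazy writes/sec', N'Page reads/sec', N'Page writes/sec'))
   OR (object_name = N'SQLServer:Memory Manager'
       AND counter_name IN (N'Total Server Memory (KB)', N'Target Server Memory (KB)'));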

Disk Usage

  • Physical Disk
    • Avg. Disk sec/Read
    • Avg. Disk Bytes/Read
    • Avg. Disk sec/Write
    • Avg. Disk Bytes/Write
  • Paging File
    • %Usage
  • SQL Server:Access Methods
    • Forwarded Records/sec
    • Full Scans/sec
    • Index Searches/sec

The Avg. Disk sec/Read and /Write counters provide the current average latency for each of the disks on the server.  Latency is one of the most important metrics for SQL Server I/O performance, but it should be compared to the size of the I/Os being performed before deciding whether a specific value indicates a problem.  As the size of the I/O operations increases, so does the latency for the operation, so it is normal to see higher latency values for a reporting workload doing large table scans than for a transactional workload with smaller I/O operations.  The Microsoft general recommendations for I/O latency for SQL Server are:

  • < 8ms: excellent
  • < 12ms: good
  • < 20ms: fair
  • > 20ms: poor

Over the last two years of consulting, only a few of the servers that I’ve looked at during health checks have met the < 20ms latency numbers for data and log files.  Most servers tend to fall into the < 30ms range for I/O latency per disk on average.  SSDs are changing this rapidly; we’re seeing more and more disk configurations that include SSDs, and the result is very low I/O latency.  When looking at the overall I/O for the server, reviewing the Access Methods counters for Full Scans/sec, Forwarded Records/sec, and Index Searches/sec can give us a clue about the type of workload that is generating the I/O.
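The same latency information can be pulled from inside SQL Server with sys.dm_io_virtual_file_stats; a sketch that computes the average read and write latency per database file since the instance last started:

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;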

Summary

An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.

For example, if we look at disk I/O and see high latency numbers with lower I/O sizes on average, we could infer that we have a disk bottleneck and we need to improve our I/O performance. However, if we look at the Buffer Manager\Page life expectancy and find that it is lower than our normal baseline numbers for the server, and then see the Buffer Manager\Page Reads/sec is higher than usual, the data would point to a memory contention in the buffer pool which is going to affect I/O performance to keep up with the page churn occurring in the buffer pool.

Understanding the counters and how they relate makes it much easier to spot when problems are occurring and then pinpoint where the actual root of the problem might be.

The Accidental DBA (Day 16 of 30): General Security

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

SQL Server offers multiple layers of security configuration that allow for a defense-in-depth approach to securing business-critical data. A strong security implementation starts with the SQL Server installation and service account configuration, continues into the surface area configuration for SQL Server by minimizing the options that are turned on that may increase the risk of an attack on the instance, and finishes with securing individual logins and database-level user permissions following the principle of least privilege required to accomplish a task.

Installation and Service Accounts

A secure installation starts with properly selecting the service accounts that the individual services will run under. Ideally, if the server is participating in an Active Directory domain, these will be domain accounts that are isolated per service, and the accounts shouldn’t have additional permissions added to them except where necessary for the activity required. The service accounts should not be members of the Domain Admins or local server Administrators groups under any circumstances.

For most environments, the service account should be created as a low-privilege user, and then during SQL Server setup the required permissions will be added on the local server to allow SQL Server to use the account to run the services. The only additional permissions that should be added in most cases are the Perform Volume Maintenance Tasks and Lock Pages in Memory user rights assignments, if Instant File Initialization and locked pages are going to be used by the instance, and then any specific file folder ACLs to allow the instance to access additional paths for data, transaction log, or writing backups to local disk locations. Additionally, if network shares are used for backups or for file transfers, e.g. SSIS packages reading files for data loading, specific permissions for those locations can be added for the account. Permissions on data, transaction log, and backup file locations should be restricted to the service account and administrator users only whenever possible, to minimize the accessibility of these files to other users within the domain.

Surface Area Configuration

SQL Server 2008/R2 and 2012 by default take great strides toward minimizing the attack surface of SQL Server during setup. This low surface area configuration was first introduced in SQL Server 2005 and has been improved with each release to minimize the risks associated with a default installation of SQL Server. Features such as xp_cmdshell, OLE Automation, and SQLCLR are disabled by default to minimize the potential security risks that could occur when these features are enabled. Before enabling any of these features, consider the implications that they might have for the overall security of the server. If the server will host third-party application databases that require these features, there may be little that you can do to prevent enabling them, but it is always worthwhile to question the need and to ensure that you’ve properly documented the reasons behind having these features enabled.
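These options are controlled through sp_configure; a minimal sketch for reviewing one of them (xp_cmdshell) and making sure it stays off:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell';        -- shows the current configured and run values
EXEC sp_configure 'xp_cmdshell', 0;     -- explicitly leave it disabled
RECONFIGURE;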

As a part of the instance surface area, the Windows Firewall should be configured to allow TCP access to the ports necessary for SQL Server connectivity.  For default instances this usually means enabling TCP ports 1433 and 1434 to allow user connections and connections to the Dedicated Administrator Connection.  For named instances, the default configuration of the instance is to use dynamic ports; however, this should be changed for each named instance to fix its TCP listener port so that the appropriate Windows Firewall rules can be created.  While it may be easier to disable the firewall, this does not minimize the attack surface for the server, and specific rules should be created instead of disabling the firewall entirely.
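If you’re not sure which port an instance is currently listening on, a quick check from a TCP connection (a sketch; it returns NULL over shared memory or named pipes) is:

SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;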

Controlling Server Access

An important consideration for overall security is controlling physical access to the server hardware, as well as controlling who can actually log in to the server OS. Physical security is incredibly important and should be thought out as a part of overall infrastructure security. Generally SQL Servers are installed in the corporate data center, but in smaller shops it is not uncommon for servers to exist in unsecured locations, for example under an administrator’s desk or in other generally accessible locations within an office space. This can pose a significant risk of data loss in the event the server is physically stolen, or of downtime if someone inadvertently unplugs the server from power or the network. Servers that contain private information or data should always be locked inside a server closet or secured inside a data center with access limited to a restricted set of employees.

Windows access to SQL Servers should equally be restricted to only the administrative users that need to manage SQL Server or the underlying OS. One important item to remember is that any local administrator on the server can access the database, log, and backup files and, if so inclined, gain access to SQL Server using easy-to-Google/Bing techniques that add their account to the sysadmin fixed server role, even if they are not currently able to access SQL Server. The local administrators on a SQL Server should be limited to the Domain Admins group, which should contain the minimum number of users requiring that level of permissions, and, where appropriate for security controls, the database administrators only. Other users should not have direct access to the server OS for a SQL Server, and the list of users that have access should be reviewed regularly to ensure that only those who require access actually have it.

Controlling SQL Server Permissions

Permissions within the SQL Server instance can be granted at multiple levels, from the server instance level to individual databases and even objects within a database. In general, server-level permissions should be restricted to administrators that need to manage the SQL Server instance. Most server-level permissions are granted through the use of one of the many fixed server roles that are part of the default installation of SQL Server. Membership in the sysadmin, serveradmin, and securityadmin roles should be limited to the actual administrators requiring permissions to manage SQL Server. Just like the Local Administrators group in Windows, these roles provide elevated permissions to control the instance, make configuration changes, create additional logins, and manage security within the instance.  Application accounts should not be members of these roles, specifically the sysadmin role, as a matter of least privilege. However, certain applications like SharePoint require the securityadmin fixed server role, and there is nothing you can do as an administrator to prevent this. The sysadmin role should be audited routinely, like the Local Administrators group in Windows, to track the logins that have administrative control over the instance.
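A simple query against the server principal catalog views can be part of that routine audit; a sketch that lists the current sysadmin members:

SELECT p.name, p.type_desc
FROM sys.server_role_members AS rm
JOIN sys.server_principals AS r ON rm.role_principal_id = r.principal_id
JOIN sys.server_principals AS p ON rm.member_principal_id = p.principal_id
WHERE r.name = N'sysadmin';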

Application logins should continue the principle of least privilege by having access only to the database(s) and objects they require. While permissions on individual objects should be enforced as a general security best practice, the reality of most third-party applications is that the application login will need to be dbo or a member of the db_owner database role for the databases that are used by the application.  In cases like this, there will also be little that can be done as an administrator to change the security requirements for the application.  However, in cases where a third-party application requires elevated permissions in the sysadmin fixed server role, understanding the requirements for this high-level role should be a top priority, and any application with this elevated level of access should be isolated from other databases on a separate instance to minimize the risk imposed.

Summary

Creating a secure SQL Server installation starts with pre-installation planning of the service accounts, to minimize the permissions the instance has to other resources within the network. After installing the instance, minimizing the surface area through proper firewall configuration, and leaving configuration options that are disabled by default turned off unless absolutely necessary, continues to build on the defense-in-depth concept for SQL Server. Auditing of the Local Administrators group in Windows and the sysadmin fixed server role should be a part of normal system security reviews to understand who has administrative access to the server or instance. Further information about SQL Server security can be found in Paul’s 2009 TechNet article Security: Common SQL Server Security Issues and Solutions.