SQL Server 2016 Distributed Replay Errors

If you’ve tried to install and configure Distributed Replay in SQL Server 2016, I wouldn’t be surprised to hear that you ran into all sorts of problems and probably didn’t end up getting it to work in a multi-client setup and eventually gave up. For whatever reason, Microsoft didn’t make the initial configuration of Distributed Replay in 2012, 2014 or 2016 very user friendly, and the error messages that you get when something isn’t correctly configured are less than helpful.  For example:

2017-05-31 10:05:25:211 Error DReplay   Unexpected error occurred!

Critical Error: code=[c8503012], msg=Unexpected error occurred!

Security violation with invalid remote caller.

Error Code: 0xC8502002

None of these errors help to pinpoint the cause of the problems, all of which are security/permissions related from what I’ve run into so far, but it’s not easy to figure it out unless you already know a fair bit about Distributed Replay and how it SHOULD be configured so you can spot where problems might be and try making changes.

Defaults After Installation

For my 2016 environment, I installed a Distributed Replay Controller and two separate Distributed Replay Clients, all of which used Service SIDs and were configured to use the correct controller following the installer information in my 2012 post.  The only difference, aside from server names, was that I didn’t setup domain service accounts and let the installer setup Service SIDs for the controller and client services.  When I start the controller service I get the following in the log using Windows Server 2016 and SQL Server 2016:

2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  Microsoft SQL Server Distributed Replay Controller – 13.0.1601.5.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  © Microsoft Corporation.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  All rights reserved.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Current edition is: [Enterprise Edition].
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  The number of maximum supported client is 16.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Windows service “Microsoft SQL Server Distributed Replay Controller” has started under service account “NT SERVICE\SQL Server Distributed Replay Controller”. Process ID is 6572.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Time Zone: Eastern Standard Time.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]

The Failed to get DmpClient error seems to be pretty common from Google search results, but isn’t actually a problem.  So if I start the clients, I get the following in the logs:

2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 7008.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3

Notice that the last line DOES NOT say it was registered with the controller.  It should say Registered with controller “SQL2K16-AG01” if it had successfully registered, but it doesn’t so something isn’t allowing the client to register correctly with the controller.  To prove this, if we attempt a replay operation using the controller, it will output the following:

image

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w “SQL2K16-AG02, SQL2K16-AG03” -f 10 -d “C:\DRUDemo\ReplayFiles” -o -c “c:\DRUDemo\DReplay.Exe.Replay.config”

2017-05-31 11:14:24:467 Error DReplay   The client ‘SQL2K16-AG02’ is not a registered distributed replay client. Make sure that the SQL Server Distributed Replay Client services is running on ‘SQL2K16-AG02’, and that the client is registered with controller ‘localhost’.

So this confirms that out-of-the-box 2016 DRU won’t work and permissions changes will be required to make it work properly.

Configuring Component Services Permissions

On the Distributed Replay Controller machine, permissions need to be set in Component Services to allow the Distributed Replay Client Service accounts Launch and Activate permissions remotely on the COM component. The service accounts also need to be in the Distributed COM Users group in Windows. So in Component Services, expand Computers > My Computer > DCOM Config > DReplayController and right-click and open the Properties for the COM Component.

image

Edit the Launch and Activation Permissions and add the Service Account for the clients, in this case because a Service SID is being used, the computer account from Active Directory for each client machine, and allow Local Launch, Remote Launch, Local Activation and Remote Activation.  Then edit the Access permissions and set Local Access and Remote Access for the Service accounts again.

imageimage

Now as I mentioned above, the service accounts also need to be in the Distributed COM Users group in Windows. So make sure that the service accounts have been added to that group, and restart the services on the controller and clients. Checking the Client log file should now show:

2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 6172.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3
2017-05-31 11:21:21:798 OPERATIONAL  [Client Service]      Registered with controller “SQL2K16-AG01”.

However, when we try a replay operation now, we get:

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w “SQL2K16-AG02, SQL2K16-AG03” -f 10 -d “C:\DRUDemo\ReplayFiles” -o -c “c:\DRUDemo\DReplay.Exe.Replay.config”

2017-05-31 11:21:33:203 Error DReplay Unexpected error occurred!

Yep that’s really helpful, so lets go check the controller log and see what it has:

2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] **** Critical Error ****
2017-05-31 11:21:33:203 CRITICAL     [Controller Service]  Machine Name: SQL2K16-AG01
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] Error Code: 0xC8502002
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 1 second(s).

You can try and Google/Bing that error code, hopefully you already did and it brought you to this blog post.  So lets go back and check the client logs again, and we find these added messages:

2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Security violation with invalid remote caller.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller auth level is 2.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller impersonation level is 1.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller identity is SQLSKILLSDEMOS\SQL2K16-AG01$.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Controller account is NT SERVICE\SQL Server Distributed Replay Controller.

So this points to another security issue, but I wasn’t sure how to go about troubleshooting this further using s Service SID, so at this point I changed from Service SIDs to Active Directory User accounts to run the services, DReplayClient for the clients and DReplayController for the controller.  I reset all the permissions in Component Services on the controller machine and assigned the DReplayClient account to the Distributed COM Users group on the controller machine and gave it another shot.

image

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w “SQL2K16-AG02, SQL2K16-AG03” -f 10 -d “C:\DRUDemo\ReplayFiles” -o -c “c:\DRUDemo\DReplay.Exe.Replay.config”

2017-05-31 11:37:51:189 Info DReplay    Dispatching in progress.
2017-05-31 11:37:51:189 Info DReplay    0 events have been dispatched.
2017-05-31 11:37:58:892 Info DReplay    Dispatching has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events dispatched in total.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 0 second(s).
2017-05-31 11:37:58:892 Info DReplay    Event replay in progress.
2017-05-31 11:37:58:892 Info DReplay    Event replay has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events (100 %) have been replayed in total. Pass rate 0.00 %.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 9 second(s).
2017-05-31 11:37:58:892 Error DReplay   Unexpected error occurred!

Well at least this time there was slightly more progress, it attempts to begin dispatching events, but ends miserably with another not so helpful error message.  Looking at the replay client logs, the following information is output:

2017-05-31 11:35:56:969 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:35:59:048 OPERATIONAL  [Controller Service]  Event dispatch in progress.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 8 second(s).

Not much help there either, and this is where I would expect that most people would end up giving up because there is nothing really actionable here at all.

Client Service Account Permissions on Target SQL Server

While nothing is documented about changes in Distributed Replay behavior in SQL Server 2016, this last error is different than the behavior of previous versions of Distributed Replay.  The problem is that the DReplayClient service account doesn’t have permissions in the target SQL Server to connect.  To prove this, here is a trace capture of User Error Message events from the last replay operation attempt:

image

Each of the Replay clients is attempting to connect to the target server and failing.  If we add the DReplayClient login to the target SQL Server and retry the reply, everything checks out and it actually begins to dispatch the events for the replay operation:

image

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w “SQL2K16-AG02, SQL2K16-AG03” -f 10 -d “C:\DRUDemo\ReplayFiles” -o -c “c:\DRUDemo\DReplay.Exe.Replay.config”

2017-05-31 11:45:14:376 Info DReplay    Dispatching in progress.
2017-05-31 11:45:14:376 Info DReplay    0 events have been dispatched.
2017-05-31 11:45:24:377 Info DReplay    30753 events have been dispatched.
2017-05-31 11:45:34:377 Info DReplay    68262 events have been dispatched.
2017-05-31 11:45:44:377 Info DReplay    106677 events have been dispatched.
2017-05-31 11:45:54:393 Info DReplay    144226 events have been dispatched.
2017-05-31 11:46:04:408 Info DReplay    183595 events have been dispatched.
2017-05-31 11:46:14:424 Info DReplay    221378 events have been dispatched.
2017-05-31 11:46:24:424 Info DReplay    257754 events have been dispatched.
2017-05-31 11:46:34:455 Info DReplay    298436 events have been dispatched.
2017-05-31 11:46:44:471 Info DReplay    336026 events have been dispatched.
2017-05-31 11:46:54:471 Info DReplay    373717 events have been dispatched.
2017-05-31 11:47:04:486 Info DReplay    410378 events have been dispatched.
2017-05-31 11:47:14:502 Info DReplay    449949 events have been dispatched.
2017-05-31 11:47:24:518 Info DReplay    486431 events have been dispatched.
2017-05-31 11:47:34:533 Info DReplay    526228 events have been dispatched.
2017-05-31 11:47:44:549 Info DReplay    563484 events have been dispatched.
2017-05-31 11:47:48:361 Info DReplay    Dispatching has completed.
2017-05-31 11:47:48:361 Info DReplay    573630 events dispatched in total.
2017-05-31 11:47:48:361 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 2 minute(s), 35 second(s).
2017-05-31 11:47:48:361 Info DReplay    Event replay in progress.

Conclusion

Permissions, permissions, permissions…  While having some of the worst error messages imaginable, the problems with getting Distributed Replay in SQL Server 2016 configured have so far boiled down to permissions issues and ensuring that the permissions for the service accounts are set correctly on the Controller machine, and on the Target Server for the replay operation should resolve the issues.  Don’t forget to configure Firewall rules to allow access to the network appropriately as described in the original 2012 DRU post I wrote a few years back.   Hopefully this post will save someone the trouble of trying to figure this all out blindly.

Using Group Managed Service Accounts for SQL Server

This afternoon I ran into an issue after rebooting one of my lab environment VM’s for SQL Server 2016 to give the VM more memory to allow some tests I’m doing with in-memory OLTP.  When the VM rebooted I tried to connect to SQL Server with SSMS and the connection failed.  So I opened up SQL Server Configuration Manager and found that the instance of SQL Server didn’t start when Windows started, so I tried to manually start the instance and received the following error:

image

When I opened up the SQL Server ERRORLOG file, I found the following errors:

2017-05-30 17:08:17.65 spid11s     Error: 17190, Severity: 16, State: 1.
2017-05-30 17:08:17.65 spid11s     Initializing the FallBack certificate failed with error code: 1, state: 20, error number: 0.
2017-05-30 17:08:17.66 spid11s     Unable to initialize SSL encryption because a valid certificate could not be found, and it is not possible to create a self-signed certificate.
2017-05-30 17:08:17.66 spid11s     Error: 17182, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     TDSSNIClient initialization failed with error 0x80092004, status code 0x80. Reason: Unable to initialize SSL support. Cannot find object or property.
2017-05-30 17:08:17.66 spid11s     Error: 17182, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     TDSSNIClient initialization failed with error 0x80092004, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Cannot find object or property.
2017-05-30 17:08:17.66 spid11s     Error: 17826, Severity: 18, State: 3.
2017-05-30 17:08:17.66 spid11s     Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2017-05-30 17:08:17.66 spid11s     Error: 17120, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     SQL Server could not spawn FRunCommunicationsManager thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.

 

This is when I remembered that I had set this lab VM up to use a Group Managed Service Account (gMSA) for SQL Server, and my Active Directory Domain Controller for the Lab environment wasn’t running.  The instance couldn’t start because it couldn’t talk to the domain controller to obtain the credentials for the gMSA which as the service account for SQL Server is at the top of the Encryption Hierarchy for the instance.  So a lesson learned on lab environment VM’s that use managed service accounts, you have to have the Active Directory Domain controller running or the instance won’t start because it can’t retrieve the credential information for the service account to run the instance.  This can also happen in production environments where a domain controller might not be local to a SQL Server installation and networking issues prevent the SQL instance from being able to retrieve the credentials to start the server.

The Accidental DBA (Day 16 of 30): General Security

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

SQL Server offers multiple layers of security configuration that allow for a defense-in-depth approach to securing business critical data. A strong security implementation starts with the SQL Server installation and service account configuration, and continues into the surface area configuration for SQL Server by minimizing the options that are turned on that may increase the risk of threat from an attack to the instance, and finally securing individual logins and database level user permissions following the principal of least-privilege required to accomplish a task.

Installation and Service Accounts

A secure installation starts with properly selecting service accounts that the individual services will run under. Ideally, if the server is participating in an Active Directory domain, these will be domain accounts that are isolated per service and the accounts shouldn’t have additional permissions added to them except where necessary for the activity required. The service accounts should not be members of the Domain Admins, or the local server Administrators groups under any circumstances.

For most environments, the service account should be created as a low privilege user, and then during SQL Server setup, the required permissions will be added on the local server to allow SQL Server to use the account to run the services. The only additional permissions that should be added in most cases are the Perform Volume Maintenance Tasks and Lock Pages in Memory user rights assignements if Instant File Intialization and locked pages are going to be used by the instance, and then any specific file folder ACLs to allow the instance to access additional paths for data, transaction log, or writing backups to local disk locations. Additionally if network shares are used for backups or for file transfers, eg SSIS packages reading files for data loading, specific permissions for those locations can be added for the account. Permissions on data, transaction log, and backup file locations should be restricted to the service account and administrator users only whenever possible to minimize the accessibility of these files to other users within the domain.

Surface Area Configuration

SQL Server 2008/R2 and 2012 by default take great strides at minimizing the surface area for attack through SQL Server during setup. This low surface area configuration was first introduced in SQL Server 2005 and has been improved with each release to minimize the risks associated with a default installation of SQL Server. Features such as xp_cmdshell, OLE Automation, and SQLCLR are disabled by default to minimize the potential security risks that could occur when these features are enabled. Before enabling any of these features, consider the implications that they might have to the overall security of the server. If the server will host third-party application databases that require these features, there may be little that you can do to prevent enabling these features, but it is always worthwhile to question the need for enabling these to ensure that you’ve properly documented the reasons behind having these features enabled.

As a part of the instance surface area, the Windows Firewall should be configured to allow TCP access to the ports necessary for SQL Server connectivity.  For default instances this usually means enabling TCP ports 1433 and 1434 to allow user connections and connections to the Dedicated Administrative Connection.  For named instances, the default configuration of the instance is to use dynamic ports.  However, this should be changed for each named instance to fix it’s TCP listener port to allow the appropriate Windows Firewall rules to be created.  While it may be easier to disable the firewall, this does not minimize the attack surface for the server, and specific rules should be created instead of disabling the firewall entirely.

Controlling Server Access

An important consideration to overall security is controlling physical access to the server hardware as well as controlling access for who can actually login to the server OS. Physical security is incredibly important and should be thought out as a part of overall infrastructure security. Generally SQL Servers are installed in the corporate data center, but it is not uncommon for servers to also exist in smaller shops in unsecured locations, for example under the administrators desk or in other generally available locations within an office space. This can pose a significant risk for loss of data in the event the server is physically stolen or for downtimes if someone indavertently unplugs the server from power or the network. Servers that contain private information or data should always be locked inside a server closet or secured inside of a datacenter that has limited access to a restricted set of employees only.

Windows access to SQL Servers should equally be restricted to only administrative users that have a need to manage SQL Server or the OS that is running. One important item to remember is that any local administrator on the server has the ability to access the database and log files, backup files, and if they are so inclined gain access to SQL Server using easy to Google/Bing options that would add their account to the sysadmin fixed server role within SQL Server, even if they are not currently able to access SQL Server. The local administrators on a SQL Server should be limited to the Domain Admins group, which should contain a minimum number of users requiring that level of permissions, and, where appropriate for security controls, the database administrators only, other users should not have direct access to the server OS for a SQL Server, and the list of users that have access should be reviewed regularly to ensure that only those users requiring access actually have access to the server.

Controlling SQL Server Permissions

Permissions within the SQL Server instance can be granted at multiple levels from the server instance level to individual databases and even objects with a database. In general server level permissions should be restricted to administrators that need to manage the SQL Server instance. Most server level permissions are granted through the use of one of the many fixed server roles that are a part of the default installation of SQL Server. The sysadmin, serveradmin, and securityadmin roles should be minimized to use only be actual administrator requiring permissions to manage SQL Server. Just like the Local Administrators group in Windows, these roles provide elevated permissions to control the instance, make configuration changes, create additional logins and manage security within the instance.  Application accounts should not be a member of these roles as a matter of least privilege, specifically the sysadmin role. However, certain applications like SharePoint require the securityadmin fixed server role and there is nothing you can do as an administrator to prevent this. The sysadmin role should be audited routinely like the Local Administrators group in Windows to track the logins that have administrative control over the instance.

Application logins should continue the principal of least privilege by only having access to the databases(s) and objects required by them. While permissions to individual objects should be enforced as a general security best practice, the reality of most third-party applications is that the application login will need to be the dbo or a member of the db_owner database role for the databases that are used by the application.  In cases like this, there will also be little that can be done as an administrator to change the security requirements for the application.  However, in cases where a third party application requires elevated permissions to the sysadmin fixed server role, understanding the requirements for this high level role should be a top priority, and any application with this elevated level of access should be isolated from other databases on separate instances to minimize the risk imposed.

Summary

Creating a secure SQL Server installation starts with pre-installation planning of the service accounts, to minimize the permissions the instance has to other resources within the network. After installing the instance, minimizing the surface area through proper firewall configuration and leaving configuration options that are disabled by default disabled unless absolutely necessary continues to build on the defense in depth concept for SQL Server. Auditing of the Local Administrators group in Windows and the sysadmin fixed server role should be a part of normal system security reviews to understand who has administrative access to the server or instance. Further information about SQL Server security can be found in Paul’s 2009 TechNet article Security: Common SQL Server Security Issues  and Solutions.