SQL Server 2016 Distributed Replay Errors

If you’ve tried to install and configure Distributed Replay in SQL Server 2016, I wouldn’t be surprised to hear that you ran into all sorts of problems, never got it working in a multi-client setup, and eventually gave up. For whatever reason, Microsoft didn’t make the initial configuration of Distributed Replay in 2012, 2014, or 2016 very user friendly, and the error messages you get when something isn’t configured correctly are less than helpful. For example:

2017-05-31 10:05:25:211 Error DReplay   Unexpected error occurred!

Critical Error: code=[c8503012], msg=Unexpected error occurred!

Security violation with invalid remote caller.

Error Code: 0xC8502002

None of these errors helps pinpoint the cause of the problem. Every one I’ve run into so far has been security/permissions related, but that isn’t easy to figure out unless you already know a fair bit about Distributed Replay and how it SHOULD be configured, so that you can spot where the problems might be and try making changes.

Defaults After Installation

For my 2016 environment, I installed a Distributed Replay Controller and two separate Distributed Replay Clients, all configured to use the correct controller following the installer information in my 2012 post. The only difference, aside from server names, was that I didn’t set up domain service accounts and instead let the installer set up Service SIDs for the controller and client services. When I start the controller service on Windows Server 2016 and SQL Server 2016, I get the following in the log:

2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  Microsoft SQL Server Distributed Replay Controller – 13.0.1601.5.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  © Microsoft Corporation.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  All rights reserved.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Current edition is: [Enterprise Edition].
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  The number of maximum supported client is 16.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Windows service “Microsoft SQL Server Distributed Replay Controller” has started under service account “NT SERVICE\SQL Server Distributed Replay Controller”. Process ID is 6572.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Time Zone: Eastern Standard Time.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]

The Failed to get DmpClient error seems to be pretty common judging from Google search results, but it isn’t actually a problem. When I start the clients, I get the following in the logs:

2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 7008.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3

Notice that the last line DOES NOT say the client was registered with the controller. If it had successfully registered, the log would end with Registered with controller “SQL2K16-AG01”, but it doesn’t, so something isn’t allowing the client to register correctly with the controller. To prove this, if we attempt a replay operation using the controller, it will output the following:

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:14:24:467 Error DReplay   The client ‘SQL2K16-AG02’ is not a registered distributed replay client. Make sure that the SQL Server Distributed Replay Client services is running on ‘SQL2K16-AG02’, and that the client is registered with controller ‘localhost’.

So this confirms that out-of-the-box 2016 DRU won’t work and permissions changes will be required to make it work properly.
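The registration check described above can be automated instead of eyeballing the log. The following is a minimal sketch in Python, assuming the client log uses the same line format as the excerpts in this post; the function name is hypothetical and the log text is passed in as a string, so you would read the file from wherever your client writes its log.

```python
import re

# Matches the line the client writes once the controller has accepted it.
# The quotes may be straight or typographic depending on how the log was copied.
REGISTERED = re.compile(r'Registered with controller\s+["“]([^"”]+)["”]')


def registered_controller(log_text):
    """Return the controller name from the last registration line, or None."""
    matches = REGISTERED.findall(log_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    sample = (
        "2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3\n"
    )
    # None means the client started but never registered with a controller.
    print(registered_controller(sample))
```

A result of None for a client that has been running for a while is the same symptom shown in the log above: the service started, but the controller never accepted it.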

Configuring Component Services Permissions

On the Distributed Replay Controller machine, permissions need to be set in Component Services to allow the Distributed Replay Client Service accounts Launch and Activate permissions remotely on the COM component. The service accounts also need to be in the Distributed COM Users group in Windows. So in Component Services, expand Computers > My Computer > DCOM Config > DReplayController and right-click and open the Properties for the COM Component.

image

Edit the Launch and Activation Permissions and add the service account for the clients. In this case, because a Service SID is being used, that is the Active Directory computer account for each client machine. Allow Local Launch, Remote Launch, Local Activation and Remote Activation. Then edit the Access Permissions and allow Local Access and Remote Access for the same accounts.

imageimage

Now as I mentioned above, the service accounts also need to be in the Distributed COM Users group in Windows. So make sure that the service accounts have been added to that group, and restart the services on the controller and clients. Checking the Client log file should now show:

2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 6172.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3
2017-05-31 11:21:21:798 OPERATIONAL  [Client Service]      Registered with controller “SQL2K16-AG01”.
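The group membership step mentioned before this log can also be scripted rather than done through the Computer Management UI. This is only a sketch, assuming you run it from an elevated prompt on the controller machine; it wraps the standard Windows net localgroup command, and the function names and the account name you pass in are placeholders, not something taken from my environment.

```python
import subprocess

DCOM_GROUP = "Distributed COM Users"


def add_to_dcom_users_command(account):
    """Build the `net localgroup` argument list that adds one account to the group."""
    if not account or not account.strip():
        raise ValueError("account must be a non-empty name such as DOMAIN\\user")
    return ["net", "localgroup", DCOM_GROUP, account.strip(), "/add"]


def add_to_dcom_users(account):
    """Run the command; needs administrative rights on the controller machine."""
    return subprocess.run(add_to_dcom_users_command(account), check=True)
```

Building the argument list separately from running it makes it easy to print and review the exact command before anything is changed on the server.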

However, when we try a replay operation now, we get:

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:21:33:203 Error DReplay Unexpected error occurred!

Yep, that’s really helpful, so let’s go check the controller log and see what it has:

2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] **** Critical Error ****
2017-05-31 11:21:33:203 CRITICAL     [Controller Service]  Machine Name: SQL2K16-AG01
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] Error Code: 0xC8502002
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 1 second(s).

You can try to Google/Bing that error code; hopefully you already did and it brought you to this blog post. So let’s go back and check the client logs again, where we find these added messages:

2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Security violation with invalid remote caller.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller auth level is 2.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller impersonation level is 1.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller identity is SQLSKILLSDEMOS\SQL2K16-AG01$.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Controller account is NT SERVICE\SQL Server Distributed Replay Controller.
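The two identities reported in these messages are the interesting part: the caller arrives as the controller machine's computer account, while the client appears to expect the controller's service account. A small Python sketch, assuming the message wording shown above, can pull both values out of a client log and compare them; the function names are hypothetical.

```python
import re

# One pattern per field; each captures everything up to the sentence-ending period.
FIELDS = {
    "caller_identity": re.compile(r"Caller identity is (.+?)\.\s*$"),
    "controller_account": re.compile(r"Controller account is (.+?)\.\s*$"),
}


def security_violation_details(log_text):
    """Collect the caller identity and expected controller account from a client log."""
    details = {}
    for line in log_text.splitlines():
        for key, pattern in FIELDS.items():
            match = pattern.search(line)
            if match:
                details[key] = match.group(1)
    return details


def identities_match(details):
    """True only when both values were found and are the same account name."""
    caller = details.get("caller_identity")
    expected = details.get("controller_account")
    return caller is not None and expected is not None and caller.lower() == expected.lower()
```

For the excerpt above the comparison comes back False, which is consistent with the client rejecting the call as a security violation.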

So this points to another security issue, but I wasn’t sure how to troubleshoot it further while using a Service SID. At this point I changed from Service SIDs to Active Directory user accounts to run the services: DReplayClient for the clients and DReplayController for the controller. I reset all the permissions in Component Services on the controller machine, added the DReplayClient account to the Distributed COM Users group on the controller machine, and gave it another shot.

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:37:51:189 Info DReplay    Dispatching in progress.
2017-05-31 11:37:51:189 Info DReplay    0 events have been dispatched.
2017-05-31 11:37:58:892 Info DReplay    Dispatching has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events dispatched in total.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 0 second(s).
2017-05-31 11:37:58:892 Info DReplay    Event replay in progress.
2017-05-31 11:37:58:892 Info DReplay    Event replay has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events (100 %) have been replayed in total. Pass rate 0.00 %.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 9 second(s).
2017-05-31 11:37:58:892 Error DReplay   Unexpected error occurred!

Well, at least this time there was slightly more progress: it attempts to begin dispatching events, but ends miserably with another not-so-helpful error message. Looking at the controller log, the following information is output:

2017-05-31 11:35:56:969 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:35:59:048 OPERATIONAL  [Controller Service]  Event dispatch in progress.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 8 second(s).

Not much help there either, and this is where I would expect that most people would end up giving up because there is nothing really actionable here at all.

Client Service Account Permissions on Target SQL Server

While nothing is documented about changes in Distributed Replay behavior in SQL Server 2016, this last error is different from the behavior of previous versions of Distributed Replay. The problem is that the DReplayClient service account doesn’t have permission to connect to the target SQL Server. To prove this, here is a trace capture of User Error Message events from the last replay operation attempt:

image

Each of the Replay clients is attempting to connect to the target server and failing. If we add the DReplayClient login to the target SQL Server and retry the replay, everything checks out and it actually begins to dispatch the events for the replay operation:

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:45:14:376 Info DReplay    Dispatching in progress.
2017-05-31 11:45:14:376 Info DReplay    0 events have been dispatched.
2017-05-31 11:45:24:377 Info DReplay    30753 events have been dispatched.
2017-05-31 11:45:34:377 Info DReplay    68262 events have been dispatched.
2017-05-31 11:45:44:377 Info DReplay    106677 events have been dispatched.
2017-05-31 11:45:54:393 Info DReplay    144226 events have been dispatched.
2017-05-31 11:46:04:408 Info DReplay    183595 events have been dispatched.
2017-05-31 11:46:14:424 Info DReplay    221378 events have been dispatched.
2017-05-31 11:46:24:424 Info DReplay    257754 events have been dispatched.
2017-05-31 11:46:34:455 Info DReplay    298436 events have been dispatched.
2017-05-31 11:46:44:471 Info DReplay    336026 events have been dispatched.
2017-05-31 11:46:54:471 Info DReplay    373717 events have been dispatched.
2017-05-31 11:47:04:486 Info DReplay    410378 events have been dispatched.
2017-05-31 11:47:14:502 Info DReplay    449949 events have been dispatched.
2017-05-31 11:47:24:518 Info DReplay    486431 events have been dispatched.
2017-05-31 11:47:34:533 Info DReplay    526228 events have been dispatched.
2017-05-31 11:47:44:549 Info DReplay    563484 events have been dispatched.
2017-05-31 11:47:48:361 Info DReplay    Dispatching has completed.
2017-05-31 11:47:48:361 Info DReplay    573630 events dispatched in total.
2017-05-31 11:47:48:361 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 2 minute(s), 35 second(s).
2017-05-31 11:47:48:361 Info DReplay    Event replay in progress.
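Adding the client service account as a login on the target server comes down to one T-SQL statement. The sketch below, in Python, only generates that statement so it can be reviewed and run by a sysadmin; it assumes the DReplayClient account lives in the SQLSKILLSDEMOS domain seen in the client log, and the helper names are hypothetical. It creates the login only, so any database-level permissions the replayed workload needs still have to be granted separately.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_windows_login_sql(domain, account):
    """Return a CREATE LOGIN statement for a Windows (domain) account."""
    if not domain or not account:
        raise ValueError("both domain and account are required")
    return "CREATE LOGIN " + quote_name(domain + "\\" + account) + " FROM WINDOWS;"


if __name__ == "__main__":
    # Domain and account names are assumptions; substitute your own.
    print(create_windows_login_sql("SQLSKILLSDEMOS", "DReplayClient"))
```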

Conclusion

Permissions, permissions, permissions… Despite some of the worst error messages imaginable, the problems with getting Distributed Replay configured in SQL Server 2016 have so far boiled down to permissions. Ensuring that the service accounts have the correct permissions on the Controller machine and on the Target Server for the replay operation should resolve the issues. Don’t forget to configure firewall rules to allow the necessary network access, as described in the original 2012 DRU post I wrote a few years back. Hopefully this post will save someone the trouble of trying to figure this all out blindly.

New Article on SQLPerformance.com comparing “Observer Overhead” of Trace vs Extended Events

I have been so busy this week that I didn’t get a chance to blog about this yesterday when it happened, but I had a new article published on SQLPerformance.com that compares the performance impact or "observer overhead" of using SQL Trace and Extended Events.  I had a lot of fun running different benchmarks for this article and the results are very revealing about the overhead associated with diagnostic data collection against SQL Server under load. 

http://www.sqlperformance.com/2012/10/sql-trace/observer-overhead-trace-extended-events

I’ll be writing additional performance related articles on SQLPerformance.com in the next few months along with other members of SQLskills so make sure you add the RSS Feed to your favorite feed reader.

Performing a Distributed Replay with Multiple Clients using SQL Server 2012 Distributed Replay

In the first post in this blog series on using SQL Server 2012 Distributed Replay, Installing and Configuring SQL Server 2012 Distributed Replay, we looked at how to configure a Distributed Replay environment using multiple clients and a dedicated replay controller.  In this post we’ll actually make use of the previously configured servers to perform a distributed replay using a random workload that has been generated against the AdventureWorks2008R2 database installed on our Replay SQL Server.

Collecting the Replay Trace Data

For the purposes of generating a random workload against AdventureWorks2008R2, I created a workload generator that can be found on my blog post The AdventureWorks2008R2 Books Online Random Workload Generator.  I used this with 2 different PowerShell Windows from SQL2012-DRU1 and SQL2012-DRU2 to run a random workload across multiple sessions against the SQL2012-DB1 server.  To capture the trace data required for performing the replay, SQL Server Profiler was used along with the TSQL_Replay template to create the capture.

image

For production systems, the best way to capture a Replay Trace is to script the trace definition to a file and then create it as a server-side trace that writes to a trace file on the server’s local disks. This has a significantly lower impact than tracing directly from Profiler, which uses the rowset provider for Trace. With the replay trace running and the workload generating events, I waited for the trace to collect around 80000 rows of data and then shut down the trace so that I could copy the trace file from the SQL2012-DB1 server to the SQL2012-DRU server, where the Distributed Replay Controller is installed.

Preprocessing the Trace File(s)

At the point that I went to perform the preprocessing of the trace file for replay, I realized a difference in my environment using multiple servers to build this blog series versus my original setup using a single server for learning how to use Distributed Replay.  In order to preprocess the trace file for replay, you have to have the Management Tools Basic installed on the server that will be used for preprocessing the trace data.  If you have been following this blog series to learn how to use Distributed Replay, you will need to run Setup on the SQL2012-DRU server to add this feature before it can be used for pre-processing the trace file.  This is necessary to administer Distributed Replay.

image

Once the Management Tools Basic have been installed, the server will have to be restarted, and then it is possible to use the DReplay.Exe executable to administer the Distributed Replay components on the controller server. The DReplay executable has multiple options that can be discovered by using -? from the command line as follows:

C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn>dreplay -?
Info DReplay    Usage:
DReplay.exe {preprocess|replay|status|cancel} [options] [-?]}

Verbs:
preprocess Apply filters and prepare trace data for intermediate file on controller.
replay     Transfer the dispatch files to the clients, launch and synchronize replay.
status     Query and display the current status of the controller.
cancel     Cancel the current operation on the controller.
-?         Display the command syntax summary.

Options:
dreplay preprocess [-m controller] -i input_trace_file -d controller_working_dir [-c config_file] [-f status_interval]
dreplay replay [-m controller] -d controller_working_dir [-o] [-s target_server] -w clients [-c config_file] [-f status_interval]
dreplay status [-m controller] [-f status_interval]
dreplay cancel [-m controller] [-q]
Run dreplay <verb> -? for detailed help on each verb.

To perform the preprocessing, you will need to complete a couple of steps. First, set any options you want for the preprocessing by editing the DReplay.Exe.Preprocess.config file in the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path on the server. There are two configuration files for DReplay.Exe as highlighted below. At this time make sure that you are only editing the Preprocess.config file.

image

The DReplay.Exe.Preprocess.config file contains a schema-defined XML document that controls the configuration of the preprocessing. In general, the preprocessing options should not need to be changed, but if you want to include system sessions as part of the replay, you can change the options in the XML, which is listed below.

<?xml version="1.0" encoding="utf-8"?>
<Options>
    <PreprocessModifiers>
        <IncSystemSession>No</IncSystemSession>
        <MaxIdleTime>-1</MaxIdleTime>
    </PreprocessModifiers>
</Options>
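If you would rather script the change than edit the file by hand, the IncSystemSession value can be flipped with Python's standard XML library. This is a minimal sketch that works on the XML text, assuming the file has the same structure as the listing above with no XML namespace; the function name is hypothetical, and note that the XML declaration is not preserved in the returned string.

```python
import xml.etree.ElementTree as ET


def set_include_system_sessions(config_xml, include):
    """Return the config XML with IncSystemSession set to Yes or No."""
    root = ET.fromstring(config_xml)
    node = root.find("./PreprocessModifiers/IncSystemSession")
    if node is None:
        raise ValueError("IncSystemSession element not found")
    node.text = "Yes" if include else "No"
    # Serialize back to text; other elements such as MaxIdleTime are left untouched.
    return ET.tostring(root, encoding="unicode")
```

Keeping a copy of the original file before writing the result back is a sensible precaution, since the tool validates the file against its .xsd schema.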

To preprocess the trace data, open a new command prompt window and change directories to the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path.  The trace file has been copied onto the SQL2012-DRU server as C:\DReplay\SQL2012_ReplayTrace.trc.  To preprocess this file first start the “SQL Server Distributed Replay Controller” service by using NET START:

NET START "SQL Server Distributed Replay Controller"

Then execute the following command from within the Binn path to actually preprocess the trace file and output:

dreplay preprocess -i "C:\DReplay\SQL2012_ReplayTrace.trc" -d "C:\DReplay"

This will process the trace file and output the working files for performing the Distributed Replay to the C:\DReplay path.  Below is a screenshot of the full window for preprocessing the trace file.

image

Note: The dreplay executable can be called from any path on the server because the Binn path is part of the PATH environment variable. However, the executable has to be called from within the Binn folder to access the necessary .config files and .xsd schema files for the configuration. If you want to run the executable from another location on the server, you will need to copy the .config and .xsd files out of the Binn folder into the folder you want to run dreplay from.
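One way to respect the working-folder requirement from a script is to launch dreplay with the Binn folder as the process working directory. The sketch below is written under that assumption in Python; it uses only the preprocess switches listed in the usage output earlier (-i, -d, -c, -f), and the function names are hypothetical.

```python
import subprocess

# Binn path as given in this post for SQL Server 2012; adjust for other versions.
BINN = r"C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn"


def preprocess_args(trace_file, working_dir, config_file=None, status_interval=None):
    """Build the argument list for `dreplay preprocess`."""
    args = ["dreplay", "preprocess", "-i", trace_file, "-d", working_dir]
    if config_file:
        args += ["-c", config_file]
    if status_interval is not None:
        args += ["-f", str(status_interval)]
    return args


def run_preprocess(trace_file, working_dir, binn=BINN, **options):
    """Run the preprocess step with the Binn folder as the working directory."""
    args = preprocess_args(trace_file, working_dir, **options)
    return subprocess.run(args, cwd=binn, check=True)
```

Calling run_preprocess with the trace file and working folder used in this post would issue the same command shown above, just with the working directory set for you.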

Performing the Replay

The first step in performing the replay is to start the “SQL Server Distributed Replay Client” service on each of the replay clients using NET START.

NET START "SQL Server Distributed Replay Client"

You will want to verify that each of the clients was able to successfully connect to the controller in the logs as shown in the previous post in this series.  Once this has been done, your environment is almost ready for replay.  For the purposes of this blog series, a SELECT only workload has been generated for replay against AdventureWorks2008R2.  However, in most environments you won’t have a SELECT only workload, so you will have to plan for and prepare your replay environment using a BACKUP/RESTORE of the production database from a point within the captured workload so that the database can be replayed against without having problems associated with Primary Key constraint violations during the replay.

If you want to change any of the parameters associated with the replay operation, you can edit the DReplay.Exe.Replay.config file in the C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn path.  The default contents of the configuration file are shown below:

<?xml version="1.0" encoding="utf-8"?>
<Options>
    <ReplayOptions>
        <Server></Server>
        <SequencingMode>stress</SequencingMode>
        <ConnectTimeScale>100</ConnectTimeScale>
        <ThinkTimeScale>100</ThinkTimeScale>
        <HealthmonInterval>60</HealthmonInterval>
        <QueryTimeout>3600</QueryTimeout>
        <ThreadsPerClient>255</ThreadsPerClient>
        <EnableConnectionPooling>No</EnableConnectionPooling>
        <StressScaleGranularity>SPID</StressScaleGranularity>
    </ReplayOptions>
    <OutputOptions>
        <ResultTrace>
            <RecordRowCount>Yes</RecordRowCount>
            <RecordResultSet>No</RecordResultSet>
        </ResultTrace>
    </OutputOptions>
</Options>
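Before starting a replay it can be useful to see which replay options in a config file differ from the defaults listed above. The following Python sketch does that comparison, assuming the file has the same structure as the listing with no XML namespace; the DEFAULTS table is copied from the listing and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Default ReplayOptions values, copied from the listing above.
DEFAULTS = {
    "Server": "",
    "SequencingMode": "stress",
    "ConnectTimeScale": "100",
    "ThinkTimeScale": "100",
    "HealthmonInterval": "60",
    "QueryTimeout": "3600",
    "ThreadsPerClient": "255",
    "EnableConnectionPooling": "No",
    "StressScaleGranularity": "SPID",
}


def changed_options(config_xml):
    """Return the ReplayOptions entries whose values differ from DEFAULTS."""
    root = ET.fromstring(config_xml)
    section = root.find("ReplayOptions")
    if section is None:
        raise ValueError("ReplayOptions element not found")
    changed = {}
    for child in section:
        value = (child.text or "").strip()
        if DEFAULTS.get(child.tag) != value:
            changed[child.tag] = value
    return changed
```

The comparison is a plain case-sensitive string match, so a value written as Stress instead of stress would be reported as a change.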

Before performing the actual replay, make sure that the account being used to run the SQL Server Distributed Replay Client service has been granted appropriate access to the target SQL Server and database to be able to perform the replay operations. Once this has been done, the replay can be performed using the command line options for DReplay.Exe by providing the appropriate switches, or you can alternatively provide the -c command line switch to specify the configuration file that should be used for performing the replay. If you change any of the default values listed above in the DReplay.Exe.Replay.config file, you will need to specify the -c command line switch for those to take effect. To perform a replay with the defaults, the following command line execution can be run:

dreplay replay -s "SQL2012-DB1" -d "C:\DReplay" -w "SQL2012-DRU1, SQL2012-DRU2"

Once this is executed, the Distributed Replay Controller will read in the preprocessed replay file and then synchronize the replay across all of the clients specified with the -w command line parameter. While the replay operation occurs, the command window for the controller will output periodic updates about the current status of the replay process.

image

The frequency of the status updates can be controlled using the -f command line switch to specify the number of seconds between each of the updates. Each status update provides information about each of the clients, including the total number of events that have been replayed, the success rate of the replay operations per client, and an estimate of the total amount of time remaining to complete the replay operation. When the replay completes, the total elapsed time and pass rate for the events are output.

image

In the next and final post in this series, we’ll look at some of the common problems with using Distributed Replay and how to resolve them, including manually configuring the Controller and adding additional Client Service accounts to the environment after Setup has been completed.