Downgrading SQL Server Editions

At some point in your career working with SQL Server, you will run into a situation where the wrong edition of SQL Server has been installed on a server and you will need to change the edition for licensing reasons.  Whether it is Enterprise Edition where Standard Edition should have been installed, Enterprise Edition where Developer Edition should have been used, or my favorite, Evaluation Edition where the 180-day trial has expired and Enterprise Edition isn't going to be used, the only route available for downgrading the edition is to uninstall and reinstall SQL Server entirely.  SQL Server Setup makes upgrading editions a piece of cake, with SKUUPGRADE as a command line option for going from Standard/Developer/Evaluation to Enterprise, but anything else requires a full uninstall and reinstall to change the SKU/Edition, followed by restoring all of the system and user databases to the new instance, which typically means a lot of work.  I hate having to restore system databases and avoid doing it where possible, so here is how I handle this process and minimize the work required.
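As an aside, the supported upward path is just a Setup invocation; here is a minimal sketch of the command line (the product key is a placeholder, and exact options can vary by version, so verify against the Setup documentation for your release):

```batch
REM Supported upward edition upgrade (e.g. Evaluation/Standard/Developer to Enterprise).
REM Run from the SQL Server installation media folder; replace the placeholders.
setup.exe /Q /ACTION=EditionUpgrade /INSTANCENAME=MSSQLSERVER /PID=<new edition product key> /IACCEPTSQLSERVERLICENSETERMS
```

Everything that follows is for the opposite direction, where no such shortcut exists.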

No matter what, you are going to have to uninstall and reinstall the SQL Server instance to downgrade the SKU.  However, you can save yourself some time, and the headache of trying to restore the system databases, if you are careful about what you do.  I have done plenty of SKU downgrades in the past, and the easiest way to do it (I am not saying this is the Microsoft-supported way, but it works if done correctly) is to:

  1. Take a good backup of all of your databases (system and user).  
  2. Run SELECT @@VERSION and note the specific build number of SQL Server that you are currently on.
  3. Shut down your existing instance of SQL Server.  
  4. Copy the master, model, and msdb database files (both mdf and ldf) from the current location to a new folder that you mark as read-only.  Copy them, don't move them. 
  5. Uninstall SQL Server from the system.
  6. Reboot the server.
  7. Install SQL Server Standard Edition.
  8. Apply the necessary Service Pack and/or Cumulative Updates to bring the instance up to your previous build number.
  9. Shut down SQL Server.
  10. Copy the master, model, and msdb database files (both mdf and ldf) from the folder you saved them in to the correct location for the new install, remove the read-only flag from the files, and change the file ACLs so that the SQL Server service account has Full Control over the files.
  11. Start up SQL Server.  If you did everything correctly, it will start up exactly where you were before you made any changes, with all of your user databases online, and you should be ready to let applications connect and resume operations.
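As a sanity check, the edition and build can be confirmed with SERVERPROPERTY before the uninstall (alongside step 2) and again after patching in step 8; the build numbers must match exactly before you swap the old system database files back in:

```sql
-- Record these values before the uninstall and compare them after the
-- reinstall and patching; ProductVersion must match exactly.
SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS BuildNumber,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel;
```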

If you screw something up in the above, you still have your backups: you can run Setup to rebuild the system databases and then follow the Microsoft-supported path of restoring the system databases and then the user databases to bring the instance online.  Essentially the file copy is no different than what would occur through attach/detach; you are just doing it with system databases, which is not explicitly supported, but it does work.  The key is to have your backups from before you do anything so you have the supported route available if you encounter an issue.  The only issue I have ever had doing this set of steps is that I didn't set the file ACLs correctly, and the database engine threw Access Denied errors and failed to start until I fixed the ACLs.  This can save you many hours of frustration and downtime trying to restore everything, since the database files are already there and it is just a few small copy operations to put them where they need to be.
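For reference, the supported rebuild of the system databases mentioned above is also a Setup invocation; a sketch (instance name, sysadmin account, and sa password are placeholders for your environment):

```batch
REM Rebuilds master, model, and msdb for the instance.
REM Run from the SQL Server installation media folder; replace the placeholders.
setup.exe /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=MSSQLSERVER /SQLSYSADMINACCOUNTS=<domain\group> /SAPWD=<strong password>
```

After the rebuild completes you would restore master, model, and msdb from the backups taken in step 1, and then the user databases.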

SQL 101: Parallelism Inhibitors – Scalar User Defined Functions

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

At the spring SQLintersection conference in Orlando, one of the attendees asked me a question about why a query wouldn’t go parallel even though the cost for the plan was in the hundreds.  There are actually a number of different reasons why a query might execute serially, but usually one of the first things that comes to mind is scalar user defined functions. Developers love to use scalar user defined functions inside of SQL Server because it lets them compartmentalize code and easily reuse it, which is a common goal in object-oriented programming, but it’s also a performance anti-pattern for SQL Server as I’ll demonstrate in this post.

For the purposes of this post, I'm using the WideWorldImporters sample database, and I'm going to recreate a scalar user defined function that I recently saw in a client engagement, which was used to format the output of a column strictly for client presentation purposes.  Using the [Warehouse].[ColdRoomTemperatures_Archive] table, which stores temperature readings at 5-second intervals, as the basis for my example report, we are going to calculate the delta temperature between samples. If the delta is greater than zero, we want it to show with a + sign before the value, and if it is below zero, a - sign. An example query to provide the basic data set for the report would be:

USE [WideWorldImporters]
GO
SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    b.Temperature - a.Temperature AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom;

For the sake of argument, the scenario I am using is a presentation layer issue, and the argument can be made that the formatting should be handled by the presentation/application tier.  However, there could be scenarios where the same data is needed in more than one place (applications, reports, API feeds, etc.), so for consistency the formatting is determined to be required in the SQL output.  This is where a developer might write a function to handle the formatting, so the code can be reused any time we need to output temperature deltas and everything does it exactly the same way.

IF OBJECT_ID('dbo.GetFormatedTemperatureDelta') IS NOT NULL
    DROP FUNCTION dbo.GetFormatedTemperatureDelta;
GO
CREATE FUNCTION dbo.GetFormatedTemperatureDelta
(@StartingTemperature decimal(10,2), @EndingTemperature decimal(10,2))
RETURNS VARCHAR(10)
AS 
BEGIN
    DECLARE @Result VARCHAR(10);
    SET @Result =    CASE 
                        WHEN @StartingTemperature-@EndingTemperature > 0 
                            THEN '+'+CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        WHEN @StartingTemperature-@EndingTemperature < 0 
                            THEN CAST(@StartingTemperature-@EndingTemperature AS VARCHAR)
                        ELSE '0.00' 
                    END;    
    RETURN(@Result);
END
GO

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    dbo.GetFormatedTemperatureDelta(b.Temperature, a.Temperature) AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom;

As soon as they apply this formatting to the reporting query using the function, performance may or may not be noticeably impacted. Using the above two queries against the current data being requested for the report, the STATISTICS TIME output for both shows the difference:

Without the function:

SQL Server Execution Times:
CPU time = 1155 ms,  elapsed time = 643 ms.

With the function:

SQL Server Execution Times:
CPU time = 1250 ms,  elapsed time = 1680 ms.

However, notice the difference in duration between the original statement and the statement using the user defined function: the function version took over 2.5 times as long.  This problem gets worse as the size of the data set gets larger, but what we don't see here is the impact this had on the way the query executes.  Comparing the execution plans before and after the function was used:

[Image: Plan Without Function]

[Image: Plan With Function]

Before using the function, we had a parallel execution plan; after, the plan is serial.  If we look at what happens for each statement's execution with Extended Events, using the sqlserver.sql_statement_completed and sqlserver.module_end events, filtered to my specific session_id and tracking how events relate to each other with TRACK_CAUSALITY=ON for the session, we'll find that the function executes for every row returned by the query, turning our set-based operation into a RBAR (row-by-agonizing-row) operation.

CREATE EVENT SESSION [TrackFunctions] ON SERVER 
ADD EVENT sqlserver.module_end(
    WHERE ([sqlserver].[session_id]=(85))),
ADD EVENT sqlserver.sql_statement_completed(
    WHERE ([sqlserver].[session_id]=(85)))
WITH (TRACK_CAUSALITY=ON)
GO

[Image: Extended Event Session Results (grouped by activity_id)]

We could change the function to do nothing at all and the impact would be exactly the same.  A query with a scalar user defined function will always run serially and row by row, with one exception: scalar user defined functions created with SQLCLR that DO NOT perform data access can still leverage parallelism.  However, for logic this simple, SQLCLR wouldn't be required or recommended just to gain parallel query execution back. The easier fix is to simply inline the code in the original query:

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    CASE WHEN b.Temperature - a.Temperature > 0 
             THEN '+' + CAST(b.Temperature - a.Temperature AS VARCHAR)
         WHEN b.Temperature - a.Temperature < 0 
             THEN CAST(b.Temperature - a.Temperature AS VARCHAR)
         ELSE '0.00' 
    END AS Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom
GO

SQL Server Execution Times:
CPU time = 1233 ms,  elapsed time = 676 ms.

[Image: Plan with inline expression]

Here we have the fast execution time with a parallel plan, but we are sacrificing the reuse of the code in other places where we might want to encapsulate the same logic.  However, that doesn't have to be the case: since this logic can be written inline in the query, it can also be written as an inline table-valued function.

IF OBJECT_ID('dbo.GetFormatedTemperatureDelta_tvf') IS NOT NULL
    DROP FUNCTION dbo.GetFormatedTemperatureDelta_tvf;
GO
CREATE FUNCTION dbo.GetFormatedTemperatureDelta_tvf
(@StartingTemperature decimal(10,2), @EndingTemperature decimal(10,2))
RETURNS TABLE
AS 
RETURN (SELECT Delta = CASE 
                        WHEN @StartingTemperature - @EndingTemperature > 0 
                            THEN '+' + CAST(@StartingTemperature - @EndingTemperature AS VARCHAR)
                        WHEN @StartingTemperature - @EndingTemperature < 0 
                            THEN CAST(@StartingTemperature - @EndingTemperature AS VARCHAR)
                        ELSE '0.00' 
                       END);
GO

SELECT 
    a.ColdRoomSensorNumber,
    a.Temperature AS StartingTemp, 
    b.Temperature AS EndingTemp, 
    a.ValidFrom, 
    a.ValidTo, 
    Delta 
FROM Warehouse.ColdRoomTemperatures_Archive AS a
INNER JOIN Warehouse.ColdRoomTemperatures_Archive AS b 
   ON a.ValidTo = b.ValidFrom 
      AND a.ColdRoomSensorNumber = b.ColdRoomSensorNumber
CROSS APPLY dbo.GetFormatedTemperatureDelta_tvf(b.Temperature, a.Temperature)
WHERE a.ValidFrom BETWEEN '05/01/2016' AND '05/02/2016'
ORDER BY ColdRoomSensorNumber, ValidFrom
GO

SQL Server Execution Times:
CPU time = 1251 ms,  elapsed time = 729 ms.

[Image: Plan With TVF]

Here we get the best of both worlds: a parallel execution plan and the ability to reuse this logic in other places where we need the formatted output.  If you look closely at the plan and the Compute Scalar operator, you will find that it is identical to the plan with the inline code shown above.  As a consultant, I've done a lot of conversions of scalar user defined functions to inline table-valued functions to resolve performance issues and improve code scalability for clients. This is a very common problem to see in engagements, and understanding the performance impact of scalar user defined functions is important for optimizing T-SQL code.

Reducing Long R720 POST Boot Time

I was recently working with a client that uses R720 servers for SQL Server, and during restarts of the server, the POST (Power On Self Test) would stall for as long as 15 minutes before the Windows Server OS would actually begin starting.  I did some searching and found the Dell PowerEdge 12G Server BIOS PDF, which covers PCIe Slot Disablement.  This allows PCIe slots to be set to Boot Driver Disabled, preventing the Option ROM from running during POST and allowing quicker boot times.  What this does is disable the preboot services for a non-bootable device while leaving the device available to the OS.  On the systems where the boot times were taking over 15 minutes, none of the PCIe devices (PCIe SSDs and NICs) were used for booting the server.  Setting the slots to Boot Driver Disabled reduced the POST time and allowed the server to reboot significantly faster.

To change these settings, first press F2 during POST to enter the BIOS.  You will have to wait for all of the preboot initialization to complete before the System Setup screen shows up.  Then click on the Integrated Devices menu item:

[Image: Integrated Devices menu]

Then click on the Slot Disablement menu item:

[Image: Slot Disablement menu]

Then set the slots that are not required for bootable devices to Boot Driver Disabled:

[Image: Slot Disablement settings]

Save the configuration and reboot the server; the POST should complete significantly faster now that it doesn't have to initialize the preboot services and load the Option ROM for each device to make it usable as a bootable device.  Since the OS driver is still available once the OS boots, the device will function inside of Windows, but as with anything, your mileage may vary and you should test this change before introducing it on production systems.

Tracking Compiles with Extended Events

This post comes from a question by Tara Kizer on DBA StackExchange related to Compilations/Sec in PerfMon and using Extended Events to try to track the compilations that were happening in SQL Server for one of their clients.  There are a number of events in Extended Events that relate to plans in SQL Server, and Tara was using the sqlserver.query_pre_execution_showplan event as a way of tracking compiles with Extended Events.  When I read the post on StackExchange, this stood out to me as not being the correct event for tracking an actual compilation; I would have selected the query_post_compilation_showplan event instead.  These two events have similar descriptions, but they differ by one specific word:

query_pre_execution_showplan

Occurs after a SQL statement is compiled. This event returns an XML representation of the estimated query plan that is generated when the query is optimized. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.

query_post_compilation_showplan

Occurs after a SQL statement is compiled. This event returns an XML representation of the estimated query plan that is generated when the query is compiled. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.

NOTE: Both of these event descriptions state “Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.” There are only a handful of events in Extended Events that include this warning in their description and each can introduce significant performance impacts to production workloads, so using them in live workloads is not something I’d recommend. (query_post_execution_showplan impact post)

The events don't have exactly the same description, and they occur at different times when using the repro example from the StackExchange post. Using a much larger event session definition that includes additional events, it is easy to see where compilations actually happen.  The event session definition used for proving this is:

CREATE EVENT SESSION [Compiles] ON SERVER
ADD EVENT sqlserver.query_post_compilation_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.query_pre_execution_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_hit(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_insert(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_miss(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_cache_remove(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sp_statement_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_batch_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_recompile(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.sql_statement_starting(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0))),
ADD EVENT sqlserver.uncached_sql_batch_statistics(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[is_system]=(0)))
WITH (MAX_MEMORY=4096 KB,
      EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY=30 SECONDS,
      MAX_EVENT_SIZE=0 KB,
      MEMORY_PARTITION_MODE=NONE,
      TRACK_CAUSALITY=ON,
      STARTUP_STATE=OFF)
GO

Using part of the repro example from the StackExchange post, with slight changes to clear the plan cache and to break the steps down into individual batches and then individual statements, we can show how the plans were reused from cache and where compiles actually occur.

CREATE TABLE t1 (transaction_id int, Column2 varchar(100));
CREATE TABLE t2 (Column1 varchar(max), Column2 varchar(100));
GO

CREATE TRIGGER t2_ins
ON t2
AFTER INSERT
AS

INSERT INTO t1
SELECT (SELECT TOP 1 transaction_id FROM sys.dm_tran_active_transactions), Column2
FROM inserted;
GO

DBCC FREEPROCCACHE
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
INSERT INTO t2 VALUES ('row2', 'value2');
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
INSERT INTO t2 VALUES ('row2', 'value2');
GO

--Both of these show compilation events
INSERT INTO t2 VALUES ('row1', 'value1');
GO

INSERT INTO t2 VALUES ('row2', 'value2');
GO

Here you can see the first compilation happening for the INSERT statements as prepared plans being auto-parameterized in the green box. The trigger is compiled in the red box and the plan is inserted into the cache, as shown by the sp_cache_insert event. Then, in the orange box, the trigger execution gets a cache hit and reuses the trigger plan for the second INSERT statement in the batch, so the trigger is not compiling for every execution of the INSERT command and the plan does get reused, as you can see with the sp_cache_hit event for the trigger.  The uncached_sql_batch_statistics event also fires for the initial batch, since it doesn't have a plan in cache and has to compile.

If we run the two INSERT statements individually again after the first execution the trigger doesn’t compile again as shown in the events below:

Here the first statement encounters a cache hit for the prepared, auto-parameterized version of the statement in cache, but a miss for the adhoc batch that was submitted. The trigger gets a cache hit and doesn't compile again, as shown in the red block of events. The green block of events repeats this behavior for the second INSERT statement run as a separate batch. However, in every case you still see the query_pre_execution_showplan event firing, which I can only attribute to the difference between optimized and compiled in the event descriptions; the trigger isn't compiling for every execution, as shown by this series of events.  The query_post_compilation_showplan event only fires when the plan is actually being compiled for the statements and/or trigger.
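As a side note, the Compilations/Sec counter from the original question can also be sampled from inside SQL Server through sys.dm_os_performance_counters; the values are cumulative since instance startup, so sample twice over a known interval and take the difference to get a per-second rate:

```sql
-- Cumulative counts since the instance started; sample twice over a
-- known interval and difference the values to get a per-second rate.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('SQL Compilations/sec', 'SQL Re-Compilations/sec');
```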

SQL Server 2016 Distributed Replay Errors

If you've tried to install and configure Distributed Replay in SQL Server 2016, I wouldn't be surprised to hear that you ran into all sorts of problems, probably didn't get it to work in a multi-client setup, and eventually gave up. For whatever reason, Microsoft didn't make the initial configuration of Distributed Replay in 2012, 2014, or 2016 very user friendly, and the error messages that you get when something isn't correctly configured are less than helpful.  For example:

2017-05-31 10:05:25:211 Error DReplay   Unexpected error occurred!

Critical Error: code=[c8503012], msg=Unexpected error occurred!

Security violation with invalid remote caller.

Error Code: 0xC8502002

None of these errors help to pinpoint the cause of the problem.  Everything I've run into so far has been security/permissions related, but it's not easy to figure that out unless you already know a fair bit about Distributed Replay and how it SHOULD be configured, so you can spot where problems might be and try making changes.

Defaults After Installation

For my 2016 environment, I installed a Distributed Replay Controller and two separate Distributed Replay Clients, all of which used Service SIDs and were configured to use the correct controller, following the installer information in my 2012 post.  The only difference, aside from server names, was that I didn't set up domain service accounts and instead let the installer use Service SIDs for the controller and client services.  When I start the controller service, I get the following in the log using Windows Server 2016 and SQL Server 2016:

2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  Microsoft SQL Server Distributed Replay Controller – 13.0.1601.5.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  © Microsoft Corporation.
2017-05-31 11:05:29:669 OPERATIONAL  [Controller Service]  All rights reserved.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Current edition is: [Enterprise Edition].
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  The number of maximum supported client is 16.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Windows service “Microsoft SQL Server Distributed Replay Controller” has started under service account “NT SERVICE\SQL Server Distributed Replay Controller”. Process ID is 6572.
2017-05-31 11:05:29:684 OPERATIONAL  [Controller Service]  Time Zone: Eastern Standard Time.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:05:29:684 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]

The Failed to get DmpClient error seems to be pretty common based on search results, but isn't actually a problem.  If I then start the clients, I get the following in the logs:

2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:12:16:672 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 7008.
2017-05-31 11:12:16:672 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:12:16:688 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3

Notice that the last line DOES NOT say the client was registered with the controller.  It should say Registered with controller “SQL2K16-AG01” if registration had succeeded, but it doesn't, so something is preventing the client from registering correctly with the controller.  To prove this, if we attempt a replay operation using the controller, it outputs the following:


C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:14:24:467 Error DReplay   The client 'SQL2K16-AG02' is not a registered distributed replay client. Make sure that the SQL Server Distributed Replay Client services is running on 'SQL2K16-AG02', and that the client is registered with controller 'localhost'.

So this confirms that out-of-the-box 2016 DRU won’t work and permissions changes will be required to make it work properly.

Configuring Component Services Permissions

On the Distributed Replay Controller machine, permissions need to be set in Component Services to give the Distributed Replay Client service accounts Launch and Activation permissions remotely on the COM component. The service accounts also need to be in the Distributed COM Users group in Windows. In Component Services, expand Computers > My Computer > DCOM Config > DReplayController, then right-click and open the Properties for the COM component.

[Image: DReplayController properties in Component Services]

Edit the Launch and Activation Permissions and add the service account for the clients (in this case, because a Service SID is being used, the Active Directory computer account for each client machine), allowing Local Launch, Remote Launch, Local Activation, and Remote Activation.  Then edit the Access Permissions and grant Local Access and Remote Access for the service accounts as well.

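The Distributed COM Users group membership can be granted from an elevated command prompt on the controller; a sketch using the client computer accounts from this environment (substitute your own accounts, or the client service account if you use domain accounts):

```batch
REM Add the client accounts (here, the machine accounts backing the
REM Service SIDs) to the local Distributed COM Users group.
net localgroup "Distributed COM Users" "SQLSKILLSDEMOS\SQL2K16-AG02$" /add
net localgroup "Distributed COM Users" "SQLSKILLSDEMOS\SQL2K16-AG03$" /add
```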

Now, as I mentioned above, the service accounts also need to be in the Distributed COM Users group in Windows, so make sure the service accounts have been added to that group, and then restart the services on the controller and clients. Checking the client log file should now show:

2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Microsoft SQL Server Distributed Replay Client – 13.0.1601.5.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      © Microsoft Corporation.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      All rights reserved.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Current edition is: [Enterprise Edition].
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Initializing dump support.
2017-05-31 11:20:27:454 OPERATIONAL  [Common]              Failed to get DmpClient. [HRESULT=0x8007007F]
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Windows service “Microsoft SQL Server Distributed Replay Client” has started under service account “NT SERVICE\SQL Server Distributed Replay Client”. Process ID is 6172.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Time Zone: Eastern Standard Time.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Controller name is “SQL2K16-AG01”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Working directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\WorkingDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Result directory is “C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\ResultDir”.
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeat Frequency(ms): 3000
2017-05-31 11:20:27:454 OPERATIONAL  [Client Service]      Heartbeats Before Timeout: 3
2017-05-31 11:21:21:798 OPERATIONAL  [Client Service]      Registered with controller “SQL2K16-AG01”.

However, when we try a replay operation now, we get:

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:21:33:203 Error DReplay  Unexpected error occurred!

Yep, that's really helpful, so let's go check the controller log and see what it has:

2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:21:31:374 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] **** Critical Error ****
2017-05-31 11:21:33:203 CRITICAL     [Controller Service]  Machine Name: SQL2K16-AG01
2017-05-31 11:21:33:203 CRITICAL     [Controller Service] Error Code: 0xC8502002
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:21:33:203 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 1 second(s).

You can try to Google/Bing that error code; hopefully you already did, and it brought you to this blog post. So let's go back and check the client logs again, where we find these additional messages:

2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Security violation with invalid remote caller.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller auth level is 2.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller impersonation level is 1.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Caller identity is SQLSKILLSDEMOS\SQL2K16-AG01$.
2017-05-31 11:21:33:189 CRITICAL     [Client Service]      Controller account is NT SERVICE\SQL Server Distributed Replay Controller.

So this points to another security issue, but I wasn't sure how to troubleshoot this further using a Service SID, so at this point I changed from Service SIDs to Active Directory user accounts to run the services: DReplayClient for the clients and DReplayController for the controller.  I reset all the permissions in Component Services on the controller machine, added the DReplayClient account to the Distributed COM Users group on the controller machine, and gave it another shot.
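For reference, the group membership change can be made from an elevated PowerShell prompt on the controller machine. This is a sketch; the SQLSKILLSDEMOS\DReplayClient account name is from my lab environment and will differ in yours:

```powershell
# Add the Distributed Replay client service account to the
# Distributed COM Users group on the controller machine
Add-LocalGroupMember -Group "Distributed COM Users" -Member "SQLSKILLSDEMOS\DReplayClient"
```

On older Windows builds without the LocalAccounts module, `net localgroup "Distributed COM Users" SQLSKILLSDEMOS\DReplayClient /add` accomplishes the same thing.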

[Screenshot: DCOM permissions reset in Component Services on the controller machine]

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:37:51:189 Info DReplay    Dispatching in progress.
2017-05-31 11:37:51:189 Info DReplay    0 events have been dispatched.
2017-05-31 11:37:58:892 Info DReplay    Dispatching has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events dispatched in total.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 0 second(s).
2017-05-31 11:37:58:892 Info DReplay    Event replay in progress.
2017-05-31 11:37:58:892 Info DReplay    Event replay has completed.
2017-05-31 11:37:58:892 Info DReplay    0 events (100 %) have been replayed in total. Pass rate 0.00 %.
2017-05-31 11:37:58:892 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 9 second(s).
2017-05-31 11:37:58:892 Error DReplay   Unexpected error occurred!

Well, at least this time there was slightly more progress: it attempts to begin dispatching events, but ends miserably with another not-so-helpful error message.  Looking at the controller log, the following information is output:

2017-05-31 11:35:56:969 OPERATIONAL  [Controller Service]  Event replay in progress. Detailed options:
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Target DB Server: [SQL2K16-AG01].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Controller Working Directory: [C:\DRUDemo\ReplayFiles].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Generate Result Trace: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Sequencing Mode: [SYNC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connect Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Think Time Scale: [100].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Healthmon Polling Interval: [60].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Query Timeout: [3600].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Data Provider Type: [ODBC].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Threads Per Client: [255].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Row Count: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Record Result Set: [No].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Connection Pooling Enabled: [Yes].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Stress Scale Granularity: [Connection].
2017-05-31 11:35:56:985 OPERATIONAL  [Controller Service]      Replay Clients: [SQL2K16-AG02, SQL2K16-AG03].
2017-05-31 11:35:59:048 OPERATIONAL  [Controller Service]  Event dispatch in progress.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Event replay completed.
2017-05-31 11:36:05:766 OPERATIONAL  [Controller Service]  Elapsed time: 0 day(s), 0 hour(s), 0 minute(s), 8 second(s).

Not much help there either, and this is where I would expect most people to give up, because there is nothing really actionable here at all.

Client Service Account Permissions on Target SQL Server

While nothing is documented about changes in Distributed Replay behavior in SQL Server 2016, this last error is different from the behavior of previous versions of Distributed Replay.  The problem is that the DReplayClient service account doesn't have permission to connect to the target SQL Server.  To prove this, here is a trace capture of User Error Message events from the last replay operation attempt:

[Screenshot: trace capture of User Error Message events showing the failed connection attempts]

Each of the replay clients is attempting to connect to the target server and failing.  If we add the DReplayClient login to the target SQL Server and retry the replay, everything checks out and it actually begins to dispatch the events for the replay operation:
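Creating that login is a one-liner. Here is a sketch using Invoke-Sqlcmd from the SqlServer module, again assuming my lab's SQLSKILLSDEMOS\DReplayClient account name:

```powershell
# On the target instance, create a login for the replay client
# service account so the clients can connect during replay
Invoke-Sqlcmd -ServerInstance "SQL2K16-AG01" -Query "CREATE LOGIN [SQLSKILLSDEMOS\DReplayClient] FROM WINDOWS;"
```

The login only needs enough permissions to replay the captured workload against the target databases.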

[Screenshot: the DReplayClient login created on the target SQL Server]

C:\DRUDemo>dreplay replay -s SQL2K16-AG01 -w "SQL2K16-AG02, SQL2K16-AG03" -f 10 -d "C:\DRUDemo\ReplayFiles" -o -c "c:\DRUDemo\DReplay.Exe.Replay.config"

2017-05-31 11:45:14:376 Info DReplay    Dispatching in progress.
2017-05-31 11:45:14:376 Info DReplay    0 events have been dispatched.
2017-05-31 11:45:24:377 Info DReplay    30753 events have been dispatched.
2017-05-31 11:45:34:377 Info DReplay    68262 events have been dispatched.
2017-05-31 11:45:44:377 Info DReplay    106677 events have been dispatched.
2017-05-31 11:45:54:393 Info DReplay    144226 events have been dispatched.
2017-05-31 11:46:04:408 Info DReplay    183595 events have been dispatched.
2017-05-31 11:46:14:424 Info DReplay    221378 events have been dispatched.
2017-05-31 11:46:24:424 Info DReplay    257754 events have been dispatched.
2017-05-31 11:46:34:455 Info DReplay    298436 events have been dispatched.
2017-05-31 11:46:44:471 Info DReplay    336026 events have been dispatched.
2017-05-31 11:46:54:471 Info DReplay    373717 events have been dispatched.
2017-05-31 11:47:04:486 Info DReplay    410378 events have been dispatched.
2017-05-31 11:47:14:502 Info DReplay    449949 events have been dispatched.
2017-05-31 11:47:24:518 Info DReplay    486431 events have been dispatched.
2017-05-31 11:47:34:533 Info DReplay    526228 events have been dispatched.
2017-05-31 11:47:44:549 Info DReplay    563484 events have been dispatched.
2017-05-31 11:47:48:361 Info DReplay    Dispatching has completed.
2017-05-31 11:47:48:361 Info DReplay    573630 events dispatched in total.
2017-05-31 11:47:48:361 Info DReplay    Elapsed time: 0 day(s), 0 hour(s), 2 minute(s), 35 second(s).
2017-05-31 11:47:48:361 Info DReplay    Event replay in progress.

Conclusion

Permissions, permissions, permissions…  While Distributed Replay has some of the worst error messages imaginable, the problems with getting it configured in SQL Server 2016 have so far all boiled down to permissions issues; ensuring that the permissions for the service accounts are set correctly on the controller machine, and on the target server for the replay operation, should resolve the issues.  Don't forget to configure firewall rules to allow the appropriate network access, as described in the original 2012 DRU post I wrote a few years back.   Hopefully this post will save someone the trouble of trying to figure this all out blindly.
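As a quick reminder on the firewall piece, program-based rules for the Distributed Replay executables are the simplest approach. This is a sketch; the controller path is assumed to mirror the DReplayClient path shown in the client log earlier and may differ in your installation:

```powershell
# Allow inbound traffic to the Distributed Replay controller service
New-NetFirewallRule -DisplayName "SQL DReplay Controller" -Direction Inbound -Action Allow `
    -Program "C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayController\DReplayController.exe"

# And to the client service on each replay client machine
New-NetFirewallRule -DisplayName "SQL DReplay Client" -Direction Inbound -Action Allow `
    -Program "C:\Program Files (x86)\Microsoft SQL Server\130\Tools\DReplayClient\DReplayClient.exe"
```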

Using Group Managed Service Accounts for SQL Server

This afternoon I ran into an issue after rebooting one of my lab environment VMs for SQL Server 2016 to give the VM more memory for some tests I'm doing with In-Memory OLTP.  When the VM rebooted, I tried to connect to SQL Server with SSMS and the connection failed.  So I opened SQL Server Configuration Manager and found that the instance of SQL Server didn't start when Windows started; I tried to manually start the instance and received the following error:

[Screenshot: error dialog when manually starting the SQL Server service]

When I opened up the SQL Server ERRORLOG file, I found the following errors:

2017-05-30 17:08:17.65 spid11s     Error: 17190, Severity: 16, State: 1.
2017-05-30 17:08:17.65 spid11s     Initializing the FallBack certificate failed with error code: 1, state: 20, error number: 0.
2017-05-30 17:08:17.66 spid11s     Unable to initialize SSL encryption because a valid certificate could not be found, and it is not possible to create a self-signed certificate.
2017-05-30 17:08:17.66 spid11s     Error: 17182, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     TDSSNIClient initialization failed with error 0x80092004, status code 0x80. Reason: Unable to initialize SSL support. Cannot find object or property.
2017-05-30 17:08:17.66 spid11s     Error: 17182, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     TDSSNIClient initialization failed with error 0x80092004, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Cannot find object or property.
2017-05-30 17:08:17.66 spid11s     Error: 17826, Severity: 18, State: 3.
2017-05-30 17:08:17.66 spid11s     Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2017-05-30 17:08:17.66 spid11s     Error: 17120, Severity: 16, State: 1.
2017-05-30 17:08:17.66 spid11s     SQL Server could not spawn FRunCommunicationsManager thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.


This is when I remembered that I had set this lab VM up to use a Group Managed Service Account (gMSA) for SQL Server, and my Active Directory domain controller for the lab environment wasn't running.  The instance couldn't start because it couldn't talk to the domain controller to obtain the credentials for the gMSA, which, as the service account for SQL Server, is at the top of the encryption hierarchy for the instance.  So, a lesson learned for lab environment VMs that use managed service accounts: the Active Directory domain controller has to be running, or the instance won't start because it can't retrieve the credential information for the service account to run the instance.  This can also happen in production environments where a domain controller might not be local to a SQL Server installation and networking issues prevent the SQL instance from retrieving the credentials to start the service.
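A quick way to check whether a host can still retrieve its gMSA credentials (and therefore whether SQL Server will be able to start) is Test-ADServiceAccount from the ActiveDirectory module. A sketch, where the gMSA name is a placeholder for your own account:

```powershell
# Returns True only if this computer can retrieve the gMSA's
# credentials from a reachable domain controller
Import-Module ActiveDirectory
Test-ADServiceAccount -Identity "gMSA-SQLService"
```

If this returns False (or errors out), the SQL Server service will fail to start until connectivity to a domain controller is restored.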

SQLskills SQL101: NUMA Support in SQL

I've blogged about NUMA and SQL Server in the past and how SQLOS automatically handles NUMA configurations for SQL Server by default, meaning that as a DBA you typically don't need to customize any configuration options for SQL Server to optimize its internal structures for NUMA systems.  I've also talked about vNUMA issues in virtual machines with Hot-Add CPU enabled and how that affects performance.

While I was speaking at the SQLintersection conference this week, an attendee came up to me and asked whether NUMA support in SQLOS is Edition-specific in SQL Server.  It's not an Edition-specific feature: SQLOS configures the internal structures of SQL Server when it boots based on the hardware NUMA layout presented by the Windows OS, regardless of the Edition of SQL Server that is running.  The attendee then pulled up a recently published blog post stating that "NUMA awareness is an Enterprise Edition feature" in SQL Server; unfortunately, that blog post (extract below) is incorrect.

[Screenshot: extract from the incorrect blog post]

As with anything on the internet, including this blog, you need to verify information before you trust it if actual tests and verification aren't provided.  Even if the information includes a demonstration or reproduction that shows a particular behavior, you need to look at the version and time frame for when it was published, because things change constantly in technology, and what was accurate a year ago may not be accurate today.  So, to demonstrate that SQL Server Standard Edition will recognize NUMA configurations, here are two screenshots from one of our lab servers at SQLskills.

[Screenshot 1 of 2: NUMA configuration on the Standard Edition lab server]

[Screenshot 2 of 2: NUMA configuration on the Standard Edition lab server]

As you can see, this instance, running Standard Edition, sees NUMA and has two NUMA nodes configured inside of SQLOS.  NUMA is not an Edition-specific feature in SQL Server and never has been.  SQLOS optimizes the way the internal structures are created under NUMA for memory nodes (and CPU nodes) based on how the OS presents the hardware layout.
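The node layout can also be confirmed from the SQLOS DMVs on any Edition. A sketch using Invoke-Sqlcmd from the SqlServer module against a local instance:

```powershell
# List the SQLOS nodes; a two-node NUMA host shows two ONLINE nodes
# here regardless of Edition (the hidden DAC node also appears as node 64)
Invoke-Sqlcmd -ServerInstance "." -Query "SELECT node_id, node_state_desc, memory_node_id, online_scheduler_count FROM sys.dm_os_nodes;"
```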

Using Storage Spaces Direct S2D on VMware Workstation

While working through my new demos for SQL Server 2016 on Windows Server 2016, I had to rebuild my hyper-converged SQL Server failover cluster environment using local storage and Storage Spaces Direct (S2D) for the cluster shared storage, instead of configuring an iSCSI SAN environment or SMB file share as I have done in the past.  However, there is a catch with the virtual disks under VMware Workstation: by default, they don't expose a SerialNumber, which is required for S2D to be able to use the drives in the pool.  In PowerShell, Get-PhysicalDisk on a new VMware Workstation VM looks like this:

[Screenshot: Get-PhysicalDisk output with an empty SerialNumber column]

To fix the issue, the VMX file for the VM must be modified to add the following line:

disk.EnableUUID="true"

Once the VM is rebooted, the SerialNumber is populated in PowerShell for Get-PhysicalDisk, and S2D will be able to use the disks in the storage pool.
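After the reboot, you can confirm the disks are now eligible before creating the pool; a quick check:

```powershell
# SerialNumber should now be populated and CanPool should be True
# for the disks you intend to add to the S2D storage pool
Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber, CanPool
```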

[Screenshot: Get-PhysicalDisk output with SerialNumber populated]

Exception calling “InitializeAsReader” in PowerShell Reading Trace Files

Today I needed to process some results files from Distributed Replay to compare before-and-after metrics from a synchronized replay that I had a client run for testing a change in hardware.  I've done this before using PowerShell to automate things, and the script for it is pretty simple:

# Load the SMO assemblies that provide the trace reader/writer objects
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfoExtended") | Out-Null;
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfo") | Out-Null;
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | Out-Null;

# Trace file produced by Distributed Replay, and the destination table
$InputFile = "E:\Results\27\ResultDir\ReplayResult.trc";
$TableName = "ReplayResults_2";

# Connection to the local instance and analysis database
$conn = New-Object "Microsoft.SqlServer.Management.Common.SqlConnectionInfo";
$conn.ServerName = ".";
$conn.DatabaseName = "ReplayAnalysis";

# Open the trace file as a reader and stream its events into the table
$InputTrace = New-Object "Microsoft.SqlServer.Management.Trace.Tracefile";
$InputTrace.InitializeAsReader($InputFile);
$TableWriter = New-Object "Microsoft.SqlServer.Management.Trace.TraceTable";
$TableWriter.InitializeAsWriter($InputTrace, $conn, $TableName);

# Write() returns $false once the trace has been fully processed
while($TableWriter.Write())
{}

However, when I tried to do this on a new VM running Windows Server 2012 R2 with SQL Server 2016, using the PowerShell ISE, I kept running into the following exception that I had never encountered before:

Exception calling "InitializeAsReader" with "1" argument(s): "Failed to initialize object as reader."
At line:14 char:1
+ $InputTrace.InitializeAsReader($InputFile);
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : SqlTraceException

I asked about this on Twitter, and Shane O'Neill (@SOZDBA) gave me a nudge in the right direction towards finding the root of the problem by suggesting I look at the error in more detail.  So I wrapped the command in a try/catch block, piping the exception to a select *:

[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfoExtended") | Out-Null;
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfo") | Out-Null;
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | Out-Null;

$InputFile = "E:\Results\27\ResultDir\ReplayResult.trc";
$TableName = "ReplayResults_2"

$conn = New-Object "Microsoft.SqlServer.Management.Common.SqlConnectionInfo";
$conn.ServerName = ".";
$conn.DatabaseName = "ReplayAnalysis"
$InputTrace = New-Object "Microsoft.SqlServer.Management.Trace.Tracefile";
try {
    $InputTrace.InitializeAsReader($InputFile);
}
catch {
    $_ | select *
}

Now I had more detail about the exception being raised:

Exception : System.Management.Automation.MethodInvocationException: Exception calling "InitializeAsReader" with "1" argument(s): "Failed to initialize object as reader." --->
Microsoft.SqlServer.Management.Trace.SqlTraceException: Failed to initialize object as reader. ---> System.IO.FileNotFoundException: Could not load file or assembly
'file:///C:\Program Files\Microsoft SQL Server\130\Tools\Binn\pfclnt.dll' or one of its dependencies. The system cannot find the file specified.
at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark&
stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark&
stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadFrom(String assemblyFile, Evidence securityEvidence, Byte[] hashValue, AssemblyHashAlgorithm hashAlgorithm,
Boolean forIntrospection, Boolean suppressSecurityChecks, StackCrawlMark& stackMark)
at System.Reflection.Assembly.LoadFrom(String assemblyFile)
at Microsoft.SqlServer.Management.Trace.TraceUtils.CreateInstance(String assemblyRelativePath, String objectClass)
at Microsoft.SqlServer.Management.Trace.TraceFile.InitializeAsReader(String fileName)
--- End of inner exception stack trace ---
at Microsoft.SqlServer.Management.Trace.TraceFile.InitializeAsReader(String fileName)
at CallSite.Target(Closure , CallSite , Object , Object )
--- End of inner exception stack trace ---
at System.Management.Automation.ExceptionHandlingOps.CheckActionPreference(FunctionContext funcContext, Exception exception)
at System.Management.Automation.Interpreter.ActionCallInstruction`2.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)

A quick Google search for pfclnt.dll landed me on an old Connect item that pointed out that the SQL Server Profiler supporting DLLs for SMO are 32-bit only, and to use them in .NET you must compile your assembly for x86.  So the answer was pretty simple: start Windows PowerShell (x86) from the Start menu, paste the exact same script into the ISE window, and run it; it works as expected.
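If you're ever unsure which flavor of PowerShell you're running in, the bitness of the current process is easy to check:

```powershell
# False when running in Windows PowerShell (x86),
# True in the 64-bit host where pfclnt.dll fails to load
[Environment]::Is64BitProcess
```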

Tracking database usage with Extended Events returns invalid database_id?

I was recently emailed a question about tracking database usage information using Extended Events; the person wanted to know why the event session was returning invalid database_ids as part of what it was collecting.  The funny thing is that I've seen this before during demos, when I have a very specific concurrent workload running for what I am demonstrating, and I used to make sure to do this exact thing so I could explain it during Immersion Events.  The TL;DR answer is that Extended Events isn't returning an invalid database_id at the time the event occurs; the database_id just isn't valid when the data is being reviewed or consumed.  How could that be, you wonder?

The cause is DBCC CHECKDB or a manual user snapshot of a database.  In either of these two scenarios, the snapshot used internally by DBCC CHECKDB (also CHECKTABLE, and the other CHECK commands) is assigned a database_id just like a regular user snapshot, so when the event fires, the database_id reported for the event is actually correct.  However, the snapshot has been dropped by the time the data is consumed, so using DB_NAME() on the database_id returns NULL, making it appear as if the event session is returning invalid data.
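When reviewing captured event data, you can make these transient snapshots obvious rather than puzzling by labeling the unresolvable ids instead of displaying NULL. A sketch of the idea, where the dbo.CollectedEvents table name is purely illustrative:

```powershell
# DB_NAME() returns NULL for a snapshot that has since been dropped,
# so label those rows instead of reporting an "invalid" database_id
Invoke-Sqlcmd -ServerInstance "." -Query "SELECT database_id, ISNULL(DB_NAME(database_id), 'dropped snapshot (e.g. DBCC CHECKDB)') AS database_name FROM dbo.CollectedEvents;"
```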