Downgrading SQL Server Editions

At some point in your career working with SQL Server, you will run into a situation where the wrong edition of SQL Server has been installed on a server and you will need to change the edition for licensing reasons.  Whether it is Enterprise Edition where Standard Edition should have been installed, Enterprise Edition where Developer Edition should have been used, or my favorite, Evaluation Edition where the 180-day trial has expired and Enterprise Edition isn’t going to be used, the only route available for downgrading the edition is to uninstall and reinstall SQL Server entirely.  SQL Server Setup makes upgrading editions a piece of cake with SKUUPGRADE as a command-line option for going from Standard/Developer/Evaluation to Enterprise, but anything else requires a full uninstall and reinstall to change the SKU/edition, followed by restoring all of the system and user databases to the new instance, which typically means a lot of work.  I hate having to restore system databases and avoid doing it whenever possible, so here is how I do this process and minimize the work required:

No matter what, you are going to have to uninstall and reinstall the SQL Server instance to downgrade the SKU.  However, you can save yourself some time, and the headache of trying to restore the system databases, if you are careful about what you do.  I have done plenty of SKU downgrades in the past, and the easiest way to do it (I am not saying this is the Microsoft-supported way, but it works if done correctly) is to:

  1. Take a good backup of all of your databases (system and user).  
  2. Run SELECT @@VERSION and note the specific build number of SQL Server that you are currently on (the query shown after this list captures this, along with the edition).
  3. Shut down your existing instance of SQL Server.  
  4. Copy (don’t move) the master, model, and msdb database files (both mdf and ldf) from the current location to a new folder that you mark as read-only. 
  5. Uninstall SQL Server from the system.
  6. Reboot the server.
  7. Install SQL Server Standard Edition.
  8. Apply the necessary Service Pack and/or Cumulative Updates to bring the instance up to your previous build number.
  9. Shutdown SQL Server.
  10. Copy the master, model, and msdb database files (both mdf and ldf) from the folder you saved them in to the correct location for the new install, remove the read-only flag from the files, and change the file ACLs so that the SQL Server service account has Full Control over them.
  11. Start up SQL Server. If you did everything correctly it will start up exactly where you were before you made any changes, with all of your user databases online, and you should be ready to let applications connect and resume operations.
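For step 2, here is a quick way to capture both the build number and the edition in one place before you uninstall (worth saving the output somewhere off of the server):

SELECT SERVERPROPERTY('Edition')        AS [Edition],
       SERVERPROPERTY('ProductVersion') AS [BuildNumber],
       SERVERPROPERTY('ProductLevel')   AS [ProductLevel],
       @@VERSION                        AS [FullVersionString];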

If you screw something up in the above, you still have your backups, and you can run setup to rebuild the system databases and then follow the Microsoft-supported path of restoring the system databases and then the user databases to bring the instance online.  Essentially the file copy is no different than what would occur through attach/detach, you are just doing it with the system databases, which is not explicitly supported, but it does work.  The key is to have your backups from before you do anything, so the supported route is available if you encounter an issue.  The only issue I have ever had doing this set of steps is that I didn’t set the file ACLs correctly, and the database engine threw Access Denied errors and failed to start until I fixed the ACLs.  This can save you many hours of frustration and downtime trying to restore everything, since the database files are already there and it is just a few small copy operations to put them where they need to be.
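If you do end up on the supported fallback path, remember that master has to be restored with the instance running in single-user mode; a minimal sketch of that route, with a hypothetical backup path, looks like this:

-- After rebuilding the system databases with setup, start the instance in single-user
-- mode (for example: net start MSSQLSERVER /m "SQLCMD") and then restore master.
-- The backup path below is hypothetical.
RESTORE DATABASE master
FROM DISK = N'D:\Backups\master_full.bak'
WITH REPLACE;
-- The instance shuts down automatically after master is restored; restart it normally,
-- then restore model and msdb like any other database, followed by the user databases.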

Parallel Maintenance Tasks with Service Broker

I’ve been doing a bit of work on Service Broker while building a series of new Pluralsight courses, and one of the things I recently found a good use for Service Broker on, a client system with an 8TB VLDB, was implementing parallel maintenance tasks using multiple threads.  The idea for this implementation came out of the fact that their maintenance window keeps getting shorter as their workload increases, while their database continues to get larger and larger.  One of the things I noticed while doing a monthly health check of their server was that during the maintenance window their hardware is nowhere close to being fully utilized, and there are plenty of resources available due to the serial execution of their maintenance tasks.  This specific client uses Ola Hallengren’s maintenance procedures, which we recommend to all of our clients, so I started to think about ways I could make things run in parallel to get better utilization of the hardware during the maintenance window and hopefully reduce the time required to perform the day-to-day maintenance in their database.

I wish I could say the first thing I thought of was Service Broker, but it wasn’t. I toyed around with the idea of a custom C# app to run Ola’s scripts and log the output commands to the CommandLog table in master and then use multiple threads to run each of the individual tasks, something similar using PowerShell instead of a compiled app, and even multiple Agent jobs to break things up using different parameters for IndexOptimize. It turns out these are all over-engineering a very basic problem that Service Broker is highly suited to solving, and it only took a little bit of trial and error to create a very robust solution that runs index and statistics maintenance tasks in parallel in SQL Server.

Setting up Service Broker Components

I chose to implement the Service Broker components for this configuration in the msdb database for a few reasons.  First, it is already enabled for Service Broker, so I don’t have to worry about that; second, it already has TRUSTWORTHY set to ON, which is required by SQL Server Agent; and third, it is owned by sa, so an activation stored procedure marked with EXECUTE AS OWNER runs as a full sysadmin on the server, allowing it to run any maintenance command generated by Ola’s scripts.  We need to configure some basic objects for Service Broker in the database: a message type, a contract, a queue, and a service.
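Before creating the objects, a quick sanity check that msdb still has those properties on your instance (in case someone has changed them) can be done against sys.databases:

SELECT name,
       is_broker_enabled,
       is_trustworthy_on,
       SUSER_SNAME(owner_sid) AS owner_name
FROM sys.databases
WHERE name = N'msdb';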

USE msdb
GO
-- Create the message types
CREATE MESSAGE TYPE
[OlaHallengrenMaintenanceTaskMessage]
VALIDATION = WELL_FORMED_XML;
GO

-- Create the contract
CREATE CONTRACT [OlaHallengrenMaintenanceTaskContract]
([OlaHallengrenMaintenanceTaskMessage]
SENT BY INITIATOR);
GO

-- Create the target queue and service
CREATE QUEUE OlaHallengrenMaintenanceTaskQueue;
GO

CREATE SERVICE
[OlaHallengrenMaintenanceTaskService]
ON QUEUE OlaHallengrenMaintenanceTaskQueue
([OlaHallengrenMaintenanceTaskContract]);
GO

Building an Activation Procedure to Automate Processing

With that set up, we now need to build an activation stored procedure to process the messages from the queue and perform the operations generated by Ola’s scripts.  At first I thought this would be pretty straightforward, but it actually required a fair amount of additional logic to prevent poison messages and queue deactivation from occurring when multiple online index rebuilds were attempted on the same table.  When this occurs the engine raises Error 1912:

Could not proceed with index DDL operation on %S_MSG ‘%.*ls’ because it conflicts with another concurrent operation that is already in progress on the object. The concurrent operation could be an online index operation on the same object or another concurrent operation that moves index pages like DBCC SHRINKFILE.

which causes the XACT_STATE() of the operation to become –1, which only allows the transaction processing the command to be rolled back, leading to a poison message within Service Broker.  To mitigate that, I had to add a check against sys.dm_tran_locks for any ALTER_INDEX command to identify whether a conflicting lock is currently being held; if one is, we simply requeue the message and increment the retry count held within the message, using a WAITFOR to incrementally back off each time the command encounters a conflicting lock.  Since Ola’s scripts log the command type, database name, object name, and command to be executed to the CommandLog table in the master database, all we need to queue in our message is the ID of the command to be processed by Service Broker, and we can get the rest of the pertinent information from the CommandLog table directly.  As a part of each task’s execution, the activation stored procedure also updates the StartTime and EndTime of the operation for historical trending of execution durations.
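For reference, the message body is just the CommandLog ID wrapped in a single element, with the retry count carried as an attribute; a message for a hypothetical ID of 42 on its second retry would be parsed like this:

-- Example message body only; the ID value 42 is hypothetical
DECLARE @example_message XML = N'<CommandLogID retry="2">42</CommandLogID>';

SELECT @example_message.value('(CommandLogID)[1]', 'int')                    AS CommandLogID,
       ISNULL(@example_message.value('(CommandLogID/@retry)[1]', 'int'), 0)  AS RetryCount;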

IF OBJECT_ID('OlaHallengrenMaintenanceTaskQueue_ActivationProcedure') IS NOT NULL
BEGIN
	DROP PROCEDURE OlaHallengrenMaintenanceTaskQueue_ActivationProcedure;
END
GO

CREATE PROCEDURE OlaHallengrenMaintenanceTaskQueue_ActivationProcedure
AS

  DECLARE @conversation_handle UNIQUEIDENTIFIER;
  DECLARE @message_body XML;
  DECLARE @message_type_name SYSNAME;
  DECLARE @Command NVARCHAR(MAX);
  DECLARE @ID INT
  DECLARE @DBName SYSNAME;
  DECLARE @ObjectName SYSNAME;
  DECLARE @CommandType NVARCHAR(60);
  DECLARE @Retry INT;
  DECLARE @FQN NVARCHAR(400);

  WHILE (1=1)
  BEGIN

    BEGIN TRANSACTION;

    WAITFOR
    ( RECEIVE TOP(1)
        @conversation_handle = conversation_handle,
        @message_body = message_body,
        @message_type_name = message_type_name
      FROM OlaHallengrenMaintenanceTaskQueue
    ), TIMEOUT 5000;

    IF (@@ROWCOUNT = 0)
    BEGIN
      ROLLBACK TRANSACTION;
      BREAK;
    END

    IF @message_type_name = N'OlaHallengrenMaintenanceTaskMessage'
    BEGIN
		SELECT @ID = @message_body.value('(CommandLogID)[1]', 'int'),
				@Retry = ISNULL(@message_body.value('(CommandLogID/@retry)[1]', 'int'), 0);

	   
		SELECT @Command = Command, 
				@ObjectName = ObjectName, 
				@DBName = DatabaseName,
				@FQN = QUOTENAME(DatabaseName) + '.' + QUOTENAME(SchemaName) + '.' + QUOTENAME(ObjectName),
				@CommandType = CommandType
		FROM master.dbo.CommandLog
		WHERE ID = @ID;

		-- 	Check for Index rebuilds if one is already running and requeue the request after waiting.
		IF @CommandType = 'ALTER_INDEX'
		BEGIN

			-- Check if we have an incompatible lock that would lead to a failed execution 
			IF EXISTS (SELECT 1
						FROM sys.dm_tran_locks AS tl
						WHERE (request_mode = 'Sch-M' OR 
							-- Concurrent maintenance task doing UpdateStats? (Sch-S is the value sys.dm_tran_locks exposes in request_mode)
							(request_mode = 'Sch-S' AND EXISTS (SELECT 1 FROM sys.dm_exec_sessions AS s WHERE is_user_process = 1 AND tl.request_session_id = s.session_id)))
						  AND resource_associated_entity_id = OBJECT_ID(@FQN)
						  AND resource_database_id = DB_ID(@DBName) )
			BEGIN
				-- Wait for 5 seconds times the number of retrys to do an incremental backoff
				-- This will eventually cause all queue readers to die off and serial execution of tasks
				DECLARE @Delay NVARCHAR(8) = CAST(DATEADD(SECOND, @Retry * 5, CAST('00:00:00' AS TIME)) AS VARCHAR);
				WAITFOR DELAY @Delay;

				-- Increment retry count in the message
				SELECT @message_body = N'<CommandLogID retry="'+CAST(@Retry+1 AS NVARCHAR)+'">'+CAST(@id AS NVARCHAR)+N'</CommandLogID>';


				-- Send the message back to the queue for later processing
				;SEND ON CONVERSATION @conversation_handle
						MESSAGE TYPE [OlaHallengrenMaintenanceTaskMessage]
						(@message_body);

				GOTO SkipThisRun
			END
		END

		UPDATE master.dbo.CommandLog
		SET StartTime = GETDATE()
		WHERE ID = @ID;

		BEGIN TRY 
			EXECUTE(@Command);

			UPDATE master.dbo.CommandLog
			SET EndTime = GETDATE()
			WHERE ID = @ID;

		END TRY
		BEGIN CATCH

			UPDATE master.dbo.CommandLog
			SET EndTime = GETDATE(),
				ErrorMessage = ERROR_MESSAGE(),
				ErrorNumber = ERROR_NUMBER()
			WHERE ID = @ID;

		END CATCH

       END CONVERSATION @conversation_handle;
    END

	-- If end dialog message, end the dialog
    ELSE IF @message_type_name = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
    BEGIN
       END CONVERSATION @conversation_handle;
    END

	-- If error message, log and end conversation
    ELSE IF @message_type_name = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
    BEGIN
		DECLARE @error INT;
		DECLARE @description NVARCHAR(4000);

		-- Pull the error code and description from the doc
		WITH XMLNAMESPACES ('http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb)
		SELECT
			@error = @message_body.value('(//ssb:Error/ssb:Code)[1]', 'INT'),
			@description = @message_body.value('(//ssb:Error/ssb:Description)[1]', 'NVARCHAR(4000)');
		
		RAISERROR(N'Received error Code:%i Description:"%s"', 16, 1, @error, @description) WITH LOG;

		-- Now that we handled the error logging cleanup
		END CONVERSATION @conversation_handle;
	END
 
 SkipThisRun:   
    COMMIT TRANSACTION;

  END
GO

Enabling Parallel Queue Activation

The last thing we need to do is bind our stored procedure to the queue for automated activation and to set the MAX_QUEUE_READERS to the number of concurrent parallel tasks we want to have executing on the system at any given point in time.  I spent some time testing different values for MAX_QUEUE_READERS and found that 5 was generally sufficient to provide significant reductions in runtimes without creating contention between the tasks being executed.

-- Alter the target queue to specify internal activation
ALTER QUEUE OlaHallengrenMaintenanceTaskQueue
WITH ACTIVATION
( STATUS = ON,
PROCEDURE_NAME = OlaHallengrenMaintenanceTaskQueue_ActivationProcedure,
MAX_QUEUE_READERS = 5,
EXECUTE AS SELF
);
GO

Now all we have to do is set up a job to feed the tasks into the queue as messages, and Service Broker will automatically begin processing the tasks in parallel, up to the configured MAX_QUEUE_READERS value.  To do this, I use Ola’s scripts to log the commands to the CommandLog table without executing them, and then, using a cursor and a ROW_NUMBER() definition to spread the work across different tables instead of queuing the commands in the order they were generated, send each ID as a message to Service Broker to process.

DECLARE @MaxID INT
SELECT @MaxID = MAX(ID)
FROM master.dbo.CommandLog;

SELECT @MaxID = ISNULL(@MaxID, 0)

-- Load new tasks into the Command Log
EXECUTE master.dbo.IndexOptimize
@Databases = 'AdventureWorks2014',
@FragmentationLow = 'INDEX_REBUILD_OFFLINE',
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@UpdateStatistics = 'ALL',
@FragmentationLevel1 = 5,
@FragmentationLevel2 = 30,
@LogToTable = 'Y',
@Execute = 'N';

DECLARE @NewMaxID INT
SELECT @NewMaxID = MAX(ID)
FROM master.dbo.CommandLog;

USE msdb;

DECLARE @id int;

-- Don't submit commands in the exact generated order, or parallel processing
-- of indexes/stats on the same object will occur and could cause blocking
DECLARE command_cursor CURSOR FOR
SELECT ID
FROM (
SELECT ROW_NUMBER() OVER
(PARTITION BY ObjectName
ORDER BY COALESCE(IndexName, StatisticsName), CommandType) AS Ranking,
ID
FROM master.dbo.CommandLog
WHERE ID > @MaxID AND ID <= @NewMaxID)  AS t
ORDER BY Ranking

OPEN command_cursor

FETCH NEXT FROM command_cursor
INTO @id

WHILE @@FETCH_STATUS = 0
BEGIN

-- Begin a conversation and send a request message
DECLARE @conversation_handle UNIQUEIDENTIFIER;
DECLARE @message_body XML;

BEGIN TRANSACTION;

BEGIN DIALOG @conversation_handle
FROM SERVICE [OlaHallengrenMaintenanceTaskService]
TO SERVICE N'OlaHallengrenMaintenanceTaskService'
ON CONTRACT [OlaHallengrenMaintenanceTaskContract]
WITH ENCRYPTION = OFF;

SELECT @message_body = N'<CommandLogID>'+CAST(@id AS NVARCHAR)+N'</CommandLogID>';

SEND ON CONVERSATION @conversation_handle
MESSAGE TYPE [OlaHallengrenMaintenanceTaskMessage]
(@message_body);

COMMIT TRANSACTION;

-- Get the next command to run
FETCH NEXT FROM command_cursor
INTO @id
END
CLOSE command_cursor;
DEALLOCATE command_cursor;
GO

I originally started out processing the records in the order they were generated, but I found that I would hit blocking with statistics updates on the same table until Trace Flag 7471, introduced in SQL Server 2014 SP1+CU6 to change the locking behavior for concurrent statistics updates, was enabled for the instance.  If you don’t have access to the trace flag, randomizing the order in which objects are queued means only the tables with a large number of indexes or statistics will have problems running in parallel, and only at the end of the maintenance tasks, since the threads will be working through different tables based on the order of queuing.  So far in testing this has resulted in 30-45% reductions in maintenance times, depending on the database schema and the types of maintenance that have to be performed.
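If you are on a build that supports it and would rather rely on the trace flag instead of (or in addition to) the randomized queuing order, enabling Trace Flag 7471 globally is a single command; test the impact on your own workload before leaving it enabled:

-- Requires SQL Server 2014 SP1 CU6 or later; changes the locking behavior so that
-- concurrent UPDATE STATISTICS against the same table do not block each other.
DBCC TRACEON (7471, -1);

-- To disable it again:
-- DBCC TRACEOFF (7471, -1);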

Why I hate the ring_buffer target in Extended Events

I’ve had so many questions about the same problem with the ring_buffer target in Extended Events lately that I figured I would write a blog post that explains all the information I teach about the ring_buffer target and the problems associated with it.  Since the release of SQL Server 2012, and the new UI for Extended Events, I have not used the ring_buffer target once; in fact, as the title of this post says, I really hate the ring_buffer target. In this post I’ll go over the reasons I hate the ring_buffer target and hopefully convince you why you should use the file_target instead.

Missing Events

This is by far the most common problem I have to explain about the ring_buffer target by email.  Generally the question is phrased along the lines of:

“Hey Jonathan,

I got the code below from one of your articles on SQL Server Central and it is not working. The problem I have is that when I run the code, it doesn’t show any deadlock graphs even though I know one just occurred in the application. It seems like I only see older deadlocks in the system_health session, but never the most recent one.  I’ve turned on Trace Flag 1222 and can get the information that way, so why doesn’t this work?”

The reality of the situation is that the events are actually there; you just can’t see them because of a limitation of the sys.dm_xe_session_targets DMV.  The target_data column of this DMV can only output roughly 4MB of XML data. The 4MB formatted XML limitation of the DMV was explained by Bob Ward on the CSS SQL Server Engineers blog post You may not see the data you expect in Extended Event Ring Buffer Targets…. back in 2009.  To demonstrate the effect of this limitation, let’s look at the number of events contained in the ring_buffer target for the system_health event session on a SQL Server 2012 SP1+CU7 server that I have permission to share the information from, using the following query.

SELECT 
    ring_buffer_event_count, 
    event_node_count, 
    ring_buffer_event_count - event_node_count AS events_not_in_xml
FROM
(   SELECT target_data.value('(RingBufferTarget/@eventCount)[1]', 'int') AS ring_buffer_event_count,
        target_data.value('count(RingBufferTarget/event)', 'int') AS event_node_count
    FROM 
    (   SELECT CAST(target_data AS XML) AS target_data  
        FROM sys.dm_xe_sessions AS s
        INNER JOIN sys.dm_xe_session_targets AS st 
            ON s.address = st.event_session_address
        WHERE s.name = N'system_health'
            AND st.target_name = N'ring_buffer'
    ) AS n
) AS t;

(Results of the query above: 5000 events reported by the target, but only 3574 event nodes present in the XML output.)

The ring_buffer_event_count comes from the eventCount attribute maintained in the root RingBufferTarget element of the XML document returned, and the event_node_count is just a count of the RingBufferTarget/event nodes contained in that same document.  Here you can see that there are a total of 5000 events contained in the ring_buffer target, based on the MAX_EVENTS_LIMIT option, new in SQL Server 2012, which is set to 5000 for the system_health session. Out of those, only 3574 are actually being output in the XML document from the DMV.  That leaves us with 1426 events that are not visible but still exist in memory. The XML document outputs the events oldest first, so the events that we can’t see are actually the most recently collected events, and they are the ones that we’d probably want to actually have access to.
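If you want to confirm what MAX_EVENTS_LIMIT (or MAX_MEMORY) a session’s ring_buffer target was actually created with, the configured target options are exposed through the event session catalog views; a sketch against system_health:

SELECT s.name  AS session_name,
       t.name  AS target_name,
       f.name  AS option_name,
       f.value AS option_value
FROM sys.server_event_sessions AS s
INNER JOIN sys.server_event_session_targets AS t
    ON s.event_session_id = t.event_session_id
INNER JOIN sys.server_event_session_fields AS f
    ON t.event_session_id = f.event_session_id
   AND t.target_id = f.object_id
WHERE s.name = N'system_health'
  AND t.name = N'ring_buffer';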

We can look at the memory usage for the target using the query below to compare the binary data size maintained within the target, versus the size of the formatted XML being output by the DMV.

SELECT 
    target_data.value('(RingBufferTarget/@memoryUsed)[1]', 'int') AS buffer_memory_used_bytes,
    ROUND(target_data.value('(RingBufferTarget/@memoryUsed)[1]', 'int')/1024., 1) AS buffer_memory_used_kb,
    ROUND(target_data.value('(RingBufferTarget/@memoryUsed)[1]', 'int')/1024/1024., 1) AS buffer_memory_used_MB,
    DATALENGTH(target_data) AS xml_length_bytes,
    ROUND(DATALENGTH(target_data)/1024., 1) AS xml_length_kb,
    ROUND(DATALENGTH(target_data)/1024./1024,1) AS xml_length_MB
FROM (
SELECT CAST(target_data AS XML) AS target_data  
FROM sys.dm_xe_sessions as s
INNER JOIN sys.dm_xe_session_targets AS st 
    ON s.address = st.event_session_address
WHERE s.name = N'system_health'
 AND st.target_name = N'ring_buffer') as tab(target_data)

(Results of the query above: roughly 1.7MB of memory used by the target versus roughly 4.7MB of serialized XML output.)

Here we can see that the memory usage is roughly 1.7MB of binary data, but once serialized to formatted XML, the length of the document is roughly 4.7MB, which is larger than the documented 4MB output size, but only slightly.  This is the problem with the nature of the data that can be generated by Extended Events: it can be very compact in binary format, but serializing it to formatted XML adds a lot of overhead to the events that have been collected.  The system_health event session is specifically prone to having the maximum number of events, 5000, collected by the ring_buffer while only a smaller subset of them can actually be output by the DMV, due to the sizes of the events that get collected by that event session.  The events most likely to have a fairly large size are the sp_server_diagnostics_component_output and the xml_deadlock_report events, since the size of the XML returned by both of those events depends on the specific conditions under which they fired.  I’ve seen sp_server_diagnostics_component_output events that are over 512KB in size on busy servers when problems occur, which would significantly limit the data that could be returned by the DMV for the ring_buffer target when one of those events is contained in the output XML.
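One way to see which events are eating up the serialized output is to measure the size of each event node in the XML the DMV returns; a rough sketch against the system_health ring_buffer:

SELECT n.value('@name', 'varchar(128)') AS event_name,
       DATALENGTH(CAST(n.query('.') AS NVARCHAR(MAX))) AS serialized_size_bytes
FROM
(   SELECT CAST(target_data AS XML) AS target_data
    FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS st
        ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
        AND st.target_name = N'ring_buffer'
) AS t
CROSS APPLY t.target_data.nodes('RingBufferTarget/event') AS q(n)
ORDER BY serialized_size_bytes DESC;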

No UI Support

This, for me, is probably the biggest reason that I don’t deal with the ring_buffer target anymore in my day-to-day work.  The UI for Extended Events doesn’t support breaking down the information contained in the ring_buffer target at all.  The only thing the UI does is show the XML that was output by the target_data column of the sys.dm_xe_session_targets DMV.

(Screenshot: the Extended Events UI showing only the raw target_data XML for the ring_buffer target.)

This means that to consume the data you have to either open the XML and scan down through the events, or write XQuery to parse the XML into tabular form, which requires that you know the events, columns, and actions that were used in the event session’s definition to really access the data. If I am doing short-term data collection and don’t want it persisted to a file in the file system on SQL Server 2012, I just opt for the Live View to stream the data back into the grid, where I don’t have to deal with the XML and can rapidly find the information I’m interested in.  For anything long term, or even for looking at information from the system_health event session, I use the file_target, which the UI can read and process the events from, eliminating the need to do any XQuery manually.
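For comparison, reading the system_health file target from T-SQL is a single function call, and the UI can open the same .xel files directly; a sketch that assumes the default file location under the instance’s LOG directory:

-- The wildcard picks up all rollover files for the session
SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL);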

Watch the Memory Usage

This is a new one to me, but it is something to be incredibly careful about in the way the ring_buffer target gets configured on a production server.  Two weeks ago, Andy Galbraith (Blog|Twitter) ran into an issue on a server where all connections were receiving 701 Insufficient System Memory errors. During his analysis Andy realized that the MEMORYCLERK_XE memory clerk in SQL Server was consuming close to 10GB of memory on a server with 16GB of total RAM and ‘max server memory’ configured at 11000MB.  The problem was an event session that had been created with a MAX_EVENTS_LIMIT of 1,000,000 events but with no MAX_MEMORY configuration on the target.  Since the memory usage of the target is based on the events that have been captured, and no maximum was set, it can essentially use an unlimited amount of memory, leading to problems on the server if memory is limited, as it was on the server Andy was troubleshooting.
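If you do choose to use a ring_buffer target anyway, cap it explicitly so it cannot grow unbounded under MEMORYCLERK_XE; a hedged example definition (the session name and event are purely illustrative):

CREATE EVENT SESSION [TrackErrorsExample] ON SERVER
ADD EVENT sqlserver.error_reported
    (WHERE ([severity] >= 16))
ADD TARGET package0.ring_buffer
    -- MAX_MEMORY is specified in KB, so this caps the target at 4MB and 5,000 events
    (SET MAX_MEMORY = 4096, MAX_EVENTS_LIMIT = 5000)
WITH (STARTUP_STATE = OFF);
GO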