I’ve been doing a bit of work on Service Broker while building a series of new Pluralsight courses and one of the things I recently found a good use for Service Broker on a client system with a 8TB VLDB was for implementing parallel maintenance tasks using multiple threads. The idea for this implementation came out of the fact that their maintenance window continues to become shorter and shorter the larger their workload increases, while the size of their database continues to get larger and larger. One of the things I noticed while doing a monthly health check of their server was that during the maintenance window, their hardware is no where close to being utilized and there are plenty of resources available due to the serial execution of their maintenance tasks. This specific client uses Ola Hallengren’s maintenance procedures, which we recommend to all of our clients, so I started to think about ways I could make things run in parallel to get better utilization of the hardware during the maintenance window and hopefully reduce the time required to perform the day to day maintenance in their database.
I wish I could say the first thing I thought of was Service Broker, but it wasn’t I toyed around with the idea of a custom C# app to run Ola’s scripts and log the output commands to the CommandLog table in master, then use multiple threads to run each of the individual tasks, something similar using Powershell instead of a compiled app, and even multiple agent jobs to break things up using different parameters for IndexOptimize. It turns out that these are all over engineering a very basic problem that Service Broker is highly suited towards solving and it only took a little bit of trial and error to create a very robust solution that runs index and statistics maintenance tasks in parallel in SQL Server.
Setting up Service Broker Components
I chose to implement the Service Broker components for this configuration in the MSDB database for a couple of reasons. First, it is already enabled for Service Broker, so I don’t have to worry about that, second, it is also already TRUSTWORTHY ON which is required by Agent, and third it is owned by SA, so an activation stored procedure marked with EXECUTE AS OWNER, runs as a full sysadmin in the server allowing it to run any maintenance command generated by Ola’s scripts. We need to configure some basic objects for Service Broker in the database; a message type, a contract, a queue and a service.
USE msdb GO -- Create the message types CREATE MESSAGE TYPE [OlaHallengrenMaintenanceTaskMessage] VALIDATION = WELL_FORMED_XML; GO -- Create the contract CREATE CONTRACT [OlaHallengrenMaintenanceTaskContract] ([OlaHallengrenMaintenanceTaskMessage] SENT BY INITIATOR); GO -- Create the target queue and service CREATE QUEUE OlaHallengrenMaintenanceTaskQueue; GO CREATE SERVICE [OlaHallengrenMaintenanceTaskService] ON QUEUE OlaHallengrenMaintenanceTaskQueue ([OlaHallengrenMaintenanceTaskContract]); GO
Building an Activation Procedure to Automate Processing
With that setup, now we need to build an activation stored procedure to process the messages from the queue and perform the operations generated by Ola’s scripts. At first I thought that this would be pretty straightforward, but it actually required a fair amount of additional logic to prevent poison messages and queue deactivation from occurring when multiple online index rebuilds were attempted on the same table. When this occurs the engine raises Error 1912:
Could not proceed with index DDL operation on %S_MSG ‘%.*ls’ because it conflicts with another concurrent operation that is already in progress on the object. The concurrent operation could be an online index operation on the same object or another concurrent operation that moves index pages like DBCC SHRINKFILE.
which causes the XACT_STATE() of the operation to become –1, which only allows the transaction processing the command to be rolled back, leading to a poison message within Service Broker. So to mitigate against that I had to add in a check against sys.dm_tran_locks for any ALTER_INDEX command to identify if a conflicting lock is currently being held and if it is then we just requeue the message and increment the retry count held within the message so that we can incrementally back-off attempting to run the command each time it encounters a conflicting lock using a WAITFOR. Since Ola’s scripts log the command type, database name, object name, and command to be executed to the CommandLog table in the master database, all we need to queue in our message is the ID of the command to be processed by Service Broker, and we can get the rest of the pertinent information from the CommandLog table directly. As a part of each tasks execution in our activation stored procedure we will also update the StartTime and EndTime of the operation for historical trending of execution durations.
IF OBJECT_ID('OlaHallengrenMaintenanceTaskQueue_ActivationProcedure') IS NOT NULL BEGIN DROP PROCEDURE OlaHallengrenMaintenanceTaskQueue_ActivationProcedure; END GO CREATE PROCEDURE OlaHallengrenMaintenanceTaskQueue_ActivationProcedure AS DECLARE @conversation_handle UNIQUEIDENTIFIER; DECLARE @message_body XML; DECLARE @message_type_name SYSNAME; DECLARE @Command NVARCHAR(MAX); DECLARE @ID INT DECLARE @DBName SYSNAME; DECLARE @ObjectName SYSNAME; DECLARE @CommandType NVARCHAR(60); DECLARE @Retry INT; DECLARE @FQN NVARCHAR(400); WHILE (1=1) BEGIN BEGIN TRANSACTION; WAITFOR ( RECEIVE TOP(1) @conversation_handle = conversation_handle, @message_body = message_body, @message_type_name = message_type_name FROM OlaHallengrenMaintenanceTaskQueue ), TIMEOUT 5000; IF (@@ROWCOUNT = 0) BEGIN ROLLBACK TRANSACTION; BREAK; END IF @message_type_name = N'OlaHallengrenMaintenanceTaskMessage' BEGIN SELECT @ID = @message_body.value('(CommandLogID)', 'int'), @Retry = ISNULL(@message_body.value('(CommandLogID/@retry)', 'int'), 0); SELECT @Command = Command, @ObjectName = ObjectName, @DBName = DatabaseName, @FQN = QUOTENAME(DatabaseName) + '.' + QUOTENAME(SchemaName) + '.' + QUOTENAME(ObjectName), @CommandType = CommandType FROM master.dbo.CommandLog WHERE ID = @ID; -- Check for Index rebuilds if one is already running and requeue the request after waiting. IF @CommandType = 'ALTER_INDEX' BEGIN -- Check if we have an incompatible lock that would lead to a failed execution IF EXISTS (SELECT 1 FROM sys.dm_tran_locks AS tl WHERE (request_mode = 'SCH-M' OR -- Concurrent maintenance task doing UpdateStats? (request_mode = 'LCK_M_SCH_S' AND EXISTS (SELECT 1 FROM sys.dm_exec_sessions AS s WHERE is_user_process = 1 AND tl.request_session_id = s.session_id))) AND resource_associated_entity_id = OBJECT_ID(@FQN) AND resource_database_id = DB_ID(@DBName) ) BEGIN -- Wait for 5 seconds times the number of retrys to do an incremental backoff -- This will eventually cause all queue readers to die off and serial execution of tasks DECLARE @Delay NVARCHAR(8) = CAST(DATEADD(ss, @Retry*5, CAST('00:00:00'AS TIME)) AS VARCHAR) WAITFOR DELAY @Delay -- Increment retry count in the message SELECT @message_body = N'<CommandLogID retry="'+CAST(@Retry+1 AS NVARCHAR)+'">'+CAST(@id AS NVARCHAR)+N'</CommandLogID>'; -- Send the message back to the queue for later processing ;SEND ON CONVERSATION @conversation_handle MESSAGE TYPE [OlaHallengrenMaintenanceTaskMessage] (@message_body); GOTO SkipThisRun END END UPDATE master.dbo.CommandLog SET StartTime = GETDATE() WHERE ID = @ID; BEGIN TRY EXECUTE(@Command); UPDATE master.dbo.CommandLog SET EndTime = GETDATE() WHERE ID = @ID; END TRY BEGIN CATCH UPDATE master.dbo.CommandLog SET EndTime = GETDATE(), ErrorMessage = ERROR_MESSAGE(), ErrorNumber = ERROR_NUMBER() WHERE ID = @ID; END CATCH END CONVERSATION @conversation_handle; END -- If end dialog message, end the dialog ELSE IF @message_type_name = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog' BEGIN END CONVERSATION @conversation_handle; END -- If error message, log and end conversation ELSE IF @message_type_name = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' BEGIN DECLARE @error INT; DECLARE @description NVARCHAR(4000); -- Pull the error code and description from the doc WITH XMLNAMESPACES ('http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb) SELECT @error = @message_body.value('(//ssb:Error/ssb:Code)', 'INT'), @description = @message_body.value('(//ssb:Error/ssb:Description)', 'NVARCHAR(4000)'); RAISERROR(N'Received error Code:%i Description:"%s"', 16, 1, @error, @description) WITH LOG; -- Now that we handled the error logging cleanup END CONVERSATION @conversation_handle; END SkipThisRun: COMMIT TRANSACTION; END GO
Enabling Parallel Queue Activation
The last thing we need to do is bind our stored procedure to the queue for automated activation and to set the MAX_QUEUE_READERS to the number of concurrent parallel tasks we want to have executing on the system at any given point in time. I spent some time testing different values for MAX_QUEUE_READERS and found that 5 was generally sufficient to provide significant reductions in runtimes without creating contention between the tasks being executed.
-- Alter the target queue to specify internal activation ALTER QUEUE OlaHallengrenMaintenanceTaskQueue WITH ACTIVATION ( STATUS = ON, PROCEDURE_NAME = OlaHallengrenMaintenanceTaskQueue_ActivationProcedure, MAX_QUEUE_READERS = 5, EXECUTE AS SELF ); GO
Now all we have to do is setup a job to feed the tasks into the queue as messages and Service Broker will automatically begin processing the tasks in parallel up to the MAX_QUEUE_READERS configured value. To do this, I just used Ola’s scripts to log the commands to the CommandLog table but not execute them and then using a cursor and ROW_NUMBER() definition to split the work up across different tables instead of queuing the commands in the order generated, send each ID as a message to Service Broker to process.
DECLARE @MaxID INT SELECT @MaxID = MAX(ID) FROM master.dbo.CommandLog; SELECT @MaxID = ISNULL(@MaxID, 0) -- Load new tasks into the Command Log EXECUTE master.dbo.IndexOptimize @Databases = 'AdventureWorks2014', @FragmentationLow = 'INDEX_REBUILD_OFFLINE', @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @UpdateStatistics = 'ALL', @FragmentationLevel1 = 5, @FragmentationLevel2 = 30, @LogToTable = 'Y', @Execute = 'N'; DECLARE @NewMaxID INT SELECT @NewMaxID = MAX(ID) FROM master.dbo.CommandLog; USE msdb; DECLARE @id int; -- Don't submit commands in exact command order or parallel processing -- of indexes/stats on same object will occur and could block DECLARE command_cursor CURSOR FOR SELECT ID FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY ObjectName ORDER BY COALESCE(IndexName, StatisticsName), CommandType) AS Ranking, ID FROM master.dbo.CommandLog WHERE ID > @MaxID AND ID <= @NewMaxID) AS t ORDER BY Ranking OPEN command_cursor FETCH NEXT FROM command_cursor INTO @id WHILE @@FETCH_STATUS = 0 BEGIN -- Begin a conversation and send a request message DECLARE @conversation_handle UNIQUEIDENTIFIER; DECLARE @message_body XML; BEGIN TRANSACTION; BEGIN DIALOG @conversation_handle FROM SERVICE [OlaHallengrenMaintenanceTaskService] TO SERVICE N'OlaHallengrenMaintenanceTaskService' ON CONTRACT [OlaHallengrenMaintenanceTaskContract] WITH ENCRYPTION = OFF; SELECT @message_body = N'<CommandLogID>'+CAST(@id AS NVARCHAR)+N'</CommandLogID>'; SEND ON CONVERSATION @conversation_handle MESSAGE TYPE [OlaHallengrenMaintenanceTaskMessage] (@message_body); COMMIT TRANSACTION; -- Get the next command to run FETCH NEXT FROM command_cursor INTO @id END CLOSE command_cursor; DEALLOCATE command_cursor; GO
I originally started out processing the records in the order generated, but I found that I would hit blocking with statistics updates on the same table until Trace Flag 7471 was enabled for the instance to change the locking behavior in SQL Server 2014 SP1+CU6. However, if you don’t have access to the trace flag, by randomizing the order of objects as they are queued, only the tables with a large number of indexes or statistics will have problems running parallel and only at the end of the maintenance tasks since the threads will be working through different tables by the order of queuing. This has resulted in 30-45% reductions in maintenance times depending on the database schema and the types of maintenance that have to be performed so far in testing.