Yesterday’s post, Targets Week – asynchronous_file_target, looked at the asynchronous_file_target Target in Extended Events and how it outputs the raw Event data in an XML document. Continuing with Targets week today, we’ll look at the bucketizer targets in Extended Events which can be used to group Events based on the Event data that is being returned.
What is the bucketizer?
The bucketizer performs grouping of Events as they are processed by the target into buckets based on the Event data and the Targets configuration. There are two bucketizer targets in Extended Events; a synchronous_bucketizer and an asynchronous_bucketizer. The only difference between the two is the manner in which the Event data is processed; either synchronously on the connection that generated the Event, or asynchronously after being dispatched to the target based on the MAX_DISPATCH_LATENCY for the Event Session, or when the dispatch buffer becomes full. Since the two bucketizers are identical in every way, except for their processing, this blog post will use the asynchronous_bucketizer for all further references. The bucketizers are a memory resident target, similar to the ring_buffer and like the ring_buffer, only contain the grouped Event data when the Event Session is active. When the Event Session is stopped, the memory buffers allocated to the bucketizer target are freed and all data contained in the target disappears. The bucketizer targets can be used to simplify troubleshooting by identifying the events that are occurring the most, and then allowing more focused Event collection for further analysis. Further analysis could include using either the ring_buffer or the asynchronous_file_target to look at the actual Event data being generated, or changing the bucketizer Targets configuration to group event occurrences based on a different criteria.
The ring_buffer like most of the targets has configuration options that can be found in the sys.dm_xe_object_columns DMV.
-- Target Configurable Fields SELECT oc.name AS column_name, oc.column_id, oc.type_name, oc.capabilities_desc, oc.description FROM sys.dm_xe_packages AS p JOIN sys.dm_xe_objects AS o ON p.guid = o.package_guid JOIN sys.dm_xe_object_columns AS oc ON o.name = oc.OBJECT_NAME AND o.package_guid = oc.object_package_guid WHERE (p.capabilities IS NULL OR p.capabilities & 1 = 0) AND (o.capabilities IS NULL OR o.capabilities & 1 = 0) AND o.object_type = 'target' AND o.name = 'asynchronous_bucketizer'
In SQL Server 2008, 2008R2, and SQL Server Denali CTP1, there are four configuration options for the asynchronous_bucketizer Target. The slots option sets the maximum number of buckets the target will collect. Once this number of buckets is reached, new events that do not apply to an existing bucket are dropped by the target and not grouped. The filtering_event_name option is used to set the name of the specific Event in the Event Session to filter on. The source_type option is used to specify whether the source being used for bucketing is a part of the Event data or an Action that has been added to the Events contained in the Event Session. The source option specifies the source that will be used to generate the buckets for grouping in the target.
As shown above the source is the only required option for the asynchronous_bucketizer Target. However, when the source is an Action the source_type option is also required to specify that the source is an Action. When using one of the Event Data elements as the source, only the Data element (also known as a column name) needs to provided to the source. When using an Action for the source, the Package name must be specified along with the Action name in the format of packagename.actionname. Likewise when specifying a filtering_event_name, the Package name must also be provided in the format of packagename.eventname.
Understanding the Target Data Format
The bucketizer Targets like the other Targets already output the data in XML format, and the XML is not schema bound, but has a predictable format. Inside the Extended Events Engine, the bucketing data is maintained in a binary format that minimizes the amount of memory necessary for the Targets memory buffers. The bucketing data is materialized into an XML document when the Target information is queried using the sys.dm_xe_session_targets DMV, allowing it to be used for analysis. The asynchronous_bucketizer XML document contains a parent XML <BucketizerTarget> Node that contains attributes about the Targets operation since the Event Session was started including the number of truncated Events and the maximum number of buckets contained in the Target. The bucket groups are contained in <Slot> nodes that have two attributes; the count is the number of events that have occurred and the trunc is the number of bytes that have been truncated. The <Slot> node contains a <value> node that contains the source that the bucket belongs to. A simplified representation of the XML document for the asynchronous_bucketizer target is:
<BucketizerTarget truncated="" buckets=""> <Slot count="" trunc=""> <value></value> </Slot> </BucketizerTarget>
Querying/Parsing the Target Data
The asynchronous_bucketizer targets simplistic XML output makes querying it relatively simple compared to the targets that we’ve already looked at this week. However, unlike the other ring_buffer and asychronous_file_target, the asychronous_bucketizer can not be parsed using Adam Machanic’s Extended Events Code Generator. The simplicity of the XML and its standard output doesn’t really require specialized code to generate a easily usable table output for this. The Extended Events SSMS Addin for SQL Server 2008 TargetDataViewer will shred the XML but its not even worth using for this particular target, since the XQuery is very simple, and you can do a lot more with the TSQL depending on the Event Session that your create. To demonstrate the usage of the asynchronous_bucketizer, we’ll look at a couple of examples. The first example will show how to track recompiles by database_id to find the databases that have the most recompiles occurring.
-- Create an Event Session to Track Recompiles IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='BucketizerTargetDemoRecompiles') DROP EVENT SESSION [BucketizerTargetDemoRecompiles] ON SERVER; CREATE EVENT SESSION [BucketizerTargetDemoRecompiles] ON SERVER ADD EVENT sqlserver.sql_statement_starting ( ACTION (sqlserver.database_id) -- database_id to bucket on WHERE (state=1) -- recompile state from dm_xe_map_values ), ADD EVENT sqlserver.sp_statement_starting ( ACTION (sqlserver.database_id) -- database_id to bucket on WHERE (state=1) -- recompile state from dm_xe_map_values ) ADD TARGET package0.asynchronous_bucketizer ( SET source_type=1, -- specifies bucketing on Action source='sqlserver.database_id' -- Action to bucket on ) WITH (MAX_DISPATCH_LATENCY = 5 SECONDS) GO ALTER EVENT SESSION [BucketizerTargetDemoRecompiles] ON SERVER STATE=START
The above session collects the sql_statement_starting and sp_statement_starting Events, adds the database_id Action to the Event so that we can bucket on it, and then filters the Events to only fire if the state for the Event matches the map_key in sys.dm_xe_map_values for Recompile. If the server being tested on doesn’t have a high recompile rate, an easy way to trigger Recompiles is to update the statistics on the tables inside of a database.
EXECUTE sp_MSforeachtable 'UPDATE STATISTICS ?'
To view the bucketized data from the target, we query sys.dm_xe_session_targets for our session and target using CAST to convert the target_data to XML in a derived table, and then using a CROSS APPLY of the .node() method to split on the <Slot> nodes.
SELECT DB_NAME(n.value('(value)', 'int')) AS DatabaseName, n.value('(@count)', 'int') AS EventCount, n.value('(@trunc)', 'int') AS EventsTrunc FROM (SELECT CAST(target_data as XML) target_data FROM sys.dm_xe_sessions AS s JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address WHERE s.name = 'BucketizerTargetDemoRecompiles' AND t.target_name = 'asynchronous_bucketizer') as tab CROSS APPLY target_data.nodes('BucketizerTarget/Slot') as q(n)
With the <Slot> nodes split, pulling the <value> node and attributes is very simple, and since we bucketed on database_id, we can use the DB_NAME() function in SQL to return the database name associated with the database_id in the <value> node.
Considerations for Usage
The bucketizer targets are great for simplifying analysis of Event data to determine who to best proceed with further troubleshooting. However, in SQL Server 2008, and 2008R2 a bug exists that causes incorrect output from the bucketizers when used to bucket on the wait_info event wait_type Data element. This was fixed in SQL Server 2008 Service Pack 2 (http://support.microsoft.com/kb/2285068), and is not a problem in SQL Server Denali CTP1, but as of this writing has yet to be corrected in SQL Server 2008 R2 (at least the CU’s I have tested, there may be a newer one that I have missed, but I didn’t find one in a search). To demonstrate this problem the following Event Session can be used:
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='BucketizerTargetDemoWaits') DROP EVENT SESSION [BucketizerTargetDemoWaits] ON SERVER; CREATE EVENT SESSION [BucketizerTargetDemoWaits] ON SERVER ADD EVENT sqlos.wait_info ( ACTION (sqlserver.database_id) WHERE (duration > 0)) ADD TARGET package0.asynchronous_bucketizer( SET filtering_event_name='sqlos.wait_info', source_type=0, source='wait_type') WITH (MAX_DISPATCH_LATENCY = 5 SECONDS) GO ALTER EVENT SESSION [BucketizerTargetDemoWaits] ON SERVER STATE=START
The above Event Session will return valid map_key values for the wait_types Map in sys.dm_xe_map_values on SQL Server 2008 Service Pack 2 and SQL Server Denali CTP1, but will have erroneous information in the <value> node on SQL Server 2008 RTM and SP1 and SQL Server 2008 R2. To query the bucketed waits from the target, use the following query:
SELECT mv.map_value AS WaitType, n.value('(@count)', 'int') AS EventCount, n.value('(@trunc)', 'int') AS EventsTrunc, n.value('(value)', 'int') AS MapKey FROM (SELECT CAST(target_data as XML) target_data FROM sys.dm_xe_sessions AS s JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address WHERE s.name = 'BucketizerTargetDemoWaits' AND t.target_name = 'asynchronous_bucketizer') as tab CROSS APPLY target_data.nodes('BucketizerTarget/Slot') as q(n) JOIN sys.dm_xe_map_values as mv ON mv.map_key = n.value('(value)', 'int') WHERE mv.name = 'wait_types'
Well, at this point we are nearly half way through the Targets in Extended Events, and tomorrow we’ll continue our investigation by looking at the synchronous_event_counter, which can be used to help determine the impact an Event Session may have without having to perform full Event collection.