When moving from SQL Server 2008+ to SQL Server 2012, be aware that the sampling algorithm has changed.

For example, I created a partitioned clustered index on two identical tables (same schema, rows and distribution), one in SQL Server 2008 R2 SP1 and the other in SQL Server 2012 RC0. 

Below shows the partial output (STAT_HEADER) of a DBCC SHOW_STATISTICS on SQL Server 2008 R2 SP1:

clip_image001

All rows were evaluated for the statistics generation.

Now compare this to SQL Server 2012 RC0:

clip_image002

SQL Server 2012 now uses default sampling for the partition creation and rebuild.

Is this good or bad?

Certainly if you’ve required a full scan in the past in order to get higher quality stats, this may not be a good thing.  Otherwise you may not notice anything at all from a query performance perspective if the default is good (or accurate) enough.

You can see the number of steps decreased in my examples in this post between 2008 R2 SP1 to SQL Server 2012.  My histograms for this identical table also changed of course.  For example, moving from average rows on one step of 5504 down to 5305, distinct range rows from 343 up to 361... The real question is, will these changes be enough to turn a good plan bad. 

Post-upgrade, if you see changes in query performance against partitioned tables, evaluate the histograms and test performance of the workload after the default sampling versus after a full scan update of the statistics (via UPDATE STATISTICS for example).  

I was interested in exploring various questions about columnstore indexing this morning – and I realized I should probably blog about what I observed.  This truly was just an exploration and not a formal test (call it semi-structured fiddling).  Also, some of my “questions” were really just confirmations.  While I believe what I read, I like to see things firsthand whenever possible.

Today I thought I would take a few minutes to explore columnstore index behavior on a 123,695,104 row fact table.  My previous tests had been limited to < 10 million rows, which doesn’t really showcase columnstore index capabilities.  So I thought I would experiment with higher numbers – although I still want to start the billion row testing scenarios at some point.

The experimentations took place on a SQL Server instance capped at 8GB of RAM max server memory and 8 logical processors.  I used the FactInternetSales table from AdventureWorksDWDenali – increasing it to 123,695,104 rows. 

I covered the entire fact table for my initial exploration (although as you’ll see, I made a few changes both to this index and also moving from a heap table to a clustered index):

CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex-20120225-092018] ON [dbo].[FactInternetSales]

(

       [ProductKey],

       [OrderDateKey],

       [DueDateKey],

       [ShipDateKey],

       [CustomerKey],

       [PromotionKey],

       [CurrencyKey],

       [SalesTerritoryKey],

       [SalesOrderNumber],

       [SalesOrderLineNumber],

       [RevisionNumber],

       [OrderQuantity],

       [UnitPrice],

       [ExtendedAmount],

       [UnitPriceDiscountPct],

       [DiscountAmount],

       [ProductStandardCost],

       [TotalProductCost],

       [SalesAmount],

       [TaxAmt],

       [Freight],

       [CarrierTrackingNumber],

       [CustomerPONumber],

       [OrderDate],

       [DueDate],

       [ShipDate]

) WITH (DROP_EXISTING = OFF)

GO

Remember that we can have up to 1,024 columns and that the concept of key columns and INCLUDE don’t apply here.  Nor does the ASC or DESC keywords or creation of the columnstore index as clustered (only nonclustered is supported at this point). 

Regarding the data types, we can cover the standard “business” ones – but not the “max”, LOB and xml flavors.

<quick aside> By the way – a shout out to Eric Hanson’s Columnstore Index references on the TechNet Wiki.  These have been invaluable in understanding columnstore indexing and also revealing tools to help fish for your own answers.  I love seeing this kind of high quality content coming directly from the Program Managers.  I really hope that more PMs start updating their associated feature areas on TechNet Wiki in the future.  When I first heard about TechNet Wiki I was more than a little dubious.  We already have blogs, forums, BOL, papers, videos, etc.  Why do we need another information channel?  Seeing Eric’s content has totally shifted my opinion on this.  The information is timely, fresh (you see it get updated frequently) and easily discoverable.  It also allows for community participation and questions – which, if managed well – only helps the associated quality of the content.   I don’t believe TechNet Wiki replaces BOL or other information channels, but I feel it has definitely earned a seat at the table.  Again, I hope other PMs follow this lead – or alternatively use their blogs to heavily describe their feature areas (much like the great blog post series on the AlwaysOn Readable Secondary feature from Sunil Agarwal).   Just doing periodic updates on their pet features can help do wonders for feature adoption and comprehension. </quick aside>

Back on task… The following are the various questions I asked and the various observations made….

Was tempdb used in creating the columnstore index on a heap table?

I intentionally had tempdb configured to an initial small size with auto-growth enabled in order to see if it was used at all during the CREATE COLUMNSTORE INDEX against the 123 million row table (I could have looked through counters and other ways too – this was just a side effect observation). The answer? It was not used (staying at a very small value of 10 MBs).

How do things look in sys.indexes and sys.index_columns?

Nothing terribly interesting - except that the is_included_column is a value of “1”:

clip_image002

Anything interesting in sys.column_store_dictionaries?

Most definitely… This was a pretty illuminating catalog to query.  A few of the observations:

·        There were 149 rows across the 26 columns defined in the index.

·        25 columns had one row – and 1 column had 125 rows.  Not surprisingly – the SalesOrderNumber was the column with 125 entries.  This is also the column I populated with a random GUID that I converted to nvarchar(20) for new entries to the table.  The values for this column, while not constrained, were nearly unique in my test (with the exception of the base data).

·        The on disk (dictionary) storage for SalesOrderNumber was by far the worst (not surprisingly).  It was 2,328,411,337 in bytes.  Compare that with 21,384 bytes for an integer column, 74,584 bytes for a datetime column and 1,088 bytes for a money data type column.  So this would point to a design decision around not including 100% unique values as part of the index – or if you do, choosing a much more compression friendly data type mapped to an associated dimension table.

·        The entry_count column_store_dictionaries column was also very interesting.  For example, the int data type ProductKey had 158 entries.  The ShipDate datetime column had 1,126 entries.  The SalesOrderNumber had 123,662,365 entries. No magic here (and the difference between those entries and the 123,695,104 row count was that the initial data populated didn’t use my random GUID values).

·        The type column from the results told me about the type of dictionary… For the columns with “1”, that maps to hash dictionary for integer based columns.  A type of “4” represented my float columns. Type “3” represented the string values.

·        The flags (“internal use only”) column from column_store_dictionaries was “0” for all columns except for UnitPriceDiscountPct (float), DiscountAmount (float), OrderDate/DueDate/ShipDate (datetime).  The flags column was “2” for the CustomerPONumber (nvarchar(25)) and CarrierTrackingNumber (nvarchar(25)).  Flags was 0 for everything else, so I’m definitely curious about this.

Before I continued, I dropped and recreated the columnstore index, this time without the SalesOrderNumber.    It wasn’t going to provide any value for my next set of explorations.

With the SalesOrderNumber column removed, leaving the other columns on the columnstore index, what was the on-disk footprint?

310,400 bytes for the dictionary size – for 123 million rows (per sys.column_store_dictionaries).  I double checked sys.dm_db_partition_stats as well and the lob_used_page_count was 304,579.  So the page count is around 2.3 GB – and the dictionary size is 310 KB.  Again, this is for 123,695,104 rows.  The heap itself is 2,876,461 pages – or roughly 22.4 GB.

Anything interesting in sys.column_store_segments?

Another great catalog view to explore with many interesting ways to slice and dice this data:

·        All 26 columns had 120 segments associated with them. But what is interesting is that this is for 26 columns – even though my columnstore index was recreated with just 25 columns.  So it was segmenting based on each column of the heap (seemingly).  If you enable trace flag 646 and 3605, you’ll see during index creation a message like “Column 25 was omitted from VertiPaq during column index build.” So it made me wonder if this really did represent 120 segments for the omitted column?

·        The average row count per segment was 1,030,792 rows and a maximum of 1,048,576. 

·        The minimum rows for a segment was 2,416. 

·        The 2,416 row count was for all columns on segment number 118. This was not the last segment.  Segment 119 was the last segment (first segment is 0-based) for each column and each one of these had 1,040,352.  Eric Hanson gave an explanation on a forum which talks about segments being built on parallel threads and when hitting the end of the table, not having enough rows to completely fill segments.

·        As for encoding_type, that was also interesting.  Some columns had both an encoding_type of 1 AND 2.  Only the first five columns of the table had an encoding type of “1” and then columns 1 through 25 had an encoding type of “2”.  Columns 21 and 22 (Freight and CarrierTrackingNumber) had an encoding of 3.  Freight is money data type and CarrierTrackingNumber is nvarchar(25).  Column 26 (ShipDate datetime) had an encoding type of “4”.  In BOL – encoding_type is defined as “type of encoding used for that segment”.  Sigh.

·        The min_data_id and max_data_id value rangers were also interesting.  Sometimes they reflected the actual column data ranges (for example in the date columns) and sometimes not.  Looking at “Understanding Segment Elimination” – Eric Hanson describes this behavior – saying that the values in these ranges can also reference “into a dictionary”.

Anything interesting in sys.column_store_index_stats?

The only noteworthy value was the number_of_segments (3,120).   

Can I see some segment elimination for my queries? (for non-string types on a heap table)

Remember that I created a columnstore index on a heap.  Let’s say we look at the segment ranges for the ProductKey column (column id 1).  Here are the associated ranges:

SELECT column_id, min_data_id, max_data_id, segment_id

FROM sys.column_store_segments

WHERE column_id = 1

ORDER BY column_id, min_data_id, max_data_id, segment_id;

All 120 segments have identical min and max ranges:

clip_image003

What about other columns – like the DueDate (datetime)?

clip_image004

Same thing applies – each segment, with the exception of a trailing row, have the same ranges. 

What about segment elimination on a table with a clustered index?

So next I dropped the columnstore index, and created a clustered index on ProductKey.  I’m not saying this is a great choice for the clustered index key – but rather, I’m just trying to understand the behavior in comparison to a heap and also look at any potential segment elimination.

In the following query, I’m pulling total order quantity for three products:

-- Segment elimination written to error log

DBCC TRACEON(3605, -1);

GO

DBCC TRACEON(646, -1);

GO

SELECT ProductKey,

             [DueDateKey],

             SUM(OrderQuantity) Total_OrderQuantity

FROM [dbo].[FactInternetSales]

GROUP BY ProductKey, DueDateKey

HAVING ProductKey IN

             (478, 343, 574);

I first validated that batch mode was indeed being used:

clip_image005

I then looked at the SQL Error Log (per the trace flag) – and I saw the storage engine skipped 28 “row group” (segments):

clip_image007

If I look at the segment meta data though for column_id 1, and I see blocks of segments covering the same ranges (instead of the same ranges across all segments):

SELECT segment_id, min_data_id, max_data_id

FROM sys.column_store_segments

WHERE column_id = 1

ORDER BY segment_id

clip_image009

By the way, the 28 segment elimination I saw in the error log for column 1 translated to 27,226,112 rows across (row_count from sys.column_store_segments).  The query itself takes less than 1 second to return 569 rows – from a 120 million+ row table. 

Will I get segment elimination on other columns not part of the clustered index key?

I tried the following query that filtered on DueDateKey:

SELECT ProductKey,

             [DueDateKey],

             SUM(OrderQuantity) Total_OrderQuantity

FROM [dbo].[FactInternetSales]

GROUP BY ProductKey, DueDateKey

HAVING DueDateKey = 20040522

Nope.  While the query still ran in less than a second – using batch instead of row mode for the columnstore index scan, segment elimination event did not occur.  Looking at the min and max ranges from sys.column_store_segments helps answer why segment elimination wasn’t possible:

SELECT column_id, min_data_id, max_data_id, segment_id

FROM sys.column_store_segments

WHERE column_id = 3 AND

       20040522 BETWEEN min_data_id AND max_data_id

This returns all 120 rows:

clip_image011

What if the DueDateKey is a secondary key column on the clustered index? 

I dropped the columnstore index and recreated the clustered index with ProductKey and DueDateKey.  I then recreated the columnstore index.

Checking the DueDateKey range in sys.column_store_segments, I see the ranges are no longer identical:

clip_image013

That seemed promising, so I executed the query filtering by DueDateKey:

clip_image015

Sure enough – I had segment elimination, even though this was defined as the second column in the clustered index key.  And what’s more, my ProductKey column segment elimination was still working (I tested the earlier queries again).  This raises some interesting questions around new clustered index key strategies when you know your table will primarily be used with a columnstore index.

Okay – that’s enough for today.  I still have many other scenarios I’d like to try out, and I’ll share here when I get the opportunity.

This post idea was prompted by a discussion I had this week with Jonathan Kehayias about an environment that had multiple transactional replication publications defined with overlapping table articles. 

In other words, a table was defined as an article in more than one publication.

While I can think of some cases where you would want to leverage different article options or filters, in this particular case the articles had no differences in how they were defined.  I’ve seen this in other environments in the past – and as I recall it wasn’t a conscious decision, but rather a lack of coordination across application teams and projects.

For small databases with lower volumes of modifications, this overlap could likely go unnoticed.   For larger tables with high amounts of data modifications, well, consider the following scenario:

·        You have two transactional replication publications that each reference the same table as an article.  No other article properties are changed between the two publications and articles.

·        Each publication maps to a single subscriber.

·        Your table article setting for this scenario use the default - propagating INSERTs, UPDATEs and DELETEs via the default statement delivery method (spMSins_ / sp_MSupd_ sp_MSdel) etc. (And while we are propagating changes made directly to the table, we’re not using stored procedure execution articles.)

So let’s say we execute the following single statement batch update against the redundantly published table.  This is one statement that updates 3,120 rows:

UPDATE dbo.charge

SET charge_amt = charge_amt * .97

WHERE provider_no = 386;

If we used sp_replcmds in the publisher database (I had the log reader agent job stopped in order to step through the scenario), how many command transactions would you expect to see marked for replication?

The answer is – 6,240.  One call per row updated, multiplied by two separate publications (and we’re still only in the publication database):

clip_image002

And as you may expect, those 6,240 rows move on to the distribution database (you can validate via  sp_browsereplcmds or MSrepl_commands):

clip_image004

Now had you instead just created ONE publication with that article sent to the two different subscribers, you would see just 3,120 in the publication database for the original update – and 3,120 as well at the distributor prior to multicasting the update to the two subscribers.

Coupled with the already “chatty” nature of transactional replication – you can imagine scenarios where performance rapidly degrades for large batch updates, particularly on already-constrained topologies.

Categories:
Performance | Replication

Last month I wrote a post called Hash Partitioning with SQL Server 2012’s SEQUENCE object and CYCLE argument and I was asked a good question by Eric Humphrey (twitter) about whether there was any advantage of using a sequence object for hash partitioning over an identity/modulus/computed-column technique. 

At that point I hadn’t compared the two techniques, but I wanted to perform a test eventually, so here is what I found…  This was a quick-and-dirty test, by the way, as I really should have been working on something else (first), but I needed a little “fun” time.

Before testing, I made a few minor modifications to the schema used in my original post:

1.      I split the range of the partition function as follows:

 

ALTER PARTITION FUNCTION pfFactInternetSales () SPLIT RANGE (0)

 

2.      I changed the sequence object to be zero-based instead of one-based:

 

CREATE SEQUENCE dbo.Seq_FactInternetSales

    AS int

               START WITH 0

               INCREMENT BY 1

    MINVALUE 0

    MAXVALUE 10

    CYCLE

    CACHE 10;

That is all that was changed with the sequence object technique I used before. As for a modulus technique, I created two additional columns to another version of the FactInternetSales table:

ProductID int NOT NULL IDENTITY(1,1),

               PartitionBucketKey AS ProductID %11  PERSISTED,

 

After adding the columns I referenced the PartitionBucketKey table as the partition key for the clustered index and imported the data from an INSERT…SELECT (as I did with the sequence object method).

 

So did the performance differ between the two techniques?

 

Here are the results of eight separate tests (each executed with a cold data cache for the SELECT, truncated destination table and pre-sized data and log files):

 

Test number

Rows loaded

Buckets

Hash Partition

Method

1

1,207,960 rows

11 buckets

26 seconds

Sequence

2

1,207,960 rows

11 buckets

26 seconds

Sequence

3

1,207,960 rows

11 buckets

29 seconds

Sequence

4

1,207,960 rows

11 buckets

26 seconds

Sequence

5

1,207,960 rows

11 buckets

27 seconds

Modulus

6

1,207,960 rows

11 buckets

24 seconds

Modulus

7

1,207,960 rows

11 buckets

25 seconds

Modulus

8

1,207,960 rows

11 buckets

26 seconds

Modulus

So the modulus method seems to perform ever-so-slightly faster than the sequence method.  Stylistically, the sequence method involved a more direct way to create X number of hash buckets (basically no calculation necessary).  With that said, the execution plans are nearly identical.

Sequence Plan (estimated cost – 17.4357)

clip_image002

Modulus Plan (estimated cost – 17.635)

clip_image004

I’m sure there are other variations of this test that I could try out, but that’s all for now. I’m curious about other techniques people may be using (like a CLR function to do the hashing), so please post them here if you have used an interesting variation on this post’s theme.

Consider the following scenario:

·        You have Transactional Replication deployed

·        Data is flowing, but just not as fast as you would like

·        This scenario could apply to local/remote distributors and push/pull subscribers

There are several different techniques we can use to narrow down where the replication performance issue is happening.  Perhaps you’ve already found that the performance issue is happening for log reader reads or distribution database writes.  Or perhaps you suspect the issue is on the subscriber? 

While the various replication techniques can help us narrow down the lagging member of the topology, I still would like more visibility into why a particular agent read or write process is performing more slowly.  Fortunately, you can do this in SQL Server 2008+…

In the following example, I’ll start by retrieving the session IDs of the log reader and distribution agents (and as an aside my replication topology is SQL Server instance version 10.50.2500):

-- Log Reader

SELECT session_id, program_name,

             reads,

             writes,

             logical_reads

FROM sys.dm_exec_sessions

WHERE original_login_name =

       'SQLSKILLS\SQLskillsLogReaderAG';

-- Distribution Agent

SELECT session_id, program_name,

             reads,

             writes,

             logical_reads

FROM sys.dm_exec_sessions

WHERE original_login_name =

       'SQLSKILLS\SQLskillsDistAGT';

In this example I’m using separate accounts to run the agent executables, however I could have also added a predicate on program_name based on the publication I was interested in evaluating.  For example, I could have said for the Log Reader agent – program_name = ‘Repl-LogReader-0-AdventureWorks2008R2-6’ and for the Distribution agent – that’s more interesting, as we have program_name = ‘CAESAR-AdventureWorks2008R2-Pub_AW_2008R2-AUGUSTUS-1’ (subscriber is AUGUSTUS, publisher is CAESAR).  But if you just used that program name, you won’t get Replication Distribution History session, which would be program_name = ‘Replication Distribution History’ and may also be interesting.

So in my example, I have 5 different sessions I’m interested in (and yours will vary based on the number of published databases, independent agents, server role, etc):

·        The log reader agent was using sessions 55, 57, 59

·        The distribution agent had two sessions (61 for history and 62 for the executable)

Now that I have my session ids, I’m going to create an extended events session that I can run during the “slow performing” period to help illuminate where to investigate next (and for more general discussion on this technique, see Paul Randal’s post “Capturing wait stats for a single operation”):

CREATE EVENT SESSION Replication_AGT_Waits

ON SERVER

ADD EVENT sqlos.wait_info(

       ACTION (sqlserver.session_id)

    WHERE ([package0].[equal_uint64]([sqlserver].[session_id],(55)) OR [package0].[equal_uint64]([sqlserver].[session_id],(57)) OR [package0].[equal_uint64]([sqlserver].[session_id],(59)) OR [package0].[equal_uint64]([sqlserver].[session_id],(61)) OR [package0].[equal_uint64]([sqlserver].[session_id],(62)))),

ADD EVENT sqlos.wait_info_external(

       ACTION (sqlserver.session_id)

    WHERE ([package0].[equal_uint64]([sqlserver].[session_id],(55)) OR [package0].[equal_uint64]([sqlserver].[session_id],(57)) OR [package0].[equal_uint64]([sqlserver].[session_id],(59)) OR [package0].[equal_uint64]([sqlserver].[session_id],(61)) OR [package0].[equal_uint64]([sqlserver].[session_id],(62))))

ADD TARGET package0.asynchronous_file_target

       (SET FILENAME = N'C:\temp\ReplAGTStats.xel',

        METADATAFILE = N'C:\temp\ReplAGTStats.xem')

GO

After creating the session, I’ll start.  In my test, I just ran the slow performing workload against one of the published tables, launched Replication Monitor, waited for the rows to arrive at the subscriber and then stopped the event session:

ALTER EVENT SESSION Replication_AGT_Waits

ON SERVER STATE = START;

-- Run representative replication workload against publisher

-- Launch Monitor and wait for all trans to be fully distributed

ALTER EVENT SESSION Replication_AGT_Waits

ON SERVER STATE = STOP;

Next, I created two intermediate temp tables to start going through the collected data:

-- Raw data into intermediate table

SELECT CAST(event_data as XML) event_data

INTO #ReplicationAgentWaits_Stage_1

FROM sys.fn_xe_file_target_read_file

             ('C:\temp\ReplAGTStats*.xel',

              'C:\temp\ReplAGTStats*.xem',

              NULL, NULL)

-- Aggregated data into intermediate table

-- #ReplicationAgentWaits       

SELECT

       event_data.value

       ('(/event/action[@name=''session_id'']/value)[1]', 'smallint') as session_id,

       event_data.value

       ('(/event/data[@name=''wait_type'']/text)[1]', 'varchar(100)') as wait_type,

       event_data.value

       ('(/event/data[@name=''duration'']/value)[1]', 'bigint') as duration,

       event_data.value

       ('(/event/data[@name=''signal_duration'']/value)[1]', 'bigint') as signal_duration,

       event_data.value

       ('(/event/data[@name=''completed_count'']/value)[1]', 'bigint') as completed_count

INTO #ReplicationAgentWaits_Stage_2

FROM #ReplicationAgentWaits_Stage_1;

Then I took a look at how things broke out by session_id:

SELECT session_id,

             wait_type,

             SUM(duration) total_duration,

             SUM(signal_duration) total_signal_duration,

             SUM(completed_count) total_wait_count

FROM #ReplicationAgentWaits_Stage_2

GROUP BY session_id,

             wait_type

ORDER BY session_id,

             SUM(duration) DESC;

Here were the results:

clip_image002

Session 55, 57 and 59 were my log agent sessions.  Just looking at session 57 (highlighted in purple), we see that IO_COMPLETION had the highest wait duration.  If I check out the accumulated reads from sys.dm_exec_sessions for that session, I see it is doing all reads, whereas session 59 was doing all writes (so we can start mapping to the agent thread roles). 

Session id 61 (in yellow) represented the Replication Distribution History process and session id 62 (in green) represented the distribution agent process.  As we can see for 62 – the longest duration was due to NETWORK_IO.  We also see a similar value from PREEMPTIVE_OS_WAITFORSINGLEOBJECT (and if you think that these seem correlated, indeed this preemptive wait type is seen in conjunction with the network waittype wait).

So what would we see at the subscriber side?  For this specific scenario, I saw the following (using the session of my distribution agent account) which was session id 55:

clip_image003

In this case, the top wait (by duration) was WRITELOG on the subscriber for the CAESAR_AdventureWorks2008R2_Pub_AW_2008R2 distribution agent process - although the number was not very high.

So if you’re experiencing slow replication, you may consider this additional technique in order to help further identify where the bottlenecks may be in the topology and also get initial ideas on why this may be. 

Categories:
Performance | Replication

Let’s say you’ve “disabled” parallelism on your SQL Server instance via the ‘max degree of parallelism’ setting as follows:

EXEC sp_configure 'max degree of parallelism', 1

RECONFIGURE

 

Now most folks know (or are finding out) that this doesn’t really prevent parallel plans if you throw in a MAXDOP hint.  For example, let’s take the following query:

 

EXEC sp_executesql

       N'SELECT charge_no FROM dbo.charge

       WHERE charge_dt = @charge_dt OPTION (MAXDOP 4)',

       N'@charge_dt datetime', 

       @charge_dt = '1999-07-20 10:49:11.833';

If we look at the actual plan and properties, even though I’ve capped the instance level max degree of parallelism, I get a parallel plan (using SQL Sentry Plan Explorer):

 

clip_image002

clip_image004

 

Now let’s say I want to out-smart anyone capping the MAXDOP by using Resource Governor and the MAX_DOP workload group setting:

 

CREATE WORKLOAD GROUP wgReportUsers

WITH

(

     MAX_DOP = 1

) USING [rpReportUsers]

GO

 

I also created a classifier function and a resource pool (assuming I want to restrict anything else).  I won’t add that setup code here since it’s just the standard RG configuration process.  For this example I’ve configured nothing unusual with the exception of setting MAX_DOP = 1 and making sure my reporting user gets classified to the constrained workload group.

 

Now let’s say the user connects in after RG is configured.  I am able to confirm this via Profiler through the PreConnect:Completed event class:

 

clip_image006

 

In this example, GroupId 257 maps to my workload group.

So I’m logged in as the report user, and I’ve given that user the appropriate read-permissions and also SHOWPLAN so we can see that serial plan we’re expecting.  I also cleared the cache so that the new execution would be a new compilation.

But what do we see when the user tries to override the MAXDOP?

 

EXEC sp_executesql

       N'SELECT charge_no FROM dbo.charge

       WHERE charge_dt = @charge_dt OPTION (MAXDOP 4)',

       N'@charge_dt datetime', 

       @charge_dt = '1999-07-20 10:49:11.833';

 

clip_image008

And you’ll see the same in SQL Server Management Studio (in case you were wondering):

clip_image010

 

So what about the rows per thread that I showed you earlier?  How does it look now?  This time, there are no separate columns by thread:

 

clip_image012

Not there.  And while there are parallel iterators, there rows per thread is not showing.

Seen another way, let’s look at the properties of the Clustered Index Scan and associated XML Showplan RunTimeCountersPerThread values by thread for the original, unconstrained execution:

 

clip_image014

<RunTimeCountersPerThread Thread="3" ActualRows="2" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="2" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="6" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="6" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

 

And this is what the MAXDOP 1 workload group constrained request looks like in the properties in contrast:

 

clip_image016

<RunTimeCountersPerThread Thread="0" ActualRows="16" ActualEndOfScans="1" ActualExecutions="1" />

 

For the constrained plan, I can also see that the DegreeOfParallelism is 0 for the QueryPlan element, but not for the RelOps:

 

<QueryPlan DegreeOfParallelism="0" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="120">

<RelOp AvgRowSize="19" EstimateCPU="0.880078" EstimateIO="6.8935" EstimateRebinds="0" EstimateRewinds="0"

EstimateRows="14.8802" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="true"

PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="7.77357" TableCardinality="1600000">

What’s more, while the graphical plans look identical and has identical estimated costs (I didn’t mention this earlier – but both plans had an estimated cost of 8.18609).

So the request did indeed honor the max degree of parallel setting for the workload group in the end – it just didn’t remove the parallelism related iterator and properties.

I use Windows 7 as my laptop host OS and this limits me from using Hyper-V natively to handle VM guests.   Yes, I could do the dual-boot thing, but I haven’t done it yet and I do like the simplicity of not having to switch host OS contexts.  Windows 8 will, thankfully, be adding Hyper-V hosting support to the client, but in the meantime I’ve been using VirtualBox for my guest OS needs.

Jonathan Kehayias blogged about it extensively in his “Building a Completely Free Playground” series – and once I moved to SQLskills I felt less skittish about running a product that happened to have the big “O” word listed prominently on the application toolbar header.

I actually use VirtualBox for a variety of purposes – not just for SQL Server test environments. For example, if I need to connect to an environment that has conflicting (or unique) remote connectivity requirements, rather than risk clobbering my host VPN settings, I provision a clean, isolated guest environment that I use for connectivity and configuration instead.

Sometimes I’ll have a group of servers that I want to launch at the same time or I’ll want to put a group of them in a saved state at the same time (for example – when running a guest Failover Cluster).  In one case I have six different guests that I want to save or start at (nearly) the same time and I don’t like having to right-click save on each one – especially since my guests will quickly think something is wrong and when I bring them back from a saved state, it takes a few seconds to get them de-frazzled.

By the way – I’m using these scripts for test environments.  These are test environments that if I lost them, I’d be annoyed – but not devastated. 

Rather than rely on the GUI, I created a BAT file to save the guests in a batch:

cd /D "C:\Program Files\Oracle\VirtualBox\"

START /b VBoxManage.exe  controlvm "Node4_D2"  savestate

START /b VBoxManage.exe  controlvm "Node3_D2"  savestate

START /b VBoxManage.exe  controlvm "Node2_D1"  savestate

START /b VBoxManage.exe  controlvm "Node1_D1"  savestate

START /b VBoxManage.exe  controlvm "SAN_D1" savestate

START /b VBoxManage.exe  controlvm "Node_DC"  savestate

I use the START option so that I’m not waiting for the save operation in a serial fashion.  I use the “/b” so that six windows don’t pop up.  As for the Virtualbox command syntax – it’s pretty straightforward – I call VBoxManage.exe with controlvm followed by the name of the VM and the option to save it (savestate).

As for starting them up from the saved state, I do stagger things a bit.  I use startvm instead of controlvm, followed by the VM name and I don’t use the START keyword for anything I want to ensure is executed in order (I also use the TIMEOUT option to wait a few seconds before launching other VMs):

cd /D "C:\Program Files\Oracle\VirtualBox\"

VBoxManage.exe startvm "Node_DC" 

VBoxManage.exe startvm "SAN_D1"

TIMEOUT 10

START /b VBoxManage.exe startvm "Node1_D1" 

START /b VBoxManage.exe startvm "Node2_D1" 

START /b VBoxManage.exe startvm "Node3_D2" 

START /b VBoxManage.exe startvm "Node4_D2"

Even with these scripts I wouldn’t call this an orderly and synchronized save and start.  A Windows guest cluster may sometimes catch on to something being amiss – but often it does not.  Sometimes I’ll have to wait a few seconds for the nodes to start talking to each other again.

Anyhow – if you also use VirtualBox and have any tips you’d like to share when it comes to coordinating the guest launches, please post here as I’m definitely curious to learn about them.

Categories:
Off-Topic | Virtual Machines

Theme design by Nukeation based on Jelle Druyts