“CSelCalcCombineFilters_ExponentialBackoff” Calculator – Part II

At the end of my last post, “CSelCalcCombineFilters_ExponentialBackoff” Calculator, I gave the following homework:

Homework assignment – add another member column to the mix that has a 0.100 selectivity (for example – county) and compare the estimates using a pre-2014 and 2014 CTP2 compatibility mode.

Here is the answer key.  I’ll start off by adding a new column called “county” to the member table (and yes – county gets a little involved for some cities like NYC, but let’s just go with a uniform county association to city/state for now):

```ALTER TABLE [dbo].[member]
ADD [county] VARCHAR(30) NULL; -- data type is an assumption; size it to match your city/state columns
GO
```

And then next I’ll revise my previous city/state update script to include county:

```UPDATE  [dbo].[member]
SET     [city] = 'Minneapolis',
[state_prov] = 'MN',
[county] = 'Hennepin'
WHERE   [member_no] % 10 = 0;

UPDATE  [dbo].[member]
SET     [city] = 'New York',
[state_prov] = 'NY',
[county]='Manhattan'
WHERE   [member_no] % 10 = 1;

UPDATE  [dbo].[member]
SET     [city] = 'Chicago',
[state_prov] = 'IL',
[county]='Cook'
WHERE   [member_no] % 10 = 2;

UPDATE  [dbo].[member]
SET     [city] = 'Houston',
[state_prov] = 'TX',
[county]='Harris'
WHERE   [member_no] % 10 = 3;

UPDATE  [dbo].[member]
SET     [city] = 'Philadelphia', -- city inferred from the 'PA' pattern
[state_prov] = 'PA',
[county] = 'Philadelphia'
WHERE   [member_no] % 10 = 4;

UPDATE  [dbo].[member]
SET     [city] = 'Phoenix',
[state_prov] = 'AZ',
[county]='Maricopa'
WHERE   [member_no] % 10 = 5;

UPDATE  [dbo].[member]
SET     [city] = 'San Antonio',
[state_prov] = 'TX',
[county]='Bexar'
WHERE   [member_no] % 10 = 6;

UPDATE  [dbo].[member]
SET     [city] = 'San Diego',
[state_prov] = 'CA',
[county]='San Diego'
WHERE   [member_no] % 10 = 7;

UPDATE  [dbo].[member]
SET     [city] = 'Dallas',
[state_prov] = 'TX',
[county]='Dallas'
WHERE   [member_no] % 10 = 8;
GO
```

Now, what row estimate will we get for the member table pre-2014 if we execute the following query?

```SELECT  m.[lastname],
m.[firstname],
SUM(c.[charge_amt]) AS [Total_Charge_amt]
FROM    [dbo].[member] AS [m]
INNER JOIN [dbo].[charge] AS [c] ON
m.[member_no] = c.[member_no]
WHERE   [city] = 'Minneapolis' AND
[state_prov] = 'MN' AND
[county] = 'Hennepin'
GROUP BY m.[lastname],
m.[firstname]
OPTION (RECOMPILE);
GO
```

We see the following estimated rows versus actual (keeping in mind that there are no multi-column stats to help us out):

Estimated number of rows is 10, but actual is 1000.

Given that we know the three predicates are correlated, but SQL Server thinks they are independent (no multi-column stats), we can see the calculation is as follows:
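In lieu of the screenshot, here is a quick sketch of that independence math in Python (the 10,000 row count and the 0.1 selectivity per predicate come from the setup above):

```python
# Pre-2014 CE: with no multi-column statistics, the optimizer treats the
# three predicates as independent and multiplies their selectivities.
table_cardinality = 10000           # rows in dbo.member
selectivities = [0.1, 0.1, 0.1]     # city, state_prov, county (10% each)

estimate = table_cardinality
for s in selectivities:
    estimate *= s

print(round(estimate, 3))  # 10.0 (versus 1,000 actual rows)
```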

Now if I switch to 2014 compatibility mode, I see the following estimated number of rows:

In the query_optimizer_estimate_cardinality event output, we see the following for the 2014 version of the plan:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcCombineFilters_ExponentialBackoff" Conjunction="true">
<SubCalculator>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="state_prov" StatId="5" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="county" StatId="8" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="city" StatId="4" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

So updating the formulas, how is the 177.828 derived?

So we see in 2014 CTP2 that, instead of multiplying the selectivity of each predicate (again, assuming independence), the Exponential Backoff calculator blunts the impact: 10000 * 0.1 * POWER(0.1, 1/2) * POWER(0.1, 1/4).
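As a sketch of the arithmetic only (not the optimizer itself), and assuming the behavior described here, where the most selective predicate is taken at full strength and each subsequent predicate is softened by exponents 1/2, 1/4, 1/8:

```python
def exponential_backoff_estimate(table_cardinality, selectivities):
    # Sort most selective first, then apply exponents 1, 1/2, 1/4, 1/8
    # to (at most) the first four selectivities.
    estimate = table_cardinality
    for i, s in enumerate(sorted(selectivities)[:4]):
        estimate *= s ** (0.5 ** i)
    return estimate

# Three correlated predicates, each 10% selective, against 10,000 rows:
print(round(exponential_backoff_estimate(10000, [0.1, 0.1, 0.1]), 3))  # 177.828
```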

Still a skew – but less so: 177.828 estimated vs. 1,000 actual.  And think it doesn’t really matter?

For this example on my test system, pre-2014 the Clustered Index Scan was serial and fed to a Distribute Streams exchange operator, and with the new estimate, the operator is parallel.

Before:

After:

Plenty more to explore in this area…

“CSelCalcCombineFilters_ExponentialBackoff” Calculator

One more post for the day, just while I have the blogging bug…

Per my last post, we saw that the query_optimizer_estimate_cardinality event can provide some interesting information in SQL Server 2014 around how cardinality estimates get calculated.

I have a setup I’ve used in the past to demonstrate column correlation challenges (see my post, Exploring Column Correlation and Cardinality Estimates, for the full example).  I’ll use that setup for this post as well, using the Credit database on SQL Server 2014 with the database compatibility level set to 120:

```USE [Credit];
GO

-- Modifying the Credit data to set up the DBA's scenario
UPDATE  [dbo].[member]
SET     [city] = 'Minneapolis',
[state_prov] = 'MN'
WHERE   [member_no] % 10 = 0;

UPDATE  [dbo].[member]
SET     [city] = 'New York',
[state_prov] = 'NY'
WHERE   [member_no] % 10 = 1;

UPDATE  [dbo].[member]
SET     [city] = 'Chicago',
[state_prov] = 'IL'
WHERE   [member_no] % 10 = 2;

UPDATE  [dbo].[member]
SET     [city] = 'Houston',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 3;

UPDATE  [dbo].[member]
SET     [city] = 'Philadelphia', -- city inferred from the 'PA' pattern
[state_prov] = 'PA'
WHERE   [member_no] % 10 = 4;

UPDATE  [dbo].[member]
SET     [city] = 'Phoenix',
[state_prov] = 'AZ'
WHERE   [member_no] % 10 = 5;

UPDATE  [dbo].[member]
SET     [city] = 'San Antonio',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 6;

UPDATE  [dbo].[member]
SET     [city] = 'San Diego',
[state_prov] = 'CA'
WHERE   [member_no] % 10 = 7;

UPDATE  [dbo].[member]
SET     [city] = 'Dallas',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 8;
GO
```

So let’s look at the cardinality estimate for a specific operator in the following query (using RECOMPILE to get fresh query_optimizer_estimate_cardinality events on each execution):

```SELECT  m.[lastname],
m.[firstname],
SUM(c.[charge_amt]) AS [Total_Charge_amt]
FROM    [dbo].[member] AS [m]
INNER JOIN [dbo].[charge] AS [c] ON
m.[member_no] = c.[member_no]
WHERE   [city] = 'Minneapolis' AND
[state_prov] = 'MN'
GROUP BY m.[lastname],
m.[firstname]
OPTION (RECOMPILE);
GO
```

Below is the abridged plan and the operator and estimates I’m interested in:

In SQL Server 2014 CTP2 – we estimate 316.228 rows for the Clustered Index Scan against the member table – and the actual number of rows is 1,000.

Now if I flip my database compatibility mode to pre-120, I’ll see the following actual vs. estimated instead:

Before SQL Server 2014, the estimate for “[city] = ‘Minneapolis’ AND [state_prov] = ‘MN’” is 100 rows – and if we added supporting statistics or an index on city and state_prov, we’d get a better estimate (but there are limitations to how useful this can be – another topic altogether, so see this post).  We know that, in this example, the two columns are correlated.  But we would need to help SQL Server if we didn’t want it to assume each predicate was independent.

Anyhow – the original issue pre SQL Server 2014 was that our city and state_prov columns were seen as independent, and so we saw our estimates reflect this (10% of row estimate for city multiplied by 10% of row estimate for state_prov).  So we wind up with an underestimate.

Now in SQL Server 2014 CTP2, with the Credit database compatibility mode put back to 120, we see the skew still exists, but the gap isn’t as extreme – showing 316.228 instead of 100 as an estimate.  Still incorrect, but less so.

Now putting this in Excel, we can see the following:

Earlier we updated 10% of the 10,000 rows for various city/state combinations, so what’s up with the 32% for the state predicate selectivity?

That 32% is, using Excel formula style, =POWER((10%),1/2).
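The same arithmetic can be sketched in Python, comparing the pre-2014 independence estimate against the 2014 backoff estimate (10,000 rows and 0.1 selectivity per predicate, per the example above):

```python
# Two correlated predicates, each 0.1 selective, against 10,000 rows.
est_pre_2014 = 10000 * 0.1 * 0.1           # independence: multiply straight through
est_2014     = 10000 * 0.1 * 0.1 ** (1/2)  # backoff: second selectivity square-rooted

print(round(est_pre_2014))   # 100
print(round(est_2014, 3))    # 316.228
```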

And if we look at the query_optimizer_estimate_cardinality for the 2014 CTP2 version, we’ll see the following calculator reference (abridged output):

And here is the full calculator blurb for the CSelCalcCombineFilters_ExponentialBackoff:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcCombineFilters_ExponentialBackoff" Conjunction="true">
<SubCalculator>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="state_prov" StatId="5" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="city" StatId="4" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

We see the selectivity of state_prov and city are both 0.100.  And yet unlike the pre-2014 version that multiplies the two selectivity percentages, we’re not doing that in 2014 CTP2. The calculator name references “ExponentialBackoff” – which, according to Wikipedia (yeah, I know), is defined as “an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate”.

What does this mean in this specific case?

In the absence of multi-column statistics for city and state_prov, our row estimate is not underestimated nearly as much as it would have been pre-2014.

Homework assignment – add another member column to the mix that has a 0.100 selectivity (for example – county) and compare the estimates using a pre-2014 and 2014 CTP2 compatibility mode.

A first look at the query_optimizer_estimate_cardinality XE event

This weekend I set up a Windows Azure virtual machine running SQL Server 2014 CTP2 (12.0.1524).  I had a local copy of SQL Server 2014 CTP2 in VMware as well, but thought it would be more fun to explore it on Windows Azure (kill two explorations with one stone).  On a side note, I’m really digging having the Azure credits to play with each month.  If you have an MSDN subscription, be sure to take advantage of this benefit.

In this post I’m just sharing my initial exploration steps regarding the query_optimizer_estimate_cardinality XE event.  I’m not entirely sure how well this event will be documented, but I’m definitely interested in learning more about it.

For my test scenario, I attached a version of AdventureWorksLT2012, set it to compatibility level 120 and then created the following session:

```CREATE EVENT SESSION [XE_Card_Calculator] ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
-- no target defined here; I watched the live data stream (add an event_file or ring_buffer target as needed)
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO
```

If you’re looking for the query_optimizer_estimate_cardinality in the GUI, keep in mind that it is in the Debug channel (so you’ll need to select this in order to see it).  This also implies that there is likely nontrivial overhead to enabling this event, so while there isn’t an explicit warning for this event like other more invasive ones, I would still use it with caution.

As for the description of this event in the GUI, it is as follows:

“Occurs when the query optimizer estimates cardinality on a relational expression.”

Okay – no big deal, right?  Why care?

So, keeping things simple, I executed the following query against a single table (using RECOMPILE so I could get the event each time I tested it out):

```SELECT AddressLine1
FROM [SalesLT].[Address]
WHERE [AddressID] = 9 -- hypothetical single key value; any one AddressID works
OPTION (RECOMPILE);
```

The actual query execution plan had a Clustered Index Seek with an estimate of 1 row.  And gathering the query_optimizer_estimate_cardinality event, I saw two events surface.

The first event had the following information:

calculator | creation_time: 2013-11-16 16:56:35.6666666 | input_relation | query_hash: 13158512245962950952 | stats_collection | stats_collection_id: 2

The second event had the following information:

calculator | creation_time: 2013-11-16 16:56:35.6666666 | input_relation | query_hash: 13158512245962950952 | stats_collection | stats_collection_id: 2

So there is a lot here to dig through, but I highlighted a couple of values that stood out. And I know that AddressID happens to be my clustered, unique, primary key column for this table.

What happens if I reference a non-unique key value that is covered by an index (such as StateProvince)?

```SELECT AddressID
FROM [SalesLT].[Address]
WHERE StateProvince = 'Arizona'
OPTION (RECOMPILE);
```

This query used an Index Seek as I expected, and this time for query_optimizer_estimate_cardinality I saw a new calculator value:

<CalculatorList>
</CalculatorList>

The stats_collection value was as follows:

<StatsCollection Name="CStCollFilter" Id="2" Card="13.00">
<StatsInfo DbId="5" ObjectId="69575286" StatsId="5" />
</StatsCollection>

Also – for a scenario where I didn’t have stats – and disabled them from being auto-created (to simulate a wild guess scenario), I saw the following calculator list:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcPointPredsFreqBased">
<SubCalculator Role="DistinctCountPlan">
<DistinctCountCalculator CalculatorName="CDVCPlanLeaf" Guesses="1" CoveringStatId="4" CoveringStatDensity="450.000" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

The “Guesses” part looks promising (thinking magic numbers/selectivity guesses/heuristics, whatever you like to call it).

When executing a query that kicked off auto-stats operations, I saw the following operator information:

<Operator Name="LogOp_GbAgg" ClassNo="31">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="AncOp_PrjList" ClassNo="137">
<Operator Name="AncOp_PrjEl" ClassNo="138">
<Operator Name="ScaOp_AggFunc" ClassNo="90">
<AggFuncInfo AggType="STATMAN" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>

And I saw the following calculator information (for the auto-stats operations):

<CalculatorList>
<DistinctCountCalculator CalculatorName="CDVCPlanTrivial" />
</CalculatorList>

And lastly, I tried a query with a bad-practice (fiddling with the column reference via concatenation) to see what steps would be taken:

```SELECT [CustomerID]
FROM [SalesLT].[Customer]
WHERE LastName + ' ' = 'Gates'
OPTION (RECOMPILE);
```

This query plan just had a Clustered Index Scan, but spawned five query_optimizer_estimate_cardinality events associated with it (and I tested this a few times to confirm the 5-event output was consistent).

Lots of scenarios to mull over and dig through as time permits.

Why care?

Many query performance issues (and associated query plan quality issues) are due to cardinality estimate skews.  It would be great to have a way to more efficiently pinpoint how the various estimates are being calculated and why the estimates are off.

I’m not sure how in-depth this event and associated calculators will be documented by Microsoft, and my assumption is that we’ll need to figure it out via collective reverse-engineering.  But in the meantime this new XE event might prove to be quite useful for troubleshooting the more mysterious cardinality estimates.

New Course: “SQL Server: Temporary Objects”

This is my first blog post since going independent. As mentioned in my previous post, I’ll still be blogging periodically on my original SQLskills blog.

As for what I’ve been up to these last few weeks, I’ve been working on-site with a local Minneapolis client – spending about half of my time on the DBA side of things and half embedded with a developer team.  I’ve been helping out on longer-term projects and I’m definitely enjoying it so far.

So on to the subject of this brief blog post – today Pluralsight published my new course, “SQL Server: Temporary Objects”.

This is a short course on how to effectively leverage SQL Server temporary objects in your T-SQL code. The decision as to whether to use temporary tables or table variables can have a significant impact on your workload performance and scalability, and this course walks through several specific scenarios so that you can see first-hand the nuances of this design decision.

There and Back (to Minneapolis) Again

Exactly two years ago I started my journey with SQLskills. It has been an incredible experience and I’m grateful for my time with everyone in the company and also grateful for all the people I got to meet along the way. I’ve gained friendships and working relationships that will last a lifetime.

While this has been a great adventure, I found myself feeling over the last two years that I’m a bit of a Hobbit at heart (hence the blog post title reference). I prefer the comforts of home and have missed the longer-term, multi-month engagements and projects that I’ve worked on in the past in other roles. While I do a lot of consulting work at SQLskills, that work is mostly shorter engagements spread over many clients, so I realized a transition would be needed at some point.

Paul and I worked out the plan last May, so this plan has been in-the-works for a while. Nothing dramatic to report and no regrets on either side.

Where am I going next?

I set up my own company and booked my first local client engagement in Minneapolis, starting October 1st, working with a local Fortune 500 company full time for at least the next few months, if not longer.

I’ll still be partnering with SQLskills when the opportunity arises, be it a consulting engagement, guest-blog post or Pluralsight course.

Not to be sentimental – and yet I am – so I’ll share just a few of my favorite SQLskills memories. It was hard to choose, since there were so many and they may seem like simple experiences, but it is the simple stuff that means the most to me in life:

• Playing chess and drinking Prosecco late at night with Paul after class. We had the best conversations and sometimes Paul would even let me win (or rather, I would wait for him to get sleepy enough so that I would start winning).
• During the Immersion Events when not teaching that day, Kimberly would invite me to tag along to pick up the books for the next week of class at Kinkos. We had our best conversations during these mini-road trips.
• Visiting Jonathan in Florida and spending time with him and his incredible family. Jonathan is fiercely loyal and big-hearted (but don’t tell him I said that).
• Laughing at Glenn’s stealth jokes and wicked-dry humor. Favorite Glenn catch phrases include “that’s just silly” – and how he gets all excited about his favorite processor chipset.
• Weekly discussions with Erin have been incredible – discussing both life and SQL Server with her – and I always got something from the conversation.

So next week I’ll be teaching in Seattle (IE2: Immersion Event on Performance Tuning). It will be a great way to end this two-year adventure.

So that’s all for now. But you’ll still be hearing from me on this blog and elsewhere. Thanks!

Workloads that generate the 0x0000000000000000 query hash value?

In the last SQLskills Insider demo video I went through a scenario where you can use the query_hash column from sys.dm_exec_query_stats to aggregate statistics for queries that only differ by literal values.

I received an interesting question from someone who watched the video, ran the demo query in production, and saw a 0x0000000000000000 value as the highest consumer of worker time (it also had the highest execution count).

I asked this person to show me the query text where [query_hash] = 0x0000000000000000 and it ended up being a series of FETCH NEXT and FETCH API_CURSOR calls, for example:

FETCH API_CURSOR00000000072A0B6E
FETCH API_CURSOR000000000729B7B2
FETCH API_CURSOR000000000729030C
FETCH API_CURSOR000000000729030C

There may be other circumstances where 0x0000000000000000 can manifest – but I haven’t witnessed them or heard of them.  If you have non-cursor scenarios where 0x0000000000000000 occurs, please share in the comments.

Thanks!

New Article on SQLPerformance.com “Exploring Low Priority Lock Wait Options in SQL Server 2014 CTP1”

My 10th guest blog post was published today on SQLPerformance.com:

Exploring Low Priority Lock Wait Options in SQL Server 2014 CTP1

In this post I continue exploring online operation options from my last SQLPerformance.com post – specifically, the low priority lock wait improvements being included in SQL Server 2014 CTP1.

New Article on SQLPerformance.com “Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1”

My 9th guest blog post was published today on SQLPerformance.com:

Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1

In this post I explore the online, single-partition rebuild improvement being introduced in SQL Server 2014 CTP1.

We can handle the truth

Today I attended the MCM call with Microsoft Learning (MSL).

I won’t get personal here, because in spite of everything, I do imagine that the folks in MSL are under quite a bit of stress right now (yes – so is the community, but more on that later).  I myself remember getting chewed out back in November of 2010 when we announced that SQL MCM was removing the training requirement.  While I received some support, I received a good share of hate-email and comments – and I cannot recall a more stressful period of time in my career.  Communicating change is tough – and there is definitely a right way to do it and a wrong way to do it.  I think the first mistake is to think your audience can’t handle the truth.

With that said, here, in my dream world, is what I wished MSL would have said on today’s call.  They might have expressed some variations on a few of these items – and I’ll save it for MSL to communicate – but otherwise this is just an imaginary list of talking points:

<imaginary MSL talking points>

• I’m sorry about how and when we communicated the program cancellation.  It was incredibly ungracious and we really regret it.
• For anyone who has invested in the program in the last X number of months, we’ll be providing full refunds and will work through each scenario on a case-by-case basis.
• We will extend the ability to take exams for X number of months.  We agree it was unreasonable and unfair to give a 1 month notice.
• We ended the MCM program because we never really knew how to make it work.  Our organization isn’t structured to support programs like this – programs that are strategic but don’t generate direct-revenue.
• We wanted to model the programs after what Cisco does, but we didn’t actually do much of what we should have to make it more like Cisco.
• We wanted MCM to have industry-wide recognition, but we didn’t invest in long-term marketing.
• We don’t really plan on making an MCM\MCA replacement, hence the cancellation.
• When we say “pause” – we mean cancel and retire.  There will be a new “top” tier certification, but a much broader audience and it will not resemble MCM.
• Even if we ask the product team to protect these programs, they have other priorities right now and aren’t in the certification business.
• We will move all distribution lists and NDA-access related benefits to someone on the MVP community team to manage.  They have budget and know how to handle very large technical communities. They will manage this moving forward and you will be a member of the community and will be grandfathered in as appreciation of your time investment.
• Business is business, but we’ll throw in what perks we can to soften the blow (MSDN subscriptions, PASS tickets, we’ll use our imagination).

</imaginary MSL talking points>

Now back to my own, non-imaginary voice for a bit.  A few thoughts and opinions:

• I do really hope that anyone in the pipeline gets a chance to complete what they started if they choose to do so.
• I do also hope that people are reimbursed according to each situation.
• I hold out a very, very small hope that the various product teams will re-adopt each MCM/MCA program.
• I hope that everyone will be civil and not resort to bullying the people in MSL.  Be tough.  Be honest.  Be vocal.  But don’t be vicious or get personal please.  Keep perspective.
• I know we don’t need an acronym to be masters with the product.  The biggest benefit of being an MCM was the community and also the process of achieving it.
• We’ll all be okay.

Lastly, I of course remain fiercely loyal to SQL Server.  It is the horse I bet on 16 years ago and I have no regrets.   But as for the SQL Server certification program, quite a bit will need to happen before I would feel comfortable advocating for them again.

A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Now before I begin this post, a quick disclaimer…

SQLskills partners with SQL Sentry on a couple of things. We periodically write guest-posts on SQLPerformance.com and SQL Sentry also provides exclusive sponsorship of two of our Immersion Events. I am also a big fan of SQL Sentry Plan Explorer, but that would have been the case regardless of the existing business relationships.

With that said, I wanted to write a post about SQL Sentry’s new “AlwaysOn Monitoring” functionality integrated into SQL Sentry Performance Advisor for SQL Server v7.5, highlighting a few areas that I think are useful after an initial, informal investigation.

I explored this new functionality in a VMware environment with ten availability groups configured on a four node Windows Server Failover Cluster (WSFC) with four standalone SQL Server instances.

Here were the top areas that stood out for me…

Let’s say you’re the new DBA on the team. Your SQL Server production environment is being monitored by SQL Sentry Performance Advisor and you want to get a quick look at the current availability group topologies…

Steps I performed to do this:

• Launched the SQL Sentry Client
• Right-clicked the Default Site and selected “Open AlwaysOn Management”

This is what I saw by default:

The zoom-level for the availability group topology visualizations was set to 150% and the layout style was set to “WSFC Node/Group Matrix”. Scrolling down I was able to see all ten availability groups defined on SQL Server instances hosted on the WSFC.

There are various layout styles to choose from, including:

• “WSFC Node / Group Matrix” (what you saw in the earlier visual)
• “WSFCs” (showing the monitored WSFCs in your environment)
• “WSFC Members” (nodes associated with a specific WSFC)
• “Instance/Group Matrix”
• “Instances”
• “Groups/Replicas”

Availability Group Configuration Visualizations

You can answer quite a few questions by just looking at the availability group animations. You’re seeing static screen shots on this blog post – but keep in mind that this view changes state dynamically. The image below shows me that “AG-Test2” has two replicas on two nodes out of the four node WSFC:

I can clearly see which replica is primary and which is secondary, and the spoke symbol (zoomed in below) lets me know which replicas are configured for automatic failover:

If a replica is set for read-only connection mode, we’ll see the following symbol:

Hovering over the read-only symbol will clarify if it is read-intent only (SQL Sentry uses the word “Specified” for read-intent and just “All” when read-intent is not used).

If replicas are asynchronous, we’ll see a double-pipe symbol by the associated replica (I put red boxes around them so you could see exactly what I’m talking about):

The connecting pipelines also have inherent meaning. Let’s say you’ve heard about performance issues in the availability group topology and you want to quickly narrow down to the offending availability group: the bars within each pipeline are automatically sized based on load relative to the other availability groups, and you’ll see the colors shift from green, to yellow, to red for availability groups pushing more load:

Hovering over the pipelines will also provide information regarding KB to Replica/Sec and the Recovery queue KB.

Tabular Data

If animated topology layouts aren’t your thing, you have a series of tabs below the layout pane that are responsive to the objects you click in the layout area. Tabs cover WSFCs, WSFC Members, Instances, FCIs, Availability Groups, Replicas, Databases, Database Replica States and Availability Group Listener information. Below shows an example of database replica states information:

There is a significant amount of information exposed in the grid section and, based on initial impressions, I suspect I would most often view the Availability Groups and Database Replica States tabs.

Most of the grids are expandable. For example, expanding an availability group to see the availability databases and then clicking Replicas to see where they are hosted:

Charts

In the upper-right hand pane you’ll see dynamic chart graphics depending on where you’ve clicked. For example, selecting a specific WSFC Member on the WSFC Members tab shows the following KB From Replica/Sec data:

Error Logs and Replica State Changes

In the same area as the Chart tab is an Error Log and Replica State Changes tab. This information isn’t on by default, and if you want this information you need to enable it explicitly (the interface makes this clear). The reason behind this is related to the monitoring noise that then gets added to the SQL Server Error Log, with an ongoing entry of “Using ‘dbghelp.dll’ version ‘4.0.5’”. For more information on this issue, see the Connect item “xEvents: selecting System Health information records Using ‘dbghelp.dll’ version ’4.0.5′ in Errorlog”.

But if you’re okay with the additional noise in the SQL Server Error Log, these two information panes can be useful in order to see real-time availability group activity. For example – here is what we see after a controlled failover of an availability group for the Error Log:

And here is what we see for Replica State Changes:

And as an aside, the visualization of the availability group does dynamically change during a failover event – and you’ll see a change in pipeline activity and also in the status color (the example below shows a red/pink representation of an “unhealthy” availability group state).

Wish List

I do have a few items on the wish list:

• A new name. “AlwaysOn” is a Microsoft marketing term that encompasses more than Availability Groups, so I think the name should be “Availability Group Monitoring.”
• Dynamic auto-zoom based on the number of objects in a topology.
• I think some of the graphical elements could use some clarity – for example, adding a legend that translates the symbols for new users of this functionality.
• Have an easy way to export out the tabular data from the various panes.

My first impressions are very favorable and I do think this is a useful tool – particularly for environments that have numerous availability groups and higher complexity topologies.

PS: This product also provides built-in alerting for health changes and failovers that isn’t provided natively by Microsoft, but I didn’t get a chance to investigate this yet.  I do plan on testing this out in the future and will share any interesting findings accordingly.