A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Now before I begin this post, a quick disclaimer…

SQLskills partners with SQL Sentry on a couple of things. We periodically write guest-posts on SQLPerformance.com and SQL Sentry also provides exclusive sponsorship of two of our Immersion Events. I am also a big fan of SQL Sentry Plan Explorer, but that would have been the case regardless of the existing business relationships.

With that said, I wanted to write a post about SQL Sentry’s new “AlwaysOn Monitoring” functionality integrated into SQL Sentry Performance Advisor for SQL Server v7.5, highlighting a few areas that I think are useful after an initial, informal investigation.

I explored this new functionality in a VMware environment with ten availability groups configured on a four node Windows Server Failover Cluster (WSFC) with four standalone SQL Server instances.

Here were the top areas that stood out for me…

Get your bearings quickly

Let’s say you’re the new DBA on the team. Your SQL Server production environment is being monitored by SQL Sentry Performance Advisor and you want to get a quick look at the current availability group topologies…

Steps I performed to do this:

  • Launched the SQL Sentry Client
  • Right-clicked the Default Site and selected “Open AlwaysOn Management”

This is what I saw by-default:

9 6 2013 8 55 33 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

The zoom-level for the availability group topology visualizations was set to 150% and the layout style was set to “WSFC Node/Group Matrix”. Scrolling down I was able to see all ten availability groups defined on SQL Server instances hosted on the WSFC.

There are various layout styles to choose from, including:

  • “WSFC Node / Group Matrix” (what you saw in the earlier visual)
  • “WSFCs” (showing the monitored WSFCs in your environment)
  • “WSFC Members” (nodes associated with a specific WSFC)

9 6 2013 10 19 12 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

  • “Instance/Group Matrix”

9 6 2013 10 20 35 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

  • “Instances”

9 6 2013 10 22 20 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

  •  “Groups/Replicas”

9 6 2013 1 39 14 PM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Availability Group Configuration Visualizations

You can answer quite a few questions by just looking at the availability group animations. You’re seeing static screen shots on this blog post – but keep in mind that this view changes state dynamically. The image below shows me that “AG-Test2” has two replicas on two nodes out of the four node WSFC:

image thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

I can clearly see which replica is primary and which is secondary, and the spoke symbol (zoomed in below) lets me know which replicas are configured for automatic failover:

9 6 2013 10 26 37 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

If a replica is set for read-only connection mode, we’ll see the following symbol:

9 6 2013 11 06 14 AM thumb2 A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Hovering over the read-only symbol will clarify if it is read-intent only (SQL Sentry uses the word “Specified” for read-intent and just “All” when read-intent is not used).

If replicas are asynchronous, we’ll see a double-pipe symbol by the associated replica (I put red boxes around them so you could see exactly what I’m talking about):

SNAGHTMLa08656b thumb1 A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

The connecting pipelines also have inherent meaning. Let’s say you’ve heard about performance issues in the availability group topology and you want to narrow down quickly to the offending availability group, the bars within the pipeline will automatically be sized based on relative load of the other availability groups and you’ll see the colors shift from green, to yellow to red for availability groups pushing more load:

9 6 2013 10 36 54 AM thumb1 A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Hovering over the pipelines will also provide information regarding KB to Replica/Sec and the Recovery queue KB.

Tabular Data

If animated topology layouts aren’t your thing, you have a series of tabs below the layout pane that are responsive to the objects you click in the layout area. Tabs cover WSFCs, WSFC Members, Instances, FCIs, Availability Groups, Replicas, Databases, Database Replica States and Availability Group Listener information. Below shows an example of database replica states information:

9 6 2013 10 45 25 AM thumb1 A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

There is a significant amount of information exposed in the grid section and based on initial impressions, I think it would be likely that I would most often view the Availability Groups tab and Database Replica States tabs.

Most of the grids are expandable. For example- expanding an availability group to see the availability databases and then clicking Replicas to see where they are hosted:

9 6 2013 10 51 06 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

9 6 2013 10 51 54 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Charts

In the upper-right hand pane you’ll see dynamic chart graphics depending on where you’ve clicked. For example, selecting a specific WSFC Member on the WSFC Members tab shows the following KB From Replica/Sec data:

9 6 2013 10 54 31 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Error Logs and Replica State Changes

In the same area as the Chart tab is an Error Log and Replica State Changes tab. This information isn’t on by default, and you if you want this information you need to enable it explicitly (the interface makes this clear). The reason behind this is related to the monitoring noise that then gets added to the SQL Server Error Log, with an ongoing entry of “Using ‘dbghelp.dll’ version ‘4.0.5’”. For more information on this issue, see the Connect item “xEvents: selecting System Health information records Using ‘dbghelp.dll’ version ’4.0.5′ in Errorlog”.

But if you’re okay with the additional noise in the SQL Server Error Log, these two information panes can be useful in order to see real-time availability group activity. For example – here is what we see after a controlled failover of an availability group for the Error Log:

9 6 2013 11 02 40 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

And here is what we see for Replica State Changes:

9 6 2013 11 03 32 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

And as an aside, the visualization of the availability group does dynamically change during a failover event – and you’ll see a change in pipeline activity and also in the status color (the example below shows a red/pink represent of an “unhealthy” availability group state).

9 6 2013 11 01 50 AM thumb A Brief Tour of SQL Sentry’s “AlwaysOn Monitoring”

Wish List

I do have a few items on the wish list:

  • A new name. “AlwaysOn” is a Microsoft marketing term that encompasses more than Availability Groups, so I think the name should be “Availability Group Monitoring.”
  • Dynamic auto-zoom based on the number of objects in a topology.
  • I think some of the graphical elements could use some clarity – for example, adding a legend that translates the symbols for new users of this functionality.
  • Have an easy way to export out the tabular data from the various panes.

My first impressions are very favorable and I do think this is a useful tool – particularly for environments that have numerous availability groups and higher complexity topologies.

PS: This product also provides built-in alerting for health changes and failovers that isn’t provided natively by Microsoft, but I didn’t get a chance to investigate this yet.  I do plan on testing this out in the future and will share any interesting findings accordingly.

New Article on SQLPerformance.com: “Exploring SQL Server 2014 SELECT INTO Parallelism”

My 8th guest blog post was published today on SQLPerformance.com:

Exploring SQL Server 2014 SELECT INTO Parallelism

In this post I kick the tires of the new SELECT … INTO parallelism improvement in SQL Server 2014 CTP1 – and the results I saw were favorable.

New Course: “SQL Server 2012: Transact-SQL Error Handling”

Today Pluralsight published my new course, “SQL Server 2012: Transaction-SQL Error Handling”.

As the title suggests, this course steps through how to write Transact-SQL code that deals with anticipated and unanticipated errors during code execution.  The audience is for developers and DBAs – and while the title states it is for SQL Server 2012, several aspects are applicable from SQL Server 2005 and onward.

Presenting “Performance Issue Patterns” at the PASSMN Minnesota SQL Server User Group

If you’re in town (Minneapolis/St. Paul) – just a heads-up that I’ll be presenting at the next PASSMN meeting on Tuesday, August 20th (scheduled from 4PM to 6PM).  This will be located at Microsoft, 3601 West 76th Street, Suite 600 Edina, MN  55437.

Here is what I’ll be presenting:

Title: Performance Issue Patterns

Description: While a SQL Server professional may encounter various one-off situations over the course of a career, there are also several shared patterns and scenarios.  In this session we’ll walk through a few of these patterns interactively – where I’ll reproduce the issue and we’ll work through the scenario until we get to the root cause.

If you’ve seen my Pluralsight course with the same name – please note that this session is structured differently.  I’ll be setting up a scenario (with helpful and unhelpful symptoms), kicking off the pattern and then it is up to us to collectively troubleshoot.  Plenty of scenarios to choose from over 75 minutes, so I’ll likely pick and choose scenarios based on participant interaction as we go along.  Think “Video Game for DBAs”.

New Article on SQLPerformance.com: “Observer Overhead and Wait Type Symptoms”

My 7th guest blog post was published today on SQLPerformance.com:

Observer Overhead and Wait Type Symptoms

This article details seven test scenarios I performed in order to see what wait-type symptoms (if any) arise based on different types of “observer overhead” due to concurrently running SQL Trace and XE sessions.

New Course: “SQL Server: Transactional Replication Fundamentals”

Today Pluralsight published my new course, “SQL Server: Transactional Replication Fundamentals.”

This course provides a fundamental overview of how to configure, monitor, tune and troubleshoot a SQL Server transactional replication topology. Transactional replication meets a few specific data distribution requirements that no other native SQL Server feature does out-of-the-box.  Even if you’re not a fan of this feature (and I’ve met quite a few), if you are responsible for architecting SQL Server data-distribution solutions it is helpful to understand how transactional replication fits into the overall scalability, high availability and disaster recovery feature landscape. 

New Article on SQLPerformance.com: "Avoid HA/DR Solution Self-Delusion"

My sixth guest blog post was published today on SQLPerformance.com:

Avoid HA/DR Solution Self-Delusion

This article gives a few examples of how we can delude ourselves into thinking that a solution is fully meeting our high availability and disaster recovery requirements.

Control Flow vs. Data Flow Demo

I had an interesting question yesterday in class about query execution plan control flow (from the root of the plan to the leaf level) and data flow (from leaf to root) and thought I would share a small demo I put together over lunch break to walk through the discussion.

Let’s start by dropping clean buffers (on a test system, please):

DBCC DROPCLEANBUFFERS;

USE [Credit];
GO

SELECT  COUNT(*) AS [page_count]
FROM    [sys].[dm_os_buffer_descriptors] AS bd
WHERE   [bd].[database_id] = DB_ID() AND
[bd].[allocation_unit_id] = 15045483298816;
GO

This returns 0 rows for the allocation_unit_id associated with the table we’re about to query from the Credit database:

SELECT  [charge].[charge_no]
FROM    [dbo].[charge]
ORDER BY [charge].[charge_no];
GO

The actual plan shows the following (via SQL Sentry Plan Explorer):

image thumb Control Flow vs. Data Flow Demo

Nothing fancy, just a Clustered Index Scan.  And in terms of page counts from sys.dm_os_buffer_descriptors, we see 9,303 data pages in cache now.

Now let’s drop the clean buffers and execute a query returning just the top 100 rows:

DBCC DROPCLEANBUFFERS;

USE [Credit];
GO

SELECT TOP 100
[charge].[charge_no]
FROM    [dbo].[charge]
ORDER BY [charge].[charge_no];
GO

The actual plan is as follows:

image thumb1 Control Flow vs. Data Flow Demo

Re-executing the query against sys.dm_os_buffer_descriptors, this time we see just 13 data pages.

The original question/discussion was with regards to the storage engine – and whether all data pages still get loaded into memory even with a TOP.  As we see in this scenario, this was not the case.

SQLPerformance.com “Troubleshooting SQL Server CPU Performance Issues”

My fifth guest blog post was published today on SQLPerformance.com:

Troubleshooting SQL Server CPU Performance Issues

In this post I describe the general framework you can use to perform root cause analysis for high CPU or CPU-related performance issues.

New Course: “SQL Server 2012: Nonclustered Columnstore Indexes”

Yesterday Pluralsight published my new course, “SQL Server 2012: Nonclustered Columnstore Indexes”.

The intent behind this course was to provide a fast ramp-up on the aspects of this feature that you should consider when looking at whether or not this is a viable option for your SQL Server relational data warehouse workloads.  The course covers the nonclustered columnstore index feature fundamentals, batch-execution mode and segment elimination considerations, data modification techniques, and how to plan for deployment.

Even if you’re unsure of whether or not you’ll be using nonclustered columnstore indexes anytime soon, it is part of the database engine, so from a core SQL Server perspective it is a good idea to familiarize yourself with this topic. With vNext of SQL Server, Microsoft is introducing clustered columnstore indexes, so I think we’ll see a significant uptick in overall usage in the future.