Let’s say you have a heap table with 1,000,000 rows in it.  Let’s also say that your automatic creation of statistics are disabled, as well as updates to the statistics (and in this scenario, there are NO existing indexes or statistics).

What kind of selectivity guess would the optimizer make for a query like the following? (I copied over the FactInternetSales table with no indexes and put 1,000,000 rows in it – tested on SQL Server 2012, 11.0.2316)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 537;

First of all, the actual plan on my system was as follows:

SNAGHTML1904f4f

If I look at the properties of the table scan, I see the actual number of rows was 23,042 versus the estimated of 31,622.8.  (As an aside – execute SELECT POWER(1000000.0,.75) and you’ll see how the estimated number of rows was derived).  I also see that the table cardinality is known (and correct) at 1,000,000.

SNAGHTML19c0871

What if I have a predicate referencing another column, like OrderDateKey?  This too results in an estimate of 31,622.8 rows.

The following table shows a sampling of various queries and associated search conditions, along with their associated row estimates:

Query Estimated Rows Percentage (based on 1,000,000 rows)

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 537;

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE OrderDateKey = 20040224;

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey BETWEEN 537 AND 600;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey BETWEEN 537 AND 537;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost < 100.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > 100.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > 100.00 AND
      TotalProductCost < 200.00;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost >= 100.00 AND
      TotalProductCost <= 200.00;

90,000 9%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost LIKE 100.00;

100,000 10%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE '%A%';

539,232 53.9232%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE 'A%';

269,616 26.9616%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE SalesOrderNumber LIKE '%A';

269,616 26.9616%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE TotalProductCost > (ProductStandardCost + 10.00);

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1);

31,622.8 3.16228%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1,2);

62,245.6 6.22456%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 OR
      ProductKey = 2;

62,245.6 6.22456%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IN (1,2,3);

91,900 9.19%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 OR
      ProductKey = 2 OR
      ProductKey = 3;

91,900 9.19%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 AND
      ProductKey = 2 AND
      ProductKey = 3;

10,746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey = 1 AND
      ProductKey = 2;

13,335.2 1.33352%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE (TaxAmt * UnitPrice) > 10.00;

300,000 30%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1);

968,377 96.8377%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey <> 1;

968,377 96.8377%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1,2);

13,335.2 1.33352%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey NOT IN (1,2,3);

10746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey <> 1 AND
      ProductKey <> 2 AND
      ProductKey <> 3;

10746.1 1.07461%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE EXISTS (SELECT 1);

1,000,000 100%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE NOT EXISTS (SELECT 1);

0 0%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NOT NULL;

1,000,000 100%

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NULL;

0 0%

ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NOT NULL;

968,377 96.8377%

ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL;

SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE ProductKey IS NULL;

31,622.8 3.16228%

There are numerous other scenarios I could have put in this table, but you probably get the point by now that the search conditions help define the which “guess” calculation is used in absence of statistics.  When there are additional details beyond table cardinality, such as the nullability of a column or constraints, SQL Server can often leverage it accordingly. 

Categories:
Performance

This post is just a reminder to be attentive to the locking overhead of your Transact-SQL server cursors.

For example, the following cursor is using default options in SQL Server 2012 to iterate row-by-row through the Employee table.  I’m declaring a variable and populating it with the BusinessEntityID from each row (and I’m not doing anything with it – as I just wanted to avoid the chatter back to my SQL Server Management Studio window):

   1:  DECLARE Employee_Cursor CURSOR FOR
   2:   
   3:  SELECT BusinessEntityID
   4:  FROM [HumanResources].[Employee];
   5:   
   6:  SELECT properties
   7:  FROM sys.dm_exec_cursors(52);
   8:   
   9:  OPEN Employee_Cursor;
  10:   
  11:  DECLARE @BusinessEntityID int;
  12:   
  13:  FETCH NEXT FROM Employee_Cursor
  14:  INTO @BusinessEntityID;
  15:   
  16:  WHILE @@FETCH_STATUS = 0
  17:  BEGIN
  18:     
  19:      FETCH NEXT FROM Employee_Cursor
  20:      INTO @BusinessEntityID;
  21:   
  22:  END
  23:   
  24:  CLOSE Employee_Cursor;

Now what kind of locking behavior did I see?  Using extended events and tracking lock_acquired and lock_released, I saw 2,340 events total.  Below is just a bit of what I saw – with the IS mode acquired for object, then IS mode lock for page, then S lock mode for the key, released for that key, acquired for the next key, released, and then released for the page and object and then starting all over again until reaching the final row of the table.

SNAGHTML1610fa0

I also included a call to sys.dm_exec_cursors for my test session id (52).  This returned the following property value:

TSQL | Dynamic | Optimistic | Global (0)

What about using SCROLL_LOCKS? (TSQL | Dynamic | Scroll Locks | Global (0)).  Here is the slightly modified T-SQL code and abridged result set:

   1:  DECLARE Employee_Cursor  CURSOR SCROLL_LOCKS
   2:  FOR
   3:   
   4:  SELECT BusinessEntityID
   5:  FROM [HumanResources].[Employee];
   6:   
   7:  SELECT properties
   8:  FROM sys.dm_exec_cursors(55);
   9:   
  10:  OPEN Employee_Cursor;
  11:   
  12:  FETCH NEXT FROM Employee_Cursor;
  13:   
  14:  WHILE @@FETCH_STATUS = 0
  15:  BEGIN
  16:     
  17:      FETCH NEXT FROM Employee_Cursor;
  18:   
  19:  END
  20:   
  21:  CLOSE Employee_Cursor;
  22:  DEALLOCATE Employee_Cursor;

SNAGHTML1712dcb

Now we see IX (object), IU (page) and U (key) modes enter the mix.  By the way, the 1237579447 is associated with the Employee table.  The 72057594045136896 value is the container_id – which is a type “1” (IN_ROW_DATA), so we can cross reference to sys.partitions.hobt_id, which then correlated to the Employee table.

Are the locks necessary? Maybe, depending on what you wish to achieve, but I find that often the cursor settings are not configured intentionally. 

As an exercise, you can check sys.dm_exec_cursor to see cursors running on your server at that moment in time.  This DMF will surface the session IDs, cursor name, properties and the associated sql_handle.  You may see settings that are unexpected or are overkill for what is actually required.

Categories:
Performance

Last November I blogged about how index usage stats don’t get updated when the associated index statistics (but not index) are used.

This post will describe another scenario that you should be aware of (the topic came up today in class while Kimberly was teaching – as we were trying to recall tricks to clearing stats for sys.dm_db_index_usage_stats)…

Imagine that I’ve queried a specific table as follows:

SELECT member_no, lastname, firstname, middleinitial, street, city, state_prov, country
FROM dbo.member
WHERE member_no = 1;

If I check sys.dm_db_index_usage_stats for any reference to the member table, I’ll see the following:

SELECT i.index_id, i.name, u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
     u.object_id = i.object_id AND
     u.index_id = i.index_id
WHERE u.object_id=object_id('dbo.member')

This returns:

image

Now let’s say that you have a weekly rebuild of specific indexes (for example):

ALTER INDEX member_ident
ON dbo.member REBUILD

If I check for usage stats after rebuilding the query (and before anyone has accessed the member table specifically) – the stats have been cleared out for that table. 

image

Why does this matter? 

If you’re using the sys.dm_db_index_usage_stats to determine which indexes should be removed, you’re running the risk of making decisions based on recently cleared out statistics.  This is similar to the case where a SQL Server instance has been recently restarted.  You should not be dropping indexes without knowing whether the accumulated statistics represent the full set of critical workloads. 

For tables with frequent index rebuilds, be sure to capture data from sys.dm_db_index_usage_stats before these jobs run.  This DMV is definitely a useful tool, but if you’re not careful, you could be dropping indexes based on missing information.

A few other noteworthy items:

  • Rebuilding an index only clears the stats for the index involved.  For example – if I have two rows in the DMV for two different indexes on the same object, rebuilding one index will only clear stats for that object.
  • Reorganizing the index does NOT clear the stats from sys.dm_db_index_usage_stats.
  • This is not the only circumstance where the stats will clear.
  • I have only tested this recently on SQL Server 2008 R2 and SQL Server 2012.  At one point when I was at Microsoft I blogged about seeing the stats persist after a REBUILD (more in the context of REBUILD itself – and I had mentioned the DMV in passing).  Which makes me think that perhaps this behavior wasn’t always the case.  If you’ve found a different behavior on different versions, please post your comment here.

Thanks!

Categories:
indexing | Performance

Let’s say you are querying a partitioned table and you would like to see which partitions were accessed by looking at the graphical execution plan:

SNAGHTML59cf4d6

“Actual Partition Count” shows a value of 1 and “Actual Partitions Accessed” shows a value of 50.  The “Actual Partitions Accessed” property name could cause confusion though, since what you’re actually looking at is the partition numbers accessed (not the count of partitions accessed). 

I prefer the XML naming convention instead:


  <PartitionsAccessed PartitionCount="1">
    <PartitionRange Start="50" End="50" />
  </PartitionsAccessed>

The name mapping is as follows from Graphical-to-XML Plan formats:

“Actual Partition Count” = PartitionsAccessed element & PartitionCount attribute

“Actual Partitions Accessed” = PartitionRange

If I modify the query to access two partitions, I see the following (graphical and XML plan output):

image

<PartitionsAccessed PartitionCount="2">
               <PartitionRange Start="1" End="1" />
               <PartitionRange Start="50" End="50" />
</PartitionsAccessed>

And here is an example accessing all partitions in the table:

image
 

<PartitionsAccessed PartitionCount="63">
    <PartitionRange Start="1" End="63" />
  </PartitionsAccessed>

Once you realize the mapping, its no big deal to understand what’s going on, although I do see it causing confusion (hence this blog post).

Categories:
Performance | Execution Plan

The word “dashboard” immediately puts me into a state of suspicion. This is probably because I’ve been a part of the corporate world for 18 years now and, fair or not, the word “dashboard” gives me flashbacks of executive conversations where I’m asked to summarize complicated information into a single square – while somehow defying the laws of physics and providing all the necessary detail being asked for.

So when I heard of the “AlwaysOn Dashboard” – I was pretty sure I wouldn’t like it and that I would stick primarily with other methods.

SNAGHTML1d45817

Well – I was wrong.

While any tool could use further improvement, the AlwaysOn Dashboard is definitely something I’ll be using in the future. Hence the purpose of this post was just to show the various questions the dashboard can answer.

Where am I?

I start with this question because depending on which replica you’re launching the dashboard from, you’ll see a different view of the world. For example, the following is the first section of the dashboard when launched from the primary replica of an availability group called EMU-AG1:

SNAGHTML1d4043d

As you can see, from the primary I’m getting three sections and a view of my replicas, availability databases and more (which I’ll get in to later in this post).

Contrast this to launching the dashboard from a secondary replica:

SNAGHTML1d3bc65

You can see that I’m getting information from the perspective of just that replica. You may have no other choice in a disaster recovery scenario – and in that case, you will see even less. But the key is not to be surprised to see this subset of information if you’re launching the dashboard from a secondary instead of the primary. If you want to ensure you’re connecting to the primary, you could connect with SSMS to the Availability Group listener name instead, again assuming you’re not in an outage scenario.

What can I find out by default?

 

You saw by the earlier screen shots that we can find out quite a bit by default:

· Where is the primary replica? (called “primary instance” on the dashboard)

· What are the roles of each replica? (primary or secondary)

· What are the failover modes? (automatic or manual)

· What are the synchronization states? (for example – “not synchronizing”, “synchronizing”, “synchronized”)

· What are the availability databases?

· What are their individual availability database states?

· What are the database failover readiness states? (data loss possible, no data loss)

· Any issues at the replica scope?

· Any issues at the availability database scope?

Regarding the “issues” – let’s say one of my Windows Server Failover Cluster (WSFC) nodes is paused, and thus, one of my replicas is now offline. The issues column in the availability replicas shows the following:

SNAGHTML1d29a73

Clicking on the link-text, I see additional information in the Policy Evaluation Result on Availability Replica dialog box:

SNAGHTML1d235b9

So the theme is becoming clear that the dashboard is pointing out the summarized things I should be concerned with and then allowing me to click through to add additional information – instead of overwhelming me with information by default.

What else?

Right-click on the availability replica section and you’ll see other very useful data points:

SNAGHTML1d2ff4e

We start seeing information that merges the availability group and Windows Server Failover Cluster (WSFC) worlds – such as Quorum Votes and Member State (for example – if I pause a WSFC node, I’ll see Member State = Offline).

 

We have the same ability to add columns for the availability databases section. Right clicking the column headers reveals quite a few useful statistics (and again – I’m impressed that the UI designers exerted editorial control and didn’t try to stuff all columns into the report by default):

SNAGHTML1d16df7

Some excellent data points here – including Estimated Recovery Time, Estimated Data Loss, Synchronization Performance, Log Send Queue Size and Redo Queue Size. And while this isn’t replacing Perfmon for trending analysis – if you’re in the middle of a “things are slow now” scenario, this additional information is incredibly useful.

Where else can I go?

In the upper-right hand corner of the dashboard (I’m using the 11.0.2316 version of the SQL Server 2012, by the way), I can navigate into three different useful areas…

SNAGHTML1d134de

The “Start Failover Wizard” is as you might suspect. This wizard allows me to fail over to a secondary replica, making it the new primary replica.

The View AlwaysOn Health Events option allows you to explore Extended Events captured via the “AlwaysOn_health” session (which Jonathan has already talked about in more detail). What I love about this viewer (pictured below) is that unlike the dashboard which gives us a moment in time, with the viewer I can see what has happened recently that is noteworthy. And just like the dashboard, I can add the columns I’m interested in by right-clicking the column headers and then choosing what I need and also clicking specific rows to reveal details:

SNAGHTML1d0f5ad

Back to the main dashboard, if I click the “View Cluster Quorum Information” link, I’ll see data from a WSFC quorum perspective (nodes, member type, member state, and vote assignment):

SNAGHTML1d0c3a5

What about the T-SQL?

The information captured on the dashboard also provides us (if you snoop via Extended Events or SQL Trace) with guidance on the new catalog views and DMVs. Below I’ve capturing the commands behind the scenes – removing most of the extraneous T-SQL to show the core areas where some of this data is pulled for the dashboard. I didn’t clean up the original formatting from what I captured – so please forgive everything being in lower-case. Also – in many cases more information is pulled than displayed – and many fields are converted to a more readable format for use on the dashboard. I still find it useful to understand the origins of the data:

SNAGHTML1d08ada

select *

from master.sys.availability_groups;

  

select agstates.group_id, agstates.primary_replica

from master.sys.dm_hadr_availability_group_states as agstates;

   

select group_id, replica_id, replica_metadata_id

from master.sys.availability_replicas;

     

select replica_id, is_local, role

from master.sys.dm_hadr_availability_replica_states;

SNAGHTML1d017fc

select *

from master.sys.availability_groups;

   

select group_id, replica_id, replica_server_name,create_date,

 modify_date, endpoint_url, read_only_routing_url,

 primary_role_allow_connections, secondary_role_allow_connections,

 availability_mode,failover_mode, session_timeout, backup_priority,

 owner_sid 

 from master.sys.availability_replicas;

 select group_id, replica_id, role,operational_state,recovery_health,

 synchronization_health,connected_state, last_connect_error_number,

 last_connect_error_description, last_connect_error_timestamp

from master.sys.dm_hadr_availability_replica_states;

   

select replica_id,join_state

from master.sys.dm_hadr_availability_replica_cluster_states;

   

SELECT DISTINCT replica_server_name, node_name

FROM master.sys.dm_hadr_availability_replica_cluster_nodes AS arrc

LEFT OUTER JOIN master.sys.dm_hadr_cluster_members AS cm ON

      UPPER(arrc.node_name) = UPPER(cm.member_name) ;

select * from master.sys.dm_hadr_availability_group_states;

SNAGHTML1cfe114

select *

from master.sys.availability_groups;

   

select group_id, replica_id,replica_server_name,availability_mode 

from master.sys.availability_replicas;

   

select replica_id,group_database_id,database_name,is_database_joined,is_failover_ready 

from master.sys.dm_hadr_database_replica_cluster_states;

   

select *

from master.sys.dm_hadr_database_replica_states;

select replica_id,role,is_local

from master.sys.dm_hadr_availability_replica_states;

SNAGHTML1cf9e2b

declare @cluster_name nvarchar(128)

declare @quorum_type tinyint

declare @quorum_state tinyint

BEGIN TRY

    SELECT @cluster_name = cluster_name,

        @quorum_type = quorum_type,

        @quorum_state = quorum_state

    FROM sys.dm_hadr_cluster

END TRY

BEGIN CATCH

    IF(ERROR_NUMBER() != 297)

    BEGIN

        THROW

    END

END CATCH

     

SELECT

ISNULL(@cluster_name, '') AS [ClusterName],

ISNULL(@quorum_type, 4) AS [ClusterQuorumType],

ISNULL(@quorum_state, 3) AS [ClusterQuorumState]

SELECT

'Server[@Name=' + quotename(CAST(

        serverproperty(N'Servername')

       AS sysname),'''') + ']' + '/ClusterMemberState[@Name=' + quotename(cm.member_name,'''') + ']' AS [Urn],

cm.member_name AS [Name],

cm.member_type AS [MemberType],

ISNULL(cm.member_state, 2) AS [member_state],

ISNULL(cm.number_of_quorum_votes, -1) AS [NumberOfQuorumVotes]

FROM

sys.dm_hadr_cluster_members cm

How often does the dashboard refresh?

First of all, you have an “on” and “off” option in the upper right-hand corner:

SNAGHTML1cf50a8

And in SSMS you can configure the refresh interval in seconds using Tools/Options/SQL Server AlwaysOn/Dashboard:

SNAGHTML1cf06ec

(Initial) Wish List

So regarding what I would like to see added or improved upon, I think this will evolve over time, but initially I’d like to see the following:

· Ability to “snapshot” the report to an external file (PDF or XML file for example). Granted, I can just take a screen shot, but it would be nice to take a quick snapshot if I’ve stumbled across some unusual or short lived conditions.

· Ability to access the full availability group metadata from the dashboard of secondary replicas. This extends beyond the dashboard and into the data available through the data sources themselves.

Not a very long list yet – and again, I’ll certainly be adding the AlwaysOn Dashboard to my “useful” list of tools. Unlike the executive dashboards I mentioned at the beginning of the post, this one is actually quite useful and strikes that rare balance of providing just enough detail while also exerting editorial restraint.

Categories:
AlwaysOn

Yesterday I was working on a SSMS SQLCMD-mode script and I noticed that periodically my session_id changed across test executions.  I was tracking wait statistics based on specific session IDs, so I had to periodically add new session ids to my extended events session definition to make sure I captured my session’s wait stats accordingly. It was an isolated instance, so I could rely on the various session ids only being associated with my test.

It got me thinking about capturing session_id and scheduler assignment using SQLCMD mode. I wasn’t so much interested in the new session ids being used, but rather, I was interested in where my session ids were landing from a scheduler perspective. Certainly you could test this through an application, PowerShell script, etc. But SQLCMD mode offered a quick way to do this and I thought I would explore further. 

There are a few scenarios that I thought would be interesting, but for now I’ll just show one of them. For this scenario I had SQL Server instance (10.50.2500) with four visible schedulers and without multiple NUMA nodes (stay tuned on that for another time).  I started off by creating a table in tempdb to track session_id, scheduler_id, parent_node_id, cpu_id and a counter of connects:

:CONNECT CAESAR

USE tempdb;

CREATE TABLE [dbo].[SQLCMDRoundRobin](

    [sqlcmdroundrobin_id] [int] NOT NULL PRIMARY KEY CLUSTERED IDENTITY(1,1),

      [session_id] [smallint] NOT NULL,

      [scheduler_id] [int] NOT NULL,

      [parent_node_id] [int] NOT NULL,

      [cpu_id] [smallint] NOT NULL,

      [connect_counter] [int] NOT NULL

) ON [PRIMARY];

GO

Before I executed my test, I launched a separate background set of query executions (looping) which put the system under CPU pressure across all schedulers.

clip_image001[4]

Next, I connected using SQLCMD mode and used MERGE to INSERT/UPDATE specific session_id/scheduler_id combos.  I used “GO 10000” to connect-and-execute 10,000 times:

:CONNECT CAESAR

USE tempdb;

SET NOCOUNT ON;

MERGE  dbo.SQLCMDRoundRobin AS target

USING (

      SELECT      r.session_id,

                  s.scheduler_id,

                  s.parent_node_id,

                  s.cpu_id

      FROM sys.dm_exec_requests AS r

      INNER JOIN sys.dm_os_schedulers AS s ON

            r.scheduler_id = s.scheduler_id

      WHERE r.scheduler_id IS NOT NULL AND

            r.session_id = @@SPID) AS source

                  (session_id, scheduler_id, parent_node_id, cpu_id)

ON (target.session_id = source.session_id AND

    target.scheduler_id = source.scheduler_id)

WHEN MATCHED THEN

      UPDATE SET connect_counter = target.connect_counter  + 1

WHEN NOT MATCHED THEN

      INSERT (session_id, scheduler_id, parent_node_id, cpu_id, connect_counter )

      VALUES (source.session_id, source.scheduler_id,

             source.parent_node_id, source.cpu_id,  1);

GO 10000

Then I executed the following query to look at the distribution based on every SQLCMD connect:

:CONNECT CAESAR

USE tempdb;

SELECT session_id, scheduler_id, parent_node_id, cpu_id, connect_counter

FROM [dbo].[SQLCMDRoundRobin];

GO

This returned:

clip_image003[4]

Removing all concurrent activity, I saw the following distribution instead:

clip_image005[4]

I also tested the workload and CPU driving background activity with the processor affinity mask explicitly set, and the distribution was as follows:

clip_image007[4]

I’d like to apply this framework to more scenarios – including for those on NUMA systems and also against a wider variety of workloads, since this has an impact on how the sessions will get distributed.  When I get the chance, I’ll write about the observations here, but I thought I would share the capturing method in the meantime.

Categories:
Performance

The DBMIRROR_DBM_MUTEX wait type is undocumented. Or rather, it is documented in books online as being “Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.” Paul Randal had some additional information on this wait type in his “Wait statistics, or please tell me where it hurts” post, defining the wait type as “… contention for the send buffer that database mirroring shares between all the mirroring sessions on a server. It could indicate that you've got too many mirroring sessions.”

I saw one forum entry where someone said “you can ignore” the DBMIRROR_DBM_MUTEX wait type because, basically, it is undocumented. This assumes that Microsoft only documents wait types that we should care about – but this really isn’t the case.

A few months ago I had a discussion with a DBA about whether or not this wait type was associated with mirroring contention issues as Paul described or whether this really just represented a background process that naturally accumulated time (idle/spin waits). The DBA said this was an idle wait type that can be ignored. I said that it was indeed associated with activity and coupled with other statistics, may point to mirroring related contention.

As with anything undocumented, while you cannot point to official documentation, there is nothing to stop you from setting up a test scenario and evaluating what happens under specific conditions. The “show, don’t just tell” method is always something I prefer, even when I’m confident that a stated fact is correct.

Question 1: Does DBMIRROR_DBM_MUTEX accumulate during inactive periods?

I set up synchronous mirroring with automatic failover for this test (Credit database) and executing the following simple before/after WAITFOR capture of overall accumulated statistics to see which database mirroring related wait stats incremented during inactive (user inactive) periods. SQL Server version was 10.50.2500. The script runs in SQLCMD mode against CAESAR (principal) and AUGUSTUS (mirror)):

-- Principal database instance

:CONNECT CAESAR

SELECT wait_type, wait_time_ms, signal_wait_time_ms

INTO tempdb.dbo.before_waits

FROM sys.dm_os_wait_stats

WHERE wait_type LIKE 'DBM%';

GO

-- Mirror database instance

:CONNECT AUGUSTUS

SELECT wait_type, wait_time_ms, signal_wait_time_ms

INTO tempdb.dbo.before_waits

FROM sys.dm_os_wait_stats

WHERE wait_type LIKE 'DBM%';

GO

:CONNECT CAESAR

USE Credit;

WAITFOR DELAY '00:00:30'

GO

-- Principal

:CONNECT CAESAR

SELECT wait_type, wait_time_ms, signal_wait_time_ms

INTO tempdb.dbo.after_waits

FROM sys.dm_os_wait_stats;

SELECT b.wait_type,

a.wait_time_ms - b.wait_time_ms wait_time_ms,

a.signal_wait_time_ms - b.signal_wait_time_ms signal_time_ms

FROM tempdb.dbo.before_waits b

INNER JOIN tempdb.dbo.after_waits a ON

b.wait_type = a.wait_type

WHERE a.wait_time_ms > b.wait_time_ms

ORDER BY a.wait_time_ms - b.wait_time_ms DESC;

DROP TABLE tempdb.dbo.before_waits;

DROP TABLE tempdb.dbo.after_waits;

GO

:CONNECT AUGUSTUS

-- Secondary replica

SELECT wait_type, wait_time_ms, signal_wait_time_ms

INTO tempdb.dbo.after_waits

FROM sys.dm_os_wait_stats;

SELECT b.wait_type,

a.wait_time_ms - b.wait_time_ms wait_time_ms,

a.signal_wait_time_ms - b.signal_wait_time_ms signal_time_ms

FROM tempdb.dbo.before_waits b

INNER JOIN tempdb.dbo.after_waits a ON

b.wait_type = a.wait_type

WHERE a.wait_time_ms > b.wait_time_ms

ORDER BY a.wait_time_ms - b.wait_time_ms DESC;

DROP TABLE tempdb.dbo.before_waits;

DROP TABLE tempdb.dbo.after_waits;

GO

 

Over a 30 second inactive period, the following DBM related wait stats accumulated on the principal:

wait_type

wait_time_ms

signal_time_ms

DBMIRROR_EVENTS_QUEUE

30043

5

And here is what accumulated on the mirror:

wait_type

wait_time_ms

signal_time_ms

DBMIRROR_EVENTS_QUEUE

30966

6

No DBMIRROR_DBM_MUTEX to be found.

Question 2: Will DBMIRROR_DBM_MUTEX accumulate during a large data load in synchronous mode?

In my second test, I inserted 1,600,000 rows into the dbo.charge table. Granted – we have all kinds of reasons why there could be contention in a mirroring session including network considerations, synchronous versus asynchronous settings, disk contention for the databases (principal/mirror), memory constraints, 32-bit/64-bit considerations, performance overhead of non-mirrored databases on the same instance or server, number of concurrent mirroring sessions and more.

But in a simple test environment with minimal concurrent activity, 4 scheduler and 2 GB of RAM for the principal and the same for the mirror, can I get DBMIRROR_DBM_MUTEX to appear? I ran the following test:

:CONNECT CAESAR

USE Credit;

INSERT dbo.charge

(member_no, provider_no, category_no, charge_dt, charge_amt, statement_no, charge_code)

SELECT member_no, provider_no, category_no, charge_dt, charge_amt, statement_no, charge_code

FROM dbo.charge;

GO

These are the database mirroring related wait statistics that accumulated on the principal SQL Server instance:

wait_type

wait_time_ms

signal_time_ms

DBMIRRORING_CMD

253578

2083

DBMIRROR_EVENTS_QUEUE

83949

23775

DBMIRROR_SEND

5017

25

DBMIRROR_DBM_EVENT

12

0

And here is what accumulated on the mirror:

wait_type

wait_time_ms

signal_time_ms

DBMIRROR_DBM_MUTEX

482105

222

DBMIRROR_SEND

258917

5338

DBMIRROR_EVENTS_QUEUE

84676

3848

Changing to the mirroring session to asynchronous mode instead of synchronous, I saw the following on the principal:

wait_type

wait_time_ms

signal_time_ms

DBMIRROR_EVENTS_QUEUE

75359

22974

DBMIRRORING_CMD

64219

1655

DBMIRROR_SEND

8740

60

And I saw the following on the mirror:

wait_type

wait_time_ms

signal_time_ms

DBMIRROR_DBM_MUTEX

113730

105

DBMIRROR_EVENTS_QUEUE

78699

3051

DBMIRROR_SEND

69729

4057

So I saw the DBMIRROR_DBM_MUTEX wait type appear (on the mirror side) for both the synchronous and asynchronous modes during an INSERT of 1,600,000 rows. There are plenty of other tests I could have executed – but I wanted to show just a couple of examples to demonstrate viewing what gets accumulated over a specific period of time (and differentiating between idle waits versus activity-related waits).

Does this necessarily point to a database mirroring performance issue? As I’ve mentioned before, I use wait statistics as an initial pointer so that I’m prioritizing my efforts appropriately. We don’t have enough information to define root cause at this point just based on this data, but what we can derive through testing and observation is which wait types may be seen in conjunction with user activity in a mirrored partnership.

Categories:

You may have already heard that SQL Server 2012 adds automatic generation of statistics for AlwaysOn availability group readable secondaries, storing the statistics in tempdb.  This functionality was also extended to other areas beyond just AlwaysOn availability groups as I’ll demonstrate next…

Using the AdventureWorksDWDenali database, I’ll start by checking the current statistics on the FactInternetSales table:

USE [AdventureWorksDWDenali];

GO

-- Confirm statistics on FactInternetSales

EXEC sp_helpstats 'FactInternetSales';

/*

statistics_name statistics_keys

_WA_Sys_00000008_1273C1CD  SalesTerritoryKey

_WA_Sys_0000000A_1273C1CD  SalesOrderLineNumber

_WA_Sys_0000000B_1273C1CD  RevisionNumber

_WA_Sys_0000000C_1273C1CD  OrderQuantity

_WA_Sys_0000000D_1273C1CD     UnitPrice

_WA_Sys_0000000E_1273C1CD  ExtendedAmount

_WA_Sys_0000000F_1273C1CD  UnitPriceDiscountPct

_WA_Sys_00000010_1273C1CD  DiscountAmount

_WA_Sys_00000011_1273C1CD  ProductStandardCost

_WA_Sys_00000012_1273C1CD  TotalProductCost

_WA_Sys_00000013_1273C1CD  SalesAmount

_WA_Sys_00000014_1273C1CD  TaxAmt

_WA_Sys_00000015_1273C1CD  Freight

_WA_Sys_00000016_1273C1CD  CarrierTrackingNumber

_WA_Sys_00000017_1273C1CD  CustomerPONumber

_WA_Sys_00000018_1273C1CD  OrderDate

_WA_Sys_00000019_1273C1CD  DueDate

_WA_Sys_0000001A_1273C1CD  ShipDate

*/

Next, I’m going to add a UnitPrice value to an existing row (creating just one row out of the table with this value):

UPDATE TOP (1) dbo.FactInternetSales

SET UnitPrice = 777.77;

After that, I’ll drop the statistics for the UnitPrice column.  Why?  I want demonstrate a scenario where automatic statistics may not have already been generated due to incoming query requests – but may still be needed for queries against the database snapshot:

-- UnitPrice stats

DROP STATISTICS FactInternetSales._WA_Sys_0000000D_1273C1CD;

GO

Now I’ll create a database snapshot on the database:

-- Create the database snapshot

CREATE DATABASE AdventureWorksDWDenali_AWDW_S1

ON

( NAME = AdventureWorksDWDenali_Data,

FILENAME =

'C:\Program Files\Microsoft SQL Server\MSSQL11.CAESAR\MSSQL\Data\AdventureWorksDWDenali_AWDW_S1.ss' )

AS SNAPSHOT OF AdventureWorksDWDenali;

GO

What happens if I use the database snapshot to query for a specific UnitPrice value?  There are no statistics on the read-write database and now that I’m using the database snapshot, I’m not triggering automatic-statistics on the source database:

USE AdventureWorksDWDenali_AWDW_S1;

GO

SELECT ProductKey, ProductStandardCost

FROM dbo.FactInternetSales

WHERE UnitPrice = 777.77;

Looking at the actual plan for this query, I see that the estimated rows vs. actual do indeed match (and as an aside – the optimizer “guess” for this predicate would have been 3,852.72 without the supporting stats):

clip_image001

But I can also check to see if there are automatic statistics generated by qualifying is_temporary = 1:

SELECT name, stats_id

FROM sys.stats

WHERE OBJECT_ID = object_id('FactInternetSales')

     AND is_temporary = 1;

This returns:

clip_image003

The stats name even gave me an additional hint of its origins via the “readonly” naming convention.  If I look up the column for the automatically generated temporary stats, I see the following:

SELECT     object_name(object_id) tblname,

           COL_NAME(object_id, column_id) columnname

FROM sys.stats_columns

WHERE stats_id = 512000;

clip_image005

If you need a supporting index, you still need to provide this structure in the read-write database, but when a query can be optimized using statistics alone - this is definitely a welcome new feature.

I gave a presentation on columnstore indexing last week and one question I received was regarding the performance of a (hypothetical) narrow, supporting nonclustered index versus a columnstore index.  We discussed how nonclustered indexes were still going to be more effective for specific types of queries (for example – singleton operations, smaller range scans, etc.).  But what about a situation where we’re aggregating data across the entire fact table? Was there any way that a narrow nonclustered index could measure up against a columnstore index?

The ability to contrast the performance is of course very much dependent on the query construction, table design, data distribution and other factors.  With that said, I thought I would test out a scenario as follows…

First of all, I used FactInternetSales - pumped up to 123,695,104 rows.  The query I used for comparing performance was as follows (with a few hint-modifications to force index and parallelism caps that I used later):

SELECT     p.ProductLine,

           SUM(f.SalesAmount) TotalSalesAmount

FROM [dbo].[FactInternetSales] f

INNER JOIN [dbo].[DimProduct] p ON

     f.ProductKey = p.ProductKey

GROUP BY p.ProductLine

ORDER BY p.ProductLine;

 

I created the following nonclustered indexes (index on joining keys – and then an INCLUDE on the SalesAmount which is being aggregated):

CREATE NONCLUSTERED INDEX [NCI_FactInternetSales_ProductKey]

ON [dbo].[FactInternetSales]

([ProductKey] ASC)

INCLUDE ( [SalesAmount]) ON [PRIMARY];

GO

 

CREATE NONCLUSTERED INDEX [NCI_DimProduct_ProductKey_ProductLine] ON [dbo].[DimProduct]

(    [ProductKey] ASC,

     [ProductLine] ASC

)ON [PRIMARY];

GO

 

I also created a columnstore index that covered all columns in the fact table.  To force my query to ignore that index for the comparison test, I added the following line:

OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

 

So how did the performance compare between a supporting nonclustered index versus the columnstore index? (By the way, I used a warm cache for both tests – executing twice):

Index Support

Elapsed Time

CPU Time

Columnstore

256 ms.

1045 ms.

Nonclustered index

34189 ms.

33322 ms

 

The columnstore index plan was as follows:

clip_image002

The nonclustered index plan was as follows:

clip_image004

One obvious difference is that the columnstore index execution leveraged parallelism.  The query also benefited from batch execution mode:

clip_image006

If I hobble the columnstore index by capping max degree of parallelism, the results (in comparison) are as follows (third row represents the test):

Intended Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

 

Looking at the plan, we see the columnstore index wasn’t used at all:

clip_image008

If I force use of the columnstore index with MAXDOP 1, I see:

Index Support

Elapsed Time

CPU Time

Mode

Columnstore

256 ms.

1045 ms.

Batch

Nonclustered index

34189 ms.

33322 ms.

Row

Columnstore index (MAXDOP 1) – ending up using NCI

33540 ms.

34560 ms.

Row

Columnstore index (MAXDOP 1) – forced columnstore index

29859 ms.

30107 ms.

Row

 

The plan shows the index is used, but it is row mode (due to capping parallelism):

clip_image010

clip_image012

Looping back to the inciting discussion – even with a supporting nonclustered index, this specific query which scanned all 123 million rows and returned a smaller result set with aggregated values performed significantly better when using a columnstore index in conjunction with batch execution mode.

When moving from SQL Server 2008+ to SQL Server 2012, be aware that the sampling algorithm has changed.

For example, I created a partitioned clustered index on two identical tables (same schema, rows and distribution), one in SQL Server 2008 R2 SP1 and the other in SQL Server 2012 RC0. 

Below shows the partial output (STAT_HEADER) of a DBCC SHOW_STATISTICS on SQL Server 2008 R2 SP1:

clip_image001

All rows were evaluated for the statistics generation.

Now compare this to SQL Server 2012 RC0:

clip_image002

SQL Server 2012 now uses default sampling for the partition creation and rebuild.

Is this good or bad?

Certainly if you’ve required a full scan in the past in order to get higher quality stats, this may not be a good thing.  Otherwise you may not notice anything at all from a query performance perspective if the default is good (or accurate) enough.

You can see the number of steps decreased in my examples in this post between 2008 R2 SP1 to SQL Server 2012.  My histograms for this identical table also changed of course.  For example, moving from average rows on one step of 5504 down to 5305, distinct range rows from 343 up to 361... The real question is, will these changes be enough to turn a good plan bad. 

Post-upgrade, if you see changes in query performance against partitioned tables, evaluate the histograms and test performance of the workload after the default sampling versus after a full scan update of the statistics (via UPDATE STATISTICS for example).  

Theme design by Nukeation based on Jelle Druyts