Planning to use Availability Groups? Keep this page bookmarked…

If you’re planning on using Availability Groups in a production environment, I highly recommend you review the following Books Online topic:

Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server)

There are other topics that should be reviewed too, of course, but this particular landing page is being kept up-to-date with various recommended hotfixes,  prerequisite checklists and even information about anticipated thread usage that was originally discussed on the CSS SQL Server Engineers Blog.

Now I’m recommending you keep the online version of this page bookmarked, not the offline version.  If you downloaded the local version of BOL using the Microsoft Help Viewer technique, you don’t have the latest version of the topic (at least I didn’t as of 10/2/2012 and didn’t find a more recent update).  This may get refreshed periodically, but to be safe I recommend you reference the latest version on MSDN and not risk working with stale information.

Recap: SQLSaturday #149

Yesterday’s SQLSaturday #149 (Minnesota) was most excellent.  A few comments/observations/recaps:

  • The event was extremely well organized and professional – while still managing to maintain a regional, relaxed atmosphere across 400+ attendees.  The organizers did an absolutely incredible job and should be proud of themselves.  There were several folks involved, and plenty of “orange shirts” walking around helping people throughout the day.  Hopefully they are all sleeping (Paul Timmerman in particular).
  • Lots of great sessions to pick from, and as always, it was hard to choose.
  • I got my first opportunity to present “Resolving Cardinality Estimate Issues” and I also facilitated an informal “Lunch-on-a-Stick” query tuning and indexing discussion, facilitated along with Ted Krueger.  We were going to do separate sessions, but we were sharing the same room and it just seemed like a better idea to combine forces.  It worked out well.
  • I got a chance to meet a ton of new people and also folks I knew only in digital-form, but hadn’t yet met in person.  Lot’s of familiar faces and old friends too.

Lastly, I just finished uploading my deck from my session yesterday, “Resolving Cardinality Estimate Issues”.  I’ll be expanding this out into a Pluralsight course in the future – as there is plenty more to discuss and demonstrate, and 75 minutes is really just a start…

RunAs Radio Interview

A few weeks ago I was interviewed by Richard Cambell of RunAs Radio:

 Joe Sack Tunes Queries in SQL Server!

We talk about performance tuning, consulting scenarios and a little bit about SQL Server 2012 columnstore indexing.

Presenting at SQLSaturday 149

Next Saturday, September 29th, I’ll be presenting at SQLSaturday #149 in Minneapolis, Minnesota – my home town.

This will be the second SQLSaturday I’ve been able to attend and speak at since joining SQLskills (almost) one year ago. I’ll be presenting “Resolving Cardinality Estimation Issues” and also facilitating the SQL Server performance topic over lunch time.

The speaker lineup is an embarrassment of riches – especially considering that this is a free event.  Much to choose from at varying levels and aspects of the SQL Server product line.  Don’t worry if this is taking up your weekend, as it will be time well spent.

If you get an opportunity, please stop by my session or the lunch table to say hello and introduce yourself!

New Course: SQL Server: Transact-SQL Basic Data Modification

As Paul recently announced, the entire SQLskills team will be rolling out SQL Server content across several subjects on Pluralsight.com.

My latest course,  “SQL Server: Transact-SQL Basic Data Modification”, which is the companion to “SQL Server: Transact-SQL Basic Data Retrieval”, was just published today by Pluralsight.com.

The course description is as follows:

“If you need to modify data in a SQL Server database, then you need to know how to use the INSERT, UPDATE and DELETE statements. This course starts by explaining how to find information about the tables and columns you want to modify. It then explains the INSERT, UPDATE, and DELETE statements in detail along with various methods for limiting the data being modified. Finally it moves beyond the basic modification statements to more advanced topics like the MERGE statement, error handling and more. More then thirty five demos help to give you a thorough understanding of how to perform these essential operations, all using a freely-available demo environment that you’re shown how to set up and configure. This course is perfect for developers who need to modify data in SQL Server databases, from complete beginners through to more experienced developers who can use some of the modules as reference material. The information in the course applies to all versions from SQL Server 2005 onwards.”

I’ll continue doing other T-SQL courses in the future for various levels, starting with the beginner level content as I’ve been doing.  My next Pluralsight course will be on a different subject related to SQL Server performance, but I’ll hold off on describing it for now.

What Does the Future Hold for Cardinality Estimates and Cost Models in Windows Azure SQL Database?

I’ve been interested in exploring what changes may be coming through in SQL Azure (Windows Azure – “SQL Database” now) around query execution cost models and cardinality estimates being fed to them.  This was motivated by a couple of things…

First of all, I saw a blog post from Grant Fritchey where he noticed that the estimated costs for a few plan operators were different.

Secondly, I read a paper called “Testing Cardinality Estimation Models in SQL Server” by Campbell Fraser, Leo Giakoumakis, Vikas Hamine, and Katherine F. Moore-Smith.  This was a for-fee article, but the non-member price of $15.00 was worth it.  One particularly interesting quote was as follows:

“The new CE model is planned as a future service release of the Microsoft SQL Azure service.”

That quote was a tipping point for further investigation, so, collaborating with Jonathan Kehayias, we discussed a testing approach and set up two different Azure databases, with one database on Web Edition and the other Business Edition.  The intention wasn’t to perform “formal” tests, but I did want to sniff around and see what variations in cost and cardinality estimates I could find (if any) between SQL Azure (version 11.0.2006) and SQL Server 2012 (version 11.0.2316) across various types of queries.  I used the Credit database for a variety of test queries – with identical schema and data in all three databases (one engine DB and two Azure DBs).

One thing I’ve learned so far is that you should watch out for is misinterpreting cost differences for “identical” databases.  Even if you load the same exact schema and rows, you will likely have a different data page count between Engine and Azure (think of the Azure fillfactor and RCSI behavior).  For example, after loading my dbo.member table in SQL Azure, it had 159 pages versus the 142 pages in my SQL Server 2012 version.  So testing an initial Clustered Index Scan query against that table showed me an estimated IO of 0.1075694 in SQL Server 2012 versus 0.120162 for SQL Azure.  So assuming one random I/O and the rest sequential, I see that my SQL Azure cost is still calculated the same for the Clustered Index Scan:

– Random I/O – 0.003125
– Sequential – 0.000740741
SELECT    0.003125 +
        0.000740741 * (159-1);

So the key will be to make sure I’m looking at true apples-to-apples comparisons here.  I’ll be testing when I have a few spare moments between other tasks – but in the meantime I’m very interested to learn more about what new changes will come in to SQL Azure in the future.  I’ll share anything interesting I find on the blog – and if you find noteworthy QO items, please share on the comments of this blog as well.

Thanks!

Transactional Replication Publications and Availability Groups

Books Online documents a few scenarios regarding Replication and Availability Group interoperability.  Today I tested out the process detailed here:

Configure Replication for AlwaysOn Availability Groups (SQL Server)

It worked as advertised and I tested this on a five replica AG topology with three synchronous replicas (including the primary) and two asynchronous replicas.  I won’t rehash the BOL steps – but I did want to mention a few observations about the process:

  • One of my AG replicas was also the same SQL Server instance as my subscription database (non-AG database), so I skipped the sys.sp_addlinkedserver step for that particular SQL Server instance.  Collocation of the primary replica and subscriber worked fine.
  • While it is possible to make one of your participating replica SQL Server instances the distributor, it doesn’t make sense to do so from an HA/DR perspective.  But if your distributor is indeed remote and not collocated with the AG replicas, think about FCIs for providing HA.
  • The publications show up in SQL Server Management Studio under the Replication\Local Publications folder.  Hovering over the publication from a secondary replica will still show a yellow (tooltips-like) dialog box showing the original SQL Server instance where you created the publication – even if that replica is currently a secondary.
  • The New Publication Wizard doesn’t stop you from creating a Peer-to-Peer publication for an availability database, even though this combo is not supported by Microsoft.  I didn’t finish P2P configuration – but now I’m curious if it actually works (even though it wouldn’t have support).
  • Deleting a publication for an availability database raises the error 18752 “Only one Log Reader Agent or log-related procedure (sp_repldone, sp_replcmds, and sp_replshowcmds) can connect to a database at a time”.  This error was repeatable with or without existing subscribers.  The error also gets followed up with a “change database context to” message. Even after the message, the publication does indeed get removed.  This message is seen both with the GUI and with sp_droppublication.  I’ll likely put out a Connect item on this one (I didn’t see one that matched my scenario).

Why consider replication when you have AG readable secondaries?  There are several use-cases that I could think of – for example if you want to have a sub-set of the overall data and use customized indexing on the subscriber.  Another case would be to have access to replicated data if there is an outage of the AG. 

I’m going to write about testing the AG subscriber scenario in another post.

Replication Extended Events, Not a Tool in your Toolbox (Yet)

There are already a number of data sources you can reference when investigating replication issues.  One data source on my wish list was to have a one-stop shop in Extended Events similar to the AlwaysOn Health Session.

It turns out that SQL Server 2012 does have a few new replication related events, but don’t get too excited… Books Online manages our expectations in the following text (underlined text added by me):

“Replication supports Extended Events, however, this feature is for internal use only at this time. Replication extended events were added to help customer support engineers collect information to troubleshoot replication problems. The information collected is not useful for replication performance tuning or monitoring.”

There was a dash of hope in the “at this time” qualifier, but that was the only good news I could get from this. But even then, I wanted to be absolutely sure that there were truly no hidden diagnostic data sources that could be leveraged for replication issues.

I found the following potentially promising events in sys.dm_xe_objects:

repl_event
logreader_process_text_info
logreader_process_text_ptr
logreader_process_filestream_info
logreader_add_compensation_range
logreader_add_eor
logreader_apply_filter_proc

The first one was what I decided to investigate today was “repl_event”.  It is described in sys.dm_xe_objects as “Occurs when sp_repl_generateevent is called. this event is an internal repl event for tracing repl stored procedures. The data that is returned from user_event includes the event_id that was specified in the call to sp_repl_generateevent. This can be a value between x and y.“ :

CREATE EVENT SESSION [repl_event] ON SERVER 
ADD EVENT sqlserver.repl_event 
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON);
GO

I started this event session and set up transactional replication (configured the distributor, simple publication, one subscriber).  No events were triggered when I did this.

I then investigated the sp_repl_generateevent procedure itself.  Looking at the definition (or trying), returned the message “replgenerateevent extended procedure”. 

Which objects reference sp_repl_generateevent?  All I could find was sys.sp_MSaddmergetriggers_internal.  The referencing section of the system stored procedure was as follows:

select @command2 = ‘

       — update any ppm row that already exist with this gen

       update ppm set ppm.generation = case when @is_mergeagent = 1 then 0 else @newgen end

       from ‘ + @quoted_past_mappings_viewname + ‘ ppm with (rowlock) inner join deleted v

       on ppm.tablenick =@tablenick  and ppm.rowguid = v.’ + @quoted_rgcol + ‘

       — insert the past partition mapping into gen 0 if this is the merge agent

      insert into ‘ + @quoted_past_mappings_viewname + ‘ with (rowlock) (publication_number, tablenick, rowguid, partition_id, generation,reason)

      select distinct ‘ + convert(nvarchar(100), @publication_number) + ‘, @tablenick, v.’ + @quoted_rgcol + ‘, v.partition_id, case when @is_mergeagent = 1 then 0 else @newgen end, 1

      from ( ‘ + @partition_deleted_view_rule + ‘ ) as v  
      if (@@ROWCOUNT <= 0)

      begin

           select @xe_message = CAST(”replica_id: ” + convert(nvarchar(100), @replnick) + ”, article_id: ” + convert(nvarchar(100), @tablenick) + ”, rowguid: ” + case when @article_rows_deleted = 1 then convert(nvarchar(100), @rowguid) else N”0” end + ”, generation: ” + case when @is_mergeagent = 1 then N”0” else convert(nvarchar(100), @newgen) end + ”, Reason: -1” AS varbinary(1000));

          exec master..sp_repl_generateevent 1, N”Event : ppm_insert”, @xe_message

      end

      ‘

Since I was on a test SQL Server instance, I thought I would test out a direct call to this procedure just to indeed see that this procedure was hooked to the repl_event:

DECLARE @xe_message varbinary(1000) = 
    CAST('Event payload' AS varbinary(1000));

EXEC sp_repl_generateevent 1, N'Event: Am I captured?', @xe_message;

Sure enough – the repl_event was fired:

image thumb Replication Extended Events, Not a Tool in your Toolbox (Yet)

 

Whether repl_event gets leveraged in the future, we’ll see.  If Microsoft implements this in the future, my wish list would include the following events (and knowing that we can capture these in other areas – but again I’m interested in a consolidated session):

  • Replication configuration events (creation, dropping, changes)
  • Agent statistics, like periodic reader/writer thread latency statistics
  • Subscription expirations
  • Conflicts
  • Failed replication jobs and retries
  • Data sync warnings
  • Interoperability events (for example – database mirroring failovers of the publication database)

I may investigate the logreader_* related events at some point, but based on the naming and descriptions of these events I don’t see significant use cases at this time.

If you run across any other replication related events that you find useful, please share your comments on this post.  Thanks!

Capturing Transient Query Plan Changes

I had a recent consulting engagement where a query had unpredictable performance on an isolated test environment.  I had two initial questions:

- What were the wait stats associated with the unpredictable query?

- What did the query execution plan look like in the “good” versus “bad” condition?

To address the query wait stats question, I set up an Extended Events session to track the query’s accumulated wait stats for each execution.  The performance issue would only happen a couple of times today, so I scheduled a job to loop the execution – without disconnecting, and keeping the session ID being referenced in the Extended Events session.  I set up a separate table to track each test run (begin/end time).  This way the long periods could be associated back to the Extended Events session.

We caught a few instances of the long running query, and the associated wait stats were primarily related to PAGEIOLATCH_SH.  That opened up other considerations which I don’t cover in this post, but I was still interested in seeing the execution plan for the long-running time periods versus the “steady” state.

For example, let’s say you have the following query:

EXEC sp_executesql    
    N'SELECT     p.ProductLine,
               SUM(f.SalesAmount) TotalSalesAmount
    FROM [dbo].[FactInternetSales] f
    INNER JOIN [dbo].[DimProduct] p ON
         f.ProductKey = p.ProductKey
    GROUP BY p.ProductLine
    ORDER BY p.ProductLine';
GO

This particular query leverages parallelism and a columnstore index in my database.  I can search for the query hash and query plan hash as follows (after executing on my test system):

SELECT TOP 1 last_execution_time, query_hash, query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE t.text LIKE '%SUM(f.SalesAmount)%'
ORDER BY last_execution_time DESC;
GO

I order it by last execution because I want the most recent execution version (and you have to be careful when grabbing the query_hash to make sure the text match in your WHERE clause isn’t capturing another query).  Obviously easier to do on an isolated test system – but not impossible in production if you pay attention to execution count statistics.  The following shows example results from the previous query:

snaghtml506cb63 thumb Capturing Transient Query Plan Changes

I was interested in the query_hash and also the query_plan – which for this example, is the parallel plan:

image thumb Capturing Transient Query Plan Changes

Now what if I re-execute my query – but with parallelism disabled at the server scope?  No query change – but now the plan must change since it can’t run in batch execution mode.  Will tracking it based on my query_hash still work? 

SELECT query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE qs.query_hash = 0x3759AECF09255926
GO

In this case, I haven’t changed my query, but the plan changed, and so my query_hash allowed me to see the new serial query plan:

image thumb Capturing Transient Query Plan Changes

image thumb Capturing Transient Query Plan Changes

In the particular consulting scenario I mentioned at the beginning of this post, I wrote the most recent query plan to a separate table for each test run so that when the long performance happened on the test environment, we could go back and compare the different plans based on the same query hash value.

SQL Server 2012’s Information on Parallel Thread Usage

SQLServerCentral published an article today that I wrote about SQL Server 2012’s execution plan parallel thread usage statistics:

SQL Server 2012’s Information on Parallel Thread Usage

The new thread statistics show information on the number of concurrent execution paths within an execution plan, the count of used threads and also the count of reserved threads per NUMA node.  Definitely a useful feature, giving you visibility to actual thread reservation and not just the number of concurrently executing workers.