SQL Server PDB Symbols and Callstacks The Easy Way

Lately, I have been doing a lot of work troubleshooting certain behaviors in SQL Server for workloads that are, to put it simply, designed horribly. As a part of this, I have found it necessary to collect callstacks with Extended Events and to materialize them using the debugger symbols for SQL Server to see where exactly certain types of issues are being encountered to better understand some of the internals of newer features in SQL Server 2017 and 2019. Years ago I blogged about how to use the package0.callstack action in Extended Events for this type of thing, and Paul also has a blog post that talks about how to download the PDB symbols for SQL Server as well as a post that also demonstrates using the package0.callstack action to determine what causes a particular wait type. Using the debugging tools to get the symbols is somewhat clunky and tedious, so when I happened on this amazingly simple method of getting symbol files I had to share it.

The SQLCallstackResolver on Github has to be one of the greatest things since sliced bread if you want to materialize callstacks from SQL Server. All you need is the current release of the application, and the appropriate SQL Server binaries, and it will create a Powershell script to download the appropriate symbols files. There is even a Wiki page with scripts for downloading most of the builds for all of the versions of SQL Server but if one happens to not be there, you can run the application, choose the Binn path where that build’s executable and DLL’s are located and it will generate the PowerShell script to download the symbols. What’s even better though is that the application allows you to paste a callstack from any version of SQL Server into the window, point the Symbols path to the appropriate folder with the symbols downloaded, and it uses the Microsoft.Diagnostics.Runtime.dll project to materialize the callstack to the function names and calls. So if you have the output of the callstack and you know what the build of SQL Server is that generated it, you can now materialize the functions with just this application. No need to take a memory dump of SQL Server, no need to use Symchk, no need for trace flag 3656 live in your server, you can copy/paste/resolve in one place.

This is very, very cool!

Availability Group Readable Secondaries – Just Say No

Since SQL Server 2012, Microsoft has put a lot of efforts into marketing AlwaysOn Availability Groups as a feature and solution for both high availability/disaster recovery for SQL Server, and scalability for read-only workloads.  Over the last six years, I have talked more clients out of using Availability Groups than I have actually implemented for SQL Server for a lot of different reasons.  As easy as Microsoft has made it to build an Availability Group, restore and synchronize the databases, join secondary servers, etc. the simple fact is that once they are in production usage, you had better know what not to do that could break things, the impact having a readable secondary, and how to troubleshoot problems when they occur, because they will.

A lot of the implementation questions I get for Availability Groups are generally looking at leveraging a secondary replica for High Availability or Disaster Recovery and often times also for offloading reporting or readable workloads.  Readable Secondary copies of the database are really simple at the surface conceptually, but there are a lot of additional considerations I make that could lead you to another technology entirely like Transactional Replication or Log Shipping with Standby. Let’s look at how these three stack up against each other for different data access requirements.

Real-time Data

First things first, for real-time data access, you only have one option, the primary writeable copy of the database. In replication this is the publisher database, log shipping calls it the log shipped primary, and Availability Groups refers to it as the primary replica.  None of the secondary copies of the data, even in a synchronous Availability Group replica, is ever guaranteed to actually be real-time data.  A synchronous Availability Group only guarantees that the log records for an operation are hardened in the synchronous secondary replicas transaction log allowing them to later be redone on the secondary database when a transaction commits.  Depending on the volume of activity that is occurring, and whether the secondary replica is being used for read-only workloads or has sufficient hardware, the redo thread can typically be anywhere from a few hundred milliseconds to a few seconds behind the actual primary replica.  This means that if you insert data into a table, scope the identity value and return it back to the application, then attempt to read that record immediately from the read-only secondary, it could actually not be there (Yes, this actually does happen).

Near Real-time Data

For near real time data access, both Replication and the Availability Group can meet the requirement for near real-time data. Replication typically will have slightly more latency than an Availability Group readable secondary in either commit mode by a few seconds due to the way replication functions.  The log reader agent has to read the transaction log, write replication changes to the distribution database, and then the distribution agent has to pickup those changes up and apply them to the subscriber.  This typically has a latency of 3-7 seconds under normal conditions, where an Availability Group readable secondary could have less latency depending on the volume of activity. The big difference between the two is entirely in the implementation and features supported.

If you need a non-clustered index to support the read-only workload, that index must be created on the primary database of an Availability Group for it to be available on the readable secondary.  This means you have to backup and maintain that index and the maintenance will affect all servers in the Availability Group as a logged operation requiring redo on secondary replicas.  This is not the case with transactional replication, where the subscriber database is a writeable copy of the database and can have a completely different indexing design from the publisher database that is based on just the read-only workload.  Since the subscriber database is writeable, that also means that you can use features like Query Store, plan forcing, and automatic plan correction for the read-only workload; all of these are unavailable for a readable secondary replica in an Availability Group because the database is an exact duplicate of the primary (which can be enabled for Query Store and use these features).

Another consideration is the impact of the 14-byte row-version tag generation to support Read Committed Snapshot Isolation (RCSI) for the read-only replica in an Availability Group.  This can only be generated on the primary replica database, but is not required on the publisher database for transactional replication. Why does this matter? It can lead to page splits that cause logical fragmentation of the indexes. Moving to a readable secondary may require monitoring index fragmentation and introducing fill factor changes where you wouldn’t have needed them previously. If you have read the incredibly bad information online telling you that you can ignore index fragmentation entirely with SSDs, or that you don’t need to rebuild indexes with an Availability Group, consider this: logical scan fragmentation may not matter much from a performance standpoint on fast storage, but a low page density caused by page splits and fragmentation affects every single replica copy and wastes space on disk and in memory in the buffer pool, and affects plan select due to query costing by the optimizer. With replication, all of this is avoided entirely unless you use RCSI for your publisher workload anyway, but that does not impact the subscribers in anyway since they are only replicated copies of the data modification operations, not the pages themselves.

Point-in-time Data

For point in time data access, I often go low tech and suggest using Log Shipping with Standby and delayed restore to create a warm copy of the database that is restored to a point in time.  This can be useful for accounting reporting on the previous days sales transactions, performing ETL into a data mart or data warehouse, and it also provides the benefit of having a second copy of the database at a known point in time for DR purposes.  You can also do this type of reporting off of the near real-time data provided by Replication and Availability Groups typically as well. The table below breaks down the features and requirements of each of these three technologies for providing a readable copy of the database.

Real Time DataSecondary DelayReadable Workload IndexesRow-version Tag OverheadQuery Store SupportAutomatic Plan CorrectionPartial Data Only
Availability Groups (Sync)Primary OnlyDepends on Redo typically within a few secondsMust be created on PrimaryPrimary database with Readable SecondaryPrimary OnlyPrimary Only

No

Availability Groups (Async)Primary OnlyDepends on Log Send Queue and Redo typically within a few secondsMust be created on PrimaryPrimary database with Readable SecondaryPrimary OnlyPrimary Only

No

ReplicationPublisher OnlyDepends on log reader and distribution agent typically within a few secondsMay be created on subscriber copy onlyOnly if RCSI enabledPublisher and SubscribersPublisher and Subscribers

Yes

Log ShippingPrimary OnlyDepends on restore job configuration and standby – Point in time only, updates require disconnecting usersMust be created on PrimaryOnly if RCSI enabledPrimary OnlyPrimary Only

No

Complexity and Manageability

So what about complexity and manageability of these different options?

In the opening paragraph of this post I pointed out how easy Microsoft has made it to build an Availability Group to handle read-only copies of a database or group of databases. Setting up an Availability Group is really quite simple but that doesn’t mean it’s not lacking in complexity.  First you need to know what kind of Availability Group are you going to implement.  What started as a single feature has morphed over each release into:

  • Basic Availability Groups – Standard Edition (single database)
  • Enterprise Availability Groups – Enterprise Edition (one or many databases)
  • Distributed Availability Groups – Support for up to 17 readable secondaries without daisy-chaining
  • Domain Independent Availability Groups – Environments without Active Directory
  • Read-scale Availability Groups – No clustering requirement or HA

Add Linux into the mix of options as well as whether or not you need failover clustering as a part of that configuration, and this isn’t as simple as it seems.  One of the common areas of failure with Availability Groups that I’ve seen is with choosing the correct quorum configuration for the Windows Failover Cluster configuration, though Microsoft has made this easier in Windows Server 2012 R2 and higher. An error in quorum configuration can result in your High Availability solution failing and taking the databases offline as a result.

Replication tends to get a bad rap when it comes to complexity and manageability. The reality is that the technology just isn’t well understood by most SQL Server professionals, and it is really easy to configure (there’s a great UI in SSMS for it also), and as long as you don’t play with it or modify data on the subscriber databases, it usually just works without a hitch. Change management is something that has to be planned out with Replication, where it wouldn’t have to be with Availability Groups.  New tables have to be added to replication manually as an article, and certain operations like dropping and recreating a table or truncating a table are not allowed when the table is being replicated (sorry, no more using SSMS designer to insert a column in the middle of a table). Compared to Availability Groups, just for change management, Replication is much more complex and requires more work to manage over time, but the types of changes that require manual intervention in Replication typically are not occurring at a frequent pace for established applications and databases.

Log Shipping is just stupid simple low-tech and easy to manage. I mean for real how difficult is taking a backup and restoring it on another server? You might be surprised, but the configuration wizard and Agent Jobs automate this so that you typically don’t have to do anything but let it do its thing. Even third party backup tools offer their own flavor of implementation of log shipping as a part of their backup products for SQL Server typically. If something ever gets out of sync for log shipping, it usually means that someone took a manual log backup of the database, or switched recovery models and broke the log chain, and the standby database will have to be reinitialized from a new FULL backup of the primary to resume the restores (or you can manually apply the missing backup if a manual log backup was performed to close the gap).

Summary

While Availability Groups have become the Go To technology for providing a readable copy of data for SQL Server, they are not the only feature available to meet this requirement.  It is important to understand the requirements and the limitations of any feature that you are considering using, and to also evaluate the other available features that might be a better fit for the needs of the environment.  I have used transactional replication for nearly 10 years to solve read-only workload and reporting requirements for SQL Server and continue to use it over Availability Groups where as a technology, it is just a better fit for the requirements.  Being able to reduce indexing on the primary writeable copy of the database and only having readable workload non-clustered indexes on the subscribers can be a huge benefit for some workloads. For one client it was the difference between a 800GB publisher database and a 1.7TB primary AG replica which affected their backup/restore times and RTO SLAs.  Collect the requirements, evaluate the technologies and limitations, then choose the appropriate way to solve a problem.

Where Can I Download a Specific SQL Server Cumulative Update?

Microsoft has made a lot of changes to their servicing model for SQL Server over the years, specifically moving more towards Cumulative Updates and away from Service Packs as a means to deliver updates and fixes to SQL Server. Today I needed to replace a server in a Failover Clustered Instance (FCI) of SQL Server 2014 that was on Service Pack 1 + CU7. So I went to the KB article for CU7 (https://support.microsoft.com/en-us/kb/3162659) and clicked what I thought was the download link to CU7 for SP1:

Download Latest CU Link

However, to my dismay that link took me to the download link for SQL Server 2014 SP1 CU8 and not to CU7 as I was expecting. So I tried other SP1 CU links and they all redirect to the CU8 download page. Slightly irritated I decided to send a Skype IM to Glenn (@GlennAlanBerry|Blog) what was going on because he keeps up with every update Microsoft publishes for SQL Server. Glenn told me that Microsoft’s recommendation and preference is for people to install the latest CU so the KB articles now link to the latest CU download page. That doesn’t really help me with adding a failed FCI node back into a FCI that is on CU7, so Glenn offered to share the CU7 file with me by Dropbox.

It turns out that I didn’t need Glenn to share the file with me, I needed to read the information on the KB article closer and pay attention.

Download CU7 Link

The Note at the bottom of the information specifically says that after future CUs are released this specific CU can be downloaded from the Microsoft Windows Update Catalog.  So if you need a specific CU for SQL Server, either for testing purposes, or to fix a failed FCI node, head over to the Microsoft Windows Update Catalog, and you can easily find it there.

One additional note:

You have to use Internet Explorer to access the Microsoft Windows Update Catalog, the site won’t allow you to access it using Google Chrome or other non-IE browsers it seems.