Do you need to update statistics after an upgrade?

This post originally went live on May 11, 2018, and was updated on May 14, 2018 after some additional internal discussions with Microsoft.

There are a variety of methods we use for helping customers upgrade to a new SQL Server version, and one question we often get asked is whether or not statistics need to be updated as part of the upgrade process.

tl;dr

Yes.  Update statistics after an upgrade. Further, if you’re upgrading to 2012 or higher from an earlier version, you should rebuild your indexes (which will update index statistics, so then you just need to update column statistics).

History

Some of you may remember that the stats blob changed between SQL Server 2000 and SQL Server 2005, and Microsoft specifically recommended updating statistics after upgrading from SQL Server 2000.  Official Microsoft documentation about the stats blob change in SQL Server 2005 is difficult to find, but this article includes the following paragraph:

After you upgrade from SQL Server 2000, update statistics on all databases. Use the sp_updatestats stored procedure to update statistics in user-defined tables in SQL Server databases. This step is not necessary for upgrading from SQL Server 2005.

Current Microsoft documentation related to upgrading does not state anything specific about updating statistics, but people continue to ask, and if you peruse forums, blog posts, and other social media, you’ll see recommendations to update statistics. Further, the documentation that Microsoft provides about when to update statistics does not mention anything about upgrades.

Sidebar: I don’t recommend using sp_updatestats, and here’s why: Understanding What sp_updatestats Really Updates.

Today

The statistics blob has not changed since SQL Server 2005 to my knowledge, but I thought I would ask someone from Microsoft for an official recommendation to share publicly.  Here you go:

Microsoft suggests that customers test the need for a full update of statistics after a major version change and/or a database compatibility level change.

Further items to note:

  1. If Microsoft updates the format of statistics (e.g. the stats blob), customers will be expected to update statistics after an upgrade.
    1. Microsoft does not always upgrade the statistics format as part of a major version upgrade.
  2. There are occasions where Microsoft does not change the format of statistics, but they do change the algorithm for creating statistics as part of a major version upgrade or database compatibility level change.

In addition, there was a change in the nonclustered index leaf-level internals in SQL Server 2012, so if you are upgrading to 2012 or higher from an earlier version (e.g. 2008, 2008R2), rebuild your nonclustered indexes.  And remember, rebuilding indexes updates the statistics for those indexes with a fullscan, so you do not need to update them again.
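If it helps to see the pattern, here is a minimal sketch for a single table.  The table name dbo.Orders is made up for illustration; in a real upgrade you would script this out across every table in the database:

-- Hypothetical table name; substitute your own.
-- Rebuilding the indexes also updates their statistics with a fullscan.
ALTER INDEX ALL ON dbo.Orders REBUILD;
GO

-- The COLUMNS option updates only column statistics,
-- skipping the index statistics the rebuild just refreshed.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN, COLUMNS;
GO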

Conclusion

As part of your upgrade methodology, it is recommended (by me, based on experience with a lot of customer upgrades) to build in time to update statistics.  I’ve gotten some pushback from customers who don’t want to update statistics after upgrade because it takes too long.  Some kind reminders:

  • Updating statistics is an online operation, so the database and related applications remain accessible and usable. A statistics update does take a schema modification lock, so you’re not allowed to make any changes to a table while its stats are updating.  Therefore, if you decide to change your schema after upgrading your SQL Server version (not something I would typically recommend), do that before you update stats.
  • You need to update statistics regularly to provide the optimizer with current information about your data, so at some point it needs to be done. Immediately after an upgrade is a really good time, considering the aforementioned items.

If you’re not comfortable upgrading to a newer version of SQL Server, we can help!  I’m in the process of helping a customer migrate from SQL Server 2012 to SQL Server 2017, and I’m so excited to get them up to the latest version so they can start using some new features…like Query Store 😉

Query Store Examples: Stories from customers

In the past week I’ve used Query Store to troubleshoot performance issues for two different customers running SQL Server 2016 and higher.  I’ve presented a lot of sessions about what Query Store is, how to configure it, and basic use cases, but we really take it to the next level when we can provide Query Store examples from the real world.  If you are running SQL Server 2016 or SQL Server 2017, I hope you’re using this feature!

Scenario 1

A long-time customer reached out for some help after adding 50% more memory to their VM and seeing no improvement in performance.  Sometimes you can’t throw hardware at an issue, and they wanted to look at using In-Memory OLTP to help improve performance.  After a 10-minute discussion on In-Memory OLTP, where I explained why it wouldn’t magically solve their performance issues (and also explained the amount of testing that would need to be done prior to implementation), I took a look at the system.

I started with a health audit and a review of wait statistics.  There were a couple of settings they could change, but nothing was horribly misconfigured.  Wait stats showed nothing remarkable.  The majority of waits were due to CXPACKET, but the average duration was extremely low.  They had some WRITELOG and PAGEIOLATCH waits, but these were also low in average duration.  I asked if they had Query Store enabled…they did not.  I requested that they enable it to capture query information while I started looking at plan cache data.  With Query Store collecting information, I used Glenn’s DMV queries to dig into the plan cache to look at queries that were executing most frequently, taking the longest, and consuming the most resources.
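Glenn’s scripts are much more thorough, but to give you the general shape of a plan cache query, here’s a simplified sketch that lists statements by execution count (the TOP value and ordering are arbitrary choices for illustration):

-- Most frequently executed statements currently in the plan cache (simplified example).
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time,
    qs.total_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.execution_count DESC;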

Now, while the plan cache has some fantastic information, it’s transitory.  Plans can fall out of cache because they aren’t used, or because they get recompiled, so truly tracking information for execution frequency over time can be a bit tricky.  Further, there are some plans that never make it into the plan cache (e.g. trivial plans or those with the RECOMPILE hint).  This is where Query Store shines.  EVERY query that executes can be captured in Query Store, even if it’s trivial or has the RECOMPILE hint.  Further, execution frequency and resource use are aggregated over defined intervals (e.g. 15 minutes, 30 minutes), as determined by the INTERVAL_LENGTH_MINUTES setting.  You can look back over a small or large period of time to see not just what query performance looked like, but also how many times queries executed.
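Enabling it is a single ALTER DATABASE statement.  Here’s a sketch; the database name is a placeholder, and the option values (storage size, interval length, capture mode) are just reasonable starting points to tune for your environment:

-- Enable Query Store with explicit settings (values shown are illustrative, not prescriptive).
ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     MAX_STORAGE_SIZE_MB = 1024,
     INTERVAL_LENGTH_MINUTES = 15,   -- the aggregation interval discussed above
     QUERY_CAPTURE_MODE = ALL);      -- capture every query, including trivial ones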

Several hours later, after a typical workload where users said performance was “slow”, we looked at the Query Store data.  In terms of high resource queries, there were a few (e.g. lots of reads, or lots of CPU), but the execution frequency was low for a lot of those heavy hitters.  Then we looked at queries executing most frequently (the death by a thousand cuts scenario), and here we could see that there were thousands of queries executing in an hour, and while individually a query might not take long to execute or use a lot of resources, cumulatively it added up.
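You can get at the same information with T-SQL against the Query Store catalog views.  This is a sketch of the kind of query I mean (the four-hour window and TOP value are arbitrary):

-- Queries with the highest execution counts over a recent window.
SELECT TOP (20)
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions) AS total_executions
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(HOUR, -4, SYSUTCDATETIME())
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_executions DESC;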

Queries with high execution count in Query Store

In digging into some of those queries I quickly noticed that almost every plan had a missing index recommendation.  I queried sys.indexes and found that almost 95% of the tables in the database had two indexes or fewer.  It’s a rare case where a database is under-indexed.  Their problem?  Their workload volume and their data are slowly but steadily increasing.  They have some big tables that are missing indexes and causing scans, and even though all that data is sitting in memory (because they have enough server memory to hold the entire database), they are using a lot of CPU and time to roll through that data.  I could have determined this with the information in the plan cache, but I would have had to set up a job to capture it on a regular basis and then write some queries to do analysis against it.  With Query Store, I just had to enable it, let it capture the data, then use the UI to look at performance.
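If you want to run the same sanity check against your own database, something like this works (a rough sketch; it counts clustered and nonclustered indexes per table and ignores the heap entry in sys.indexes):

-- Tables with two or fewer indexes.
SELECT t.name AS table_name,
       COUNT(i.index_id) AS index_count
FROM sys.tables AS t
LEFT JOIN sys.indexes AS i
    ON t.object_id = i.object_id
   AND i.type > 0   -- exclude the heap row (index_id = 0)
GROUP BY t.name
HAVING COUNT(i.index_id) <= 2
ORDER BY index_count, table_name;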

Scenario 2

A new customer engaged with us after upgrading to SQL Server 2017.  They had some extreme cases of variability in query performance: the system would be running fine and then all of a sudden performance would tank.  They would knee-jerk and free the plan cache, and suddenly things would be great again.  A couple of hours later, the same problem.  Rinse and repeat, rinse and repeat.  We had them enable Query Store, and within a couple of hours we took a look at the data (letting them continue with the practice of freeing the procedure cache when there was a problem).  Within the Query Store UI I looked at CPU use for that window of time, and used the data grid to sort the output to view queries with multiple plans first.  There it was…a select number of queries had multiple plans with huge variations due to different input parameters; these queries were parameter sensitive.
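You can also find these queries directly in the catalog views.  A minimal sketch (the threshold and ordering are arbitrary):

-- Queries with more than one plan in Query Store.
SELECT q.query_id,
       COUNT(DISTINCT p.plan_id) AS plan_count
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
GROUP BY q.query_id
HAVING COUNT(DISTINCT p.plan_id) > 1
ORDER BY plan_count DESC;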

Viewing multiple plans for the same query in Query Store

We used Query Store to force the best plan and thus stabilize performance, then we looked at the query itself and the different plans being generated.  Within an hour we determined one code change and an index that would generate a consistent plan.  After testing these changes and implementing them, we unforced the plan, confirmed that the new plan we wanted was consistently used, and moved on to the next challenge.
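Forcing and unforcing a plan is just a pair of stored procedure calls.  The query_id and plan_id values below are placeholders; you get the real ones from the UI or the catalog views shown above:

-- Force the known-good plan (IDs are illustrative placeholders).
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 73;

-- After deploying the code/index fix, remove the forced plan.
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 73;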

Summary

If you’re running SQL Server 2016 or higher, I highly recommend enabling Query Store…even if you have a third party monitoring tool.  Don’t get me wrong, those tools are great and track a ton of information that Query Store doesn’t.  But those tools don’t capture EVERY query, nor do they capture query metrics all the time.  Lastly, they don’t provide the ability to force plans.  You get all that with Query Store, and more 🙂

If you’re interested in learning more about Query Store from the ground up, you’re in luck!  In two weeks I’m hosting a live, online Immersion Event for Query Store that you can attend from the comfort of your home (or your desk at work 🙂).  You can find a course description and registration details here.  I hope you’re able to attend!