Hopefully by now, anyone working with SQL Server as a profession is aware of how important statistics are. If you aren’t sure why, or you need a refresher, check out Kimberly’s blog post “Why are Statistics so Important“. Here’s a hint, the query optimizer uses statistics to help make decisions when creating an execution plan. Out-of-date statistics can mean a much less optimal plan.
Erin Stellato wrote an excellent blog post “Updating Statistics with Ola Hallengren’s Script” where she covers updating statistics with database maintenance plans, T-SQL, and using Ola Hallengren’s Index Optimize script. In this article Erin discusses various options for updating statistics and how using an approach of only updating statistics that have had row modifications is less invasive. She further explains how in SQL Server 2008R2 SP2 and SQL Server 2012 SP1, Microsoft introduced the sys.dm_db_stats_properties which tracks modifications for each statistic.
This new DMV allows users to build logic into their processes to only update statistics after a specific percentage of change has occurred. This can further reduce the overhead of updating statistics.
I’ve been using Ola Hallengren’s Index Optimize procedure for over decade as well as recommend his process for my clients. The Index Optimize procedure has logic built in to deal with index fragmentation based on the percentage of fragmentation. This allows you to not worry about minimal fragmentation, reorganize if the indexes aren’t heavily fragmented, and rebuild if the fragmentation level is over a certain threshold. It is a better process than blindly reorganizing or rebuilding all indexes regardless of their fragmentation level.
Over the years, I’ve found that many clients as well as DBA’s aren’t aware that reorganize does not update statistics, whereas an index rebuild does. If you are using logic to reorganize and rebuild, without a separate statistics update process, your statistics may be slowly aging and could become problematic. Fortunately, Ola’s Index Optimize procedure allows for passing parameters to update statistics. These include:
Typically, I’ve only had to configure @UpdateStatics = ‘ALL’ and @OnlyModifiedStatistics = ‘Y’. This would update all statistics if there have been any row modifications. This has worked for me, my previous employer, and my clients for many years.
Recently I encountered an issue with a very large database with numerous tables with 100’s of millions of rows each as well as hundreds of other tables with a million or less rows. Every table would have some level of data change per day. This was causing statistics to be updated nightly on nearly every table, even those with minimal change. This took extra time and was generating additional IO that the customer needed to minimize. I updated the process to change @OnlyModifiedStatistics to @StatisticsModificationLevel = ‘5’ for 5%. Now statistics will only be updated after a 5% data change, or so I thought.
When I made this change and reviewed what would now be updated, I was surprised to see that some large tables were going to have statistics updated. I noticed that several large tables were listed, even though the modification counter was well below the 5% threshold. It turns out that statistics will be updated when the number of modified rows has reached a decreasing, dynamic threshold, SQRT(number of rows * 1000). SQRT = Square Root. For example, a table with 9,850,010 rows at 5% would be 492,500 rows, however the modification counter was only 134,017 rows. If we plug 9,850,010 into SQRT(9,850,010 * 1000) = 99,247, which is well below the 492k value. Is this a bad thing? Absolutely not, 99,247 is still much larger than 1. By only use @OnlyModifiedStatistics, this nearly 10M row table would have statistics updated after a single modification.
Having @StatisticsModificationLevel is a nice tool to have at your disposal for those situations where you need to fine tune your maintenance process.