Search Engine Q&A #10: Rebuilding Indexes and Updating Statistics


It seems like all I’ve been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.


Rebuilding an index will update statistics with the equivalent of a full scan – doesn’t matter whether you use DBCC DBREINDEX or ALTER INDEX … REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while its doing the rebuild.


Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX … REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.


The problem I’ve been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)



  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you’re left with sampled statistics. You’ve wasted resources doing the sampled scan AND you’ve lost the ‘free’ full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don’t lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what’s the solution?


The simple answer is not to update statistics on indexes that have just been rebuilt.


The more complicated answer is to:



  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don’t get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

Hope this helps.

38 thoughts on “Search Engine Q&A #10: Rebuilding Indexes and Updating Statistics

  1. Paul,

    I have once encountered a system that did not function well with ‘auto update statistics on’ and ‘auto create statistics on’. But that exception confirms the rule, in my opinion. Administrators should leave both ‘auto update statistics’ and ‘auto create statistics’ on. I never update statistics manually. Instead I rely on the full scan statistics that come for free with the index rebuilds and the auto update statistics that are triggered automatically when the contents of tables change.
    Is that a bad practice?

    Werner

  2. I have made a stored procedure that supports some of the logic in Paul’s blog.

    EXECUTE dbo.IndexOptimize @Databases = ‘USER_DATABASES’,
    @FragmentationHigh_LOB = ‘INDEX_REBUILD_OFFLINE’,
    @FragmentationHigh_NonLOB = ‘INDEX_REBUILD_ONLINE’,
    @FragmentationMedium_LOB = ‘INDEX_REORGANIZE_STATISTICS_UPDATE’,
    @FragmentationMedium_NonLOB = ‘INDEX_REORGANIZE_STATISTICS_UPDATE’,
    @FragmentationLow_LOB = ‘NOTHING’,
    @FragmentationLow_NonLOB = ‘NOTHING’,
    @FragmentationLevel1 = 5,
    @FragmentationLevel2 = 30,
    @PageCountLevel = 1000

    How does all this work?
    Indexes with a fragmentation over 30% are rebuilt, online if possible.
    Indexes with a fragmentation between 5% and 30% are reorganized and also have their statistics updated.
    Index with a fragmentation under 5% (or a size of under 1000 pages) are ignored.

    It is available on http://ola.hallengren.com.

  3. Hi Werner,

    Your strategy is perfectly good. Some people that have a skewed data distribution find they need to update stats manually but that’s not the norm.

    Thanks

  4. Hi Ola,

    Very cool. The only thing I would personally add is the ability to specify a table containing a list of object/index pairs to check the fragmentation rather than working against the whole database – but that would be the icing on the cake.

    Thanks for posting!

  5. Paul,

    What if you run sp_updatestats after index maintenance?
    According to BOL:’In SQL Server 2005, sp_updatestats updates only those statistics that require updating based on the rowmodctr information in the sys.sysindexes compatibility view; therefore, preventing unnecessary updates of unchanged items.

    Does this mean that this stored procedure will prevent unnecessary updates of statistics after index maintenance?

    Thanks,
    Igor

  6. Thanks, Paul. I have this feature on the drawing board.

    If you look at the solution as it is today, the parameter @Databases, works a little differently than you may think.
    It allows you to select groups of databases (e.g. ‘USER_DATABASES’ and ‘SYSTEM_DATABASES’) as well as
    lists of databases (e.g. ‘Db1,Db2,Db3’). It’s also possible to deselect databases (e.g. ‘USER_DATABASES,-Db1,-Db2’).

    Check out the documentation. Also feel free to use it.

    One implementation of the feature that you are asking for would be to be able to select and deselect,
    also schemas, tables and indexes (e.g. ‘USER_DATABASES,-Db1.dbo.Table1,-Db1.dbo.Table2’).

  7. Paul,

    Thanks for the insights you provided in this post. I especially like your comments about the downsides of blindly updating statistics after index rebuilds (both the extra run time and reduced statistics quality issues), and the preference for informed scheduling around the index rebuild and statistics update DB maintenance processes.

    I have a few questions regarding these DB maintenance processes and their impact on query plans for ongoing workloads (both in-flight queries and queries starting during the DB maintenance process operational windows):

    1. When do these DB maintenance process actions invalidate existing query plans in the plan cache (if at all) – at the start of the index rebuild / statistics update run, at the completion of the index rebuild / statistics update run, or not at all?

    2. Are queries able to use these resources (DB indexes, DB statistics) during the DB maintenance process run, or are they required to use query plans that exclude these resources (working at a disadvantage)?

    3. Are there any blocking issues for the start of either type of DB maintenance process if in-flight queries are using these resources (DB indexes, DB statistics) that are about to be rebuilt / updated?

    I have heard that some forms of index rebuilds (possibly online rebuilds only?) retain the existing index for use by query plans while the new index is rebuilt, allowing existing queries to use the current index for the duration (even if the current index is in less than ideal condition, it is likely still better than not having that index for use by other queries during the index rebuild).

    I was trying to find out if these rules may apply to statistics updates – for either automatically triggered or manual requested statistics updates. I recently read about the new feature in SQL 2005 called asynchronous automatic statistics update (AUTO_UPDATE_STATISTICS_ASYNC) that appears to promote this behavior (saving the old stats while the new stats are being built, and dropping the old stats and any query plans that use them when the new stats build is complete and cutover). This new feature is available at a database level (same as auto create stats and auto update stats in the past), but is disabled by default (unlike auto create stats and auto update stats, which are both enabled by default). I understand the behavior on automatically triggered statistics updates when using this feature. I am uncertain if this behavior also applies when a manual statistics update is requested when using this feature, or how the behavior during manual statistics updates might be different with this new feature enabled or disabled.

    We plan to use periodic manual DB statistics updates for an application, and wanted to determine if queries started during the manual DB statistics update run will block until the DB statistics update is completed (synchronous behavior), or will continue to process without blocking during the manual DB statistics update process run (asynchronous behavior) – either using the original outdated DB statistics until the DB statistics update process / step completes successfully or possibly without the benefit of any DB statistics.

    I tried researching the topic in Books Online and on the web, but this appears to be a less common detail not covered in traditional resources. If there is an easy reference I missed, I don’t mind being put in my place. Although the topic is obscure, I believe the answers to these questions can be useful information worth sharing with the community.

    Thanks for any feedback you can provide on this topic, and for all of the effort you and your colleagues at SQL Skills provide. It is greatly appreciated.

    Scott R.

  8. Hello
    After reading this article we disabled the Update statistics task and relayed on the full index rebuild step.
    we noticed that the statistics type that starts with _WA_Sys_0000000 is not updated, which could cause severe damage.
    Is it true?
    uziel Elon DBA

  9. Hi Paul,

    IF “Auto Update Statistics On” for 1 DB, will this feature invalidate recently updated stats by index rebuild.

    regards,

  10. This was years ago but I have a good statistics story. This was on a merge replicated DB. We had a fairly simple query that hit a table with a couple thousand rows and joined it to a table with a few hundred thousand rows. Simple, right? No big deal. No one would think twice about it…until that simple one-join query started taking 30 minutes to complete. Call the DBA!

    Step 1: Did someone write a cross join accidentally? – No
    Step 2: Look at the tables. Row counts? Indexes? – Nothing wrong here.
    Step 3: Check the execution plan. – Why does SQL think the table with a few hundred thousand rows has only 6 rows?
    Step 4: Update stats
    Step 5: Re-run the query and get results in under a second
    Step 5: Bask in the glow of superhero-dom
    Step 6: Wake up and make sure that all database settings (auto-update stats, etc)
    Step 7: Shake head and be glad I’ve spent time learning how to do my job – and that I have access to great resources like SQLSkills.

    Stats were WAY out of sync. SQL was choosing a very bad way to go through a fairly large table. Stats getting refreshed helped SQL Server choose more wisely.

    1. After an index rebuild, no, for the index column statistics, but yes for the non-index column statistics. After an index reorganize, yes, for all statistics as none of them are updated by the reorganize.

      It’s not the rebuild/reorganize that has any effect on the distribution – it’s all the inserts/updates/deletes since the last index/statistics maintenance.

  11. Thanks for the information Paul… after learning this, I think I will keep my Update Statistics maintenance tasks under closer scrutiny.

  12. Rebuild Index also update Index Stats but not Column Stats, I want to know and curious why Microsoft opted not to update column stats?

    Educating me on this is greatly appreciated.

    Thanks
    Regards

  13. As I understand from your post “Rebuild Index Task” automatically does “Update Statistics Task with Full Scan”. I’m facing a problem with “exec sp_executesql” that is running for a minute. Doing “Rebuild Index Task” without “Update Statistics Task with Full Scan” solves the problem if I do it twice. (Like first time it solves the problem, executing the “Rebuild Index Task” the second time will cause the problem again, the third time it helps and the fourth time the problem will appear again and so on). The execution plans for query differs when running the “Rebuild Index Task” first time and second time. Doing “Rebuild Index Task” with “Update Statistics Task with Full Scan” always solves the issue.

    1. It’s nothing to do with the index rebuild per se, it’s that you’re getting different plans in the cache. Every time the index is rebuilt, the plans using the index are recompiled. If the next execution gets a good plan, everything’s cool. Rebuild again, recompile. Next plan may not be so good. Check out Kimberly’s posts on plan cache and also her Pluralsight course on Ad Hoc Statement Performance where she explains and demonstrates this.

  14. Hi Paul,

    We have a requirement to a run a batch job once in a month that creates about 500 K accounts.
    This job takes about 6 hours to complete. Every time when i run this job , i observe that for the first 2 to 3 hours the account creation rate will be throttled at 20 per sec. After this initial period this rate increases suddenly to a higher rate of about 30 per sec. I am curios to understand why everytime the intital few hours are slow.
    Could you please provide your thoughts on this ?
    We are using SQL server 2008 R2

  15. Hi Randal,

    Is it OK to keep only update stats for db maintenance job(in our environment rebuild–>update stats–>check db)

    Thanks
    Balakrishna.B

  16. Hi Paul,

    I’ve Googled on this matter and I read several things.
    Am I right in my conclusion that before or after the rebuild indexes job, update statistics is not necessary?
    Is it advisable to do a full scan on statistics, after the rebuild, for column stats only?

    1. Nope. Having badly out-of-date stats can affect the rebuild itself and make it take longer. And you need to decide how often to update stats for non-index columns – I can’t answer that for you.

  17. Hello Mr. paul, I have 2 questions to ask you.
    1. Is it OK to run sp_updatestats after rebuild index job?
    2. To claim space after massive deletion of rows in a table, should I perform shrink database or shrink files task?
    Thank you

    1. 1) You should perform maintenance on non-index column statistics – I’m not a stats expert so I can’t advise you on the best way to do that.
      2) Ideally neither – and you’d move data into a new filegroup. But I’d do it just on the files in the filegroup where the table is, then alter index … reorganize all indexes in those files to remove fragmentation from the shrink.

  18. Very helpful read ,

    We have a situation here..

    have the almost 2 TB and growing DW DB servers and the update stats is mandatory in our case. and also we are creating the table partitions and after the partition we rebuild index , but still we see the stats are outdated. (Actual Number of Rows and the Estimated Number of Rows) and so we are updating them using another maintenance job with default or 50% sampling , don’t know what should be the determined value to use while sampling with value ??? could any one help me understanding the sampling value calculation.. effort much appreciated.
    Using EXEC sp_updatestats @resample = ‘resample’ ..

    Any advise will be very helpful. Efforts much more appreciated.

    1. If you want stats to be accurate, you’ll need to use a full scan instead of a sample. After your index rebuild, are you sure the stats are outdated? Are you checking that the actual statistics are different after an update statistics with a full scan? With Actual vs Estimated in the query plans, there could be a bunch of other things leading to incorrect cardinality estimates.

  19. Hello Paul,

    My name is Krishna and i’m currently working for a medical client. Recently one of our application vendor recommended us to update stats on the tables where the fragmentation percentage is below 5% to overcome performance issues. We have reorg job for 5 to 22% fragmented indexes and rebuild job for >= 22% fragmented indexes in PROD DB servers.

    I never seen this kind of requirement in my experience. Is that really give any benefit? Can i achieve this using OLA’s script?

    2008 R2 is our SQL version.

    Appreciate your inputs on this.

    Thanks-Krishna

    Thanks

    1. It’s impossible to say whether updating statistics will help your workload without knowing whether you have bad plans from the statistics being out-of-date. However, my guess is that they’re recommending this as a general guidance without really knowing that it will help your problems. You can do statistics updates using Ola’s scripts – yes – see the documentation for details.

  20. Hi Paul,

    Sorry for asking on a thread that is old. But I feel its sill relevant. I have a database with 800gb in SQL 2012 and running using native maint. plan with rebuild index online, schedule is every week followed by Updating statistics (Column statistics only with full scan).

    1. Does rebuild index take care of column statistics if so then I can disable my update stats(for column stats) step.
    2. If not can I leave as is?
    3. I do not have any complaints so far but the duration of these two tasks is now 7 hours and also wanted to make sure if the current process is correct. Please correct me on this. Thanks.

  21. Is there any scenario where statistics would not be updated when an index is rebuilt? Say you have a partitioned table with incremental stats? Or, is there any reason SQL server would not update the last_updated column in [sys].[dm_db_stats_properties] when index is rebuilt or statistics updated?

    1. Index-column statistics will always be updated by a rebuild, but depending on version, may be sampled for a partitioned index rather than the equivalent of a full scan. I can’t think of a reason for not updating that column, unless it’s not an index-column statistic.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.