OK, well, I guess I should be more specific because, as with most things in SQL Server, the real answer is “it depends”. And for these two options, it depends mostly on your SQL Server version. Since SQL Server 7.0, the way that auto update works has changed (very much for the better!!). So, if you’re on SQL Server 2005 or SQL Server 2008, I would say most definitely – leave these ON (or if you turned them off – turn them back on!!!)! If a specific index is still causing you grief, then turn off auto update at the index level, NOT at the database level. To turn off auto update at the index level, use STATISTICS_NORECOMPUTE in the index (or statistics) definition.
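Since the syntax is easier to see than describe, here’s a minimal sketch of disabling auto update for just one index – the table/index names below are hypothetical, so adjust them to your schema:

```sql
-- Disable auto update for just this one statistic, leaving the
-- database-wide AUTO_UPDATE_STATISTICS option ON (hypothetical names):
UPDATE STATISTICS dbo.Sales SalesIX_OrderDate WITH NORECOMPUTE;

-- Or, set it as part of the index definition itself:
CREATE INDEX SalesIX_OrderDate
    ON dbo.Sales (OrderDate)
    WITH (STATISTICS_NORECOMPUTE = ON, DROP_EXISTING = ON);
```

One thing to watch: running UPDATE STATISTICS on that object again *without* the NORECOMPUTE option re-enables auto update for that statistic.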
Now, as for why (and how!) this has changed over the versions… here we go:
SQL Server 7.0
- Invalidation: Statistics were *invalidated* when a row modification counter (sysindexes.rowmodctr) reached a threshold. Because SQL Server could not tell where the modifications were occurring, even modifications isolated to a single column invalidated ALL of the statistics for the TABLE (so statistics could be invalidated earlier than necessary).
- Updating: Even worse, in SQL Server 7.0, when statistics were invalidated they were immediately updated. This caused two problems: one, thrashing at the time of invalidation, because all of the stats needed to be updated at once; and two, wasted work, because if the statistics were not used for a while, the effort of updating them was spent up front – and by the time they were finally used, they might already be somewhat out of date again.
SQL Server 2000
- Invalidation: Statistics were still invalidated based on a row modification counter.
- Updating: SQL Server 2000 fixed the “updating-potentially-too-often” problem by deferring the update – invalidated statistics were only updated when they were actually needed.
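On these earlier versions the counter that drives invalidation is visible directly in sysindexes; the commonly cited threshold is roughly 500 modifications plus 20% of the table’s rows. A quick sketch (the table name is hypothetical):

```sql
-- SQL Server 2000: inspect the table-level modification counter that
-- drives statistics invalidation. indid 0 (heap) or 1 (clustered index)
-- represents the table itself.
SELECT name, rowcnt, rowmodctr
FROM sysindexes
WHERE id = OBJECT_ID('dbo.Sales')   -- hypothetical table name
  AND indid IN (0, 1);
```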
SQL Server 2005
- Invalidation: The biggest changes were introduced in SQL Server 2005, which NO LONGER uses sysindexes.rowmodctr and instead uses an internal (and undocumented) column-specific modification counter. Now, statistics invalidation is isolated to only those columns which are heavily modified. This internal/undocumented counter is sysrowsetcolumns.rcmodified and can only be seen when connecting to SQL Server using the DAC (Dedicated Admin Connection).
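If you want to poke at this counter yourself, something like the following works – but remember it requires the DAC, and since the view is undocumented, treat this purely as exploratory (and subject to change between builds):

```sql
-- Connect over the DAC first, e.g.: sqlcmd -S ADMIN:YourServer -E
-- sys.sysrowsetcolumns is an internal, undocumented system table;
-- rcmodified is the per-column modification counter.
SELECT rowsetid, rowsetcolid, rcmodified
FROM sys.sysrowsetcolumns;
```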
- Updating: Updating didn’t really change but SQL Server 2005 added “Async Auto Update” for statistics so that when the QO (query optimizer) encounters an out-of-date (i.e. invalidated) statistic, it can “trigger” the update but not wait for it (meaning it will optimize using the out-of-date statistic). This can be both positive (faster compilation) and negative (possibly not the best plan if the statistics have changed drastically). It is off by default and, IMO, I’d leave it off in most cases – but if you find that auto update events (which can be traced with Profiler) are causing you grief, then you can turn this on at the database level.
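Turning it on (and checking it) is a one-liner at the database level – the database name here is a placeholder:

```sql
-- Enable asynchronous auto update (note: AUTO_UPDATE_STATISTICS must
-- also be ON for the async option to have any effect):
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Check the current setting (1 = async updates enabled):
SELECT name, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'YourDatabase';
```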
SQL Server 2008
Nothing new except “Filtered Statistics”, and these are interesting: the density vector is still relative to the table (not the predicate) but the histogram is built just over the predicate (OK, I know I’ll have to blog a lot more about this one!). Anyway, I’m still playing with/learning a lot more about these, and they make the most sense with filtered indexes (as opposed to just a filtered statistic). But, just like statistics on secondary columns of regular indexes, you will also potentially want statistics on the secondary columns of your filtered indexes. The next question is whether those should have a filter or not. I’ve found that sp_createstats doesn’t seem to create statistics with filters, and I’m going to need to do some testing here, but I think statistics with filters (filters that match the nonclustered index) should help to make the stats better (and even allow better usage of filtered indexes). I’m really going to need a bunch of time with this, though – and another post :). As for auto create/auto update – no changes there!
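For anyone who hasn’t seen the 2008 syntax yet, here’s a quick sketch of a filtered statistic alongside a filtered index (all names and the predicate are hypothetical):

```sql
-- A filtered statistic built only over the 'Active' rows; the
-- histogram covers just the predicate, while density stays table-wide:
CREATE STATISTICS Stat_Sales_Amount_Active
    ON dbo.Sales (Amount)
    WHERE Status = 'Active';

-- A filtered index over the same predicate; its secondary columns are
-- the ones that might benefit from a matching filtered statistic:
CREATE INDEX SalesIX_OrderDate_Active
    ON dbo.Sales (OrderDate)
    WHERE Status = 'Active';
```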
Long story short, if you’re using SQL Server 2005 or SQL Server 2008, you should leave auto create/auto update ON.
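To check whether they’re on (and flip them back on if they’re not), you can do this at the database level – substitute your own database name:

```sql
-- Verify the current settings (1 = ON):
SELECT DATABASEPROPERTYEX('YourDatabase', 'IsAutoCreateStatistics') AS auto_create,
       DATABASEPROPERTYEX('YourDatabase', 'IsAutoUpdateStatistics') AS auto_update;

-- Turn them back on if they were disabled:
ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;
```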
Thanks for reading!
PS – A few of you have mailed me about a bug in the sp_helpindex2 script(s). OK, that’s my next post!!! Possibly with an sp_helpstats2 script as well!
10 thoughts on “Auto update statistics and auto create statistics – should you leave them on and/or turn them on??”
In my experience, the sampling ratio of an automatic statistics update is inversely proportional to the size of the column/index: while a column in a small table (500MB) might get a 100% sampling ratio, a column in a 100GB table could have a sampling ratio as low as 0.01% or even lower. This can be so small as to be detrimental to performance, as the QP generates a very inefficient plan to start with and has only a small chance of coming up with a better plan partway through the operation.
My suggestion is to check when an auto update takes place and see what the sampling ratio was; if it is sufficient for your object size, leave it on. Otherwise, whatever method you have for reindexing (outside of maintenance plans) can include manual updating of statistics. One thing to remember with this method is that an index which is rebuilt also updates its statistics at 100% sampling, so the manual process should only need to update columns which are not also an index (and explicit stats, of course).
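A sketch of how you might check the last sample size and then top up manually (object names are hypothetical):

```sql
-- Inspect last update time and sample size for one statistic:
-- compare the "Rows" vs "Rows Sampled" columns in the header output.
DBCC SHOW_STATISTICS ('dbo.Sales', 'SalesIX_OrderDate') WITH STAT_HEADER;

-- If the sample was too small for the object's size, update at 100%:
UPDATE STATISTICS dbo.Sales SalesIX_OrderDate WITH FULLSCAN;
```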
One 3+TB DW environment I worked in had auto_update stats set before I arrived, and the server crashed at least once per month. Once we turned off auto_update and created a robust process to perform reindexing and manual stats updating, performance improved dramatically and the server stopped crashing.
Of course, this isn’t an issue in smaller databases. :)
At the moment, I’ve turned off auto update stats option (SQL Server 2005) but update stats with fullscan for all tables as part of the nightly maintenance job. If I turn this option on, can I stop updating stats with fullscan as part of my daily maintenance job? Can I rely on SQL Server to maintain accurate stats or should I still run update stats with fullscan at least once a week?
On a related note, I have two servers with the same database on each. On one server, update stats with fullscan runs in one hour. On the second server (with more CPU, more memory, and more tempdb files on a SAN drive), it takes considerably longer. MAXDOP on the poorly performing update-stats server = 4; MAXDOP on the faster one is 32. Is this setting making the difference? If yes – would it be possible to provide a MAXDOP hint when running update statistics <table_name> with fullscan?
If you really need to do an update with fullscan then your maintenance job might be a lot better than the auto update database option. And, if you’re updating stats manually, then it’s likely that the auto update won’t have anything to do (you probably won’t meet the requirements (the threshold)) given that you update them so frequently. You still might want to update a few of the stats with a full scan but it completely depends on the data as to whether or not it’s needed. However, if your job is missing any of the stats then the database option might actually help. I don’t imagine it hurting. Not sure I followed the perf question – what’s the time on the second server?
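For reference, a bare-bones nightly full-scan job can be as simple as this – though note that sp_MSforeachtable is undocumented/unsupported, so a cursor over sys.tables is the safer long-term choice:

```sql
-- Minimal nightly-job sketch: full-scan statistics update for every
-- table in the current database (sp_MSforeachtable is undocumented):
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN;';
```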
Let me know!
PS – Sorry for the delay in my comments. I *just* found out that I had a bad setting and a HUGE number of unapproved comments. I’m going through and approving them as well as trying to comment on them. Again, sorry for the delay! Cheers.
The naming of the STATISTICS_NORECOMPUTE clause of an index definition is (to me at least) a little confusing. Does "STATISTICS_NORECOMPUTE = ON" mean that the NORECOMPUTE functionality is ON (i.e. statistics will not be recomputed), or does it mean that statistics will be recomputed (as you might intuitively interpret this setting value to mean)? Sorry if this sounds a bit pedantic!
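(For what it’s worth, ON here means the NORECOMPUTE behavior is enabled – i.e. auto update is *disabled* for that index’s statistics – and you can confirm it in the catalog; names below are hypothetical:)

```sql
-- STATISTICS_NORECOMPUTE = ON turns the NORECOMPUTE behavior ON,
-- i.e. auto update is DISABLED for this index's statistics:
CREATE INDEX SalesIX_OrderDate
    ON dbo.Sales (OrderDate)
    WITH (STATISTICS_NORECOMPUTE = ON);

-- no_recompute = 1 confirms auto update is off for that statistic:
SELECT name, no_recompute
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Sales');
```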
I’ve recently begun working on a set of databases where the indexes are seriously messed up. Too many indexes (dozens) on some tables, none on others, duplicate indexes, etc. One of the things I’ve noticed is that on a large number of the indexes (say 90%) the STATISTICS_NORECOMPUTE setting is turned on. There is a nightly job that updates the statistics across the board. To add to the fun, the setting is applied inconsistently across the different environments: there are more indexes in the model environment with the setting turned on than any other, and dev has the least. So my question is: is there any risk to turning the setting off across the board, or am I better off turning it on where needed to sync up the environments? (i.e. if it’s turned on in Model but not Prod, do I turn it off in Model or on in Prod?)
Yeah, both are problematic (too few indexes, too many indexes). But, I do feel that too many is usually WAY worse than too few. So, I definitely feel your pain. Lots of things to help you – my duplicate index script (to see if any are actually completely/totally redundant), index consolidation strategies, and the DMVs (dm_db_index_usage_stats).
And, of course, start with your biggest tables first. That’s always going to be the best strategy.
Good luck and keep me posted!
I agree. My first task is to get the dev, test, model and production environments synced so that I can make index changes starting dev and not find they already exist in model, but not in prod. To that end I’m trying to decide what I should do with the STATISTICS_NORECOMPUTE setting. If I have an index where it’s turned on in model but not prod (or test) should I turn it on in the other environments, or off in model? Any suggestions or concerns I should look at?