Index rebuilds depend on stats, which are updated by index rebuilds?!?

Jack Li, one of the Senior Escalation Engineers in Product Support, just posted details of an interesting case over on the CSS blog – his article is here. It talks about index builds and rebuilds, but the issue is the same for both, so I'll just talk about rebuilds.

The gist of the problem is that index rebuilds can parallelize, but sometimes they don't parallelize very well. Each thread gets a certain range of the index to rebuild, using the existing index statistics to divide the ranges equally between the threads. If there's massive data skew, then one thread can end up doing the majority of the work, leading to a long run-time. The case in Jack's post involved a 250 million row index where 150 million rows had the same (NULL) key value. This range has to be processed by a single thread, because a single value can't be divided between two or more threads.
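To make the skew concrete, here's a toy model in Python. It is only a sketch and is NOT the algorithm the database engine uses: it simply walks the key values in order and hands contiguous ranges to threads, never splitting a single key value. The row counts for the NULL key and the total come from Jack's case; the four threads and the 100 other key values with 1 million rows each are made-up numbers for illustration.

```python
def assign_ranges(histogram, threads):
    """Toy model: give each thread a contiguous run of key values.

    histogram is an ordered list of (key, row_count) pairs. A single key
    value is never split, so its rows all land on one thread.
    Returns the number of rows assigned to each thread.
    """
    total = sum(rows for _, rows in histogram)
    target = total / threads  # the ideal, perfectly even share per thread
    loads = [0] * threads
    current = 0
    for _, rows in histogram:
        # Move on once the current thread has reached its fair share,
        # unless it is already the last thread.
        if loads[current] >= target and current < threads - 1:
            current += 1
        loads[current] += rows
    return loads


# From the case: 150 million of the 250 million rows share the NULL key.
# Hypothetical: the other 100 million rows are spread over 100 key values.
histogram = [(None, 150_000_000)] + [(k, 1_000_000) for k in range(100)]

THREADS = 4  # hypothetical degree of parallelism
loads = assign_ranges(histogram, THREADS)
total_rows = sum(loads)

# The rebuild is only finished when the busiest thread is finished.
busiest_share = max(loads) / total_rows
slowdown_vs_even_split = max(loads) / (total_rows / THREADS)

print(f"busiest thread handles {busiest_share:.0%} of the rows")
print(f"about {slowdown_vs_even_split:.1f}x the work of an even split")
```

However the other ranges get divided, the thread that owns the NULL range has 150 million rows to process, which is 60% of the index and 2.4 times what it would get from an even four-way split.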

Now, this is understandable behavior by the database engine, but it relies on the statistics being up-to-date. That's a bit of a catch-22: rebuilding an index updates the statistics, but if the statistics aren't up-to-date then the index rebuild might parallelize badly! I guess the solution is that if you know that you have massive data skew in your large indexes, update statistics BEFORE doing an index rebuild. And given what I've been hearing this week at SQL Connections about how badly statistics keep biting people, I'm leaning towards a different recommendation for those people who have lots of perf trouble caused by statistics and the potential for skewed data: update all your statistics regularly, and only rebuild/reorganize fragmented indexes. Statistics just seem to cause so many problems.
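If you script your maintenance, the ordering is the important part. Here's a minimal Python sketch that just generates the two T-SQL statements in the right order, statistics first and rebuild second. The helper name, the table name dbo.Orders and the index name IX_Orders_CustomerID are all hypothetical, and the optional MAXDOP setting is my own addition rather than something from Jack's post.

```python
def rebuild_with_fresh_stats(table, index, maxdop=None):
    """Return T-SQL statements: update the index's statistics, then rebuild it.

    table and index are hypothetical names supplied by the caller; they are
    assumed to be trusted identifiers, not user input.
    """
    rebuild = f"ALTER INDEX [{index}] ON {table} REBUILD"
    if maxdop is not None:
        # Optional cap on parallelism (an assumption, not from the post).
        rebuild += f" WITH (MAXDOP = {int(maxdop)})"
    return [
        # Statistics first, so the rebuild divides its ranges using current data.
        f"UPDATE STATISTICS {table} [{index}];",
        rebuild + ";",
    ]


# Hypothetical table and index names, for illustration only.
statements = rebuild_with_fresh_stats("dbo.Orders", "IX_Orders_CustomerID")
for statement in statements:
    print(statement)
```

Keep in mind that fresh statistics only help the engine see the skew. A single huge key value like the NULL range in Jack's case still has to be processed by one thread.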

Thanks

PS Kimberly has a lot more info about statistics over on her blog – I'm just starting to venture into that minefield.
