(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)
Another quickie but goodie before the finale tomorrow!
Myth #29: fix heap fragmentation by creating and dropping a clustered index.
Nooooooooooooo!!!
This is just about one of the worst things you could do outside of shrinking a database.
If you run sys.dm_db_index_physical_stats (or my old DBCC SHOWCONTIG) on a heap (a table without a clustered index) and it shows some fragmentation, don't EVER create and drop a clustered index to build a nice, contiguous heap. Do yourself a favor and just create the well-chosen clustered index and leave it there – there's a ton of info out there on choosing a good clustering key – narrow+static+unique+ever-increasing is what you need. Kimberly has a blog post from 2005(!) that sums things up: Ever-increasing clustering key – the Clustered Index Debate……….again! and I've got An example of a nasty cluster key.
Yes, you can use ALTER TABLE … REBUILD in SQL Server 2008 to remove heap fragmentation, but that is almost as bad as creating and dropping a clustered index!
Why am I having a conniption fit about this? Well, every record in a nonclustered index has to link back to the matching row in the table (either a heap or clustered index – you can't have both – see Kimberly's recent SQL Server Magazine blog post What Happens if I Drop a Clustered Index? for an explanation). The link takes the form of:
-
if the table is a heap, the actual physical location of the table record (data file:page number:record number)
-
if the table has a clustered index, the clustering key(s)
The blog post link at the bottom of this post explains in a lot more detail.
If you create a clustered index, all the linkages to the heap records are no longer valid and so all the nonclustered indexes must be rebuilt automatically to pick up the new clustering key links. If you drop the clustered index again, all the clustering key links are now invalid so all the nonclustered indexes must be rebuilt automatically to pick up the new heap physical location links.
In other words, if you create and then drop a clustered index, all the nonclustered indexes are rebuilt twice. Nasty.
If you think you can use ALTER TABLE … REBUILD in SQL Server 2008 to fix heap fragmentation, you can, but it causes all the nonclustered indexes to be rebuilt as the heap record locations obviously change.
Now, what about if you *rebuild* a clustered index? Well, that depends on what version you're on and whether you're doing a simple rebuild or changing the definition of the index. One major point of misconception is that moving a clustered index or partitioning it changes the cluster keys – it doesn't. For a full list of when the nonclustered indexes need to be rebuilt, see Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?
13 Responses to A SQL Server DBA myth a day: (29/30) fixing heap fragmentation
Also, in 2008 you can use ALTER TABLE … REBUILD on a heap in order to defragment it
Yes, but that changes the heap record locations, which causes all the nonclustered indexes to be rebuilt. Not good.
Ok, but could be situations where you don’t have nc indexes and the table must be a heap. Don’t know why, really very strange situation, but the fact is that in 2008 we have another option to create&drop a clustered index on a heap (with better performance, I suppose).
Don’t misunderstand me, I’m agree with about always have a clustered index. I’m just suggesting an option when we don’t have option other than a heap
Ok…if I don’t have the option to leave a clustered index on a table that was originally heap, how can I remove fragmentation in the heap? Typically, I have dropped the non-clustereds, created the clustered index, dropped it, then re-created the non-clustereds. I realize I may get yelled at, but this is not something that is done frequently (once, if at all) and when it has to be done, what’s the best method?
Thanks,
Erin
I don’t understand how you can not have the option to leave the clustered index on the table, but the support agreement or whatever allows you to temporarily create one… that aside, yes, that’s the best way to do it unless you’re on 2008.
As the vendor, I can create and drop the index without violating the support agreement. Even though I think a clustered index would be good, if I don’t have sign off from development (or enough proof myself), I can’t leave it there. I don’t love it, but it’s the best I can do right now. If I’m on 2008, the ALTER TABLE … REBUILD is better than creating and dropping the clustered index (assuming drop non-clustereds before, recreate after)?
Erin
Yes, I know your situation Erin only too well – only so much proof we could offer when we were there ;-)
Paul,
Either creating clsutered index or Alter Table .. rebuild, if one can afford rebuilding nonclustered index what else problem you see with that approach.
Thanks.
So from this I gather that you think there shouldn’t be any heap tables. Do you have any guidelines on when a heap would be a better choice than just adding an int IDENTITY field as the clustered PK? (Assuming that the data doesn’t contain narrow+static+unique+ever-increasing data that can be used for the PK.)
Very, very few circumstances where heaps are better than clustered indexes (e.g. very fast massive data loading). If the data doesn’t naturally contain a good PK, create a surrogate one. Read through Kimberly’s blog posts I link to in my post – she has lots of good info there.
[...] A SQL Server DBA myth a day: (29/30) fixing heap fragmentation [...]
Paul said:”Yes, but that changes the heap record locations, which causes all the nonclustered indexes to be rebuilt. Not good.”
But what’s the big deal about rebuilding the nonclustered indexes? If the task was running every 30mins I could understand, but you don’t say that.
Paul – the problem is that rebuilding all the nonclustered indexes may be a very large operation, if the table is large and there are many nonclustered indexes. It generates a lot of transaction log, which can cause performance problems for anything that has to deal with sending log to a remote system or scanning the log.