A SQL Server DBA myth a day: (29/30) fixing heap fragmentation

(Look in the Misconceptions blog category for the rest of the month’s posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together when you join our Insider list, plus my online Myths and Misconceptions training course on Pluralsight.)

Another quickie but goodie before the finale tomorrow!

Myth #29: fix heap fragmentation by creating and dropping a clustered index.

Nooooooooooooo!!!

This is just about one of the worst things you could do outside of shrinking a database.

If you run sys.dm_db_index_physical_stats (or my old DBCC SHOWCONTIG) on a heap (a table without a clustered index) and it shows some fragmentation, don’t EVER create and drop a clustered index to build a nice, contiguous heap. Do yourself a favor and just create the well-chosen clustered index and leave it there. There’s a ton of info out there on choosing a good clustering key – narrow+static+unique+ever-increasing is what you need. Kimberly has a blog post from 2005(!) that sums things up: Ever-increasing clustering key – the Clustered Index Debate……….again! and I’ve got An example of a nasty cluster key.
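
As a minimal sketch (the database, table, column, and index names here are all made up), checking a heap’s fragmentation and then fixing it the right way looks like this:

    -- Check fragmentation of the heap (index_id = 0 means the heap itself)
    SELECT [index_type_desc], [avg_fragmentation_in_percent], [avg_page_space_used_in_percent]
    FROM sys.dm_db_index_physical_stats (
        DB_ID (N'MyDatabase'), OBJECT_ID (N'MyTable'), 0, NULL, 'DETAILED');

    -- Create a well-chosen clustered index and leave it there
    -- (MyID standing in for a narrow, static, unique, ever-increasing key)
    CREATE UNIQUE CLUSTERED INDEX CIX_MyTable_MyID ON dbo.MyTable ([MyID]);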

Yes, you can use ALTER TABLE … REBUILD in SQL Server 2008+ to remove heap fragmentation, but that is almost as bad as creating and dropping a clustered index!

Why am I having a conniption fit about this? Well, every record in a nonclustered index has to link back to the matching row in the table (either a heap or clustered index – you can’t have both – see Kimberly’s recent SQL Server Magazine blog post What Happens if I Drop a Clustered Index? for an explanation). The link takes the form of:

  • If the table is a heap, the actual physical location of the table record (data file:page number:record number)
  • If the table has a clustered index, the clustering key(s)
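
If you want to see these heap links for yourself, here’s a quick sketch using the %%physloc%% virtual column and sys.fn_PhysLocFormatter (both undocumented and unsupported, so don’t rely on them in production code; the table name is made up):

    -- Show the physical RID (file:page:slot) that each heap row lives at –
    -- this is exactly what nonclustered index records store as the link
    SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS [RID], *
    FROM dbo.MyHeapTable;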

The blog post link at the bottom of this post explains in a lot more detail.

If you create a clustered index, all the linkages to the heap records are no longer valid and so all the nonclustered indexes must be rebuilt automatically to pick up the new clustering key links. If you drop the clustered index again, all the clustering key links are now invalid so all the nonclustered indexes must be rebuilt automatically to pick up the new heap physical location links.

In other words, if you create and then drop a clustered index, all the nonclustered indexes are rebuilt twice. Nasty.

If you think you can use ALTER TABLE … REBUILD to fix heap fragmentation, you can, but it causes all the nonclustered indexes to be rebuilt as the heap record locations obviously change.
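
For completeness, here’s what that looks like (SQL Server 2008 onward; the table name is made up):

    -- Rebuilds the heap, AND rebuilds every nonclustered index on it
    -- as a side-effect, because all the heap record locations change
    ALTER TABLE dbo.MyHeapTable REBUILD;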

Now, what about if you *rebuild* a clustered index? Well, that depends on what version you’re on and whether you’re doing a simple rebuild or changing the definition of the index. One major point of misconception is that moving a clustered index or partitioning it changes the cluster keys – it doesn’t. For a full list of when the nonclustered indexes need to be rebuilt, see Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?
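
As a quick illustration (index and table names made up), a simple rebuild on SQL Server 2005 onward doesn’t change the clustering keys, so the nonclustered indexes are left alone:

    -- Simple rebuild: same index definition, same clustering keys,
    -- so the nonclustered indexes don't need to be touched
    ALTER INDEX CIX_MyTable_MyID ON dbo.MyTable REBUILD;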

40 thoughts on “A SQL Server DBA myth a day: (29/30) fixing heap fragmentation”

  1. Ok, but there could be situations where you don’t have NC indexes and the table must be a heap. I don’t know why – a really very strange situation – but the fact is that in 2008 we have another option besides create-and-drop of a clustered index on a heap (with better performance, I suppose).
    Don’t misunderstand me, I agree about always having a clustered index. I’m just suggesting an option for when we have no option other than a heap.

  2. Ok…if I don’t have the option to leave a clustered index on a table that was originally a heap, how can I remove fragmentation in the heap? Typically, I have dropped the non-clustereds, created the clustered index, dropped it, then re-created the non-clustereds. I realize I may get yelled at, but this is not something that is done frequently (once, if at all), and when it has to be done, what’s the best method?

    Thanks,

    Erin

  3. As the vendor, I can create and drop the index without violating the support agreement. Even though I think a clustered index would be good, if I don’t have sign-off from development (or enough proof myself), I can’t leave it there. I don’t love it, but it’s the best I can do right now. If I’m on 2008, is ALTER TABLE … REBUILD better than creating and dropping the clustered index (assuming I drop the non-clustereds before and recreate them after)?

    Erin

  4. I don’t understand how you can not have the option to leave the clustered index on the table, but the support agreement or whatever allows you to temporarily create one… that aside, yes, that’s the best way to do it unless you’re on 2008.

  5. Paul,
    Whether creating a clustered index or using ALTER TABLE … REBUILD, if one can afford rebuilding the nonclustered indexes, what other problem do you see with that approach?
    Thanks.

  6. So from this I gather that you think there shouldn’t be any heap tables. Do you have any guidelines on when a heap would be a better choice than just adding an int IDENTITY field as the clustered PK? (Assuming that the data doesn’t contain narrow+static+unique+ever-increasing data that can be used for the PK.)

  7. Very, very few circumstances where heaps are better than clustered indexes (e.g. very fast massive data loading). If the data doesn’t naturally contain a good PK, create a surrogate one. Read through Kimberly’s blog posts I link to in my post – she has lots of good info there.
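
    A surrogate key can be as simple as this sketch (the table and column names are made up):

      -- Add a narrow, static, unique, ever-increasing surrogate key
      ALTER TABLE dbo.MyHeapTable ADD [MyID] int IDENTITY (1, 1) NOT NULL;
      CREATE UNIQUE CLUSTERED INDEX CIX_MyHeapTable_MyID ON dbo.MyHeapTable ([MyID]);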

  8. Paul said: “Yes, but that changes the heap record locations, which causes all the nonclustered indexes to be rebuilt. Not good.”

    But what’s the big deal about rebuilding the nonclustered indexes? If the task was running every 30 minutes I could understand, but you don’t say that.

    1. Paul – the problem is that rebuilding all the nonclustered indexes may be a very large operation, if the table is large and there are many nonclustered indexes. It generates a lot of transaction log, which can cause performance problems for anything that has to deal with sending log to a remote system or scanning the log.
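
      If you want to see how much log a rebuild generates, a quick sketch using the undocumented fn_dblog (test systems only) is to take a snapshot before and after the rebuild and compare:

        -- Undocumented: count and size the log records in the current
        -- database – run before and after the rebuild and compare
        SELECT COUNT (*) AS [Log Records], SUM ([Log Record Length]) AS [Log Bytes]
        FROM fn_dblog (NULL, NULL);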

      1. Dropping and recreating the NC’s would cause the same or similar amount of log generation, wouldn’t it?

  9. OK. I understand why it’s bad to use ALTER TABLE to rebuild a heap. But what if you have this case: You have a heap that uses page compression. Data is bulk loaded into it without the TABLOCK option (third party program – I can’t change this). If I read http://msdn.microsoft.com/en-us/library/dd894051%28v=sql.100%29.aspx correctly, if a bunch of new rows are inserted (i.e. they go onto new data pages), the data will only be row compressed. In order to get page compression for that new data, I have to rebuild the heap with page compression. Does that sound right?

  10. We have a similar situation as Shaun. SQL 2012, massive bulk insert into a “daily” (heap) table with PAGE compression (~100 GB, ~600 million rows), create the nonclustered columnstore index (no other indexes on the table), partition switch into the main (heap) table. The bulk insert is performed without the TABLOCK option (multiple parallel SSIS destinations), so the actual compression is ROW, right? So, is ALTER TABLE … REBUILD our best option to achieve actual PAGE compression?
    A similar question: Some partitions in the main table get a substantial number of rows deleted after some time. In such case, is “ALTER TABLE REBUILD PARTITION = WITH (DATA_COMPRESSION = PAGE)” the optimal option to deal with logical fragmentation and page density?

    1. Sorry, used the wrong brackets, got mistaken for HTML tags… the last alter command was supposed to be
      “ALTER TABLE main_table REBUILD PARTITION = partition_number WITH (DATA_COMPRESSION = PAGE)”

      1. Yes, and yes, I believe. Although you may find that the partition with lots of deletes has those pages automatically recompressed, as page compression will automatically recompress a page after a certain number of changes to it.
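
        To check whether the pages actually ended up PAGE compressed (rather than just having the PAGE compression setting), I believe the DETAILED mode of sys.dm_db_index_physical_stats will tell you (the table name is made up):

          -- compressed_page_count = pages that are actually PAGE compressed,
          -- vs. page_count = all pages in the heap (index_id = 0)
          SELECT [partition_number], [page_count], [compressed_page_count]
          FROM sys.dm_db_index_physical_stats (
              DB_ID (), OBJECT_ID (N'dbo.main_table'), 0, NULL, 'DETAILED');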

  11. Paul,

    For these tiny little lookup tables with 5 rows, does a clustered index benefit performance? I have tested and seen no impact from adding a clustered index on a tiny little lookup table.

    1. Nope – highly unlikely it’s going to make any difference unless the rows span five pages and each lookup is a scan of five pages. The point of the post is not to create and drop a clustered index to fix heap fragmentation, which a tiny table won’t have. Thanks

  12. Paul,

    You say that the NC indexes will be rebuilt if you create and drop the index on a heap table, and that it’s bad.
    In my case, I have tables with no clustered index, only NC indexes – they are heaps.
    When I run a database shrink, it moves rows one by one into the free space near the start of the file. Does this increase fragmentation in heap tables? Can I check the fragmentation of heap tables?

    Thank you, master.

    1. The way you defrag nonclustered indexes is the same, whether they’re on a heap or a clustered index. If you have heap fragmentation that’s causing you a perf issue, consider creating a clustered index. If you can’t create a clustered index, you could do an ALTER TABLE … REBUILD, which will rebuild the heap, but also rebuild all nonclustered indexes as a side-effect.

  13. I tested this today (2015-02-08). While there was somewhat less fragmentation after I created the clustered index and dropped it, it wasn’t enough. Paul R is correct in this, as I tried it on multiple tables. Fragmentation dropped from 70 to 30 at best, and on one set the index actually became more fragmented. Thanks for the point, Paul R, on having to rebuild the non-clustered indexes afterward. That’s an extremely good point.

  14. Hi Paul
    The misconception posts are a great reference – I always get something new or a better perspective every time I visit these pages. So thanks!

    About DON’Ts on heaps:
    – While you completely advise against the create/drop clustered index approach to fix heap fragmentation, in SQL BOL I can see it as a recommendation for managing heaps (reclaiming heap space) –> https://msdn.microsoft.com/en-us/library/hh213609.aspx

    – Though there is a note of caution, which my clients safely ignore, and then they run into the issues you’ve described.

    Can the BOL team get this removed or re-worded? Thoughts?

    Thanks, Varun

    1. Heaps are a fact of life. There are many situations where we have to deal with them and don’t have control over the design.

      What I have been doing for many years now is to disable the NC indexes on the table first, then rebuild/reorg/defrag the heap, and then rebuild the disabled NC indexes. This gets around all of the issues you mention above, and retains the index metadata by disabling the indexes instead of dropping them.

      I do the same process for tables with a CI, as long as the heap or CI meets threshold levels. If not, then my process only works on the NC indexes without disabling anything.

      1. Agreed. Yes, given that the table is offline anyway, disable+rebuild reduces the number of NC index rebuilds to one, if you have to stay with a heap.

        Btw, don’t do your approach for clustered indexes. Rebuilding a clustered index doesn’t touch the NC indexes at all so no need to disable+rebuild them.
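
        As a sketch of that heap sequence (the table name is made up):

          -- 1) Disable all nonclustered indexes: keeps their metadata but
          --    drops their data (safe on a heap - there's no clustered index)
          ALTER INDEX ALL ON dbo.MyHeapTable DISABLE;

          -- 2) Rebuild the heap - disabled indexes aren't rebuilt as a side-effect
          ALTER TABLE dbo.MyHeapTable REBUILD;

          -- 3) Rebuild the indexes, which also re-enables them
          ALTER INDEX ALL ON dbo.MyHeapTable REBUILD;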

  15. The cost of having to rebuild the nonclustered indexes has been exaggerated. If you have a 300GB heap where 2% of each page is being used, you need to rebuild the heap. If it rebuilds the nonclustered indexes, so what? And as said, heaps are commonly used.

    1. The cost hasn’t been exaggerated for rebuilding the nonclustered indexes – it’s more downtime for the table if doing it offline, a lot of extra transaction log generation, plus buffer pool usage, and so on. Of course, the cost is proportional to the size of the table and the number of nonclustered indexes. You may not agree, but I’m not exaggerating. If you have a 300GB heap and your access pattern continually empties out the pages, you’d likely be better off with a clustered index, which can be rebuilt or reorganized without having to touch the nonclustered indexes. The exceptions are the niche cases where a heap performs better than a clustered index for the workload, but those aren’t that common. Heaps are commonly used – and used incorrectly, where a clustered index would be better.

  16. Hi Paul

    I have a maintenance plan that rebuilds and reorganizes indexes using fragmentation thresholds.

    How do I handle heap fragmentation using a maintenance plan or SQL jobs? I am using SQL 2014.

  17. Hi Paul,
    You mean the ALTER TABLE … REBUILD command should be executed for all types of index defragmentation, correct?

  18. I’m a bit reluctant about adding a permanent, non-unique clustered index to a heap without the developers being aware of it. I understand that unless the schema changes there is no risk in doing so, but the table owner (developer, vendor) is not going to be aware of it, so they may make changes to this table and run into a problem because they didn’t expect it.

    If you explicitly drop all the nonclustered indexes, then create/drop a clustered index, and then recreate the nonclustered indexes, I think the end result is not going to be so bad, is it? I assume that rebuilding the nonclustered indexes by dropping/recreating them is less “chatty” against the database structure, so it would make fewer writes to the log.
