SQLskills SQL101: REBUILD vs. REORGANIZE

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Often when discussing index fragmentation I find that people aren’t aware of all the differences between rebuilding an index (using ALTER INDEX … REBUILD) and reorganizing an index (using ALTER INDEX … REORGANIZE), so that’s the topic for this post. Everything I say below applies to row-based indexes only (i.e. not columnstore indexes), and equally to a clustered index or a nonclustered index.

Space Required

Rebuilding an index requires building a new index before dropping the old index, regardless of the amount of fragmentation present in the old index. This means you need to have enough free space to accommodate the new index.

Reorganizing an index first squishes the index rows together to try to deallocate some index pages, and then shuffles the remaining pages in place to make their physical (allocation) order the same as the logical (key) order. This only requires a single 8-KB page, as a temporary storage for pages being moved around. So an index reorganize is extremely space efficient, and is one of the reasons I wrote the original DBCC INDEXDEFRAG for SQL Server 2000 (the predecessor of ALTER INDEX … REORGANIZE).

If you have space constraints, and can’t make use of single-partition rebuild, reorganizing is the way to go.

Algorithm Speed

An index rebuild will always build a new index, even if there’s no fragmentation. The length of time the rebuild takes is related to the size of the index, not the amount of fragmentation in it.

Reorganizing an index only deals with the fragmentation that exists, and the more fragmentation there is, the longer a reorganize will take.

This means that for a lightly fragmented index (e.g. less than 30% fragmentation), it’s generally faster to reorganize the index, but for a more heavily fragmented index, it’s generally faster to just rebuild the index. This is why you may have seen thresholds of 0 to 5-10% do nothing, 5-10% to 30% reorganize, 30%+ rebuild. I created this guidance while I was at Microsoft – see here.

Transaction Log Generated

In the FULL recovery mode, an index rebuild is fully logged, so the transaction log will have to accommodate the full size of the index in a single transaction. This also means the entire generated transaction log may need to be mirrored, sent to your AG replicas, scanned by replication, backed up, and so on.

In the SIMPLE and BULK_LOGGED recovery modes, the amount of transaction log generated by an offline index rebuild will be minimal (online index rebuild is always fully logged) – just the allocations of pages and extents. However, the next log backup performed (either in BULK_LOGGED or after switching to FULL) will also contain all the extents changed by the rebuild, and so the log backup will be roughly the same size as if the rebuild was done in the FULL recovery mode. The benefits are in time and the fact that the transaction log itself does not have to accommodate the full size of the index during the rebuild in a single transaction.

In all recovery modes, reorganizing an index is fully logged, but is performed as a series of small transactions so should not cause the transaction log to grow inordinately. And of course, transaction log is only generated for the operations performed, which may be less for a reorganize as it only deals with fragmentation that exists.

Locks Required

An offline index rebuild of any index holds a schema-modification (i.e. super-exclusive) table lock – no updates or reads of the entire table.

An online index rebuild of any index acquires a short-term shared table lock at the start of the operation, holds an intent-shared table lock throughout the operation (which will only block exclusive and schema-modification table locks), and then acquires a short-term schema-modification  table lock at the end of the operation. ‘Online’ is a bit of a misnomer. From SQL Server 2014, you can use the WAIT_AT_LOW_PRIORITY option to delay the potential for blocking – see this blog post.

An index reorganize holds an intent-exclusive table lock throughout the operation, which will only block shared, exclusive, and schema-modification table locks. One of the major reasons I wrote DBCC INDEXDEFRAG for SQL Server 2000 was as an online alternative to DBCC DBREINDEX.

Interruptible or Not

An index rebuild operation cannot be interrupted without it rolling back everything it’s done so far – it’s atomic – all or nothing. In SQL Server 2017, however, there is a resumable-online index rebuild feature.

An index reorganize can be interrupted and the worst that will happen is that a single page move operation is rolled back.

Progress Reporting or Not

Index rebuilds do not have proper progress reporting. You can hack it for online index operations by looking at the bigintdata1 column in the Progress Report: Online Index Profiler event, which happens to show how many rows of the old index have been scanned. You can also hack it for index operations by looking at the number of page reads the SPID has done in sys.dm_exec_requests.

Index reorganize operations populate the percent_complete column of sys.dm_exec_requests so you can easily gauge how much work remains. In fact DBCC INDEXDEFRAG also used to do progress reporting, but less elegantly, by printing a progress message to your connection every 30 seconds.

Statistics

An index rebuild will always rebuild the index column statistics with the equivalent of a full scan (or sampled, for an index partition or if the index is partitioned).

An index reorganize does not see a total view of the index and so cannot update statistics, meaning that manual index statistics maintenance is required.

Summary

As you can see, there are quite a few major differences between rebuilding and reorganizing, but there’s no right answer as to which one you should use – that’s your choice.

If you have an index maintenance routine that always rebuilds and never considers reorganizing, you should reconsider. It’s usually better to reorganize a lightly fragmented index and rebuild a more heavily fragmented index – to save time and resources. You’ll find that most index maintenance products and freely-available scripts allow you to make that choice.

And as always, rather than writing your own index maintenance solution, I recommend Ola Hallengren’s free code (yes, other people have done similar, but I think Ola’s is by far the best and most widely used).

40 thoughts on “SQLskills SQL101: REBUILD vs. REORGANIZE

  1. Good post. Really minor edit though – you wrote:

    “An index rebuild operation cannot be interrupted”

    It can definitely be interrupted, but I think you meant it can’t be resumed. (You can definitely kill or abort index rebuilds.)

  2. Hi Paul, great article as usual. I have a somewhat off-topic question. If reducing index fragmentation only makes scans faster, then wouldn’t it be a good idea to filter your list of indexes (to be defragged) using “select * from sys.dm_db_index_usage_stats where user_scans > 0 “. It seems to me that all other indexes are a waste of cycles. True this DMV is reset periodically but results could be archived periodically.

    1. No, because it’s not just about making scans more efficient. Low page density makes buffer pool usage inefficient, wastes disk space, and I/Os, and then the page splits themselves that cause fragmentation cause a lot of extra log to be generated and are a perf hit themselves. You have to consider the overall perf hit from fragmentation when deciding which indexes to take care of.

  3. Hi Paul,

    Can you please confirm if in Simple and Bulk Recovery Model Index Rebuild is minimally log, what I understand is that after SQL 2008 Index Rebuild is full logged irrespective of Recovery Models.

  4. Thanks Paul. It’s a great article. I have one doubt. Can you please confirm me reorganize is single thread operation regardless of sql server version and edition or it could be multi thread somehow?. The same is with rebuild process, Rebuild is multi thread in enterprise editions only and single thread in all other editions.

  5. Hi Paul,

    Great article, thanks for that. I have noticed that if I Reorganize an Index, at the end I have more free space than if I had Rebuilt that Index. Can you confirm if that’s correct? I checked on Internals and it says the reorg “compacts the leaf-level pages back to their originally specified fillfactor”. I wonder if that’s the reason or if there is something I a missing here. If that’s the case, is there any drawback in having those pages compacted compared with index Rebuild?
    Thanks a million.

  6. when we rebuild index with online and resumable options and we pauses the rebuild index at another session, the below query shows a very large number of rows.

    select * from fn_dblog(null , null)

    when we take a log backup from database, the fn_dblog shows a small number of rows, but we can resume rebuild the index or even Abort it, After getting a log backup.

    How does the SQL server do this?

    1. They’re kind of the same thing. Abort means a query was killed for some reason. Rollback means an aborted/aborting transaction has to undo the changes it made.

  7. It is nice to get some understanding on the options when rebuilding indexes.

    I have some indexes in SQL Server 2012 that are small. That is less than 10 pages. It appears that these are not rebuilt since they appear to keep their fragmentation level regardless of which method I use to defragment them.

    Is there a low size limit for rebuilding the indexes?

    I tend to use “alter index all on [tableName] rebuild;”

  8. Hi Paul,
    Under what circumstances would you use the command “ALTER TABLE… REBUILD” to free up space please?
    And how is this different to DBCC DBREINDEX?
    Thanks

  9. Hi Paul,
    for index with Page Count=7,630,209 and Fragmentation=11%, rebuild 30 min but reorganize 40 days!!, why ?
    Thanks

      1. Thank you for your responce
        There is no blocking and the reorganize wait types are: “hadr_sync_commit” & “writelog”.
        Is it right if I say the reason for this time difference maybe for parallel rebuild indexes vs single reorganize indexes or synchronous alwayson?

        1. No – having written the reorganize algorithm, there’s no way it should be taking 40 days to reorg a 58GB index – more like a few hours. There’s something going on but without looking at the server, it’s impossible to say.

          1. Hi Paul,

            Any suggestion on what to check in that scenario?
            I have similar situation: PaaS on Azure (VM with SQL Server 2016 EE installed), AlwaysON with secondary in another datacenter, large cluster index (+600GB), and reorg is taking 6 days and showing 59% completed only, wich means will collison with next saturday execution.

            I could see waits like HADR_SYNC_COMMIT and WRITELOG, nothing weird apparently.
            Log file is not growing like crazy.
            Resources seem to be normal (cpu under 40%, memory is 90% used).
            No blocking with users requests. Transactions working only on the cluster mentined.

            I’m kinda little lost here.
            Any guiadance/tip/suggestion/magic/conjuring/summon would be great.
            :)

            Thansk in advance.

  10. Dear Paul

    Index ReOrganize is done with a Single thread and Index Rebuild is done with the MAX DOP number of CPUs set in the server settings as you have also mentioned in your great article. In my tests Rebuild is many folds faster than Reorganize even for the fragmentation levels lower than 15 percent.
    I believe the Index maintenance type should favor Index Rebuild when you have the resources available.

    Please comment.

  11. This code below can be used to get an idea of what the progress is on a index rebuild and index create (clustered and non-clustered) on a table object. Not perfect when using MAXDOP > 1, the parallel threads may not have the same number of rows to operate on so it can be off as the threads drop off on the estimated time. Works well enough for me. I have to say though that I seldom use rebuilds, I generally would be moving an index to a new filegroup/database file. I have many large TB+ size tables and indexes on those tables so using an index rebuild it not my best option, I just recreate them on different storage to take advantage of the I/O on each drive. I borrowed this code as you notice from the top line.

    https://dba.stackexchange.com/questions/139191/sql-server-how-to-track-progress-of-create-index-command
    — add this line to index or insert tsql statement before running it’
    — SET STATISTICS PROFILE ON;

    DECLARE @SPID INT = 538 ;

    ;WITH agg AS
    (
    SELECT SUM(qp.[row_count]) AS [RowsProcessed],
    –SUM(qp.[estimate_row_count])
    10148290 AS [TotalRows], — change the value here to number of rows from the properties of the AHLDB temp table
    MAX(qp.last_active_time) – MIN(qp.first_active_time) AS [ElapsedMS],
    MAX(IIF(qp.[close_time] = 0 AND qp.[first_row_time] > 0,
    [physical_operator_name],
    N”)) AS [CurrentStep]
    FROM sys.dm_exec_query_profiles qp
    WHERE
    qp.[physical_operator_name] IN (N’Table Scan’, N’Clustered Index Scan’, N’Sort’, ‘Index Scan’) — most common operation
    –qp.[physical_operator_name] IN (N’Table Scan’, N’Clustered Index Scan’, N’Sort’, ‘Index Scan’,N’Clustered Index Insert’,N’Online Index Insert’,N’Compute Scalar’) — all operations
    –qp.[physical_operator_name] IN ( N’Clustered Index Scan’) — single operation
    –qp.[physical_operator_name] IN ( N’Clustered Index Insert’) — single operation
    –qp.[physical_operator_name] IN ( N’Online Index Insert’) — single operation
    –qp.[physical_operator_name] IN ( N’Index Insert’) — single operation
    –qp.[physical_operator_name] IN ( N’Sort’) — single operation
    AND qp.[session_id] = @SPID
    ), comp AS
    (
    SELECT *,
    ([TotalRows] – [RowsProcessed]) AS [RowsLeft],
    ([ElapsedMS] / 1000.0) AS [ElapsedSeconds]
    FROM agg
    )
    SELECT [CurrentStep],
    [TotalRows],
    [RowsProcessed],
    [RowsLeft],
    CONVERT(DECIMAL(5, 2),
    (([RowsProcessed] * 1.0) / [TotalRows]) * 100) AS [PercentComplete],
    [ElapsedSeconds],
    (([ElapsedSeconds] / [RowsProcessed]) * [RowsLeft]) AS [EstimatedSecondsLeft],
    DATEADD(SECOND,
    (([ElapsedSeconds] / [RowsProcessed]) * [RowsLeft]),
    GETDATE()) AS [EstimatedCompletionTime]
    FROM comp;

    — run this query below to find out what operations are running and adjust where clause above — if you get negative number then it has already gone past one operation to another –
    — if you get a an error with zero divide in the above query, the operation in where clause is not running, choose another line or run the command below to find active operation

    — select * from sys.dm_exec_query_profiles

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.