MSDN Webcast Q&A: Index Defrag Best Practices – Fragmentation Questions, Part I


Q: Is there another way to determine fragmentation? I don’t want to use DBCC SHOWCONTIG as it locks objects.


Well, this is a good question and, unfortunately, there really isn’t a way to generate information such as Scan Density or Average Page Density without running DBCC SHOWCONTIG. However, if you decide that you’re only interested in Scan Density, then you’ll be interested in the WITH FAST option for DBCC SHOWCONTIG. With the WITH FAST option, the command calculates only the out-of-order extents (i.e., extent switches) and skips average row length, maximum row length, minimum row length, average page density, and so on. As a result, SQL Server executes the command much faster and releases its locks sooner – but locks are still acquired. As an odd response though – I’d have to say that you could just defrag without looking… Meaning: instead of even checking for fragmentation, you could just execute DBCC INDEXDEFRAG. Since DBCC INDEXDEFRAG doesn’t take long-running locks (it runs as “mini transactions” as it moves through the leaf level of the index) and does nothing when there’s no fragmentation (it only moves data around where fragmentation exists), you can do this without checking – and without creating blocking.
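
A quick sketch of both approaches (TestTable and Test are just placeholder table/index names):

-- Check only extent switches / Scan Density; locks are held only briefly:
DBCC SHOWCONTIG (TestTable, Test) WITH FAST

-- Or skip the check entirely; the 0 means "the current database":
DBCC INDEXDEFRAG (0, TestTable, Test)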


 


Q: How does the fill factor influence index structure?


FILLFACTOR – an option which takes effect ONLY when an index is initially built, or later when it is rebuilt or defragmented – defines the level of fullness for the leaf level of an index. Fillfactor applies ONLY to the leaf level and, when set appropriately for the data and the volatility patterns of the data, it should help to minimize fragmentation, thereby reducing the need for free space to be left at the higher levels of the tree. However, if you also want to pad the upper levels of the tree with the same level of free space as the leaf level, you can add PAD_INDEX to the index statement and it will use the fillfactor setting for all levels of the index. For example:


 


CREATE INDEX Test
ON TestTable (TestColumn)
WITH FILLFACTOR = 80


 


Fills the leaf pages 80 percent full; the non-leaf pages are balanced such that at LEAST one entry is open on each non-leaf page.


 


CREATE INDEX Test
ON TestTable (TestColumn)
WITH FILLFACTOR = 80, PAD_INDEX


 


Fills BOTH the leaf and non-leaf pages 80 percent full.


 


Typically, PAD_INDEX is not needed unless you have extremely unbalanced activity and are unable to rebuild to rebalance the tree (meaning you need to maintain high availability). However, I’d say that PAD_INDEX is not perfect as implemented, as I would rather be able to set that value independently of fillfactor…


 


So, I’m not a big fan of PAD_INDEX, but I feel it’s very important to set an APPROPRIATE fillfactor. (Review the webcast for more details on what I mean by an “appropriate” fillfactor.)


 


Q: What would be the ideal settings [of fillfactor] for OLAP/DSS Databases?


To be honest, 100% is an ideal setting for a read-only decision-support database, as new data is not causing splits. However, the data loads which build these reporting databases can cause fragmentation. Ideally, you would drop your indexes, load your data, and then rebuild your indexes upon completion.

If you’re trying to reduce the amount of time the process takes AND you’re not loading a lot of data (compared to what’s already there), then you might not want to drop and recreate indexes… In that case, set fillfactor based on how much data you will be loading compared with how much data you already have. Set fillfactor to whatever that percentage is: if you’re loading 100,000 rows into a table which already has 900,000 rows, then a fillfactor of 90 should leave just enough room – IF the data is evenly distributed. If the data is NOT evenly distributed, then you might want to drop this number a tad lower. AND – you only want to set this on indexes that are actually going to have splits. Remember, if your clustering key is ever-increasing and the new data coming in is “appended” to the end of the table – both logically and by clustering key definition – then you only need the fillfactor setting on the nonclustered indexes. Regardless, it is still likely that you will end up with some fragmentation.

Dropping the indexes, loading the data, and then rebuilding the indexes would certainly be better – but it will take more time. IF you’re loading a lot of data relative to what you already have, then there’s NO question: drop the indexes first, load the data, and then rebuild. When dropping indexes, ALWAYS drop the nonclustered indexes first and then drop the clustered index. When creating indexes, ALWAYS create the clustered index first and then add the nonclustered indexes – a sketch of this pattern follows below. For more details on High Performance Data Loading, check out the presentations on Gert Drapers’ site, www.SQLDev.Net, under Events. (FYI – there are all sorts of great presentations there!)
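
A minimal sketch of that drop/load/rebuild order (SalesDetail and the index names are hypothetical):

-- 1) Drop the nonclustered indexes first, then the clustered index:
DROP INDEX SalesDetail.ncCustomer
DROP INDEX SalesDetail.clOrderDate

-- 2) Load the data (BULK INSERT, DTS, etc.)

-- 3) Recreate the clustered index first, then the nonclustered index(es),
--    with room for the NEXT load on the indexes that will see splits:
CREATE CLUSTERED INDEX clOrderDate ON SalesDetail (OrderDate)
CREATE NONCLUSTERED INDEX ncCustomer ON SalesDetail (CustomerID)
WITH FILLFACTOR = 90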


 


Q: When trying to determine the fill factor to set, would you recommend setting the same fillfactor to all tables or just adjust the fillfactor on the large tables and keep the small ones at the default?


I know I got this one during the webcast… but here’s a recap. I would really recommend starting with the “monster” tables and then working your way down from there. Remember, you don’t really need a fillfactor unless your object is going to have splits. In many cases your clustered structures will not have a lot of splits, as they might have a clustering key which is ever-increasing. Additionally, if you have all fixed-width columns, or you don’t have updates to varchar columns, then you might in fact be able to keep a large table very compact at 100%. The reason I mention this is that I have a customer who did exactly what you’re asking… They decided it was easier to just set the server-wide fillfactor option to 90% so that they’d change the default value instead of having to set it individually for every index. Unfortunately, for their largest (yes, their monster!) table, it turned out that they didn’t have updates and their clustering key was on an identity column. Once they went back and changed their rebuild job to build to 100%, they reclaimed 3.2 GB of disk space!
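
For reference, here’s roughly what both sides of that story look like (MonsterTable is a placeholder; note that the server-wide fill factor setting is an advanced option and only takes effect after a restart):

-- What they originally did: change the server-wide default
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'fill factor', 90
RECONFIGURE

-- What fixed it: rebuild the identity-keyed monster table completely full
-- (the empty string means "all indexes on the table"):
DBCC DBREINDEX (MonsterTable, '', 100)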


 


So, let me summarize. Only worry about fillfactor for your largest tables. Set it somewhat strategically, and make sure to check on it using DBCC SHOWCONTIG (if possible), or just defrag regularly (if you just want to be safe and can’t afford the impact on availability – review the rest of the questions in “Index Fragmentation” for more details).


 


Q: Will Kimberly show the rebuild or defrag statement she used? I had to run to the restroom and missed that brief part.


I would have answered your question but I had to use the restroom myself… Seriously though 🙂 the commands are:


            DROP and re-CREATE
            DBCC DBREINDEX (added in 6.5)
            CREATE with DROP_EXISTING (added in 7.0)
            DBCC INDEXDEFRAG (added in 2000)


Personally, when I’m going to do a rebuild, I use DBCC DBREINDEX, as it’s easier to automate and its syntax is simpler in general. If my availability requirements don’t allow the table/indexes to be rebuilt because of locking, then there’s only one choice for defragging: DBCC INDEXDEFRAG. I have a feeling that I cleared this up once you returned, and certainly you can review the webcast on demand here!
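
To make those concrete, here’s the basic shape of each (TestTable, TestColumn, and Test are placeholder names):

-- Rebuild ALL indexes on a table in one statement ('' = all indexes):
DBCC DBREINDEX (TestTable, '', 80)

-- Rebuild a single existing index in place, keeping (or changing) its definition:
CREATE INDEX Test ON TestTable (TestColumn)
WITH FILLFACTOR = 80, DROP_EXISTING

-- Defrag the leaf level without long-running locks (0 = current database):
DBCC INDEXDEFRAG (0, TestTable, Test)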


 


Q: Regarding sp_RebuildIndexes, does the SP need to be created in the Master Database? Our company’s Best Practices recommend against it. We use an “Admin User Database” for most shared stored procedures.


Sure, you can create the sp in any database (just make sure to modify the script), and then make sure to explicitly qualify the execution with databasename.owner.procedurename and everything should work. Also, note that the script creates a couple of procedures, so make sure to review it thoroughly.
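
For example, a fully qualified call might look like this (AdminDB is a hypothetical database name):

EXEC AdminDB.dbo.sp_RebuildIndexes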


 


Q: What type of Defrag do the data maintenance plans use?


They use DBCC DBREINDEX – see the comment exchange below, where a Profiler trace confirmed it. There’s no maintenance-plan option that runs DBCC INDEXDEFRAG.


 


Q: Can you use DBCC DBREINDEX if you are using transactional replication? I thought this ‘broke’ replication.


I checked the KB and pinged a few people, and no one is aware of any issues with DBREINDEX breaking replication (nor should creating/dropping indexes break it). In fact, because transactional replication focuses on the transactional data changes (not the page locations of the data), this operation shouldn’t even have the potential to “break” replication. Don’t get me wrong: it can create blocking, and that in turn can impact user access to the replicated tables, but it still should not break replication.


On a related note (and probably another common replication/maintenance question I get), people often want to know how to have a replicated table’s indexes rebuilt/defragged when the source table is rebuilt/defragged. And – this is a fairly simple trick: use a stored procedure to execute the rebuild/defrag, and then use stored procedure execution replication to have the execution of that procedure replicated to the subscribers.
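
A rough sketch of the trick (the procedure, table, and index names here are hypothetical):

-- Wrap the maintenance in a stored procedure:
CREATE PROCEDURE dbo.DefragOrderIndexes
AS
    DBCC INDEXDEFRAG (0, Orders, clOrderDate)
GO

-- Publish the procedure as a "proc exec" article (e.g. via sp_addarticle
-- with @type = 'proc exec'); then, when the publisher runs this call,
-- the subscribers replay the same call against their own copies:
EXEC dbo.DefragOrderIndexes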

2 thoughts on “MSDN Webcast Q&A: Index Defrag Best Practices – Fragmentation Questions, Part I”

  1. Kimberly,

    I set up a DB Maint plan to do a reorg of a database. I ran SQL Profiler and it showed that the reorg was done with DBCC DBReindex. Are there options you can set to force a DBCC Indexdefrag?

    Thanks,
    Jeff

  2. Hey there Jeff – Not by using the maintenance plan (unfortunately). You’d have to set this up on your own. The best recommendation I have is to create your own procedure which loops through the tables of the database and runs DBCC INDEXDEFRAG – see the sketch below. Fairly easily, you can use my sp_RebuildIndexes procedures as a base and then change the code to DEFRAG instead of rebuild.
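
    Something like this, as a minimal sketch for the current database (it walks every index on every user table and skips auto-created statistics):

    DECLARE @table sysname, @indid varchar(10), @cmd varchar(500)

    DECLARE IndexCursor CURSOR FOR
        SELECT o.name, CONVERT(varchar(10), i.indid)
        FROM sysobjects o
        JOIN sysindexes i ON i.id = o.id
        WHERE o.type = 'U'
          AND i.indid BETWEEN 1 AND 250
          AND INDEXPROPERTY(i.id, i.name, 'IsStatistics') = 0

    OPEN IndexCursor
    FETCH NEXT FROM IndexCursor INTO @table, @indid

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Build and run the defrag command for each index:
        SELECT @cmd = 'DBCC INDEXDEFRAG (0, ' + @table + ', ' + @indid + ')'
        EXEC (@cmd)
        FETCH NEXT FROM IndexCursor INTO @table, @indid
    END

    CLOSE IndexCursor
    DEALLOCATE IndexCursor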

    You can find sp_RebuildIndexes.txt (just rename to .sql) here: http://localhost/demoscripts.asp

    This is a good start. Let me know if you have any troubles!

    hth,
    kt
