When you decide to rebuild or reorganzie an index, you have an option to set something called FILLFACTOR. This is probably the MOST IMPORTANT thing to understand about index maintenance and reducing fragmentation (especially in databases that are prone to it). Unfortunately, we need to dive into some internals to really understand why this is so helpful…
What is FILLFACTOR?
FILLFACTOR defines how full the leaf level pages of the index are filled when rebuilding or reorganizing an index. The leaf level of an index is the largest structure of an index and it's the most prone to fragmentation. A good analogy would be to use the phone book (ah, loosely for this one) and imagine that new people come to the area. The *entire* phone book needs to get reprinted. Obviously this is terribly expensive and therefore it's not done often (well, we all use the internet anyway…). But, imagine if EACH page of the phone book were only filled to 90%. When new people come to the area we could just add their information – on the specific page where it should reside – and the phone book stays relatively intact and we don't have to kill a bunch of trees (OK, you could argue that you would have had to kill more trees to make a phone book that has 10% whitespace…but, let's not go there right now. :). OK, I know this analogy is weak at best but think back to "a page in the phone book" and think of it representing a "page" within SQL Server (quite literally here – SQL Server actually stores data on 8KB pages and indexes are doubly-linked lists of these 8KB pages).
So, if the pages are full and there's nowhere to put a particular row (let's say someone with a lastname of 'Tripp'), how does SQL Server "insert" this new row? They can't possibly shift all of the data down and rewrite all of the pages AFTER 'Tripp' could they? I suppose they could but it would get REALLY expensive – especially if someone with the last name of 'Anderson' moves to town…
How does SQL Server make room for new rows? They do something called a page split. Now, this can be A LOT more technical than I'm going to explain here but simply put, they split the page (on which the row is to be located) into half. The first 50% of the rows STAYS on the current page and the 50% MOVES to a new page (and this new page is probably nowhere near the page being split). However, they keep this list LOGICALLY in order by maintaining a doubly-linked list. So, SQL Server knows where to go when it's reading the data – even if the pages are not physically/sequentially in order. If this happens once or twice, it's not a huge problem. But as it continues to happen, it makes scans of your data more and more expensive and the process of the split can be expensive in and of itself. The biggest problems are:
- The data is not physically in order (and can create lot random IOs when scanning)
- The pages are no longer very dense (they've gone from 100% to 50%). Splitting does has a positive side in that it's going to take awhile before the page splits again BUT, splitting is meant as a temporary fix and ideally, it shouldn't be happening to every page. But, this also negatively impacts cache – you're only using 50% of the page that's split so now you're wasting half of that page in memory (and, on disk). So, while making a table 10% wider from the start seems bad; it reduces the need for splitting and therefore your table will stay more intact over time.
What indexes have fragmentation?
Some indexes are prone to more fragmentation than others. For example, indexes with a high-order element (the first column of the index) that is NOT in ever-increasing order. This does NOT mean that ALL indexes should be created such that their first column is ever-increasing (that's NOT a good way to create indexes!). But, it does mean that you should pay special attention to some of your larger tables and therefore larger indexes (meaning ESPECIALLY the clustered index since the clustered index is over ALL of the columns).
In terms of index management, here are the main things to think about for each index type:
Rebuilding the clustered index is the most expensive index to rebuild. It's also the one that you'd rather avoid if you can. It has the most limitations… A clustered index CANNOT be rebuilt as an online operation IF the table has ANY LOB columns in it at all. What this means is that the clustered index – if fragmented – must be rebuilt using an offline operation. This translates into downtime for that table – while it's being maintained. So, if you can create a good clustering key EARLY, then you might be able to minimize clustered index maintenance.
Rebuilding a nonclustered index is much easier and likely to be an ONLINE operation. Because nonclustered indexes contain a much smaller number of columns – and rarely ever include a LOB column – they can be managed/maintained through ONLINE index operations. So, even if these are prone to more fragmentation (and, in general, they are) then it's not as big of a deal.
How do I solve this problem?
So, the BEST thing to do is to visually picture the ORDER of the data in the index (just think of it like a phone book – a LIST of records on PAGES – ordered by the KEY of the index). If you can visualize that structure then you can visualize what happens on an INSERT, UPDATE and a DELETE. Once you have a good visual you should be able to define a FILLFACTOR based on these two things:
- How much fragmentation is happening (is it becoming massively fragmented quickly) – then set a lower fillfactor (like 70 or 80 – depending on how big the table is)
- How frequently are you going to be running maintenance (if you can do this daily or weekly – for some objects) then you might not need much free space. If you can run maintenance against the table frequently then you won't need as much free space.
And, of course, you can directly check the level of fragmentation your table has by using the DMV: dm_db_index_physical_stats (which I referenced in Part I) and ideally, you'll create a procedure that runs through your objects determining how much fragmentation they have and the appropriate FILLFACTOR. Sometimes the best way to do this is to create your own "management table" which lists all of your tables and then have definitions for when to rebuild, when to reorg and what fillfactor to use. There are a few folks that have done things like this. Check out Ola's site here. He's put a lot of effort into a more robust and flexible combination of options for database maintenance. It might be more than you need but it's a GREAT EXAMPLE of all of the things that you can evaluate and consider wrt to maintenance.
Solving this using Database Maintenance Plans
Finally, if you CHOOSE to use a maintenance plan you ONLY have two options:
Reorganize pages with the default amount of free space
Does NOT sound like what it actually does (IMO). It sounds like it uses some form of SQL Server default value. However, it actually uses the "OrigFillFactor" value that's set for the index. This is actually a good thing IF you've set the FILLFACTOR on the CREATE INDEX statement OR on a rebuild. Reorganizing does NOT allow you to set this ONLY a rebuild does. So, if you have set this then this choice is probably best. If you run a query against sysindexes (yes, the backward compatibility view NOT the new sys.indexes catalog view), then you can see the OrigFillFactor setting for ALL of your indexes. (And, it's probably 0 – which sounds strange… 0 actually means 100… I could explain this but just trust me on this one.)
Change free space per page percentage to _____ %
Well, this is certainly an option you can consider. If you've never set the FILLFACTOR (or, even if you have…), this would overwrite it AND change the OrigFillFactor value to the percent you use here. If you decide to take this "sledgehammer" approach then I'd suggest something quite high – like 90-95% so that you don't end up wasting a lot of space. Some indexes (ones that have a high-order element that IS ever-increasing) don't really need free space and this might be completely wasted. So, I think it's better to set this indivually using stored procedures and/or a more custom maintenance procedure.
NOTE/TIP FROM COMMENTS POSTED: While I explain this concept (above) using the index option/setting (called FILLFACTOR) the UI is wanting the FREE SPACE percentage. So, if you want to use 90-95% FULL then be sure to set this value to 5-10%. Using the syntax, you will set page fullness.
OK, I still have at least 2 more in this series and EVEN THOUGH I've been completely sucked into facebook (as Paul blogged here), I'm still trying to manage to work and save SQLskills. Paul, on the other hand, just uploaded more photos. ;-)