SSMU Web Seminar: Indexing for Performance
Part IV: Index Maintenance

Kimberly L. Tripp

 

Q & A

 

You guys had such great questions today...and so many of them! There are a few issues that seem to stem from some of the previous webseminars so I'll start by saying that a few of you might greatly benefit from watching the previous parts I through III. I've added the links to the past presentations here but I have also addressed all of your questions here as well!

 

Indexing for Performance, Part I: Index Internals

        http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=18

Indexing for Performance, Part II: Indexing for SARGs and Joins

        http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=19

Indexing for Performance, Part III: Indexing for Aggregations and Indexed Views

        http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=20

Indexing for Performance, Part IV: Index Maintenance

        http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=21

 

Q: I am used to seeing the clustered index on the PK columns.  But in many cases the PK is not ever-increasing.  How important is it to have a clustered index on a table?

There’s a lot to answer in this one! Generally speaking a clustered index can help improve performance and help keep a table compact – if chosen appropriately. When you don’t have a clustered index data is not ordered and insert performance is compromised because heaps (a table without a clustered index) tend to be more optimized for space efficiency instead of performance. In order to optimize for space, SQL Serve needs to locate gaps for inserts – the space into which the new row will be inserted. The lookup for this free space can be expensive. SQL Server uses a structure called a PFS to find the free space (PFS = Page Free Space) but these lookups add time. Having a clustered index dictates the location for the row and no lookup is necessary. In general – and when chosen appropriates – a table performs better with a clustered index. Having said that, there are a few things I tend to look for in the clustered index and with each of these is the reason. The most important are that the clustered index is:

Narrow – narrow because all non-clustered indexes depend on the clustered index and actually include the clustering key in their index. The wider the clustering key the wider (and probably unnecessarily) the non-clustered indexes.

Static – static for many of the same reasons as above. Since the non-clustered indexes depend on the clustering key a change to a value which is overly duplicated will be very expensive.

Unique – Again, because non-clustered indexes depend on the clustered index they use the clustering key to reference rows. Therefore, each row must be uniquely identifiable. If the clustered index is not already unique SQL Server will have to “uniqify” the value by adding a 4-byte integer to the duplicate values.

All of these things add time and overhead and should be avoided.

As for an ever-increasing – key… there are a couple of reasons why I like the identity value. First, it naturally meets all of the above criteria. Second it minimizes page splits which in turn keep the table more compact and less fragmented (ah – a good lead in to the rest of Part IV). Finally, it creates a hot spot – while that’s normally not desired, it can be beneficial as the pages into which you are inserting – are already in cache. This effectively minimizes cache requirements.

 

Q: How do deletes affect page splits?

This is a good question – especially as I didn’t directly address this in the slide (good catch). A delete creates open space – which can actually be beneficial to a point. If other rows are modified and the page already has space to grow then a delete can effectively make room for later rows to expand without causing splits. However, if that space is not reused by another row modification then that empty space creates a problem known as internal fragmentation. With all types of fragmentation you must rebuild or defrag in order to reclaim that space. Note one exception: if all rows on the page are deleted SQL Server will mark that page as reserved but unused, if all pages within an extent are unused then that extent is freed for other tables to reserve/used.

 

Q: But if primary key is a non-clustered index and you do your queries on the primary key how does clustered-index on say identity col help?

Typically requests to a primary key are highly selective. For highly selective queries both indexes are highly affective at answering the request with very few reads. Having the identity column helps keep the table more compact, less fragmented, requires less rebuilds, has less splits and large scans stay more efficient.

 

Q: Would you recommend introducing a surrogate key identity column to an intersection table satisfying a many to many relationship between two table both of which have identity columns?

Well, there are really two questions here.

First, would I recommend a surrogate key if your tables don’t have an identity-like column on which you can cluster? Probably!

And do I recommend intermediate tables to define a many-to-many relationships? Yes, usually!

 

Q: Does running DBCC ShowContig use a lot of resources or cause locking?

Yes, it can. First, it runs a full table scan. Second, some calculations are more expensive than others. You might want to check out the WITH FAST option for SHOWCONTIG. While not all values are produced, you might get enough with fewer locks. Regardless, you’re still going to get a shared table level lock during showcontig.

 

Q: What is the default fill factor?

If FILLFACTOR is not specified, SQL Server will use the default fillfactor. The default fillfactor is configurable using sp_configure and is set to 0 by default. 0 means that SQL Server will fill the leaf pages to 100% and the non-leaf pages such that there is at least 1 space available.

 

Q: Do foreign keys have be dropped first when using Create with drop existing?

Not if the key value/index definition is not changing. If you use CREATE with DROP_EXISTING to change the clustering key, then yes as the references will no longer make sense.

 

Q: If using Database Maintenance Plans, what mechanism does SQL Server use for database reindexing?  Drop and recreate or dbcc reindex?

DBCC DBREINDEX.

 

Q: When I create a database I can allocate all space up front (making it more physically contiguous) or allocate a small amount of space and let it grow automatically.  But with auto grow I will get fragmentation of pages wont I?  And if so how do indexes help with this?  Or should you also pre-allocate your database space?

In general, it’s always best to pre-allocate space to databases. This minimizes FILE fragmentation. Index fragmentation (either internal or external index fragmentation) can occur regardless of whether or not the file is fragmented. But – it’s good to minimize a lot of autogrowths.

 

Q: Can you use the dbcc commands across linked servers?

Not really. But you might be able to create a procedure which runs these DBCC commands and then execute the procedure through 4 part naming.

 

Resources:

Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/maintain/Optimize/SS2KIDBP.asp

 

Search SQL Server Magazine’s website http://www.sqlmag.com

 

Thanks for the great questions everyone!

 

See you in the next online series

or at a conference

or on a newsgroup!

 

Thanks,

Kimberly

 

Kimberly L. Tripp

President, SYSolutions, Inc. www.SQLSkills.com

Principal Mentor, Solid Quality Learning www.SolidQualityLearning.com