SSMU Web Seminar: Indexing
for Performance
Part IV: Index Maintenance
Q & A
You guys had such
great questions today...and so many of them! There are a few issues that seem
to stem from some of the previous webseminars so I'll
start by saying that a few of you might greatly benefit from watching the
previous parts I through III. I've added the links to the past presentations
here but I have also addressed all of your questions here as well!
Indexing
for Performance, Part I: Index Internals
http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=18
Indexing
for Performance, Part II: Indexing for SARGs and
Joins
http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=19
Indexing
for Performance, Part III: Indexing for Aggregations and Indexed Views
http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=20
Indexing
for Performance, Part IV: Index Maintenance
http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=21
Q: I am used to seeing the clustered index
on the PK columns. But in many cases the
PK is not ever-increasing. How important
is it to have a clustered index on a table?
There’s a lot to
answer in this one! Generally speaking a clustered index can help improve performance
and help keep a table compact – if chosen appropriately. When you don’t have a
clustered index data is not ordered and insert performance is compromised
because heaps (a table without a clustered index) tend to be more optimized for
space efficiency instead of performance. In order to optimize for space, SQL
Serve needs to locate gaps for inserts – the space into which the new row will
be inserted. The lookup for this free space can be expensive. SQL Server uses a
structure called a PFS to find the free space (PFS = Page Free Space) but these
lookups add time. Having a clustered index dictates the location for the row
and no lookup is necessary. In general – and when chosen appropriates – a table
performs better with a clustered index. Having said that, there are a few
things I tend to look for in the clustered index and with each of these is the
reason. The most important are that the clustered index is:
Narrow – narrow because all non-clustered indexes
depend on the clustered index and actually include the clustering key in their
index. The wider the clustering key the wider (and probably
unnecessarily) the non-clustered indexes.
Static – static for many of the same reasons as
above. Since the non-clustered indexes depend on the clustering key a change to
a value which is overly duplicated will be very expensive.
Unique – Again, because non-clustered indexes
depend on the clustered index they use the clustering key to reference rows.
Therefore, each row must be uniquely identifiable. If the clustered index is
not already unique SQL Server will have to “uniqify”
the value by adding a 4-byte integer to the duplicate values.
All of these things add time and overhead and should be avoided.
As for an
ever-increasing – key… there are a couple of reasons why I like the identity
value. First, it naturally meets all of the above criteria. Second it minimizes
page splits which in turn keep the table more compact and less fragmented (ah –
a good lead in to the rest of Part IV). Finally, it creates a hot spot – while
that’s normally not desired, it can be beneficial as the pages into which you
are inserting – are already in cache. This effectively minimizes cache
requirements.
Q: How do deletes affect page splits?
This is a good
question – especially as I didn’t directly address this in the slide (good
catch). A delete creates open space – which can actually be beneficial to a
point. If other rows are modified and the page already has space to grow then a
delete can effectively make room for later rows to expand without causing
splits. However, if that space is not reused by another row modification then
that empty space creates a problem known as internal fragmentation. With all
types of fragmentation you must rebuild or defrag in order to reclaim that
space. Note one exception: if all rows on the page are deleted SQL Server will
mark that page as reserved but unused, if all pages within an extent are unused
then that extent is freed for other tables to reserve/used.
Q: But if primary key is a non-clustered index
and you do your queries on the primary key how does clustered-index on say
identity col help?
Typically requests to
a primary key are highly selective. For highly selective queries both indexes
are highly affective at answering the request with very few reads. Having the
identity column helps keep the table more compact, less fragmented, requires
less rebuilds, has less splits and large scans stay more efficient.
Q: Would you recommend introducing a
surrogate key identity column to an intersection table satisfying a many to
many relationship between two table both of which have
identity columns?
Well, there are really
two questions here.
First,
would I recommend a surrogate key if your tables don’t have an identity-like
column on which you can cluster? Probably!
And
do I recommend intermediate tables to define a many-to-many
relationships? Yes, usually!
Q: Does running DBCC ShowContig
use a lot of resources or cause locking?
Yes, it can. First, it
runs a full table scan. Second, some calculations are more expensive than
others. You might want to check out the WITH FAST option for SHOWCONTIG. While
not all values are produced, you might get enough with fewer locks. Regardless,
you’re still going to get a shared table level lock during showcontig.
Q: What is the default fill factor?
If FILLFACTOR is not
specified, SQL Server will use the default fillfactor.
The default fillfactor is configurable using sp_configure and is set to 0 by default. 0 means that SQL
Server will fill the leaf pages to 100% and the non-leaf pages such that there
is at least 1 space available.
Q: Do foreign keys have be dropped first
when using Create with drop existing?
Not if the key
value/index definition is not changing. If you use CREATE with DROP_EXISTING to
change the clustering key, then yes as the references will no longer make
sense.
Q: If using Database Maintenance Plans,
what mechanism does SQL Server use for database reindexing? Drop and recreate or dbcc
reindex?
DBCC
DBREINDEX.
Q: When I create a database I can allocate
all space up front (making it more physically contiguous) or allocate a small
amount of space and let it grow automatically.
But with auto grow I will get fragmentation of pages wont I? And if so how do indexes help with this? Or should you also pre-allocate your database
space?
In general, it’s
always best to pre-allocate space to databases. This minimizes FILE
fragmentation. Index fragmentation (either internal or external index
fragmentation) can occur regardless of whether or not the file is fragmented.
But – it’s good to minimize a lot of autogrowths.
Q: Can you use the dbcc
commands across linked servers?
Not really. But you
might be able to create a procedure which runs these DBCC commands and then
execute the procedure through 4 part naming.
Resources:
Whitepaper: Microsoft
SQL Server 2000 Index Defragmentation Best Practices http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/maintain/Optimize/SS2KIDBP.asp
Search SQL Server Magazine’s website http://www.sqlmag.com
Thanks for the great
questions everyone!
See you in the next online
series
or at a conference
or on a newsgroup!
Thanks,
Kimberly
President, SYSolutions, Inc. www.SQLSkills.com
Principal Mentor, Solid Quality Learning www.SolidQualityLearning.com