Related Resources
MSDN Webcast: Indexing for Performance – Proper Index Maintenance
MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices
TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005
Technical Questions
Q: In your script, what is "HA Requirements"?
HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, this is especially challenging when even maintenance operations take a table offline.
Q: If you create extra indexes is there a easy to configure utility that you can run across an application after it has run for a few months to list keys that were never or infrequently used?
Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query:
SELECT * FROM sys.system_objects
WHERE [name] LIKE 'dm[_]%'
Q: What is DMV again?
Dynamic Management View. These are new objects which give information about in-memory objects and state information.
Q: What are the parameters and their usage – for sys.dm_db_index_physical_stats?
(DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)
DatabaseID = [ NULL | 'DatabaseID' ]
NULL: returns information for ALL databases, if NULL is used no other options can be supplied. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseID') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must be used.
ObjectID = [ DEFAULT | NULL | 'ObjectID' ]
DEFAULT/NULL: return ALL base data: CL, Heap, LOB for the specified database.
ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part when executing outside of database.
IndexID = [ DEFAULT | NULL | 'IndexID' ]
DEFAULT/NULL: All indexes
IndexID: tinyint type. Refers to the ID of a specific index.
PartitionNumber = [ DEFAULT | NULL | # ]
DEFAULT/NULL/0: return ALL partitions
#: returns only the details about specific partition. When a PartitionNumber is specified then an IndexID must also be specified.
Mode = [ DEFAULT | NULL | 'SpecificMode' ]
DEFAULT/NULL/LIMITED: return FAST scan and use only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.
LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.
SAMPLED: IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, 2% sampling output returned. If not close, then up to 10% will be sampled.
DETAILED: S Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of holding a shared table level lock.
Q: How often should you run DEFRAG on your SQL server box? Should this be a part a regular schedule? Taking down SQL is their any other consideration?
First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.
Q: How often do you get such perfect tables in practice?
A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.
Q: Do you have suggestions for developers using MSDE when customer’s demands can vary? Vary from few transactions to a large customer with many transactions.
The general best practices in database and table design scale from the low end all the way up to the high-end and in the end – helps your database scale!
Q: Can you touch on rules of thumb for "pad index"?
If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.
Q: Do most of these "Index Rules" apply to Indexed-Views?
Yes! All indexes can become fragmented after data modifications... Your scripts should always look for fragmentation across all scripts.
Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key?
SQL Server 2005 offers more granular rebuild options –but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition), in its own separate table – possibly using a partition view (or an inline table valued function) over these two tables.
Q: If I'm selecting from a table with a where FirstName = ... and LastName = ... and I have 2 indexes, one on LastName and another on FirstName. Are they both used?
With an AND – maybe. The optimizer will look at the Index statistics to determine if either of them selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).
Q: URLs on the Resources slide can't be read. Could you type then into the Q&A, please?
When the session is available for download (which is what happens when MSDN put this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!
Q: How does an uniqueidentifier used as a clustered primary key effect performance?
This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.
Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation?
Extent Fragmenation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.
Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag?
More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.
Q: Defraging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequently log backups?
Yes, you’re correct – defraging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!
Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild?
Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.
Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).
Defraging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.
Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup?
No, you s