Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

New Features in Indexing and Index Maintenance Best Practices, Part 5 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 5 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050902MSDNDemoScripts.zip (8.52 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying — but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Related Resources

MSDN Webcast: Indexing for Performance – Proper Index Maintenance MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005 

Technical Questions

Q: In your script, what is "HA Requirements"? HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, this is especially challenging when even maintenance operations take a table offline.

Q: If you create extra indexes is there a easy to configure utility that you can run across an application after it has run for a few months to list keys that were never or infrequently used? Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query:SELECT * FROM sys.system_objects WHERE [name] LIKE 'dm[_]%'

Q: What is DMV again? Dynamic Management View. These are new objects which give information about in-memory objects and state information.

Q: What are the parameters and their usage – for sys.dm_db_index_physical_stats? (DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)

DatabaseID = [ NULL | 'DatabaseID' ] NULL: returns information for ALL databases, if NULL is used no other options can be supplied. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseID') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must be used.
ObjectID = [ DEFAULT | NULL | 'ObjectID' ]DEFAULT/NULL: return ALL base data: CL, Heap, LOB for the specified database.
ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part when executing outside of database.
IndexID = [ DEFAULT | NULL | 'IndexID' ]DEFAULT/NULL: All indexes
IndexID: tinyint type. Refers to the ID of a specific index.
PartitionNumber = [ DEFAULT | NULL | # ]DEFAULT/NULL/0: return ALL partitions
#: returns only the details about specific partition. When a PartitionNumber is specified then an IndexID must also be specified.
Mode = [ DEFAULT | NULL | 'SpecificMode' ]DEFAULT/NULL/LIMITED: return FAST scan and use only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.

LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.SAMPLED:  IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, 2% sampling output returned. If not close, then up to 10% will be sampled.DETAILED: S Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of holding a shared table level lock.

Q: How often should you run DEFRAG on your SQL server box? Should this be a part a regular schedule? Taking down SQL is their any other consideration? First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.

Q: How often do you get such perfect tables in practice? A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.

Q: Do you have suggestions for developers using MSDE when customer’s demands can vary? Vary from few transactions to a large customer with many transactions. The general best practices in database and table design scale from the low end all the way up to the high-end and in the end – helps your database scale!

Q: Can you touch on rules of thumb for "pad index"? If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.

Q: Do most of these "Index Rules" apply to Indexed-Views? Yes! All indexes can become fragmented after data modifications… Your scripts should always look for fragmentation across all scripts.

Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key? SQL Server 2005 offers more granular rebuild options –but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition), in its own separate table – possibly using a partition view (or an inline table valued function) over these two tables.

Q: If I'm selecting from a table with a where FirstName = … and LastName = … and I have 2 indexes, one on LastName and another on FirstName. Are they both used? With an AND – maybe. The optimizer will look at the Index statistics to determine if either of them selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).

Q: URLs on the Resources slide can't be read. Could you type then into the Q&A, please? When the session is available for download (which is what happens when MSDN put this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!

Q: How does an uniqueidentifier used as a clustered primary key effect performance? This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.

Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation? Extent Fragmenation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.

Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag? More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.

Q: Defraging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequently log backups? Yes, you’re correct – defraging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!

Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild? Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).Defraging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.

Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup? No, you still want to review average fragmentation. However, you may have more “fragments” in a table that spans filegroups; this does not necessarily mean that your table is fragmented.

Q: Are there any good third party tools for checking fragmentation and performing maintenance? Unfortunately no revolutionary ones (that I know of and/or can recommend)…but I still have high hopes :)

Q: How do you determine the appropriate fill factor? Unfortunately, there isn't a magic number… but, you can test your guestimate by seeing how fragmented the table becomes between your regularly scheduled defragmentation routines.

Q: Does it matter if I build the clustered index before/after rebuilding the non-clustered indexes? You should always create the clustered index before creating non-clustered index but as for rebuilding – you can rebuild your indexes independently as a rebuild of the clustered does not (generally) cause a rebuild of the non-clustered. There is one exception to this rule in SQL Server 2000 – in 2000 ONLY, SQL Server will automatically rebuild the non-clustered indexes when a non-unique clustered index is rebuild. Why? Because the uniqueifiers are rebuilt.

Q: Will doing a defrag followed later by a rebuild decrease the work of the rebuild? Not really. A defrag doesn’t move the object – only a rebuild does. However, you might minimize the cost of the sort…

Q: How does cache map to table pages, i.e., does free space in table pages have a 1:1 correspondence to wasted cache? SQL Server reads the 8K page from disk into memory. The number of bytes that are wasted on disk are also wasted in memory. This is often the motivation for vertical partitioning! You might refer back to session three for more details on row/page structures!

Q: If switching a varchar cluster to a bigint and vice-versa in 2000, what would the best order of drop/create index? Actually, this is the reason that CREATE with DROP_EXISTING was created… so that you could “change” the definition of the clustered

DROP TABLE test
go
CREATE TABLE test
(
      testid      int               not null,
      col1       varchar(100)      not null
)
go
CREATE CLUSTERED INDEX testind ON test(col1)
go
CREATE CLUSTERED INDEX testind ON test(testid) WITH DROP_EXISTING
go
sp_rename 'test.testind', 'NewIndexName', 'INDEX'
go
sp_helpindex test
go

Q: What about instances of one name only? (like Madonna, Cher, etc. ;^) Well, this is a good question and this is something that you might need to plan for in design. In these cases, you might allow NULLs in the lastname column and then make sure to search both when a lookup is performed. To be honest, I probably won’t do all that much to find these special first names – unless you wanted to do searches across both columns without knowing whether or not this is a first or last name. You might do something like this in a lookup

SELECT * FROM NamesTable
WHERE LastName = @variable
      OR (FirstName = @variable AND LastName IS NULL)

Comment: Just wanted to say I appreciate the blog you have put together.

Thanks for the thanks! It's a lot of work but I think it's great as a reference!! Even for me! To be honest, I can't always remember where to find things either! J

Thanks! So – we’re half way there – 5 more to go! And, lots more questions coming I’m sure J For the next session, we’re going to cover Isolation and options in Isolation in SQL Server 2005. If you’re interested in hearing more isolation, locking/blocking – here’s the registration link:MSDN Webcast: A Primer to Proper SQL Server Development (Part 6 of 10): Mixed Workloads, Secondary Databases, Wait States, Locking and Isolation See you on Friday!

kt