Effectively Designing a Scalable and Reliable Database A Primer to Proper SQL Server Development
Indexing Best Practices, Part 4 of 10Presented by Kimberly L. Tripp, SQLskills.com For a list of the non-technical Resources related to this webcast, please review the following blog entry:MSDN Webcast Q&A for Resources: Indexing Best Practices, Part 4 of 10 Technical Questions
Q: When a clustered index is added to a table what happens physically to the data pages? SQL Server moves the data to the new location defined by the ON clause of the index creation statement. The ON clause can specify a different filegroup or even a partition scheme and the heap will move to the new location.
Q: I'm not getting the follow-up emails with the replay link. How do I get on this email list for replay?
You *must* register through MSDN. If you are using a partner site then that is probably why you’re not getting the email. IF you are registering through MSDN and still not receiving the email, please contact email@example.com with your information.
Q: Does the copy affect the transaction log?
Yes. A create index or rebuild index statement does require the statement to be a single transaction. If you are running with your database set to the FULL recovery model then this single transaction will create a lot of log activity. You could switch to the BULK_LOGGED or SIMPLE recovery model and get less log activity but switching to the SIMPLE recovery model should be avoided as it will break the continuity of the transaction log. I would recommend two resources to get more information about Recovery Models and their impact on performance, logging and recovery.
Review the sample chapter: Chapter 9: Database Environment Basics for Recovery from the MSPress title: Microsoft SQL Server 2000 High AvailabilityQ: Can one switch back and forth between simple and full recovery mode?
Ideally, you would switch ONLY between the FULL Recovery model and BULK_LOGGED Recovery model but not with SIMPLE. The two resources listed above will also help to clarify this point.
Q: What about re-building a clustered index? Will it be that slow too? Is it recommended to do it off-hours?
Rebuilding a clustered index takes a lot of the same requirements that creating the CL index does. It can also be slow and yes – as a result, you might want to do this off hours. You can speed up the process by changing recovery models but switching recovery models can also have an impact on recovery. See the two resources above for more details to help clarify this point. And, in part 5, I’ll cover more of the fragmentation details.
Q: When is it desirable to move a table?
This is not necessarily an easy one to answer simply. But, I often recommend separating transaction processing data from read-only data as well as possibly placing a large table on its own filegroup (especially true with partitioned objects). But, I don’t often recommend separating indexes. Here’s a link to a SQL Server Magazine article I wrote – it does give some guidance. In general I don’t separate too many objects. However, if you’re building a database that you will put into a read-only state you might want to work a bit harder to “shift” objects to filegroups in such a way that each of those objects gets a nice and clean and contiguous chunk of space within the file. Another time that it’s useful to “move” an object is when you load data into a heap and then want to build the clustered index on another filegroup – where that heap will become a partition of a partitioned table. This is good when the clustered index will move the object into a filegroup that does not have other objects in it. Again, the clustered index will be nice and clean and contiguous in the destination filegroup.
Q: So, if I understand correctly… a good approach to creating a table would be to have a clustered index independent of the primary key. So, ideally you could use a GUID for the PK and then have a clustered index in order to minimize the fragmentation.
The most ideal combination is when the clustered index is on an identity column – and even better if this is the Primary Key. However, if you do want to keep a GUID as the primary key then yes, you can create a clustered index which is independent of the primary key.
Q: In general, do you think folks have more problems setting up indexes for OLTP, OLAP/DWH, or Web apps?
In general, I think Decision Support environments can have more indexes with less of a negative… OLTP is a bit harder to find the right balance. The hardest is combination environments…
Q: What was the diff between 0 and 1 on the third argument of DBCC IND?
The third parameter is for Index ID. 0 = Heap, 1 = Clustered Index, 2-250 are non-clustered index ids.
Q: Should we keep the install defaults like Collation, and others? Or do you have recommendations?
Generally speaking, yes – I would recommend keeping the defaults. However, I do recommend that some development environments – especially those who are producing software for sale. When the database can be installed/setup on other servers that may have different code pages, etc., you might want to use binary case-sensitive sort. This will make your application more consistent – even with case-sensitive servers…and you never know if one of your customers may have one! It will also help to make you a better coder.
Q: What design considerations should I take into account when planning my primary keys, indexes and clustered indexes for databases that will be used with replication?
There are no specific requirements for Replication other than the fact that you must have a Primary Key… but the same best practices discussed today apply and should be used in replication as well.
Q: I am using SQL Server 2000. I have a table that is used a lot on the web server. The scan density is 97%. Should I adjust the indexing and/or Primary Key?
I tend to think more about how fast it drops rather than the actual number. But – 97% is pretty good in general. What you might want to do is a rebuild IF you have off hours time to rebuild (in SQL Server 2000 rebuilding can be done offline ONLY). Otherwise, you might just want to setup a defrag procedure. However, this is NOT
Q: Why are so many SQL commands undocumented and how do you get to know how to use them if they are undocumented?
First and foremost, I want to strongly urge you to take your time with commands that are undocumented. If the commands are really useful for day-to-day operations, they’re probably already documented. Occasionally, development teams add additional functionality during development to test their functionality. These commands – while proving useful in some cases – do not generally go through the same quality assurance procedures that fully documented commands go through and as a result, I would be very cautious before using most undocumented commands.
Having said all of that, you’ll find a lot of useful tidbits in articles, on webcasts, in books, etc. Just make sure you test the command thoroughly and don’t expect to get support on it. In general, undocumented commands are NOT supported.
Q: What is the command for scan density on SQL Server 2000?
DBCC SHOWCONTIG. If you’re interested in learning more about defragmentation on SQL Server 2000, please review this MSDN Webcast: SQL Server 2000 Performance Tuning: Index Defragmentation Best Practices
Q: In SQL Server 2000, DBCC SHOWCONTIG, can you explain extent scan fragmentation and if we should pay attention to it?
We’re going to go over this more with our next session on Index Defrag but Extent Scan Fragmentation describes whether or not the extents that an object owns are “next” to each other. If there are gaps then this implies that there are other objects “interleaved” with this one. This is not ideal, defragmenting the object will help to improve scans and should.
Q: The size of the database doubles when a clustered index is added?
Once the index is created the space needed to build the index is released. However, while the index is being created you need roughly 2-3 times the TABLE size to build it. If you have a table which is 1GB then you should plan for at least 2-3GB of free space in order to build or rebuild a clustered index. However, it’s only the table’s size that doubles and it’s only during the index creation NOT after the index has been created.
Q: Aren't GUIDs too large to use for an efficient index?
No, an index can be useful even if it’s a large percentage of the size of the table – mostly, because of the order of the columns and the criteria used in the query. Hard to give a fast answer to this but in general 16 bytes isn’t so bad – much larger may even be acceptable.
Q: Does level 1 of a clustered index point to rows, or does it describe the minimum and maximum row on a page?
Basically, it’s a pointer to the page and it has a pointer to the lowest value on the page. Each pointer in the non-leaf always points to the “first” value on the page (meaning the lowest).
Q: Why would you be inserting using an index other than using the primary key? Assuming the primary key is appropriate, I wouldn’t. However, there are a lot of environments where the primary key is chosen and kept – for a variety of reasons. SQL Server creates the clustered index on the Primary Key by default but it is not required’ you can create the clustered index on something else. It’s the clustered index which defines the insert order – not the Primary Key; however, they are often one and the same.
Q: Should all indexes be clustered if unique? Would this increase performance on search results from SP?
I think the question should really be should all clustered indexes be unique. Generally, the answer to this is yes. But to answer the second part – would this increase performance on an SP – the answer here is not necessarily. Stored procedures have many reasons for being – or not being – optimal. Plan choice, optimization and recompilation issues all play a major factor in whether or not a stored procedure is optimal. However, in session 7, I will talk about plan caching and stored procedure optimizations!
Q: If there is no natural clustering key, what is the advantage of "coming up with one". Or, said another way, what is the basic reason why I want a cluster rather than a heap?A heap does not inherently have order to it (that’s the point) but when the records are not ordered then SQL Server needs to find a location in which to insert the row. This causes a lookup in something called a PFS (page free space) as well as a lookup in the table’s Allocation Map (actually called an Index Allocation Map in SQL Server 2000 and a Heap or B-Tree Map in SQL Server 2005). To help the insert performance you generally want to have the insert location defined. However, just having a clustered index is not the best answer as there are other internal dependencies on the log. Because of these internal dependencies you typically want the clustered index to be unique, narrow and static. If you’re interested in reading more about the reasons behind having a clustered index which is unique, narrow and static:Ever-increasing clustering key – the Clustered Index Debate……….again! MSDN Webcast Q&A: Index Defrag Best Practices – Clustering Key Choice In some cases this means adding a column to a table solely for the purposes of clustering on it. Q: Is it a good practice to just have non-clustered indexes?Well, that’s a different way of answering the prior question…and while heaps can be useful for high performance loading they are generally not as useful for day to day OLTP or combination OLTP/DSS tables. So, my recommendation is almost always – the RIGHT clustered index + additional supporting non-clustered indexes. Q: In large OLTP databases is it recommended to move reporting to a separate database which has a lot of non-clustered indexes and no clustered indexes?No… Again, it’s not really a binary thing (should you always have a clustered, or not always have a clustered). Really, there are cases for both BUT in most databases – even DSS (Decision Support System) – a clustered index helps to define a narrow and unique value that all indexes have in common. This can help other operations… So, even in a DSS environment I generally recommend a clustered, as well as additional supporting non-clustered. Having said that there are cases where a single clustered – setup for order by and range queries can be useful but that’s generally only when the large majority of queries all want * (all columns) and you typically want the exact search and order by in almost every case. Q: I've read that you shouldn't cluster a column that is an Identity column because it hurts performance for OLTP as all writes are on same page/area of disk? Your thoughts?
There are some cases where insert performance can be compromised – in very high volume OLTP systems (typically more than 400 inserts per second but this can vary) but in general, inserting into the same “hot” spot of a table improves performance because the activity is isolated and the needed pages are already in cache. Lots more on this on in other Q&As.
Q: If I add an identity column to my table to have a good clustered key, it will never be used in a query. Would I naturally also include the primary key, even though it would make the clustered key less narrow?
You don’t need to include the primary key is the clustered key is already unique, narrow, static and ever-increasing.
Q: Does this mean that vertical partitioning of a table is only helpful when the table is scanned?
No, there are still many benefits in vertical partitioning – such as with locking, scanning. I’d refer you to review the Webcast that was part 3 in this series, for more details.
Q: Can you use DTA against SQL 2000 databases?
Yes. You can use either the DTA or the ITW against SQL Server 2000. If you're looking for details about how to use the SQL Server 2000 Index Tuning Wizard you can review this whitepaper: Index Tuning Wizard for Microsoft SQL Server 2000.
Q: Is DTA in the Express version, too?
DTA does not come with the Express Edition but you can certainly do tuning on a higher version and then use that tuned database in Express.
Q: Can you explain briefly Index View?
It’s a materialized view – in the sense that the data as defined by the view – is actually materialized and stored on disk. This has both pro’s and con’s and needs to be evaluated fully before you implement these. Please check out these whitepapers for more details:
Improving Performance with SQL Server 2005 Indexed Views by Author: Eric Hanson and Contributors: Gail Erickson, Lubor Kollar, Jason WardImproving Performance with SQL Server 2000 Indexed Views by Author: Gail Erickson and Contributors: Lubor Kollar, Jason Ward contributor Q: Do you think one can become expert in SQL Server and other aspects of Design/Programming or does one need to specialize?
Wow, I think it this is a tough question; more than anything I would recommend learning all of the options available so that you can better architect the right solution. Once you think you have a good solution, then move forward to really learn that area/topic/features. I’ve been using a phrase with SQL Server 2005 a lot – Jack of all trades, master of some…
Q: Can you repeat the new function INCLUDE on index?
In SQL Server 2000 the maximum size of an index key is 900 bytes or 16 columns – whichever comes first. This key helps to keep an index’s b-tree smaller and more scalable. In SQL Serer 2005, the leaf-level of the index can include additional non-key columns. This continues to help keep the tree structure scalable but also allows you to cover more queries.
Q: I have installed SQL Server 2005 CTP, but can only find SQL Configuration Manager. I don't see the management tools that you are using. Any suggestions?
Actually, no. This seems a bit strange?! I might try uninstalling and re-installing, as well as reviewing the setup logs to see if there are errors within them. For SQL Server 2005, all installation log files are stored in C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG.
Q: if your non clustered index either includes all or part of the clustered index are the clustered index columns added again to the non clustered index?
NO! SQL Server only adds the column(s) of the clustering key – if they are missing.
Q: Did you mention the cool timebound option in DTA?
Yes, but only briefly. Vipul Shah discusses DTA in greater detail in these two webcasts.
TechNet Webcast: Performance Diagnosis in SQL Server 2005 (Level 300) by Vipul ShahTechNet Webcast: SQL Server 2005 Database Tuning Advisor (Level 300) by Vipul Shah Q: What is the fill factor and how is it used?
I think I’m going to defer you to the next session! Now, there’s some motivation. J
And here are a few comments that I’ve received! I just wanted to say thanks for your comments and thank yous… J It's your questions that keep me on my toes and current. This is still fun so keep it coming.
Comment: Thank you again, particularly for thoroughness and enthusiasm.
Comment: Let me add my voice to the chorus of "thank you's", Kim. You're presentations and verbal Q&As are great, and your online ones are a truly generous gift to the community. No other webcaster has come close to doing what you do. Many thanks from all of us.
Comment: I must compliment you on your Q&A in your blog – I've just checked it out – couldn't be more comprehensive!
So – 4 down, 6 to go! Lots more questions coming I’m sure J
If you’re interested in hearing more about fragmentation, how to view it, how to clean it up and finally, how to prevent it – here’s the registration link: