NOTE: These have NOT yet been completed and there are still a few questions that need to be answered. I will wrap these up shortly and also break them down into separate blog entries so they’re easier to wade through. I will also post the pdf, the webcast link (when it’s available), the demo scripts and links to all of these blog entries – on SQLskills – when everything is complete. BUT – here’s a good start with ALL of these resources as well as the majority of the Q&A. Have fun!


 


Q: Can I view a recording of this webcast?


It should be available on www.microsoft.com/usa/webcasts/ondemand shortly.


 


Q: Where can we get a print out of the slides?


I have posted a pdf of the presentation here as well as on the “Past Event” resources page on SQLskills.com here.


 


Q: Where can we get the demo scripts?


I have posted the demo scripts here (MSDNWebcast-Tripp-20040719-SCRIPTS.zip (68.37 KB)) as well as on the “Past Event” resources page on SQLskills.com here.


 


Q: Does Kimberly L. Tripp have a blog?


Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! :)


 


Q: Do you speak at other events for more advanced tuning topics?


Yes, in fact, SQLskills has a new event coming up in October: a 5-day event on performance tuning for SQL Server. We’re calling these events “Immersion Events” as they’re focused, in-depth and intense! Full details are on www.SQLskills.com.


 


Finally, here are all of the resource links I recommended in the resources section of the slides:



Q: What about using uniqueidentifier fields for keys in a clustered index, will this affect performance? They are random numbers so they are not monotonically increasing…


Ha, I always want to time how long it’s going to take for me to get this question. Seriously, I don’t think I’ve EVER given an indexing lecture (where I talked about the clustering key) where someone didn’t ask about GUIDs. And – once again – a great question! I do not really have a problem with GUIDs in general. I think they have a purpose in that they are relatively “random” numbers (they are not entirely random, as they get part of their value from SQL Server, part from Windows, part is based on time, part is ever increasing and part is random; however, the final value created does not have an ever-increasing or decreasing pattern to it). They truly make sense in distributed applications or those where you do not want a predictable row identifier; however, they are not always an appropriate choice for a key. But this leads me to another discussion – does the primary key always need to be clustered and/or always need to be an identity? The primary key, the clustering key and the identity are really three separate things and they do NOT have to be on the same column. I often prefer that all three are the same column, as an identity is naturally unique, naturally static and naturally narrow, and it creates an ever-increasing pattern that minimizes splits/fragmentation due to inserts (again, only when this is also the clustering key). So I guess the question is – does it really need to be the Primary Key? And well, that’s debatable, but when all three criteria come together that often gives you the largest gains. If the Primary Key is not an identity (and it’s a very large transaction processing table with a lot of inserts/updates/deletes) then I would consider adding a surrogate column – solely for clustering – and making the Primary Key non-clustered. The non-clustered index will certainly become fragmented (if it is on a GUID), but as long as proper maintenance practices are in place you can reduce the excess disk space and improve modification performance.
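To make the “surrogate column solely for clustering” idea concrete, here is a minimal sketch. The table, column, constraint and index names are all hypothetical (they are not from the webcast demo scripts). The only point is that the GUID stays the Primary Key, enforced by a nonclustered index, while a narrow identity column carries the clustered index.

```sql
-- Hypothetical table: the GUID is the primary key, the identity is the clustering key.
CREATE TABLE dbo.Orders
(
    OrderGUID   uniqueidentifier NOT NULL
        CONSTRAINT DF_Orders_OrderGUID DEFAULT NEWID(),
    OrderRowID  int IDENTITY(1, 1) NOT NULL,   -- surrogate column, solely for clustering
    OrderDate   datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderGUID)
)
GO

-- Unique, narrow, static and ever-increasing clustering key.
CREATE UNIQUE CLUSTERED INDEX CL_Orders_OrderRowID
ON dbo.Orders (OrderRowID)
GO
```

With this layout new rows are appended at the end of the clustered index, so the inserts themselves should not split the table. The nonclustered index on the GUID still takes inserts in random positions, which is why it is the one that needs regular maintenance.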


 


Q: What about UPDATEs that change the row’s clustering columns?


This question was related to “what can cause fragmentation” where I was discussing UPDATEs to variable width columns. While updates to variable width columns can cause a row to widen (and therefore cause a split) it’s a good point to mention that volatility in the clustering key is far worse. In fact, in last month’s webcast I spent quite a bit of time discussing proper placement of the clustering key – to minimize fragmentation and, in general, improve performance. There are three criteria I look for in a clustering key: the key should be Unique, Narrow and Static. As a final criterion – and this helps to proactively minimize fragmentation – I look for the key to have an ever-increasing pattern.


 


Q: Does clustering rely on the operating system for performance?


Well, I have to admit I am not entirely sure of the question here… But – I think I understand from where it might originate (well, I’ll try). First – and unfortunately (and also somewhat seriously), I think many people lack creativity in choosing a term for something – at least in the computer industry! :) Here, I think the confusion is between “clustered index” and the OS term/feature “clustering.” The term clustered index stems from having data “clustered” in a certain order or grouping. In previous releases, we tended to create the clustered index for range queries (like on LastName) and therefore people with the same LastName were “clustered” together. However, because the internals of SQL Server changed [significantly] between 6.5 and 7.0, the internals really dictated a change in where the clustered index should be created. So, even though the term didn’t change, it doesn’t entirely apply. Now the OS term/feature “clustering” applies to high availability. Specifically, failover clustering refers to a feature of the Enterprise Edition of SQL Server – which requires a higher level of Windows (typically an “Advanced”, “DataCenter”, or “Enterprise” version) to support it. This really is not the primary focus for this webcast but below are a couple of references if you want to understand more about High Availability and Failover Clustering in SQL Server:



 


Q: How well are Microsoft’s standard products that use SQL optimized? Are there published studies on optimizing Microsoft CRM, Sharepoint Services, Project Server, and the various Microsoft Business Products?


This one I kind of answered during the webcast…First, I will start by mentioning that I’m NOT an expert on any of these products specifically. However, you are correct in that they are based on SQL Server and use SQL Server as their data store. What that means is that they too must be optimized and maintained; however, it is impossible for anyone to predict perfectly the exact usage of each of these products and as a result they might have a good selection of indexes but they are unlikely to have maintenance operations defined. However – again, I cannot say that I am an expert and I could be wrong! What I will say, however, is that regardless of what is defined and what is maintained – you can create additional indexes, possibly drop indexes (more on this in a moment) and maintain indexes with NO application changes. The reason why I say “possibly” drop indexes is because queries/procedures, etc. will FAIL if index hints are used and the index does not exist at runtime. And while I’m fairly certain that NO Microsoft product hardcodes index hints, I have seen other third party products which do and unfortunately they become VERY hard to optimize and manage. In general, it is harder to drop indexes than add indexes. Nevertheless, it is always possible to defrag/rebuild indexes – regardless of the application, whether Microsoft or not. So, while I can’t speak to each of these applications directly (although my partner works on Microsoft Project and he’s pretty sharp! :)), you can always improve upon an application with better/more useful indexes (meaning more specific indexes for YOUR usage patterns and requirements) and proper/consistent/automated index maintenance.


 


Q: Is there any internal command to output query plan used by optimizer (not showplan or external tools available), which may show splits with DML operations?


Well, I’ll start by saying that these are really two different parts of your query processing. First, splits are NOT part of the optimization plan as they are only known at runtime. So, a plan really wouldn’t ever show them. As a result, I don’t believe an internal command exists to show this; however, I’m a bit confused as to why it must be “internal” and not something else? Are you just trying to determine which commands and in which tables you are having a lot of splits? (And IMO, this is the most important.) If that’s the case then you can start by using PerfMon to see if you have splits (and get a rough idea of when) and then use DBCC SHOWCONTIG to see which tables are fragmented (meaning they had the splits). PerfMon will only tell you that you have them – not in which table they are occurring (but I will say that you might be very pleased with a few new features of SQL Server 2005 – look forward to playing with beta II Profiler and PerfMon). Right now, check out PerfMon SQL Server: Access Methods, Page Splits/sec.


 


Q: How do we use the I/O affinity option in SQL Server 2000?


Actually, I’ll let the KB take care of this one. Check out KB Article Q298402 titled: INF: Understanding How to Set the SQL Server I/O Affinity Option.


 


Q: Can we use MAXDOP with Index Creation?


No, MAXDOP as a query hint only works on DML statements. However, if you want to limit all operations to a reduced number of CPUs you can set the server setting Max Degree of Parallelism. See the sp_configure option in the Books Online as well as KB Article 329204 titled: INF: General Guidelines to Use to Configure the MAXDOP Option.
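For reference, changing the server-wide setting looks roughly like the following. Treat it as a sketch: the value 2 is only an illustration and not a recommendation, and because this is an advanced option you first have to turn on “show advanced options.”

```sql
-- 'max degree of parallelism' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO

-- Illustrative value only: limit parallel plans to 2 CPUs server-wide.
EXEC sp_configure 'max degree of parallelism', 2
RECONFIGURE
GO
```

Keep in mind that this affects every parallel operation on the server, not just index builds, so you may want to set it back once the index work is done.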


 


Q: Is there another way to determine fragmentation? I don’t want to use DBCC SHOWCONTIG as it locks objects.


Well, this is a good question and unfortunately there really isn’t a way to generate information such as Scan Density or Average Page Density without running DBCC SHOWCONTIG. However, if you decide that you’re only interested in Scan Density then you will be interested in the WITH FAST option for DBCC SHOWCONTIG. Using the WITH FAST option the command will only calculate the out of order extents (i.e. extent switches) and not calculate average row length, maximum row length, minimum row length, average page density, etc. With this option SQL Server will execute the command a lot faster, thereby releasing the locks faster. However, locks will still be acquired. As an odd response though – I’d have to say that you could just defrag without looking… Meaning instead of even checking for fragmentation you could just execute DBCC INDEXDEFRAG. Since DBCC INDEXDEFRAG doesn’t use long running locks (it runs as “mini transactions” as it moves through the leaf level of the index) and doesn’t do anything when there’s no fragmentation (it really only moves data around when fragmentation exists), you could do this without checking – and without creating blocking.
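Here is a rough sketch of both approaches. TestTable and Test are placeholder table and index names, and the 0 passed to DBCC INDEXDEFRAG simply means “the current database.”

```sql
-- Option 1: the quicker check. WITH FAST produces minimal output
-- (scan density, extent switches); locks are still taken, just for less time.
DBCC SHOWCONTIG (TestTable, Test) WITH FAST
GO

-- Option 2: skip the check and just defrag the index.
-- 0 = current database; TestTable and Test are placeholder names.
DBCC INDEXDEFRAG (0, TestTable, Test)
GO
```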


 


Q: How does the fill factor influence index structure?


FILLFACTOR – which is an option which ONLY takes effect when an index is initially built or later when an index is rebuilt or defragmented – defines the level of fullness for the leaf level of an index. Fillfactor ONLY applies to the leaf level and when set appropriately for the data and the volatility patterns of the data, it should help to minimize fragmentation, thereby reducing the need for free space to be left at higher levels in the tree. However, if you also want to pad the upper levels of the tree with the same level of free space as the leaf level, you can add PAD_INDEX to the index statement and it will use the fillfactor setting for all levels of the index. For example:


 


CREATE INDEX Test


ON TestTable (TestColumn)


WITH FILLFACTOR = 80


 


Fills the leaf pages 80 percent full; the non-leaf pages are balanced such that at LEAST one entry is open on each non-leaf page.


 


CREATE INDEX Test


ON TestTable (TestColumn)


WITH FILLFACTOR = 80, PAD_INDEX


 


Fills BOTH the leaf and non-leaf pages 80 percent full.


 


Typically, PAD_INDEX is not needed unless you have extremely unbalanced activity and are unable to rebalance the tree (meaning you need to achieve high availability). However, I would say that PAD_INDEX is not perfect as implemented as I would rather set that value differently from fillfactor…


 


So, not a big fan of PAD_INDEX but feel it’s very important to set an APPROPRIATE fillfactor. (Review the webcast for more details on what I mean by “appropriate” fillfactor.)


 


Q: What would be the ideal settings [of fillfactor] for OLAP/DSS Databases?


To be honest, 100% is an ideal setting for a read-only decision support database, as new data is not causing splits. However, data loads (which build these reporting databases) can cause fragmentation. Ideally, you would drop your indexes, load your data and then rebuild your indexes upon completion. If you’re trying to reduce the amount of time the process takes AND you’re not loading in a lot of data (compared to what’s already there) then you might not want to drop and recreate indexes… Setting fillfactor in these cases will be based on how much data you will be loading compared with how much data you have. Set fillfactor to the percentage of the final row count that is already there (if you’re loading 100,000 rows into a table which already has 900,000 rows then a fillfactor of 90 should have just enough room – IF the data is evenly distributed). If the data is NOT evenly distributed then you might want to drop this number a tad lower. AND – you only want to set this on indexes that are going to have splits. Remember, if your clustering key is ever increasing and the new data coming in is “appended” to the end of the table – both logically and by clustering key definition – then you only need the fillfactor setting on the nonclustered indexes. Regardless, it is still likely that you will end up with some fragmentation. Dropping the indexes, loading the data, and then rebuilding the indexes would certainly be better – but it will take more time. IF you’re loading a lot of data relative to what you already have then there’s NO question: drop the indexes first, load the data and then rebuild. When dropping indexes ALWAYS make sure to drop NC Indexes first and then drop the clustered index. When creating indexes ALWAYS make sure to create the clustered index first and then add the nonclustered. For more details on High Performance Data Loading check out the presentations on Gert Drapers’ site: www.SQLDev.Net, under Events. (FYI – There are all sorts of great presentations there!)
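The drop/load/rebuild ordering described above might be sketched like this. Everything named here is hypothetical (the SalesFact table, its columns, the index names and the file path), and BULK INSERT is just one possible way to do the load.

```sql
-- 1. Drop the nonclustered indexes FIRST, then the clustered index.
DROP INDEX SalesFact.NC_SalesFact_CustomerID
DROP INDEX SalesFact.CL_SalesFact_SaleID
GO

-- 2. Load the data (hypothetical file path).
BULK INSERT SalesFact
FROM 'C:\Loads\SalesFact.dat'
WITH (TABLOCK)
GO

-- 3. Create the clustered index FIRST, then the nonclustered indexes.
--    The database is read-only between loads, so the pages are filled completely.
CREATE UNIQUE CLUSTERED INDEX CL_SalesFact_SaleID
ON SalesFact (SaleID)
WITH FILLFACTOR = 100

CREATE INDEX NC_SalesFact_CustomerID
ON SalesFact (CustomerID)
WITH FILLFACTOR = 100
GO
```

The ordering matters because the nonclustered indexes depend on the clustering key: dropping or creating the clustered index while nonclustered indexes exist forces them to be rebuilt as well.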


 


Q: When trying to determine the fill factor to set, would you recommend setting the same fill factor to all tables or just adjust the fill factor on the large tables and keep the small ones at the default?


I know I got this one


 


Q: Will Kimberly show the rebuild or defrag statement she used? I had to run to the restroom and missed that brief part.


I would have answered your question but I had to use the restroom myself… Seriously though, :) the commands are:


            DROP and re-CREATE


            DBCC DBREINDEX (was added in 6.5)


            CREATE with DROP_EXISTING (was added in 7.0)


            DBCC INDEXDEFRAG (was added in 2000)


If I’m going to do a rebuild – I typically use DBCC DBREINDEX as it’s easier to automate and has easier syntax in general. If your availability requirements don’t allow the table/indexes to be rebuilt because of locking, then you only have one choice for defragging – DBCC INDEXDEFRAG. I have a feeling that I cleared this up once you returned and certainly, you can also review the webcast on demand once it’s available!
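As a quick illustration of the two rebuild options, here they are against placeholder names (TestTable, TestColumn and the index Test); the fillfactor of 80 is only an example value.

```sql
-- Rebuild a single index with a fillfactor of 80.
DBCC DBREINDEX ('TestTable', Test, 80)
GO

-- Rebuild ALL indexes on the table; a fillfactor of 0 reuses
-- the fillfactor each index was originally created with.
DBCC DBREINDEX ('TestTable', '', 0)
GO

-- Rebuild by re-creating the index in place with DROP_EXISTING.
CREATE INDEX Test
ON TestTable (TestColumn)
WITH FILLFACTOR = 80, DROP_EXISTING
GO
```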


Q: Regarding sp_RebuildIndexes, does the SP need to be created in the Master Database? Our company’s Best Practices recommend against it. We use an “Admin User Database” for most shared stored procedures.


Sure, you can create the sp in any database (just make sure to modify the script) and then make sure to explicitly qualify the execution with databasename.owner.procedurename and everything should work!
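For example, assuming the procedure was created in a database named AdminDB and is owned by dbo (both names are hypothetical, and any parameters the procedure takes are left out here), the call would look something like:

```sql
-- Hypothetical database name; fully qualify as databasename.owner.procedurename.
EXEC AdminDB.dbo.sp_RebuildIndexes
GO
```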


 


Q: What type of Defrag do the data maintenance plans use?


They use DBCC DBREINDEX.