OK, so, I don't blog very often. I don't know what it is... I think it's that I feel like I always need to blog huge posts and the thought of writing my huge post makes me not want to blog... so, sometimes my time-between-blogging (TBB :) is long. I'm going to turn over a new leaf (no, really... I'm *really* going to try this time!!) and I'm going to try and write smaller posts and more of them. And, yes, don't worry, I'll still write the long in-depth ones but I'll at least try to give you a few tidbits of things that I encounter - more frequently. And, that's partially why I'm posting this entry...

I've been working quite a bit with SQL Server 2008 and yesterday I went to test of a few things with database mail. Normally this gets setup quite early and gets setup with a lot of other things so restarts of the server/agent happen because of saving/tweaking my test VPC. I don't usually go in and setup Database Mail and then immediately try to send mail. Yesterday, I did... and, guess what, it didn't work. I got error [264] An attempt was made to send an email when no email session has been established. So, I felt like I had seen that before... and, with a bit of web searching I landed on Gops Dwarak's blog for a known issue of SQL Server 2005: http://blogs.msdn.com/gopsdwarak/archive/2006/04/25/583434.aspx. And, yes, restarting the SQL Server Agent solved the problem. I'm surprised that this is still a bug in 2008 but it's not entirely the worst I could come upon. However, having said that, I also thought that there was a general initiative for software to stop requiring restarts of services and/or the OS. And, yes, it's *just* an Agent restart so it won't directly impact your server's availability but, I'm still surprised that it wasn't resolved. Has anyone else found a bug (or even a documented "issue") that requires a service to be restarted - and you think it shouldn't?

And, well, that's definitely not the most interesting thing I've learned/found with SQL Server 2008, there's lots more (yes, I know - I need to blog a lot of it :). The most fun I've had has probably been learning sparse columns and filtered indexes. For right now, I want to give you some quick key points about some of the changes around indexes and in particular around sparse columns/filtered indexes:

  1. Do NOT create non-clustered indexes on sparse columns without filters (a filter that says WHERE sparsecolumn IS NOT NULL) because indexes do NOT have sparse columns in their definition. So, if you don't use a filter you will end up storing all of the NULL values in the index - which will waste a tremendous amount of space.
  2. A non-clustered index on a sparse column (without a filter) will be the same size as a nonclustered index on a non-sparse column (which is essentially the same point as above but, this might help clarify it a bit).
  3. Do NOT think that the increase in total indexes (from 250 in 2005 to 1000 in 2008) is because you should have more indexes... it's NOT. It's specifically because you might have a lot more columns (these changed as well from 1024 in 2005 to 30000 in 2008) because even that increase should not be used unless you're using sparse columns... So, you STILL want to use a lot of the best practices we've recommended in past webcasts, whitepapers, etc. and you still want to care about row size (and page density) and therefore work to create narrower tables (in general and relatively speaking - depending mostly on usage patterns). However, if you have a need for *lots* of wider tables BECAUSE you have an interesting set of properties that only some rows will have (i.e. the main reason to use sparse columns) then using sparse columns to handle these columns that will largely be filled with NULLs is a good thing. It's a good thing because a row that has a NULL for a sparse column will take ZERO BYTES. Absolutely no space is used for a sparse column that is NULL. So, this allows your tables to be wide (in definition) but your rows to be narrow (in practice). And, with well defined indexes you can VERY efficiently and effectively search on these properties.

OK, I hope to get some examples posted as well. Between Paul and I and our upcoming events - where we're demo'ing/discussing a lot of these principles, we'll plan to post a demo or two on how effective these really can be. And, I know... some of you will fight back with the thought that sparse columns introduce bad database design practices... I know, you want (instead) tables that have name/value pairs - which are the way most of us did this in all versions prior to 2008. However, name/value pairs tables become fragmented messes that also have fragmented indexes and therfore overall poor performance (for both inserts and queries). Not to mention, they can be difficult to query/join with (because you need to join multiple times to retrieve multiple properties) and the code gets messy quickly. Anyway, sparse columns - while they may not seem quite right at first - can really be a *MUCH BETTER* way to design (and perform!!!) around this problem.

And, speaking of events. We have a bunch of upcoming events... I'll give you the short bulleted list here because I'm sure many of you also read Paul's blog and he's mentioned quite a few of these coming up. Also, if you're interested in learning more details for these, the full abstract/links for many of these can be found on our Upcoming Events page. However, specific links are also listed below!

  • Best Practices in Performance and Availability for SQL Server 2005/2008, 1-3 September 2008 in Hatfield, England. You can get more details/register here
  • Dublin SQL Server User Group, Index Internals and Fragmentation, 4 September 2008, Dublin, Ireland. Bob Duffy blogged about this here and you can get more details/register here.
  • Microsoft SQL Academy 2008 - Session 1, 5 September 2008, Dublin, Ireland. You can get more details/register here.
  • Indexing for Performance in SQL Server 2000/2005/2008, 8-9 September 2008, Edinburgh, Scotland. You can get more details/register here.
  • SQL Connections "Power Workshops Series" Microsoft SQL Server 2008 Overview for Database Professionals (Hands-on — Bring Your Own Laptop), 6-8 October 2008, San Francisco, CA, USA. You can get more details/register here.
  • Microsoft Tech Ed EMEA ITPro, 3-7 November 2008, Barcelona, Spain. You can get more details/register here.
  • SQL Connections Fall Conference, 9-14 November 2008, Las Vegas, USA. You can get more details/register here.
  • PASS Community Summit 2008, 17-21 November 2008, Seattle, WA, USA. You can get more details/register here.

And, the last thing I'll leave you with is a recipe for the best darn Chocolate Chip Oatmeal Cookies *ever*. OK, I did a web search for exactly that (ok, it all started when Paul (who really doesn't really like sweets) decided that for his birthday he wanted Choc Chip Oatmeal Cookies (we asked becuase the girls and I wanted to bake something)). The search led me to here. And, they're definitely right that these are the best cookies ever! And, they make a few really good points:

  1. Don't microwave the butter... set it out for an hour before you're going to make the cookies. It's definitely different/better when the butter is naturally soft.
  2. Definitely take the cookies out when they look like they have a few more mins to go...

But, Paul's birthday was in July... and, it was not the only time we've made these cookies... mostly because *everyone* we introduce to these cookies is addicted - yes, the girls, me, Paul, my Mother, Brian Randell (who was forced to take some "togo" after a BBQ here) and countless of our other friends since this is now our "when-we-entertain" dessert ;-). So, since I've had the opportunity to tweak the recipe myself, here's what I do differently...

  1. I add less chocolate - usually 1-1.5 cups of chocolate chips. Sometimes I mix half/half semi-sweet and milk chocolate but I always use closer to 1 cup...
  2. I add about 1 cup of coconut - right at the end, with the chocolate chips...
  3. I make a single 12x18 cookie sheet "uni-cookie" and then I cut it up like brownies. As for the cook time, I've found that this giant cookie takes right about 14 minutes (but, you'll have to experiment with your oven... I usually take the giant cookie out when it's puffed up and the edges are just starting to go lightly brown). Oh, and you don't have to spread this out perfectly to the edges - it will expand and fill in. Just get somewhat close. The nice thing about this form factor is that it's a lot faster than making 36 individual cookies and then you get to choose the size(s) when you cut up the uni-cookie.

Oh, and don't blame me when you make 10 batches of these things and get all of your friends addicted as well. They really are yummy!!

Next up... installing Data Dude CTP16 with SQL 2008 and Visual Studio... what works with what and what's the best install order!!!

Thanks for reading!
Kimberly

I know that Paul and I recommended that you subscribe to Conor's blog... but have you? He's posted some great details on Partitioning (Part 1 and Part 2) as well as statistics and it always reminds me of how much I can learn from other people's perspectives!

And, just to dove-tail on some of his statistics comments... I, too, have found that as tables get significantly larger AND have non-standard distributions of more than 200 distinct values (and un-even distribution between those values as well), that the optimizer just cannot possibly do a perfect job. The only way an optimizer can be good is when it can "find a good plan fast" (which I first heard from Nigel Ellis (former Development Manager of the Query Processor team) - back when he delivered a Pacific Northwest SQL Server User Group meeting many moons ago). The most important thing to realize is that it's just not possible to waste time to find the absolutely best plan... mathematically analyzing all permutations would be prohibitive - you'd have to take a vacation between query executions (wait, that's not a bad idea... I digress :).

The point:

  1. Make sure that statistics are up-to-date (either through the database option: auto update stats OR by manually updating statistics)
  2. Consider re-evaluating statistics over large tables (and, when poor performance occurs - look at the estimated rows v. the actual rows - if the estimate/actual are off by a fact of 10, then it could be the statistics). I'd try updating stats first and then if that doesn't work, updating with a fullscan. If neither of those work, I'd also re-evaluate other possible indexes (there are some distributions between tables being joined that just can't show a correct correlation between the values when in multiple indexes... sometimes the best index is a multi-column (ie. composite index)). 
  3. Consider breaking very large tables down into smaller chunks (not just table index partitioning but possibly Partitioned Tables AND Partitioned Views) as this can give the optimizer additional details about partiticularly interesting data sets. Even in SQL Server 2008, statistics are still table-level (filtered indexes can provide some, but not complete, relief... I'll give more details in a later post) but I'd often argue that some of the best table designs are not just for a single table. Consider the statistical, locking, and indexing implications for mixed workloads against a single table (and the tremendous amount of blocking that could occur in addition to varying access patterns). And, even while 2008 will offer Partition-level lock escalation, well-designed tables may not need it! I know I've mentioned this before but different perspectives on statistics, optimizers and the fact that a good optimizer has to be efficient in-and-of itself, remind me of some of the most basic things that are also the most common problems contributing to poor query performance.

Returning to the basics and optimizing a system from the ground up always leads to better scalability!

Enjoy!
kt

 

Been thinking a lot about something that was mentioned in a few of my most recent posts... Especially when I get comments like "that's another item to add to our checklist" or "that's a good trick to add to our arsenal" and well, I thought in this blog entry I'd ask for your tricks that fall under the umbrella of designing for performance.

For example - do you change collations? I had a recommendation here.
For example - do you have a view that you want ordered? I had a recommendation (with caution) here. But - Adam Mechanic came back and said that he's used that trick to improve performance... and, I'm sure that's the case as well!
For example - do you have stored procedure parameters that are giving you grief? I had a series of recommendations in my Optimizing Procedural Code category here.

In fact, sometimes the best form of "hint" to SQL Server is NOT an optimizer hint but instead a more subtle change to the join (derived tables for example) or the infamous subquery -> join rewrite or the join -> subquery rewrite. I'm always asked "which is better - a subquery or a join" and I always answer YES. ;-)  OR taking a complex process and breaking it down into temp tables (I'd try to create views instead of temp tables first and see if the optimizer figures it out but there are cases when sometimes they just don't). Remember, it's not the optimizer's job to find the absolutely BEST plan; it's their job to find a good plan fast. And - they typically do. Really, no general "tricks" work ALL of the time and often they don't help at all but there are LOTS of things that I'm sure you've done and you really want to tell someone about it. How about here? I'm going to try to compile these tips/tricks into a best of...

Categories:
Design | SQL Server 2005 | Tips

Hey there everyone! I know I still owe you a few Q&A entries (for sessions 7, 8 and 9) but I wanted to get this blog entry out there so that you can play a bit with some of the resources. This series was targeted at developers but really helps to "bridge the gap" between development and administration by always remembering the impacts of what you implement (and techniques to help you to see if you do). More specifically, everything you do and/or design, has the potential for a negative impact to something else - there's no free lunch, eh?

So, this series focused more on Scalability but always remembered the impact to availability and/or reliability. This last session brought together the three primary areas to remember while developing a scalable and reliable architecture:

  1. Know your data
    • Design for Performance - Sessions 1, 2, 3, 6, and 8
  2. Know your users
    • Indexing for Performance - Sessions 4, 5 and 9
    • Optimizing Procedural Code - Session 7
    • Controlling Mixed Workloads and Concurrency - Session 6
  3. Users lie
    • Profile - to make sure that you're tuning what's really happening as opposed to what you think was going to happen! - Session 9

This last session had some great questions and as a result, a few new resources were used. Here are a few of the things we talked about:

Event Notifications and DDL Triggers

DMVs

Webcast links for the entire series!

Part 1: Creating a Recoverable Database

      For the MSDN Download for Part 1, click here.

      For the SQLskills Blog Entry for Part 1, click here

 

Part 2: Creating a Reliable and Automated Backup Strategy

      For the MSDN Download for Part 2, click here.

      For the SQLskills Blog Entry for Part 2, click here.

 

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures

      For the MSDN Download for Part 3, click here.

      For the SQLskills Blog Entry for Part 3, click here.

 

Part 4: SQL Server Indexing Best Practices

      For the MSDN Download for Part 4, click here.

      For the SQLskills Blog Entries for Part 4

         Resource links blog entry, click here.

         Q&A blog entry, click here.

 

Part 5: SQL Server Index Defrag Best Practices

      For the MSDN Download for Part 5, click here.

      For the SQLskills Blog entry, click here.

 

Part 6: Mixed Workloads, Secondary Databases, Locking and Isolation

      For the MSDN Download for Part 6, click here.

      For the SQLskills Blog Entry for Part 6, click here.

 

Part 7: Understanding Plan Caching and Optimizing Procedure Performance

      For the MSDN Download for Part 7, click here.

 

Part 8: Data Loading and Aging Strategies - Table and Index Partitioning

      For the MSDN Download for Part 8, click here.

 

Part 9: Profiling for Better Performance

      For the MSDN Download for Part 9, click here.

 

Part 10: Session Summary - Common Roadblocks to Scalability

      For the MSDN Download for Part 10, click here.

      Transcript can be found here.

So, the series comes to an end (even though I still have more work to do). I have to say that it was a lot of fun and I enjoyed everyone's questions. And /start shameless plug here/ starting in March, SQLskills will begin a 10-12 part series on TechNet. The series will include sessions from my colleague Bob Beauchemin as well as me. This will definitely be more Operations and DBA focused but for all of you developers - it may help you better understand the system, High Availability and a myriad of New Features in SQL Server 2005.

 

I hope to see you there - or at least your DBA... ;-)

 

Thanks again everyone,

Kimberly

This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).

First - a bit of understanding of TempDB - what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate will be stored in TempDB or interim tables uses in hash joins (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (globabl temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created it. These objects are also removed automatically when the session that created it ends (unless manually dropped).
    • ## = Globabl temporary object
      Global temporary objects are objects that are accessible to ANYONE who can login to your SQL Server. They will only persist as long as the user that created it lasts (unless manually dropped) but anyone who logs in during that time can directly query, modify or drop these temporary objects. These objects are also removed automatically when the session that created it ends (unless manually dropped) OR if being used by another session when the session that created it ends, when the session using it finishes using it (and it's only as long as any locks are held). If other sessions need more permanent use of a temporary object you should consider creating a permanent objects and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow - was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE and then you can populate and use it in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses to each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. When it is likely to be better - when you have smaller objects that don't need to be accessed outside of the procedure in which it was created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE KEY).
  • Objects created by client applications - this is possibly a large part of your problem... Profiling can help you to determine if there's a lot of TempDB usage from your client applications.

OK, so now that you know what goes there - how do you make it optimal?

First and foremost, TempDB is in cache just as any other database is in cache. TempDB does not spill to disk unless you are low on cache and/or if you have a lot of inflight transactions in TempDB. Although TempDB is not persisted from shutdown to restart - it still needs to do some logging and therefore you should consider its optimization a lot like other databases.

Things you should do for TempDB (that are a lot like what you should do for every database):

  1. Isolate the data and log portion of TempDB.
  2. Place them on clean, defragmented disks.
  3. Pre-allocate them so they don't need to do a lot of autogrowth.
  4. Make sure you have sufficient memory to support active objects (check for disk activity to the disks that contain TempDB files).
  5. Make sure that transactions are written efficiently so that there are no unusually long running transactions that are unnecessarily holding resources (and therefore locks and therefore log activity).

And - if you need to move TempDB, you should review this KB Article: Moving SQL Server databases to a new location with Detach/Attach

Things you should do SPECIFICALLY for TempDB (especially if you're running on a multiproc machine):

Before I say what... let me tell you why? TempDB has a large number of objects being created all the time. For an object to be created, space must be allocated to it. Space allocation is determined by looking at some of the internal system pages (the GAM, and SGAM). In the end, it is these pages that start to have significant contention (with just one file) in a VERY active TempDB. To minimize that contention you can create multiple files.

  1. Consider creating multiple files for TempDB (even if on the same physical disks) so that there is less of a bottleneck when objects are being allocated. Make sure to read associated KB.
  2. Consider setting a trace flag to have object allocation grab extents rather than pages. Make sure to read associated KB.

BOTH of these last two are described in detail by a KB article: FIX: Concurrency enhancements for the tempdb database.

OK - so that should really help! Moving forward (meaning SQL Server 2005), having multiple files can still help for TempDB.

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development
New Features in Indexing and Index Maintenance Best Practices, Part 5 of 10

Presented by Kimberly L. Tripp, SQLskills.com

 

Q: Can I view a recording of this webcast?

Part 5 can be replayed by clicking here.

All “on-demand” webcasts can be found here.

 

Q: Where can we get the demo scripts AND the sample database: Credit?

The demo scripts are in this zip (20050902MSDNDemoScripts.zip (8.52 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.

 

To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

 

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database

      For the MSDN Download for Part 1, click here.

      For the SQLskills Blog Entry for Part 1, click here

 

Part 2: Creating a Reliable and Automated Backup Strategy

      For the MSDN Download for Part 2, click here.

      For the SQLskills Blog Entry for Part 2, click here.

 

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures

      For the MSDN Download for Part 3, click here.

      For the SQLskills Blog Entry for Part 3, click here.

 

Part 4: SQL Server Indexing Best Practices

      For the MSDN Download for Part 4, click here.

      For the SQLskills Blog Entries for Part 4

         Resource links blog entry, click here.

         Q&A blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails.

You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here

 

Related Resources

MSDN Webcast: Indexing for Performance – Proper Index Maintenance

MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices

TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005 

Technical Questions
Q: In your script, what is "HA Requirements"?
HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, this is especially challenging when even maintenance operations take a table offline.

Q: If you create extra indexes is there a easy to configure utility that you can run across an application after it has run for a few months to list keys that were never or infrequently used?
Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query:

SELECT * FROM sys.system_objects
WHERE [name] LIKE 'dm[_]%'

Q: What is DMV again?
Dynamic Management View. These are new objects which give information about in-memory objects and state information.

Q: What are the parameters and their usage – for sys.dm_db_index_physical_stats?

   (DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)

   DatabaseID = [ NULL | 'DatabaseID' ]

NULL: returns information for ALL databases, if NULL is used no other options can be supplied. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseID') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must be used.

   ObjectID = [ DEFAULT | NULL | 'ObjectID' ]

DEFAULT/NULL: return ALL base data: CL, Heap, LOB for the specified database.
ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part when executing outside of database.

   IndexID = [ DEFAULT | NULL | 'IndexID' ]

DEFAULT/NULL: All indexes
IndexID: tinyint type. Refers to the ID of a specific index.

   PartitionNumber = [ DEFAULT | NULL | # ]

DEFAULT/NULL/0: return ALL partitions
#: returns only the details about specific partition. When a PartitionNumber is specified then an IndexID must also be specified.

   Mode = [ DEFAULT | NULL | 'SpecificMode' ]

DEFAULT/NULL/LIMITED: return FAST scan and use only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.

LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.

SAMPLED:  IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, 2% sampling output returned. If not close, then up to 10% will be sampled.

DETAILED: S Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of holding a shared table level lock.

Q: How often should you run DEFRAG on your SQL server box? Should this be a part a regular schedule? Taking down SQL is their any other consideration?

First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.

Q: How often do you get such perfect tables in practice?
A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.

Q: Do you have suggestions for developers using MSDE when customer’s demands can vary? Vary from few transactions to a large customer with many transactions.
The general best practices in database and table design scale from the low end all the way up to the high-end and in the end – helps your database scale!

Q: Can you touch on rules of thumb for "pad index"?
If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.

Q: Do most of these "Index Rules" apply to Indexed-Views?
Yes! All indexes can become fragmented after data modifications... Your scripts should always look for fragmentation across all scripts.

Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key?
SQL Server 2005 offers more granular rebuild options –but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition), in its own separate table – possibly using a partition view (or an inline table valued function) over these two tables.

Q: If I'm selecting from a table with a where FirstName = ... and LastName = ... and I have 2 indexes, one on LastName and another on FirstName. Are they both used?
With an AND – maybe. The optimizer will look at the Index statistics to determine if either of them selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).

Q: URLs on the Resources slide can't be read. Could you type then into the Q&A, please?
When the session is available for download (which is what happens when MSDN put this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!

Q: How does an uniqueidentifier used as a clustered primary key effect performance?
This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.

Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation?
Extent Fragmenation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.

Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag?
More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.

Q: Defraging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequently log backups?
Yes, you’re correct – defraging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!

Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild?
Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.

Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).

Defraging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.

Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup?
No, you still want to review average fragmentation. However, you may have more “fragments” in a table that spans filegroups; this does not necessarily mean that your table is fragmented.

Q: Are there any good third party tools for checking fragmentation and performing maintenance?
Unfortunately no revolutionary ones (that I know of and/or can recommend)...but I still have high hopes :)

Q: How do you determine the appropriate fill factor?
Unfortunately, there isn't a magic number... but, you can test your guestimate by seeing how fragmented the table becomes between your regularly scheduled defragmentation routines.

Q: Does it matter if I build the clustered index before/after rebuilding the non-clustered indexes?
You should always create the clustered index before creating non-clustered index but as for rebuilding - you can rebuild your indexes independently as a rebuild of the clustered does not (generally) cause a rebuild of the non-clustered. There is one exception to this rule in SQL Server 2000 – in 2000 ONLY, SQL Server will automatically rebuild the non-clustered indexes when a non-unique clustered index is rebuild. Why? Because the uniqueifiers are rebuilt.

Q: Will doing a defrag followed later by a rebuild decrease the work of the rebuild?
Not really. A defrag doesn’t move the object – only a rebuild does. However, you might minimize the cost of the sort…

Q: How does cache map to table pages, i.e., does free space in table pages have a 1:1 correspondence to wasted cache?
SQL Server reads the 8K page from disk into memory. The number of bytes that are wasted on disk are also wasted in memory. This is often the motivation for vertical partitioning! You might refer back to session three for more details on row/page structures!

Q: If switching a varchar cluster to a bigint and vice-versa in 2000, what would the best order of drop/create index?
Actually, this is the reason that CREATE with DROP_EXISTING was created… so that you could “change” the definition of the clustered.

DROP TABLE test
go

CREATE TABLE test
(
      testid      int               not null,
      col1       varchar(100)      not null
)
go

CREATE CLUSTERED INDEX testind ON test(col1)
go

CREATE CLUSTERED INDEX testind ON test(testid) WITH DROP_EXISTING
go

sp_rename 'test.testind', 'NewIndexName', 'INDEX'
go

sp_helpindex test
go

Q: What about instances of one name only? (like Madonna, Cher, etc. ;^)
Well, this is a good question and this is something that you might need to plan for in design. In these cases, you might allow NULLs in the lastname column and then make sure to search both when a lookup is performed. To be honest, I probably won’t do all that much to find these special first names – unless you wanted to do searches across both columns without knowing whether or not this is a first or last name. You might do something like this in a lookup

SELECT * FROM NamesTable
WHERE LastName = @variable
      OR (FirstName = @variable AND LastName IS NULL)

Comment: Just wanted to say I appreciate the blog you have put together.
Thanks for the thanks! It's a lot of work but I think it's great as a reference!! Even for me! To be honest, I can't always remember where to find things either!
J

Thanks!

 

So – we’re half way there – 5 more to go! And, lots more questions coming I’m sure J

 

For the next session, we’re going to cover Isolation and options in Isolation in SQL Server 2005. If you’re interested in hearing more isolation, locking/blocking – here’s the registration link:

MSDN Webcast: A Primer to Proper SQL Server Development (Part 6 of 10): Mixed Workloads, Secondary Databases, Wait States, Locking and Isolation

 

See you on Friday!

kt

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development
Indexing Best Practices, Part 4 of 10

Presented by Kimberly L. Tripp, SQLskills.com