Tuesday, February 05, 2008

I know that Paul and I recommended that you subscribe to Conor's blog... but have you? He's posted some great details on Partitioning (Part 1 and Part 2) as well as statistics and it always reminds me of how much I can learn from other people's perspectives!

And, just to dove-tail on some of his statistics comments... I, too, have found that as tables get significantly larger AND have non-standard distributions of more than 200 distinct values (and un-even distribution between those values as well), that the optimizer just cannot possibly do a perfect job. The only way an optimizer can be good is when it can "find a good plan fast" (which I first heard from Nigel Ellis (former Development Manager of the Query Processor team) - back when he delivered a Pacific Northwest SQL Server User Group meeting many moons ago). The most important thing to realize is that it's just not possible to waste time to find the absolutely best plan... mathematically analyzing all permutations would be prohibitive - you'd have to take a vacation between query executions (wait, that's not a bad idea... I digress :).

The point:

  1. Make sure that statistics are up-to-date (either through the database option: auto update stats OR by manually updating statistics)
  2. Consider re-evaluating statistics over large tables (and, when poor performance occurs - look at the estimated rows v. the actual rows - if the estimate/actual are off by a fact of 10, then it could be the statistics). I'd try updating stats first and then if that doesn't work, updating with a fullscan. If neither of those work, I'd also re-evaluate other possible indexes (there are some distributions between tables being joined that just can't show a correct correlation between the values when in multiple indexes... sometimes the best index is a multi-column (ie. composite index)). 
  3. Consider breaking very large tables down into smaller chunks (not just table index partitioning but possibly Partitioned Tables AND Partitioned Views) as this can give the optimizer additional details about partiticularly interesting data sets. Even in SQL Server 2008, statistics are still table-level (filtered indexes can provide some, but not complete, relief... I'll give more details in a later post) but I'd often argue that some of the best table designs are not just for a single table. Consider the statistical, locking, and indexing implications for mixed workloads against a single table (and the tremendous amount of blocking that could occur in addition to varying access patterns). And, even while 2008 will offer Partition-level lock escalation, well-designed tables may not need it! I know I've mentioned this before but different perspectives on statistics, optimizers and the fact that a good optimizer has to be efficient in-and-of itself, remind me of some of the most basic things that are also the most common problems contributing to poor query performance.

Returning to the basics and optimizing a system from the ground up always leads to better scalability!

Enjoy!
kt

 

Tuesday, February 05, 2008 12:06:01 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, July 10, 2006

Been thinking a lot about something that was mentioned in a few of my most recent posts... Especially when I get comments like "that's another item to add to our checklist" or "that's a good trick to add to our arsenal" and well, I thought in this blog entry I'd ask for your tricks that fall under the umbrella of designing for performance.

For example - do you change collations? I had a recommendation here.
For example - do you have a view that you want ordered? I had a recommendation (with caution) here. But - Adam Mechanic came back and said that he's used that trick to improve performance... and, I'm sure that's the case as well!
For example - do you have stored procedure parameters that are giving you grief? I had a series of recommendations in my Optimizing Procedural Code category here.

In fact, sometimes the best form of "hint" to SQL Server is NOT an optimizer hint but instead a more subtle change to the join (derived tables for example) or the infamous subquery -> join rewrite or the join -> subquery rewrite. I'm always asked "which is better - a subquery or a join" and I always answer YES. ;-)  OR taking a complex process and breaking it down into temp tables (I'd try to create views instead of temp tables first and see if the optimizer figures it out but there are cases when sometimes they just don't). Remember, it's not the optimizer's job to find the absolutely BEST plan; it's their job to find a good plan fast. And - they typically do. Really, no general "tricks" work ALL of the time and often they don't help at all but there are LOTS of things that I'm sure you've done and you really want to tell someone about it. How about here? I'm going to try to compile these tips/tricks into a best of...

Monday, July 10, 2006 7:27:35 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 
Saturday, November 05, 2005
On Friday, November 4, Kimberly L. Tripp completed part 10 of a 10-part series on developing Reliable and Scalable applications for an MSDN Webcast series. Each session had the format of a 90 minute content session followed by a 30 minute content specific Q&A. This session was a summary session for the entire series and included a lot of great questions. To get you started with many of the links to other sessions and even for some of the scripts used throughout this series - check out this blog entry. Enjoy!
Saturday, November 05, 2005 8:53:49 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, September 05, 2005
This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).
Monday, September 05, 2005 10:28:03 AM (Pacific Standard Time, UTC-08:00)  #    Comments [5]  | 
Sunday, September 04, 2005
Part 5 is done and we're half way through the series. Another great series of questions from part 5 of 10 in the series titled: Effectively Designing a Scalable and Reliable Database. Looking forward to another session this Friday! Have a great holiday weekend/Monday (for those of you/us in the US)!
Sunday, September 04, 2005 6:41:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Tuesday, August 30, 2005
OK, so here are the technical questions answered from part 4 of the webcast series. Still 6 more sessions to go. See you on Friday!
Tuesday, August 30, 2005 9:41:42 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Friday, August 26, 2005
Lots of questions in today's MSDN Webcast titled: Indexing Best Practices. And - lots of great resources to continue learning! To allow you to check out a few of the other resources right away, I'm posting this "Resources ONLY" part of the Q&A. Wow, talk about a lot of homework... Enjoy. :-)
Friday, August 26, 2005 1:43:20 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
And - here's the written Q&A from Part 3 of the MSDN Webcast Series: Effectively Designing a Scalable and Reliable Database, A Primer to Proper SQL Server Database Development. Part 3's focus was "Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures." As for the Q&A, surprisingly, it's delivered before part 4...just barely! Lots of great questions (as usual!) and quite a few links and things to reference. Enjoy!
Thursday, August 25, 2005 11:09:11 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Thursday, August 25, 2005
Most people I know - especially those who don't know a lot about Indexes (but even those who do :) - can't believe that I have a two-day class on NOTHING but Indexing... They ask me, "How can you talk about one topic for two days?" The scary part is my answer...
Thursday, August 25, 2005 7:43:42 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, August 22, 2005
This is a PRELIMINARY DRAFT of a whitepaper slated to be released on MSDN. This whitepaper is only a draft and SQLskills/Microsoft make no claims or warranties to the accuracy within (or whatever the disclaimer is supposed to say :). Enjoy!
Monday, August 22, 2005 10:22:27 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, August 20, 2005
Well, I thought the first Q&A log was a monster at 4 pages... this one is 11. LOTS of questions in the Aug 12th webcast on Creating an Effective Recovery Strategy. We talked about Recovery Models, Backup Types and Best Practices and you guys had a TON of great questions! I would also suggest reading the chapter for download on the homepage of SQLskills for some additional details about the Environment Basics that Effect Recovery. All of the session links and resources are listed in this blog entry. Enjoy! With all the resource links from this one - you'll be busy till next week (and/or when I get Part 3's questions completed).
Friday, August 19, 2005 11:33:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, August 04, 2005

Wow, I've been horribly bad at blogging these days and I owe you a few entries as well... But, good news, I have a new MSDN Webcast series that will help to answer a lot of best practices in database design strategies that can help you achieve better scalability, availability and reliability. All of my demos are on SQL Server 2005 but many of the concepts apply to both SQL Server 2000 as well as 2005.

The sessions are all targeted at best practices and as such we went with a base of 200 level. Knowing me I can't go through an entire session without some internals and/or technical depth so you should expect 200-300 overall. I'm very excited about this series because everything counts! What I mean is that each and every best practice you implement helps to bring about a more scalable, available and reliable solution - there are no magic bullets only great overall design can achieve these targets!

So, what are we going to talk about in 10 - 90 minute presentations..... a lot!

Here's a link to the first one and as we get rolling, I'll blog a lot more about additional resources, best practices and of course, I'll blog the webcast Q&A as I've done in the past.

Enjoy!

Thursday, August 04, 2005 7:09:46 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, June 06, 2005

I think there are numerous reasons for why I love technology but at the top of the list: learning. It's amazing to me that not a day goes by where I don't dig deeper into something or clarify it further. Even learning something trivial, like a new keystroke, can make our work easier to do and make us more productive at it. Things are constantly changing; the one thing I do know about technology is that there's a lot to learn!

So, to start, thanks for everyone's feedback (esp. Steffen Krause here) on some of my recent posts regarding LEFT and RIGHT based partitioning functions here. There was a lot feedback regarding the simplicity in syntax and declaration of a RIGHT-based partition function and by making the first partition of a RIGHT-based partition function empty, YOU'RE RIGHT! We can eliminate the need for data movement. There's no performance difference and it doesn't matter internally which type you choose but - I've heard you all loud and clear! You don't like dealing with the imprecision of a datetime data type when specifying upper boundaries. So, having said that - I need to make a few changes. In my next revision of my presentation materials, whitepaper, scripts, etc. I'll work to give both perspectives. For some reason, I still like LEFT-based partition function but RIGHT is definitely easier to define.

So, keep it coming everyone. You've hit the nail on the head. The fun part about technology is... no one knows everything and we're all always learning!

Enjoy Tech*Ed!

Blog edits brought to you by Richard Campbell (long story)

Monday, June 06, 2005 6:26:21 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Sunday, May 15, 2005
In my pre-conference session at PASS Munich, I had a question related to the complexities of my recommending LEFT-based partitions... To keep us on track I somewhat avoided the discussion simplifying it to - "it's a best practice because of issues related to MERGE and SPLIT" AND (in typical fashion, as you guys often do :), I was later "called" on it in a discussion. This blog entry details the reasons why I feel that LEFT-based partitions make more sense in a "rolling range" scenario.
Sunday, May 15, 2005 7:20:37 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Monday, October 25, 2004

The title is a common question I've received in the past and I thought I'd take a few minutes to explain a bit about keys and indexes...

This is by no means a lot of detail regarding relational theory, etc. but there are a few things that we should quickly review to make sure that the basis for indexes being created makes sense. First, from a relational theory perspective every table must have a primary key. From SQL Server's perspective it's not a requirement but it's generally a good idea. A primary/unique key are entity identifiers. Each are a unique way of identifying a row. There are subtle differences between the two - in implementation:

Primary Key

  • In SQL Server the Primary Key is enforced through a Primary Key Constraint.
  • None of the columns that make up the primary key allow nulls.
  • The values in the Primary Key must be unique - to enforce uniqueness (as well as make it efficient), SQL Server creates a unique clustered [composite] index on the column(s) that make up the key.

Unique Key

  • In SQL Server the Primary Key is enforced through a Unique Key Constraint.
  • The columns that make up the unique key CAN allow nulls but not for more than one complete key. Meaning that allowing Nulls on a single column unique key really only allows NULL (only one NULL value). Overall, allowing Nulls in a column (like Social Security Number) doesn't really make much sense but if you need to then you can't go with a unique constraint - instead consider a unique index. At that point, I typically get the question of what's the difference between a unique key and a unique index...
  • Allowing Nulls values in the columns that make up a composite unique key makes more sense as long as the complete key is not null for more than one row.
  • The values in the unique Key must be unique and to enforce uniqueness (as well as make it efficient), SQL Server creates a unique NON-clustered [composite] index on the column(s) that make up the key.

What's the difference between a unique index and a unique constraint?

  • A unique key CAN be referenced by a foreign key constraint and a column which has only a unique index cannot be referenced.
  • Constraints are checked before indexes and this can lead to a large multi-row insert/select or update to fail before modification. However, indexes might (for a large modification) be validated at the end (instead of row by row) so a large modification that has a failure will need to rollback at the end of the modification rather than before. This is a good point - and one I hadn't really thought of until I was poking around some of the Q&A on SQLMag's website. Here's where I found it.

So, all of this leads me up to the original question (yes, you knew I get here someday :) and that's “When did SQL Server stop putting indexes on Foreign Key columns?”

First, SQL Server has NEVER put an index on a foreign key column... Indexes are used (as described above) to make the lookup (in a primary or unique key) for a duplicate value FAST. If the keys are ordered then checking to see if one already exists is trivial (i.e. fast). There is NO reason for SQL Server to put an index on a foreign key column as the column does not (and probably would never be) unique (if it is then it's likely to have a primary or unique key on it as well - as in a 1-1 relationship). So, that leads me to another key point...

Are there any benefits to indexing foreign key columns? YES

  • Better performance on maintaining the relationship on a delete of a primary/unique key. When you delete a key row, SQL Server must check to see if there are any rows which reference the row being deleted.
    • If the foreign key relationship is defined with NO ACTION (on update/delete) then a referenced row CANNOT be deleted as it would leave the referencing rows “orphaned.” To find the rows efficiently an index on the foreign key column helps!
    • If the foreign key relationship is defined with CASCADE (on update/delete) then when a referenced row is modified all of the referencing rows must be modified as well (either updated to reflect the new value or on cascade delete). To find the rows to modify efficiently, an index on the foreign key column helps!
  • Better join performance - for many of the reasons above, SQL Server can more effectively find the rows to join to when tables are joined on primary/foreign key relationships. However, this is NOT always the “best” indexing choice for joins but it is a good start. 

Finally, if you want a few titles related to relational theory check out these links:

An Introduction to Database Systems, Eighth Edition
     by C.J. Date
     E.F. Codd

Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design
     Mike Hernandez

Have fun!
kt

Monday, October 25, 2004 3:51:17 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 
Tuesday, July 20, 2004
Monday's MSDN Webcast titled: Indexing for Performance - Index Defrag Best Practices went well and as promised I'm working on a formal write-up of the entire post-webcast Q&A. However, I've not had a chance to complete it... For simplicity I'm going to post what I've got now, wrap it up tomorrow and then make sure to post all of the additional resources on SQLskills.com. For right now, the resources are only listed here and it's quite a list already - roughly 2500 words to start!
Design | Events | Indexes | Resources | Tips
Tuesday, July 20, 2004 9:31:47 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, July 09, 2004
In an attempt to try to put together best practices that lead to a more performant and manageable system I'm going to start with some simple ones... Table creation and table design.
Design | Tips
Friday, July 09, 2004 3:19:12 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Thursday, July 08, 2004
In having worked with databases - and gone to many large events - I've been asked some of the most interesting questions... One of my favorites (and this was years ago) was when someone came up to me after an Indexing Best Practices Session, and asked if I could answer an index question. I said yes...
Design | Indexes | Opinions | Tips
Thursday, July 08, 2004 6:16:31 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

Theme design by Jelle Druyts

Pick a theme: