Sunday, August 24, 2008

YES!!!

OK, well, I guess I should be more specific because as in most things in SQL Server, the real answer is "it depends". And for these two options, it depends mostly on your SQL Server version. Since SQL Server 7.0, the way that auto update works, has changed (much so for the better!!). So, if you're in SQL Server 2005 or SQL Server 2008, I would say most definitely - leave these ON (or if you turned them off - turn them back on!!!)! If you still have problems with a specific index causing your grief, then turn off auto update at the index level NOT at the database level. To turn of auto update at the index level use STATISTICS_NORECOMPUTE in the index (or statistics) definition.

Now, as for why (and how!) this has changed over the versions... here we go:

SQL Server 7.0

  • Invalidation: Statistics were *invalidated* when a row modification counter (sysindexes.rowmodctr) was reached. This meant that they could not tell where the modifications were occuring and, if modifications were somewhat isolated to a specific column ALL of the statistics for the TABLE would be invalidated (so, statistics could be invalidated earlier than necessary)
  • Updating: Even worse, in SQL Server 7.0, when statistics were invalidated, they were immediately updated. This caused two problems - thrashing at the time of invalidation because all of the stats needed to be updated AND two, if the statistics were not used for awhile then extra work was involved to update them and by the time they were used, they might already be somewhat out of date already.

SQL Server 2000

  • Invalidation: Statistics were still invalidated based on a row modification counter.
  • Updating: SQL Server 2000 fixed the "updating-potentially-too-often" problem by only updating statistics when they were needed.

SQL Server 2005

  • Invalidation: The biggest changes were introduced in SQL Server 2005 where they decided to NO LONGER user the sysindexes.rowmodctr and instead use an internal (and undocumented) columns specific modification counter. Now, statistic invalidation is more isolated to only those columns which are heavily modified. This internal/undoc'ed column is sysrowsetcolumns.rcmodified and can only be seen when connecting to SQL Server using the DAC (Dedicated Admin Connection).
  • Updating: Updating didn't really change but, SQL Server 2005 added "Aynch Auto Update" for statistics so that when the QO (query optimizer) encounters an out-of-date (i.e. invalidated) statistic, they can "trigger" the update but not wait for the update (meaning that they'll optimize using the out-of-date statistic). This can be both positive (faster) and negative (might not be the best plan if the statistics have changed drastically). It is off by default and IMO, I'd leave it off in most cases but if you find that auto update events (which can be Profiled) are causing you grief, then you can turn this on at the database level.

SQL Server 2008

Nothing new except "Filtered Statistics" and these are interesting as the density vector is still relative to the table (not the predicate) but the histogram is just over the predicate (OK, I know I'll have to blog a lot more about this one!). Anyway, I'm still playing/learning a lot more about these and they make the most sense with filtered indexes (as opposed to just a filtered statistic) but, just like statistics on secondary columns you will also potentially want statistics on the secondary columns of your indexes. The next question is should they have a filter or not. I've found that sp_createstats doesn't seem to create statistics with filters and I'm going to need to do some testing here but I think statistics with filters (filters that match the non-clustered index) should help to make the stats better (and even allow better usage of filtered indexes) but, I'm really going to need a bunch of time with this - and another post :). As for auto create/auto update - no changes there!

Long story short, if you're using SQL Server 2005 or SQL Server 2008, you should leave auto create/auto update ON.

Thanks for reading!
kt

PS - A few of you have mailed me about a bug in the sp_helpindex2 script(s). OK, that's my next post!!! Possibly with an sp_helpstats2 script as well!

Sunday, August 24, 2008 11:12:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

OK, so SQL Server 2008 came out *BEFORE* Visual Studio SP1 but VSSP1 was required (this was a bit annoying and created a lot of confusion/emails/newsgroup posts, etc.) but luckily VSSP1 came out only a couple of days after SQL Server 2008 RTM'ed. As for Data Dude, well, it's not RTM'ing until later this year (I think there will be one more release - probably an RC0 - before RTM) but they've been pushing out various CTPs... so, what works with what and how the heck do you get everything to install?? Oh my!

Really, it's very simple:

  1. Install Visual Studio 2008 (with Team System/Database Edition), then, install SP1
  2. Install SQL Server 2008
  3. Download/Install the latest Data Dude CTP. BOTH CTP15 and CTP16 work with SQL Server 2008 RTM but 16 has a lot of updates.

SQL/VS can really be installed in any order but this (above) is what was recommended as the best order. However, I've not heard any problems with SQL then VS. As for where to go to get more details on Data Dude - check out Gert's "Data Dude" blog for links to the downloads and a lot of tips/tricks.

Have fun!
kt

Sunday, August 24, 2008 10:17:43 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

OK, so, I don't blog very often. I don't know what it is... I think it's that I feel like I always need to blog huge posts and the thought of writing my huge post makes me not want to blog... so, sometimes my time-between-blogging (TBB :) is long. I'm going to turn over a new leaf (no, really... I'm *really* going to try this time!!) and I'm going to try and write smaller posts and more of them. And, yes, don't worry, I'll still write the long in-depth ones but I'll at least try to give you a few tidbits of things that I encounter - more frequently. And, that's partially why I'm posting this entry...

I've been working quite a bit with SQL Server 2008 and yesterday I went to test of a few things with database mail. Normally this gets setup quite early and gets setup with a lot of other things so restarts of the server/agent happen because of saving/tweaking my test VPC. I don't usually go in and setup Database Mail and then immediately try to send mail. Yesterday, I did... and, guess what, it didn't work. I got error [264] An attempt was made to send an email when no email session has been established. So, I felt like I had seen that before... and, with a bit of web searching I landed on Gops Dwarak's blog for a known issue of SQL Server 2005: http://blogs.msdn.com/gopsdwarak/archive/2006/04/25/583434.aspx. And, yes, restarting the SQL Server Agent solved the problem. I'm surprised that this is still a bug in 2008 but it's not entirely the worst I could come upon. However, having said that, I also thought that there was a general initiative for software to stop requiring restarts of services and/or the OS. And, yes, it's *just* an Agent restart so it won't directly impact your server's availability but, I'm still surprised that it wasn't resolved. Has anyone else found a bug (or even a documented "issue") that requires a service to be restarted - and you think it shouldn't?

And, well, that's definitely not the most interesting thing I've learned/found with SQL Server 2008, there's lots more (yes, I know - I need to blog a lot of it :). The most fun I've had has probably been learning sparse columns and filtered indexes. For right now, I want to give you some quick key points about some of the changes around indexes and in particular around sparse columns/filtered indexes:

  1. Do NOT create non-clustered indexes on sparse columns without filters (a filter that says WHERE sparsecolumn IS NOT NULL) because indexes do NOT have sparse columns in their definition. So, if you don't use a filter you will end up storing all of the NULL values in the index - which will waste a tremendous amount of space.
  2. A non-clustered index on a sparse column (without a filter) will be the same size as a nonclustered index on a non-sparse column (which is essentially the same point as above but, this might help clarify it a bit).
  3. Do NOT think that the increase in total indexes (from 250 in 2005 to 1000 in 2008) is because you should have more indexes... it's NOT. It's specifically because you might have a lot more columns (these changed as well from 1024 in 2005 to 30000 in 2008) because even that increase should not be used unless you're using sparse columns... So, you STILL want to use a lot of the best practices we've recommended in past webcasts, whitepapers, etc. and you still want to care about row size (and page density) and therefore work to create narrower tables (in general and relatively speaking - depending mostly on usage patterns). However, if you have a need for *lots* of wider tables BECAUSE you have an interesting set of properties that only some rows will have (i.e. the main reason to use sparse columns) then using sparse columns to handle these columns that will largely be filled with NULLs is a good thing. It's a good thing because a row that has a NULL for a sparse column will take ZERO BYTES. Absolutely no space is used for a sparse column that is NULL. So, this allows your tables to be wide (in definition) but your rows to be narrow (in practice). And, with well defined indexes you can VERY efficiently and effectively search on these properties.

OK, I hope to get some examples posted as well. Between Paul and I and our upcoming events - where we're demo'ing/discussing a lot of these principles, we'll plan to post a demo or two on how effective these really can be. And, I know... some of you will fight back with the thought that sparse columns introduce bad database design practices... I know, you want (instead) tables that have name/value pairs - which are the way most of us did this in all versions prior to 2008. However, name/value pairs tables become fragmented messes that also have fragmented indexes and therfore overall poor performance (for both inserts and queries). Not to mention, they can be difficult to query/join with (because you need to join multiple times to retrieve multiple properties) and the code gets messy quickly. Anyway, sparse columns - while they may not seem quite right at first - can really be a *MUCH BETTER* way to design (and perform!!!) around this problem.

And, speaking of events. We have a bunch of upcoming events... I'll give you the short bulleted list here because I'm sure many of you also read Paul's blog and he's mentioned quite a few of these coming up. Also, if you're interested in learning more details for these, the full abstract/links for many of these can be found on our Upcoming Events page. However, specific links are also listed below!

  • Best Practices in Performance and Availability for SQL Server 2005/2008, 1-3 September 2008 in Hatfield, England. You can get more details/register here
  • Dublin SQL Server User Group, Index Internals and Fragmentation, 4 September 2008, Dublin, Ireland. Bob Duffy blogged about this here and you can get more details/register here.
  • Microsoft SQL Academy 2008 - Session 1, 5 September 2008, Dublin, Ireland. You can get more details/register here.
  • Indexing for Performance in SQL Server 2000/2005/2008, 8-9 September 2008, Edinburgh, Scotland. You can get more details/register here.
  • SQL Connections "Power Workshops Series" Microsoft SQL Server 2008 Overview for Database Professionals (Hands-on — Bring Your Own Laptop), 6-8 October 2008, San Francisco, CA, USA. You can get more details/register here.
  • Microsoft Tech Ed EMEA ITPro, 3-7 November 2008, Barcelona, Spain. You can get more details/register here.
  • SQL Connections Fall Conference, 9-14 November 2008, Las Vegas, USA. You can get more details/register here.
  • PASS Community Summit 2008, 17-21 November 2008, Seattle, WA, USA. You can get more details/register here.

And, the last thing I'll leave you with is a recipe for the best darn Chocolate Chip Oatmeal Cookies *ever*. OK, I did a web search for exactly that (ok, it all started when Paul (who really doesn't really like sweets) decided that for his birthday he wanted Choc Chip Oatmeal Cookies (we asked becuase the girls and I wanted to bake something)). The search led me to here. And, they're definitely right that these are the best cookies ever! And, they make a few really good points:

  1. Don't microwave the butter... set it out for an hour before you're going to make the cookies. It's definitely different/better when the butter is naturally soft.
  2. Definitely take the cookies out when they look like they have a few more mins to go...

But, Paul's birthday was in July... and, it was not the only time we've made these cookies... mostly because *everyone* we introduce to these cookies is addicted - yes, the girls, me, Paul, my Mother, Brian Randell (who was forced to take some "togo" after a BBQ here) and countless of our other friends since this is now our "when-we-entertain" dessert ;-). So, since I've had the opportunity to tweak the recipe myself, here's what I do differently...

  1. I add less chocolate - usually 1-1.5 cups of chocolate chips. Sometimes I mix half/half semi-sweet and milk chocolate but I always use closer to 1 cup...
  2. I add about 1 cup of coconut - right at the end, with the chocolate chips...
  3. I make a single 12x18 cookie sheet "uni-cookie" and then I cut it up like brownies. As for the cook time, I've found that this giant cookie takes right about 14 minutes (but, you'll have to experiment with your oven... I usually take the giant cookie out when it's puffed up and the edges are just starting to go lightly brown). Oh, and you don't have to spread this out perfectly to the edges - it will expand and fill in. Just get somewhat close. The nice thing about this form factor is that it's a lot faster than making 36 individual cookies and then you get to choose the size(s) when you cut up the uni-cookie.

Oh, and don't blame me when you make 10 batches of these things and get all of your friends addicted as well. They really are yummy!!

Next up... installing Data Dude CTP16 with SQL 2008 and Visual Studio... what works with what and what's the best install order!!!

Thanks for reading!
Kimberly

Sunday, August 24, 2008 9:27:18 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, October 04, 2007

OK, so thought I'd do a follow up to the post I did a couple of days ago titled: The perils of case-insensitive data (and our life in tangent-land). The reason I'd like to followup on it is that I received some excellent comments and I want to make sure that you're all aware of the tips/tricks and recommendations that there were (some of you may not have returned to see all of the comments). Really, I was impressed by the speed at which people responded as well as the great comments (and things I learned!). It just reminds me of the fact that none of us can know everything AND that our SQL community is awesome in its willingness to share and communicate.

As for the tips/tricks and "yes, duh!" realizations I came to... here are the interesting points from the comments:

First - why did my comparison work for a single character (e.g. '%A%') but not when I did a character range (e.g. '%[A-Z]%')? Well, it was because it was unicode! This was a "right! duh!" realization that I think I dreamed after I wrote this BUT, Hugo Kornelis is exactly right in his comment. Thanks Hugo! Here is a direct cut/paste of his comment:

The reason [A-Z] doesn't work, is that a collation doesn't just govern case sensitive vs case insensitive but also (amongst others) the sort order of letters. And most case sensitive collations sort like A - a - B - b - ... - Z - z. So [^A-Z] would include all letters except the lowercase z.

You can use [A-Z] to find uppercase characters in a binary collation (since all uppercase characters are in one range of ASCII, and all lowercase characters in another), but not in any other collation.

And, you can check out more from Hugo on his blog: http://sqlblog.com/blogs/hugo_kornelis/default.aspx

Second - the comparison query that I wrote all together (where I stated each letter individually in the WHERE clause) only took a few seconds to write (thanks to cut and paste :) AND it did work...And, sometimes getting something to work and moving on is all we can do (come on - you've ALL been there, eh? :). However, my main comment was that "it wasn't pretty". A much more elegant and unbelievably simple solution came from David R Buckingham (aren't the great answers always the really simple ones :)). Here is a direct/cut/paste of his comment:

The following query will return any fully lower case names in the table:

SELECT LastName
FROM Person.Contact
WHERE LastName COLLATE Latin1_General_CS_AS_KS_WS = LOWER( LastName ) COLLATE Latin1_General_CS_AS_KS_WS

I don't believe that David has a blog... maybe he should :).

Third - a very cool and clever trick that came in from Denis Gobo is related to the performance of repeatedly doing case-sensitive searches on a case-insensitive column. I suggested that creating an additional column (preferably a computed column that uses the case sensitive collation) would be an easy and optimal solution. This is still definitely true when the case-insensitive values are NOT selective enough to warrant using an index and the case-sensitive values are... However, if both the case-sensitive AND the case-insensitive values are reasonably selective then the trick that helps is from his comment. Here is a direct cut/paste of his comment:

Kimberly, the way to force an index seek is to do this

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
AND Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'

The WHERE might return more than one row but the AND will return only the case sensitive one

I wrote about that a while back here:
http://sqlservercode.blogspot.com/2007/05/make-your-case-sensitive-searches-1000.html 

And, you can check out more from Denis on his blog: http://sqlservercode.blogspot.com/

Now, as for the issues related to creating a view in a database that has a different collation from the server's collation... Here, I'm fairly certain that there's still a bug. However, I'm happy to say that I don't think that it's the most likely situation that exists for collations. I think the two most likely situations are:

  1. The server has one collation. The database inherits that collation. The database developer makes column level collation changes throughout the db. This seems to work well. OR
  2. The server has one collation. The database has a different collation. The database developer consistently uses that collation throughout their app. A good example of this is where people have case-sensitive databases on case-insensitive servers. This works fairly well (although there are some issues wrt to temp tables, etc. and default_collation is a good thing to know).

I guess there's even a third one where column level changes are made in a database whose collation is different from the server but where there aren't any views that also change the collation to yet a different collation (and this is where there seems to be a bug).

So, this was an excellent (and reasonably fun :) :) exercise to go through wrt collations. And, this is how I (we?) learn! I really want to thank everyone for reading - and commenting/sharing! - the things they learned/knew. That's part of why I love the SQL Server community. And, speaking of which, I thought I'd end this entry with a few community links - as a reminder to everything that's out there:

Thanks for reading! Thanks for commenting!
kt

Thursday, October 04, 2007 11:57:05 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, July 24, 2007

DDL Triggers were a new feature of SQL Server 2005 and while seemingly simple, they are very powerful. DDL Triggers allow you to trap an attempted DDL operation to audit it, prevent it, or do anything you want to validate/verify/”authorize”/etc – you write the code. And, since a trigger fires as part of the transaction, you can roll it back.

In many conference demos/webcasts, etc., I have provided a sample script that prevents ddl within a [production] database. That script has been really helpful/useful but recently I thought about an update to it…

SQL Server 2005 has another new feature "execute as". While I definitely see many benefits, I’m also a bit concerned. To a certain extent, I feel that the potential for SQL Injection is actually higher. If a developer creates a poorly written/tested stored procedure (ok, therein lies the problem, really!) that includes dynamic string execution AND then uses "execute as" to essentially elevate a user with minimal privileges to a higher level (so that they don’t need to give the base object rights to the user), a malicious user could “inject” code in and actually succeed if the “execute as” user has rights to the injected code. In prior releases, and with the default behavior (execute as caller), this is not possible (which is good for security but bad for dynamically executed strings within stored procedures as base object rights are necessary).

Having said that, and since security is always a concern, my DDL Trigger only audited for the login of the user who executed the statement, not for the actual user that’s logged in. In other words, if EXECUTE AS is used (or SETUSER is used), then the context of the user executing is actually different then the logged in user. To see this shift in context, SQL Server 2005 added a new function: ORIGINAL_LOGIN().

(reading between the lines is even more frightening in that prior to SQL Server 2005, the original user could not be tracked from SETUSER. The good news is that SETUSER is ONLY allowed to be used by DBOs so it’s not as widespread as the potential for “execute as”).

OK, so how can we put all of this together? We’ll want to add the ORIGINAL_LOGIN function into our audit table in our DDL Trigger. Even if you choose NOT to rollback, at least you’ll know who performed the operation (even if from a dynamically executed string!).

USE AdventureWorks;

go

--Create a login/user - just for this exercise

CREATE LOGIN Paul WITH PASSWORD = 'PxKoJ29!07';

go

CREATE USER Paul FOR LOGIN Paul;

go

sp_addrolemember 'db_ddladmin', 'Paul'

go

 

CREATE SCHEMA SecurityAdministration

go

CREATE TABLE SecurityAdministration.AuditDDLOperations

(

            OpID                            int                                NOT NULL identity    

                                                                                    CONSTRAINT AuditDDLOperationsPK

                                                                                                PRIMARY KEY CLUSTERED,

            OriginalLoginName    sysname                     NOT NULL,

            LoginName                 sysname                     NOT NULL,

            UserName                   sysname                     NOT NULL,

            PostTime                     datetime                     NOT NULL,

            EventType                   nvarchar(100)            NOT NULL,

            DDLOp                        nvarchar(2000)          NOT NULL

);

go

GRANT INSERT ON SecurityAdministration.AuditDDLOperations TO public;

go

 

CREATE TRIGGER PreventAllDDL

ON DATABASE

WITH ENCRYPTION

FOR DDL_DATABASE_LEVEL_EVENTS

AS

DECLARE @data XML

SET @data = EVENTDATA()

RAISERROR ('DDL Operations are prohibited on this production database. Please contact ITOperations for proper policies and change control procedures.', 16, -1)

ROLLBACK

INSERT SecurityAdministration.AuditDDLOperations

                        (OriginalLoginName,

                         LoginName,

                         UserName,

                         PostTime,

                         EventType,

                         DDLOp)

VALUES   (ORIGINAL_LOGIN(), SYSTEM_USER, CURRENT_USER, GETDATE(),

   @data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)'),

   @data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'nvarchar(2000)') )

RETURN;

go

 

--Test the trigger.

CREATE TABLE TestTable (col1 int);

go

DROP TABLE SecurityAdministration.AuditDDLOperations;

go

EXECUTE AS LOGIN = 'Paul' -- note: Remember, Paul is a DDL_admin

go

DROP TABLE SecurityAdministration.AuditDDLOperations;

go

REVERT;

go

 

SELECT * FROM SecurityAdministration.AuditDDLOperations;

go

DROP TRIGGER PreventAllDDL ON DATABASE;

go

DROP TABLE SecurityAdministration.AuditDDLOperations;

go

DROP SCHEMA SecurityAdministration;

go

DROP USER Paul;

go
DROP LOGIN Paul;
go

 

So, have fun testing with this one.

 

Thanks for reading!

kt

Tuesday, July 24, 2007 5:31:20 AM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 
Monday, March 05, 2007

I had a discussion earlier today (with Paul Randal) about many misconceptions that exist about upgrading databases and more importantly, about "downgrading" databases. Really, the issue is that I've heard people get frustrated when they find that things like backup/restore works FROM SQL 7.0/2000 TO SQL Server 2005 but not the other way around - even if the database is in SQL Server 2000 (80) compatibility mode. First and foremost, compatibility mode only affects parsing, query processing, and general data manipulation; it does not affect physical storage (well, there's more to it than that but that's a general overview). When you upgrade a database to SQL Server 2005, you WILL benefit immediately from changes in the storage engine, etc. regardless of compatibility mode. Compatibility modes are there to give you time in updating/upgrading your code - if/when necessary. Most code will work when upgrading but some code may not be supported because of changes to keywords, syntax changes, etc... The best thing to do is check your application compatibility with the Upgrade Advisor. I did a a couple of webcasts on Installation/Upgrade as part of my 11-part series on TechNet. See the blog entry for the entire series here. Part 3 and part 4 are focused on Installation and Upgrade and their associated blog entries have a lot of additional links (including links to the Upgrade Advisor as well as a series of things you might want to do before you upgrade). Also, be sure to checkout the upgrade site off of the main Microsoft SQL Server site.

How to move USER databases around - a quick list of what's supported between versions

Backup/Restore from 7.0 to SQL Server 2000
Detach from 7.0, copy the files, then attach to SQL Server 2000
Backup/Restore from 7.0 to SQL Server 2005
Detach from 7.0, copy the files, then attach to SQL Server 2005
Backup/Restore from 2000 to SQL Server 2005
Detach from 2000, copy the files, then attach to SQL Server 2005

Why use Backup/Restore?

PROs

  1. Because you have a backup! This will allow you to go back to the version from which you came. However, without any changes made on the uplevel version.
  2. Because it doesn't require the database to be taken "offline" when the backup is performed (note: that this is both good and bad - bad because you don't really know the exact point in time to which the database reconciles...which may not matter if you're just testing).
  3. Because the backup will be the size of data only and will not include database free space. Free space is not backed up (e.g. a database with a 100GB data file with only 20GB of data should yield a file that's roughly 20GB in size). I say "roughly 20 GB" because the internals of a backup require that the transaction log records for the activity that occured during the backup process are also backed up with the full database (or differential) backup. This is actually the basis for why transaction log backups are not supported during a full/differential backup in SQL Server 2000 (they are in SQL Server 2005). However, this is the reason why the transaction log cannot be cleared while a full or differential is ALSO running in SQL Server 2005.

CONs

  1. You don't know the exact point in time to which the database reconciles (it will be the time that the backup completed) AND logs CAN be restored uplevel as well. NOTE: If you're interested in creating an exact point in time version of the database - consider putting the database into "restricted user" mode or "single user" mode (so that user transactions are not allowed during the backup). Again, this may not be a concern.
  2. It takes time to complete the backup (there are four phases of a restore: create/initialization, copy, redo, undo). Make the create/initialization *much* faster by enabling Instant Initialization. See my Instant Initialization blog post for more details.

Why use detach/attach?

PROs

  1. It's simple, it's fast... but once detached then the database is OFFLINE.
  2. You know the exact point in time to which it reconciles because no transactions are allowed into the database once it is offline. Again, this may not be a concern.

CONs

  1. You must copy the entire file - including the free space to the other location and the network copy might be the most expensive (meaning time consuming) part of the entire process. However, once copied, the files do NOT need to be created on the destination because on attach, these files will be used.
  2. The database is offline once detached and during copy.
  3. If you don't COPY the files and instead you attach the detached files, you will have ABSOLUTELY NO WAY of getting back to the version from which you detached. (ah, this is probably the single most important reason for why I prefer backup/restore!)

Summary for "How to move USER databases around"
Between these versions "upgrades" are supported ONLY to the uplevel version. There is NO single (or simple) feature that can be used to get back to the version from which you started (without exporting/importing all of the data). There is also no undocumented back-door to do this either (no trace flags, no DBCC commands, NA DA!!! as per Paul).

What about System Databases?
This is a whole other can of worms to open and the easiest thing I can say here is that you generally should not move/upgrade system databases across machines. These are upgraded through "in-place" upgrades of SQL Server (on the same machine) or through manual migrations (to different machines) of the users/objects (SQL Agent Jobs, user-defined system procedures in master, logins in master, etc.). This is not an easy process (manual migration) but may prove to be a better choice over an upgrade in place if something were to go horribly wrong (which is unlikely but I'm a "what's the worst case scenario" person when it comes to availability :). The other benefit of NOT upgrading in place - and instead MOVING databases from one version to another on an upgrade - is that you get to complete some basic "spring cleaning". New hardware, freshly formatted, freshly installed/configured OS, clean disks, etc. This can often alleviate some of the strangest, hard-to-determine-problems, that have plagued you for weeks/months. Like I said, this is a whole other can of worms to open!

But - if you're interested in moving system databases around on the SAME machine, here's a great KB that covers the required options, syntax, rules and restrictions: How to move SQL Server databases to a new location by using Detach and Attach functions in SQL Server

And - if you're interested in transferring logins and passwords between instances (for upgrade or for sync'ing two servers used to create a standby partnership - with Database Mirroring and/or Log Shipping), here's a great KB article that includes links to other articles even uplevel transfering of logins (like 2000 to 2005): How to transfer logins and passwords between instances of SQL Server

And - that's it for this week (probably)... two in a row is not likely to become three in a row (just setting expectations :) :) :),
kt

Monday, March 05, 2007 3:22:45 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, February 02, 2007

Well, I've promised to blog more and I'm really going to try to do so. This morning I got the perfect question/comment (in email) to respond to and after working through a response that was taking me upwards of 3 hours (you'll learn later why I have 3 "spare" hours :)......... I figured that it was time to turn the response into a blog post. ;)

Background: The Clustered Index Debate
In the years since the storage engine was re-architected (SQL Server 7.0+) there's been constant debate on how to appropriately choose the clustered index for your tables. I've generally recommended an ever-increasing key to use as a clustered index and many find that counterintuitive. The primary reason people feel it's counterintuitive is that it creates a hotspot of activity. [If "hotspot" is not a familar term - a hotspot is solely an active place within your table.] Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there. In fact (and probably even more counterintuitive), the opposite is true. Hotspots (specifically hot PAGES not hot ROWS) can be very beneficial because they; minimize the number of pages needed in cache, improve the likelihood of the required page already being in cache and in general, they minimize the overall amount of cache required. So, this is why many of us have changed our recommendation on where to create the clustering key in 7.0+. Instead of focusing on range queries we now focus on placing the clustering key on an ever-increasing key. In earlier releases, focusing on range queries for the clustered index reduced hotspots for insert/update and this in fact was the PRIMARY motivation to choose them, NOT range query performance! But - there are even MORE reasons to choose an ever-increasing key and they are based on internals as well. These internals are based on the significant changes made in the storage engine for 7.0+. For a quick start on these, I went through them in the Blog entry here.

And, today's email is not uncommon. This is the basis for the title clustered index debate. In general, there are still a lot of questions related to creating clustered indexes to improve "range query" performance. Don't get me wrong, there's definitely a benefit in performance for some range queries but the first thing to remember is that you get only one CL index per table (therefore only one type of range query can benefit). In the real world, t's not likely that you want to see your data exactly in the same way all the time. Therefore it's very challenging to come up with the "right clustered" index if you're using range queries as your strategy. Even worse, the affect of choosing the clustering key to improve range queries causes problems for modifications against that table (INSERTs/DELETEs and UPDATEs). So.............. this is what started my day today. A great email from a reader that brought up these points. The question/comment (modified to hit only the highlights and to protect their identity :) was this:

The most important characteristic for a Clustered Index key is to satisfy range queries. More often than not, if a sufficient range of data will be scanned, the Optimizer will choose the Clustered Index over all others due to the excessive cost of Bookmark Lookup operations. As such, the table KEY is a more suitable clustered index candidate than any surrogate (few every query a database by range of surrogate keys).  [kt note: this second sentence is not entirely true... SQL Server will certainly choose a clustered index over non-clustered that require table scans but there are A LOT of algorithms that SQL Server can use instead of either of these and my examples later show this... non-clustered covering seekable indexes, non-clustered scanable indexes, index-intersection, etc. ] 

Now, when the default behavior for SQL Server was designed such that the PRIMARY KEY was chosen as the default clustered index, it was exactly for this reason.  It is the business key.  It would satisfy uniqueness (by definition of logical KEY).  And, it is well suited for a wide variety of range scans.  However, this is when the PRIMARY KEY is defined on the Business Key of the data.

But, when you introduce the usage of surrogate keys (i.e., IDENTITY) as a physical implementation, and thus transfer the PRIMARY KEY definition to it, two things must be considered.  First, the Business Key this IDENTITY will be a proxy for must still exist as it is still apart of the logical design.  As part of the physical design, the logical key needs to be implemented as a physical constraint to maintain logical uniqueness.  Second, just because a proxy has been defined does not make it a natural candidate for the clustered index.  The business key still maintains this distinction.

What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits.  However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list.  Page Splits are managed by proper FILLFACTOR not increasing INSERTS.  Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.

Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting.

In short, the argument runs shallow.

Luckily, this email arrived with perfect timing for me as I'm sitting in a "bootcamp" event on Always On technologies and I'm not speaking this morning (my colleague Bob Beauchemin is doing lectures on Scale Out technologies: Scalable Shared Databases, Service Broker, DPVs, etc.). Anyway, in addition to listening to Bob, I've decided to continue the blog series on "the clustered index debate". The first and most important point to stress is that minimizing page splits is NOT the only reason nor is it the most important. In fact, the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits to minimizing splits).

The Clustered Index Debate Continued
First, there are many angles to look at wrt to "the clustered index debate" and it's not until all of the issues are reviewed, that this strategy (a monotonically increasing key) becomes obvious. So, I think it will probably take a couple of blog posts to really prove this. I'll start up this debate again here...... When you look at a general purpose table (which is most) where the table has ALL DML (S/I/D/U) then you are best off with an ever-increasing key (again, you have to look at the overall impact of all operations against the table - not just select... because I/D/U will also impact select in the long term). So, I'll break this down into each DML operation here. If you don't look at the overall impact, then large tables can end up having a tremendous number of problems once they're put into production. I've certainly heard this concern/debate before (and most people are skeptical at first glance) but when you look at the situation overall, you'll find that "finding the right balance" includes not just looking at range queries. In fact, here's a quick list of the things/tests/numbers/scenarios that help to prove my strategy:

  • Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting". But - this is a well documented issue (IAM/PFS lookups) and poor performance on a heap is also referenced in this KB: PRB: Poor Performance on a Heap. note: this KB is quite dated and I don't actually agree with everything in this article however, the general concern of poor performance for inserts is still true on SQL Server 2005.
  • Updates are often faster (when the row needs to be relocated) and for the same reason (IAM/PFS lookups) BUT there are many types of updates and not all updates cause records to be relocated. Here are a few things to think about wrt to updates:
    • Updates that are completely in-place (some examples are where the update is updating a fixed-width column OR to variable-width columns where the row size doesn't change, etc.). These types of updates don't really care.
    • Updates that cause record relocation (where the row size changes) are definitely better by having a clustering key because the record relocation (which will be handled by a split) is defined by the clustering key
    • Updates to the clustering key are the WORST (in this case) which is one of the key reasons for having a cl key that is static (so we have to keep this in mind when we choose a clustering key).
  • Deletes aren't nearly as big of a concern BUT deletes in heaps create more gaps and more gaps creates more work in PFS/IAM lookups and while this helps to reduce wasted space, it still requires the time to find the space........ hence the slowed performance of Inserts/Updates. I've also written some blog entries that cover very interesting test cases for large scale deletes and why you'd want to consider partitioning to optimize for the "sliding window scenario" in this blog entry: MSDN Webcast Q&A: Index Defrag Best Practices - Fragmentation, Deletes and the “Sliding Window” Scenario and it's the LAST one!.
  • Selects.............. now this is the hardest one to go through in just a couple of bullets (ah, I guess this will lead to another one or two posts :) BUT I'll start by saying that the best way to tune the vast majority of range queries is through non-clustered [covering] indexes. But, it's also important for me to stress that I do NOT advocate covering every query (it's impossible to do). What's important to realize in terms of covering is that SQL Server 7.0 and up continues to include internal algorithms to improve performance when you don't have the "perfect" non-clustered covering seekable index and instead still gives better performance than going to the base table (or performing bookmark lookups - as mentioned in the mail...and I completely agree that these [bookmark lookups] can be evil!). To start this discussion, I'll give one of my favorite examples of a large-scale aggregate. The absolute best way to improve the performance is through an indexed view but the data can be gathered through many other algorithms - ideally through a non-clustered covering index that is in order by the group by and that includes the column(s) being aggregated. For example, take this query:

SELECT c.member_no AS MemberNo,
 sum(c.charge_amt) AS TotalSales
FROM dbo.charge AS c
GROUP BY c.member_no

On a charge table of 1.6 million rows here are the performance numbers to handle this aggregation:

  • Clustered table scan (CL PK on Charge_no) with a hash aggregate = 2.813 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 1.436 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = .966 seconds
  • Indexed view = .406 seconds

Now this was a pretty small table (narrow rows and only 1.6 million rows) AND I didn't have any concurrent activity. The concurrent activity would have caused this to be even slower for hash aggregates, etc. Regardless, it proves the point (at least generally). Now, if I wanted to improve this range query then I'd have to cluster on the member_no column (and this is an ideal example because I often hear people say that clustering on a foreign key column helps to improve range/join queries - which can be true as well)......... But - this strategy has a few problems in addition to a few benefits (and we have to look at everything to be sure of our choice/decision). First, member_no is not unique (in the charge table) so SQL Server has to "uniquify" the rows. The process of "uniquification" impacts both time (on insert) and space (the rows will be wider to store each duplicate row's uniqufier). Also, theoretically it could change (in this case that's not true). Anyway, the time it takes for the clustered index is 2.406 seconds which is better than the clustered on the PK (of course) but if I were to also start modifying the rows (which creates splits) or even just insert 15% more rows........ then my table would become fragmented. At that point, the query performance should get worse in the table clustered by member_no table and it will continue to get even worse in the table clustered by charge_no (because of the worktable created in tempdb by the hash aggregate) BUT it won't be all that much worse in the non-clustered index examples (especially the covering index that's in the order of the group by - because this doesn't require a worktable).........

  • CL on member_no = 4.906 seconds
  • CL on charge_no = 6.173 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 3.906 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = 1.250 seconds
  • Indexed view = .516 seconds

This is a great start to furthering the clustered index debate but I do have to admit that it's a counterintuitive and difficult issue to tackle because often isolated tests lead you to different conclusions. In this case though, the non-clustered indexes are better for this range query and the indexed view is the best (but I wouldn't consider the Indexed unless this were more of a read focused database rather than read/write). [and - of course, that statement warrants yet another blog post :)]

So, depending on the tests that you do - especially if you focus only on selects and you don't have modifications (i.e. fragmentation) - then they will make "creating the clustered index for range queries" appear to be best. Again, I'm not just saying this to prevent fragmentation, I'm saying this because I wouldn't use the clustered index OR a non-clustered index with bookmark lookups to handle this query. I'd consider a non-clustered covering that's seekable OR even a non-clustered covering that's scanable before I'd even choose the clustered (and that's what the optimizer would prefer as well). In the end it's really a bit of an art and a science to "finding the right balance" of indexing.

Oh - and if you arbitrarily add a column to use for clustering (maybe not as the primary key) that can help but many would prefer to use actual data... which means [potentially] creating your primary key with a new identity [or similar] column and this can impact your business logic (absolutely). I'm certain that certain tests can show that range queries are faster and it's absolutely correct that business application/usage can be a concern but when you look at the big picture (and the impact on I/D/U) then the benefits of the monotonically increasing key significantly outweigh these concerns. Simply put, a small/narrow key can help join performance and an ever increasing key can also help lookups for rows! (yes, definitely more coming)

Happy Friday! Have a great weekend. I'll try to continue more threads on this debate shortly!
kt

Friday, February 02, 2007 11:39:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Thursday, November 30, 2006

OK, it's been a heck of a long time since I blogged... and for that I apologize. I'm also WAY overdue in my posting my demo scripts from a TON of conferences BUT... now everything has been posted. Check out the past events page on SQLskills and you can find the demo scripts that you're looking for......lots of fun stuff and TONS of scripts to play with and test.

Now - as for the reason(s).... many are business and for that I blame the following (yes, 17 flights [yes, one boarding pass is missing] over ONE 5 week trip with 7 events and 5 continent changes):

The other reason(s) are personal...suffice it to say that the last 6 months have been some life changing times for me and what I'm finding (or trying to find) is that ever important balance between work and life. During this holiday season (and always), I wish you and your loved ones well and I hope that you too can find (and cherish) what's most important to you.

So, you won't see anything else from me for this year but I do hope to be better (and more frequent) with blogging in the New Year and I especially hope to see you again at an upcoming conference. Let me leave you with the most exciting picture I've witnessed this year...it was during my one day of site seeing in Cape Town - where I went cage diving with Great White Sharks (and got horribly sea sick - which is rare for me) but where I was able to witness these amazing and powerful creatures....

Have a happy and safe holiday season!
kt

Thursday, November 30, 2006 10:42:45 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 
Monday, July 10, 2006

Been thinking a lot about something that was mentioned in a few of my most recent posts... Especially when I get comments like "that's another item to add to our checklist" or "that's a good trick to add to our arsenal" and well, I thought in this blog entry I'd ask for your tricks that fall under the umbrella of designing for performance.

For example - do you change collations? I had a recommendation here.
For example - do you have a view that you want ordered? I had a recommendation (with caution) here. But - Adam Mechanic came back and said that he's used that trick to improve performance... and, I'm sure that's the case as well!
For example - do you have stored procedure parameters that are giving you grief? I had a series of recommendations in my Optimizing Procedural Code category here.

In fact, sometimes the best form of "hint" to SQL Server is NOT an optimizer hint but instead a more subtle change to the join (derived tables for example) or the infamous subquery -> join rewrite or the join -> subquery rewrite. I'm always asked "which is better - a subquery or a join" and I always answer YES. ;-)  OR taking a complex process and breaking it down into temp tables (I'd try to create views instead of temp tables first and see if the optimizer figures it out but there are cases when sometimes they just don't). Remember, it's not the optimizer's job to find the absolutely BEST plan; it's their job to find a good plan fast. And - they typically do. Really, no general "tricks" work ALL of the time and often they don't help at all but there are LOTS of things that I'm sure you've done and you really want to tell someone about it. How about here? I'm going to try to compile these tips/tricks into a best of...

Monday, July 10, 2006 7:27:35 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 
Monday, July 03, 2006

Hey there everyone - If you're into Analysis Services, Integration Services, Reporting Services and BI in general - you'll want to check out Elizabeth Vitt's new blog. Liz has been specializing in BI since SQL Server started adding BI-centric components. She's got a wealth of information to share and many great insights into performance tuning as she's working on a BI Performance Tuning resource that will probably hit 100 pages (from current guestimates).

And - no surprise from Liz, she's out the gates running with her first entry on Influencing Aggregation Candidates.

Subscribe now!

And a big welcome to blogging for Liz!
kt

Monday, July 03, 2006 10:20:36 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, June 30, 2006

Well, it's been a GREAT week here in Switzerland while working with my partner