Monday, March 17, 2008
So now that I have the latest CTP working again on my main machine, it's far less troublesome to go research things and post what I find.  Tonight I'll talk a little about datetime vs. date, as dates are on my mind for whatever reason.

So the "old" SQL Server datetime type only goes back to 1752, which seems very odd until you remember that current notions of date and time are really not that old.  In 1582, Pope Gregory XIII fixed the calendar because they realized that the year was actually slightly off compared to what the calendar said - each year the date of Easter ended up getting slightly further from what was intended.

Modern computer science students (at least ones who did the assignments *I* did in school :) will remember the funky rule for calculating a leap year - it is a leap year if the year is divisible by 4 but not 100 except for when also divisible by 400.  The year 2000 was just such a case, so we had a leap year then.  2100 will not be a leap year.  They had different fancy math in the original Julian calendar (as in Caesar) with a special month that happened every 377 years or so.  Later they tried to fix this because they wouldn't actually do the month every 377 years.  I promise - I'm not making this up.

By the time the figured it out, they were off by several days.  In modern terms, the OS shipped, everyone installed it, and it has a fatal bug that impacts every customer on upgrade to the hotfix :(.  So while today such bugs can put you out of business, back then the church had a bit more market power than the average customer might have today.  So, they decided to fix things.  So they changed the calendar and skipped 10 days in the process to fix the client.  Even worse, different clients (countries) installed the patch on different days, so they all changed dates differently.

It so happens that the British Empire (the "pink bits" on the old maps) adopted this in 1752.  That's almost 200 years after the Pope did his thing.  So, since SQL Server was first done in the US, Jan 1. 1753 is the first legal date because all of the math before that is simply dizzying.  Alaska actually didn't switch until the late 1800s since Russia controlled it before that, and the Orthodox calendar celebrates its Christmas as Easter later than those in the West because of this very issue. 

So if I try this with the old datetime type, I get:
create table datetbl(col1 datetime)
insert into datetbl(col1) values ('17510101')


Msg 242, Level 16, State 3, Line 1
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
The statement has been terminated.

So I figure I'll try this out on the new date type:

create table datetbl2(col1 date)
insert into datetbl2(col1) values ('17510101')

(1 row(s) affected)

Well, the legal date range is listed as 0001 to 9999.  So let's check the math and see what happens...

create table datetbl3(col1 date, col2 date)
insert into datetbl3(col1, col2) values ('17510101', '17530101')
select DATEDIFF(dd, col1, col2) from datetbl3
731

Well, it does not appear that they skipped too many days there.  They even added a leap year since 1752 is divisible by 4 but not by 100 except for 400.  So instead of 730 we get 731.

I looked at Books online, but so far I haven't found any reference to the difference in the DATEDIFF function or in the discussion of valid ranges.  While I am sure that a number of people will not care that much and will mostly just want to use historical dates mostly with low precision, it's still important to make sure that a core system does calculations correctly.

Perhaps they will add a comment into BOL or a warning for datediff and other function uses around the various switch points for dates.  Perhaps not.  However, it's important for you, the programmer, to know the issues when dealing with old dates.

I originally learned about all of this stuff in detail when I reverse-engineered the SQL Server Expression Service Code while I was building SQL Server for Windows CE/SQL Mobile/SQL Server Compact Edition/whatever it is called now.  I've found this area to be fascinating because it takes something we take for granted and just slaps you in the face a few times.  I hope you have a bit more insight now as to the 1753 date limitation and perhaps will go crack open that history book :).

Thanks,

Conor Cunningham
Monday, March 17, 2008 7:47:07 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Well, I was able to get things to finally install using the information from my previous post and using a named instance (different from the default instance I used previously).  So, while I'd still like to track down the keys that define an installation instance with enough detail to remove them, I think I've gotten close enough that others should be able to use this as a template to avoid an OS re-install.

I'd like to extend a HUGE thank you to the Microsoft SQL Server Setup Dev Team who spent time looking at this problem and providing a workaround for me.  Please accept my gratitude.

I am sure that they are hard at work getting setup ready to RTM, which will include problems like the one I've seen. 

So, please post up if you have other questions about what I did - I'll answer them if I can.

Thanks,

Conor

Monday, March 17, 2008 7:00:45 PM (Central Standard Time, UTC-06:00)  #    Comments [1]  | 
Friday, March 14, 2008
You may recall my previous posts on my trouble with SQL 2008 CTP6.  I've made some progress on fixing my machine that I thought I'd share with you.  I now get past the following error:

D:\temp\sqlctp6\servers>setup
The following error occurred:
MsiGetProductInfo failed to retrieve ProductVersion for package with Product Code = '{00E75F61-A126-4CE1-90B8-42295052F1AC}'. Error code: 1605.

I useded the SysInternals (err, Microsoft) Process Monitor Tool and watched for keys found/not found during the failed install.  This found a few keys in HKCR\Installer\UpgradeCodes that were being found early in the setup100.exe process.

(Fair notice - modifying the registry on your computer is your problem, not mine :).

So take the key:
00E75F61

Reverse it.  I think I had 615fe700, but it was late and I was tired.  It might have been 16f57e00.  Anyways, you will see some keys under HKCR\Installer\UpgradeCodes.  There are actually 2-3 places in the registry searched for each key  I've been killing all three of them for each key - there are about 7-10 keys.  The registry section looks like this:



The hex digits you have in the error will correspond to the right hand side of this picture.  The key I've been deleting is it's parent, which is the key being opened in the "key found"/"key not found" stuff in the process monitor log.

Here's what the log looked like for me:

136791  11:53:49.4922341 PM      setup100.exe    2812       CloseFile              D:\temp\sqlctp6\tools\setup\sqlrun_bids.msi                SUCCESS             

136793  11:53:49.4923712 PM      setup100.exe    2812       WriteFile              C:\Program Files (x86)\Microsoft SQL Server\100\Setup Bootstrap\Log\20080313_2353\WOPR_20080313_2353_Detail_ComponentUpdateSetup.txt SUCCESS                Offset: 7,591, Length: 62

136795  11:53:49.4956628 PM      setup100.exe    2812       RegOpenKey                HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\Managed\S-1-5-21-2888934283-224128331-3030229123-1000\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4          NAME NOT FOUND         Desired Access: Read

136796  11:53:49.4957796 PM      setup100.exe    2812       RegOpenKey                HKCU\Software\Microsoft\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4      NAME NOT FOUND                Desired Access: Read

136797  11:53:49.4958199 PM      setup100.exe    2812       RegOpenKey                HKCR\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4 SUCCESS              Desired Access: Read

136798  11:53:49.4958553 PM      setup100.exe    2812       RegEnumValue                HKCR\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4 SUCCESS              Index: 0, Name: 16F57E00621A1EC4098B249205251FCA, Type: REG_SZ, Length: 2, Data:

136799  11:53:49.4958960 PM      setup100.exe    2812       RegOpenKey                HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\Managed\S-1-5-21-2888934283-224128331-3030229123-1000\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4          NAME NOT FOUND         Desired Access: Read

136800  11:53:49.4959398 PM      setup100.exe    2812       RegOpenKey                HKCU\Software\Microsoft\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4      NAME NOT FOUND                Desired Access: Read

136801  11:53:49.4959604 PM      setup100.exe    2812       RegCloseKey                HKCR\Installer\UpgradeCodes\87674BD65E9A5D1409951D671E37BDA4 SUCCESS             

136802  11:53:49.5146648 PM      setup100.exe    2812       RegOpenKey                HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\Managed\S-1-5-21-2888934283-224128331-3030229123-1000\Installer\Products\16F57E00621A1EC4098B249205251FCA        NAME NOT FOUND         Desired Access: Read

Now the installer gets past this and tries to install the engine and then fails, but I will call this progress ;).

Notice - I had deleted all of my physical files for SQL Server from the machine, so killing the registry keys seemed like a reasonable next step.  I can't promise you it's a good idea since I don't have things working yet, but I hope this helps the many of you who mailed me and found me via search engines.

Thanks,

Conor Cunningham
Friday, March 14, 2008 5:15:30 PM (Central Standard Time, UTC-06:00)  #    Comments [1]  | 
Wednesday, March 12, 2008
Well, I spun the wheel of database topics I have here in my room, and today I think I'll talk about updates and views.... specifically inserts through non-indexed views.  Since I haven't blogged about this previously, I'll start at the beginning.  There are many, many update topics, so don't feel left out - comment if there's an update topic that interests you and I'll get to it.

UPDATEs are challenging for different reasons than SELECT statements.  While a SELECT statement can have huge combinatorical challenges in terms of the number of different plan choices to consider, UPDATEs often have a relatively small number of plan choices but instead have a number of very difficult performance tradeoffs about which kind of plan tweaks will cause the plan to finish the fastest.  You can imagine that there is a labor tradeoff between making a system work really well for SELECT plan exploration and making a system that can handle all of the detailed tweaks for UPDATE plans - most things are neutral, but getting that last 10% of performance out in one area may impact the other's ability to innovate.  So, balance is required on the part of the database implementor.

So if I were building my first database engine and I wanted to update a base table (a "heap"), I may have the following:

create table z1 (col1 int, col2 decimal, col3 nvarchar(100));
insert into z1 (col1, col2, col3) values (1, 2, 'i like cheese')
So if I implement my storage engine with a series of equally sized pages, I can probably figure out how to load each of them into memory and look for a place to store a record that's a linearization of (1, 2, 'i like cheese') on disk. 

Let's start making things more complex.

create table z2 (col1 int primary key, col2 decimal, col3 nvarchar(100));
insert into z2 (col1, col2, col3) values (1, 2, 'i like cheese')
So if I add a primary key, this is implemented in SQL Server as a clustered index.  This replaces the heap.  So, now to build my own storage engine that does this I'd have a B-Tree implemented and I would find the right place in my B-Tree to insert my new record.  I may need to split some pages to make things fit.  Here's the plan in SQL Server:

  |--Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

This is nice because there is still just the one structure to manage.  I just load it up, insert my row, and I am a happy man.

Now, I'll blog some other day about what happens with multiple indexes and such - today I want to talk about inserts against views, as this has lots of nasty details in its implementation.

create view v1 as select col1, col2, col3 from z2 where col2 between 2 and 100
ok, so I've added a view against z2 that shows a subset of the rows.  That's not too bad, or is it?  Well, if I insert/update the base table, my plan is the same as before, so that's easy enough.  However, if I try to run an insert against the _view_, what should happen?  The view isn't stored anywhere.  However, in this case, there is a single table underneath it and the rows from the base table can be "mapped" up through the view, so perhaps I could reverse that operation and translate a request to update the view to be an update against the base table. 

insert into v1 (col1, col2, col3) values (1, 2, 'foo')

  |--Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

Hey, that's pretty neat - SQL Server performs an insert against the base table that looks very similar to what we saw before.  Very nice of them :).

Ok, let's talk about the case when the row doesn't meet the filter:
insert into v1 (col1, col2, col3) values (3, 250, 'foo')
  |--Clustered Index Insert(OBJECT:([t1].[dbo].[z2].[PK__z2__357D0D3E03317E3D]), SET:([t1].[dbo].[z2].[col1] = RaiseIfNullInsert([@1]),[t1].[dbo].[z2].[col2] = [Expr1003],[t1].[dbo].[z2].[col3] = [Expr1004]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(decimal(18,0),[@2],0), [Expr1004]=CONVERT_IMPLICIT(nvarchar(100),[@3],0)))

Hrm.  Well, that seems to work too... The ANSI committee got this far when working on their ANSI SQL specification, and they added a neat little keyword "with check option" that allows inserts, updates, and deletes against views to make sure that the resulting row would continue to be exposed through the view after the change before it is allowed.
create view v2 as select col1, col2, col3 from z2 where col2 between 2 and 100 with check option


insert into v2 (col1, col2, col3) values (1, 250, 'foo')

So if we run the same query as before against this view with "with check option" defined, we get this crazy looking plan:




What's all this then?  Well, some of this requires a few more operators of the day before I can fully explain the plan.  However, in this case it's doing the following:

1. create a dummy row that has the values you want to insert into it. 
2. insert it into the clustered index.
3. perform a check to see if this row matches the filter condition(s)
4. Assert that the check succeeded.  If not, fail the query and roll back the transaction.

All of that from a harmless little insert statement...

Well, let's do something else that is theoretically invertable and see how far this support goes:

create view v3 as select col1 + 1 as a, col2, col3 from z2 
insert into v3 (a, col2, col3) values (1, 250, 'foo')
Msg 4406, Level 16, State 1, Line 1
Update or insert of view or function 'v3' failed because it contains a derived or constant field.

So I claim that this could be supported as an updatable view type because the domain of col1 is integer and the scalar expression on it is invertable.  Unfortunately, none of the vendors really support this to my knowledge.  So, instead of it being a straight reference to a column in z2, we could create an expression that is the proper query needed to make the view v3 consistent by inserting a different row into z2. 

I've done two operators - basically the two easiest ones in any query processor.  However, lots of different operators can be supported and have some set of rules about invertability that can be used to perform inserts, updates, and deletes against them.

So that's scratching the surface on updates.  I'll try to post up a few more entries on different parts of the system, but I hope you've learned something today about how view updatability is implemented.

Thanks,

Conor
Wednesday, March 12, 2008 7:42:41 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Saturday, March 08, 2008
So while I hope that I've demonstrated that I can say some interesting things about database engine design and implementation, I also want to post blog entries on the design of data-driven applications, as this is a topic I've been pondering a lot lately.

I'll start with something that seems completely off-topic that isn't...

So for Christmas I bought my wife an "experience" - She's always liked Ms. Pac-Man, so I figured that I would find a way to let her play anytime she wants.  I also didn't want a big, bulky arcade machine in the house - those things give off lots of heat and moving them isn't much fun.

I eventually settled on buying her an X-Arcade Joystick, hooking that up to a HTPC connected to my Television, and getting MAME set up to play it on an emulator.  My wife also likes q-bert and is getting quite good at it ;).

My wife and I also have a 2-year old daughter who is already smarter than us.  While we were playing this game, my daughter enjoyed watching the "videos" (the intermission videos played after level 2, after level 5, etc).  She thought they were wonderful - but she also got upset when we couldn't get to a video.  Eventually I started getting into the 3rd map in the game (after the second video), which for all you too young to remember looks like this:




So this is the "cranberry" level, and my daughter has cried because I can't get to the  level when I play.  I never would have expected to have disappointed my daughter for not playing a video game well enough, but it goes to show you that her expectations _start_ in life at a much higher level than mine will ever be.

She believes that I should always be able to get to the cranberry level (and stay there for as long as she is remotely interested).  She also believes that she can watch any television show at any time - she's never had to live without Tivo... I know I can't go back, but to her it must seem like we're living in the trees and grunting or something.  She also believes that pocket calculators are telephones, and she tries very hard to call my parent's cat on my wife's calculator when she gets ahold of it.  It *is* very funny, but honestly how far off is it?  Most cell phones have some calculator built into them now.  Why not the other way around?

As I've been outside of Microsoft for a bit, I am fortunate to get to see things from a different perspective.  If you think about it, databases haven't really changed that much in awhile.  SQL has been around a long time, and the core features get better each release but the paradigm in the server doesn't change much at all.  The UI tools certainly haven't changed - Access 2008 or whatever it is called will probably look a LOT like Access95.  SQL Server Management Studio is still basically a listview of objects and a query window.  Database tools are an area where I am trying very hard to go back and think more like my daughter - why CAN'T I get access to all of my data with a single query?  Why can't I backup my data more easily?   Sure, there have been various advances (XML, LINQ, ER Queries, Hibernate, S3, that Cloud SQL Server that got announced last week), and some of them are very cool, but how many of those do I expect to be around in 20 years?  How many really make my life better?  If you take a moment and think about it, there's a lot of complexity in today's systems, and the often don't "just work" unless you spend a LOT of money getting someone to write custom software for you - even then, it might not work that well when you get done.

So, I'll challenge you - remember that day you got your first Tivo - remember the day before that.  What is the one thing that you're going to do to make something that cool?  If you write setup, how are you going to make that better?  If you write a database tool, why am I going to LOVE that tool like I would a Tivo?  Don't make me post up pictures of my daughter crying - I'll do it!  Get out there and make something cool.  We'll all be happier if you do.

Thanks,

Conor Cunningham
Saturday, March 08, 2008 4:12:10 PM (Central Standard Time, UTC-06:00)  #    Comments [1]  | 
So as you proceed up the river into the jungle, searching for answers about how the query optimizer works, I'll ask you one question: Did you know that there's actually a lot of stuff that the Optimizer team just tells you?  It's in the product, and I'm constantly suprised by how little attention they get.  You can learn all sorts of things by looking at the data.  Now, not all of it is documented, but that's usually just because one needs leeway to change the internals rather than some deep, dark secret that needs to be kept.

Now that you're hooked ;), I'll tell you a bit about an optimizer DMV called sys.dm_exec_query_optimizer_info.  It tells you all sorts of things about how your querys are optimized.  It's a bag of counters for all sorts of things that, with a trained eye, can give you lots of insight into what is happening.

This guy is actually documented, at least partially, on MSDN. 




(That's from my SQL 2008 install, btw).

Some of these fields are "undocumented".  search 0, 1, and 2 are in that category.  I won't talk about them except to say that the names aren't really obfuscated too much. 

To learn about a particular query, you find a nice, quiet server and:

1. select * from this table, store the results somewhere
2. optimize a query of interest,
3. select from this table again, then compare the current totals to the originals.

I think that this is one use of this DMV - trying to figure out why a query takes a long time to optimize.

The other use of the DMV is to get a good statistical picture of a running system.  Say that I'm a DBA and I want to know how many queries in my application have hints or _need_ hints to work well.  Well, this will tell you.  Granted, it doesn't separate recompiles from compiles, and if you have a system where plans are getting kicked out of the cache things may be a bit skewed, but I can tell you that this is far better than simply guessing.  Often the DB application developer doesn't realize that they've built an application that requires a lot of hinting or a lot of compilations, and you can see this in more detail than you get with the performance counters.

I've already talked about "trivial plans", which are not documented in this DMV but are widely known in the other outputs of the system.  I'll let you guys guess about the search 0, 1, and 2 stuff - if you can back up your guess with a public post, book, or other form of comment I'll confirm if you get it right.

Have a great weekend, ya'll.

Conor Cunningham
Saturday, March 08, 2008 3:41:56 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Thursday, March 06, 2008
I've been mostly working with a very kind soul from the SQL Server Installer Dev Team trying to fix my box.  So far my luck hasn't been too good, but I'll keep at it.

Things that are interesting to post if you have a blog and go through the same exercise that I did:
* which versions of CTP4/5? and CTP6 did you install - x86 or x64 - the registry keys are different.
For me, I think it was the x86 versions in both cases on my Vista Ultimate x64 box.

I learned that the version isn't really visible in the resource fork you can see for the setup.exe, which in hindsight is something I will do if I ever build another installer ;).

I'll write up some query stuff soon - send me a mail if there's a topic that causes you to lose some sleep and I'll see if there is anything interesting to say on it.

I'm also trying to put together a posting about my experiences with Ms. Pac-Man and my 2-year old daughter.

Conor

Thursday, March 06, 2008 8:55:41 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Sunday, March 02, 2008
After fighting off a cold all week, I've had some more time to go play with CTP6 on my secondary machine.

Today we'll do some experiments to see how the Optimizer picks plans for filtered indexes.  This will help you figure out how to build your database schemas and queries to take advantage of this new kind of index.

I was involved in the inception of the idea to build filtered indexes into SQL Server, but I didn't have anything to do with the implementation.  So, what I'll cover today is just me playing with a feature to see what it can do. 

I thought that it would be good to play with disjoint ranges and see whether they can be "covered" by a filtered index.  In my last blog post, I covered how a basic, single range case *is* covered by a filtered index when the predicates match up correctly.  In this post, I'll try multiple ranges to see what happens.

Same database setup as before (just a table with wide rows and enough rows to make index choice obvious to the optimizer):

use t1

create database t1
use t1
drop table t1
create table t1(col1 int, col2 nchar(2000), col3 time)

create index i1 on t1(col1) where col1 > 5 and col1 < 20



create index i1 on t1(col1) where col1 > 5 and col1 < 20
declare @p int
set @p =0
while @p < 20000
begin
insert into t1(col1) values (rand()*10000)
set @p=@p+1
end
create index i2 on t1(col1) where col1 > 25 and col1 < 40


OK I added a second index over a different range in the same column. This is not a case that the
SQL Server 2005 optimizer had to handle.

query 1: a query with a disjunction (OR) of ranges (BETWEEN/AND).
select col1 from t1 where (col1 > 5 and col1 < 20) or (col1 > 25 and col1 < 40)
|--Table Scan(OBJECT:([t1].[dbo].[t1]), WHERE:([t1].[dbo].[t1].[col1]>(5) AND [t1].[dbo].[t1].[col1]<(20) OR [t1].[dbo].[t1].[col1]>(25) AND [t1].[dbo].[t1].[col1]<(40)))

Hrm. Well, there doesn't seem to be code in CTP that does index unions for cases like this, unfortunately.
Hinting isn't an option either because the index hint code returns an error if the hinted index does not
completely cover the index. On my machine, I get:

(43 row(s) affected)

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 2661 ms.

That elapsed time is mostly reading pages off my IDE drive. To be clear, this is the _cold cache_ time,
which means that I ran "DBCC dropcleanbuffers" before my run. This throws out all of the buffer pool
pages and forces a read from disk. One assumption in the costing model for the SQL Server QP is that the
pages are _not_ in memory already (and there are cases where this becomes interesting). For this example,
it just means that we have a big table scan with lots of IOs to do (and is therefore slow).

Second example: IN lists
select col1 from t1 where col1 in (6, 7, 8, 26)
|--Table Scan(OBJECT:([t1].[dbo].[t1]), WHERE:([t1].[dbo].[t1].[col1]=(6) OR [t1].[dbo].[t1].[col1]=(7) OR [t1].[dbo].[t1].[col1]=(8) OR [t1].[dbo].[t1].[col1]=(26)))

Well, this is consistent.  IN is generally just a list of ORs as far as the QP is concerned.  Runtime for this query should be very similar to what we saw in our last example:

(7 row(s) affected)

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 2708 ms.

OK, so what's an enterprising SQL developer to do?  Well, you can rewrite the query as a UNION ALL as long as you know that the ranges are disjoint and cover the predicate properly.
select col1 from t1 with (index=i1) where (col1 > 5 and col1 < 20) 
union all 
select col1 from t1 with (index=i2) where (col1 > 25 and col1 < 40)
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

(43 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

That's a lot nicer.

I have no idea if this is something that they intend to add for this release of the product.  I'm not really even sure how common this is.  How often do you want to apply a disjunction of ranges to multiple partial indexes where you don't want to create an index over the whole table?  No clue.  If you have such an example, I'd love to hear about it - drop me a line at conor@sqlskills.com

In the meantime, this is just a post on something interesting I learned while playing with the new CTP6 for SQL Server 2008 that I thought you might like to know.

Happy Querying!

Conor Cunningham

Sunday, March 02, 2008 3:05:32 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Monday, February 25, 2008
(I got CTP 6 running on another machine, so I'm working except for parallel plans now since it's only a single-proc.  You guys will have to wait on that operator of the day article for now :)

Kudos to my former teammembers for getting filtered indexes into CTP6.  It's pretty darn neat, and I'll show you a few tidbits that are interesting.

So, filtered indexes are not really a new "feature" in some sense.  You can do everything in a filtered index with an indexed view.  Of course, indexed view matching is only supported in the Enterprise edition of the software, so perhaps not everyone has seen those benefits. 

Indexed views are a tricky feature - generalized tree pattern matching is hard (read: CPU expensive), and if you've looked in the BOL the list of restrictions is so long that reminds me of filling out tax forms.  However, the other side effect that occurs when you have indexed views is that the optimizer has to go through additional phases of its search in order to apply them.  The optimizer has a couple of buckets of rules it can run, and most plans find a nice plan in the first or second round of rules.  Generalized indexed view matching is restricted to the last set of buckets, which usually means that there are a lot more rules that have to run before that view can be matched.  The bottom line is that the compilation cost isn't bad for a single query (usually), but it's the sort of thing that can make the difference between an application that can be used in UI response-time requirements or not.

Enter the filtered index.  This is a recognition that there are a lot of indexed views that don't need tons and tons of fancy equations, joins, etc.  These are single-tabled views that have simple predicates.  Once you get into this ballpark, you can bolt this on to the super-efficient index matching code and you can enable a whole new class of application from what you could build previously.  This is why I am excited about this feature.

I haven't looked to see where this feature will fall into the SKU matrix yet, and I'm sure that they're still pondering that very question.  However, you guys should play with this on CTP6 - it's nifty!

So, first things first.  Let's build one of these guys and see if we can match it:

create database t1
use t1
drop table t1
create table t1(col1 int, col2 nchar(2000), col3 time)

create index i1 on t1(col1) where col1 > 5 and col1 < 20
declare @p int 
set @p =0
while @p < 20000
begin
insert into t1(col1) values (rand()*10000)
set @p=@p+1
end
ok so I've filled this table up with a lot of useless data and created a nice little filtered index.
select * from t1  where col1 > 5 and col1 < 20
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
       |--Compute Scalar(DEFINE:([Expr1006]=BmkToPage([Bmk1000])))
       |    |--Index Scan(OBJECT:([t1].[dbo].[t1].[i1]))
       |--RID Lookup(OBJECT:([t1].[dbo].[t1]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

select * from t1  where col1 > 5 and col1 < 10
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]) OPTIMIZED)
       |--Compute Scalar(DEFINE:([Expr1006]=BmkToPage([Bmk1000])))
       |    |--Index Seek(OBJECT:([t1].[dbo].[t1].[i1]), SEEK:([t1].[dbo].[t1].[col1] < (10)) ORDERED FORWARD)
       |--RID Lookup(OBJECT:([t1].[dbo].[t1]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

so in the first example I've created a query against the table that directly matches the index condition.  It matches and even generates an index scan (slightly faster since it doesn't have to navigate down the b-tree to start returning rows).

The second example is interesting because is a proper subset of the index.  The indexed view matching code can do subsumption as well.  It generates a seek in this case and returns only part of the filtered index.

Both cases do RID lookups back to the heap since I did SELECT * and the index is not "covering".

The other data point I'll leave you with this evening is that the showplan_xml has something interesting in it:

StmtSimple StatementText="select * from t1 where col1 > 5 and col1 < 10" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="0.0474183" StatementEstRows="7" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" ParameterizedText="(@1 tinyint,@2 tinyint)SELECT * FROM [t1] WHERE [col1]>@1 AND [col1]<@2">


Well, there are two interesting things.  First, StatementOptmLevel=FULL means that we didn't get a trivial plan.  There is a parameterized text field, but I don't think that is being used here.  these two queries are showing up as different plans in the plan cache (and obviously they have different plans). So, absent forced autoparam, I don't think that trivial plan is working on these guys.  This makes sense since there is a cost-based plan choice.

That's it for me tonight guys.  Happy Querying!

Conor

PS: Thanks for all of the comments and posts trying to help me get back to a working state.  I've tried many of the suggestions without luck, but I'm going to try playing with the orca tool, perhaps tomorrow or so, to see if I can get my main machine back up and working.

Monday, February 25, 2008 9:35:45 PM (Central Standard Time, UTC-06:00)  #    Comments [5]  | 
Sunday, February 24, 2008
oh well.  I've spent a fair amount of my day attempting to manually hack out a lot of the various registry keys associated with SQL Server 2008 in the hopes of upgrading from the November 2007 CTP to the Feb 2008 CTP on my main machine (Vista x64 box).  Since I received questions about whether I've gotten this to work, I figure I'd blog about where I am so you can compare with your own experiences.

Bottom line - No luck.  Here's the error I'm at now:

D:\temp\sqlctp6\servers>setup
The following error occurred:
MsiGetProductInfo failed to retrieve ProductVersion for package with Product Code = '{00E75F61-A126-4CE1-90B8-42295052F1AC}'. Error code: 1605.
Error result: 0xEEBAE95A
Result facility code: 0x6BA
Result error code: 0xE95A

I've learned a number of things - specifically that the non-SQL Server core Engine pieces have TONs of registry keys.  It seems that they took a shotgun and reserved a fraction fo the registry - SQLDMO and the DTS engine components seemed to be big offenders.

Either way, I'm getting blocked by my old friend the Windows Installer aka MSI.  Funnily enough, I've killed every key like this in the registry.  I'm also watching (via procmon.exe) and it's not asking for a registry key like this.  So, I guess MSI has stored something away in another data store and I haven't been able to find that to clean it up.  I've never had much patience for that thing.

I guess I'm in a bit of a reflective mood right now, as I am basically unable to install a product I had some hand in helping to write.  Certainly it's a beta, and I'm not upset that it has bugs in the installer per se - I'm willing to accept that.  What I'm less willing to accept is that the system can get into a state that is so obviously *&%$$%ed up and from which there is no real recovery path.

Let's look at where things failed here:
1. MSI storing information outside of the registry that impacts whether it runs or not (ie whether I can install or not).  No documentation on what to do when things get away from the good case.
2. GUIDs - customers don't care about them.. I don't care about them.  I can't remember numbers that long.  Why am I dealing with this?  I'm happy if you give me a number that I can pass to some other program, but this one seems not so useful.
3. The error message is not "actionable".  What do I do to fix this?  Where should I read additional information?
4. The "Error result" in this example.  Now, most people would have given up and called support... Instead I said "I should read up on this API and see what it says".  So I google for it (because loading help takes SO darn long and it asks me all sorts of privacy policy and other nonsense before I can get what I want AND it's not installed, of course ;) and I find the MSDN page on it - great.  It says nothing about this.
5. On the MSDN page, the errors are only listed using the C header literals - no hex codes on the page. grrr.  The installer didn't translate it for me back to the constant in the help page.
6. So I fire up Visual Studio 2008 and go to the error lookup tool, and I enter this code.  No good - it's not listed anywhere.  That seems bad.
7. I go back to the MSDN page, figure out that I should look in MSI.H.  Luckily, I have the Vista Platform SDK installed, so I go find that file.  Then I determine it's not even listed there either.
8. google on the error code by itself yields nothing of interest

So I'm left asking "Why is all of this so hard?".  The bottom line is that it SHOULDN'T be. 

Application design is an important aspect of any system.  As systems get more complex, it obviously becomes more challenging to make everything work.  It's important to determine the various use cases for your program, whatever it is, and then determine how you are going to manage the customer experience through all the steps of that process.  Here, the installation process failed on multiple levels and through multiple components.  The experience should be "your installation failed.  Here's how you can clean it up".  Even if that requires manual steps (it *is* a beta), there should be a path and at the end of that path should be a happy customer with a working computer installation.

I'm obviously left with options like "reinstall my OS", "install a VM" (which I won't do because I play games on this machine), or "find another machine for CTP6", which may be what I end up doing.

Conor finds himself longing for the days of .ini files all of a sudden.  he also apparently finds himself speaking in the third person about himself ;).  the old days of INI files would have been pretty easy to fix a failed install - "delete this directory and reinstall".

I hope this helps the others having trouble with CTP6 installation.  I'll see what I can do tomorrow to get a machine set up with SQL 2008 CTP 6.

Conor Cunningham

Sunday, February 24, 2008 8:35:34 PM (Central Standard Time, UTC-06:00)  #    Comments [6]  | 
Saturday, February 23, 2008
Well, my dev box is in a bad state now - can't install the new CTP and I had to start manually hacking on the old one.

I may have to install SQL Server 2008 CTP 8 on a VM or on another machine.

So, I apologize for delays in posting up examples to answer your questions.

The lesson of the day is that one should just not trust installers anymore ;).

Conor

Saturday, February 23, 2008 10:35:05 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Thursday, February 21, 2008
One of the earlier comments I received asked about costing of T-SQL and CLR functions.  More specifically, they want to know how they are costed and why it is so low.

Let's create a motivating example.  I'll create a table with some size to it (I add a binary(2000) column to make each page take at least 2000 bytes, and this means that about 4 rows fit on a page.  So, if I add 100,000 rows to a table, we're talking about enough IOs to show up in the costing functions).
use t1

create table t3(col1 int, col2 binary(2000))
insert into t3(col1, col2) values (2, 0x1234)
insert into t3(col1, col2) values (3, 0x1234)
declare @p int
set @p=0
while @p < 100000
begin
insert into t3(col1, col2) values (rand()*1000, 0x1234)
set @p=@p+1
end


create function f3(@p1 int) RETURNS int
BEGIN
  return (select count(col1) from t3 where col1=@p1)
END

select dbo.f3(2)
(I apologize for my lack of real-world examples.. After you've done this a few thousand times you resort to the shortest names you can type ;).

ok, so we have a table and we create a T-SQL scalar function that just runs an aggregate to find the number of occurances of a particular value in the table.   Since there are no indees on t3 this will be a table scan and it will read lots of pages into the buffer pool.

So let's look at the costing output for each of the elements of this query.





Above is the plan for the base query (select dbo.f3(2)).  It emits a dummy row from a constant table and then does a "compute scalar" operation to run the function and generate a 1 row, 1 column result.

That cost doesn't look so big...

Now let's look at the plan for the scalar subquery (well, the top of it since we're looking for the total cost)






Well, the laws of physics appear to be violated. 

You can try whatever cost query you want in the function - unfortunately, these aren't really costed in the SQL Server QP (at least through 2005 and what I have seen so far on the 2008 CTPs).  Back when SQL Server 7 was being created, scalars were generally always cheap and they really didn't need costs.  There's a small amount of CPU cost added in a compute scalar and that generally worked for OLTP plans.  As SQL Server started being used in larger and larger scenarios, this issue started to manifest.

As SQL Server has added more features, some of them can be arbitrarily expensive and this issue can show up in plan selection as a problem.  Areas where you are likely to see this are T-SQL functions, CLR functions, string functions on varchar(max) and friends, and I'm sure that there are others.  However, it's worth noting that this won't be a problem in a lot of cases, and often workarounds are possible that may even be better design choices than doing what I've done in this example.  I don't recommend that you "hide" subqueries in scalar functions like this precisely because the optimizer can't see and cost them properly, even if they are as simple as this one.

There are a few tricky problems in "fixing" this (costing the scalar functions in line with their runtime).  First, this isn't exactly the sort of thing that people are going to beat down the gates at Microsoft to get fixed.  However, it is an area where the model doesn't work right, and when a customer does hit this they are often very far into a deployment or POC and can't exactly redesign their application easily.  Second, there are actually lots of really complex things that happen with the placement of computescalars and how they are evaluated in the optimizer and execution engine.  Finally, the SQL Server QP is set up to do relational transformations (A join B is equivalent to B join A, etc.), and the computation fo scalars isn't really a true relational operation in this algebra.  This makes true costing of compute scalars always something that is different than costing a join or a filter.  Third, changing the costing of such a basic operator will probably impact the plan selection for basically EVERY query, as their costs will change slightly - that's a big risk to fix a "little" bug. The last tricky bit in fixing this has to do with the way that T-SQL functions like this get exposed into the optimizer.  Not all queries really have known costs a priori.  The example function that I've given you does have a fixed cost, under some definition, but if I start putting procedural logic into the function, then the actual executed path depends on the runtime data, and that means that the cost is based on something you haven't interpreted yet.  So, there are some at least reasonable explanations as to why this issue persists to this day.

So my advice is that if you have cases where your scalar functions are undercosted and you think it impacts plan selection, then you need to contact Microsoft and let them know.  I have seen some cases of this, but I think that it's perhaps not the most common problem and that might have some impact on whether it gets addressed in the future.  I *guarantee* you that the people who work there are highly skilled and passionate about solving customer issues, as I worked with them, so I suspect that if there's enough squeaky wheels then this is something that may be addressed.

Your other option is to send me money, I guess ;).

I hope that gives you a bit of background on the issues you need to know when examining your query plans when you have expensive scalars in them.

Conor Cunningham




Thursday, February 21, 2008 9:03:40 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Tuesday, February 19, 2008
ok so I haven't written anything about statistics yet on the blog, and Paul fielded a question from someone asking about details of the statistics implementation in SQL Server.  Luckily, I know a little bit about that.

So, this post will cover the published background material that I think everyone should read about SQL Server's statistics implementation, and then we can have a more detailed conversation about the nitty gritty fun details of them.

So this was written for SQL 2000 but the basic details apply to SQL Server 2005 and 2008.  I'd read this first.
http://msdn2.microsoft.com/en-us/library/aa902688(SQL.80).aspx

There was an additional white paper written for SQL Server 2005 that covers the new features that were added.  Lots of extensions where we refined the features from 2000 and made them work more effectively.
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx


If you have questions about statistics, feel free to post them up.  If you wish to question statistics, then well I guess you should start with DBCC SHOW_STATISTICS. 

Conor Cunningham
Tuesday, February 19, 2008 7:43:49 PM (Central Standard Time, UTC-06:00)  #    Comments [1]  | 
Monday, February 18, 2008
So I like kung-fu movies.  I don't care how bad the acting is, as long as there is some master/relative/friend who has been killed who needs avenged.  Often good movies are made this way.

In software, a good idea often comes out of getting tired of seeing bugs on a sore subject, often not even your fault.  So, in a fit of rage, I locked myself in my office for a few weeks and came up with the streaming table-valued function, since used as the proverbial duct-tape within the QP for all sorts of unforseen features.  However, it has a "walking on rice paper without leaving footprints" vibe, so I will tell you a bit about this guy ;).

(I did the QO part for this operator and a chunk of the architecture for it - there were others who worked on it too, however).

So a query operator takes rows in the bottom and spits rows out the top (yes, none of the exposed query plan graphs are drawn this way, but that's how people talk about it ;).  Some operators take more than one input - most of the ones with 2 inputs are called "joins".  Some can take an arbitrary number of inputs (usually called UNIONs).  Some take parameters (Compute Scalar).  most don't take parameters, however.

After you've done the 10 or so common operators in a query processor, there's a lot of specialty problems that don't really fit well into those operators.  So people end up trying all sorts of crazy paths to get the data they need out to customers.  Exuberence can often cause a few bugs, however, so eventually something needed to be done.

So, the "Streaming Table Valued Function" (STVF) is a relational operator (takes in rows and passes out rows).  In this case, it takes in some number of scalar parameters (like any good function) and spits out zero or more rows.  The beauty of this operator is that anyone can program anything to get returned, as long as they return back rows - it's like a user-defined operator for internal folks to write features for customers.  Amazingly, there are a lot of these, and you can see them in Dynamic Management Views (amongst others).

So, when you are looking at the plans in the plan cache or some other DMV, you are using this nifty operator:

SELECT usecounts, cacheobjtype, objtype, text 
FROM sys.dm_exec_cached_plans 
CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
WHERE usecounts > 1 
ORDER BY usecounts DESC;
(actually you are using it _twice_ in this case, as there are two different sys.dm* references).

  |--Nested Loops(Inner Join, OUTER REFERENCES:(SYSDMEXECCACHEDPLANS.[plan_handle]))
       |--Sort(ORDER BY:(SYSDMEXECCACHEDPLANS.[usecounts] DESC))
       |    |--Filter(WHERE:(SYSDMEXECCACHEDPLANS.[usecounts]>(1)))
       |         |--Table-valued function
       |--Table-valued function

In this plan, you can see that rows are pulled from one TVF, are filtered, then sorted, then passed into the second TVF.

The "OUTER REFERENCE" is the key that the loops join is passing something from the inner side to the outer side (remember in the crazy syntax used here, the first row below the Nested Loops is the "inner" and the second line is the "outer".  So, the Sort/Filter/TVF is the "inner" in this case.

These things differ from the traditional TVFs that you could create in SQL Server as a user.  Those were actually backed by temporary tables in many cases, and this made it a pretty heavy-weight solution if you just want to pass a few rows into the server.  _Streaming_ sounds a lot better than that, you must admit.

Many operators in a QP are streaming - filter is, as is a table scan.  However a sort operator is called a "stop-and-go" operator because it generally consumes all of its input rows before producing the first output row.  Streaming TVFs would fall into the "not stop-and-go" category.

This operator is used heavily starting in SQL 2005 in lots and lots of places.  It is not really something you can directly reference (there is no exposed syntax for this operator), and as such it requires a bit of hunting to track down the various places it gets used.

I'll leave it as an exercise for you guys to find a few non-DMV places that use STVFs.  Any TVF that takes arguments is likely an STVF.  So go look at those query plans, folks!  First one who tells me a non-DMV/DMF STVF feature gets a star next to their name! 

I also know that there is at least one new feature in SQL Server 2008 that uses the STVF.

Happy querying!

Conor Cunningham

Monday, February 18, 2008 9:18:49 PM (Central Standard Time, UTC-06:00)  #    Comments [4]  | 
Saturday, February 16, 2008
The blog was down a bit today - sorry for all who got to see CLR stack traces pushed through HTTP (I *will* do a post on that sometime ;)...


A reader sent in a question about my previous RAND() post (here) and wanted to know if other functions were pre-evaluated.  The specific example was about DATEADD(), but my answer will apply to most functions.

The short answer is "DATEADD() is not pre-evaluated once per query in the same way as RAND() is".

Most functions in SQL Server are evaluated once per row, just as you'd expect.  Even most non-deterministic functions are executed once per row.  The key reason for this is that often these functions are run over column data from a row in a table, and the value can change for each row.

So, this dateadd function is executed once per row:
select DATEADD("dd", 5, col1) from t1
Here's part of the showplan_xml output for this query
            <ColumnReference Column="Expr1004" />
                    <ScalarOperator ScalarString="dateadd(day,(5),[t1].[dbo].[t1].[col1])">
                      <Intrinsic FunctionName="dateadd">
                        <ScalarOperator>
                          <Const ConstValue="(4)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(5)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[t1]" Schema="[dbo]" Table="[t1]" Column="col1" />
                          </Identifier>
                        </ScalarOperator>
                      </Intrinsic>
                    </ScalarOperator>

And here is what you see in the showplan_xml for RAND():
select RAND() from t1
                    <ScalarOperator ScalarString="rand()">
                      <Identifier>
                        <ColumnReference Column="ConstExpr1005">
                          <ScalarOperator>
                            <Intrinsic FunctionName="rand">
                              <ScalarOperator>
                                <Const ConstValue="" />
                              </ScalarOperator>
                            </Intrinsic>
                          </ScalarOperator>
                        </ColumnReference>
                      </Identifier>
                    </ScalarOperator>

Now, those of you paying close attention will notice that the _name_ of the column is "ConstExprXXXX" instead of "ExprXXXX".

I don't think that I'd rely on the name meaning much, and I don't even recall if there are other things that use ConstExprXXXX names.  It might be a good indicator that you are getting warmer if you are looking for which functions are executed once at the start of the query.

If you use the query patterns that I posted and that others posted in comments, you can determine what is happening for each function you wish to investigate.

You can probably look through the list of public functions (or get fancy and run strings.exe on the sql server binary) to look for function names that you can evaluate to see if they are showing up as ConstExpr or non-ConstExpr.  One that comes to mind is getdate() - it's non-deterministic and you likely don't want the results to change during a long-running query.  I'll leave you guys to go find any others - I don't believe that there lots of these types of functions.

As a side note for all you readers out there, I won't be talking about internal details of the system beyond what is published or what is discussed publicly.  So, you won't be hearing from me about SQL Server's secret plans for including bullhorns in every box of SQL Server or how the logging feature is really made from Area 51 technologies.  However, I will tell you guys the basics of database and QP design and show you the *external* places where you can look to see what information Microsoft makes available to you.  Most of the time I plan to point you to some experiment you can run that will tell you, with a little context, enough to answer whatever question you have.

Have a great weekend all - I'm back to painting a room in my house.

Conor Cunningham

Saturday, February 16, 2008 5:36:41 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Wednesday, February 13, 2008
I've been playing with the new grouping sets feature now that it's in a CTP, partially because I had a small hand in them before I left Microsoft, but partially because I'm trying to re-examine features from a more external perspective.  So, I figure this is a good conversation starter.

So, as all programmers are lazy, I started with my former collegue Craig's blog post on the subject (http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx).  Craig's a query execution guy by training (and a darn good one), so many of Craig's blog posts are pretty complimentary to my background.  Anyways, his post doesn't talk about the various plans.  Part of the reason is that GROUPING SETS contains a fair amount of syntactic sugar.  It's a feature that lets you write CUBEs and ROLLUPs in a different way. It's slightly more general and powerful, but I'm still trying to get my head around why Oracle/IBM pushed to get this added to the ANSI spec.  You can do most of this with CUBE + filters (and a good query optimizer).

So let's talk about cube and rollup plans from an outsider's perspective, as I haven't seen anyone write about this in the depth that I would like.

If I take a the rollup query from Craig's example in SQL Server 2008's November CTP, I can see the following plan:



ok, two aggregates - what's up with that?  Well, if you look at the properties for the second aggregate, you can see that there is an extra bit of information that tells you that it's actually not a classical aggregate operator:



ok, so this guy does some special magic but it is represented in the external query plan as an aggregate function evaluator.  It just adds the extra rows that you see in a ROLLUP query. 

The key part of ROLLUPs is that they can be computed in one linear pass of the data (sorted appropriately).

CUBE plans are a little trickier:


So the QP conatenates a number of different branches of operations, and it actually ends up being a way to compute the matrix of different dimensions - if you have more columns in your cube query, you'll get more branches.  If you look at the second "stream aggregate" in a branch with 2, you'll see the rollup information just like in the rollup example. 

So CUBE is a fancier form of ROLLUP.  Obviously it has one sort per dimension, which is EXPENSIVE on big tables.  (You could have indexes for each branch, which could make things faster, assuming you have the disk space for the indexes). 

In SQL 2005, the plan is slightly different for Craig's example:



This plan is great if the size of the data going through the CUBEing operation is much smaller than the size of the table.  In this case, it writes the data to a temporary spool, sorts the output of that, and then re-reads that spool in the second branch.  I don't recall if SQL Server 2005 generated the SQL 2008 plan or not, but the SQL 2008 plan is potentially more parallelizable since each branch is independent.  Note that reading the base table twice could open you up to more interesting locking, but this can happen in any query with multiple scans of the same table if your isolation level is too low.  I haven't tried to create a parllel CUBE plan on SQL 2008, but I'd guess it is possible...

ok, so now we can add in GROUPING SETS.  These are essentially CUBEs with the ability to prune branches you don't need.  This can get more interesting when you have lots of columns on which you are CUBEing, but the example from Craig that I'm using here doesn't do much beyond CUBE.

Technically, one could try to start filtering NULLs from the result of a CUBE and perhaps the query optimizer could work backwards to deduct that it didn't need to compute various branches of a CUBE like GROUPING SETs can do.  Obviously it's easier when the cust