Saturday, February 23, 2008
Well, my dev box is in a bad state now - can't install the new CTP and I had to start manually hacking on the old one.

I may have to install SQL Server 2008 CTP 8 on a VM or on another machine.

So, I apologize for delays in posting up examples to answer your questions.

The lesson of the day is that one should just not trust installers anymore ;).

Conor

Saturday, February 23, 2008 10:35:05 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Thursday, February 21, 2008
One of the earlier comments I received asked about costing of T-SQL and CLR functions.  More specifically, they want to know how they are costed and why it is so low.

Let's create a motivating example.  I'll create a table with some size to it (I add a binary(2000) column to make each page take at least 2000 bytes, and this means that about 4 rows fit on a page.  So, if I add 100,000 rows to a table, we're talking about enough IOs to show up in the costing functions).
use t1

create table t3(col1 int, col2 binary(2000))
insert into t3(col1, col2) values (2, 0x1234)
insert into t3(col1, col2) values (3, 0x1234)
declare @p int
set @p=0
while @p < 100000
begin
insert into t3(col1, col2) values (rand()*1000, 0x1234)
set @p=@p+1
end


create function f3(@p1 int) RETURNS int
BEGIN
  return (select count(col1) from t3 where col1=@p1)
END

select dbo.f3(2)
(I apologize for my lack of real-world examples.. After you've done this a few thousand times you resort to the shortest names you can type ;).

ok, so we have a table and we create a T-SQL scalar function that just runs an aggregate to find the number of occurances of a particular value in the table.   Since there are no indees on t3 this will be a table scan and it will read lots of pages into the buffer pool.

So let's look at the costing output for each of the elements of this query.





Above is the plan for the base query (select dbo.f3(2)).  It emits a dummy row from a constant table and then does a "compute scalar" operation to run the function and generate a 1 row, 1 column result.

That cost doesn't look so big...

Now let's look at the plan for the scalar subquery (well, the top of it since we're looking for the total cost)






Well, the laws of physics appear to be violated. 

You can try whatever cost query you want in the function - unfortunately, these aren't really costed in the SQL Server QP (at least through 2005 and what I have seen so far on the 2008 CTPs).  Back when SQL Server 7 was being created, scalars were generally always cheap and they really didn't need costs.  There's a small amount of CPU cost added in a compute scalar and that generally worked for OLTP plans.  As SQL Server started being used in larger and larger scenarios, this issue started to manifest.

As SQL Server has added more features, some of them can be arbitrarily expensive and this issue can show up in plan selection as a problem.  Areas where you are likely to see this are T-SQL functions, CLR functions, string functions on varchar(max) and friends, and I'm sure that there are others.  However, it's worth noting that this won't be a problem in a lot of cases, and often workarounds are possible that may even be better design choices than doing what I've done in this example.  I don't recommend that you "hide" subqueries in scalar functions like this precisely because the optimizer can't see and cost them properly, even if they are as simple as this one.

There are a few tricky problems in "fixing" this (costing the scalar functions in line with their runtime).  First, this isn't exactly the sort of thing that people are going to beat down the gates at Microsoft to get fixed.  However, it is an area where the model doesn't work right, and when a customer does hit this they are often very far into a deployment or POC and can't exactly redesign their application easily.  Second, there are actually lots of really complex things that happen with the placement of computescalars and how they are evaluated in the optimizer and execution engine.  Finally, the SQL Server QP is set up to do relational transformations (A join B is equivalent to B join A, etc.), and the computation fo scalars isn't really a true relational operation in this algebra.  This makes true costing of compute scalars always something that is different than costing a join or a filter.  Third, changing the costing of such a basic operator will probably impact the plan selection for basically EVERY query, as their costs will change slightly - that's a big risk to fix a "little" bug. The last tricky bit in fixing this has to do with the way that T-SQL functions like this get exposed into the optimizer.  Not all queries really have known costs a priori.  The example function that I've given you does have a fixed cost, under some definition, but if I start putting procedural logic into the function, then the actual executed path depends on the runtime data, and that means that the cost is based on something you haven't interpreted yet.  So, there are some at least reasonable explanations as to why this issue persists to this day.

So my advice is that if you have cases where your scalar functions are undercosted and you think it impacts plan selection, then you need to contact Microsoft and let them know.  I have seen some cases of this, but I think that it's perhaps not the most common problem and that might have some impact on whether it gets addressed in the future.  I *guarantee* you that the people who work there are highly skilled and passionate about solving customer issues, as I worked with them, so I suspect that if there's enough squeaky wheels then this is something that may be addressed.

Your other option is to send me money, I guess ;).

I hope that gives you a bit of background on the issues you need to know when examining your query plans when you have expensive scalars in them.

Conor Cunningham




Thursday, February 21, 2008 9:03:40 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Tuesday, February 19, 2008
ok so I haven't written anything about statistics yet on the blog, and Paul fielded a question from someone asking about details of the statistics implementation in SQL Server.  Luckily, I know a little bit about that.

So, this post will cover the published background material that I think everyone should read about SQL Server's statistics implementation, and then we can have a more detailed conversation about the nitty gritty fun details of them.

So this was written for SQL 2000 but the basic details apply to SQL Server 2005 and 2008.  I'd read this first.
http://msdn2.microsoft.com/en-us/library/aa902688(SQL.80).aspx

There was an additional white paper written for SQL Server 2005 that covers the new features that were added.  Lots of extensions where we refined the features from 2000 and made them work more effectively.
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx


If you have questions about statistics, feel free to post them up.  If you wish to question statistics, then well I guess you should start with DBCC SHOW_STATISTICS. 

Conor Cunningham
Tuesday, February 19, 2008 7:43:49 PM (Central Standard Time, UTC-06:00)  #    Comments [1]  | 
Monday, February 18, 2008
So I like kung-fu movies.  I don't care how bad the acting is, as long as there is some master/relative/friend who has been killed who needs avenged.  Often good movies are made this way.

In software, a good idea often comes out of getting tired of seeing bugs on a sore subject, often not even your fault.  So, in a fit of rage, I locked myself in my office for a few weeks and came up with the streaming table-valued function, since used as the proverbial duct-tape within the QP for all sorts of unforseen features.  However, it has a "walking on rice paper without leaving footprints" vibe, so I will tell you a bit about this guy ;).

(I did the QO part for this operator and a chunk of the architecture for it - there were others who worked on it too, however).

So a query operator takes rows in the bottom and spits rows out the top (yes, none of the exposed query plan graphs are drawn this way, but that's how people talk about it ;).  Some operators take more than one input - most of the ones with 2 inputs are called "joins".  Some can take an arbitrary number of inputs (usually called UNIONs).  Some take parameters (Compute Scalar).  most don't take parameters, however.

After you've done the 10 or so common operators in a query processor, there's a lot of specialty problems that don't really fit well into those operators.  So people end up trying all sorts of crazy paths to get the data they need out to customers.  Exuberence can often cause a few bugs, however, so eventually something needed to be done.

So, the "Streaming Table Valued Function" (STVF) is a relational operator (takes in rows and passes out rows).  In this case, it takes in some number of scalar parameters (like any good function) and spits out zero or more rows.  The beauty of this operator is that anyone can program anything to get returned, as long as they return back rows - it's like a user-defined operator for internal folks to write features for customers.  Amazingly, there are a lot of these, and you can see them in Dynamic Management Views (amongst others).

So, when you are looking at the plans in the plan cache or some other DMV, you are using this nifty operator:

SELECT usecounts, cacheobjtype, objtype, text 
FROM sys.dm_exec_cached_plans 
CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
WHERE usecounts > 1 
ORDER BY usecounts DESC;
(actually you are using it _twice_ in this case, as there are two different sys.dm* references).

  |--Nested Loops(Inner Join, OUTER REFERENCES:(SYSDMEXECCACHEDPLANS.[plan_handle]))
       |--Sort(ORDER BY:(SYSDMEXECCACHEDPLANS.[usecounts] DESC))
       |    |--Filter(WHERE:(SYSDMEXECCACHEDPLANS.[usecounts]>(1)))
       |         |--Table-valued function
       |--Table-valued function

In this plan, you can see that rows are pulled from one TVF, are filtered, then sorted, then passed into the second TVF.

The "OUTER REFERENCE" is the key that the loops join is passing something from the inner side to the outer side (remember in the crazy syntax used here, the first row below the Nested Loops is the "inner" and the second line is the "outer".  So, the Sort/Filter/TVF is the "inner" in this case.

These things differ from the traditional TVFs that you could create in SQL Server as a user.  Those were actually backed by temporary tables in many cases, and this made it a pretty heavy-weight solution if you just want to pass a few rows into the server.  _Streaming_ sounds a lot better than that, you must admit.

Many operators in a QP are streaming - filter is, as is a table scan.  However a sort operator is called a "stop-and-go" operator because it generally consumes all of its input rows before producing the first output row.  Streaming TVFs would fall into the "not stop-and-go" category.

This operator is used heavily starting in SQL 2005 in lots and lots of places.  It is not really something you can directly reference (there is no exposed syntax for this operator), and as such it requires a bit of hunting to track down the various places it gets used.

I'll leave it as an exercise for you guys to find a few non-DMV places that use STVFs.  Any TVF that takes arguments is likely an STVF.  So go look at those query plans, folks!  First one who tells me a non-DMV/DMF STVF feature gets a star next to their name! 

I also know that there is at least one new feature in SQL Server 2008 that uses the STVF.

Happy querying!

Conor Cunningham

Monday, February 18, 2008 9:18:49 PM (Central Standard Time, UTC-06:00)  #    Comments [4]  | 
Saturday, February 16, 2008
The blog was down a bit today - sorry for all who got to see CLR stack traces pushed through HTTP (I *will* do a post on that sometime ;)...


A reader sent in a question about my previous RAND() post (here) and wanted to know if other functions were pre-evaluated.  The specific example was about DATEADD(), but my answer will apply to most functions.

The short answer is "DATEADD() is not pre-evaluated once per query in the same way as RAND() is".

Most functions in SQL Server are evaluated once per row, just as you'd expect.  Even most non-deterministic functions are executed once per row.  The key reason for this is that often these functions are run over column data from a row in a table, and the value can change for each row.

So, this dateadd function is executed once per row:
select DATEADD("dd", 5, col1) from t1
Here's part of the showplan_xml output for this query
            <ColumnReference Column="Expr1004" />
                    <ScalarOperator ScalarString="dateadd(day,(5),[t1].[dbo].[t1].[col1])">
                      <Intrinsic FunctionName="dateadd">
                        <ScalarOperator>
                          <Const ConstValue="(4)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Const ConstValue="(5)" />
                        </ScalarOperator>
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[t1]" Schema="[dbo]" Table="[t1]" Column="col1" />
                          </Identifier>
                        </ScalarOperator>
                      </Intrinsic>
                    </ScalarOperator>

And here is what you see in the showplan_xml for RAND():
select RAND() from t1
                    <ScalarOperator ScalarString="rand()">
                      <Identifier>
                        <ColumnReference Column="ConstExpr1005">
                          <ScalarOperator>
                            <Intrinsic FunctionName="rand">
                              <ScalarOperator>
                                <Const ConstValue="" />
                              </ScalarOperator>
                            </Intrinsic>
                          </ScalarOperator>
                        </ColumnReference>
                      </Identifier>
                    </ScalarOperator>

Now, those of you paying close attention will notice that the _name_ of the column is "ConstExprXXXX" instead of "ExprXXXX".

I don't think that I'd rely on the name meaning much, and I don't even recall if there are other things that use ConstExprXXXX names.  It might be a good indicator that you are getting warmer if you are looking for which functions are executed once at the start of the query.

If you use the query patterns that I posted and that others posted in comments, you can determine what is happening for each function you wish to investigate.

You can probably look through the list of public functions (or get fancy and run strings.exe on the sql server binary) to look for function names that you can evaluate to see if they are showing up as ConstExpr or non-ConstExpr.  One that comes to mind is getdate() - it's non-deterministic and you likely don't want the results to change during a long-running query.  I'll leave you guys to go find any others - I don't believe that there lots of these types of functions.

As a side note for all you readers out there, I won't be talking about internal details of the system beyond what is published or what is discussed publicly.  So, you won't be hearing from me about SQL Server's secret plans for including bullhorns in every box of SQL Server or how the logging feature is really made from Area 51 technologies.  However, I will tell you guys the basics of database and QP design and show you the *external* places where you can look to see what information Microsoft makes available to you.  Most of the time I plan to point you to some experiment you can run that will tell you, with a little context, enough to answer whatever question you have.

Have a great weekend all - I'm back to painting a room in my house.

Conor Cunningham

Saturday, February 16, 2008 5:36:41 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Wednesday, February 13, 2008
I've been playing with the new grouping sets feature now that it's in a CTP, partially because I had a small hand in them before I left Microsoft, but partially because I'm trying to re-examine features from a more external perspective.  So, I figure this is a good conversation starter.

So, as all programmers are lazy, I started with my former collegue Craig's blog post on the subject (http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx).  Craig's a query execution guy by training (and a darn good one), so many of Craig's blog posts are pretty complimentary to my background.  Anyways, his post doesn't talk about the various plans.  Part of the reason is that GROUPING SETS contains a fair amount of syntactic sugar.  It's a feature that lets you write CUBEs and ROLLUPs in a different way. It's slightly more general and powerful, but I'm still trying to get my head around why Oracle/IBM pushed to get this added to the ANSI spec.  You can do most of this with CUBE + filters (and a good query optimizer).

So let's talk about cube and rollup plans from an outsider's perspective, as I haven't seen anyone write about this in the depth that I would like.

If I take a the rollup query from Craig's example in SQL Server 2008's November CTP, I can see the following plan:



ok, two aggregates - what's up with that?  Well, if you look at the properties for the second aggregate, you can see that there is an extra bit of information that tells you that it's actually not a classical aggregate operator:



ok, so this guy does some special magic but it is represented in the external query plan as an aggregate function evaluator.  It just adds the extra rows that you see in a ROLLUP query. 

The key part of ROLLUPs is that they can be computed in one linear pass of the data (sorted appropriately).

CUBE plans are a little trickier:


So the QP conatenates a number of different branches of operations, and it actually ends up being a way to compute the matrix of different dimensions - if you have more columns in your cube query, you'll get more branches.  If you look at the second "stream aggregate" in a branch with 2, you'll see the rollup information just like in the rollup example. 

So CUBE is a fancier form of ROLLUP.  Obviously it has one sort per dimension, which is EXPENSIVE on big tables.  (You could have indexes for each branch, which could make things faster, assuming you have the disk space for the indexes). 

In SQL 2005, the plan is slightly different for Craig's example:



This plan is great if the size of the data going through the CUBEing operation is much smaller than the size of the table.  In this case, it writes the data to a temporary spool, sorts the output of that, and then re-reads that spool in the second branch.  I don't recall if SQL Server 2005 generated the SQL 2008 plan or not, but the SQL 2008 plan is potentially more parallelizable since each branch is independent.  Note that reading the base table twice could open you up to more interesting locking, but this can happen in any query with multiple scans of the same table if your isolation level is too low.  I haven't tried to create a parllel CUBE plan on SQL 2008, but I'd guess it is possible...

ok, so now we can add in GROUPING SETS.  These are essentially CUBEs with the ability to prune branches you don't need.  This can get more interesting when you have lots of columns on which you are CUBEing, but the example from Craig that I'm using here doesn't do much beyond CUBE.

Technically, one could try to start filtering NULLs from the result of a CUBE and perhaps the query optimizer could work backwards to deduct that it didn't need to compute various branches of a CUBE like GROUPING SETs can do.  Obviously it's easier when the customer just tells the QP what it wants directly.

I haven't tried to exhaustively go through the various kinds of plans likely in SQL Server.  There aren't really that many with ROLLUP/CUBE, at least not from what I've seen.  However, I hope this gives you an intro into how they work and how you can look at the query execution plan to see what is happening.

Now's the part where I get to ask for your input so I can learn something ;).  I want to hear about scenarios where you use CUBE/ROLLUP today - please send me a mail at conor@sqlskills.com!  I'll post up a summary of what I learn.  If you plan to use GROUPING SETS, let me know that too.  I'm very curious about what scenarios people do relational computations like this (as opposed to using Analysis services, for example).

Thanks for reading!

Conor Cunningham

Wednesday, February 13, 2008 8:37:52 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Monday, February 11, 2008
So there's all sorts of legacy behaviors in SQL Server that will keep me employable for years and years because it didn't make sense when Microsoft started shipping SQL Server and there's too many people to change it now ;).

RAND() is one of those things that behaves differently in the various commercial database vendors, and it takes some time to grok what's happening.

Let's say that I want to get a random set of rows from a table - that seems like a nice thing to do, right?

So I go write:

SELECT * FROM MyTable WHERE RAND() > 0.5

This seems like a great first attempt to get about half the rows back, assuming a good pseudo-random number implementation.  However, you'll notice that if you run this, the behavior isn't quite what might be expected (at least the way I've described it in this example).

So when one executes this query, you'll notice that sometimes you get all rows from your table and other times you'll get none.  But wait - functions are run per-row, right?  Well, no, at least not in all cases.  The complete details are beyond a single blog posting, but for largely historical reasons the RAND() function is actually executed per query, not per row.  That means that, logically, you get the query from the plan cache, get it ready to run, and then run RAND() once, caching the value.  Then, each and every row uses that cached value when evaluating any scalar logic. 

So, for this example, either that initial call to RAND() returned a value greater than .5 or it didn't.  As such, you get all or no rows only.

SQL Server added TABLESAMPLE as a mechanism to address part of this need - retreiving a sample from a table.  However, it isn't arbitrarily composable, and the semantics used in the default system implementation in SQL 2005 are actually intended to be used for statistics generation - a slightly different goal than quality sampling.

But wait - a lot of functions DO execute per row.  How can I tell which is which?  Well, I am not sure that there is a publicly exposed way, other than trial and error, to determine this.  I need to go do some more homework, as my memory is, well, random.

Even worse, some function computations are moved around in query trees, and so the actual number of executions may not be what you think as "per row".  For example, if you execute a filter before/after the scalar logic, then the results may be different.  Put another way, if you use an index seek vs. a table scan, how many times do you expect any old function to be executed?

It's enough to make your head spin! 

The basic rules are that, as long as the optimizer knows about the semantic issues, it will try very hard to not move around functions who have different semantics when executed different numbers of times.

So, be careful with your functions, built-in, user-defined, or whatever exotic form gets dreamed up next.  There's a lot going on under the covers, and knowing a few details can help you more deeply understand what a query is doing.


Thanks,

Conor Cunningham



Monday, February 11, 2008 9:02:29 PM (Central Standard Time, UTC-06:00)  #    Comments [2]  | 
Sunday, February 10, 2008
I previously blogged about the sequence project operator, which I helped architect/develop while working on the SQL Server QP (here).  In that same post, I promised to tell you about the Segment operator in more detail as well.

Segment is largely a "helper" operator in SQL Server.  It does very little and there isn't a direct way to create one.  They are often used to help facilitate more complex operations.

Every operator in SQL Server's QP takes rows of input and produces rows of output.  Some have multiple children (joins), while others are just unary.  Segment is a unary operator that takes rows of input and determines if some set of columns have changed from the previous row.  Usually, the input on those columns are ordered or grouped to make this meaningful.  An extra column is created, like in a compute scalar operation, which records if the value has changed.  That computed value is then used to drive other physical operators.

You can see what columns are considered in each segment by looking at showplan_xml (from the ranking query in my previous post):

                <RelOp NodeId="1" PhysicalOp="Segment" LogicalOp="Segment" EstimateRows="1000" EstimateIO="0" EstimateCPU="2e-005" AvgRowSize="23" EstimatedTotalSubtreeCost="0.0328114" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
                  <OutputList>
                    <ColumnReference Database="[t1]" Schema="[dbo]" Table="[z]" Column="col1" />
                    <ColumnReference Database="[t1]" Schema="[dbo]" Table="[z]" Column="col2" />
                    <ColumnReference Column="Segment1005" />
                    <ColumnReference Column="Segment1006" />
                  </OutputList>
                  <Segment>
                    <GroupBy>
                      <ColumnReference Database="[t1]" Schema="[dbo]" Table="[z]" Column="col2" />
                    </GroupBy>
                    <SegmentColumn>
                      <ColumnReference Column="Segment1006" />
                    </SegmentColumn>


The grouping columns are listed, and you can see these "Segment100x" columns that get fabricated along the way.  These are not surfaced outside the query, so you can't see the values of this directly.  However, you can see if the grouping columns have changed.

So, in this plan we have two segment columns on different grouping column sets, both of these passed into the sequence project (you can only actually see that they stop being propogated in the showplan_xml, but the QP is usually smart enough to not pass things along unnecessarily, so that usually means that the buck stops where it stops for a reason :).

Remember, all of the operators can change from release to release.  I don't believe any of this changed substantally from 2005 to 2008 based on what I can see in the plans.

That concludes our daily episode of the operator of the day.  You guys have a good weekend.  I am enjoying warmth and sunshine in Austin in February, which I can tell you is a substantial change from my life while in Seattle.


Sunday, February 10, 2008 11:46:38 AM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Thursday, February 07, 2008
I received a comment asking about this, so I'll write a short note on the subject.

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns).  It will then remove unneeded columns due to the nature of the query. 

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)
The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

As such, I typically use SELECT 1.  In my blog post I didn't.

Conor

Thursday, February 07, 2008 8:01:46 AM (Central Standard Time, UTC-06:00)  #    Comments [6]  | 
Wednesday, February 06, 2008
Had I known the number of responses I would have gotten about running SQL Server in virtual environments, I would have spent more time writing a questionnaire! 

I guess I'm also pleasantly surprised that so many of you read this blog - you must all be smarter and better looking than the average blog reader, for sure ;).

ok, I learned a few things of interest and I will sum them up here.  Then I'll write something more technical where you might learn a few things from me to help you in your day-to-day.

1. A number of you run SQL in virtualized environments.  VMWare more than VirtualPC (to be expected, given the relative state of the two products).  Perhaps the mix will change after Windows 2008 becomes available.

2. More than a few of you run production SQL Server systems on virtualized hardware.  I'm still asking a few clarifying questions, but it appears that this is not limited to tiny systems or lower-volume systems.

3. Within these production systems cases, the most common benefits related to manageability.  Specifically, moving the VM to a new machine when an old one died or needed to be reconfigured.  This makes sense.

4. A number of people run single-user systems to do development, try out some new configuration, etc.  Often you don't care about the system performance in this case.

5. For the group in between the single-user system and the deployed systems, there is frustration about VM configuration.  Typically the overhead is high unless you invest in a beefy storage solution (for example, a RAID-1 or RAID-5 array directly attached ).  Configuring storage and memory properly for the VM is critical for good performance.

I may return to this question in the future with a more systematic survey.  I didn't even ask you if you were running ESX vs. the non-ESX versions of VMWare or whether your CPUs have the VT capability in them!  Thanks for all who shared data with me - I learned a few things and have more to ponder.

I've been playing with VMWare more than average recently, and I've had some fun and some pain.  My experiences are not dissimilar - it can be good for manageability, but it can also be slower. 


OK now for the bit where I tell you a few things you might not know ;).

You probably know that MS isn't particularly interested in you doing this, and there's been press on the licensing (which SKUs you are do virtualization on, etc) around this for a bit.  MS is working hard on its own enterprise-class virtualization story, and I'm sure it will be at least reasonable. 


Within the server itself, there is suprisingly little knowledge of a lot of things in this area that are important to performance.  SQL Server's core engine assumes things like:

1. all CPUs are equally powerful
2. all CPUs process instructions at about the same rate.
3. a flush to disk should probably happen in a bounded amount of time.

These are simplifying assumptions that are often completely untrue, but SQL Server doesn't want to get into the business of forcing you to configure a bunch of numbers in order to use the product (or pay a high-priced consultant, for that matter - at least not to get started ;). 

Virtualization, from the system implemenation perspective, has a similar impact to processor hyperthreading.  Hyperthreading, for those who haven't been reading Intel's website much lately, allows a superscalar (multi-pipeline) CPU to schedule mutiple threads of work at the same time as long as they don't require the same pipelines.  This sounds awesome, right?  You can increase your instructions per cycle on the same hardware with only minimal extra transitors of scheduling overhead.  Well, this works great as long as you build an OS from the ground up that exposes threads as its primary primitive instead of processes.  Then you can just have a pile of threads that execute all over the place, usually without much care about how fast they are running, and you end up with everything running faster.  When Windows consumed this, it wasn't so simple.  The scheduler schedules threads to CPUs, and you can't have a part of a CPU.  So, we have logical and physical CPUs.  To most applications, they see 1 CPU per allowed thread.  this makes everything work but it hides the fact that these things are not in any way the same unit of processing power.  So some threads get assigned to fast CPUs and some get assigned to slow CPUs.  Sometimes, you get assigned back and forth.  ack!

Now let's throw SQL Server on top of this.  It's running lots of threads in one process.  They get scheduled essentially randomly across this matrix of real and fake CPUs.  Some user applications are one thread per user, and they work fine (OLTP applications).  In fact, some of them work a bit faster (10-20%) because there are "more" CPUs.  This is great, since you pay by the socket for Microsoft software (unlike some of those other guys ;). 

Data warehouse applications, however, are a much, much different beast.  A good deal of the benefit of buying the top-end SKU is to get parallel queries.  This speeds up queries by running one query on multiple threads.  It trades CPUs for runtime performance improvements.  The QP has lots of smarts to try to split up things into equal chunks of work and to then run each of those chunks on a different thread on a different CPU.  Usually the pattern for something like an index build or a hash join would be:

1. get everything onto different threads
2. do some work
3. wait for all those threads to complete
4. do something else in the query

So now I have some threads that finish earlier than others.  So they block until the slowest threads finish.  Even worse, I don't think that the query re-allocates those threads for other queries until the whole query finishes.  So, now you have some background as to why hyperthreading was not recommended for at least some SQL Server deployments.

Virtualization has some of the same issues.  Things don't proceed at the same rate on each CPU, assuming you define more than one CPU for the virtual machine.  When your physical machine is under a high load from other VMs, you get less CPU, memory, IO, etc.  For occasional applications, all of this is fine - you can trade this speed for consolodation, managebility, etc.  Occasionally things are slower.  That's fine.  For high-volume applications, this is usually not as good.

Other issues with Virtualization - memory is a big one.  SQL Server assumes, at least in the main server SKUs, that it is the only significant memory consumer on the machine.  It's a *server*.  (SQLExpress has different assumptions, but it's no memory slouch either).  Now, SQL Server will work in a memory constrained environment, but you often don't want to do that.  You take that away from a lot of different things - the buffer pool, the compiled plan cache, memory to execute queries (for example, hash join grants).  All of these things can add up if you aren't careful.

I/O is the area where I have the least experience in virtualization.  This is one of the reasons I asked people about production SQL Servers.  Usually they did get some storage array, and this makes sense - it ramps the I/O bandwidth and usually isolates it from any other operations on the machine (your OS, your application you are developing on top of SQL Server, etc).  I'm going to spend some more time on this, but I think the core idea is sound - as you start sharing your I/O bandwidth over several VMs, you are going to hit limits earlier with big IO consumers like SQL Server.  The same basic logic applies - isolate your database traffic onto different storage paths, especially when building a system to scale.  In a VM world, this can let you avoid the sharing penalties vs. the default config of everyone sharing the same hard drive.

Had you asked me while I worked at MS about a query performance problem on virtualized hardware (even from the MS product), the answer would have been "don't do that".  I think eventually there will be enough push for some other answer than that.  (Reminder - I don't work there anymore, and my opinion is purely my own at this point).


I hope that makes this a fair trade :)

I'm off to bed.  See ya'll.

Conor

Wednesday, February 06, 2008 11:13:08 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Tuesday, February 05, 2008
I'd like to do a post or two on running SQL Server 2005 within a virtual machine.  Before I get into all sorts of fun details on this, I'd like to ask you guys to write in and tell me about what whether you are doing this or not.

It's ok, I don't get paid by the man anymore, you can tell me ;).  Even if they did still pay me, I'm curious. 

So, who's got a SQL 2005 Server running in a virtual pc?  in an VMWare ESX server?  How's that working out for ya?

Email me at conor@sqlskills.com!  I want to hear from you!

Who was smart enough to sell their VMWare stock on the way up but not smart enough to sell their EMC stock? (that would be me).


Thanks,

Conor
Tuesday, February 05, 2008 9:57:21 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Saturday, February 02, 2008
In my previous post, I talked about the basics of partition elimination and mentioned some of the challenges.  SQL 2005 does contain logic to do partition elimination, but it is imperfect because some of the interactions with other features in the optimizer didn't always do partition elimination when customers wanted/expected it to do so.  SQL 2008 uses a somewhat different internal notation for partitioning, and it appears (from what I have seen so far) to do a better job at some of the interactions that were troublesome for 2005.  This should make partitioning work much more seamlessly in the upcoming release.

This example was motivated by a reader of the blog (thanks!) who was using a partitioned fact table for sales data and joining to another table to roll his own row-level security.  I've seen a number of customers do row-level security this way, so it's actually not a bad example for how things can break down in partition elimination in 2005 vs. 2008.  Here is the script we'll use for the example:
create partition function pf2(int) as range left for values (10, 20, 30)
create partition scheme ps2 as partition pf2 ALL to ([primary])

create table usersecuritybysite(username nvarchar(100), site int)
insert into usersecuritybysite(username, site) values ('dbo', 1)
insert into usersecuritybysite(username, site) values ('dbo', 2)

create table factsales(site int, invoice int, itemid int, col3 binary(2000)) ON ps2(site)
insert into factsales (site, invoice, itemid) values (1, 1, 1)
insert into factsales (site, invoice, itemid) values (2, 1, 1)
insert into factsales (site, invoice, itemid) values (3, 1, 1)
insert into factsales (site, invoice, itemid) values (4, 1, 1)

This obviously has far less data than an actual fact table, but I'll show you the plans that are interesting from this example and they should translate to an evaluation of a plan on a full-scale system.

So, we have two tables, both heaps.  The fact table is partitioned, and we will always join with an unpartitioned (but likely smaller) table when querying the fact table.  SQL 2005 generates the following plan:




So the optimizer generates a plan to build a hash table from usersecuritybysite and then probe it with every row from the partitioned table.  That's actually not a horrible plan, but as the cardinality of the fact table increases, this will get worse because we're scanning all partitions.  If we look at the statistics profile, we can see that all 4 partitions are scanned once in this plan.  Ideally, we'd like it to use the information from the usersecuritybysite table to eliminate partitions "almost always".  That would be safer.  (Nothing in optimization is simple, and there are actually more plans I'd consider here, but I'd like to focus on the point of not touching useless partitions in this post - in this case, I actually do want to avoid touching unneeded partitions as often as possible).



The constant scan with values 1,2,3,4 is where the partition information is encoded in SQL 2005.

Each partition has roughly the startup cost of a table, so think of this as opening 4 tables.  Databases are usually optimized so that reading lots of rows is the fastest thing - opening tables won't be as fast as opening rows, and when push comes to shove one will optimize row reads at the expense of table opens every time.

Let's look at the SQL 2008 plan:


Hey, what happened?  Well, the internal plan representation in SQL 2005 was pretty good but not perfect.  I think it's fair to say that it does a better job than partitioned views for local cases, but it wasn't as good as the non-partitioned tables in a few cases.  Many of the improvements that were driven in 2005 led to lessons that were incorporated in 2008's optimizer support.  Specifically, compilation time is higher in 2005 for partitioned tables because there are more operators in the tree.  Also, the introduction of these extra operators happened to make "trivial" plans no longer "trivial" (because partitioning is represented as a join in 2005, and joins mean cost-based choices, and therefore it is not "trivial" to choose a plan).  Finally, some of the plan choices that happen in the non-partitioned cases from years and years of tweaking queries didn't quite fit with partitioning because of the representation.  For example, index selection didn't know as much about partition selection as it needed to have to make perfect plan choices. 

So, 2008 changes things largely for the better.  The stats profile for this plan in 2008 is here:



So you'll notice a few things.  First, the constant scan is gone, and this plan looks like the non-partitioned plan.  Hey, is that a seek?  What's up with that?  This is a heap!!! Well, the optimizer now has some smarts to treat partitioning as a special leading index key, and even heaps can have them.  So, now the index selection code can do most of the partitioning applications better and avoid some of those seams I mentioned earlier.  It's harder to tell how many partitions got pruned, but if you look at the text more closely:

       |--Table Scan(OBJECT:([t1].[dbo].[factsales] AS [fs]), SEEK:([PtnId1004]=RangePartitionNew([t1].[dbo].[usersecuritybysite].[site] as [us].[site],(0),(10),(20),(30))),  WHERE:([t1].[dbo].[factsales].[site] as [fs].[site]=[t1].[dbo].[usersecuritybysite].[site] as [us].[site]) ORDERED FORWARD)

You can see that the optimizer is running some partition function on the usersecuritybysite.site column as part of the SEEK.  I'm asking my friends if this is the only indication I can see beyond looking down at the lock manager, but this is what I can see right now.

To be clear, 2005 also has the ability to partition pruning.  I'm not sure how many people store their fact tables in heaps and how often it happens to not prune.   If we create a clustered index on the fact table in 2005, you get a better plan:




This isn't quite as perfect as the 2008 plan, but it does avoid opening unneeded partitions.  In this case, the security table is scanned and joined with the list of partitions run against the partition function.  This generates the right partition to open for each security table row.  It happens to work great for this case because the table is being used as a "gate" to let the user see the partition or not.  Things get a bit more complex when the non-partitioned tables has duplicates or many rows from the fact table can be matched (you might want to pre-sort your partition requests and use a different plan in that case to avoid re-opening partitions).

So, the bottom line is that this stuff is complicated and it's a possible to get an imperfect plan at times.  You can use this technique to see if your 2005 plan is eliminating all the partitions you would expect.  If you end up with a plan that doesn't prune properly, you can do things like split the query or use a hint to force the optimizer to do what you want.  I'd recommend that you know what you are doing before going down either path.  Everyone has different levels of experience on this stuff, and it can be tricky to fully understand all of the interactions. I hope I've given you enough insight in this post to see whether you have a problem. 

Also, please be careful using statistics profile on a production machine,  and be sure to turn it off when you are done - it slows down queries, and nothing makes Conor angrier than a needlessly slow query ;).

Have a good weekend, y'all!

Conor
Saturday, February 02, 2008 3:05:08 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Thursday, January 31, 2008
Horizontal Table and Index Partitioning, where different rows in the same logical table are split into different physical tables, is an essential data warehousing tool.  It allows large tables to be split into multiple physical heaps or b-trees based on a key in the data.  This often makes big tables more manageable because many operations take longer as a function of the size of the table.  Splitting those tables can allow a database administrator to take the cost of an index rebuild or statistics update in pieces instead of all at once.

Partitioned tables have somewhat increased overhead because there's all these extra physical structures to handle.  They often also have different optimization characteristics.  The optimizer can often determine that whole partitions are not needed for a particular query, and it then applies
logic to eliminate those partitions from the query plan entirely.  This is conceptually similar to index matching, and it provides an order-of-magnitude performance improvement when it works.

Partition elimination relies on a fairly complex and expensive part of the optimizer that reasons about the valid domains for each column participating in a query.  For example, if you have a query like:
SELECT * FROM A INNER JOIN B on A.a=B.b where A.a < 20 and B.b > 50;




If you stare at this while touching your nose, you may eventually figure out that no rows should be returned from such a query because logically the conditions introduce a contradiction.  In this case, the query will be simplified during optimization and you'll see a plan like this:

I will cover constant tables in a future "operator of the day" post, but you could imagine that not much will be returned from this query once this simplification has occurred.

Partition elimination, at least in SQL Server 2005, is conceptually similar.  Each access method (heap, clustered index, secondary index) that contains a partitioning is tracked and considered against the valid ranges for that query.  If it is determined that one or more partitions is not needed, these are pruned from the final plan and your query just got a lot faster.

It would be nice if that were the end of it, but it gets far more complicated.  Let's say that you write the query like this:

SELECT * FROM MyTable WHERE ptncol=?

Well, I can reason that the query will touch one and only one partition in this case.  Dynamic partition pruning deals with a number of cases like this, and boy was that fun to implement. 

Once you get your head wrapped around dynamic partition pruning, you then start to worry about a range of other nasty, nasty problems that are only somewhat visible via the output plan.  The SQL 2005 query tree representation for partitioned tables unfortunately is not orthogonal with how parallelism is applied to the query tree, so there are cases when you can't get parallel plans against partitioned tables in a plan.  Additionally, the index matching and access path selection code also has to deal with backwards compatiblity when selecting the access path, and there are lots of details about costing and index selection that are very core to compilation time performance and plan quality.  Finally, there are some operators in the query tree that can't/don't flow information about the algebraic logic in enough detail to prune partitions.

While some of the details of this are beyond what is visible in the output plans (and therefore I won't be talking about it), I hope you can at least accept that it's a complex problem to make it all work perfectly.  SQL 2005 did get a lot of the pieces working, and I think everyone would say that it's a big, big step forward compared to (distributed) partitioned views from SQL 2000. 

SQL 2008 contains some additional changes to partitioning to continue to integrate it tightly into the rest of the query optimizer, and I'll do a post on the external query representation differences between SQL 2005 and SQL 2008.  I'll also be doing a post or two on some specific pruning issues that real customers have seen in SQL 2005.

Bedtime for me - happy querying!

Conor


Thursday, January 31, 2008 11:33:00 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Wednesday, January 30, 2008
So one of the features I had a hand in building in SQL Server 2005 is really useful for helping SQL Server work well in large data warehouse deployments.

I led a team writing the paper that was published about this, and you can read it in my previous blog post here.  The paper covers the theory but not the implementation.

The basic idea is that SQL Server has a lot of competing memory consumers within a single process space.  Often the naive approach (block incoming requests until there is enough memory) will fail under extreme circumstances because there is even incoming requests take some amount of memory, and you can starve on memory waiting for memory.  Additionally, many of the components are built to run best with fewer queries running in the system - for example, hash joins run a lot faster when they do not spill to disk.  When you add it all up, you have a bunch of different memory consumers with a bunch of different optimal policies to achieve optimal performance.

For the query optimizer, we tried a number of different approaches in the naive space before we eventually settled on a tiered set of monitors that block based on how much memory an in-process memory optimization has consumed compared to the free memory in the system.  This meant that the system would not block any trivially small query from compiling, allow a certain number of normal (OLTP) queries to compile, allow a fewer number of medium (think basic DSS) queries to compile, and have one final bucket to allow for the largest query in human history to compile.  (Note that I am not saying "execute" - there are a different set of queues for those queries).

What we found was that, as SQL Server started to be used in larger and larger deployments, the optimizer would consume more memory in those situations where large DW queries were coming into the system.  This led to out of memory errors thrown during compilation or execution.  The cost of restarting these queries was high (often the plan wasn't in the cache yet or was thrown out of the cache quickly due to memory pressure), and the system would just go into a memory-thrashing fit.

So how do I tell if my system is in a memory-thrashing fit due to compilation-overload?  Well, there is a command called DBCC MEMORYSTATUS that will tell you a bit about where you are.  If you scroll through the various rowsets returned, you'll eventually find one called "Optimization Queue".  This reflects the total memory available for all optimizations in the system, and the SQL OS layer controls this.  If lots of other things need memory, this will get clamped down.  The next 3 should be familiar by now - they are the various gateways to block queries of various sizes.  The "Threshold factor" is the amount of memory at which the optimizer will start trying to block that query from continuing until other queries in that bucket have completed compilation.

The "(default)" postfix is for normal user queries.  The "(internal)" queues are for internal queries.  There aren't a ton of internal queries.

Here's a screenshot of this in action:



Now, last I checked I didn't have a large-scale data warehouse running on my desktop, so these numbers are pretty tame right now.  You can run this command on a production system to see where you stand - it is intended to run quickly for debugging purposes on running systems, actually.

Enjoy!

Conor Cunningham
Wednesday, January 30, 2008 9:03:13 AM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Tuesday, January 29, 2008
One of the reasons I'm posting up content like this is to help all of you learn a bit of what I have bouncing around my head.  I'm hoping that I'll get a few more people writing some awesome data-driven applications.

So, I'd like to get some help from you - I'd like to hear from you.  I need some stories and/or questions about any data warehouses you've been using.  I love hearing these stories, especially about the sorts of things that you had to do to get them to work well with SQL Server.

I can't promise that I'll blog on every single thing you send me, but I will read them all and post up some interesting information I have to help you out where possible.

So, if you've hit problems with memory limits compiling or running queries on a data warehouse, I want to hear about it.  If you've got some nasty, nasty query plans, please send me a picture of your schema and the statistics profile for a query and we can talk about what isn't working well.  Send me a stats problem where the optimizer isn't cutting it.

So, please mail in to conor@sqlskills.com.  I'm happy to keep necessary details confidential, if desired.  All of this will help me focus my energies on the topics that will help you the most.  Otherwise you guys get to listen to me ramble on about whatever I want :)

Thanks,

Conor

Tuesday, January 29, 2008 8:17:30 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
So I'll be posting explanations for each query operator in the output query trees that you can see through "set showplan_text on" in SQL Server.  I'm hopeful that this will give you the tools to better read the query plans being generated by the QP when evaluating your system.

So I'll start with one that I added into the optimizer - the "sequence project".

create table z(col1 int identity, col2 int default rand()*10000)

declare @i int 
set @i=0
while @i < 1000
begin
insert into z default values
set @i=@i+1
end

select RANK() over(order by col2), * from z



If you look at the graphical showplan you get an output plan that looks a bit funny.  A Sequence Project is a special form of projection (the thing that evaluates the scalars in the SELECT list for each row).  By default, a regular project in SQL Server does not preserve nor guarantee any sort properties in the physical plan.  As a result, ranking functions put in regular projects would not preserve order in all cases deterministically.  People seem to complain when the query results are incorrect.

Enter the sequence project....This little beauty does do scalar computations, but they are generally limited to a very small number of operations that require order for one reason or another.  In the case of ranking functions, an order is required as part of the syntax, and you will see that this particular plan needs a sort since there are no indexes on the base table to satisfy a sort request on col2.  (I'll talk about segments in a future post, but for now ignore them and assume that they are part of the sequence project when looking at this plan).

The problem with sequence projects are that while they preserve ordering semantics, they interfere with a lot of the other optimization logic in the QP.  For example, these can interfere with index matching for predicates above the sequence project in a query tree.  Operators that block other optimizations tend to make optimizer developers very unhappy and sometimes elicit violent reactions!  So, be careful when proposing "opaque" operators to the next query optimization developer you see ;).

So, SQL Server has a small number of operations that in previous releases of the product were "incorrectly" put into regular projects that are now in sequence projects in the current versions of the product.  (By "incorrectly", I mean that there were potential optimizations that could be run that would break their semantics, not that there were actively executing plans at a customer site that were incorrect.  In optimizer terminology, they model was not correctly preserving the semantics even though there were no rules that exposed this fault).

So, the challenge for you is think of some places _beyond sequence functions_ (RANK, DENSE_RANK, NTILE, ROW_NUMBER, ...) that use the sequence projection operator.  If you mail them to me with plans, I may be able to tell you more about what is happening for each of them.

Happy hunting!

Conor Cunningham
Tuesday, January 29, 2008 8:06:04 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Sunday, January 27, 2008
For a number of years, the marketing for SQL Server's competitors made a large number of claims that SQL Server couldn't handle "Enterprise"-scale deployments.  Over time, those claims became much quieter as Microsoft used SQL Server extensively to run their whole business, through multi-terrabyte reference cases, and the hugely successful TerraServer project where Jim Gray led a team trying to build a huge database of satellite images using SQL Server.

This doesn't mean that there aren't challenges for all of the major vendors in adding features to help their products scale.  Each of them have some failings in some of their product offerings, so please don't interpret this as a "Microsoft SQL Server can't do the job" post.  On the contrary, I think that there is little that it can't do, but I know where the limitations get hit and I have some expertise in working around the current limitations.  Each Data Warehouse problem is a bit different, so the kinds of problems I'll discuss here may or may not happen for any given deployment.

In my previous post, I discussed how subqueries are approached by the query optimizer (as joins), and this can lead to problems when the cardinality or cost estimates are incorrect.  Often when an estimate is incorrect, "unrolling" the subquery by hand and creating a temporary table for a portion of the query will correct the mistake.  Splitting a larger query like this makes the operation more procedural and makes each piece smaller.  It also happens to give the optimizer better cardinality estimates for the portion of the query that has been unrolled - these better estimates may help the rest of the query perform more to your expectations.  This all assumes that:
1. You have the skills and desire to unroll a subquery
2. You know how to unroll it to give the optimizer information to help avoid cardinality estimate errors
3. That the query is known a priori.  Dynamically-generated queries are harder to hint

The best way to track down estimation errors is to find some time when you can run a problematic query with "set statistics profile on" enabled.  While this does make the query run more slowly, it captures useful information to help identify areas of a query where the estimated and actual cardinality are materially incorrect.  I've blogged about this previously (in previous blogs, not this one), so I'll do a post linking some of the useful backgrounders on this area.

Data Warehouses are usually the largest of the large in terms of SQL Server deployments, and usually they have a dedicated DBA to help keep it working.  This is partially because any business problem worth hundreds of thousands of dollars in hardware is also probably worth the salary of a DBA to keep it working.  However, the other side of this coin is that there aren't enough data warehouses in the world to make it truly a commodity software purchase.  As such, lots of little things can and do go wrong in data warehouse deployments that often require expertise to fix.

Let's talk about some of the reasons why the query optimizer may make incorrect guesses on data warehouse tables, starting with some of the easier ones

1. missing statistics.  I've seen a bunch of customers disable auto-create statistics because "someone told them to do so" or "I don't trust those things".  Unless you are going to hand-tune every query, you should likely leave auto-create enabled.

2. out-of-date statistics.  This is a bit trickier.  Column statistics are updated when a certain number of changes are made, and sometimes that hasn't been triggered.  You can manually update statistics in those cases.  For customers who disable auto-update statistics (<2%, but this happens a lot on the higher-end deployments), you need to have a schedule to update those stats so that the optimizer has a fighting chance.

3. a statistics sample that does not reflect the whole data set.  There are a number of assumptions made during statistics creation that are more pragmatic than mathematically proven.  To make auto-stats possible, only a small percentage of the pages are actually used to create statistics for non-indexed columns.  If those pages don't cover the whole data range well, queries over the missing ranges could get poor estimates.

4. Fundamental limits in the statistics object.  Currently there can be up to 200 steps in the histogram that is created.  If your table's data has less than 200 interesting spikes in the distribution curve, then you are probably fine.  However, your fact table may benefit from more.  Unfortunately, this can be a problem even with full stats enabled in SQL 2005 and, from what I have seen so far, in the CTPs of SQL 2008.

5. model limits in the optimizer.  Optimizers make assumptions so that they can compile a query in a reasonable amount of time.  For example, an optimizer may assume that columns are, in general, statistically independent of each other for the purposes of cardinality estimation because it's the only computationally feasible assumption to make.  This assumption is very often not true, and it can introduce errors in cardinality estimation that can lead to poor plan selection.  As databases get larger, these "out of model" errors can get larger and require more hand-holding to make sure that a given query gets a good plan from the optimizer.


I hope this gives some insight into how cardinality estimation impacts join optimization and thus subquery optimization in SQL Server 200x.

Thanks,

Conor

Sunday, January 27, 2008 2:12:04 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
I've been trying to think of a good way to introduce how SQL Server does subqueries based on a question from a reader.  One of the tricky pieces to query optimization is that the "language" used internally in the optimizer is actually somewhat more expressive than SQL.  The second part of the problem is that, for people who have used other products at one time or another, it's a challenge to get them to re-learn something to grok how SQL Server does it (and even to figure out what they need to re-learn).

So I'm going to do a series of posts on subqueries and the related pieces behind subqueries to put them in context.

Let's start at the beginning.  A subquery is a fancy filter condition where you can specify an query to run for each row of the outer query.  This doesn't mean that's how the optimizer will actually execute it, but you can start thinking about it like that:

SELECT * FROM MyTable WHERE productkey IN (SELECT * FROM Orders where MyTable.productkey=Orders.productkey)


One way to execute this query is to read each row from MyTable, get productkey, start up another query to look through orders, and determine if we want to keep the row from MyTable based on the result of that second query.  This is how some of the other QP's on which I have worked attack subqueries and is perfectly valid.

In the internal lanaguage of the optimizer (and you can read this in output plans), you will typically see that this is executed physically in SQL Server using a "semi-join", which you should imagine as a join operator that does what we've described within one single query instead of running an extra query per row.  A semi-join will return rows from only one side of the join (left or right semi join).  These semijoins are nice because you can actually re-use your physical join implementation to execute them.  They are also nice because you can also leverage your tree rewrite logic and cardinality estimation logic to help you think about these like joins as well.  So, you can reorder and rewrite the query into alternative, equivalent forms that might run more quickly.

The nice part about this approach is that you get the benefit of a lot of work and experience estimating cardinality for joins, which helps the SQL Server Optimizer make decisions about these rewrites.  The bad part about this is that in queries with many joins or queries with lots and lots of data, your odds of getting a mistake in join cardinality/cost estimation go up and up, and your opportunity to get a plan that is far from optimal also go up.

My next post will cover some of these things in more detail - off to get my kid up.

Conor
Sunday, January 27, 2008 7:59:43 AM (Central Standard Time, UTC-06:00)  #    Comments [3]  | 

Theme design by Jelle Druyts

Pick a theme: