<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Conor Cunningham's Blog</title>
    <link>http://www.sqlskills.com/blogs/conor/</link>
    <description />
    <language>en-us</language>
    <copyright>Conor Cunningham</copyright>
    <lastBuildDate>Thu, 24 Apr 2008 01:58:53 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 2.0.7226.0</generator>
    <managingEditor>conor@sqlskills.com</managingEditor>
    <webMaster>conor@sqlskills.com</webMaster>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=4e2d6f51-d049-4b15-89e1-8fde5b0671e3</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,4e2d6f51-d049-4b15-89e1-8fde5b0671e3.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,4e2d6f51-d049-4b15-89e1-8fde5b0671e3.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=4e2d6f51-d049-4b15-89e1-8fde5b0671e3</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <pre>
          <span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;">
            <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">
              <br />
            </span>
          </span>
        </pre>
I spent some time playing with the SQL 2008 geometry data type.  This post will
cover some of the things I have learned about how it is implemented from playing with
the feature.  Some of these will impact the feature's usability for some. 
Others may not care.<br /><br />
1. late-binding of the geometry data is an interesting choice.  So one of the
nice properties of a SQL query is that, for many things, the parse and bind of a query
can find all sorts of compilation errors that would show up during development. 
You don't have to run the query in order to see the simple mistakes, and that helps
us all.  This type doesn't throw parsing errors until you try to execute the
query.  That's not perfect - you have to go execute spatial queries to see whether
you have typed in lots of complex data correctly or not.  I can speak from experience
that I tend to mess that stuff up.<br /><br />
Example:<br /><br /><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">create</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">table</span> g1
(col1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">identity</span>,
col2 geometry)<br />
go<br /><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">insert</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">into</span> g1(col2) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">values</span> (Geometry::STGeomFromText(<span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'LINESTRING(0
0, 10 10, 21 2)'</span>,0));<br />
go<br /><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">insert</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">into</span> g1(col2) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">values</span> (Geometry::STGeomFromText(<span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'LINESTRING(funky
chicken)'</span>,0));<br />
go</span></pre>
That third query compiles just fine - running it returns an error.<br /><br />
2. The geography type uses some form of the .net framework.  (I'll point out
that I have not enabled the CLR in my server).  Given that the ordpath type requires
that the CLR be enabled, I'll guess that this means that there is a special implementation
of the geometry type.<br /><br />
Here's the runtime errof from the previous query:<br /><br /><font color="#ff0000">Msg 6522, Level 16, State 1, Line 1<br />
A .NET Framework error occurred during execution of user-defined routine or aggregate
"geometry": 
<br />
System.FormatException: 24141: A number is expected at position 16 of the input. The
input has funky.<br />
System.FormatException: 
<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.RecognizeDouble()<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePoint()<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringText(FigureAttributes
attributes)<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringTaggedText()<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParseGeometryTaggedText()<br />
   at Microsoft.SqlServer.Types.OpenGisWktReader.ReadGeometry()<br />
   at Microsoft.SqlServer.Types.SqlGeometry.STGeomFromText(SqlChars geometryTaggedText,
Int32 srid)<br />
.<br />
The statement has been terminated.</font><br /><p></p>
(I think I will be making t-shirts that say "<b>The input has funky</b>" on it...
let me know if you want in :).<br /><br />
I am actually pretty happy with the error message - the only thing missing is which
column caused the error.  You'll notice that "position 16" is inside of this
second string being parsed by the .NET routines.  I personally do find stack
traces unacceptable for general use, but the .net and java folks seem to have doubled
down on this approach, so we all get to learn about their libraries even in SQL now...<br /><br />
This is just a seam in how it was implemented.  Be aware of this when using the
type.<br /><br />
3. the geometry type has been (arbitrarily) given an estimated row width of about
4000 bytes.  well, this is what shows up in showplan.  So I inserted a bunch
of data into this table (1.4 million rows in about 148MB).  The cost estimates
don't seem to use the 4000 byte number for the row width (as it would take a long
time to read 5.6GB - more than 13 seconds or so, given that is what the costing says,
so I am guessing that this is just reported in showplan the same way it is costed
internally).  I'm just using a heap (no indexes) for this experiment.<br /><br />
4. I'm waiting for maps.live.com  to make it easy for me to geocode addresses
and insert them into SQL.  It has a feature to send it to some gps device, but
I haven't played with it beyond learning that it didn't detect that I had blocked
cookies by default ;).<br /><br />
5. "select col2.ToString() from g1" is a way to see something useful from that data
type.  otherwise you get back binary goo...<br /><br />
That's all for tonight.<br /><br />
Thanks, 
<br /><br />
Conor Cunningham<br /><br /><br /><br /><br /><br /><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=4e2d6f51-d049-4b15-89e1-8fde5b0671e3" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>SQL 2008 Geometry type tidbits</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,4e2d6f51-d049-4b15-89e1-8fde5b0671e3.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/24/SQL2008GeometryTypeTidbits.aspx</link>
      <pubDate>Thu, 24 Apr 2008 01:58:53 GMT</pubDate>
      <description>&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;
&lt;br&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;
I spent some time playing with the SQL 2008 geometry data type.&amp;nbsp; This post will
cover some of the things I have learned about how it is implemented from playing with
the feature.&amp;nbsp; Some of these will impact the feature's usability for some.&amp;nbsp;
Others may not care.&lt;br&gt;
&lt;br&gt;
1. late-binding of the geometry data is an interesting choice.&amp;nbsp; So one of the
nice properties of a SQL query is that, for many things, the parse and bind of a query
can find all sorts of compilation errors that would show up during development.&amp;nbsp;
You don't have to run the query in order to see the simple mistakes, and that helps
us all.&amp;nbsp; This type doesn't throw parsing errors until you try to execute the
query.&amp;nbsp; That's not perfect - you have to go execute spatial queries to see whether
you have typed in lots of complex data correctly or not.&amp;nbsp; I can speak from experience
that I tend to mess that stuff up.&lt;br&gt;
&lt;br&gt;
Example:&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;create&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;table&lt;/span&gt; g1
(col1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;identity&lt;/span&gt;,
col2 geometry)&lt;br&gt;
go&lt;br&gt;
&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;insert&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;into&lt;/span&gt; g1(col2) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;values&lt;/span&gt; (Geometry::STGeomFromText(&lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'LINESTRING(0
0, 10 10, 21 2)'&lt;/span&gt;,0));&lt;br&gt;
go&lt;br&gt;
&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;insert&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;into&lt;/span&gt; g1(col2) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;values&lt;/span&gt; (Geometry::STGeomFromText(&lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'LINESTRING(funky
chicken)'&lt;/span&gt;,0));&lt;br&gt;
go&lt;/span&gt;&lt;/pre&gt;
That third query compiles just fine - running it returns an error.&lt;br&gt;
&lt;br&gt;
2. The geography type uses some form of the .net framework.&amp;nbsp; (I'll point out
that I have not enabled the CLR in my server).&amp;nbsp; Given that the ordpath type requires
that the CLR be enabled, I'll guess that this means that there is a special implementation
of the geometry type.&lt;br&gt;
&lt;br&gt;
Here's the runtime errof from the previous query:&lt;br&gt;
&lt;br&gt;
&lt;font color="#ff0000"&gt;Msg 6522, Level 16, State 1, Line 1&lt;br&gt;
A .NET Framework error occurred during execution of user-defined routine or aggregate
"geometry": 
&lt;br&gt;
System.FormatException: 24141: A number is expected at position 16 of the input. The
input has funky.&lt;br&gt;
System.FormatException: 
&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.RecognizeDouble()&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePoint()&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringText(FigureAttributes
attributes)&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringTaggedText()&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.ParseGeometryTaggedText()&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.OpenGisWktReader.ReadGeometry()&lt;br&gt;
&amp;nbsp;&amp;nbsp; at Microsoft.SqlServer.Types.SqlGeometry.STGeomFromText(SqlChars geometryTaggedText,
Int32 srid)&lt;br&gt;
.&lt;br&gt;
The statement has been terminated.&lt;/font&gt;
&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
(I think I will be making t-shirts that say "&lt;b&gt;The input has funky&lt;/b&gt;" on it...
let me know if you want in :).&lt;br&gt;
&lt;br&gt;
I am actually pretty happy with the error message - the only thing missing is which
column caused the error.&amp;nbsp; You'll notice that "position 16" is inside of this
second string being parsed by the .NET routines.&amp;nbsp; I personally do find stack
traces unacceptable for general use, but the .net and java folks seem to have doubled
down on this approach, so we all get to learn about their libraries even in SQL now...&lt;br&gt;
&lt;br&gt;
This is just a seam in how it was implemented.&amp;nbsp; Be aware of this when using the
type.&lt;br&gt;
&lt;br&gt;
3. the geometry type has been (arbitrarily) given an estimated row width of about
4000 bytes.&amp;nbsp; well, this is what shows up in showplan.&amp;nbsp; So I inserted a bunch
of data into this table (1.4 million rows in about 148MB).&amp;nbsp; The cost estimates
don't seem to use the 4000 byte number for the row width (as it would take a long
time to read 5.6GB - more than 13 seconds or so, given that is what the costing says,
so I am guessing that this is just reported in showplan the same way it is costed
internally).&amp;nbsp; I'm just using a heap (no indexes) for this experiment.&lt;br&gt;
&lt;br&gt;
4. I'm waiting for maps.live.com&amp;nbsp; to make it easy for me to geocode addresses
and insert them into SQL.&amp;nbsp; It has a feature to send it to some gps device, but
I haven't played with it beyond learning that it didn't detect that I had blocked
cookies by default ;).&lt;br&gt;
&lt;br&gt;
5. "select col2.ToString() from g1" is a way to see something useful from that data
type.&amp;nbsp; otherwise you get back binary goo...&lt;br&gt;
&lt;br&gt;
That's all for tonight.&lt;br&gt;
&lt;br&gt;
Thanks, 
&lt;br&gt;
&lt;br&gt;
Conor Cunningham&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=4e2d6f51-d049-4b15-89e1-8fde5b0671e3" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,4e2d6f51-d049-4b15-89e1-8fde5b0671e3.aspx</comments>
      <category>SQL Server 2008</category>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=a24f81ce-2e72-4a39-84c5-50cba47d6da2</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,a24f81ce-2e72-4a39-84c5-50cba47d6da2.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,a24f81ce-2e72-4a39-84c5-50cba47d6da2.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=a24f81ce-2e72-4a39-84c5-50cba47d6da2</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">A follow-up from my previous post.<br /><br />
Is COUNT(1) faster than COUNT(*)?<br /><br />
No, COUNT(1) is mapped into COUNT(*) early in the QP.  From the logic of my previous
post, COUNT(1) is always non-null and thus is the same as just doing a COUNT(*). 
Since it is semantically equivalent to COUNT(*), the QP just maps it to COUNT(*) because
it makes things simpler and you can avoid special cases for things like local-global
aggregation and moving group bys around joins.<br /><br />
You can actually look at the query plan for "SELECT COUNT(1) from Table;" and see
that the output plan is COUNT(*).<br /><br />
You can save it a few instructions by using COUNT(*).  You will also make the
ANSI SQL gods happier.<br /><br />
I get a lot of questions about these syntax issues - it makes sense, as this is often
very interesting in traditional procedural programming languages.  In many cases,
there are only very minor performance differences, if any, in the SQL language due
to these.  The big differences happen because your slightly different query actually
means something semantically different and thus implies a different amount of work
to compute.<br /><br />
Please keep sending in your questions - I'm happy to answer them!<br /><br />
Thanks,<br /><br />
Conor Cunningham<br /><p></p><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a24f81ce-2e72-4a39-84c5-50cba47d6da2" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>COUNT(1) vs. COUNT(*)</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,a24f81ce-2e72-4a39-84c5-50cba47d6da2.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/22/COUNT1VsCOUNT.aspx</link>
      <pubDate>Tue, 22 Apr 2008 01:02:39 GMT</pubDate>
      <description>A follow-up from my previous post.&lt;br&gt;
&lt;br&gt;
Is COUNT(1) faster than COUNT(*)?&lt;br&gt;
&lt;br&gt;
No, COUNT(1) is mapped into COUNT(*) early in the QP.&amp;nbsp; From the logic of my previous
post, COUNT(1) is always non-null and thus is the same as just doing a COUNT(*).&amp;nbsp;
Since it is semantically equivalent to COUNT(*), the QP just maps it to COUNT(*) because
it makes things simpler and you can avoid special cases for things like local-global
aggregation and moving group bys around joins.&lt;br&gt;
&lt;br&gt;
You can actually look at the query plan for "SELECT COUNT(1) from Table;" and see
that the output plan is COUNT(*).&lt;br&gt;
&lt;br&gt;
You can save it a few instructions by using COUNT(*).&amp;nbsp; You will also make the
ANSI SQL gods happier.&lt;br&gt;
&lt;br&gt;
I get a lot of questions about these syntax issues - it makes sense, as this is often
very interesting in traditional procedural programming languages.&amp;nbsp; In many cases,
there are only very minor performance differences, if any, in the SQL language due
to these.&amp;nbsp; The big differences happen because your slightly different query actually
means something semantically different and thus implies a different amount of work
to compute.&lt;br&gt;
&lt;br&gt;
Please keep sending in your questions - I'm happy to answer them!&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
&lt;br&gt;
Conor Cunningham&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a24f81ce-2e72-4a39-84c5-50cba47d6da2" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,a24f81ce-2e72-4a39-84c5-50cba47d6da2.aspx</comments>
      <category>performance</category>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=b3efafc9-1cc5-417c-9677-12dad2e6a5d0</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,b3efafc9-1cc5-417c-9677-12dad2e6a5d0.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,b3efafc9-1cc5-417c-9677-12dad2e6a5d0.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=b3efafc9-1cc5-417c-9677-12dad2e6a5d0</wfw:commentRss>
      <slash:comments>3</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">So COUNT isn't an operator, but you should
view this post as a "what is this operator" kind of post since I talk about how these
things work and what they mean.<br /><br />
One of the benefits of building database engines is that all (well, most of) the syntax
rules end up stuck in your head, but I know that many of these things are completely
non-obvious, so I think it's worth mentioning a few things about what each of these
things mean since I see them misused in lots of queries, blogs, etc.<br /><br />
-- Returns the count of rows from the table at the time that the query is run in the
transactional isolation mode in which the query is run (usually read committed for
SQL Server)<br />
SELECT COUNT(*) FROM Table<br /><br />
So that example is easy enough...<br /><br />
SELECT COUNT(column1) FROM Table<br /><br />
This is actually NOT the same query as COUNT(*), in general.  It means "count
the number of non-null column1 rows".  
<br /><br />
(Now for something cool.  If you run this on a column that is non-nullable, then
SQL Server converts it into the count(*) case because it is faster to run that form
since it doesn't have to examine the data in each row and can instead just count rows).<br /><br />
SELECT COUNT(DISTINCT column1) will count the number of UNIQUE non-null column1 values. 
It does not count NULL.  I don't believe SQL Server completely removes the DISTINCT
operation for non-null columns in all cases.  It can in some cases.<br /><br />
COUNT(col) OVER (...) is a completely different beast.  It runs a count computation
using the rules you've seen above but it does not collapse the rows - it adds a new
column on all the rows with the computed count.  
<br /><br />
Remember that these are semantically different operations.<br /><br />
Conor<br /><br /><br /><br /><br /><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=b3efafc9-1cc5-417c-9677-12dad2e6a5d0" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>COUNT(*) vs COUNT(column) vs. COUNT(DISTINCT) vs. COUNT(col) OVER(...)</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,b3efafc9-1cc5-417c-9677-12dad2e6a5d0.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/17/COUNTVsCOUNTcolumnVsCOUNTDISTINCTVsCOUNTcolOVER.aspx</link>
      <pubDate>Thu, 17 Apr 2008 03:11:31 GMT</pubDate>
      <description>So COUNT isn't an operator, but you should view this post as a "what is this operator" kind of post since I talk about how these things work and what they mean.&lt;br&gt;
&lt;br&gt;
One of the benefits of building database engines is that all (well, most of) the syntax
rules end up stuck in your head, but I know that many of these things are completely
non-obvious, so I think it's worth mentioning a few things about what each of these
things mean since I see them misused in lots of queries, blogs, etc.&lt;br&gt;
&lt;br&gt;
-- Returns the count of rows from the table at the time that the query is run in the
transactional isolation mode in which the query is run (usually read committed for
SQL Server)&lt;br&gt;
SELECT COUNT(*) FROM Table&lt;br&gt;
&lt;br&gt;
So that example is easy enough...&lt;br&gt;
&lt;br&gt;
SELECT COUNT(column1) FROM Table&lt;br&gt;
&lt;br&gt;
This is actually NOT the same query as COUNT(*), in general.&amp;nbsp; It means "count
the number of non-null column1 rows".&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
(Now for something cool.&amp;nbsp; If you run this on a column that is non-nullable, then
SQL Server converts it into the count(*) case because it is faster to run that form
since it doesn't have to examine the data in each row and can instead just count rows).&lt;br&gt;
&lt;br&gt;
SELECT COUNT(DISTINCT column1) will count the number of UNIQUE non-null column1 values.&amp;nbsp;
It does not count NULL.&amp;nbsp; I don't believe SQL Server completely removes the DISTINCT
operation for non-null columns in all cases.&amp;nbsp; It can in some cases.&lt;br&gt;
&lt;br&gt;
COUNT(col) OVER (...) is a completely different beast.&amp;nbsp; It runs a count computation
using the rules you've seen above but it does not collapse the rows - it adds a new
column on all the rows with the computed count.&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
Remember that these are semantically different operations.&lt;br&gt;
&lt;br&gt;
Conor&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=b3efafc9-1cc5-417c-9677-12dad2e6a5d0" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,b3efafc9-1cc5-417c-9677-12dad2e6a5d0.aspx</comments>
      <category>query operators</category>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=b039c40d-a48d-428c-a594-0cecd82c1ad8</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,b039c40d-a48d-428c-a594-0cecd82c1ad8.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,b039c40d-a48d-428c-a594-0cecd82c1ad8.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=b039c40d-a48d-428c-a594-0cecd82c1ad8</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">I started off tonight playing with the
new page compression feature.  So far I like it.  I haven't found something
yet about which I want to post (which is code for "I'm still looking for the seams
;)"), but I have some other things you can try to learn a few things about how the
SQL Server QP makes assumptions about various kinds of predicates in queries during
cardinality estimations.<br /><br />
So, you may or may not know much about how the SQL Server QP figures out what plans
to run.  For the uninitiated, it almost seems like some form of magic. 
In some ways, it is - it's very powerful and poorly understood by many, and it usually
requires very little effort by someone skilled in the area to make something amazing
happen.  SQL Server merely needs to make itself sparkle when I fix a query plan
and I'm set for life :).<br /><br />
In reality, SQL Server uses a cost-based optimizer, which means that it keeps track
of all sorts of interesting statistical information, row counts, page counts, etc. 
It uses all of these in formulas to come up with numbers for each plan fragment and
then it weighs the relative costs of all of these to pick a plan that has the "least
cost".  That sounds nice and absolute until you get to go actually try to make
that work, and then you are left with all sorts of nasty questions like:<br /><br />
* What should the cost formulas be?<br />
* Do the numbers need to differ based on the customer's hardware? How do we calibrate
all of this stuff?  What do we do as machines get faster?<br />
* How do I estimate how many rows are going to come back from one predicate in my
WHERE clause or join in time less than running the query to figure it out?<br />
* Same question when I have a bunch of preciates?<br /><br />
Eventually, the QP has to make a set of assumptions so that they can come up with
a plan in a reasonable amount of time, both because customers don't like things to
ever take time and because managers don't like customers to tell them about how much
time something should take..  One assumption might be that, data is uniformly
distributed over a data type's possible values when you don't have any better information. 
This can help make it possible to come up with solutions that work well most of the
time.  The problem is that estimates can be wrong, and that can cause the QP
to come up with a different plan than had it had correct information to use when selecting
the plan.<br /><br />
So, I'll show you an example here.  To be clear, I'm not saying that this is
something that is "broken".  This just exposes a place where 2 different assumptions
rub up against each other in a way that will SEEM odd to the outside observer. 
When you consider the average customer use cases, these assumptions are not bad and
work very well the vast majority of the time...<br /><br />
To the example:<br /><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">drop</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">table</span> comp1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">create</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">table</span> comp1(col1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">identity</span>,
col3 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">nvarchar</span>(3000)) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">declare</span> @i <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">set</span> @i=0 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">while</span> @i
&lt; 70000 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">begin</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">insert</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">into</span> comp1
(col3) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">values</span> (N<span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'</span>) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">set</span> @i=@i+1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">end</span></span></pre>
So I create a table with the same long string in it 70,000 times.  
<br /><br />
Then I run a query with a where clause just to get some statistics created:<br /><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">select</span> * <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">from</span> comp1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">where</span> col3 <span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;">like</span><span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'%4567890%'</span><br /><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">dbcc</span> show_statistics
(<span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'comp1'</span>,
col3)<br /><br /></span>Once we have all of this stuff, we can look at the estimates for two very similar
queries:<span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><br /></span></pre><br /><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">select</span> * <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">from</span> comp1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">where</span> col3 <span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;">like</span><span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890%'</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">select</span> * <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">from</span> comp1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">where</span> col3 <span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;">like</span><span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'%123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890%'</span></span></pre>
(one is LIKE 'abc%'.  The other is LIKE '%abc% where abc is the value we have
inserted 70,000 times).<br /><br />
So, both queries will return 70,000 rows.  
<br /><br />
Well, the abc% pattern query estimates 70,000 rows (good!).  The second query
estimates 69.6985 rows.  that's a bit under ;).<br /><br />
Let's talk about this a bit more so you can understand why.  In the first query,
there is an exact string match against a column represented in the statistics histogram. 
So, the likely outcome is to take that cardinality count to determine the number of
rows that will likely be returned from the query.  In this case, we expect all
rows to come back.<br /><br />
In the second one, there is no mechanism to estimate the cardinality of ANY string
of this size (SQL Server does have a feature that does smaller strings which is exposed
as the "STRING INDEX" in the histogram, but you can't see the details of this object
in 2005 and I haven't seen that change in 2008 either).  So, for really large
strings, it is left with... guessing.<br /><br />
So, that 69.6985 number is an estimate that is partially based on the length of the
string.  Now, the QP could try to walk through each statitistics object and try
to find substrings against any existing piece of statistical data and then try to
adjust its estimate.  In practice, though, the cost of doing that is expensive. 
The various statistics objects are run at different times and have different sample
rates, so even then they will vary somewhat.  Finally, for most cases it may
just not impact the plan choice that much.  Odds are, though, that this will
bite at least one of my readers at some point.  So, this is good to know - it
can help you find that spot where the assumption in the QP is causing your query plan
to be wrong.  This is the sort of case where you will want to consider a query
hint to help the QP out.<br /><br />
There are more assumptions (and seams between them) in the cardinality estimation
code.  I'll let you guys go hunt for them a bit to find them.<br /><br />
Happy Querying!<br />
Conor Cunningham<br /><br /><br /><br /><br /><p></p><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=b039c40d-a48d-428c-a594-0cecd82c1ad8" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>Costing and Statistics, continued...</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,b039c40d-a48d-428c-a594-0cecd82c1ad8.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/17/CostingAndStatisticsContinued.aspx</link>
      <pubDate>Thu, 17 Apr 2008 02:54:35 GMT</pubDate>
      <description>I started off tonight playing with the new page compression feature.&amp;nbsp; So far I like it.&amp;nbsp; I haven't found something yet about which I want to post (which is code for "I'm still looking for the seams ;)"), but I have some other things you can try to learn a few things about how the SQL Server QP makes assumptions about various kinds of predicates in queries during cardinality estimations.&lt;br&gt;
&lt;br&gt;
So, you may or may not know much about how the SQL Server QP figures out what plans
to run.&amp;nbsp; For the uninitiated, it almost seems like some form of magic.&amp;nbsp;
In some ways, it is - it's very powerful and poorly understood by many, and it usually
requires very little effort by someone skilled in the area to make something amazing
happen.&amp;nbsp; SQL Server merely needs to make itself sparkle when I fix a query plan
and I'm set for life :).&lt;br&gt;
&lt;br&gt;
In reality, SQL Server uses a cost-based optimizer, which means that it keeps track
of all sorts of interesting statistical information, row counts, page counts, etc.&amp;nbsp;
It uses all of these in formulas to come up with numbers for each plan fragment and
then it weighs the relative costs of all of these to pick a plan that has the "least
cost".&amp;nbsp; That sounds nice and absolute until you get to go actually try to make
that work, and then you are left with all sorts of nasty questions like:&lt;br&gt;
&lt;br&gt;
* What should the cost formulas be?&lt;br&gt;
* Do the numbers need to differ based on the customer's hardware? How do we calibrate
all of this stuff?&amp;nbsp; What do we do as machines get faster?&lt;br&gt;
* How do I estimate how many rows are going to come back from one predicate in my
WHERE clause or join in time less than running the query to figure it out?&lt;br&gt;
* Same question when I have a bunch of preciates?&lt;br&gt;
&lt;br&gt;
Eventually, the QP has to make a set of assumptions so that they can come up with
a plan in a reasonable amount of time, both because customers don't like things to
ever take time and because managers don't like customers to tell them about how much
time something should take..&amp;nbsp; One assumption might be that, data is uniformly
distributed over a data type's possible values when you don't have any better information.&amp;nbsp;
This can help make it possible to come up with solutions that work well most of the
time.&amp;nbsp; The problem is that estimates can be wrong, and that can cause the QP
to come up with a different plan than had it had correct information to use when selecting
the plan.&lt;br&gt;
&lt;br&gt;
So, I'll show you an example here.&amp;nbsp; To be clear, I'm not saying that this is
something that is "broken".&amp;nbsp; This just exposes a place where 2 different assumptions
rub up against each other in a way that will SEEM odd to the outside observer.&amp;nbsp;
When you consider the average customer use cases, these assumptions are not bad and
work very well the vast majority of the time...&lt;br&gt;
&lt;br&gt;
To the example:&lt;br&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;drop&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;table&lt;/span&gt; comp1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;create&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;table&lt;/span&gt; comp1(col1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;identity&lt;/span&gt;,
col3 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;nvarchar&lt;/span&gt;(3000)) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;declare&lt;/span&gt; @i &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;set&lt;/span&gt; @i=0 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;while&lt;/span&gt; @i
&amp;lt; 70000 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;begin&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;insert&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;into&lt;/span&gt; comp1
(col3) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;values&lt;/span&gt; (N&lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'&lt;/span&gt;) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;set&lt;/span&gt; @i=@i+1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;
So I create a table with the same long string in it 70,000 times.&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
Then I run a query with a where clause just to get some statistics created:&lt;br&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;select&lt;/span&gt; * &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;from&lt;/span&gt; comp1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;where&lt;/span&gt; col3 &lt;span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;like&lt;/span&gt; &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'%4567890%'&lt;/span&gt;
&lt;br&gt;
&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;dbcc&lt;/span&gt; show_statistics
(&lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'comp1'&lt;/span&gt;,
col3)&lt;br&gt;
&lt;br&gt;
&lt;/span&gt;Once we have all of this stuff, we can look at the estimates for two very similar
queries:&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;
&lt;br&gt;
&lt;/span&gt;&lt;/pre&gt;
&lt;br&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;select&lt;/span&gt; * &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;from&lt;/span&gt; comp1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;where&lt;/span&gt; col3 &lt;span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;like&lt;/span&gt; &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890%'&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;select&lt;/span&gt; * &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;from&lt;/span&gt; comp1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;where&lt;/span&gt; col3 &lt;span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;like&lt;/span&gt; &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'%123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890%'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;
(one is LIKE 'abc%'.&amp;nbsp; The other is LIKE '%abc% where abc is the value we have
inserted 70,000 times).&lt;br&gt;
&lt;br&gt;
So, both queries will return 70,000 rows.&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
Well, the abc% pattern query estimates 70,000 rows (good!).&amp;nbsp; The second query
estimates 69.6985 rows.&amp;nbsp; that's a bit under ;).&lt;br&gt;
&lt;br&gt;
Let's talk about this a bit more so you can understand why.&amp;nbsp; In the first query,
there is an exact string match against a column represented in the statistics histogram.&amp;nbsp;
So, the likely outcome is to take that cardinality count to determine the number of
rows that will likely be returned from the query.&amp;nbsp; In this case, we expect all
rows to come back.&lt;br&gt;
&lt;br&gt;
In the second one, there is no mechanism to estimate the cardinality of ANY string
of this size (SQL Server does have a feature that does smaller strings which is exposed
as the "STRING INDEX" in the histogram, but you can't see the details of this object
in 2005 and I haven't seen that change in 2008 either).&amp;nbsp; So, for really large
strings, it is left with... guessing.&lt;br&gt;
&lt;br&gt;
So, that 69.6985 number is an estimate that is partially based on the length of the
string.&amp;nbsp; Now, the QP could try to walk through each statitistics object and try
to find substrings against any existing piece of statistical data and then try to
adjust its estimate.&amp;nbsp; In practice, though, the cost of doing that is expensive.&amp;nbsp;
The various statistics objects are run at different times and have different sample
rates, so even then they will vary somewhat.&amp;nbsp; Finally, for most cases it may
just not impact the plan choice that much.&amp;nbsp; Odds are, though, that this will
bite at least one of my readers at some point.&amp;nbsp; So, this is good to know - it
can help you find that spot where the assumption in the QP is causing your query plan
to be wrong.&amp;nbsp; This is the sort of case where you will want to consider a query
hint to help the QP out.&lt;br&gt;
&lt;br&gt;
There are more assumptions (and seams between them) in the cardinality estimation
code.&amp;nbsp; I'll let you guys go hunt for them a bit to find them.&lt;br&gt;
&lt;br&gt;
Happy Querying!&lt;br&gt;
Conor Cunningham&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=b039c40d-a48d-428c-a594-0cecd82c1ad8" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,b039c40d-a48d-428c-a594-0cecd82c1ad8.aspx</comments>
      <category>SQL Server 2008</category>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">One of the areas I managed in SQL Server
had to do with the code that automatically builds statistics and uses them during
query optimization to come up with a good plan.  Today I'm going to talk a bit
about how statistics are built and how this works with parallelism and partitioning.<br /><br />
First things first.  There are two ways in which statistics are created in SQL
Server:<br />
1. You create an index.<br />
2. You run a query that needs statistics when they do not exist and the server has
"auto-create statistics" enabled.  (That's the simple definition - there are
actually caveats I am skipping)<br /><br />
Why does creating an index also create statistics?  Well, the main cost in creating
statistics is reading all the pages into memory.  The thought is that if you
want to create an index, you have already paid the expensive cost and the server might
as well go ahead and create the statistics object for you instead of re-reading those
pages later.  So, you get these "for free" (well, almost free).<br /><br />
When you run a query that tries to perform an operation where cardinality estimates
would be improved by having statistical information about a column, the server can
try to create statistics during the compilation process.  Since compilation time
needs to be kept to a minimum, these are usually done over a _sample_ of the pages
to avoid making the compilation of a simple query as expensive as an index build.<br /><br />
So the basic plan to create an index is a singlethreaded plan that is something like
this:<br /><br />
INSERT (New B-Tree)<br />
  |<br />
Sort<br />
  |<br />
SCAN(Heap or some index)<br /><br />
So during this plan's execution, there is an implicit side-effect to also create this
statistics object.<br /><br />
For auto-stats, there is a separate query that is run.  The syntax is not public,
but you can see artifacts of this if you look at the profiler and you've turned on
the various auto-stats and plan outputs:<br /><br /><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">drop</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">table</span> stat1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">create</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">table</span> stat1(col1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span>,
col2 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span>,
col3 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">binary</span>(6000)) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">declare</span> @i <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">int</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">set</span> @i=0 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">while</span> @i
&lt; 1000 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">begin</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">insert</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">into</span> stat1(col1,
col2) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">values</span> (<span style="color: Fuchsia; background-color: transparent; font-family: Courier New; font-size: 11px;">rand</span>()*1000, <span style="color: Fuchsia; background-color: transparent; font-family: Courier New; font-size: 11px;">rand</span>()*1000) <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">set</span> @i=@i+1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">end</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">select</span> * <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">from</span> stat1 <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">where</span> col2
&gt; 5</span></pre><br /><br />
Execution Tree<br />
--------------<br />
Stream Aggregate(DEFINE:([Expr1004]=STATMAN([t1].[dbo].[stat1].[col2])))<br />
  |--Sort(ORDER BY:([t1].[dbo].[stat1].[col2] ASC))<br />
       |--Table Scan(OBJECT:([t1].[dbo].[stat1]))<br /><br /><br /><p></p>
What is this?  Well, this is the plan that is used to generate the statistics
object.  It scans a table, sorts it (into ascending order), and then feeds it
into this magical thing called statman.<br /><br />
Now, the details of statman are undocumented, but you can infer that it is a special
internal aggregate function that is being run inside of a group by operation (stream
aggregate in the plan).  This means that it is collapsing all the rows into some
BLOB and then it does something with this.<br /><br /><br />
Next time I hope to talk about parallel statistics build.<br /><br />
Happy querying!<br /><br />
Conor Cunningham<br /><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>Statistics, Damned Lies, and Statistics - What is Statman?</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/10/StatisticsDamnedLiesAndStatisticsWhatIsStatman.aspx</link>
      <pubDate>Thu, 10 Apr 2008 02:50:58 GMT</pubDate>
      <description>One of the areas I managed in SQL Server had to do with the code that automatically builds statistics and uses them during query optimization to come up with a good plan.&amp;nbsp; Today I'm going to talk a bit about how statistics are built and how this works with parallelism and partitioning.&lt;br&gt;
&lt;br&gt;
First things first.&amp;nbsp; There are two ways in which statistics are created in SQL
Server:&lt;br&gt;
1. You create an index.&lt;br&gt;
2. You run a query that needs statistics when they do not exist and the server has
"auto-create statistics" enabled.&amp;nbsp; (That's the simple definition - there are
actually caveats I am skipping)&lt;br&gt;
&lt;br&gt;
Why does creating an index also create statistics?&amp;nbsp; Well, the main cost in creating
statistics is reading all the pages into memory.&amp;nbsp; The thought is that if you
want to create an index, you have already paid the expensive cost and the server might
as well go ahead and create the statistics object for you instead of re-reading those
pages later.&amp;nbsp; So, you get these "for free" (well, almost free).&lt;br&gt;
&lt;br&gt;
When you run a query that tries to perform an operation where cardinality estimates
would be improved by having statistical information about a column, the server can
try to create statistics during the compilation process.&amp;nbsp; Since compilation time
needs to be kept to a minimum, these are usually done over a _sample_ of the pages
to avoid making the compilation of a simple query as expensive as an index build.&lt;br&gt;
&lt;br&gt;
So the basic plan to create an index is a singlethreaded plan that is something like
this:&lt;br&gt;
&lt;br&gt;
INSERT (New B-Tree)&lt;br&gt;
&amp;nbsp; |&lt;br&gt;
Sort&lt;br&gt;
&amp;nbsp; |&lt;br&gt;
SCAN(Heap or some index)&lt;br&gt;
&lt;br&gt;
So during this plan's execution, there is an implicit side-effect to also create this
statistics object.&lt;br&gt;
&lt;br&gt;
For auto-stats, there is a separate query that is run.&amp;nbsp; The syntax is not public,
but you can see artifacts of this if you look at the profiler and you've turned on
the various auto-stats and plan outputs:&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;drop&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;table&lt;/span&gt; stat1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;create&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;table&lt;/span&gt; stat1(col1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt;,
col2 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt;,
col3 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;binary&lt;/span&gt;(6000)) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;declare&lt;/span&gt; @i &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;int&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;set&lt;/span&gt; @i=0 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;while&lt;/span&gt; @i
&amp;lt; 1000 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;begin&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;insert&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;into&lt;/span&gt; stat1(col1,
col2) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;values&lt;/span&gt; (&lt;span style="color: Fuchsia; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;rand&lt;/span&gt;()*1000, &lt;span style="color: Fuchsia; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;rand&lt;/span&gt;()*1000) &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;set&lt;/span&gt; @i=@i+1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;end&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;select&lt;/span&gt; * &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;from&lt;/span&gt; stat1 &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;where&lt;/span&gt; col2
&amp;gt; 5&lt;/span&gt;&lt;/pre&gt;
&lt;br&gt;
&lt;br&gt;
Execution Tree&lt;br&gt;
--------------&lt;br&gt;
Stream Aggregate(DEFINE:([Expr1004]=STATMAN([t1].[dbo].[stat1].[col2])))&lt;br&gt;
&amp;nbsp; |--Sort(ORDER BY:([t1].[dbo].[stat1].[col2] ASC))&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t1].[dbo].[stat1]))&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
What is this?&amp;nbsp; Well, this is the plan that is used to generate the statistics
object.&amp;nbsp; It scans a table, sorts it (into ascending order), and then feeds it
into this magical thing called statman.&lt;br&gt;
&lt;br&gt;
Now, the details of statman are undocumented, but you can infer that it is a special
internal aggregate function that is being run inside of a group by operation (stream
aggregate in the plan).&amp;nbsp; This means that it is collapsing all the rows into some
BLOB and then it does something with this.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Next time I hope to talk about parallel statistics build.&lt;br&gt;
&lt;br&gt;
Happy querying!&lt;br&gt;
&lt;br&gt;
Conor Cunningham&lt;br&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,ba601f26-b6bd-4c3b-bacf-d8b36a1e8a5d.aspx</comments>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=a746ef4a-0eb2-4f83-bb00-9d14e4d50f33</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,a746ef4a-0eb2-4f83-bb00-9d14e4d50f33.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,a746ef4a-0eb2-4f83-bb00-9d14e4d50f33.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=a746ef4a-0eb2-4f83-bb00-9d14e4d50f33</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">I've returned from a small trip and I will
be preparing my next SQL post soon.<br /><br />
I've been struggling with slow POP3 sync behavior on my Outlook 2007/Vista 64 box,
and I finally found a hammer to beat it back into submission.<br /><br />
The problem - I sync manually, and when I do the application becomes basically so
slow as to be non-responsive.  I deleted a bunch of mail to get my mailbox down
to a reasonable size (150MB) - still there.  I deleted RSS feeds, blaming XML
for my woes again :).  That didn't work either.<br /><br />
I eventually found this:<br />
http://support.microsoft.com/default.aspx?scid=kb;EN-US;935400<br /><br />
Something to do with the new Vista TCP window size algorithm not working well with
"legacy" hardware.  No only is the download slow, but using Outlook becomes unbearable
as well, so I can't really do anything with the application...  I didn't get
timeout errors (at least not frequently), but I certainly had bad feelings about my
email experience.<br /><br />
When I disabled the new TCP window behavior, all went back to what was expected....
Now I have to go find all those RSS feeds again :).  So here's to "<span class="userInput">netsh
interface tcp set global autotuninglevel=disabled".  It worked for me, at least
so far.</span><br /><br />
I think that this is a case where Microsoft has an opportunity to compare their offering
to gmail and determine "hey, I wonder why people think that a webui is better - they
are _so_ much slower...." well, in some cases the thick client is actually slower,
and that doesn't make MS look very good.  So I hope this workaround avoids frustration
for others :).<br /><br />
I'll also hope that MS could add something to outlook in the next service pack when
it realizes that it downloads 15KB in 2 minutes from my POP3 servers.  Perhaps
they can add a popup or special error in the next service pack of outlook 2007 to
point people in the right direction.<br /><p></p><br />
My setup:<br />
Vista x64 SP1<br />
netgear gigabit 8 port switch<br />
linksys wrt54g NAT<br />
some no-name cable modem that my cable company gave to me.<br /><br /><br /><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a746ef4a-0eb2-4f83-bb00-9d14e4d50f33" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>Slow Outlook 2007 on Vista (x64)</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,a746ef4a-0eb2-4f83-bb00-9d14e4d50f33.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/09/SlowOutlook2007OnVistaX64.aspx</link>
      <pubDate>Wed, 09 Apr 2008 01:01:01 GMT</pubDate>
      <description>I've returned from a small trip and I will be preparing my next SQL post soon.&lt;br&gt;
&lt;br&gt;
I've been struggling with slow POP3 sync behavior on my Outlook 2007/Vista 64 box,
and I finally found a hammer to beat it back into submission.&lt;br&gt;
&lt;br&gt;
The problem - I sync manually, and when I do the application becomes basically so
slow as to be non-responsive.&amp;nbsp; I deleted a bunch of mail to get my mailbox down
to a reasonable size (150MB) - still there.&amp;nbsp; I deleted RSS feeds, blaming XML
for my woes again :).&amp;nbsp; That didn't work either.&lt;br&gt;
&lt;br&gt;
I eventually found this:&lt;br&gt;
http://support.microsoft.com/default.aspx?scid=kb;EN-US;935400&lt;br&gt;
&lt;br&gt;
Something to do with the new Vista TCP window size algorithm not working well with
"legacy" hardware.&amp;nbsp; No only is the download slow, but using Outlook becomes unbearable
as well, so I can't really do anything with the application...&amp;nbsp; I didn't get
timeout errors (at least not frequently), but I certainly had bad feelings about my
email experience.&lt;br&gt;
&lt;br&gt;
When I disabled the new TCP window behavior, all went back to what was expected....
Now I have to go find all those RSS feeds again :).&amp;nbsp; So here's to "&lt;span class="userInput"&gt;netsh
interface tcp set global autotuninglevel=disabled".&amp;nbsp; It worked for me, at least
so far.&lt;/span&gt;
&lt;br&gt;
&lt;br&gt;
I think that this is a case where Microsoft has an opportunity to compare their offering
to gmail and determine "hey, I wonder why people think that a webui is better - they
are _so_ much slower...." well, in some cases the thick client is actually slower,
and that doesn't make MS look very good.&amp;nbsp; So I hope this workaround avoids frustration
for others :).&lt;br&gt;
&lt;br&gt;
I'll also hope that MS could add something to outlook in the next service pack when
it realizes that it downloads 15KB in 2 minutes from my POP3 servers.&amp;nbsp; Perhaps
they can add a popup or special error in the next service pack of outlook 2007 to
point people in the right direction.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;br&gt;
My setup:&lt;br&gt;
Vista x64 SP1&lt;br&gt;
netgear gigabit 8 port switch&lt;br&gt;
linksys wrt54g NAT&lt;br&gt;
some no-name cable modem that my cable company gave to me.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a746ef4a-0eb2-4f83-bb00-9d14e4d50f33" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,a746ef4a-0eb2-4f83-bb00-9d14e4d50f33.aspx</comments>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=1a9632b2-7664-431b-8c60-29794f0de3fa</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,1a9632b2-7664-431b-8c60-29794f0de3fa.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,1a9632b2-7664-431b-8c60-29794f0de3fa.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=1a9632b2-7664-431b-8c60-29794f0de3fa</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">Based on my previous post describing the
differences between ANSI-89 and ANSI-92+ join syntaxes and recommendations, I had
a follow-up question in a comment which was (paraphrased)<br /><br />
What do I do with non-join WHERE clauses - how should I write those?<br /><br />
Example:<br /><br /><p class="MsoPlainText"></p><pre><span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">SELECT</span> p.FirstName
+ <span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'
works in '</span> + d.DepartmentName <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">AS</span><span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'Who
Works Where'</span><span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">FROM</span> Person <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">AS</span> p <span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;">JOIN</span> Department <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">AS</span> d <span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;">ON</span> p.DepartmentID
= d.DepartmentID <span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;">AND</span> p.JobType
= <span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;">'Salaried'</span></span></pre>
(so, the question is about the filter condition on p.JobType).<br /><br />
Answer: it doesn't generally matter, but I'd recommend that you put it in the WHERE
clause for readability.<br /><br />
I'll explain the why now.  A little background first:<br /><br />
In SQL Server (actually all QPs of any note), the SQL string is converted into an
operator tree and then there's a lot of magic to rewrite that query tree into alternative
forms of the query.  While some people think of this as "if I try different syntaxes,
the query may execute faster", the optimizer is actually doing something at a much
deeper level - it is essentially rewriting the tree using a series of rules about
equivalences.  It's a lot more like doing a math proof than trying different
syntaxes - it deals with associativity, commutivity, etc. (Conor bats cobwebs out
of people's heads- yes, you guys did this stuff in school).  It's a big set theory
algebra machine.  So, you put in a tree repesenting the syntax at one end and
get a query plan out the other side.<br /><br />
So, I'll ask the question from a slightly different perspective and answer that too
to help explain the "why":<br /><br />
"Does putting filters in the join condition of an inner join impact the plan choice
from the optimizer?".<br /><br />
Answer: no - at least not in most cases.  
<br /><br />
When the SQL Server Optimizer starts working on this query, it will do a number of
things very early in the optimization process. One thing is called "simplification",
and most of you can guess what will happen there.  One core task in simplication
is "predicate pushdown", where the query is rewritten to push the filter conditions
in WHERE clauses towards the tables on which the predicates are defined.  This
mostly enables index matching later in optimization.  It also enables computed
column matching.<br /><br />
So, these predicates are pushed down in both cases.  You lose a lot in query
readability by trying this form of rewrite for very little gain.<br /><br />
There is one case where I'd consider doing this, but it really requires that you have
uber knowledge of the QP.  However, this seems like a good challenge, so I'll
explain the situation and let you guys write in if you can find an example of it:<br /><br />
You know that the QP uses relational algebra equivalence rules to rewrite a query
tree (so A join B is equivalent to B join A, filter .a(a join b) == (select * from
a where filter.a) join b, etc.<br /><br />
One could imagine that some of the fancier operators may not fit as easily into the
relational algebra rewrite rules.  (Or, they are just so complex that the cost
of trying such rewrites outweighs the benefit).<br /><br />
Can you find operators where (filter (OPERATOR (SCAN TABLE)) is not equivalent to
(OPERATOR (filter (SCAN TABLE))?  
<br /><br />
Obviously inner join is a bad place to start.  I'll throw out some not-so-random
areas for you to try:<br />
* updates<br />
* xml column manipulations<br />
* SELECT list items on objects that change semantics based on how many times they
are executed in a query (rand()?  think functions)<br />
* play with group by (this one is tricky)<br />
* OVER clause?<br />
* UNION/UNION ALL, INTERSECT, ...<br /><br />
So, there are some cases where the QP will do these rewrites, and there are some places
where it can't/won't (or at least doesn't do it always).  In a few of these cases,
the intent of the query can be preserved by manually rewriting the query to push the
predicate "down" the query tree towards the source tables.  However, I would
not recommend this unless you really know what you are doing - the query rewrite needs
to be equivalent or else you may not get the right results back from your query!<br /><br /><br />
Bottom line - I think that the query is far more readable with non-join predicates
in the WHERE clause.  Whenever I try to optimize queries, I usually push them
into this format so that I can wrap my head around what the query is trying to accomplish.<br /><br /><br />
Happy querying!<br /><br />
Conor<br /><p></p><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=1a9632b2-7664-431b-8c60-29794f0de3fa" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>How to write non-JOIN WHERE clauses</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,1a9632b2-7664-431b-8c60-29794f0de3fa.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/04/01/HowToWriteNonJOINWHEREClauses.aspx</link>
      <pubDate>Tue, 01 Apr 2008 01:47:00 GMT</pubDate>
      <description>Based on my previous post describing the differences between ANSI-89 and ANSI-92+ join syntaxes and recommendations, I had a follow-up question in a comment which was (paraphrased)&lt;br&gt;
&lt;br&gt;
What do I do with non-join WHERE clauses - how should I write those?&lt;br&gt;
&lt;br&gt;
Example:&lt;br&gt;
&lt;br&gt;
&lt;p class="MsoPlainText"&gt;
&lt;/p&gt;
&lt;pre&gt;&lt;span style="color: Black; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;&lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;SELECT&lt;/span&gt; p.FirstName
+ &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'
works in '&lt;/span&gt; + d.DepartmentName &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;AS&lt;/span&gt; &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'Who
Works Where'&lt;/span&gt; &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;FROM&lt;/span&gt; Person &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;AS&lt;/span&gt; p &lt;span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;JOIN&lt;/span&gt; Department &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;AS&lt;/span&gt; d &lt;span style="color: Blue; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;ON&lt;/span&gt; p.DepartmentID
= d.DepartmentID &lt;span style="color: Silver; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;AND&lt;/span&gt; p.JobType
= &lt;span style="color: Red; background-color: transparent; font-family: Courier New; font-size: 11px;"&gt;'Salaried'&lt;/span&gt; &lt;/span&gt;&lt;/pre&gt;
(so, the question is about the filter condition on p.JobType).&lt;br&gt;
&lt;br&gt;
Answer: it doesn't generally matter, but I'd recommend that you put it in the WHERE
clause for readability.&lt;br&gt;
&lt;br&gt;
I'll explain the why now.&amp;nbsp; A little background first:&lt;br&gt;
&lt;br&gt;
In SQL Server (actually all QPs of any note), the SQL string is converted into an
operator tree and then there's a lot of magic to rewrite that query tree into alternative
forms of the query.&amp;nbsp; While some people think of this as "if I try different syntaxes,
the query may execute faster", the optimizer is actually doing something at a much
deeper level - it is essentially rewriting the tree using a series of rules about
equivalences.&amp;nbsp; It's a lot more like doing a math proof than trying different
syntaxes - it deals with associativity, commutivity, etc. (Conor bats cobwebs out
of people's heads- yes, you guys did this stuff in school).&amp;nbsp; It's a big set theory
algebra machine.&amp;nbsp; So, you put in a tree repesenting the syntax at one end and
get a query plan out the other side.&lt;br&gt;
&lt;br&gt;
So, I'll ask the question from a slightly different perspective and answer that too
to help explain the "why":&lt;br&gt;
&lt;br&gt;
"Does putting filters in the join condition of an inner join impact the plan choice
from the optimizer?".&lt;br&gt;
&lt;br&gt;
Answer: no - at least not in most cases.&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
When the SQL Server Optimizer starts working on this query, it will do a number of
things very early in the optimization process. One thing is called "simplification",
and most of you can guess what will happen there.&amp;nbsp; One core task in simplication
is "predicate pushdown", where the query is rewritten to push the filter conditions
in WHERE clauses towards the tables on which the predicates are defined.&amp;nbsp; This
mostly enables index matching later in optimization.&amp;nbsp; It also enables computed
column matching.&lt;br&gt;
&lt;br&gt;
So, these predicates are pushed down in both cases.&amp;nbsp; You lose a lot in query
readability by trying this form of rewrite for very little gain.&lt;br&gt;
&lt;br&gt;
There is one case where I'd consider doing this, but it really requires that you have
uber knowledge of the QP.&amp;nbsp; However, this seems like a good challenge, so I'll
explain the situation and let you guys write in if you can find an example of it:&lt;br&gt;
&lt;br&gt;
You know that the QP uses relational algebra equivalence rules to rewrite a query
tree (so A join B is equivalent to B join A, filter .a(a join b) == (select * from
a where filter.a) join b, etc.&lt;br&gt;
&lt;br&gt;
One could imagine that some of the fancier operators may not fit as easily into the
relational algebra rewrite rules.&amp;nbsp; (Or, they are just so complex that the cost
of trying such rewrites outweighs the benefit).&lt;br&gt;
&lt;br&gt;
Can you find operators where (filter (OPERATOR (SCAN TABLE)) is not equivalent to
(OPERATOR (filter (SCAN TABLE))?&amp;nbsp; 
&lt;br&gt;
&lt;br&gt;
Obviously inner join is a bad place to start.&amp;nbsp; I'll throw out some not-so-random
areas for you to try:&lt;br&gt;
* updates&lt;br&gt;
* xml column manipulations&lt;br&gt;
* SELECT list items on objects that change semantics based on how many times they
are executed in a query (rand()?&amp;nbsp; think functions)&lt;br&gt;
* play with group by (this one is tricky)&lt;br&gt;
* OVER clause?&lt;br&gt;
* UNION/UNION ALL, INTERSECT, ...&lt;br&gt;
&lt;br&gt;
So, there are some cases where the QP will do these rewrites, and there are some places
where it can't/won't (or at least doesn't do it always).&amp;nbsp; In a few of these cases,
the intent of the query can be preserved by manually rewriting the query to push the
predicate "down" the query tree towards the source tables.&amp;nbsp; However, I would
not recommend this unless you really know what you are doing - the query rewrite needs
to be equivalent or else you may not get the right results back from your query!&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Bottom line - I think that the query is far more readable with non-join predicates
in the WHERE clause.&amp;nbsp; Whenever I try to optimize queries, I usually push them
into this format so that I can wrap my head around what the query is trying to accomplish.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Happy querying!&lt;br&gt;
&lt;br&gt;
Conor&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=1a9632b2-7664-431b-8c60-29794f0de3fa" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,1a9632b2-7664-431b-8c60-29794f0de3fa.aspx</comments>
    </item>
    <item>
      <trackback:ping>http://www.sqlskills.com/blogs/conor/Trackback.aspx?guid=a568aa12-273f-46fe-bf64-53358257c6fa</trackback:ping>
      <pingback:server>http://www.sqlskills.com/blogs/conor/pingback.aspx</pingback:server>
      <pingback:target>http://www.sqlskills.com/blogs/conor/PermaLink,guid,a568aa12-273f-46fe-bf64-53358257c6fa.aspx</pingback:target>
      <dc:creator>
      </dc:creator>
      <wfw:comment>http://www.sqlskills.com/blogs/conor/CommentView,guid,a568aa12-273f-46fe-bf64-53358257c6fa.aspx</wfw:comment>
      <wfw:commentRss>http://www.sqlskills.com/blogs/conor/SyndicationService.asmx/GetEntryCommentsRss?guid=a568aa12-273f-46fe-bf64-53358257c6fa</wfw:commentRss>
      <slash:comments>2</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">I had a request from a reader that I'll
answer today about when to do joins in the ON clause and when to do them in the WHERE
clause.  For example:<br /><br />
SELECT * FROM A, B WHERE A.a = B.b<br /><br />
vs. 
<br /><br />
SELECT * FROM A INNER JOIN B ON (A.a = B.b)<br /><br />
The short answer is that both are the same (at least for inner joins), but I prefer
and encourage you to use the latter format (and I will explain why).<br /><br />
Earlier versions of ANSI SQL did not contain the ON clause for join conditions - it
used the where clause for everything.  This was fine for inner joins, but as
database applications started using outer joins, there were problems that arose with
this approach.  Some of you may remember the original ANSI-89-era syntax for
*= and =*.  These were used on the predicates to define the behavior for an outer
join.  In this case, we'll preserve non-matching rows from A in addition to the
normal rows returned from a join:<br /><br />
SELECT * FROM A, B where A.a *= B.b<br />
which is equvalent to:<br />
SELECT * FROM A LEFT OUTER JOIN B on (A.a = B.b)<br /><p></p><br />
This "hack" worked fine until people started using multiple predicates for things
and also started doing multiple outer joins in one query.  Then we were left
with a big, ambiguous mess about which *=, =* applied to which join.  So, ANSI
banished *= and =* and SQL Server has been threatening to follow for quite sometime. 
I honestly never use the old-style join syntax, so I don't even recall the exact deprecation
state.  It is dead to me already ;).<br /><br />
The broader concept is that predicates are "attached" to joins using the ON clause. 
This is very helpful when you are trying to figure out what should happen in a query. 
It helps semantically define the set of rows that should return from the join.<br /><br />
So, if I start nesting various inner and outer joins in a big, nasty query, all of
a sudden it is very nice to have an ON clause to define what should go where.<br /><br />
SELECT * FROM A INNER JOIN (SELECT B.* FROM B LEFT OUTER JOIN C ON (B.col1=C.col1
and B.foo.C.bar)) AS I1 ON A.col1=I1.col1;<br /><br />
As applications get more complex, it is not uncommon to have 10s of tables in a query.<br /><br />
Internal to the SQL Server query processor (actually pretty much all query processors),
there is a tree format for the query operations.  The exact representation will
vary from vendor to vendor, but each one of these SQL syntax pieces transates to some
set of relational operators in this query tree.  Putting your query syntactically
into this format gets things much closer to the internal algebra of the query processor
in addition to making things easier to read as queries get more complex.<br /><br /><i>Actually, if I were to go build my own QP, I'd seriously consider adding a query
tree mechanism in addition to SQL (this concept is not new and is not mine). 
OLEDB had a concept like this in the earlier public betas, for example.  Obviously
the implementor would want to retain the ability to change the internal implementation,
but a tree of commands is actually far easier to grok than SQL, once you get used
to the idea.  Other technologies expose a graph structure to you (video codecs/transforms
in windows, msbuild is an XML file representing a tree, etc).  SQL as a textual
language exists historically.  It's also a nice way to write queries :).</i><br /><br />
The only other area where I get concerned is when people turn off ANSI_NULLs. 
It is one of those historical features that should basically never be used. 
I could imagine cases where some comparisons in joins behave differently in the ON
clause vs. afterwards in an WHERE clause.  I don't want to pollute people's minds,
as my attempts to go back and re-learn the quirks on this for this post left me baffled
since NULL=NULL returns TRUE only for some syntax constructs.  So, I don't have
a case where it is broken, but I'll leave you with the "ANSI_NULLs off is bad" message
and list it as a potential reason.<br /><br />
Will you get wrong results if you use the old-style join syntax?  no.  The
world will still turn.  So, this is really a recommendation based on style and
sanity.  I would recommend that you get used to the newer style - it may help
you write more powerful applications and think more like the QP.  For some applications,
this might let you write more powerful features for your users.<br /><br /><br />
Thanks,<br /><br />
Conor Cunningham<br /><img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a568aa12-273f-46fe-bf64-53358257c6fa" /><br /><hr />
This blog is sponsored by <a href="http://www.sqlskills.com">SQLSkills</a>. 
</body>
      <title>ON vs. WHERE - where should you put join conditions?</title>
      <guid isPermaLink="false">http://www.sqlskills.com/blogs/conor/PermaLink,guid,a568aa12-273f-46fe-bf64-53358257c6fa.aspx</guid>
      <link>http://www.sqlskills.com/blogs/conor/2008/03/29/ONVsWHEREWhereShouldYouPutJoinConditions.aspx</link>
      <pubDate>Sat, 29 Mar 2008 20:31:04 GMT</pubDate>
      <description>I had a request from a reader that I'll answer today about when to do
joins in the ON clause and when to do them in the WHERE clause.&amp;nbsp; For
example:&lt;br&gt;
&lt;br&gt;
SELECT * FROM A, B WHERE A.a = B.b&lt;br&gt;
&lt;br&gt;
vs. 
&lt;br&gt;
&lt;br&gt;
SELECT * FROM A INNER JOIN B ON (A.a = B.b)&lt;br&gt;
&lt;br&gt;
The short answer is that both are the same (at least for inner joins), but I prefer
and encourage you to use the latter format (and I will explain why).&lt;br&gt;
&lt;br&gt;
Earlier versions of ANSI SQL did not contain the ON clause for join conditions - it
used the where clause for everything.&amp;nbsp; This was fine for inner joins, but as
database applications started using outer joins, there were problems that arose with
this approach.&amp;nbsp; Some of you may remember the original ANSI-89-era syntax for
*= and =*.&amp;nbsp; These were used on the predicates to define the behavior for an outer
join.&amp;nbsp; In this case, we'll preserve non-matching rows from A in addition to the
normal rows returned from a join:&lt;br&gt;
&lt;br&gt;
SELECT * FROM A, B where A.a *= B.b&lt;br&gt;
which is equvalent to:&lt;br&gt;
SELECT * FROM A LEFT OUTER JOIN B on (A.a = B.b)&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;br&gt;
This "hack" worked fine until people started using multiple predicates for things
and also started doing multiple outer joins in one query.&amp;nbsp; Then we were left
with a big, ambiguous mess about which *=, =* applied to which join.&amp;nbsp; So, ANSI
banished *= and =* and SQL Server has been threatening to follow for quite sometime.&amp;nbsp;
I honestly never use the old-style join syntax, so I don't even recall the exact deprecation
state.&amp;nbsp; It is dead to me already ;).&lt;br&gt;
&lt;br&gt;
The broader concept is that predicates are "attached" to joins using the ON clause.&amp;nbsp;
This is very helpful when you are trying to figure out what should happen in a query.&amp;nbsp;
It helps semantically define the set of rows that should return from the join.&lt;br&gt;
&lt;br&gt;
So, if I start nesting various inner and outer joins in a big, nasty query, all of
a sudden it is very nice to have an ON clause to define what should go where.&lt;br&gt;
&lt;br&gt;
SELECT * FROM A INNER JOIN (SELECT B.* FROM B LEFT OUTER JOIN C ON (B.col1=C.col1
and B.foo.C.bar)) AS I1 ON A.col1=I1.col1;&lt;br&gt;
&lt;br&gt;
As applications get more complex, it is not uncommon to have 10s of tables in a query.&lt;br&gt;
&lt;br&gt;
Internal to the SQL Server query processor (actually pretty much all query processors),
there is a tree format for the query operations.&amp;nbsp; The exact representation will
vary from vendor to vendor, but each one of these SQL syntax pieces transates to some
set of relational operators in this query tree.&amp;nbsp; Putting your query syntactically
into this format gets things much closer to the internal algebra of the query processor
in addition to making things easier to read as queries get more complex.&lt;br&gt;
&lt;br&gt;
&lt;i&gt;Actually, if I were to go build my own QP, I'd seriously consider adding a query
tree mechanism in addition to SQL (this concept is not new and is not mine).&amp;nbsp;
OLEDB had a concept like this in the earlier public betas, for example.&amp;nbsp; Obviously
the implementor would want to retain the ability to change the internal implementation,
but a tree of commands is actually far easier to grok than SQL, once you get used
to the idea.&amp;nbsp; Other technologies expose a graph structure to you (video codecs/transforms
in windows, msbuild is an XML file representing a tree, etc).&amp;nbsp; SQL as a textual
language exists historically.&amp;nbsp; It's also a nice way to write queries :).&lt;/i&gt;
&lt;br&gt;
&lt;br&gt;
The only other area where I get concerned is when people turn off ANSI_NULLs.&amp;nbsp;
It is one of those historical features that should basically never be used.&amp;nbsp;
I could imagine cases where some comparisons in joins behave differently in the ON
clause vs. afterwards in an WHERE clause.&amp;nbsp; I don't want to pollute people's minds,
as my attempts to go back and re-learn the quirks on this for this post left me baffled
since NULL=NULL returns TRUE only for some syntax constructs.&amp;nbsp; So, I don't have
a case where it is broken, but I'll leave you with the "ANSI_NULLs off is bad" message
and list it as a potential reason.&lt;br&gt;
&lt;br&gt;
Will you get wrong results if you use the old-style join syntax?&amp;nbsp; no.&amp;nbsp; The
world will still turn.&amp;nbsp; So, this is really a recommendation based on style and
sanity.&amp;nbsp; I would recommend that you get used to the newer style - it may help
you write more powerful applications and think more like the QP.&amp;nbsp; For some applications,
this might let you write more powerful features for your users.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
&lt;br&gt;
Conor Cunningham&lt;br&gt;
&lt;img width="0" height="0" src="http://www.sqlskills.com/blogs/conor/aggbug.ashx?id=a568aa12-273f-46fe-bf64-53358257c6fa" /&gt;
&lt;br /&gt;
&lt;hr /&gt;
This blog is sponsored by &lt;a href="http://www.sqlskills.com"&gt;SQLSkills&lt;/a&gt;. </description>
      <comments>http://www.sqlskills.com/blogs/conor/CommentView,guid,a568aa12-273f-46fe-bf64-53358257c6fa.aspx</comments>
    </item>
  </channel>
</rss>