(I am still around – I’ve just been busy with a few personal projects of late – keep asking questions if you have them).
I received a question from a reader named Andy that was a follow-up to my questions on OUTER JOINs and WHERE clauses and ON clauses.
Upon re-reading my previous entries, I determined that I got distracted talking about the old-style join syntax instead of giving a good explanation about ON vs. WHERE _semantics_ for outer joins.
select * from cc1 left outer join cc2 on (cc1.col1=cc2.col1)
drop table cc1
create table cc1(col1 int, col2 int)
insert into cc1(col1, col2) values (1, 2)
insert into cc1(col1, col2) values (2, 3)
create table cc2(col1 int, col2 int)
insert into cc2(col1, col2) values (1, 2)
select * from cc1 left outer join cc2 on (cc1.col1=cc2.col1)
col1 col2 col1 col2
----------- ----------- ----------- -----------
1 2 1 2
2 3 NULL NULL
(2 row(s) affected)
select * from cc1 left outer join cc2 ON (1=1) WHERE (cc1.col1=cc2.col1)
col1 col2 col1 col2
----------- ----------- ----------- -----------
1 2 1 2
(1 row(s) affected)
The question boils down to “why are these two queries returning different results?” This is a very good question.
I’ll explain the what, then I’ll try to explain the why/how:
In the first query, the ON clause has the predicate. This logically ties the predicate to the join. Since OUTER JOINS can return rows that do not match, the predicate is used to determine what “doesn’t match”.
In the second case, the join part actually tells the QP to do a full cross product. The WHERE clause is actually not bound to the OUTER join semantics at all. This is why they return different results.
Now let’s talk “why/how”. The QP represents this second query as a tree with a filter above the join. The QP has lots of smarts in it. One of the thing it can determine is that the filter condition (cc1.col1=cc2.col2) actually prevents NULLs from being returned since NULL = anything is UNKNOWN in three-value logic (and therefore not TRUE and therefore not returned from the WHERE clause). Since all NULL values from the non-outer side of the join are removed, this is logically equivalent to running an inner join because all of the extra rows for non-matching rows are actually removed.
select * from cc1 INNER join cc2 ON (1=1) WHERE (cc1.col1=cc2.col1)
If you look at the query plans, you will see that the second query is actually running an inner join because of this recognition in the QP that you don’t need an outer join. (The QP can more freely reorder inner joins, so it prefers to convert outer to inner joins where possible).
I hope that gives a better explanation as to why SQL Server returns different results for this query!
Happy Querying!
Conor