ON vs. WHERE – where should you put join conditions?

I had a request from a reader that I’ll answer today about when to do
joins in the ON clause and when to do them in the WHERE clause. For
example:

SELECT * FROM A, B WHERE A.a = B.b

vs.

SELECT * FROM A INNER JOIN B ON (A.a = B.b)

The short answer is that both are the same (at least for inner joins), but I prefer and encourage you to use the latter format (and I will explain why).

Earlier versions of ANSI SQL did not contain the ON clause for join conditions – it used the where clause for everything. This was fine for inner joins, but as database applications started using outer joins, there were problems that arose with this approach. Some of you may remember the original ANSI-89-era syntax for *= and =*. These were used on the predicates to define the behavior for an outer join. In this case, we’ll preserve non-matching rows from A in addition to the normal rows returned from a join:

SELECT * FROM A, B where A.a *= B.b
which is equvalent to:
SELECT * FROM A LEFT OUTER JOIN B on (A.a = B.b)

This “hack” worked fine until people started using multiple predicates for things and also started doing multiple outer joins in one query. Then we were left with a big, ambiguous mess about which *=, =* applied to which join. So, ANSI banished *= and =* and SQL Server has been threatening to follow for quite sometime. I honestly never use the old-style join syntax, so I don’t even recall the exact deprecation state. It is dead to me already ;).

The broader concept is that predicates are “attached” to joins using the ON clause. This is very helpful when you are trying to figure out what should happen in a query. It helps semantically define the set of rows that should return from the join.

So, if I start nesting various inner and outer joins in a big, nasty query, all of a sudden it is very nice to have an ON clause to define what should go where.

SELECT * FROM A INNER JOIN (SELECT B.* FROM B LEFT OUTER JOIN C ON (B.col1=C.col1 and B.foo.C.bar)) AS I1 ON A.col1=I1.col1;

As applications get more complex, it is not uncommon to have 10s of tables in a query.

Internal to the SQL Server query processor (actually pretty much all query processors), there is a tree format for the query operations. The exact representation will vary from vendor to vendor, but each one of these SQL syntax pieces transates to some set of relational operators in this query tree. Putting your query syntactically into this format gets things much closer to the internal algebra of the query processor in addition to making things easier to read as queries get more complex.

Actually, if I were to go build my own QP, I’d seriously consider adding a query tree mechanism in addition to SQL (this concept is not new and is not mine). OLEDB had a concept like this in the earlier public betas, for example. Obviously the implementor would want to retain the ability to change the internal implementation, but a tree of commands is actually far easier to grok than SQL, once you get used to the idea. Other technologies expose a graph structure to you (video codecs/transforms in windows, msbuild is an XML file representing a tree, etc). SQL as a textual language exists historically. It’s also a nice way to write queries :).

The only other area where I get concerned is when people turn off ANSI_NULLs. It is one of those historical features that should basically never be used. I could imagine cases where some comparisons in joins behave differently in the ON clause vs. afterwards in an WHERE clause. I don’t want to pollute people’s minds, as my attempts to go back and re-learn the quirks on this for this post left me baffled since NULL=NULL returns TRUE only for some syntax constructs. So, I don’t have a case where it is broken, but I’ll leave you with the “ANSI_NULLs off is bad” message and list it as a potential reason.

Will you get wrong results if you use the old-style join syntax? no. The world will still turn. So, this is really a recommendation based on style and sanity. I would recommend that you get used to the newer style – it may help you write more powerful applications and think more like the QP. For some applications, this might let you write more powerful features for your users.

Thanks,

Conor Cunningham

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

ON vs. WHERE – where should you put join conditions?

Other articles

New blog location

My blog will be moving

What’s Conor been doing?

A nice MSDN article on SQL 2008 Data Warehousing

The Trouble with Triggers

Outer joins and ON clauses vs. WHERE clauses

Imagine feeling confident enough to handle whatever your database throws at you.