Based on my previous post describing the differences between ANSI-89 and ANSI-92+ join syntaxes and recommendations, I had a follow-up question in a comment which was (paraphrased)
What do I do with non-join WHERE clauses – how should I write those?
Example:
SELECT p.FirstName + ' works in ' + d.DepartmentName AS 'Who Works Where'
FROM Person AS p
JOIN Department AS d
ON p.DepartmentID = d.DepartmentID
AND p.JobType = 'Salaried'
(so, the question is about the filter condition on p.JobType).
Answer: it doesn’t generally matter, but I’d recommend that you put it in the WHERE clause for readability.
I’ll explain the why now. A little background first:
In SQL Server (actually all QPs of any note), the SQL string is converted into an operator tree and then there’s a lot of magic to rewrite that query tree into alternative forms of the query. While some people think of this as “if I try different syntaxes, the query may execute faster”, the optimizer is actually doing something at a much deeper level – it is essentially rewriting the tree using a series of rules about equivalences. It’s a lot more like doing a math proof than trying different syntaxes – it deals with associativity, commutivity, etc. (Conor bats cobwebs out of people’s heads- yes, you guys did this stuff in school). It’s a big set theory algebra machine. So, you put in a tree repesenting the syntax at one end and get a query plan out the other side.
So, I’ll ask the question from a slightly different perspective and answer that too to help explain the “why”:
“Does putting filters in the join condition of an inner join impact the plan choice from the optimizer?”.
Answer: no – at least not in most cases.
When the SQL Server Optimizer starts working on this query, it will do
a number of things very early in the optimization process. One thing is called “simplification”, and most of you can guess what will happen there. One core task in simplication is “predicate pushdown”, where the query is rewritten to push the filter conditions in WHERE clauses towards the tables on which the predicates are defined. This mostly enables index matching later in optimization. It also enables computed column matching.
So, these predicates are pushed down in both cases. You lose a lot in query readability by trying this form of rewrite for very little gain.
There is one case where I’d consider doing this, but it really requires that you have uber knowledge of the QP. However, this seems like a good challenge, so I’ll explain the situation and let you guys write in if you can find an example of it:
You know that the QP uses relational algebra equivalence rules to rewrite a query tree (so A join B is equivalent to B join A, filter .a(a join b) == (select * from a where filter.a) join b, etc.
One could imagine that some of the fancier operators may not fit as easily into the relational algebra rewrite rules. (Or, they are just so complex that the cost of trying such rewrites outweighs the benefit).
Can you find operators where (filter (OPERATOR (SCAN TABLE)) is not equivalent to (OPERATOR (filter (SCAN TABLE))?
Obviously inner join is a bad place to start. I’ll throw out some not-so-random areas for you to try:
* updates
* xml column manipulations
* SELECT list items on objects that change semantics based on how many times they are executed in a query (rand()? think functions)
* play with group by (this one is tricky)
* OVER clause?
* UNION/UNION ALL, INTERSECT, …
So, there are some cases where the QP will do these rewrites, and there are some places where it can’t/won’t (or at least doesn’t do it always). In a few of these cases, the intent of the query can be preserved by manually rewriting the query to push the predicate “down” the query tree towards the source tables. However, I would not recommend this unless you really know what you are doing – the query rewrite needs to be equivalent or else you may not get the right results back from your query!
Bottom line – I think that the query is far more readable with non-join predicates in the WHERE clause. Whenever I try to optimize queries, I usually push them into this format so that I can wrap my head around what the query is trying to accomplish.
Happy querying!
Conor