In this post, I'll go over a few things about IN plans in the SQL Server Query Optimizer. Currently I'm using the SQL Server 2008 November CTP, but this behavior is still valid on SQL 2005.
I'll be using the following script for my discussion, and you can try this on your own.
create database in1
use in1
create table t1(col1 int identity, col2 int default rand()*100000, col3 binary(2000))
declare @i int
set @i=0
while @i < 10000
begin
insert into t1 default values
set @i=@i+1
end
create table t2(col1 int identity, col2 binary(200))
-- plan 1
select * from t1 where col2 in (1)
-- plan 2
select * from t1 where col2 in (1,2,3,4,5,6,7,8,9,10)
-- plan 3
select * from t1 where col2 in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
-- plan 4
select * from t1 inner join t2 on t1.col1=t2.col1
where t1.col2 in (
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,61,62,63,64)
-- plan 5
select * from t1 inner join t2 on t1.col1=t2.col1
where t1.col2 in (
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,61,62,63,64,65)
So, we have a database with 2 tables, and I created a bunch of random data in a column so that we can play with IN plans.
ok, plan 1 is "easy". I use IN with a single operator. This happens to turn into a simple plan that looks like this (using set showplan_text to view):
|--Table Scan(OBJECT:([in1].[dbo].[t1]),WHERE:([in1].[dbo].[t1].[col2]=CONVERT_IMPLICIT(int,[@1],0)))
Hrm. hey, the where clause is in the table scan? no separate operator? It's a physical optimization that I'll cover in another post (I may have covered this in my previous blog posts - I'll go check), and it's called "pushing non-SARG predicates" in SQL Server.
Let's compare with the plan for plan 2:
|--Table Scan(OBJECT:([in1].[dbo].[t1]), WHERE:([in1].[dbo].[t1].[col2]=(1) OR [in1].[dbo].[t1].[col2]=(2) OR [in1].[dbo].[t1].[col2]=(3) OR [in1].[dbo].[t1].[col2]=(4) OR [in1].[dbo].[t1].[col2]=(5) OR [in1].[dbo].[t1].[col2]=(6) OR [in1].[dbo].[t1].[col2]=(7) OR [in1].[dbo].[t1].[col2]=(8) OR [in1].[dbo].[t1].[col2]=(9) OR [in1].[dbo].[t1].[col2]=(10)))
Same plan shape, just more conditions.
I wonder if this goes on forever? Well, plan 3 looks different:
|--Filter(WHERE:([in1].[dbo].[t1].[col2]=(1) OR [in1].[dbo].[t1].[col2]=(2) OR [in1].[dbo].[t1].[col2]=(3) OR [in1].[dbo].[t1].[col2]=(4) OR [in1].[dbo].[t1].[col2]=(5) OR [in1].[dbo].[t1].[col2]=(6) OR [in1].[dbo].[t1].[col2]=(7) OR [in1].[dbo].[t1].[col2]=(8) OR [in1].[dbo].[t1].[col2]=(9) OR [in1].[dbo].[t1].[col2]=(10) OR [in1].[dbo].[t1].[col2]=(11) OR [in1].[dbo].[t1].[col2]=(12) OR [in1].[dbo].[t1].[col2]=(13) OR [in1].[dbo].[t1].[col2]=(14) OR [in1].[dbo].[t1].[col2]=(15) OR [in1].[dbo].[t1].[col2]=(16)))
|--Table Scan(OBJECT:([in1].[dbo].[t1]))
Now the filter is split out into another query operator. (It starts happening with 16 predicates)
OK, these 3 plans have been simple query plans. In fact, you can do "set showplan_xml on" and run the query and learn something about how much optimization was done on them:
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.0" Build="10.0.1075.23">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementText="select * from t1 where col2 in (1)" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="2.48687" StatementEstRows="1" StatementOptmLevel="
TRIVIAL" ParameterizedText="(@1 tinyint)SELECT * FROM [t1] WHERE [col2]=@1">
The end-to-end design of the optimizer is beyond the scope of this post, but there are various stages of optimization and queries can "quit early" if they find a "good enough" plan. In this case, there aren't actually any real cost-based decisions to make about which plan to run, so these plans are called "trivial".
Once you start adding joins into the mix, there are join orders to consider, and those are cost-based. You can see that the optimization level for plan 4 differs:
<StmtSimple StatementText="select * from t1 inner join t2 on t1.col1=t2.col1
where t1.col2 in (
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,61,62,63,64)" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="2.70289" StatementEstRows="1" StatementOptmLevel="
FULL">
One would infer that FULL > TRIVIAL ;).
Plan 4:
|--Hash Match(Inner Join, HASH:([in1].[dbo].[t2].[col1])=([in1].[dbo].[t1].[col1]))
|--Table Scan(OBJECT:([in1].[dbo].[t2]))
|--Filter(WHERE:([in1].[dbo].[t1].[col2]=(1) OR [in1].[dbo].[t1].[col2]=(2) OR [in1].[dbo].[t1].[col2]=(3) OR [in1].[dbo].[t1].[col2]=(4) OR [in1].[dbo].[t1].[col2]=(5) OR [in1].[dbo].[t1].[col2]=(6) OR [in1].[dbo].[t1].[col2]=(7) OR [in1].[dbo].[t1].[col2]=(8) OR [in1].[dbo].[t1].[col2]=(9) OR [in1].[dbo].[t1].[col2]=(10) OR [in1].[dbo].[t1].[col2]=(11) OR [in1].[dbo].[t1].[col2]=(12) OR [in1].[dbo].[t1].[col2]=(13) OR [in1].[dbo].[t1].[col2]=(14) OR [in1].[dbo].[t1].[col2]=(15) OR [in1].[dbo].[t1].[col2]=(16) OR [in1].[dbo].[t1].[col2]=(17) OR [in1].[dbo].[t1].[col2]=(18) OR [in1].[dbo].[t1].[col2]=(19) OR [in1].[dbo].[t1].[col2]=(20) OR [in1].[dbo].[t1].[col2]=(21) OR [in1].[dbo].[t1].[col2]=(22) OR [in1].[dbo].[t1].[col2]=(23) OR [in1].[dbo].[t1].[col2]=(24) OR [in1].[dbo].[t1].[col2]=(25) OR [in1].[dbo].[t1].[col2]=(26) OR [in1].[dbo].[t1].[col2]=(27) OR [in1].[dbo].[t1].[col2]=(