Filtered Indexes Make Me Drool…

(I got CTP 6 running on another machine, so I’m working except for parallel plans now since it’s only a single-proc. You guys will have to wait on that operator of the day article for now 🙂

Kudos to my former teammembers for getting filtered indexes into CTP6. It’s pretty darn neat, and I’ll show you a few tidbits that are interesting.

So, filtered indexes are not really a new “feature” in some sense. You can do everything in a filtered index with an indexed view. Of course, indexed view matching is only supported in the Enterprise edition of the software, so perhaps not everyone has seen those benefits.

Indexed views are a tricky feature – generalized tree pattern matching is hard (read: CPU expensive), and if you’ve looked in the BOL the list of restrictions is so long that reminds me of filling out tax forms. However, the other side effect that occurs when you have indexed views is that the optimizer has to go through additional phases of its search in order to apply them. The optimizer has a couple of buckets of rules it can run, and most plans find a nice plan in the first or second round of rules. Generalized indexed view matching is restricted to the last set of buckets, which usually means that there are a lot more rules that have to run before that view can be matched. The bottom line is that the compilation cost isn’t bad for a single query (usually), but it’s the sort of thing that can make the difference between an application that can be used in UI response-time requirements or not.

Enter the filtered index. This is a recognition that there are a lot of indexed views that don’t need tons and tons of fancy equations, joins, etc. These are single-tabled views that have simple predicates. Once you get into this ballpark, you can bolt this on to the super-efficient index matching code and you can enable a whole new class of application from what you could build previously. This is why I am excited about this feature.

I haven’t looked to see where this feature will fall into the SKU matrix yet, and I’m sure that they’re still pondering that very question. However, you guys should play with this on CTP6 – it’s nifty!

So, first things first. Let’s build one of these guys and see if we can match it:

create database t1
use t1
drop table t1
create table t1(col1 int, col2 nchar(2000), col3 time)

create index i1 on t1(col1) where col1 > 5 and col1 < 20
declare @p int 
set @p =0
while @p < 20000
begin
insert into t1(col1) values (rand()*10000)
set @p=@p+1
end

ok so I’ve filled this table up with a lot of useless data and created a nice little filtered index.

select * from t1  where col1 > 5 and col1 < 20

|–Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
       |–Compute Scalar(DEFINE:([Expr1006]=BmkToPage([Bmk1000])))
       |    |–Index Scan(OBJECT:([t1].[dbo].[t1].[i1]))
       |–RID Lookup(OBJECT:([t1].[dbo].[t1]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

select * from t1  where col1 > 5 and col1 < 10

|–Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]) OPTIMIZED)
       |–Compute Scalar(DEFINE:([Expr1006]=BmkToPage([Bmk1000])))
       |    |–Index Seek(OBJECT:([t1].[dbo].[t1].[i1]), SEEK:([t1].[dbo].[t1].[col1] < (10)) ORDERED FORWARD)
       |–RID Lookup(OBJECT:([t1].[dbo].[t1]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

so in the first example I’ve created a query against the table that directly matches the index condition. It matches and even generates an index scan (slightly faster since it doesn’t have to navigate down the b-tree to start returning rows).

The second example is interesting because is a proper subset of the index. The indexed view matching code can do subsumption as well. It generates a seek in this case and returns only part of the filtered index.

Both cases do RID lookups back to the heap since I did SELECT * and the index is not “covering”.

The other data point I’ll leave you with this evening is that the showplan_xml has something interesting in it:

StmtSimple StatementText=”select * from t1 where col1 > 5 and col1 < 10“ StatementId=”1“
StatementCompId=”1“ StatementType=”SELECT“ StatementSubTreeCost=”0.0474183“
StatementEstRows=”7“ StatementOptmLevel=”FULL“
StatementOptmEarlyAbortReason=”GoodEnoughPlanFound“
ParameterizedText=”(@1 tinyint,@2 tinyint)SELECT
* FROM [t1] WHERE [col1]>@1 AND [col1]<@2“>

Well, there are two interesting things. First, StatementOptmLevel=FULL means that we didn’t get a trivial plan. There is a parameterized text field, but I don’t think that is being used here. these two queries are showing up as different plans in the plan cache (and obviously they have different plans). So, absent forced autoparam, I don’t think that trivial plan is working on these guys. This makes sense since there is a cost-based plan choice.

That’s it for me tonight guys. Happy Querying!

Conor

PS: Thanks for all of the comments and posts trying to help me get back to a working state. I’ve tried many of the suggestions without luck, but I’m going to try playing with the orca tool, perhaps tomorrow or so, to see if I can get my main machine back up and working.

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

Filtered Indexes Make Me Drool…

Other articles

New blog location

My blog will be moving

What’s Conor been doing?

A nice MSDN article on SQL 2008 Data Warehousing

The Trouble with Triggers

Outer joins and ON clauses vs. WHERE clauses

Imagine feeling confident enough to handle whatever your database throws at you.