Spatial Index Diagnostic Procs – Filters

The spatial index diagnostic procs' most enlightening pieces of information have to do with the filter counts and efficiencies reported at the end of the set of columns or XML document output. But what are these filters exactly and what do the results mean?

The proc output refers to three filters:
1. Primary Filter
2. Internal Filter
3. Secondary Filter

Remember that the spatial indexes are based on tessellation or tiling. Issac has some nice diagrams that show how the tiling works in theory. Please review his indexing posts before reading these and when/if you get confused by the terminology. Depending on the size and type of the spatial object, the object may completely cover a tile (think big polygon), partially cover (intersect) a tile (think point) or not cover a tile. It can also hit the border of a tile. The tiles are referred to in the spatial procs as cells.

Remember also that the point of a spatial index is to reduce the number of Intersect operations (or other operations covered by the spatial index) actually performed by using tiling. I liked Isaac's analogy that "if I want to find roads that intersect Madison WI, I only have to look at roads that intersect Wisconsin". So the index (primary filter) selects all roads that intersect Wisconsin. It can get false positives (roads that intersect Wisconsin but not Madison). The "Secondary Filter" is this case is the Intersects operation itself.

So what's the "internal filter"?  If you think of the primary filter as filtering out 100 polygons out of 10000, they'd still have to run Intersects on the 100 polygons to actually see if they qualify. But say that the index and query sample both *completely* cover a cell. Or index completely covers the cell and query sample touches it. We know that the cell qualifies for Intersect in that case, we don't need to run Intersect on the query sample for that combination. That's the Internal Filter.

So let's look at some sample output. In this case the index was over a geography column that contains only points. The query sample is a polygon. 

<Base_Table_Rows>511650</Base_Table_Rows>
<Number_Of_Rows_Selected_By_Primary_Filter>2931</Number_Of_Rows_Selected_By_Primary_Filter>
<Number_Of_Rows_Selected_By_Internal_Filter>2666</Number_Of_Rows_Selected_By_Internal_Filter>
<Number_Of_Times_Secondary_Filter_Is_Called>265</Number_Of_Times_Secondary_Filter_Is_Called>
<Number_Of_Rows_Output>2779</Number_Of_Rows_Output>

So: 511650 rows in the table. Using the index (primary filter) eliminates all but 2931 of them. The internal filter determines that 2666 are true hits, not false positive. They have to run Intersect 265 times (2931 – 2666) times. And the number of rows returned is 2779, so there were some false positives. Good thing they ran Intersect. BTW. Using the method Filter() instead of Intersect() just means they're returning all the rows selected by primary filter (2931).

Finally, remember if you're using a geography index, there is no bounding box, so no part of the rows or query sample can be outside the index. With geometry, some of the rows or query sample can be outside the bounding box. This means that with geography, Rows_Selected_By_Primary_Filter >= Number_Of_Rows_Output. With geometry, Number_Of_Rows_Output may be > Rows_Selected_By_Primary_Filter as well.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.