Spatial Index Diagnostic Procs – The Rest of the Story

I've devoted the last few blog entries to describing what's in the spatial index analysis proc output, how to use it, and how to interpret the results. In this post I want to cover the rest of the information and how it relates to what you'd find in a query plan that uses a spatial index.

The procs return:
1. Information about the index
2. Information about the query sample
3. How efficient the index is when used against the query sample
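For reference, these are the sp_help_spatial_geography_index / sp_help_spatial_geometry_index procs (and their _xml variants) that ship with SQL Server. A minimal invocation might look like this — the table and index names are hypothetical, but the parameter names are the documented ones:

```sql
-- Hypothetical table dbo.Cities with a geography column and spatial index
-- idx_cities_geog; the query sample is a polygon passed as a geography instance.
DECLARE @qs geography =
    geography::STGeomFromText(
        'POLYGON((-122.4 47.5, -122.2 47.5, -122.2 47.7, -122.4 47.7, -122.4 47.5))',
        4326);

EXEC sp_help_spatial_geography_index
    @tabname       = 'dbo.Cities',
    @indexname     = 'idx_cities_geog',
    @verboseoutput = 1,      -- 1 = return all properties, not just the core set
    @query_sample  = @qs;
```

The _xml variants return the same information as a single XML output parameter, which is handy if you want to save the results off for comparison.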

The information you get about the index is:
–  Bounding Box dimensions – BOL says these are NULL for geography, but the procs always report -180, -90, 180, 90 for a geography index
–  GridLevelSize – 4 levels
–  Cells_Per_Object – maximum tessellated cells per row, specified in DDL
–  Page and Row counts for the spatial index
–  Avg number of cells per base row (how many of the CellsPerObject were actually used)
–  Height, Width, Area of a cell – for geography these are always the same, but it's nice to know how big the cells at each level are compared to the other levels. I'm not sure what the unit of measure is for geography (BOL says it "depends on SRID"); judging by the numbers, it doesn't look like meters.
–  CellAreaToBoundingBox percentage – 4 levels
–  Total SRIDs found
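The Cells_Per_Object and grid density (GridLevelSize) values reported here come straight from the index DDL. As a sketch, with the same hypothetical table as above:

```sql
CREATE SPATIAL INDEX idx_cities_geog
ON dbo.Cities (geog)
USING GEOGRAPHY_GRID
WITH (
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM,
             LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),  -- GridLevelSize, 4 levels
    CELLS_PER_OBJECT = 16                          -- Cells_Per_Object, max cells per row
);
```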

The most interesting is likely the info about the number of cells at each level (including level 0 – cells off the bounding box). Any level with 0 rows doesn't appear in the XML output. You get info for:
-Number of Object Cells for Index
-Number of Interior Cells
-Number of Intersecting Cells
-Number of Border Cells
-also some numbers about cells normalized to the leaf grid

The cells are useful in how they relate to the primary filter. The interior, intersecting, and border cells are used by the internal filter.

For the query sample, you get the same cell numbers, though no "cells normalized to the leaf". By the way, for a geography index, ignore the BOL comment indicating that if Level 0 of the query sample is 1, the index is not usable for the query. For geography it's always 1.

This relates to the following information that appears in a query plan that uses spatial indexes.
1. All of the query samples (in the spatial predicate) are tessellated using the same parameters (density/max cells)
2. The table is joined against all parent/children of the index (the index levels are actually a hierarchy)
3. Index ranges are calculated
4. A join of the tessellated samples against the index is done. This produces the rows selected by the primary filter.
5. The internal and secondary filters are applied in a separate query iterator (of type Filter) later on in the plan.
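The plan shape described above comes from a query like the following (same hypothetical table and index as before). The primary filter is the join against the spatial index; the Filter iterator later in the plan applies the internal and secondary filters:

```sql
DECLARE @qs geography =
    geography::STGeomFromText(
        'POLYGON((-122.4 47.5, -122.2 47.5, -122.2 47.7, -122.4 47.7, -122.4 47.5))',
        4326);

-- The "= 1" comparison is required for the optimizer to consider the spatial index.
-- The index hint is only there to force the spatial plan for inspection.
SELECT c.CityName
FROM dbo.Cities AS c WITH (INDEX(idx_cities_geog))
WHERE c.geog.STIntersects(@qs) = 1;
```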

The upshot of this is that not only are the index dimensions used to tessellate the base table rows, the same dimensions are also used to tessellate the query sample for the spatial predicate (the query sample, in the sproc). So, for example, I'd always thought that spatial indexes over points should specify all-HIGH density and maybe 2 max cells per object. But if your query sample contains large polygons, those same dimensions will be used against the query sample too. So maybe all-HIGH density for point-based indexes isn't a hard and fast rule.

Hopefully this gives you enough information to investigate your indexes and queries using the spatial analysis stored procs. As for patterns of "which index specification is best", more will emerge as I investigate more varieties of spatial data. For now, I'm dividing these into three main categories:

1. Spatial index over table of points, Query sample is polygon.
2. Spatial index over table of polygons, Query sample is point.
3. Spatial index over table of polygons, Query sample is polygon.

With the simplifying premise that linestrings are thin like a point (i.e., they won't fill a cell), but if you put an STBuffer around them, they behave more like polygons.
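For instance, a thin linestring can be turned into something polygon-like with STBuffer; for geography the buffer distance is in the SRID's unit of measure, which is meters for SRID 4326 (the coordinates below are made up):

```sql
DECLARE @road geography =
    geography::STGeomFromText('LINESTRING(-122.35 47.60, -122.30 47.65)', 4326);

-- A 500-meter buffer around the linestring behaves like a polygon when
-- tessellated: unlike the bare linestring, it can fill whole cells.
DECLARE @roadArea geography = @road.STBuffer(500);
SELECT @roadArea.STArea();  -- area in square meters
```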

Happy index hunting.
