SQL Server Denali – the new autogrid spatial index

I've been trying out the new Denali "AUTO_GRID" spatial index (that is …USING GEOGRAPHY_AUTO_GRID or GEOMETRY_AUTO_GRID in the CREATE SPATIAL INDEX DDL) on some of my old spatial performance use cases. It's worth repeating (from Ed's excellent whitepaper) that the AUTOGRID index uses 8 levels of tessellation instead of the 4 levels that the original "manual grid" spatial index uses. Let's go on from there. BTW, If you're not familiar with spatial indexes you might want to skip to the last paragraph for the motivation first.

In the manual-grid index, you can specify a grid resolution of high/medium/low for each of the 4 levels. This equates to a grid cell size of 16 (low), 64 (medium), or 256 (high) cells for grid level. I've adopted a convension of calling an index with grid level of GRIDS = (HIGH, HIGH, HIGH, HIGH ), as "HHHH", GRIDS = (MEDIUM, LOW, MEDIUM, HIGH ) as "MLMH", etc.

Using this convention, the new AUTOGRID index is "HLLLLLLL" that is high-res at the first level, low-res at the other 7 levels. You can't specify an 8 level index AND specify grid-res at the same time, the only 8- level index supported is the "AUTOGRID". And you can't get the grid-res of an AUTOGRID via the usual mechanism, sys.spatial_index_tessellations reports NULL for these fields. I got the info from the spatial diagnostic stored procs.

So far, I've tried it on two use cases: point-in-polygon with a zipcode polygon and intersecting polygons (zipcode and congressional district). With the intersecting polygons, a "HHHH" index seems to outperform the autogrid by 17 vs 24 seconds with a "primed" data cache. With the point-in-polygon case, the autogrid beats the "HHHH" index by 14 vs 21 seconds with a primed data cache and 1000 points. I'm also encouraged by the fact that, given multiple spatial indexes, the query optimizer picked the correct (best) one in both cases.

However, I'm less impressed by the spatial diagnostic procedures, which picked the new autogrid as best both times, that is, Primary_Filter_Efficiency and Percentage_Of_Rows_NotSelected_By_Primary_Filter were better for the autogrid spatial index even when its not the better choice. Hmmm…well I can still adjust cells_per_object and try again… And perhaps I used a non-representative query sample in the disgnostic procs for the polygon case.

So what's this autogrid index for anyhow? It's there for software products that can automatically create a single spatial index for each spatial table/column. These products can get a "usually best" and at least "good enough" default spatial index without doing any special analysis of the spatial objects in the table or knowing what spatial queries you're going to perform. Autogrid is supposed to be often the best choice and certainly never a bad choice (as the old default of "MMMM" sometimes was. And if you can find a better spatial index for your specific queries, you're still allowed to use it, of course. Nice feature.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.