Does everybody get that? (Spatial Index Reprise)

This post is an example of the "bring your own problem" approach in the classroom, inspired by upcoming SQLskills Immersion Events in August. See previous posting for the reason behind the blog post title.

I believe it was John Lennon who wrote: "You say you got a real solution. Well, you know, We'd all love to see the plan". I'm certain he wasn't referring to the query plans I'd "love to see" when I am diagnosing a problem, but that's the cognitive overlap that happens when you're listening to tunes while troubleshooting. And sometimes just "seeing the (estimated) plan" isn't enough.

I've been working with spatial data and spatial indexing for quite a while now. And have already published a blog entry on how to ensure your spatial index is used. But, since then, I've answered countless questions in classes, at conferences, and from clients about why the spatial index isn't being used without a hint.

The top two things to be sure of for starters are:
 1. Make sure you are on at least SQL Server 2008 SP1 to
 2. Make sure the query processor can sniff your spatial parameter values (i.e. use sproc or sp_executesql with spatial parameter passed in)

Here's two of more curious ones I've gotten lately. First one is client is using 4 processor machine. All is going well, but they figure it will be faster with a 24-proc machine. The spatial queries go slower in testing, by orders of magnitude. Turns out the 4-proc was using spatial index. With one tester on the 24-proc, the optimizer decided it was best to use a highly parallelised non-spatial-index plan. Setting maxdop down (or using a hint) was a temporary fix. The optimizer algorithm was changed along the way (2008 R2 SP1, IIRC). But, to think of this problem another way, unless you have an entire 24-proc machine to yourself, the actual number of processors available won't usually be equal to the total number of processors on the machine. In fact, on a loaded system, you may only get one or two processors at runtime. So setting maxdop down so the spatial index will be used or hinting isn't a bad idea. BTW, the spatial index plans can and do use parallelism for most or all of the plan if you have enough available processors.

The optimizer is superb at "reasoning" over most relational queries and indexes, but tends to underestimate the spatial index, which can have a dramatic effect when used. I think this happens for two reasons, first is an underestimation of the expense of STIntersects(), especially when one of the spatial operands is a complex polygon. Second is overhead of doing a seek into the base table for each candidate row, which is a side-effect of using the spatial index. So also choosing the correct spatial index parameters is crucial for cutting down on the number of seeks. Second one was that a query in this form: …WHERE spatialcol.STDistance(@spatialparm)*(conversionfactor) < number. This not only wouldn't use the spatial index, you couldn't even hint it. The reason for this is the spatial index is only usable for certain forms, using …WHERE spatialcol.STDistance(@spatialparm)= number/conversionfactor made using the spatial index viable.

So, to wrap up, if your spatial query does not use the spatial index, or seems to run too long, try hinting the spatial index first. If you get an error, you're not using a "spatial index viable" predicate. Once you've got the query "hintable", then you can work having the optimizer choose the spatial index. And compare clock-on-the-wall time with and without the spatial index. Not the estimated query plan.

Does everybobdy get that?

@bobbeauch

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.