Spatial Index Diagnostic Procs – How to specify query sample

A little more about the query sample that gets fed into the spatial index procs.

Query sample is a singleton geometry or geography. It's not a query. So it's not as straightforward as "here's a spatial query that could use an index, run this query and show me the figures". So this procedure is not going to work any differently for a query that use STIntersects vs STDistance vs… It's going to show you how an index on table A would be utilized given a single operation.

It maps most closely when your query is something like:
select … from tablea a where a.geog.STIntersects(@g)=1

Is this case @g is your query sample, and Number_Of_Output_Rows is the number of rows the query returns. But what if you have a conditional query or query that has more than one row as output, like

select …
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

In this case you could specify your query sample like this:
DECLARE @my_query_sample geography = (select geog from tablea a where a.id = @someid)

…And you're looking to an analyze usage on an geography index on tableb. But this will cause an error if "select geog from tablea a where a.id = @someid" returns more than one row. How do I use the sprocs in this case? Immediately you think:
DECLARE @my_query_sample geography;
SELECT @my_query_sample = geog from tablea a where a.id = @someid

or even:
select @my_query_sample = a.geog
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

Let's analyze this way of specifying query sample. If your query sample looks like it's going to represent more than one row, note that the spatial index is always use in (at least I've only seen it used in) a SEEK. So if your query window-producing query represents N rows, you need to choose one row. Hopefully a representative row. This would work if the "top(1) without an order by" row was representative:

DECLARE @my_query_sample geography = (select top(1) geog from tablea a where a.id = @someid)

And realize that you're going to have to do this operation (repsented by the output) N times. Let's make this (hopefully) more clear.

select a.zipcode, b.point
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

There are 430 zipcodes in Oregon. There are 1000 points in your table. You have an index on zipcodes and one on points. Remember that STIntersects is commutative (a.geog.STIntersects(b.geog) = b.geog.STIntersects(a.geog)) which index is "better"? Notice that the query returns 1 column from zipcodes and 1 column from points, so you can't just STUnion all of the Oregon zipcodes together. You'd get the same points that way, but have no idea which zipcode goes with which point. So choices are:

1. Representative query sample for zipcodes in Oregon, analyze point index, realize that you're going to do this "query" 430 times.
DECLARE @my_query_sample geography = (select top(1) geog from zipcodes a
 where a.zipcode IN (select zipcode from zipcodes where state = 'OR') )
2. Representative query sample for point, analyze zipcode index, realize that you're going to do this "query" 1000 times.
DECLARE @my_query_sample geography = (select top(1) geog from points)

Be aware that doing 430 or 1000 seeks against the spatial index might "scare off" the query optimizer from choosing that plan. You could also declare you're query sample like this:

DECLARE @geog GEOGRAPHY
select @geog = a.geog — or @geog = b.geog
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

But that's still going to give you one representative combination point or zipcode as a query sample (ie @geog is still a singleton). If all of your points are in Oregon, it's going to produce a similar analytic output, and if all your points are not in Oregon, it might be better to specify it that way. But then you would have to know how many points in Oregon to deduce how many seeks. And you can't check the ouput against "Number_Of_Rows_Returned" because you don't know which row it actually chose.

If you are just needing points, not zipcode-point combinations you could STUnion together all the zipcodes in Oregon and run the query that way. The spatial analysis procs give you a 1-1 mapping against that query. Or use a state table rather than a zipcode table. Of course this is predicated on the fact that zipcodes don't overlap multiple states.

Hope this answers the question, "what do I use as a query sample to troubleshoot query X and index Y?".

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.