There are two slightly different ways of representing non-existing spatial data, the "empty geometry" concept and database NULL. You can read about the subtle differences between them here. If you're using a spatial index, either of these types have the ability to cause query perforamance problems.

Let's take, as an example, a table with 10000 rows, all of which contain 'POINT EMPTY'. If we put a spatial index on those rows, we will have 10000 rows in our spatial index all with a cellid value of 0 and a SRID (in the index information) of 0. This is true even if you are using the geography data type where STSrid for the data (when using POINT EMPTY) actually returns 4326.

When you're using the spatial index diagnostic stored procedures, these index entries show up as "ObjectCells in Level 0 for Index"; the diagnosic also report 2 SRIDs in the index data when using the geography index. This can have a big effect on your query plan.

Suppose that I have 10000 empty values (either 'POINT EMPTY' or NULL) in a table called "mypoints" and a query like this:

DECLARE @g geography;
SELECT @g = shapegeog FROM zipcodes where id = '97225'
SELECT * FROM mypoints WHERE mypoints.shapegeog.STIntersects(@g)=1

If the query plan uses the spatial index, you'll get a "candidate hit" on each of the empty points during the spatial index seek iterator. This is because the row for 'POINT EMPTY' is technically outside the index bounds (level0 cell) and the spatial method will have to be evaluated later in the query. Having 10000 additional index seeks on the *base* table, to fetch the geography column in order to evaluate the spatial method, will slow things down. You'd think it might be useful to set the geography to NULL and redo the query like this:

SELECT * from mypoints
WHERE mypoints.shapegeog IS NOT NULL AND mypoints.shapegeog.STIntersects(@g)=1

This will still get the 10000 additional candidate hits, and in my limited tests, I still get 10000 additional index seeks on the base table; the IS NOT NULL predicate is manifested as a separate filter iterator.

A workaround for this is to set the "empty" points to a valid value far outside your search areas (say 'POINT(0 90)'). Unless you're doing point-in-polygon computations where the North Pole is a valid choice (in that case, you're answers will be wrong because they'll include your placeholders) your queries will execute much faster.

This workaround can cause big problems when using other methods (like STDistance for example) because now you have a real (but nonsensical) value, so be careful of the complications. You can use CASE statements to substitute NULL values, for example,

-- If I used the North Pole to speed up point-in-polygon queries
declare @p geography = 'POINT(0 0)'
select
 case
   when (shapegeog.STEquals('POINT(0 90)')=1) THEN NULL
   else shapegeog.STDistance(@p)
  end
from mypoints

I'd also thought of using NULLIF to substitute NULL for my placeholder values in spatial operations that don't need to use the spatial index, but this won't work because instances of the spatial data types aren't comparable except when using STEquals. So NULLIF and spatial types causes an error.

Categories:
SQL Server Spatial

I've found out that a few of my talks have been accepted for TechEd 2009 Europe, which will take place this year in Berlin, Germany, on Nov 9-13. I'll be doing some SQL Server performance-related talks. If you're in Berlin, stop by and say hi, it will be nice to see everybody again.

Categories:

I'm speaking at the Portland SQL Server User Group meeting this month about What's New in the SQL Server World. I'll be talking about upcoming new features in SQL Server 2008 R2 and upcoming features and products (like SQL Server Azure and Entity Framework 4.0) that could affect the way that you work with SQL Server in the future. It's an overview out of necessity, but I'll do as many concrete examples and demos as time allows (demand-based), as well as try and leave time for discussion on the where you think things are headed. So, if you're into SQL Server Administration, Development, Business Intelligence, or Solutions Architecture, there should be something for everyone. See you on the 24th.

Categories:

People are sometimes concerned with the lifetime of temporary tables they create in CLR code. As an example, if I use a parameterized statement in .NET code, the SqlClient provider wraps the statement in exec sp_executesql. If the statement batch includes something like "CREATE TABLE #temp1" as part of the batch, the table #temp1 is no longer visible when the batch returns. This is because the lifetime of a temporary table created inside of a stored procedure is the stored procedure itself, ie. the temporary table no longer exists once the stored procedure completes. In this case, the stored procedure that's scoping the temp table is "sp_executesql". But... if proc A creates a temporary table and then (inside proc A) calls proc B that reads it, that's fine. Because nested procs can see "temp" tables created within outer scopes.

But what about procs written in SQLCLR? If you're using the context connection in your SQLCLR proc, visibility in the inner proc works fine, regardless of whether:
procs A and B are written in T-SQL
procs A and B are written in SQLCLR
proc A is SQLCLR and B is T-SQL
proc A is T-SQL and B is SQLCLR

Hope that clarifies things...

Categories:
SQLCLR

Theme design by Nukeation based on Jelle Druyts