A short while ago, I wrote a blog entry about SQL Server 2012 features that are currently available in SQL Azure. In that post, I mentioned that Ed Katibah would publish an article detailing the current state of the SQL Server 2012 spatial features in SQL Azure, in detail.

That article is now available at the Technet Wiki. Note that not all features are available currently, and Ed has a nice detailed chart that covers this feature-by-feature. Thanks, Ed! And, as with the SQL features in SQL Azure, the available spatial features will increase in future as parity with SQL Server 2012 RTM is acheived. Watch this space.

@bobbeauch

Since SQL Server Denali adds support for index compression of spatial indexes, using the same options as data and index compression in SQL Server 2008 and above. As opposed to what I expected at first (because spatial indexes consist of only a few small fixed-length columns), using compression with spatial indexes appears to be quite effective. With my cursory tests, I obtained between 25-40% reduction in size with ROW compression and 50-90% reduction in size with PAGE compression. Queries that use a compressed spatial index execute in about the same (in some cases, less) CPU and Elapsed time as the same queries using uncompressed spatial indexes. As spatial indexes can be quite large depending on the MAX_CELLS_PER_OBJECT and density settings, and the makeup and size of the spatial data set being indexed, compression of spatial indexes (which is restricted to SQL Server Enterprise and above editions) is a good space-saving choice.

@bobbeauch

Saw a request on the SQL Server spatial forum this morning for routines that would decompose a geography type to a set of Lat/Longs. A while ago, I was experiemnting with spatial decomposition, and happened to cruft up a couple of SQLCLR table-valued functions to accomplish this. Although the same functions could be written in T-SQL, it's better to use SQLCLR for this, just for speed's sake. Although these function aren't likely production quality (and I think I only tested these on polygons), I figured it would be better to post them here now than to wait until I had time, and answer each forum query with "write me mail" in the meantime. I did send these out a few times privitely and almost a year ago, and Matt Jones did port the code to use Geometry rather than Geography and did send me the changes (thanks Matt). I've enclosed these files (these are the WClassX.cs files) as well, although they're not hooked up to the project. Suffixed each TVF with "Geom" so they do have some rather odd names. If you do make any changes or enhancements to these in future, please send them back.

Enjoy!

@bobbeauch

SpatialDecomposition.zip (21.23 kb)

Categories:
SQL Server Spatial | SQLCLR

I've been trying this out every day or so since I'd heard about the update (originally named the July 2011 Service Release), but I'd forgotten about it for about a week. So I don't exactly know when this happened on my SQL Azure server. But tonight, around midnight, SELECT @@version returns:

Microsoft SQL Azure (RTM) - 11.0.1465.26 Aug 10 2011 22:54:49 Copyright (c) Microsoft Corporation

So cool, we're at Denali on SQL Azure. Or are we just on Denali spatial (which was the new feature that was mentioned in the announcement). Or, where are we?

First to try some T-SQL Denali features.
  The new THROW statement works, as does OFFSET and FETCH, but...
  Creating a sequence fails, as does LAG/LEAD and EXECUTE WITH RESULT SETS
 
How about spatial? Not trying to be tremendously rigorous for now, I tested some obvious things.
  The new HasM and HasZ properties work, as does the new ShortestLineTo method. Probably some other methods (like the new ST-methods on the geography type) do too.
  The items that were caught by sys.dm_db_objects_impacted_on_version_change WERE affected by the change.
  A test of the new spatial precision indicates that the Denali improved precision is being used but...
  Cannot create a FULLGLOBE type or a CIRCULARSTRING type
  The new AUTO_GRID indexes are not supported
  Attempting to use geometry::UnionAggregate produces "A severe error occurred on the current command. The results, if any, should be discarded."

And BTW, both SSMS from SQL Server Denali and SQL Server 2008 R2 SP1 connect to the updated SQL Azure version just fine. I don't have any earlier SSMS versions to try right now.

Hmmm...well that's kind of a mixed bag now, isn't it? I know we're not in Kansas anymore, but we're not all in Denali either. I'm sure someone will come up with a feature matrix for this SQL Azure version (or I will), but there's enough of a subset of new features to make it interesting. More info as it arrives.

@bobbeauch

In looking at the what's new for SQL Azure (May 2011) page, I came across the following: "Upcoming Increased Precision of Spatial Types: For the next major service release, some intrinsic functions will change and SQL Azure will support increased precision of Spatial Types."

There's a few interesting things about this announcement.

Firstly, the increased precision for spatial types is not a SQL Server 2008 R2 feature. It's a Denali CTP1 feature. Although the article doesn't indicate whether they've made up a special "pre-Denali" version of this feature, or when exactly "the next major service release" will be (and when SQL Server Denali will be released is unknown), it would be interesting if updated SQL Server spatial functionality made its appearance in SQL Azure *before* making its appearence in an on-premise release of SQL Server. As far as I know, this will be the first time a new, non-deprecation feature is deployed in the cloud before on-premise (non-deprecation because, for example, the COMPUTE BY clause fails in SQL Azure but not in any on-premise RTM release of SQL Server). Note that usage of SQL Server "opaque" features (for example, are instances managed internally be a variant of the Utility Control Point concept?) cannot be determined.

In addition, this may be the first "impactful change" (BOL doesn't say breaking change, but change with a possible impact, but one never knows what the impact would be in other folks' applications) in SQL Azure Database. The BOL entry continues "This will have an impact on persisted computed columns as well as any index or constraint defined in terms of the persisted computed column. With this service release SQL Azure provides a view to help determine objects that will be impacted by the change. Query sys.dm_db_objects_impacted_on_version_change (SQL Azure Database) in each database to determine impacted objects for that database."

Here's a couple of object definitions that will populate this DMV:

create table spatial_test (
 id int identity primary key,
 geog geography,
 area as geog.STArea() PERSISTED,
);

-- one row, class_desc INDEX, for the clusted index
select * from sys.dm_db_objects_impacted_on_version_change

ALTER TABLE spatial_test
ADD CONSTRAINT check_area CHECK (area > 50);

-- two more rows, class_desc OBJECT_OR_COLUMN, for the constraint object
select * from sys.dm_db_objects_impacted_on_version_change


Before this, the SQL Azure Database koan was "Changes are always backward-compatible". There is now the sys.dm_db_objects_impacted_on_version_change DMV and the BOL page for it even provides sample DDL to handle the impacted objects. But this begs the question: I can run the DMV to determine objects that would be impacted and fix them when the change occurs, but if I don't know when the SU will be released, how can I plan/stage my app change to corespond? Interesting times ahead...

@bobbeauch

I'd been fretting over the lack of a true visualizer for the new circularstring and compoundcurve spatial types in SQL Server Denali. I'd installed SSMS from SQL Server 2008 R2, and although it works just fine for Denali spatial types (even though the binary format has changed since 2008/R2), trying to use it with either of the new "curve" types produces an empty graph pane. Because of this lack of visualization, I also wondered how these types would ever work with map websites/controls that only support Points, LineStrings, and Polygons.

Silly me. When I mentioned this to Ed Katibah (aka Spatial Ed), he pointed out that this was exactly the purpose of STCurveToLine(). I thought this method would only produce LineStrings that had as many vertices as the curve, resulting in a "square" rendition of a circle, for example. But no... this routine "densifies" the linestring, resulting in a linestring with more points, that really looks like a curve/circle. Nice! If this doesn't look circular enough (or is too many points) you can always use CurveToLineWithTolerance() instead of STCurveToLine(). Thanks, Ed! Here's a short example:

declare @g geometry = 'CIRCULARSTRING(0 2, 2 0, 4 2, 2 4, 0 2)' -- here's a circle
select @g.STIsValid() -- ensure it's valid
select @g.STNumPoints() -- 5 points in the definition

-- show in the SSMS 2008/R2 "spatial results tab"
select @g.STCurveToLine()
select @g.CurveToLineWithTolerance(0.01,0)

-- circle densification
select @g.STCurveToLine().STNumPoints(), -- 65 points
         @g.CurveToLineWithTolerance(0.01,0).STNumPoints() -- 33 points

@bobbeauch

One of the interesting spatial features in SQL Server Denali is the inclusion of spatial aggregates, namely CollectionAggregate, ConvexHullAggregate, EnvelopeAggregate, and UnionAggregate. Here's an example of unioning two squares together to make a rectangle.

create table t1 (g geometry)

insert t1 values('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
insert t1 values('POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))')

select geometry::UnionAggregate(g) from t1

Beside being useful, they are interesting because its the first use of SQLCLR-based aggregates (that I'm aware of) as part of the SQL Server database engine code itself.

The what's new spatial whitepaper has an interesting comment about the spatial aggregates: "The new aggregates are exposed in SQL Server only and are not exposed in the underlying spatial library". I asked (spatial) Ed about this and pointed out that you can use the spatial aggregates (or any .NET-based user-defined aggregate, for that matter) on the client as well as the server. You just need to find the correct class in the library in the spatial library, which is public (and hopefully it will stay public in the released version). Here's the same aggregation in client-side code.

//collection of SqlGeometry
List<SqlGeometry> glist = new List<SqlGeometry>();
glist.Add(SqlGeometry.Parse("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"));
glist.Add(SqlGeometry.Parse("POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"));

GeometryUnionAggregate agg = new GeometryUnionAggregate();
// call the appropriate methods on the aggregate
agg.Init();
foreach (SqlGeometry geom in glist)
    agg.Accumulate(geom);
SqlGeometry theanswer = agg.Terminate();
Console.WriteLine("answer is {0}", theanswer);

The only strange thing is that, in T-SQL, the aggregate appears to be a static property on the geometry class (you use geometry::UnionAggregate to invoke it in T-SQL). But using reflection on the SqlGeometry class in the library reveals no such public (or private) property. But that's fine; you CAN use the spatial aggregates directly; you just need to know the name of the aggregate class. Conceivably, you could even parallelize the client-side aggregation and call Merge() at the appropriate time.

@bobbeauch

 

The nearest neighbor query is one of the most common queries against spatial data. How many people haven't gone to a mapping app, typed in their current location and asked for the 10 nearest [your favorite thing goes here]? The obvious way to phrase the spatial query for this, given an STDistance method on the SqlGeometry data type, would be:

DECLARE @me = 'POINT (-121.626 47.8315)'
SELECT TOP(10) Location, Description, Location.STDistance(@me) AS distance
FROM [spatial_table]
ORDER BY distance

Unfortunately (for your performance, it does return the correct answer) in SQL Server 2008/R2, this query doesn't use the spatial index. And attempting to hint the spatial index results in the "could not produce a plan with your hints" error. Oh.

In SQL Server Denali, this just works. You do have to include a predicate that refers to STDistance, like:

...FROM [spatial_table]
WHERE Location.STDistance(@me) < 1000 -- 'where-distance' query
ORDER BY distance

but the predicate can also be something as unintrusive (ie you don't have to commit to a max distance) as:

...FROM [spatial_table]
WHERE Location.STDistance(@me) IS NOT NULL -- 'where is not null' query
ORDER BY distance

This uses the spatial index without hinting. And its way faster than the 2008 version of the same query. Excellent. But...when the "nearest neighbor" question came up once too often in SQL Server 2008, Isaac Kunen came up with an alternate algorithm that uses a numbers table.  And its quite fast, although you do have to hint the spatial index (or, let's say, I've never got it to use the index without hinting). So how does this algorithm do against the new, automatic spatial index use plan?

In the simple example where I tried this out, here's the plan cost:
Isaac's algorithm and hint:             0.517
Denali with WHERE and distance:  4.06683
Denali with WHERE..IS NOT NULL: 14.7

But the differences in query plans that involve spatial don't always seem to the "revelent" (the QO does sometimes underestimate the big difference using the spatial index will make). All the queries are subsecond, compared to 10 seconds or so when not using the index. So, any other comparison? How about I/O or worker time?

Issac's algorithm and hint:   1x worker time/reads/clrtime
Denali with wHERE and distance: 10x worker time/reads/clrtime
Denali with WHERE..IS NOT NULL:  3x worker time/reads/clrtime

This is interesting due to the fact that the plan cost was much less for the where-distance query vs. where-isnotnull query, but the resource cost is exactly the opposite. And more interesting is the fact that Isaac's numbers table-based algorithm wins both ways.

Caveats: The obvious big caveat is that this is only a single test case, albeit a fairly common one (both "@me" and the points-of-interest are points). YMMV. Second, none of the queries I used, run in SSMS, allow for a proper cardinality estimate at plan creation time (aka parameter sniffing). But attempting to use sp_executesql or a sproc so that it could be estimated, causes the Denali "where-distance" query to loop endlessly. Bug reported; it is afterall, CTP1. And sp_executesql with the "where-isnotnull" case, doesn't do much better, still worse than "numbers table". Third: Isaac's original algorithm is fragile in some edge cases (blank spatial table/no qualifying points, IIRC), Ed and I handled these cases in the slightly revised version in the chapter in Inside SQL Server 2008 Programming.

So, it appears that, even though the query is far less straightforward to write and the spatial index needs to be hinted, the numbers table method wins, so far. However, I'd still say that the nearest neighbor optimization is useful to have in the product. But it is still Denali CTP1. And it goes without saying, if you have a test that disproves this, send it right along, I'm always happy to see it.

@bobbeauch

Now about that "other" feature that could be called fullglobe in Denali that I mentioned. I don't know which feature will retain the moniker or whether they both will...that's why I'm not in marketing.

Here's the deal... the SQL Server spatial geography type is limited to a single hemisphere in size in SQL Server 2008 and R2. That doesn't mean that a single feature can't span hemispheres, it means the feature can't be bigger than a hemisphere. And it means that there is a "left-foot rule" that you must follow for geography polygons. You might also have heard it called "left-hand rule", but Ed renamed it, as there already is a well-known right-hand rule in mathematics and physics.

The rule is really meant for disambiguation; if I code up a polygon that describes Iceland, am I representing Iceland or "everything on earth that isn't Iceland"? The left-foot rule makes it unambiguous.

Anyhow, Denali removes the hemisphere limit, so you can represent "not Iceland". And provides a cool convenience feature to make this easy. So here's code inspired by my first (but not my last) teaching gig in Reykjavik, a few years ago, in December (thanks Gunner). I enjoyed myself immensely, especially in the geothermal-heated public pool near downtown (yes, its open in December, sorry, no link).

-- here's Iceland
SELECT geog
FROM dbo.cntry00
WHERE cntry_name = 'Iceland'

-- here's everywhere that's not Iceland
-- well, it better be "not Iceland" I can't see it in spatial results
SELECT geog.ReorientObject()
FROM dbo.cntry00
WHERE cntry_name = 'Iceland'

The only fallout I can envision from this is that some folks might have used the error to indicate their ring-orientation was wrong. Now it will just work, and you'll wind up with "big 'ol hunk o' geography" (TM pending) instead. So if you are one of those people beware...it works now. For the rest of the things that used to break and that this feature fixes reference Ed and Milan's excellent whitepaper.

@bobbeauch

Once upon a time, I wrote a blog entry about CircularString, CompoundCurve, and CurvePolygon (the new geometric types supported in Denali). And mentioned that they came from ISO-ANSI SQL/MM and weren't in the OGC spec. Well, lo and behold, while browsing the latest OGC Simple Features - Part 1 spec, the last incantation (version 1.2.1) does support 'em. We don't have all the OGC types (missing Curve, Surface, multi-versions of those, Polyhedral Surface, and TIN) but it's good to see the OGC support/add the ones we do have.

Speaking of the new types, I'm still looking for a visualizer for them. Spatial result tab was removed from SSMS CTP1 Denali, so I've got SSMS 2008 R2 installed. But of course that doesn't support the new geometric types, so curves, fullglobe, and types greater than a hemisphere large (these were added in Denali too) draw a blank. So I can draw them on graph paper. And in my mind... Neither Bing nor Google, that I'm aware of, support them. Wonder if SAFE FME visualizer does, hmm... You usually approximate them with linestrings/polygons, STCurveToLine and CurveToLineWithTolerance are your friends.

@bobbeauch

So what's the area of the earth? Well...depends on who's measurement you want to use.

There an interesting SQL Server Denali spatial feature known as fullglobe. Actually there's two features with that name, more later; I was using the FULLGLOBE geometric type (geometric type is the term used by OGC, this one only works with SQL Server's GEOGRAPHY data type). You create one like this:

DECLARE @g geography = ('FULLGLOBE', 4326) -- where 4326 is your SRID

It's a special extension to the OGC (that doesn't have a specific geography type as such either) that specifies the entire earth. So you can get the area of the earth by using @g.STArea(). After doing the "figure out the water area by subtracting the union of the countries' area" thing with it, I thought this might be a cool way to demonstrate one facet of SRIDs (that is spatial reference system identifiers).

There's a table called sys.spatial_reference_systems that lists all of the SRIDs supported by SQL Server's geography type (they use these in the computations). The also includes the OCG well-known text that describes them. One parameter describes the ellipsoid used by this SRID. Ellipsoids' measurements differ for two main reasons; different surveys and the fact that the earth changes shape slightly over time.

So you could probably guess (and you'd be right) that different SRIDs would produce different areas for the earth. Here's the code. Note that as a special bonus for using it, I found two new SRIDs in Denali, Unit Sphere (SRID 101) and Spherical Earth (SRID 126). Nice.

SELECT spatial_reference_id AS SRID,
       dbo.measure_globe(spatial_reference_id) AS Area,
       unit_of_measure,
       well_known_text,
       dbo.get_ellipsoid(well_known_text) AS Ellipsoid
FROM sys.spatial_reference_systems
ORDER BY area, ellipsoid, spatial_reference_id – whatever you want

CREATE FUNCTION dbo.measure_globe(@srid INT)
RETURNS FLOAT
AS
BEGIN
DECLARE @g geography = geography::STGeomFromText('FULLGLOBE',@srid);
RETURN @g.STArea()
END

CREATE FUNCTION dbo.get_ellipsoid(@wkt NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS
BEGIN
-- grungy string-handling alert
DECLARE @begin INT = PATINDEX('%ELLIPSOID%',@wkt) + 11
DECLARE @rest NVARCHAR(4000) = SUBSTRING(@wkt, @begin, 4000)
DECLARE @end INT = CHARINDEX('"', @rest)
RETURN SUBSTRING(@rest, 1, @end-1)
END
GO

@bobbeauch

I've been trying out the new Denali "AUTO_GRID" spatial index (that is ...USING GEOGRAPHY_AUTO_GRID or GEOMETRY_AUTO_GRID in the CREATE SPATIAL INDEX DDL) on some of my old spatial performance use cases. It's worth repeating (from Ed's excellent whitepaper) that the AUTOGRID index uses 8 levels of tessellation instead of the 4 levels that the original "manual grid" spatial index uses. Let's go on from there. BTW, If you're not familiar with spatial indexes you might want to skip to the last paragraph for the motivation first.

In the manual-grid index, you can specify a grid resolution of high/medium/low for each of the 4 levels. This equates to a grid cell size of 16 (low), 64 (medium), or 256 (high) cells for grid level. I've adopted a convension of calling an index with grid level of GRIDS = (HIGH, HIGH, HIGH, HIGH ), as "HHHH", GRIDS = (MEDIUM, LOW, MEDIUM, HIGH ) as "MLMH", etc.

Using this convention, the new AUTOGRID index is "HLLLLLLL" that is high-res at the first level, low-res at the other 7 levels. You can't specify an 8 level index AND specify grid-res at the same time, the only 8- level index supported is the "AUTOGRID". And you can't get the grid-res of an AUTOGRID via the usual mechanism, sys.spatial_index_tessellations reports NULL for these fields. I got the info from the spatial diagnostic stored procs.

So far, I've tried it on two use cases: point-in-polygon with a zipcode polygon and intersecting polygons (zipcode and congressional district). With the intersecting polygons, a "HHHH" index seems to outperform the autogrid by 17 vs 24 seconds with a "primed" data cache. With the point-in-polygon case, the autogrid beats the "HHHH" index by 14 vs 21 seconds with a primed data cache and 1000 points. I'm also encouraged by the fact that, given multiple spatial indexes, the query optimizer picked the correct (best) one in both cases.

However, I'm less impressed by the spatial diagnostic procedures, which picked the new autogrid as best both times, that is, Primary_Filter_Efficiency and Percentage_Of_Rows_NotSelected_By_Primary_Filter were better for the autogrid spatial index even when its not the better choice. Hmmm...well I can still adjust cells_per_object and try again... And perhaps I used a non-representative query sample in the disgnostic procs for the polygon case.

So what's this autogrid index for anyhow? It's there for software products that can automatically create a single spatial index for each spatial table/column. These products can get a "usually best" and at least "good enough" default spatial index without doing any special analysis of the spatial objects in the table or knowing what spatial queries you're going to perform. Autogrid is supposed to be often the best choice and certainly never a bad choice (as the old default of "MMMM" sometimes was. And if you can find a better spatial index for your specific queries, you're still allowed to use it, of course. Nice feature.

Safe Software, the premier spatial data transformation and loading vendor for SQL Server, has long been known for it flagship FME Desktop and FME Server products (that include a spatial workbench for transformation and loading, support a ton of spatial formats, SSIS integration, and more). But did you know that they've just introduced an affordable, though feature-rich, FME Desktop SQL Server Loader edition as well?

In addition to the online information, they'll be holding a Webinar showing how to use this product to do data transformation and loading into SQL Server spatial on Dec 14. You can sign up here.

Categories:
SQL Server Spatial

OK, last one for the evening, its getting late in Berlin. And BTW, we might have some Denali goodness here at TechEd. Search for "futures", perhaps.


There a new hint for spatial queries in Denali CTP1, namely, SPATIAL_WINDOW_MAX_CELLS. This is a table hint that, as far as I can tell, this hint affects the query sample (ie, the "other side" of the STIntersects, for example) rather than affecting the spatial index itself. If you run this example (from the census database we used with SQL Jumpstart, else substitute your own query that uses a spatial index), you can see this in the query plan.

select * from zipcodes z with(SPATIAL_WINDOW_MAX_CELLS=1536)
join Congress c
on z.shapegeog.STIntersects(c.shapegeog)=1
where c.id = 116
 
1.       Run the query with "include actual execution plan" ON.
2.       Left-Click  on the step at the extreme right, that is, "Table-valued function, Get_Geography_Tessellation_Varbinary" to select it.
3.       Hit F4 to bring up the properties window (it’s also available on the View menu)
4.       Click on the line "Parameter List", second to the bottom. Click the three dots to being up a window with parameters.
5.       Without the hint, the value of the 2-nd last parameter is 1024 (max cells that they use to tessellation the other side (in the case, the congress table) to match to zipcode table (spatial index side)
6.       With the hint (of 1536), the value of the 2-nd last parameter is 1536
 
The default value is 1024 for both geography and geometry. That’s at odds with the Denali spatial whitepaper that claims "The default value is 512 for geometry and 768 for geography".
 
Also, note the variant that uses the table hint as a query hint, this one:
select * from zipcodes z
join Congress c
on z.shapegeog.STIntersects(c.shapegeog)=1
where c.id = 116
OPTION (TABLE HINT(z, SPATIAL_WINDOW_MAX_CELLS=1536))
 
This executes without syntax errors, but doesn’t use the hint. As it should, see http://technet.microsoft.com/en-us/library/ms181714.aspx . Look for "Using table hints as query hints" and see the examples at the bottom of the BOL page, where you would use this in a plan guide (which is the whole point. Being able to use it in a plan guide, for vendor-generated code, for example, might be a big deal.

SQL Server Denali CTP1 was released today. I don’t want to be the fiftieth person to post the link to the download; reference Ed's blog posting for the location.

This is a repost (with some corrections now that the CTP is actually out) of a blog posting I did when someone discovered a web location of an updated serialization spec. It attempted to give an overview of where the new spatial types (like circular string and curves) came from and what they mean. I took down the blog posting when the spec "disappeared". But now that Ed Katibah blogged about it (be sure to read his excellent whitepaper), I guess it’s *really* OK now.

Everyone's on the lookout for info about features of SQL Server Denali. The docs folks may have given us some insight in the publication of an update of the "Microsoft SQL Server CLR Types Serialization Formats [MS-SSCLRT]" spec. This includes updates that "apply to SQL Server Denali CTP1" according to footnotes in the spec. A message about this spec update was posted on the Technet SQL Server spatial forum. The spec is publically posted.

Besides a bit indicating "this geography instance is bigger than half a hemisphere", there's four new types listed under OpenGIS Types: CircularString, CompoundCurve, CurvePolygon, and FullGlobe. The 1.2 versions of the OGC specs mention the first three. No mention of FullGlobe, that must technically not be a new type. So what are the three remaining types and what are they good for? That's more interesting than just knowing they exist. The best information I could find comes from the ISO-ANSI SQL/MM Part 3 spec.

In OGC's original type hierarchy (I couldn't find an updated one in their new specs, maybe I didn't look hard enough), LineString is the only direct subtype of Curve and Line and LinearRing derive from LineString. In SQL/MM's type hierarchy LineString, CircularString, and CompoundCurve all directly derive from Curve (an abstract type). So both CircularString and CompoundCurve "is a" Curve, just as LineString is. There is no Line and LinearRing in SQL/MM spec. In a somewhat similar fashion, Surface is declared as abstract type, CurvePolygon is the direct subtype, and Polygon is a subtype of CurvePolygon.

Back to LineString vs. CircularString vs CompoundCurve. A single segment of a LineString consists of two points connected by a straight line. But a single segment of a CircularString consists of three points that form an arc: start point, end point, and any point on the arc in between. You can have multiple segments; the end point of one segment can be the start point of the next segment. "A combination of linear and circular strings can be modeled using the ST_CompoundCurve type." (quote from the spec). Ah...so it would seem to be less work to define an arc as three points than approximate an arc with many small linestrings. More accurate too. That's (possibly) the goodness.

For CurvePolygon, to relate it to Polygon (which we all know and love) and Surface (the abstract class), from the spec: "Surfaces, as 2-dimensional geometries, are defined in the same way as curves using a sequence of points. The boundary of each surface is a curve, or a set of curves if the surface has any holes in it. The boundary of a surface consists of a set of rings, where each ring is a curve. The type ST_CurvePolygon stands for such a generalized surface, and the subtype ST_Polygon restricts the conditions for the rings of the boundary to linear strings."

Finnaly there was a question I just had to find the answer to: how then do you define a Circle? According to the new CTP1 docs, you need to define a circle in halves. So something like: declare @g geometry = 'COMPOUNDCURVE(CIRCULARSTRING(0 2, 2 0, 4 2), CIRCULARSTRING(4 2, 2 4, 0 2))'; might do the trick.

I did a few demos in the spatial visualization talk at SQLConnections using the AdventureWorks 2008 data. The Person.Address table is all geocoded and so I did a quick map in SQL Server Reporting Services 2008 R2 using individual addresses. Followed up later in the talk with addresses using ESRI's MapIt product, their Spatial Data Service and nice Silverlight controls. Both demos were projected over maps; I used both Bing Maps and a simple ESRI ARCGIS Server map.

As I zoomed and panned over the maps (SSRS 2008 R2 can zoom and pan during design time; ESRI controls allow interactive zoom and pan) some folks in attendence noticed that the street addresses from AdventureWorks 2008 don't correspond to "real" street names. Other folks told me they had problems using other geocoders with the AdventureWorks 2008 data.


The reason is pretty simple. Although the geocoding gets you to the right city/state, the actual addresses are "fake", aka, test data. So don't expect perfect street correspondance with real streets and test data. And...no, it's not a bug in your geocoder.

Categories:
SQL Server Spatial

Michael J Swart posted a SQL query on his blog that produced a rendition of Botticelli’s Birth of Venus, in the spatial results tab of SQL Server Management Studio when executed. But, when everyone rushed over in anxious anticipation, some folks could execute the query but couldn't see the picture on their 64-bit editions. Here's why...

It's an obscure problem with rendering spatially invalid results that surfaced around SP1 timeframe. The map control that produces the tab cannot execute the MakeValid() method. It's looking for the 32-bit version of SqlServerSpatial.dll that should have been installed in C:\Windows\SysWOW64, as it needs to be on a 64 bit machine. The exception happens inside the Microsoft.SqlServer.Spatial assembly, which cannot locate SqlServerSpatial.dll.

Here's a simple repro:
DECLARE @g GEOMETRY
SELECT @g = GEOMETRY::STGeomFromText('POLYGON((1 1, 3 3, 3 1, 1 3, 1 1))',0)
SELECT @g -- Spatial results tab doesn't display in SSMS x64
SELECT @g.MakeValid() -- This works
GO

So if this doesn't work for you, you'll need to either,
    A. Locate and install 32-bit SqlServerSpatial.dll in C:\Windows\SysWOW64
or B. Change the script to output the result to a table variable, where you can call MakeValid() on the spatial column.

Reinstalling the spatial library from the SQL Server 2008 Feature Pack may work as well, but I've not tried this so I can't promise.

Then you'll "get the picture".

Categories:
SQL Server Spatial

Everyone is still recovering from the November conference weeks (reminds me of sweeps week on TV). As a "SQL guy" the conferences in the last three weeks that would interest me include: SQLPASS-US, SQL Connections, TechEd Europe, PDC, and SQLBits. Why folks insist on holding all of them in the same three week period is beyond me. I've spent last week paving virtual machines with new CTP/beta versions of almost every software product I use.

I'm sure there were some neat demos during "conference sweeps". Maybe people thought I did some in my talks. And I haven't had a chance to go through everything yet. But I wanted to write about two demos I found pretty cool.

At the PDC, talk SVR33 included Ed Katibah, Torsten Grabs and Olivier Meyer. Olivier did demos of "the spatially enabled spreadsheet" (that is, an Excel add-in that encapsulates the SQL Server spatial library plus geocoding). Torsten use StreamInsight and combined spatial functions with LINQ queries that were registered to query data in a stream as it arrived "on the fly" from (a simulation of) Microsoft's shuttle system. This was a nice outside the box example showing the synergy between two seemingly unrelated parts of the SQL Server product.

You can catch a video of it here.

Categories:
SQL Server Spatial

My latest MSDN magazine column on Visualizing Spatial Data is out in the Novemeber issue. It covers three ways to visualize SQL Server 2008 spatial data:

SQL Server 2008 R2 Reporting Services Map Control
ESRI MapIt Version 1.0
MapPoint Add-In for SQL Server 2008

Interestingly because the lead times are rather long for the column, we've already seen new versions of some of these.

SQL Server 2008 R2 November CTP was available last week with updates to the Map Control
ESRI MapIt Version 1.1 shipped this week

Oh well...so its now better and richer than the software I originally wrote about. More about that as I get caught up from TechEd last week.

Enjoy!

Categories:
SQL Server Spatial

There are two slightly different ways of representing non-existing spatial data, the "empty geometry" concept and database NULL. You can read about the subtle differences between them here. If you're using a spatial index, either of these types have the ability to cause query perforamance problems.

Let's take, as an example, a table with 10000 rows, all of which contain 'POINT EMPTY'. If we put a spatial index on those rows, we will have 10000 rows in our spatial index all with a cellid value of 0 and a SRID (in the index information) of 0. This is true even if you are using the geography data type where STSrid for the data (when using POINT EMPTY) actually returns 4326.

When you're using the spatial index diagnostic stored procedures, these index entries show up as "ObjectCells in Level 0 for Index"; the diagnosic also report 2 SRIDs in the index data when using the geography index. This can have a big effect on your query plan.

Suppose that I have 10000 empty values (either 'POINT EMPTY' or NULL) in a table called "mypoints" and a query like this:

DECLARE @g geography;
SELECT @g = shapegeog FROM zipcodes where id = '97225'
SELECT * FROM mypoints WHERE mypoints.shapegeog.STIntersects(@g)=1

If the query plan uses the spatial index, you'll get a "candidate hit" on each of the empty points during the spatial index seek iterator. This is because the row for 'POINT EMPTY' is technically outside the index bounds (level0 cell) and the spatial method will have to be evaluated later in the query. Having 10000 additional index seeks on the *base* table, to fetch the geography column in order to evaluate the spatial method, will slow things down. You'd think it might be useful to set the geography to NULL and redo the query like this:

SELECT * from mypoints
WHERE mypoints.shapegeog IS NOT NULL AND mypoints.shapegeog.STIntersects(@g)=1

This will still get the 10000 additional candidate hits, and in my limited tests, I still get 10000 additional index seeks on the base table; the IS NOT NULL predicate is manifested as a separate filter iterator.

A workaround for this is to set the "empty" points to a valid value far outside your search areas (say 'POINT(0 90)'). Unless you're doing point-in-polygon computations where the North Pole is a valid choice (in that case, you're answers will be wrong because they'll include your placeholders) your queries will execute much faster.

This workaround can cause big problems when using other methods (like STDistance for example) because now you have a real (but nonsensical) value, so be careful of the complications. You can use CASE statements to substitute NULL values, for example,

-- If I used the North Pole to speed up point-in-polygon queries
declare @p geography = 'POINT(0 0)'
select
 case
   when (shapegeog.STEquals('POINT(0 90)')=1) THEN NULL
   else shapegeog.STDistance(@p)
  end
from mypoints

I'd also thought of using NULLIF to substitute NULL for my placeholder values in spatial operations that don't need to use the spatial index, but this won't work because instances of the spatial data types aren't comparable except when using STEquals. So NULLIF and spatial types causes an error.

Categories:
SQL Server Spatial

When I downloaded SQL Server 2008 R2 CTP2 last week, I went right by (for now) the cool Gemini features and even Multi Server Management and went straight for the new SSRS Map Control. After explaining the SQL Server 2008 spatial data types, the spatial methods, and SRIDs to folks until they glazed  over, and then watch their eyes light up when I showed 'em the Spatial Results tab in SSMS, I grokked this "power of visualization" thing right away. Maybe not the most earthshattering feature in R2, the map control is perhaps the coolest.

Start by dropping a map control on a report in BIDS or ReportBuilder 3.0 and wizarding your way through a simple color analytical map of say, sales by state. And take it from there. If you're not from the US (or "you're so bored with the USA"), get base map data of more relevence to you by importing an ESRI Shapefile. Or using Bing Maps as a base map. You could even use your the geometry of your business' office Geometric layout if you have it in SQL Server Spatial format; set the map property "Coordinate System=Planar" for that type of data. And be sure to have a read through Spatial Ed's "Cartographic Adjustment of Spatial Data for SQL Server Reporting Services", series starting here for some real fun.

I'm not going to say much more for now because my next MSDN "Under The Table" column is about this beauty, as well as some other new cool spatial visualization tools for SQL Server, and I'm wary of double-dipping with the blog. Maybe some more info on advanced ways to use the control later.

So, till then...have a map!

Today I posted the code to a projcect I've been working on, a library that creates KML documents from SQL Server 2008 geography instances or queries. It's called SQLServerToKML and is available on CodePlex here. There's no releases as such yet, just the library and a rudimentary document showing how to us it.

Hope it's useful, post comments/suggestions to the project's discussion page.

Categories:
SQL Server Spatial


Last week, Microsoft released the MapPoint 2009 Add-In for SQL Server 2008 Spatial. You can get it here.

This add in allows you to add layers to the map from queries against SQL Server spatial geography columns. It's more of an entry level mapping program that an IT-level (like the ESRI offerings) mapping program. You can save the map and use it against without SQL Server having to be available. There's also a number of "utility" type functions including shapefile import and queries and editing against your existing layers.

One thing I especially like about it is that it's usable sans-web, because the MapPoint data is local.

I was peripherally involved with the project from a point of view of identifying general spatial query constructs that would be most performant, usually this involved using query patterns (both in T-SQL and the underlying ADO.NET code) that would be most likely to use spatial indexes. There's a few different constructs (like CTEs, intermediate temp tables, and using parameterized queries in ADO.NET) that are a help in this regard. It was a learning experience.

Check it out.

Categories:
SQL Server Spatial

During a talk about spatial indexes and performance last week, I was surprised by a question about using ANSI-92 JOIN - ON syntax in spatial queries. I'd coded:

SELECT * FROM geonames.dbo.geonames g
JOIN Sample_USA.dbo.Counties c
  ON g.GEOG.STIntersects(c.geog) = 1
WHERE c.ID = 1569

Later I learned that the question was prompted by the following verbiage from the SQL Server Books Online: "To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form geometry1 . method_name ( geometry2 ) comparison_operator valid_number".

There phrase "within the WHERE clause" is a bit too restrictive. The query above using JOIN has the exact same plan as this equivalent query using WHERE does:

SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND g.GEOG.STIntersects(c.geog) = 1

Works just fine, either form uses the spatial index. The reason why folks would be concerned about the EXACT WORDING rather than the spirit of the advice, is that the second part of the BOL sentence IS true; the spatial method must be BEFORE the comparison predicate. That is:

-- This uses the spatial index
SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND g.GEOG.STIntersects(c.geog) = 1

-- This doesn't use the spatial index
SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND 1 = g.GEOG.STIntersects(c.geog)

But go ahead and use the JOIN verb, it uses the spatial index just fine. As a matter of fact, if you want parameter sniffing and query plan reuse it should be:

CREATE PROCEDURE GetNamesForCounty (@county_id int)
AS
SELECT * FROM geonames.dbo.geonames g
JOIN Sample_USA.dbo.Counties c
  ON g.GEOG.STIntersects(c.geog) = 1
WHERE c.ID = @county_id

Or call the parameterized query from a database API directly (ie use query parameters rather than variables with parameterized queries). And OF COURSE you usually don't need SELECT * either...

Last week I attended my first-ever ESRI conference in San Diego. Although the "star of the show" was version 9.3.1 and the upcoming version 9.4 of the ARCGIS series of products, I most enjoyed the presentations about the new ESRI MapIt product. This product is a collection of tools and toolkits to allow import, preparation, and usage of spatial data in SQL Server 2008.

The tools include:
A data import and encoding tool, Spatial Data Assistant, that enables you to import data from .shp files and ARCGIS Servers. I especially liked that the tool can reproject the data as it imports. You can also geocode address fields with either the ERSI or Bing Services geocoder.

A REST-based service that allows serving SQL Server spatial data. This allows integration with the webpart and the API.

A Sharepoint Webpart that lets you produce layers from MAPIt, ARCGIS Server, or Sharepoint lists that contain lat/long or address data.

The ARCGIS Silverlight/WPF API now allows input from the MapIt service as well as ARCGIS Servers, Bing Maps, and GEORSS. It's used to produce some amazing applications.

I like the product a lot and installed and worked with the evaluation during and after the show. It simplifies staging and producing brilliant looking maps and apps, but also puts all the power of Silverlight at your disposal.

Actually, the "stars of the show" were the users that attended. ESRI users got a great deal of exposure in the plenary sessions as well as the Map Gallery and elsewhere. I got to meet James Fee and Morten Nielsen. Morten is known for his work with Shape2SQL, SharpMap and other utilities and he now works for ESRI on the Silverlight APIs. Good choice on their part. I also made a lot of new friends at the conference, as always.

Of course I had to sing for my supper as well and presented a talk on Analyzing SQL Server 2008 Spatial Indexes as well as a couple of impromptu sessions and demos at the booth. Because ARCGIS can use SQL Server as a geodatabase and products (including MapIt) automatically define spatial indexes during operation, there was a good deal of interest.

Thanks for a fun time...

Categories:
SQL Server Spatial

Just got a mail message from an old friend who asked why I'd dropped out of sight in the past month of so. I've been in "extended partial vacation mode" for the last month or so and haven't blogged in a while. But I'm "still alive and well" (Johnny Winter was at the Portland blues festival last week, couldn't help the reference).

Next week I'll be at the ESRI user conference catching up on the latest and greatest in the GIS world. On Tuesday, I'll be doing a fairly short talk for the ESRI SQL Server special interest group on SQL Server 2008 spatial performance troubleshooting and also a longer followup talk (with lots of real-world examples) at the Microsoft booth. If you're around, drop by and say hi.

About a month ago, I received a copy of "Beginning Spatial with SQL Server 2008" by Alastair Aitchison. I've become acquainted with Alastair through his frequent postings on the SQL Server Spatial forum (under the login Tanoshimi), where he's always been very helpful and patient with folks starting out with spatial data concepts and SQL Server practice. Although I'd been working with SQL Server spatial for a while now, I read the book from cover to cover. It doesn't disappoint. The basic spatial concepts are explained in the first few chapters, and he goes on from there to present an encyclopediadic approach to the spatial methods and properties supported in SQL Server 2008.

I especially enjoyed that he showed examples of geocoding, visualization, and services using both Virtual Earth and Google Earth. I also liked the scenario based approach to the spatial functions, for example showing STIntersection using the Appian Way and the Pontine Marshes. Each spatial function described has an example using real-world geographic use case, and this helps in understanding. The end-to-end examples gave you a good understanding about how to start off with spatial using import, query, and visualization.

The only quibbles I had were with the introduction to SQLCLR and object orientation WRT spatial. It was confusing to describe the OGC inheritence hierarchy without contrasting this with SQL Server's actual implementation (two data types geometry and geography each supporting all of the instanciable OGC types, like point, polygon, etc). And because SQL Server's .NET UDTs don't support inheritence, individual classes for each OGC type is not the way they're exposed (or even implemented internally AFAIK). I was also surprised at the assertion that you can't initialize a geometry/geography from a WKT string in T-SQL; this works because conversion from string implicitly calls the Parse method and uses the default SRID for either type. Reported it as errata. Finally, more performance-related information might have been useful, but the book was positioned as a beginning book after all.

Starting off with spatial data is more than just learning the intricacies of a new data type or lists of methods and properties of a SQLCLR-based type; you need to acquire some domain-specific knowledge to make heads or tails of the data type.  And that's where this book is invaluable. I'd recommend it as a way to get up to speed quickly with the SQL Server 2008 spatial types.

My interview with the folks at RunAs Radio is out. I'm discussing one of my favorite topics, spatial data and spatial in SQL Server 2008, with Richard and Greg. This wandered off into some interesting possible usages for this data. Catch it here.

I've answered quite a few questions lately about how to make sure that your spatial index is being used. Use of a spatial index with a large or complex spatial dataset can mean the difference between a query that takes minutes to execute and subsecond execution. You also want to use an index hint as a last resort, the spatial index should not need a hint. Here's a few things to try, in order of importance.

1. Apply SQL Server 2008 SP1!!! I can't stress this one too much. There was a change to query costing that affected spatial index use.
2. Make the query as uncomplicated as possible. Don't try to combine STIntersects with a call to STBuffer, MakeValid or other nested spatial method calls or subqueries. Use multiple statements if needed.
3. If you're running the code in SSMS, use sp_executesql around the spatial query (or use your own stored procedure with the spatial value as a parameter) to ensure the query coster "knows" the parameter value at the time its creating the query plan, that is, at beginning of the batch or on entry to a stored procedure or sp_executesql.Here's an example:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql
N'select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go

-- so does this
create procedure find_zipcode (@g geometry)
as
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
execute find_zipcode(@latlonPoint)

4. If you're using client code, make sure you use a parameterized query and that it passes the value in the SqlParametersCollection. ADO.NET will change this into a call that uses sp_executesql.

5. Don't depend of passing in a string literal to give the query coster the right info, because the code "creates" the point inline, after the query plan has been created.:

-- use a parameterized query with sp_executesql or stored procedure instead
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects('POINT (45.518066 -122.767464)')=1

6. Check the query plan (actual or estimated plan will work) to ensure the index is being used. The Spatial Index step in the query plan is easy to locate.

7. Make sure you have an appropriate spatial index with your data AND for your query sample. You can check this with the spatial index diagnostic stored procedures. To see how to use these procedures, start with the blog series here.

8. Use a hint as a last resort and see if it makes a difference in the query speed. If using a spatial index hint causes an error "Could not create plan", it may mean that your query is too complex. See step 2.

I hope this helps your query response be as fast as possible.

On experimenting with the CTP of SQL Server 2008 SP1, I found that there were some changes made to the costing algorithm, vis-a-vis spatial indexes. In the RTM version, spatial indexes weren't uses as often as they could be, because the query costing was too high compared to the spatial filters. This made programmers resort to index hints for spatial indexes at the slightest provication. Here's some representative examples:

-- find surrounding zipcodes
-- no indexes used, 5 secs
-- cost pre-SP1: 45.2741
-- index used in SP1, subsecond
-- cost SP1: 14.2199
declare @id varchar(9) = '97225'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from zipcodes a, zipcodes b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

-- cost SP1: 26.3341
-- cost pre-SP1: 219.754
declare @id varchar(9) = '162243'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from census a, census b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

So you should not have to hint spatial indexes as frequently (or at all) in SP1. If you'd had those hints encapsulated in plan guides, you can just turn the plan guides off.

I'm still having trouble with some queries picking the best spatial index out of multiple choices; an "hhhh" index vs a "mmmm" index for example. So you may still have to hint if you have multiple spatial indexes for varied use cases. But for a single use-case/single index, it appears to pick the best of "use spatial index" vs "scan and filter" every time. To determine which is the best spatial index for a specific use-case have a look at the spatial analysis proc series starting a few postings back.

Also, don't forget that in order to give the plan coster the best information, always use stored procedure parameters or parameterized queries. If you're just executing T-SQL batches in SSMS, sp_executesql will do the trick; if you use parameterized queries in any of the database APIs (e.g. ADO.NET), they'll convert it to a call to sp_executesql for you. And don't forget to use the right syntax for the spatial index method, that is, use "a.STIntersects(b) = 1" rather than "1 = a.STIntersects(b)".

I've become a columnist for MSDN magazine and my first column is available in the online edition, Feb 2009 issue today. I'm doing a column every few months called "Under The Table", about database-centric development. The first column is about one of the most exciting (IMHO) new features in SQL Server 2008, spatial data type and indexes, and the spatial library that's part of Microsoft.SqlServer.Types. Hope you like it.

When SQL Server's spatial data types were introduced, people asked "Why does every standard method begin with ST" (e.g. STIsEmpty). SQL Server docs always cite the OGC (Open Geospatial Consortium) specifications, but OGC's methods do not begin with ST. Where did it come from?

As far as I can figure the "ST" before every standard method comes from the SQL Multimedia (SQL/MM): Part 3 Spatial spec and stands for spatial-temporal. But SQL/MM prefixes standard methods with "ST_" not "ST". ESRI and IBM's DB2 spatial extender also use "ST_", and DB2 appears to use the convention of leaving the "ST_" off for non-standard methods. Oracle Spatial uses ST as of 10g (see the comments below), as well as "SDO_" for both standard and non-standard methods (stands for Spatial Data Option). Postgres Spatial uses no prefix, a la OGC.

So what's in a naming standard? Apparently, not much. One nice side-affect is that searching for "STBoundary" (no underscore), for example, gets only references to SQL Server spatial.

Yesterday a friend asked me if it was better to always use a value of GEOMETRYCOLLECTION EMPTY as an alternative to making a geometry/geography column nullable. OGC has its own concept of database NULL, that is [GEOMETRY] EMPTY, where [GEOMETRY] can be any valid geometry subtype (that is 'POINT EMPTY', 'LINESTRING EMPTY', etc). When I asked why it mattered, he said he was tired of putting ..WHERE...IS NOT NULL in every UPDATE of SQL Server spatial data type. Sounds like reason for a blog entry.

NULL is a valid value of any SQL Server data type, defining a variable without initializing it makes it NULL.
DECLARE @x int; -- value of @x is NULL
DECLARE @geog Geography -- value of @geog is NULL

According to SQL-1999 (and above) a SQL data type can have properties and methods; accessor methods and mutator methods. Calling a accessor method on a NULL variable returns NULL. Calling a mutator method on a NULL variable is an error, in SQL Server the error text is "Mutator ... cannot be called on a NULL value". If I have a table of 100000 geography instances and one row contains NULL, attempting to update that column without using WHERE..IS NOT NULL produces this error.

UPDATE dbo.Geotable SET geog.STSrid = 4326; -- error if any row contains NULL geog

EMPTY GEOGRAPHY is a different concept then NULL because OGC defines a common architecture for geography, independent of SQL. In fact, the "OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1:Common architecture" spec does not even use the word NULL once. EMPTY can be tested with the OGC standard method IsEmpty (STIsEmpty in SQL Server), and is defined as "the empty point set ∅ for the coordinate space". Because it is not database NULL, you can call methods on it without error:

DECLARE @a geometry = 'POINT EMPTY';
SET @a.STSrid = 4326; -- works fine

Some other interesting behaviors of [GEOGRAPHY] EMPTY

declare @a geometry, @b geometry;
set @a = 'POINT EMPTY';
set @b = 'LINESTRING EMPTY';

select @a.STEquals(@b);  -- true

select @a.STDimension(); -- -1 (negative one)

select @a.STIsEmpty();  -- true, of course
select @a.STIsSimple(); -- true

-- Disjoint is true if "Intersection of @a and @b is the empty set"
select @a.STDisjoint(@b); -- 1

-- Both Within and Contains are false
select @a.STWithin(@b);  -- 0
select @a.STContains(@b);  -- 0

Some other interesting uses of [GEOGRAPHY] EMPTY

-- the boundary of a point is the empty set
declare @x geometry = 'POINT(1 1)';
select @x.STBoundary().ToString();  -- GEOMETRYCOLLECTION EMPTY

-- but the boundary of a polygon is a closed linestring
declare @y geometry = 'POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))';
select @y.STBoundary().ToString();  -- 'LINESTRING (1 1, 2 1, 2 2, 1 2, 1 1)'

-- and the boundary of a closed linestring (ring) is GEOMETRYCOLLECTION EMPTY
declare @z geometry = @y.STBoundary();
select @z.STBoundary().ToString(); -- GEOMETRYCOLLECTION EMPTY

OGC's SQL Option spec (and SQL Server) differenciates between NULL and [GEOMETRY] EMPTY. For example, in the case above, the boundary of any NULL geometry IS NULL, not EMPTY.

Should you use NULL or EMPTY as a Geometry/Geography default? Depends on the semantics that you want to enforce, but, to me, lack of a value in SQL is NULL. And you can even include a constraint that the column value may not be either or both special values. EMPTY is a good way to ensure not having to use "IS NOT NULL" in every UPDATE statement, but that's as far as it goes.

What do you think?

I've devoted the last few blog entries to describing what's in that spatial index analysis proc output and how to use it and interpret the results. Just wanted to describe the rest of the information and how the information relates to what you'd find in a query plan that uses a spatial index.

The procs return:
1. Information about the index
2. Information about the query sample
3. How efficient the index is when used against the query sample

The information you get about the index is:
-  Bounding Box dimensions - BOL says they are NULL for geography but they are always -180, -90, 180, 90 for geography
-  GridLevelSize - 4 levels
-  Cells_Per_Object - maximum tessellated cells per row, specified in DDL
-  Page and Row counts for the spatial index
-  Avg number of cells per base row (how many of the CellsPerObject were actually used)
-  Height, Width, Area of a cell - Geography is always the same, but its nice to know how big the cells are compared to the other levels. Not sure what the unit of measure is for geography (BOL says "depends on SRID"), by looking at the number it doesn't look like it could be meters.
-  CellAreaToBoundingBox percentage - 4 levels
-  Total SRIDs found

The most interesting is likely the info about the number of cells in each level (including level0 - off the bounding box). Any levels that are 0 rows don't appear in the XML output. You get info for:
-Number of Object Cells for Index
-Number of Interior Cells
-Number of Intersecting Cells
-Number of Border Cells
-also some numbers about cells normalized to the leaf grid

The cells are useful in how they relate to the primary filter. The interior, intersecting, and border cells are used by the internal filter.

For the query sample, you get the same cells numbers. No "cells normalized to the leaf" however. BTW, for the geography index ignore the BOL comment that indicates that if Level0 of the query sample is 1 then index is not useable for this query. For geography its always 1.

This relates to the following information that appears in a query plan that uses spatial indexes.
1. All of the query samples (in the spatial predicate) are tessellated using the same parameters (density/max cells)
2. The table is joined against all parent/children of the index (the index levels are actually a hierarchy)
3. Index ranges are calculated
4. A join of the tesselated samples on index is done. This results in the number of rows selected by the primary filter.
5. The internal and secondary filters are applied in a separate query iterator (of type Filter) later on in the plan.

The upshot of this is that, not only is the index dimensions used to tessellate the base table rows, the same dimensions as the index are used to tessellate the query sample for the spatial predicate (query sample, in the sproc). So, for example, I'd always thought that spatial indexes over points should specify all-HIGH density and maybe 2 max cells per object. But if your query sample contains large polygons, those same dimensions will be used against the query sample too. So maybe all-HIGH density for point-based indexes isn't a hard and fast rule.

Hopefully this should give you enough information to investigate your indexes and queries using the spatial analysis stored procs. As far as patterns for "which specifications of index is best", more patterns will show up as I investigate more different patterns of spatial data. For now, I'm dividing these into three main categories:

1. Spatial index over table of points, Query sample is polygon.
2. Spatial index over table of polygons, Query sample is point.
3. Spatial index over table of polygons, Query sample is polygon.

With the simplifying premise that linestrings are thin like a point (ie they won't fill a cell), but if you put a STBuffer around them, they're more like polygons.

Happy index hunting.

A little more about the query sample that gets fed into the spatial index procs.

Query sample is a singleton geometry or geography. It's not a query. So it's not as straightforward as "here's a spatial query that could use an index, run this query and show me the figures". So this procedure is not going to work any differently for a query that use STIntersects vs STDistance vs... It's going to show you how an index on table A would be utilized given a single operation.

It maps most closely when your query is something like:
select ... from tablea a where a.geog.STIntersects(@g)=1

Is this case @g is your query sample, and Number_Of_Output_Rows is the number of rows the query returns. But what if you have a conditional query or query that has more than one row as output, like

select ...
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

In this case you could specify your query sample like this:
DECLARE @my_query_sample geography = (select geog from tablea a where a.id = @someid)

...And you're looking to an analyze usage on an geography index on tableb. But this will cause an error if "select geog from tablea a where a.id = @someid" returns more than one row. How do I use the sprocs in this case? Immediately you think:
DECLARE @my_query_sample geography;
SELECT @my_query_sample = geog from tablea a where a.id = @someid

or even:
select @my_query_sample = a.geog
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

Let's analyze this way of specifying query sample. If your query sample looks like it's going to represent more than one row, note that the spatial index is always use in (at least I've only seen it used in) a SEEK. So if your query window-producing query represents N rows, you need to choose one row. Hopefully a representative row. This would work if the "top(1) without an order by" row was representative:

DECLARE @my_query_sample geography = (select top(1) geog from tablea a where a.id = @someid)

And realize that you're going to have to do this operation (repsented by the output) N times. Let's make this (hopefully) more clear.

select a.zipcode, b.point
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

There are 430 zipcodes in Oregon. There are 1000 points in your table. You have an index on zipcodes and one on points. Remember that STIntersects is commutative (a.geog.STIntersects(b.geog) = b.geog.STIntersects(a.geog)) which index is "better"? Notice that the query returns 1 column from zipcodes and 1 column from points, so you can't just STUnion all of the Oregon zipcodes together. You'd get the same points that way, but have no idea which zipcode goes with which point. So choices are:

1. Representative query sample for zipcodes in Oregon, analyze point index, realize that you're going to do this "query" 430 times.
DECLARE @my_query_sample geography = (select top(1) geog from zipcodes a
 where a.zipcode IN (select zipcode from zipcodes where state = 'OR') )
2. Representative query sample for point, analyze zipcode index, realize that you're going to do this "query" 1000 times.
DECLARE @my_query_sample geography = (select top(1) geog from points)

Be aware that doing 430 or 1000 seeks against the spatial index might "scare off" the query optimizer from choosing that plan. You could also declare you're query sample like this:

DECLARE @geog GEOGRAPHY
select @geog = a.geog --- or @geog = b.geog
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

But that's still going to give you one representative combination point or zipcode as a query sample (ie @geog is still a singleton). If all of your points are in Oregon, it's going to produce a similar analytic output, and if all your points are not in Oregon, it might be better to specify it that way. But then you would have to know how many points in Oregon to deduce how many seeks. And you can't check the ouput against "Number_Of_Rows_Returned" because you don't know which row it actually chose.

If you are just needing points, not zipcode-point combinations you could STUnion together all the zipcodes in Oregon and run the query that way. The spatial analysis procs give you a 1-1 mapping against that query. Or use a state table rather than a zipcode table. Of course this is predicated on the fact that zipcodes don't overlap multiple states.

Hope this answers the question, "what do I use as a query sample to troubleshoot query X and index Y?".

Last post discussed the filters. The procs report some raw numbers and some derived numbers. Here's a cheat sheet, although the info is all in the BOL.

N = Number of rows in the table
O = Number of rows output
P = Number of rows selected by primary filter (by the index)
S = Number of rows selected by internal filter (by the index optimizations)

Then,

P-S       = Number_Of_Times_Secondary_Filter_Is_Called (number of times they ran the expensive operation)
(N-P)/N = Percentage_Of_Rows_NotSelected_By_Primary_Filter
S/P       = Percentage_Of_Primary_Filter_Rows_Selected_By_internal_Filter
S/O       = Internal_Filter_Efficiency %
O/P       = Primary_Filter_Efficiency %

<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.942714746408677e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>9.095871716137836e+001
<Internal_Filter_Efficiency>9.593378913278158e+001
<Primary_Filter_Efficiency>9.481405663596043e+001

So how efficient is the index? Let's change this info into statements, using the values from the previous blog entry:

N = 511650, O = 2779, P = 2931, S = 2666

<Number_Of_Times_Secondary_Filter_Is_Called>265</Number_Of_Times_Secondary_Filter_Is_Called>
<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.942714746408677e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>9.095871716137836e+001
<Internal_Filter_Efficiency>9.593378913278158e+001
<Primary_Filter_Efficiency>9.481405663596043e+001

1. They had to run STIntersects 265 times, rather than 511650 times if they didn't use the index.
2. 99.4% of the rows were eliminated by using the index
3. The Internal_Filter (internal optimizations) eliminated 90% of the times they would have had to run Intersects, even if the used the index (primary_filter).
4. The Internal_Filter was able to "deduce" 95.9% of the rows in the answer.
5. The Primary_Filter was able to find 94.8% of the rows in the answer.

Pretty good index, huh? And pretty good internal filter. Probably is going to be used.

Let's look at one more, but this time from using a geoemtry index. The index is over in geometry column that contains only points. The query sample is a polygon.

<Base_Table_Rows>11267
<Number_Of_Rows_Selected_By_Primary_Filter>265
<Number_Of_Rows_Selected_By_Internal_Filter>149
<Number_Of_Times_Secondary_Filter_Is_Called>116
<Number_Of_Rows_Output>413
<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.764799857992367e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>5.622641509433962e+001
<Internal_Filter_Efficiency>3.607748184019371e+001
<Primary_Filter_Efficiency>1.558490566037736e+002

Note that in this case, the internal filter only elimates 56.2% of the rows that the primary filter finds. It only finds 36% of the output rows. But note that in this case, primary filter efficiency = 155%. How can that be? Well, this is geometry and the primary filter didn't find all the rows in the answer. There must be some outside the bounding box. Remember that Primary_Filter_Efficiency is defined as Output_Rows / Primary_Filter rows.

More coming...

The spatial index diagnostic procs' most enlightening pieces of information have to do with the filter counts and efficiencies reported at the end of the set of columns or XML document output. But what are these filters exactly and what do the results mean?

The proc output refers to three filters:
1. Primary Filter
2. Internal Filter
3. Secondary Filter

Remember that the spatial indexes are based on tessellation or tiling. Issac has some nice diagrams that show how the tiling works in theory. Please review his indexing posts before reading these and when/if you get confused by the terminology. Depending on the size and type of the spatial object, the object may completely cover a tile (think big polygon), partially cover (intersect) a tile (think point) or not cover a tile. It can also hit the border of a tile. The tiles are referred to in the spatial procs as cells.

Remember also that the point of a spatial index is to reduce the number of Intersect operations (or other operations covered by the spatial index) actually performed by using tiling. I liked Isaac's analogy that "if I want to find roads that intersect Madison WI, I only have to look at roads that intersect Wisconsin". So the index (primary filter) selects all roads that intersect Wisconsin. It can get false positives (roads that intersect Wisconsin but not Madison). The "Secondary Filter" is this case is the Intersects operation itself.

So what's the "internal filter"?  If you think of the primary filter as filtering out 100 polygons out of 10000, they'd still have to run Intersects on the 100 polygons to actually see if they qualify. But say that the index and query sample both *completely* cover a cell. Or index completely covers the cell and query sample touches it. We know that the cell qualifies for Intersect in that case, we don't need to run Intersect on the query sample for that combination. That's the Internal Filter.

So let's look at some sample output. In this case the index was over a geography column that contains only points. The query sample is a polygon. 

<Base_Table_Rows>511650</Base_Table_Rows>
<Number_Of_Rows_Selected_By_Primary_Filter>2931</Number_Of_Rows_Selected_By_Primary_Filter>
<Number_Of_Rows_Selected_By_Internal_Filter>2666</Number_Of_Rows_Selected_By_Internal_Filter>
<Number_Of_Times_Secondary_Filter_Is_Called>265</Number_Of_Times_Secondary_Filter_Is_Called>
<Number_Of_Rows_Output>2779</Number_Of_Rows_Output>

So: 511650 rows in the table. Using the index (primary filter) eliminates all but 2931 of them. The internal filter determines that 2666 are true hits, not false positive. They have to run Intersect 265 times (2931 - 2666) times. And the number of rows returned is 2779, so there were some false positives. Good thing they ran Intersect. BTW. Using the method Filter() instead of Intersect() just means they're returning all the rows selected by primary filter (2931).

Finally, remember if you're using a geography index, there is no bounding box, so no part of the rows or query sample can be outside the index. With geometry, some of the rows or query sample can be outside the bounding box. This means that with geography, Rows_Selected_By_Primary_Filter >= Number_Of_Rows_Output. With geometry, Number_Of_Rows_Output may be > Rows_Selected_By_Primary_Filter as well.

I've been looking at the spatial index stored procedures (sp_help_spatial_geography_index_xml and friends) a little harder recently, in an attempt to help answer two questions.

1. If I have a spatial query, why does/doesn't it use my spatial index?
2. If I have a specific spatial query in mind, what are the best values to choose for my spatial index density?

In addition to the diagnostic information presented in the procs, there are a few things to keep in mind when attempting to answer the first question. Currently, the query optimizer is estimating "high" when it comes to using the spatial index, so there are occasions were the spatial index should be used and isn't. Isaac mentioned this in his PDC talk, and mentioned that this would be changed in the next refreshed. As far as I could see, it isn't in SQL Server 2008 CU2.

Also, remember that SQL Server will never use cardinality estimates for a parameter set in a variable by the SET statement unless you use OPTION(RECOMPILE) on the statement. This is a general SQL Server behavior, not a spatial behavior because of when SQL Server creates the query plan. So if you want cardinality estimates for the variable @g in the query "select geog from table where geog.Intersects(@g)=1" you must either pass @g in as a stored procedure parameter or use sp_executesql. Realize that if you use a parameterized query in a client app, the client API stacks will change the parameterized query to invoke sp_executesql, so you're OK here. According to Michael Rys' SQLPASS talk, one spatial-specific behavior is that cardinality won't be used if @g if specified as a constant, that is "select...where geog.Intersects('POINT(1 1)')=1 " either. Finally SQL Server doesn't change the query plan in a parameter value changes, general behavior, not spatial-specific.

Back to the diagnostic stored procs.

The information that the spatial index procs provide consists of three main categories:
1. Information about the index itself
2. Information about the query sample
3. How efficient the query index is when used against the query sample

The query sample is specified as an input parameter to the diagnostic procs, and would be "@g" in the Intersects queries above.

If you've read my previous blog post on the diagnostic procs, read it again, I've revised it. Filters are the subject of the next post.

I remember hearing during the SQL Server 2008 beta that there would be diagnostic stored procedures that would produce information about spatial indexes and help in troubleshooting why an index was not being used. In addition, these procedures would help in determining the best spatial index. A few weeks ago I had the good fortune to run across these procedures.

sp_help_spatial_geometry_index and sp_help_spatial_geography_index put out information as columns
sp_help_spatial_geometry_index_xml and sp_help_spatial_geography_index_xml put out XML format

Each procedure takes a table name, index name, and sample query. There's an option for verbose or basic properties. The parameters and output are doc'd in BOL. Rather than repeat the doc, I'd like to walk through a problem and show how you'd interpret the output.

Say that you are trying to decide between an index with high and medium density to run a query similar to the following:

Declare @geometry geometry =
 Geometry::STGeomFromText('POLYGON((500000 300000,500000 340000,540000 340000,540000 300000,500000 300000))', 0)
exec sp_executesql N'select * from spatial_table where LOCATION.STIntersects(@g) = 1', N'@g geometry', @geometry

Create both indexes and run the procedure against each one, using the sample query. There is a lot of very detailed information about the index itself because I was running in verbose mode, including information about the number of cells at each level of tesselation. Before looking at this information, it would be good to review Isaac's excellent blog post on how tesselation works. The most useful info though, was contained in the "Number of rows selected by ... filter" properties and filter efficiency properties. In my case there they were

High density index:
Number_Of_Rows_Selected_By_Primary_Filter:  2931
Number_Of_Rows_Selected_By_Internal_Filter: 2399
Number_Of_Times_Secondary_Filter_Is_Called:  532

Medium density index:

Number_Of_Rows_Selected_By_Primary_Filter:  2931
Number_Of_Rows_Selected_By_Internal_Filter: 2666
Number_Of_Times_Secondary_Filter_Is_Called:  265

Note that "primary filter" in this case is the spatial index in question, the goal is to have the rows selected by the primary or internal filter rather than run the secondary filter (the actual Intersects operation). Note that there can be false positives, the actual number of rows returned was not 2931 but 2779.

The best way to judge the effectiveness of the index, however, is with the property Internal_Filter_Efficiency or Primary_Filter_Efficiency.  Internal_Filter_Efficiency, for example, showed the high density index at 86% and the medium denisty index at 96%. Clearly, the medium density index is a better choice for this query. In addition, looking at the statistics on the index itself shows that the medium density index is a smaller index with less rows and index row per base row. The figures were:

High density:
Total_Primary_Index_Rows: 5273534 rows
Average_Number_Of_Index_Rows_Per_Base_Row : 10

Medium density:
Total_Primary_Index_Rows: 1965865 rows
Average_Number_Of_Index_Rows_Per_Base_Row : 3

There is also a property, Total_Number_Of_ObjectCells_In_Level0_For_QuerySample, that will tell you if the index *can* be used. A non-zero value indicates this and properties that will assist in determining whether to adjust the bounding box on a geometry index.

Hope this will give you a good head start in working with these excellent informational and diagnostic stored procedures.

 

After reading Isaac's recent blog posting about spatial queries and index hinting, I thought I might add some information based on a query I was working with this week. Sometimes if you have a query that's too syntactically complex, hinting won't work. In these cases you can cause the index to be used (or at least hintable) by breaking up the query. Warning. Protracted code example follows. 

If the spatial query is somewhat complex, it's useful to write it out step by step, so we start with the query written out in steps. We're using three tables, a county table, highway table, and POI (points of interest) table. Without pondering the necessity of clipping to the county boundary twice, here's the original logic. 

--Find all banks within 2km of I66 as it runs through Fairfax County
DECLARE @g geography
DECLARE @h geography
DECLARE @i geography
DECLARE @j geography
DECLARE @k geography
SELECT @g = geog from va where fips = 51059 -- Fairfax County
SELECT @h = geog from us_hwys where route_num = 'I66' -- I66 Hwy
SELECT @i = @h.STIntersection(@g)-- I66 within county
SELECT @j = @i.STBuffer(2000)-- buffer around I66
SELECT @k = @j.STIntersection(@g)-- clip buffer to county boundary
SELECT geog from POI
  WHERE geog.STIntersects(@k)=1 and Description = 'bank'

We thought that it might be better for the overall query performance to write this as a single SQL query, so, doing simple substitution, one subquery at a time, we come up with these two equivalent query sets at the end of the process:  

-- A. Almost there... 2-query process
DECLARE @k geography
select @k =
((SELECT geog from us_hwys where route_num = 'I66').STIntersection(
  (SELECT geog from va where fips = 51059)
)).STBuffer(2000).STIntersection(
  (SELECT geog from va where fips = 51059)
)
-- uses the spatial index without a hint, total elapsed time for both steps 2 seconds
SELECT geog from POI -- with (index(spatial_idx_2))
where geog.STIntersects(@k)=1 and Description = 'bank'
go

-- B. Done... 1 query
-- But, doesn't use the spatial index
-- Total elapsed time, 48 minutes 53 seconds!
SELECT geog from POI  -- with (index(spatial_idx_2))  hint doesn't work
where geog.STIntersects(
  ((SELECT geog from us_hwys where route_num = 'I66').STIntersection(
    (SELECT geog from va where fips = 51059)
  )).STBuffer(2000).STIntersection(
    (SELECT geog from va where fips = 51059)
  )
)=1 and Description = 'bank'

The resulting query is too complex for the query processor to even "think of" a plan that uses the spatial index. Even hinting doesn't work. The simpler STIntersects query using the 2-query process can be hinted, but we don't need the hint. The query processor is smart enough to use the spatial index with the simplified version. Oh...

So the moral of the story is not only to be on the lookout for index-hinting opportunities, but that sometimes, if the query is too complex the query processor won't take the hint. The error in this case was:

Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because of the hints defined in this query.
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

In cases like this, it may be necessary to break the query into pieces in order to apply the hint. In this case, it happens with an STIntersects() that used four subqueries, the exact "complexity point" may vary with the query itself.

Of course, in this case, because of the repeating "clipping" steps (clip to a particular county), we can rewrite the query to use common table expressions. This not only makes it easier to read but, by removing the redundant subquery, the query processor decides to use the index with the hint.

WITH GetCounty as (SELECT geog from va where fips = N'51059'),
     GetHighway as(SELECT geog from us_hwys where route_num = 'I66')
SELECT POI.geog from POI with (index(spatial_idx_2)),  GetCounty C,  GetHighway H
WHERE POI.geog.STIntersects(
  H.geog.STIntersection(C.geog).STBuffer(2000).STIntersection(C.geog)
)=1 and Description = 'bank'

So watch out for introduction of excess complexity. Hope this was helpful.

Folks have always had trouble with the fact that ring orientation is required with spatial instances if you're using SQL Server 2008's geography data type, but not with the geometry data type. For an explanation of the need for this, reference Isaac Kunen's blog entry here and Ed Katibah's blog entry (link in next paragraph).

In Ed's blog entry, he provides a neat way to fix spatial instances that have the wrong ring orientation for geography by using the geometry type and calling a method that forces the required ring orientation for geography. You should read his entry from yesterday for background.

I thought this was such a cool mechanism that I coded up a SQLCLR UDF that encapsulates this behavior. It accepts almost any WKT and produces a "proper" geography type, regardless of the ring orientation in the WKT. The code is included with this post. The function is called GeographyFromAnyWKT. Well, almost any. There are certain edge conditions (such as a ring that actually would exceed a single hemisphere) that will produce and error, but at least you shouldn't have to worry about ring orientation. Thanks Ed!

As an aside, although this is a Visual Studio SQLCLR autodeploy database project, the function can't be autodeployed because it returns a SqlGeograhy type and this is not covered by the autodeployer code. So I've included a deploy script and a couple of test cases in the project. Although I define the UDF as RETURNS NULL ON NULL INPUT, I've also included (redundant) null checking in the function itself, just in case you want to change the function a bit. Enjoy!

GeographyValidator.zip (96.73 KB)

This entry is a continuation of the previous posting on how to use SQL Server 2008 spatial data and Virtual Earth. This entry discusses your web service and SQL Server code choices.

The point of the web service is to translate one of the output formats of SQL Server 2008 spatial to a format this Virtual Earth can use. The spatial data types can be output in:
1. Well-known text format
2. "Native" format - that is, return a Geometry/Geography type to a .NET client as the SqlGeometry/SqlGeography native .NET types
3. GML - Geography markup language

Remember that Virtual Earth needs:
1. GEORSS or
2. KML
3. Custom XML-based format (and custom Javascript code to process it)
4. Javascript strings (to pass to eval())

There is not a one-to-one correspondence between any of these methods, so there is going to be a need for a transformation somewhere. In addition, a SQL Server Geometry or Geography type can contain a collection of points, linestrings, or polygons. There may also need to be a one-to-many translation. This is facilitated by methods STNumGeometries/STGeometryN and STNumPoints/STPointN.

Finally, there could be a pushpin (point) associated with the center of a polygon and/or text associated with the pushpin. Where the text comes from will be application-dependent.

Because there are many ways to process XML in SQL Server and in web services, that's lots of choices. Some that come to mind are:
1. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to a custom XML format
   Javascript code in the web page processes the custom format.
2. Using SQL Server's XQuery to transform GML into GEORSS or KML
   This is returned to the web service as a SqlXml data type.
   Consumed directly as a layer in Virtual Earth.
3. Using an XSLT transform (or LINQ to XML) to transform GML to GEORSS or KML. This can occur in the web service or even a SQLCLR procedure, although this type of processing should really be accomplished in the web service.
4. Using SELECT ... FOR XML PATH to construct GEORSS or KML directly.
   Similar processing to choice #2 above.
5. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to Javascript strings.
   Web page simply calls eval()

Hopefully this posting will elucidate the choices you have for this rendering. Depending on which method you choose, there could be quite a few "moving parts" (a la Rube Goldberg) in the solution.
- Web pages with custom Javascript code
- Web service to supply the data
- HTTP Handler to allow Virtual Earth to use remote GEORSS/KML
- Data access code in SQL Server
- Transformation code in web service or SQL Server

So.....what's your favorite method for using SQL Server 2008 spatial data with Virtual Earth?

Last week at the SQL Server 2008 Jumpstart I showed an example of using SQL Server 2008 spatial data types with Virtual Earth. I showed a single coding style. There really is a plethora of coding styles to using these together, and I'd like to describe some of the most common ones. I'd like to do this without much code for now, because an end-to-end description sometimes results in the graphic aspects (which other controls to use, how to build Ajax web pages, etc) seem to get in the way.

I'm not even going to mention using Virtual Earth to serve tiles directly, but only the Virtual Earth web control. Start with the Virtual Earth Interactive SDK for coding information.

In general, what we're after is:
   Page contains Virtual Earth control
   Page makes a request to a Web Service for data (usually Ajax)
   Web Service get and processes data from database, returns data
   Page uses Virtual Earth calls to present the data

The first two are straightforward, although the Virtual Earth control is available only online via a URL. This makes testing when you don't have internet access virtually impossible. The Ajax request (that is, call a URL with an XML payload using a browser-specific mechanism, wait asynchronously for the response) normally doesn't use SOAP-based Web Services, because you'd have to add and remove the SOAP headers. It's usually XML-in, XML-out, sans SOAP headers.

The Virtual Earth calls depend on what version of Virtual Earth control you're using. I'll show this using version 6.
1. You can render using shapes directly
2. You can import GEORSS into a shape layer
3. You can import KML into a shape layer

Virtual Earth supports visualizing shapes directly (#1 above) by using:
1. AddPushpin - to visualize a point or the centroid of a polygon
2. AddPolyline - that's Linestring in SQL Server 2008
3. AddPolygon - for polygons

For shape layers (#2 and #3 above), the code looks something like this:

var l = new VEShapeLayer();           
var veLayerSpec = new VEShapeSourceSpecification(type, local.xml, l);

map.ImportShapeLayerData(veLayerSpec, onFeedLoad);

The "type" in the VEShapeSourceSpecification constructor can either be GEORSS or ImportXML (that's KML). You can also have a Live Search Maps Collection but likely that one won't come from SQL Server.  Bear in mind that the VEShapeSourceSpecification constructor needs a *local* XML file URL. If you don't want to use a local XML file, you can build an ASP.NET HTTPHandler that uses redirection to pretend a remote XML file is local (see Mike McDougall's article and code to accomplish this).

One last twist that I've seen is that you can build your Javascript code directly in each page OR make the Web Service program reformat your data into Javascript calls directly. Once you're back at the page (the Web Service passes back a string of Javascript) you simply call Javascript's "eval" function. This makes the web page client-side code easy, but the Web Service code lots more complex, as you're generating code, not points, lines, and polygons.

Web Service choices follow next....

I'll have to admit it, when I first saw that SQL Server 2008 was adding spatial data support, I thought of it as a niche. The province of geographers, cartographers, and maybe a few others. Complex, involving a lot of higher mathematics, each province having their own geographic encoding, and so on... And that level exists, to me it's the production of spatial reference data. When I think of spatial reference data, I think of map data you'd buy from ESRI and data posted by government agencies. Or made available by utilities, so you don't hit a power cable while digging in your garden. As opposed to spatial line of business data.

Now, before you go searching your LOB application for latitude and longitude columns, how about looking for columns that contain "address". It's a short hop from address to lat/long by using a geocoder. The one I used is the MapPoint web service. Now you have line-of-business spatial data. I'll bet every app has a field that contains address. And how about looking for the nearest salesperson for a potential customer? Or the nearest warehouse? Mapping programs like Virtual Earth, Google Earth, and Yahoo Maps can give you general business information and maps but how about encoding information in your own business?

I'm quite excitied over this upcoming "niche" feature and think it could make its way into each and every application. That's spatial data "for the masses" (so I'm not the greatest at sloganizing, forgive me).

Theme design by Nukeation based on Jelle Druyts