About a month ago, I received a copy of "Beginning Spatial with SQL Server 2008" by Alastair Aitchison. I've become acquainted with Alastair through his frequent postings on the SQL Server Spatial forum (under the login Tanoshimi), where he's always been very helpful and patient with folks starting out with spatial data concepts and SQL Server practice. Although I'd been working with SQL Server spatial for a while now, I read the book from cover to cover. It doesn't disappoint. The basic spatial concepts are explained in the first few chapters, and he goes on from there to present an encyclopediadic approach to the spatial methods and properties supported in SQL Server 2008.

I especially enjoyed that he showed examples of geocoding, visualization, and services using both Virtual Earth and Google Earth. I also liked the scenario based approach to the spatial functions, for example showing STIntersection using the Appian Way and the Pontine Marshes. Each spatial function described has an example using real-world geographic use case, and this helps in understanding. The end-to-end examples gave you a good understanding about how to start off with spatial using import, query, and visualization.

The only quibbles I had were with the introduction to SQLCLR and object orientation WRT spatial. It was confusing to describe the OGC inheritence hierarchy without contrasting this with SQL Server's actual implementation (two data types geometry and geography each supporting all of the instanciable OGC types, like point, polygon, etc). And because SQL Server's .NET UDTs don't support inheritence, individual classes for each OGC type is not the way they're exposed (or even implemented internally AFAIK). I was also surprised at the assertion that you can't initialize a geometry/geography from a WKT string in T-SQL; this works because conversion from string implicitly calls the Parse method and uses the default SRID for either type. Reported it as errata. Finally, more performance-related information might have been useful, but the book was positioned as a beginning book after all.

Starting off with spatial data is more than just learning the intricacies of a new data type or lists of methods and properties of a SQLCLR-based type; you need to acquire some domain-specific knowledge to make heads or tails of the data type.  And that's where this book is invaluable. I'd recommend it as a way to get up to speed quickly with the SQL Server 2008 spatial types.

For those of you who's blog readers don't always get updated (rather than brand new) blog entries, there's a happy ending to my last "fixed-almost" blog entry on EDM 4.0 and parameterization. I've heard that this has been fixed to generate VARCHAR(8000) parameters and also that the unparameterized version:

var x = from a in ents.authors
        where a.au_lname == "Smith"
        select new { a.au_lname, a.au_fname };

will generate ...WHERE au_lname = 'Smith' instead of ...WHERE au_lname = N'Smith'.

Note that this is a different semantic; they generate a parameter type that agrees with the column's data type rather than using the .NET string type's data type equivalent (NVARCHAR). This will be better for performance because the database and parameter types agree.

Fix is coming in next beta version of .NET 4.0. I've updated the original blog entry to note this as well.

Categories:
Data Access

One of the first things that I did after installing VS2010 beta this week was to check on a parameterization problem with string parameters, and Linq To SQL/ADO.NET Entity Framework generated code. In the original version, this L2S query:

var x = from a in ents.authors
        where a.au_lname == "Smith"
        select new { a.au_lname, a.au_fname };

or this EF query:

string name = "Smith";
var x = from a in ents.authors
        where a.au_lname == name
        select new { a.au_lname, a.au_fname };

Would cause plan cache pollution and possibly performance problems. These queries produced a query plan with the parameter specified as NVARCHAR(5). Each query with a different length will produce a new plan in the cache, and the use of NVARCHAR rather than VARCHAR will cause a table scan (rather than a seek) in the plan, regardless of how many rows are in the table, because there must be a conversion of au_lname from VARCHAR to NVARCHAR as part getting the rows, making the query non-sargable. The correct database data type (VARCHAR) needs to be specified as the parameter data type.

In .NET 4.0/VS2010 Beta 1, L2S handles this, producing a VARCHAR(8000) parameter, as SQL Server's autoparameterization does. Entity Framework produces NVARCHAR(4000), fixing the length problem but still using the a different data type than the table specifies.

NOTE: This will be fixed in the next beta version of .NET 4.0 to produce VARCHAR(8000). See complete post here.

In my next installment of my MSDN "Under The Table", I'll be writing about the performance implications of data access method code in detail. I do mention "fixed in the next release" WRT L2S and EF, and hope that it will be resolved before .NET 4.0 ships.

Categories:
Data Access

I finally got around to install .NET 4.0 CTP today on a SQL Server box to test a long-held theory. Didn't quite work out the way I'd thought.

When .NET support was introduced in SQL Server 2005, there was a lot of interest in how SQL Server and .NET Framework would approach the versioning story. The story was that SQL Server would always load "the latest version of the .NET runtime installed on the machine". SQL Server 2005 shipped with .NET 2.0.50727.42 and since then, there's been a number of Service Packs and even new releases of .NET, including 2.0 SP1, 3.0, 3.5, and 3.5 SP1. SQL Server 2008 shipped with a pre-requisite of .NET 3.5 SP1. And, up until this point, SQL Server has always loaded the latest version of the .NET runtime. That's because there is no choice.

The basic bits of .NET reside in mscorlib.dll and System.dll. Through all of the releases after SQL Server 2005, those bits are still in the "version 2.0" directory. The 3.0 and 3.5 version don't install a side-by-side version of mscorlib (or System.dll, or System.Data.dll or most of the "core" .NET framework). These are versioned by updated the 2.0 versions in-place. .NET 3.0 and 3.5 directories exist, but they only contain additional libraries, like WCF and LINQ for example. The current version of the core .NET framework at the time of this writing is 2.0.50727.3053.

Enter .NET 4.0, which includes side-by-side new versions of mscorlib.dll or the rest of the entire core. When the .NET 4.0 CTP is installed on a machine that includes SQL Server 2008, the version of .NET loaded by 2008 is 2.0.50727.3053 (sys.dm_clr_properties reports 2.0.50727), not 4.0. Specifying 4.0 in a configuration file will force SQL Server to load the 4.0 version, but then attempting to create and initialize a variable of type "geography" fails with the message "Method's type signature is not interop compatible".

It will be interesting to see how this plays out as .NET 4.0 gets closer to becoming a reality.

Categories:
SQLCLR

My interview with the folks at RunAs Radio is out. I'm discussing one of my favorite topics, spatial data and spatial in SQL Server 2008, with Richard and Greg. This wandered off into some interesting possible usages for this data. Catch it here.
My latest article on out. It's in the May issue of MSDN magazine, and can be found here. It this article I explore the internals of programming with the filestream feature of SQL Server 2008 and some best practices around when and how to use it. Hope you like it.

I've answered quite a few questions lately about how to make sure that your spatial index is being used. Use of a spatial index with a large or complex spatial dataset can mean the difference between a query that takes minutes to execute and subsecond execution. You also want to use an index hint as a last resort, the spatial index should not need a hint. Here's a few things to try, in order of importance.

1. Apply SQL Server 2008 SP1!!! I can't stress this one too much. There was a change to query costing that affected spatial index use.
2. Make the query as uncomplicated as possible. Don't try to combine STIntersects with a call to STBuffer, MakeValid or other nested spatial method calls or subqueries. Use multiple statements if needed.
3. If you're running the code in SSMS, use sp_executesql around the spatial query (or use your own stored procedure with the spatial value as a parameter) to ensure the query coster "knows" the parameter value at the time its creating the query plan, that is, at beginning of the batch or on entry to a stored procedure or sp_executesql.Here's an example:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql
N'select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go

-- so does this
create procedure find_zipcode (@g geometry)
as
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
execute find_zipcode(@latlonPoint)

4. If you're using client code, make sure you use a parameterized query and that it passes the value in the SqlParametersCollection. ADO.NET will change this into a call that uses sp_executesql.

5. Don't depend of passing in a string literal to give the query coster the right info, because the code "creates" the point inline, after the query plan has been created.:

-- use a parameterized query with sp_executesql or stored procedure instead
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects('POINT (45.518066 -122.767464)')=1

6. Check the query plan (actual or estimated plan will work) to ensure the index is being used. The Spatial Index step in the query plan is easy to locate.

7. Make sure you have an appropriate spatial index with your data AND for your query sample. You can check this with the spatial index diagnostic stored procedures. To see how to use these procedures, start with the blog series here.

8. Use a hint as a last resort and see if it makes a difference in the query speed. If using a spatial index hint causes an error "Could not create plan", it may mean that your query is too complex. See step 2.

I hope this helps your query response be as fast as possible.

The latest "chain letter" type topic to wend it's way through the blogsphere is "things you now know" and I've been tagged. Here's some things that I now "know" (or opinions I now have) that I wished I'd known earlier in life. I present them without explanation because after all, they're opinions.

In general, software doesn't "evolve". Rather, it goes in cycles which can be equated to fashions, changing slightly each cycle.

Being adament about a particular (currently fashionable) methodology or product (i.e. not listening to others, what's called "output only mode") takes up lots of time in meetings with less payback than it's worth, if its already been decided to adopt a similar but slightly different methodology or product.

It's more difficult to do software maintanance that to do new development. With maintanance, not only your (usually small) change has to work, but all the things and interactions you didn't originally write must continue to work also. With new development, only the things you currently write must work.

Hope this helps someone. MHO.

Categories:

On experimenting with the CTP of SQL Server 2008 SP1, I found that there were some changes made to the costing algorithm, vis-a-vis spatial indexes. In the RTM version, spatial indexes weren't uses as often as they could be, because the query costing was too high compared to the spatial filters. This made programmers resort to index hints for spatial indexes at the slightest provication. Here's some representative examples:

-- find surrounding zipcodes
-- no indexes used, 5 secs
-- cost pre-SP1: 45.2741
-- index used in SP1, subsecond
-- cost SP1: 14.2199
declare @id varchar(9) = '97225'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from zipcodes a, zipcodes b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

-- cost SP1: 26.3341
-- cost pre-SP1: 219.754
declare @id varchar(9) = '162243'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from census a, census b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

So you should not have to hint spatial indexes as frequently (or at all) in SP1. If you'd had those hints encapsulated in plan guides, you can just turn the plan guides off.

I'm still having trouble with some queries picking the best spatial index out of multiple choices; an "hhhh" index vs a "mmmm" index for example. So you may still have to hint if you have multiple spatial indexes for varied use cases. But for a single use-case/single index, it appears to pick the best of "use spatial index" vs "scan and filter" every time. To determine which is the best spatial index for a specific use-case have a look at the spatial analysis proc series starting a few postings back.

Also, don't forget that in order to give the plan coster the best information, always use stored procedure parameters or parameterized queries. If you're just executing T-SQL batches in SSMS, sp_executesql will do the trick; if you use parameterized queries in any of the database APIs (e.g. ADO.NET), they'll convert it to a call to sp_executesql for you. And don't forget to use the right syntax for the spatial index method, that is, use "a.STIntersects(b) = 1" rather than "1 = a.STIntersects(b)".

I was browsing through MSDN magazine (the paper version) today and noticed an ad for the upcoming SQLConnections conference in Orlando. I'm doing three sessions there as well as a day-long pre-conference talk "All about SQL Server 2008 Spatial Data and writing Location-aware Applications". I'm REALLY looking forward to this one and have some interesting and novel demos and information. Almost everyone is really using spatial apps, but store the info in its "alternate" form; that is address-city-state-postal code-country form.

However, the ad mentioned my pre-conference workshop was "A Day of SQL Server Security". This turned out to be a misprint (it is correct on the conference website), I am doing a 1.5 hour talk on "SQL Server 2008 Security for Developers and Architects", but the full-day pre-con is on Spatial and Spatial apps. Really...

Just didn't want anyone showing up expecting a different topic. Although I could do a day (or more) on SQL Server security, it will be the condensed version that "plays at" SQLConnections. You've been warned...

Categories:

Theme design by Nukeation based on Jelle Druyts