So you want to use the SqlGeometry and SqlGeogrpahy collections in .Net. I found the lack of friendly collection based properties and methods suprising, i.e. no iterators, no collections which you can add to.(http://sqlblogcasts.com/blogs/simons/archive/2007/12/09/Katmai-Spatial---First-thoughts.aspx)

What this means is you can't create a line and then simply extend it. Or even extend a polygon to cover another point (easily) i.e. myPolygon.AddPoint(new Point(x,y)). This is sort of understadable as there are many ways the point could be added, to the exterior ring, or extend the polygon to cover the point (a pit like pulling an elastic band)

I've found that you either have to create your string representations of Geometry or use the WKB format. As these are standards based they shoud be fine. I think the safest is to use WKB but you need to know the binary structure

If you want to understand the format required for WKB the easiest to do use reflector against the Microsoft.SqlServer.Types dll.

What I've learnt so far is that the format is,

Point = [Byte order][Type][X value][Y value]

MultiPoint = [Byte order][Type][PointCount]<Point1><Point2>

Line = [Byte order][Type][PointCount][X Value][YValue][X value][Y value].....

MultiLine = [Byte order][Type][LineCount]<Line1><Line2> .....

Polygon = [Byte order][Type][Polygon Count][PointCount][X Value][YValue][X value][Y value].....

MultiPolygon = [Byte order][Type][Polygon Count]<Polygon1><Polygon2>....

Geometry Collection = [Byte order][Type][Shape Count]<Shape1><Shape2>.....

where

[Byte Order] is a singe byte indicating the endian or the rest of the bytes i.e. is 1 0x01000000 or 0x00000001

Type is 4 byte unsigned integer indicating the type of shape

   GeometryCollection = 7,
    LineString = 2,
    MultiLineString = 5,
    MultiPoint = 4,
    MultiPolygon = 6,
    Point = 1,
    Polygon = 3,
    Unknown = 0
PointCount, LineCount and polygon Count are unsigned integers and are used to iterate through the remainder of the bytes.

X Value and Y Value are 8 byte floating points.

<Shape> are repetitions of the relevant shape specification.

Creating a WKB in TSQL is quite neat see http://sqlblogcasts.com/blogs/simons/archive/2007/12/29/SQL-Server-2008-Spatial---How-many-ways-to-make-your-point-.aspx

Unfortunately the SqlGeometry and SqlGeograohy types are sealed so you can't inherit them. You would have to have a local instance and repeat all the method calls. You would have to implement your own parser to be able to add points and this would suck from a perf perspective, although very little validation is done when you create a geometry.


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008 Spatial - Using it in .Net code

Categories:


Performing bulk inserts into a table when the table has an identity column has long bee a difficult challenge for client programming.

The difficulty has been getting the identity values that have been generated for the set of data you are saving. It becomes even more difficult when you want to save parent child relationships.

In SQL 2005 the OUTPUT clause was introduced which allowed you to return the identity values generated by a an insert of multiple rows (you can also generate any other data generated by the server i.e. defaults contrary to what BOL says). However there was one draw back. You couldn't relate the identity values to the original data set unless you had a natural key in your table or you stored a second surrogate key. This means you had to fully reload your client application data set. For large data sets thats not ideal.

This is because the OUTPUT clause for an INSERT statement can only reference the inserted pseudo table. This only contains the columns of the underlying table so you don't have access to any other data.

With SQL Server 2008 you get MERGE. MERGE is a combined INSERT, UPDATE and DELETE statement that supports the OUTPUT clause. I thought I would have a look and see if the same limitations apply to the OUTPUT claue as they do for the INSERT statement. Well they don't.

This means that you can return any value from your source data even if its not being inserted into the table. For example

CREATE TABLE Person (id int identity(1,1), name varchar(200))

 

MERGE INTO Person

USING (Values ('Simon',1),('Mark',2),('Stuart',3)) People(name,clientSideId)

ON 1=2

WHEN NOT MATCHED THEN

INSERT (name) values (People.Name)

OUTPUT People.clientSideId, Inserted.Id;

This allows you to update your client code with the idenity values generated by the server. Whilst this logically shold work, in CTP5 there is a bug that causes this to fail. It appears that the issue is to do with the ON clause. When the expression is always false something is causing a severe error. There are a number of workarounds, sometimes using a WHEN MATCHED clause will help and in others changing he USING to be a table reference and not a derived table helps.

For an example of saving parent child relationships have a look at the code below. This copies the objects and columns system views into new tables, whilst maintaining correct referential integrity.

The use of the @objectTab and @columnTab is to avoid the error mentioned above, once fixed this code can be greatly simplfied.

You might be thinking that this is goig to such from a performance stance. It doesn't. Whilst the execution plan costs is much higher than the straight insert the actual cost of cpu and IO can be less than the direct INSERT approach. I beleieve this is due to some sorting and assertions that are imposed in the MERGE statement which simplify the storage and upating of the underlying indexes.

Here is the code mentioned above. It returns two data sets that can be used to update your client data sets. You could combine into 1 data set if required.

drop table NewObject

go

drop table NewColumn

go

create table NewObject (id int identity(1,1) check (id > 0 ), name sysname)

go

create table NewColumn (id int identity(1,1), ObjectId int, name sysname)

go

set nocount on

 

declare @objects xml = (

select top 1000 object_id [@object_id]

      ,name [@name]

      ,(select column_id [@id], name [@name] from sys.columns where sys.columns.object_id = sys.objects.object_id for xml path('column'), Type )

from sys.objects

for xml path ('object'))

 

declare @NewObject table (new_id int, old_id int, unique ([old_id]))

declare @id int

declare @NewColumn table (new_id int, old_id int, new_col_id int, old_col_id int)

 

declare @objectTab table (name sysname, object_id int)

insert into @objectTab (name, object_id)

       select obj.node.value('@name','sysname') name

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('/object') obj(node)

 

merge into NewObject

using ( select name, object_id from @objectTab

       )src(name,id)

on NewObject.id = 0

when matched then

delete

when not matched then

insert (name) values(src.name)

output src.id, inserted.id into @NewObject (old_id, new_id);

 

select * from @NewObject

 

declare @columnTab table (name sysname,id int,  object_id int)

insert into @columnTab (name, id, object_id)

       select col.node.value('@name','sysname') name

            , col.node.value('@id','int') id

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('object') obj(node)

         cross apply obj.node.nodes('column') col(node)

 

merge into NewColumn

using (select col.name, col.object_id , col.id , obj.new_id new_object_id

        from @columnTab col

        join @NewObject obj on col.object_id = obj.old_id) col(name,old_object_id,old_col_id, new_object_id)

on 1=2

when not matched then

insert (name, ObjectId) values(col.name, col.new_object_id)

output col.old_object_id, col.old_col_id, col.new_object_id, inserted.id into @NewColumn (old_id, old_col_id, new_id, new_col_id);

 

select * from @NewColumn

 

 


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008 - Batch inserts and identity columns using MERGE

Categories:
SQL Server 2008


The spatial functionality is hugely standards based. There spatial standards are well established what this means is that there are many methods to support the standards. One of the areas that you will see this is the creation of spatial data.

To create an instance of a spatial type you have many many options, most use the methods on the base types, i.e. geometry::STGeomFromText.

This is the set of ways of creating a point,

declare @p geometry;

 

set @p = geometry::Point(10,10,0);

set @p = geometry::STGeomFromText('POINT(10 10)',0);

set @p = geometry::STPointFromText('POINT(10 10)',0);

set @p = geometry::STGeomFromWKB(0x010100000000000000000024400000000000002440,0);

set @p = geometry::STPointFromWKB(0x010100000000000000000024400000000000002440,0);

set @p = geometry::GeomFromGml('<Point xmlns="http://www.opengis.net/gml"><pos>10 10</pos></Point>',0);

set @p = geometry::Parse('POINT(10 10)');

set @p = 'POINT(10 10)';

select @p

Only the point sub type has its own specific method geometry::Point. This I guess is because the attributes are well defined. You have one X, one Y and a SRID (Spatial Reference ID), where as the other types you have many Xs and many Ys.

Let me explain a few of these.

The STGeom... and STPoint... are the same expect the later verifies that the geometry you pass in is a Point.

The ..Text methods use the Well-Known Text (WKT) format as input and the ...WKB use the Well-Known Binary (WKB) format. You can use the STAsText method to get the text (WKT) representation of a variable and STAsBinary to find the binary (WKB) represenation

The GeomFromGml is an extension to the methods specified in the OGC and uses the Geography Markup Language (A subset for SQL Server) as input. You can use the AsGml method to get the GML representation . I would personally avoid GML representations due to the bulkiness of XML and overhead or associated processing.

Note: In BOL the method is GeomFromGML. This is incorrect only GeomFromGml works (notice the case of GML) this is consistent with the AsGml method, but contrary to the WKB methods. Who knows what CTP6 or RTM will bring.

You will undoubtedly see most examples using the text based methods because they are more readable. However on on doing some testing it seems the binary methods are quicker (in some tests twice a quick). This makes some sense as each point coordinate can be read as a single entity (8 byte float) where as the text representation has to be read character by character and then creating the value once a boundary (space, comma or bracket) has been reached.

It is also easier to create binary values in TSQL

geometry::STPolyFromWKB(0x00+0x00000003+0x00000001 + 0x00000005

                      + cast(@minx as binary(8)) + cast(@miny as binary(8))

                      + cast(@maxx as binary(8)) + cast(@miny as binary(8))

                      + cast(@maxx as binary(8)) + cast(@maxy as binary(8))

                      + cast(@minx as binary(8)) + cast(@maxy as binary(8))

                      + cast(@minx as binary(8)) + cast(@miny as binary(8)) ,0)

This creates a simple 4 sided polygon (aka a square). Key thing to note is the first two bytes that specify the byte order which for TSQL binary functions you want 0 and not 1 (most examples use 1.


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008 Spatial - How many ways to make your point?

Categories:


For a list of spatial references and their definitions including units of measure have a look at the system view

select * from sys.spatial_reference_systems


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008 Spatial - List of spatial referneces

Categories:


A friend recently emailed me after trying out the new spatial stuff in Katmai asking why the distance from Beaverton to Boston was 51, and any way 51 what.

On looking at his code I realised he had used the Geometry type and not the Geography type. The former is used to work on a single planar surface measured in x and y. The latter is for doing spatial stuff on spheres i.e. the earth.

He had passed the longitude and latitude to the geometry type and done a STDistance call to find the distance. Even though he had speciifed a spatial reference for WGS84 the answer came back as 51. Which was simply the hypotenuse of the triangle using straight forward pythagorus.

The Spatial reference only applies when using the Geography data type which deals with the elipitcal word. So why do we need spatial references anyway. Well they define the world in which you are working, i.e. what is 0, how big is the world etc, and what unit of measure is to be used.

If you want a nice explanation of spatial references have a read of the Mortens post here http://www.sharpgis.net/2007/05/05/SpatialReferencesCoordinateSystemsProjectionsDatumsEllipsoidsConfusing.aspx

This might be confusing and you may ask why do we need it for the Geometry type. The key is the unit of measure, If I take some spatial data describing a house from the UK over to building site in America and combine the data if I don't have the data stamped with a spatial reference I will end up in all sorts of mess. Why?

Units of measure.

If I measure something in feet and inches and then cobine with measurements in metres. Without knowing they are using different measurements I will end up with the wrong result. Thats why in SQL if you compare or perform an operation on two geometry (or geography) values with different spatial references (SRID) you will end up with a NULL answer.

It would have been nice for the types have been able to convert between the Spatial  References for at least the geometry types.

 


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008- What is a spatial reference?

Categories:


Why would you want to use filestream in SQL Server 2008?

The files you store using filestream in SQL Server 2008 are not accessible directly using the normal io methods.Instead you have to make a SQL call to get a pointer which you can then use with normal methods.

So if you have to access SQL anyway why not store the data directly in SQL in the say a varbinary(max) field?

The answer is performance.

Jim Gray and his colleagues did research into the storing of large data in the database (SQL2005) versus the file system (NTFS).

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem,

In this research it showed that storing anything less than 256Kb was better done in the database, and anything larger than 1M should be stored in the file system. In between depended on the read/write ratio, and storage age.

So if you need to store data blobs that are > 256Kb (per instance) you need to consider filestream.

[18/12/2007 Link fixed]


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/SQL Server 2008 Filestream : Why?

Categories:


When you read the specs for the spatial stuff (BOL has some incorrect links) some of the methods talk about min and maximum geometry dimensions. I must have been doing too much analysis services stuff and for the life of me I couldn't figure out what a dimension was in this context.

On reading the specs I had a homer moment "DOH!!!".

Of course the dimensions are whether it is 1D, 2D, 3D (remember Jaws ?) etc.

So how does this relate to geometries and spatial stuff.

There are 3 base geometries in SQL 2008, a point, a line and a polygon. In Katmai the spatial is only defined in 2 dimensions (ok so each geometry has a z and an m value but these aren't really used).

So the maximum dimensions in Katmai is 2, i.e. we are dealing with a flat surface (even the Geography stuff is really flat there is no concept of height).

In this situation a point has 0 dimensions (no length or area), a line has 1 dimension (no area) and a polygon has 2 dimensions.

Why is this inportant? Well the spatial methods often only work with or return geometries with certain dimensions. i.e. the method STCrosses returns true if the intersection of the geometries has a lesser dimension than the maxium dimension of the two geometries, and the intersection only contains points inside each geometry. So in english, if you have a polygon (2D) this will return False, because when 2 polygons overlap the intersection is another polygon (2D) 2D=2D and so we get false. You can overlap polygons on their edges where the intersection is just a line. But in this case the points of the line are not inside the 2 polygons but rather on the edge.

Compare this with STIntersection that returns true if either geometries share the same points. No matter the dimension of the intersection.

Finally STContains only applies to polygons (generally). A line only has 1 dimension and so can't contain anything (well it can contain another line or a point if the line or point are just a part of the line). My point here is that even if a line is closed (start and end points the same), it only has 1 dimension and so does not cover the area that it has enclosed, and so any geometries within this area are not contained in the line. You can create a polygon from that closed line, in which case this polygon will contain any geometries that exist in the enclosed space.

More on functions later.


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/Katmai Spatial - Whats a dimension?

Categories:


Following on my previous post on discovering the spatial types I mention that one of the aspects to understand is the different types of geometries.

Point

Fairly this is obvious its a single spot that has an X and a Y co-ordinate.

Multi Point

A group of points that are not connected. (if they were connected it would be a line or a polygon)

Line String

Guess what this is a line, but not neccesarily straight. Your line can change direction but it is continuous. A bit like a route, i.e. start at A go to B then go to C. Each point (A, B,C) is defined as a point and stored in the geometry.

The line can cross itself so i.e. to form a "&" sign or a "4". Whilst you might think that because this causes a closed area you have a closed geometry you don't. (I'll talk about closed geometries later)

LINE (0 0,1 1,5 1,10 0, 0 10)

Multi Line String

A group of lines. They do not have to be continuous.

Polygon

This is a shape that consists of one or more lines that start and end at the same point. A polygon must have at least 3 points (simplext polygon is a triangle).

POLYGON((0 0,0 10,10 10,10 0,0 0)) creates a square polygon 10x10 with its bottom left corner at 0,0

A  Polygon can have holes, this is done by specifying an additional set of points when you define the polygon. This set of points have to make up a polygon that is removed from the first polygon

POLYGON((0 0,0 10,10 10,10 0,0 0),(3 3,3 7,7 7,7 3,3 3))  amends the first polygon by removing a 4x4 square from the middle.

Multi Polygon

This guy is your friend, a polygon needs to be valid (I'll explain later). Thankfully there is a nice method on the geometry type that turns an invalid Polygon into a valid one. This generally creates a Multi Polygon, which is a collection of polygons (no suprise there).

Geometry Collection

You'll never guess what this is. Yep its a collectin of the above geometry types.

The key thing to consider is the dimension of your geometry as this is important when working with some of the methods like STOverlap and STContains. I'll discuss methods in another post.


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/Katmai Spatial - What's a geometry

Categories:


I've been doing some work with the Spatial types in Katmai and these are my first thoughts.

The first thing is that the data type is the same used by SQL as used in your .Net code. So what does that mean,

·         You use the same methods, which is great from a familiarity perspective

·         When calling the methods in TSQL you are calling the CLR methods and so they are case sensitive.

·         Because TSQL doesn't have things like collections and arrays the types don't have them. So if you are using the spatial types in .Net this is why there is no .Points property that returns the points of the geometry. Instead you have to call a function if passing an index to get the relevant Point. There is a function that returns number of points to enable you to loop through the points. You will therefore have code like this,

for (int i = 0; i<geom.STNumPoints();i++)

{

     SqlGeometry Point = geom.STPointN(i+1);

}

·         You can return your value to the client and perform actions upon it orsan the client or do it in TSQL. This is great because you can make a collection of geometries on your client and do spatial stuff with it without even touching a database. Why might you want to do this? Spatial calculations are complex, its easy to calculate the area of a square, but what about an irregular polygon, you can do it but with how many lines of code. With the spatial types just use STArea().

·         There aren’t any nice methods that work with other CLR types. It would be great if the STPointN returned a CLR Point type, but it doesn’t, so you do end up with more code than you may think you should.

So what other pointers are there,

·         All methods are prefixed by ST. This I don’t understand and seems pointless to me. (Ok so most are and are due to conforming to standards/

·         The Point arrays are not 0 based but 1 based indexes. So you have to start at 1.

·         Everything is related, a Multi Polygon is a geometry that contains Polygon geometries, A Polygon consists of Points which themselves are geometries. You may ask why? Well the reason is that all the spatial methods can be applied to all the geometry types, finding distances, areas, unions etc.

·         Creating geometries isn’t the nicest mechanism. You generally create a string representation and then pass that to be parsed.

declare @g geometry;

set @g = geometry::STGeomFromText(‘LINE(1 1,10 10,10 20,20 10)’)

All in all its really easy. Once you get round the types of geometries and what all the methods do its amazing what you can do with so few lines of code, both in TSQL or in your favourite .Net code.

 


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/Katmai Spatial - First thoughts

Categories:


If you are upgrading to Visual Studio 2008 then be careful with SQL Server.

You may or may not know that the Business Intelligence Development Studio (BIDS) is a visual studio shell. If you don't have Visual Studio installed when you install SQL Server then it installs a version of Visual Studio for you.

So how does this impact VS 2008. Well you may think that when you upgrade you can uninstall Visual Studio 2005.

STOP

You can uninstall VS 2005, but make sure you don't uninstall the shell used by SQL for BIDS. How do you know what that is? I'll tell you its "VS 2005 Premier Partner Edition".

If you uninstall this you will not be able to use BIDS, so you won't be able to develop Integration services packages, Reporting Services reports, or Analysis Services Database. Whats more based on my experience you have to jump through a few hoops to get BIDS installed again.

You might think why can't I use VS2008, well the simple answer is that Microsoft haven't developed the project types ( the designers etc that work in VS for SSIS, SSRS and SSAS) for VS2008.

VS2008 will be used for the Katmai version of BIDS so if you have to support a SQL2005 and a SQL 2008 instance you will need to make sure you keep your SQL2005 BIDS lying around. Unfortunately the benefit of VS2008 replacing VS2005 by supporting all .Net versions ( 2.0, 3.0 and 3.5) is not the same for us SQL guys we have to stick with both versions.

 


This is cross posted from my SQLBlogcasts blog which can be found here, http://sqlblogcasts.com/blogs/simons/Installing Visual Studio 2008 and SQL Server 2005

Categories:
SQL Server 2008

Theme design by Nukeation based on Jelle Druyts