Yesterday a friend asked me if it was better to always use a value of GEOMETRYCOLLECTION EMPTY as an alternative to making a geometry/geography column nullable. OGC has its own concept of database NULL, that is [GEOMETRY] EMPTY, where [GEOMETRY] can be any valid geometry subtype (that is 'POINT EMPTY', 'LINESTRING EMPTY', etc). When I asked why it mattered, he said he was tired of putting ..WHERE…IS NOT NULL in every UPDATE of SQL Server spatial data type. Sounds like reason for a blog entry.
NULL is a valid value of any SQL Server data type, defining a variable without initializing it makes it NULL.
DECLARE @x int; — value of @x is NULL
DECLARE @geog Geography — value of @geog is NULL
According to SQL-1999 (and above) a SQL data type can have properties and methods; accessor methods and mutator methods. Calling a accessor method on a NULL variable returns NULL. Calling a mutator method on a NULL variable is an error, in SQL Server the error text is "Mutator … cannot be called on a NULL value". If I have a table of 100000 geography instances and one row contains NULL, attempting to update that column without using WHERE..IS NOT NULL produces this error.
UPDATE dbo.Geotable SET geog.STSrid = 4326; — error if any row contains NULL geog
EMPTY GEOGRAPHY is a different concept then NULL because OGC defines a common architecture for geography, independent of SQL. In fact, the "OpenGIS® Implementation Specification for Geographic information – Simple feature access – Part 1:Common architecture" spec does not even use the word NULL once. EMPTY can be tested with the OGC standard method IsEmpty (STIsEmpty in SQL Server), and is defined as "the empty point set ∅ for the coordinate space". Because it is not database NULL, you can call methods on it without error:
DECLARE @a geometry = 'POINT EMPTY';
SET @a.STSrid = 4326; — works fine
Some other interesting behaviors of [GEOGRAPHY] EMPTY
declare @a geometry, @b geometry;
set @a = 'POINT EMPTY';
set @b = 'LINESTRING EMPTY';
select @a.STEquals(@b); — true
select @a.STDimension(); — -1 (negative one)
select @a.STIsEmpty(); — true, of course
select @a.STIsSimple(); — true
– Disjoint is true if "Intersection of @a and @b is the empty set"
select @a.STDisjoint(@b); — 1
– Both Within and Contains are false
select @a.STWithin(@b); — 0
select @a.STContains(@b); — 0
Some other interesting uses of [GEOGRAPHY] EMPTY
– the boundary of a point is the empty set
declare @x geometry = 'POINT(1 1)';
select @x.STBoundary().ToString(); — GEOMETRYCOLLECTION EMPTY
– but the boundary of a polygon is a closed linestring
declare @y geometry = 'POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))';
select @y.STBoundary().ToString(); — 'LINESTRING (1 1, 2 1, 2 2, 1 2, 1 1)'
– and the boundary of a closed linestring (ring) is GEOMETRYCOLLECTION EMPTY
declare @z geometry = @y.STBoundary();
select @z.STBoundary().ToString(); — GEOMETRYCOLLECTION EMPTY
OGC's SQL Option spec (and SQL Server) differenciates between NULL and [GEOMETRY] EMPTY. For example, in the case above, the boundary of any NULL geometry IS NULL, not EMPTY.
Should you use NULL or EMPTY as a Geometry/Geography default? Depends on the semantics that you want to enforce, but, to me, lack of a value in SQL is NULL. And you can even include a constraint that the column value may not be either or both special values. EMPTY is a good way to ensure not having to use "IS NOT NULL" in every UPDATE statement, but that's as far as it goes.
What do you think?