I've been busy for the last few weeks putting together a "manifesto" whitepaper about the .NET-based data access stacks and also the possibilities for .NET programmers in the SQL Server product itself. The whitepaper's direct link is here, although its also available via both the MSDN Data Developer Center and the SQL Server 2008 Application Development website. I don't see the arbitrary distinction between application developers who use databases and SQL developers that some do, so writing this paper and covering both was a good fit.

This whitepaper is quite a bit more high-level than others that I've written, meant to be an all-embracing introduction to .NET and SQL Server and point out the possibilities for .NET programmer. While writing this I was stunned at how many integration points there are between .NET and SQL Server. Every part of the SQL Server product uses .NET in one form or another. Check it out.

Even though there's a quite a bit of coverage of futures, the platform is so fluid that it doesn't include the announcement on the rename of ADO.NET Data Services to WCF Data Services and also the announcement of the Open Data Protocol (OData). Follow the links to these announcements.

Hope you like the paper, I had fun writing it...Cheers.

When I downloaded SQL Server 2008 R2 CTP2 last week, I went right by (for now) the cool Gemini features and even Multi Server Management and went straight for the new SSRS Map Control. After explaining the SQL Server 2008 spatial data types, the spatial methods, and SRIDs to folks until they glazed  over, and then watch their eyes light up when I showed 'em the Spatial Results tab in SSMS, I grokked this "power of visualization" thing right away. Maybe not the most earthshattering feature in R2, the map control is perhaps the coolest.

Start by dropping a map control on a report in BIDS or ReportBuilder 3.0 and wizarding your way through a simple color analytical map of say, sales by state. And take it from there. If you're not from the US (or "you're so bored with the USA"), get base map data of more relevence to you by importing an ESRI Shapefile. Or using Bing Maps as a base map. You could even use your the geometry of your business' office Geometric layout if you have it in SQL Server Spatial format; set the map property "Coordinate System=Planar" for that type of data. And be sure to have a read through Spatial Ed's "Cartographic Adjustment of Spatial Data for SQL Server Reporting Services", series starting here for some real fun.

I'm not going to say much more for now because my next MSDN "Under The Table" column is about this beauty, as well as some other new cool spatial visualization tools for SQL Server, and I'm wary of double-dipping with the blog. Maybe some more info on advanced ways to use the control later.

So, till then...have a map!

And a CTP is available now.

I first heard of the .NET Complex Event Processing engine at this year's TechEd. Until lately, all we had to go on was Torsten Grabs' TechEd presentation and a whitepaper that gave a fairly high-level overview of its purpose and architecture. And, oh, it was to be released as part of SQL Server 2008 R2. So last week when SQL Server 2008 R2 CTP3 was released, there was also an announcement that a CTP of StreamInsight was available. The CEP engine has a name. And there's code and libraries to work with.

Downloaded it last week and installed it. It has no dependencies on SQL Server 2008 R2; in fact it doesn't even use SQL Server at this point. The metadata is stored in a SQL Server CE database, and the data it's processing as a stream isn't particularly stored anywhere unless your event target stores it. There's an example of consuming data from and storing data in SQL Server, but that's not always the point.

At a quick first glance, it is vaguely reminiscent of SQL Server Notification Services with event sources and event targets, but the similarity ends there. In the Notification Services architecture, events are stored in the database and matched against subscriptions to produce notifications. The focus in SQLNS is on storing events, matching, and formatting notifications. Until the event is stored, it's not visible to the infrastructure, only to the event provider.

In StreamInsight, the focus is on processing and querying the data in the input stream, in real-time. The stream processing architecture is lightweight, and almost all of the use cases presented in the docs name "the ability to handle up to 100000 events per second for a large number of devices" as a goal. Storing events is an option, but the interesting part is being able to query the event stream in-flight and generate output events based on aggregations, outliers, or other characteristics of the stream. In this respect, StreamInsight resembles TIBCO events much more than SQLNS.

There is an input adapter architecture and an output adapter architecture. You can write your own event server, use their server infrastructure or imbed it in an existing application. There's even a model for reporting state changes in the stream (event source) to interested parties in the event sink to implement a pub-sub application. The development model is .NET-based and at the center of it is a LINQ provider over the event stream. You register compiled LINQ queries that execute over to stream and provide selection, projection, joins, aggregation, and some additional operators like timestamp modification. You can also extend the LINQ provider's functionality with your own user-defined functions.

Rather than continue to regurgitate the documentation (I'm doing this here as an exercise to make sure I "get it", feel free to disagree with or correct me), I'll urge you to download the CTP, run the examples, get comfortable with the architecture, and try writing a simple adapter to get a feel for it. There's a team blog and a forum where the team members hang out (I recognize some of the folks' names there) if you have questions or problems.

There's also a nice event flow debugger to give you visibility over how stream flow through the processing engine, queries, and adapters. But beware...the debugger needs a later OS like Vista, Windows7, or Windows Server 2008 to work. It looks like the debugger is using event tracing for Windows (ETW) as its trace vehicle, and because it uses the "newer" ETW (ETW-vnext was known once as Crimson, and introduced in the latest OSes) it needs one of these OSes to function.

So, happy stream processing!...
Cheers 

Categories:
SQL Server 2008

Just got a mail message from an old friend who asked why I'd dropped out of sight in the past month of so. I've been in "extended partial vacation mode" for the last month or so and haven't blogged in a while. But I'm "still alive and well" (Johnny Winter was at the Portland blues festival last week, couldn't help the reference).

Next week I'll be at the ESRI user conference catching up on the latest and greatest in the GIS world. On Tuesday, I'll be doing a fairly short talk for the ESRI SQL Server special interest group on SQL Server 2008 spatial performance troubleshooting and also a longer followup talk (with lots of real-world examples) at the Microsoft booth. If you're around, drop by and say hi.

About a month ago, I received a copy of "Beginning Spatial with SQL Server 2008" by Alastair Aitchison. I've become acquainted with Alastair through his frequent postings on the SQL Server Spatial forum (under the login Tanoshimi), where he's always been very helpful and patient with folks starting out with spatial data concepts and SQL Server practice. Although I'd been working with SQL Server spatial for a while now, I read the book from cover to cover. It doesn't disappoint. The basic spatial concepts are explained in the first few chapters, and he goes on from there to present an encyclopediadic approach to the spatial methods and properties supported in SQL Server 2008.

I especially enjoyed that he showed examples of geocoding, visualization, and services using both Virtual Earth and Google Earth. I also liked the scenario based approach to the spatial functions, for example showing STIntersection using the Appian Way and the Pontine Marshes. Each spatial function described has an example using real-world geographic use case, and this helps in understanding. The end-to-end examples gave you a good understanding about how to start off with spatial using import, query, and visualization.

The only quibbles I had were with the introduction to SQLCLR and object orientation WRT spatial. It was confusing to describe the OGC inheritence hierarchy without contrasting this with SQL Server's actual implementation (two data types geometry and geography each supporting all of the instanciable OGC types, like point, polygon, etc). And because SQL Server's .NET UDTs don't support inheritence, individual classes for each OGC type is not the way they're exposed (or even implemented internally AFAIK). I was also surprised at the assertion that you can't initialize a geometry/geography from a WKT string in T-SQL; this works because conversion from string implicitly calls the Parse method and uses the default SRID for either type. Reported it as errata. Finally, more performance-related information might have been useful, but the book was positioned as a beginning book after all.

Starting off with spatial data is more than just learning the intricacies of a new data type or lists of methods and properties of a SQLCLR-based type; you need to acquire some domain-specific knowledge to make heads or tails of the data type.  And that's where this book is invaluable. I'd recommend it as a way to get up to speed quickly with the SQL Server 2008 spatial types.

My interview with the folks at RunAs Radio is out. I'm discussing one of my favorite topics, spatial data and spatial in SQL Server 2008, with Richard and Greg. This wandered off into some interesting possible usages for this data. Catch it here.
My latest article on out. It's in the May issue of MSDN magazine, and can be found here. It this article I explore the internals of programming with the filestream feature of SQL Server 2008 and some best practices around when and how to use it. Hope you like it.

I've answered quite a few questions lately about how to make sure that your spatial index is being used. Use of a spatial index with a large or complex spatial dataset can mean the difference between a query that takes minutes to execute and subsecond execution. You also want to use an index hint as a last resort, the spatial index should not need a hint. Here's a few things to try, in order of importance.

1. Apply SQL Server 2008 SP1!!! I can't stress this one too much. There was a change to query costing that affected spatial index use.
2. Make the query as uncomplicated as possible. Don't try to combine STIntersects with a call to STBuffer, MakeValid or other nested spatial method calls or subqueries. Use multiple statements if needed.
3. If you're running the code in SSMS, use sp_executesql around the spatial query (or use your own stored procedure with the spatial value as a parameter) to ensure the query coster "knows" the parameter value at the time its creating the query plan, that is, at beginning of the batch or on entry to a stored procedure or sp_executesql.Here's an example:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql
N'select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go

-- so does this
create procedure find_zipcode (@g geometry)
as
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
execute find_zipcode(@latlonPoint)

4. If you're using client code, make sure you use a parameterized query and that it passes the value in the SqlParametersCollection. ADO.NET will change this into a call that uses sp_executesql.

5. Don't depend of passing in a string literal to give the query coster the right info, because the code "creates" the point inline, after the query plan has been created.:

-- use a parameterized query with sp_executesql or stored procedure instead
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects('POINT (45.518066 -122.767464)')=1

6. Check the query plan (actual or estimated plan will work) to ensure the index is being used. The Spatial Index step in the query plan is easy to locate.

7. Make sure you have an appropriate spatial index with your data AND for your query sample. You can check this with the spatial index diagnostic stored procedures. To see how to use these procedures, start with the blog series here.

8. Use a hint as a last resort and see if it makes a difference in the query speed. If using a spatial index hint causes an error "Could not create plan", it may mean that your query is too complex. See step 2.

I hope this helps your query response be as fast as possible.

On experimenting with the CTP of SQL Server 2008 SP1, I found that there were some changes made to the costing algorithm, vis-a-vis spatial indexes. In the RTM version, spatial indexes weren't uses as often as they could be, because the query costing was too high compared to the spatial filters. This made programmers resort to index hints for spatial indexes at the slightest provication. Here's some representative examples:

-- find surrounding zipcodes
-- no indexes used, 5 secs
-- cost pre-SP1: 45.2741
-- index used in SP1, subsecond
-- cost SP1: 14.2199
declare @id varchar(9) = '97225'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from zipcodes a, zipcodes b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

-- cost SP1: 26.3341
-- cost pre-SP1: 219.754
declare @id varchar(9) = '162243'
exec sp_executesql
N'select b.id, b.shape.STAsText()
from census a, census b
where a.id = @id and a.shape.STIntersects(b.shape)=1', N'@id varchar(9)', @id
go

So you should not have to hint spatial indexes as frequently (or at all) in SP1. If you'd had those hints encapsulated in plan guides, you can just turn the plan guides off.

I'm still having trouble with some queries picking the best spatial index out of multiple choices; an "hhhh" index vs a "mmmm" index for example. So you may still have to hint if you have multiple spatial indexes for varied use cases. But for a single use-case/single index, it appears to pick the best of "use spatial index" vs "scan and filter" every time. To determine which is the best spatial index for a specific use-case have a look at the spatial analysis proc series starting a few postings back.

Also, don't forget that in order to give the plan coster the best information, always use stored procedure parameters or parameterized queries. If you're just executing T-SQL batches in SSMS, sp_executesql will do the trick; if you use parameterized queries in any of the database APIs (e.g. ADO.NET), they'll convert it to a call to sp_executesql for you. And don't forget to use the right syntax for the spatial index method, that is, use "a.STIntersects(b) = 1" rather than "1 = a.STIntersects(b)".

I've become a columnist for MSDN magazine and my first column is available in the online edition, Feb 2009 issue today. I'm doing a column every few months called "Under The Table", about database-centric development. The first column is about one of the most exciting (IMHO) new features in SQL Server 2008, spatial data type and indexes, and the spatial library that's part of Microsoft.SqlServer.Types. Hope you like it.

Today I came across the new SQL Server 2008 Compliance Portal. This portal has information and links to the new Compliance whitepaper and compliance scripts (the "sample files" at the bottom of the main page on the compliance portal). New features for ensuring compliance in SQL Server 2008 include Policy-Based Management, Auditing, and TDE, to name just some of the ones that come to mind.

This information fits in well with some of the talks I'll be doing at SQLConnections in Orlando in March. I'm speaking about "Practical SQL Server 2008 Security for Developers and Architects" and "Programming PBM and Data Collection with PowerShell". I'll also be doing a day-long preconference talk on spatial data/location aware apps, and a talk on programming filestreams (unrelated to compliance). Hope to see you there.

Enjoy the compliance portal!

Categories:
SQL Server 2008 | Security

When SQL Server's spatial data types were introduced, people asked "Why does every standard method begin with ST" (e.g. STIsEmpty). SQL Server docs always cite the OGC (Open Geospatial Consortium) specifications, but OGC's methods do not begin with ST. Where did it come from?

As far as I can figure the "ST" before every standard method comes from the SQL Multimedia (SQL/MM): Part 3 Spatial spec and stands for spatial-temporal. But SQL/MM prefixes standard methods with "ST_" not "ST". ESRI and IBM's DB2 spatial extender also use "ST_", and DB2 appears to use the convention of leaving the "ST_" off for non-standard methods. Oracle Spatial uses ST as of 10g (see the comments below), as well as "SDO_" for both standard and non-standard methods (stands for Spatial Data Option). Postgres Spatial uses no prefix, a la OGC.

So what's in a naming standard? Apparently, not much. One nice side-affect is that searching for "STBoundary" (no underscore), for example, gets only references to SQL Server spatial.

Yesterday a friend asked me if it was better to always use a value of GEOMETRYCOLLECTION EMPTY as an alternative to making a geometry/geography column nullable. OGC has its own concept of database NULL, that is [GEOMETRY] EMPTY, where [GEOMETRY] can be any valid geometry subtype (that is 'POINT EMPTY', 'LINESTRING EMPTY', etc). When I asked why it mattered, he said he was tired of putting ..WHERE...IS NOT NULL in every UPDATE of SQL Server spatial data type. Sounds like reason for a blog entry.

NULL is a valid value of any SQL Server data type, defining a variable without initializing it makes it NULL.
DECLARE @x int; -- value of @x is NULL
DECLARE @geog Geography -- value of @geog is NULL

According to SQL-1999 (and above) a SQL data type can have properties and methods; accessor methods and mutator methods. Calling a accessor method on a NULL variable returns NULL. Calling a mutator method on a NULL variable is an error, in SQL Server the error text is "Mutator ... cannot be called on a NULL value". If I have a table of 100000 geography instances and one row contains NULL, attempting to update that column without using WHERE..IS NOT NULL produces this error.

UPDATE dbo.Geotable SET geog.STSrid = 4326; -- error if any row contains NULL geog

EMPTY GEOGRAPHY is a different concept then NULL because OGC defines a common architecture for geography, independent of SQL. In fact, the "OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1:Common architecture" spec does not even use the word NULL once. EMPTY can be tested with the OGC standard method IsEmpty (STIsEmpty in SQL Server), and is defined as "the empty point set ∅ for the coordinate space". Because it is not database NULL, you can call methods on it without error:

DECLARE @a geometry = 'POINT EMPTY';
SET @a.STSrid = 4326; -- works fine

Some other interesting behaviors of [GEOGRAPHY] EMPTY

declare @a geometry, @b geometry;
set @a = 'POINT EMPTY';
set @b = 'LINESTRING EMPTY';

select @a.STEquals(@b);  -- true

select @a.STDimension(); -- -1 (negative one)

select @a.STIsEmpty();  -- true, of course
select @a.STIsSimple(); -- true

-- Disjoint is true if "Intersection of @a and @b is the empty set"
select @a.STDisjoint(@b); -- 1

-- Both Within and Contains are false
select @a.STWithin(@b);  -- 0
select @a.STContains(@b);  -- 0

Some other interesting uses of [GEOGRAPHY] EMPTY

-- the boundary of a point is the empty set
declare @x geometry = 'POINT(1 1)';
select @x.STBoundary().ToString();  -- GEOMETRYCOLLECTION EMPTY

-- but the boundary of a polygon is a closed linestring
declare @y geometry = 'POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))';
select @y.STBoundary().ToString();  -- 'LINESTRING (1 1, 2 1, 2 2, 1 2, 1 1)'

-- and the boundary of a closed linestring (ring) is GEOMETRYCOLLECTION EMPTY
declare @z geometry = @y.STBoundary();
select @z.STBoundary().ToString(); -- GEOMETRYCOLLECTION EMPTY

OGC's SQL Option spec (and SQL Server) differenciates between NULL and [GEOMETRY] EMPTY. For example, in the case above, the boundary of any NULL geometry IS NULL, not EMPTY.

Should you use NULL or EMPTY as a Geometry/Geography default? Depends on the semantics that you want to enforce, but, to me, lack of a value in SQL is NULL. And you can even include a constraint that the column value may not be either or both special values. EMPTY is a good way to ensure not having to use "IS NOT NULL" in every UPDATE statement, but that's as far as it goes.

What do you think?

I've devoted the last few blog entries to describing what's in that spatial index analysis proc output and how to use it and interpret the results. Just wanted to describe the rest of the information and how the information relates to what you'd find in a query plan that uses a spatial index.

The procs return:
1. Information about the index
2. Information about the query sample
3. How efficient the index is when used against the query sample

The information you get about the index is:
-  Bounding Box dimensions - BOL says they are NULL for geography but they are always -180, -90, 180, 90 for geography
-  GridLevelSize - 4 levels
-  Cells_Per_Object - maximum tessellated cells per row, specified in DDL
-  Page and Row counts for the spatial index
-  Avg number of cells per base row (how many of the CellsPerObject were actually used)
-  Height, Width, Area of a cell - Geography is always the same, but its nice to know how big the cells are compared to the other levels. Not sure what the unit of measure is for geography (BOL says "depends on SRID"), by looking at the number it doesn't look like it could be meters.
-  CellAreaToBoundingBox percentage - 4 levels
-  Total SRIDs found

The most interesting is likely the info about the number of cells in each level (including level0 - off the bounding box). Any levels that are 0 rows don't appear in the XML output. You get info for:
-Number of Object Cells for Index
-Number of Interior Cells
-Number of Intersecting Cells
-Number of Border Cells
-also some numbers about cells normalized to the leaf grid

The cells are useful in how they relate to the primary filter. The interior, intersecting, and border cells are used by the internal filter.

For the query sample, you get the same cells numbers. No "cells normalized to the leaf" however. BTW, for the geography index ignore the BOL comment that indicates that if Level0 of the query sample is 1 then index is not useable for this query. For geography its always 1.

This relates to the following information that appears in a query plan that uses spatial indexes.
1. All of the query samples (in the spatial predicate) are tessellated using the same parameters (density/max cells)
2. The table is joined against all parent/children of the index (the index levels are actually a hierarchy)
3. Index ranges are calculated
4. A join of the tesselated samples on index is done. This results in the number of rows selected by the primary filter.
5. The internal and secondary filters are applied in a separate query iterator (of type Filter) later on in the plan.

The upshot of this is that, not only is the index dimensions used to tessellate the base table rows, the same dimensions as the index are used to tessellate the query sample for the spatial predicate (query sample, in the sproc). So, for example, I'd always thought that spatial indexes over points should specify all-HIGH density and maybe 2 max cells per object. But if your query sample contains large polygons, those same dimensions will be used against the query sample too. So maybe all-HIGH density for point-based indexes isn't a hard and fast rule.

Hopefully this should give you enough information to investigate your indexes and queries using the spatial analysis stored procs. As far as patterns for "which specifications of index is best", more patterns will show up as I investigate more different patterns of spatial data. For now, I'm dividing these into three main categories:

1. Spatial index over table of points, Query sample is polygon.
2. Spatial index over table of polygons, Query sample is point.
3. Spatial index over table of polygons, Query sample is polygon.

With the simplifying premise that linestrings are thin like a point (ie they won't fill a cell), but if you put a STBuffer around them, they're more like polygons.

Happy index hunting.

A little more about the query sample that gets fed into the spatial index procs.

Query sample is a singleton geometry or geography. It's not a query. So it's not as straightforward as "here's a spatial query that could use an index, run this query and show me the figures". So this procedure is not going to work any differently for a query that use STIntersects vs STDistance vs... It's going to show you how an index on table A would be utilized given a single operation.

It maps most closely when your query is something like:
select ... from tablea a where a.geog.STIntersects(@g)=1

Is this case @g is your query sample, and Number_Of_Output_Rows is the number of rows the query returns. But what if you have a conditional query or query that has more than one row as output, like

select ...
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

In this case you could specify your query sample like this:
DECLARE @my_query_sample geography = (select geog from tablea a where a.id = @someid)

...And you're looking to an analyze usage on an geography index on tableb. But this will cause an error if "select geog from tablea a where a.id = @someid" returns more than one row. How do I use the sprocs in this case? Immediately you think:
DECLARE @my_query_sample geography;
SELECT @my_query_sample = geog from tablea a where a.id = @someid

or even:
select @my_query_sample = a.geog
from tablea a,
     tableb b
where a.id = @someid and b.geog.STIntersects(a.geog)=1

Let's analyze this way of specifying query sample. If your query sample looks like it's going to represent more than one row, note that the spatial index is always use in (at least I've only seen it used in) a SEEK. So if your query window-producing query represents N rows, you need to choose one row. Hopefully a representative row. This would work if the "top(1) without an order by" row was representative:

DECLARE @my_query_sample geography = (select top(1) geog from tablea a where a.id = @someid)

And realize that you're going to have to do this operation (repsented by the output) N times. Let's make this (hopefully) more clear.

select a.zipcode, b.point
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

There are 430 zipcodes in Oregon. There are 1000 points in your table. You have an index on zipcodes and one on points. Remember that STIntersects is commutative (a.geog.STIntersects(b.geog) = b.geog.STIntersects(a.geog)) which index is "better"? Notice that the query returns 1 column from zipcodes and 1 column from points, so you can't just STUnion all of the Oregon zipcodes together. You'd get the same points that way, but have no idea which zipcode goes with which point. So choices are:

1. Representative query sample for zipcodes in Oregon, analyze point index, realize that you're going to do this "query" 430 times.
DECLARE @my_query_sample geography = (select top(1) geog from zipcodes a
 where a.zipcode IN (select zipcode from zipcodes where state = 'OR') )
2. Representative query sample for point, analyze zipcode index, realize that you're going to do this "query" 1000 times.
DECLARE @my_query_sample geography = (select top(1) geog from points)

Be aware that doing 430 or 1000 seeks against the spatial index might "scare off" the query optimizer from choosing that plan. You could also declare you're query sample like this:

DECLARE @geog GEOGRAPHY
select @geog = a.geog --- or @geog = b.geog
from zipcodes a
     points b
where a.zipcode IN (select zipcode from zipcodes where state = 'OR')
and a.geog.STIntersects(b.geog)

But that's still going to give you one representative combination point or zipcode as a query sample (ie @geog is still a singleton). If all of your points are in Oregon, it's going to produce a similar analytic output, and if all your points are not in Oregon, it might be better to specify it that way. But then you would have to know how many points in Oregon to deduce how many seeks. And you can't check the ouput against "Number_Of_Rows_Returned" because you don't know which row it actually chose.

If you are just needing points, not zipcode-point combinations you could STUnion together all the zipcodes in Oregon and run the query that way. The spatial analysis procs give you a 1-1 mapping against that query. Or use a state table rather than a zipcode table. Of course this is predicated on the fact that zipcodes don't overlap multiple states.

Hope this answers the question, "what do I use as a query sample to troubleshoot query X and index Y?".

Last post discussed the filters. The procs report some raw numbers and some derived numbers. Here's a cheat sheet, although the info is all in the BOL.

N = Number of rows in the table
O = Number of rows output
P = Number of rows selected by primary filter (by the index)
S = Number of rows selected by internal filter (by the index optimizations)

Then,

P-S       = Number_Of_Times_Secondary_Filter_Is_Called (number of times they ran the expensive operation)
(N-P)/N = Percentage_Of_Rows_NotSelected_By_Primary_Filter
S/P       = Percentage_Of_Primary_Filter_Rows_Selected_By_internal_Filter
S/O       = Internal_Filter_Efficiency %
O/P       = Primary_Filter_Efficiency %

<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.942714746408677e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>9.095871716137836e+001
<Internal_Filter_Efficiency>9.593378913278158e+001
<Primary_Filter_Efficiency>9.481405663596043e+001

So how efficient is the index? Let's change this info into statements, using the values from the previous blog entry:

N = 511650, O = 2779, P = 2931, S = 2666

<Number_Of_Times_Secondary_Filter_Is_Called>265</Number_Of_Times_Secondary_Filter_Is_Called>
<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.942714746408677e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>9.095871716137836e+001
<Internal_Filter_Efficiency>9.593378913278158e+001
<Primary_Filter_Efficiency>9.481405663596043e+001

1. They had to run STIntersects 265 times, rather than 511650 times if they didn't use the index.
2. 99.4% of the rows were eliminated by using the index
3. The Internal_Filter (internal optimizations) eliminated 90% of the times they would have had to run Intersects, even if the used the index (primary_filter).
4. The Internal_Filter was able to "deduce" 95.9% of the rows in the answer.
5. The Primary_Filter was able to find 94.8% of the rows in the answer.

Pretty good index, huh? And pretty good internal filter. Probably is going to be used.

Let's look at one more, but this time from using a geoemtry index. The index is over in geometry column that contains only points. The query sample is a polygon.

<Base_Table_Rows>11267
<Number_Of_Rows_Selected_By_Primary_Filter>265
<Number_Of_Rows_Selected_By_Internal_Filter>149
<Number_Of_Times_Secondary_Filter_Is_Called>116
<Number_Of_Rows_Output>413
<Percentage_Of_Rows_NotSelected_By_Primary_Filter>9.764799857992367e+001
<Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter>5.622641509433962e+001
<Internal_Filter_Efficiency>3.607748184019371e+001
<Primary_Filter_Efficiency>1.558490566037736e+002

Note that in this case, the internal filter only elimates 56.2% of the rows that the primary filter finds. It only finds 36% of the output rows. But note that in this case, primary filter efficiency = 155%. How can that be? Well, this is geometry and the primary filter didn't find all the rows in the answer. There must be some outside the bounding box. Remember that Primary_Filter_Efficiency is defined as Output_Rows / Primary_Filter rows.

More coming...

The spatial index diagnostic procs' most enlightening pieces of information have to do with the filter counts and efficiencies reported at the end of the set of columns or XML document output. But what are these filters exactly and what do the results mean?

The proc output refers to three filters:
1. Primary Filter
2. Internal Filter
3. Secondary Filter

Remember that the spatial indexes are based on tessellation or tiling. Issac has some nice diagrams that show how the tiling works in theory. Please review his indexing posts before reading these and when/if you get confused by the terminology. Depending on the size and type of the spatial object, the object may completely cover a tile (think big polygon), partially cover (intersect) a tile (think point) or not cover a tile. It can also hit the border of a tile. The tiles are referred to in the spatial procs as cells.

Remember also that the point of a spatial index is to reduce the number of Intersect operations (or other operations covered by the spatial index) actually performed by using tiling. I liked Isaac's analogy that "if I want to find roads that intersect Madison WI, I only have to look at roads that intersect Wisconsin". So the index (primary filter) selects all roads that intersect Wisconsin. It can get false positives (roads that intersect Wisconsin but not Madison). The "Secondary Filter" is this case is the Intersects operation itself.

So what's the "internal filter"?  If you think of the primary filter as filtering out 100 polygons out of 10000, they'd still have to run Intersects on the 100 polygons to actually see if they qualify. But say that the index and query sample both *completely* cover a cell. Or index completely covers the cell and query sample touches it. We know that the cell qualifies for Intersect in that case, we don't need to run Intersect on the query sample for that combination. That's the Internal Filter.

So let's look at some sample output. In this case the index was over a geography column that contains only points. The query sample is a polygon. 

<Base_Table_Rows>511650</Base_Table_Rows>
<Number_Of_Rows_Selected_By_Primary_Filter>2931</Number_Of_Rows_Selected_By_Primary_Filter>
<Number_Of_Rows_Selected_By_Internal_Filter>2666</Number_Of_Rows_Selected_By_Internal_Filter>
<Number_Of_Times_Secondary_Filter_Is_Called>265</Number_Of_Times_Secondary_Filter_Is_Called>
<Number_Of_Rows_Output>2779</Number_Of_Rows_Output>

So: 511650 rows in the table. Using the index (primary filter) eliminates all but 2931 of them. The internal filter determines that 2666 are true hits, not false positive. They have to run Intersect 265 times (2931 - 2666) times. And the number of rows returned is 2779, so there were some false positives. Good thing they ran Intersect. BTW. Using the method Filter() instead of Intersect() just means they're returning all the rows selected by primary filter (2931).

Finally, remember if you're using a geography index, there is no bounding box, so no part of the rows or query sample can be outside the index. With geometry, some of the rows or query sample can be outside the bounding box. This means that with geography, Rows_Selected_By_Primary_Filter >= Number_Of_Rows_Output. With geometry, Number_Of_Rows_Output may be > Rows_Selected_By_Primary_Filter as well.

I've been looking at the spatial index stored procedures (sp_help_spatial_geography_index_xml and friends) a little harder recently, in an attempt to help answer two questions.

1. If I have a spatial query, why does/doesn't it use my spatial index?
2. If I have a specific spatial query in mind, what are the best values to choose for my spatial index density?

In addition to the diagnostic information presented in the procs, there are a few things to keep in mind when attempting to answer the first question. Currently, the query optimizer is estimating "high" when it comes to using the spatial index, so there are occasions were the spatial index should be used and isn't. Isaac mentioned this in his PDC talk, and mentioned that this would be changed in the next refreshed. As far as I could see, it isn't in SQL Server 2008 CU2.

Also, remember that SQL Server will never use cardinality estimates for a parameter set in a variable by the SET statement unless you use OPTION(RECOMPILE) on the statement. This is a general SQL Server behavior, not a spatial behavior because of when SQL Server creates the query plan. So if you want cardinality estimates for the variable @g in the query "select geog from table where geog.Intersects(@g)=1" you must either pass @g in as a stored procedure parameter or use sp_executesql. Realize that if you use a parameterized query in a client app, the client API stacks will change the parameterized query to invoke sp_executesql, so you're OK here. According to Michael Rys' SQLPASS talk, one spatial-specific behavior is that cardinality won't be used if @g if specified as a constant, that is "select...where geog.Intersects('POINT(1 1)')=1 " either. Finally SQL Server doesn't change the query plan in a parameter value changes, general behavior, not spatial-specific.

Back to the diagnostic stored procs.

The information that the spatial index procs provide consists of three main categories:
1. Information about the index itself
2. Information about the query sample
3. How efficient the query index is when used against the query sample

The query sample is specified as an input parameter to the diagnostic procs, and would be "@g" in the Intersects queries above.

If you've read my previous blog post on the diagnostic procs, read it again, I've revised it. Filters are the subject of the next post.

I've been doing talks and demos on SQL Server 2008 Extended Events for a while now, it's one of my favorite parts of the product. Per session waitstats, SQL stack, built-in system health session, what's not to like?

Before my talk at TechEd EMEA for IT Professionals last week (which went really well), I discovered something that was very helpful in getting things to run more smoothly. Since the betas, I've always used multi-part names, not only for events, actions, and providers, but for predicates too. To be clearer. Lately, I'd been running into a weird problem. Consider the following event session:

create event session errorsession on server
add event sqlserver.error_reported
(
action
(   
sqlserver.session_id, 
sqlserver.sql_text    
)
where sqlserver.error_reported.error = 547
and package0.counter <= 3 
)
add target package0.ring_buffer
go

About one third of the time, running this DDL would produce:

Msg 25706, Level 16, State 8, Line 1
The event attribute or predicate source, "sqlserver.error_reported.error", could not be found.

Then, I'd move the DDL code to another window, execute it again, and it would work. Sometimes starting up the session:

alter event session errorsession on server state=start

would produce the error. And, of course that attribute DID exist, when I got past the weird error, it resultiing session worked like a charm. You can see the attribute/field in sys.dm_xe_object_columns. Hmmmm, scratches head...

After looking at the definition of the system_health session, I decided to try a one-part attribute name. Instead of:

...where sqlserver.error_reported.error = 547 and...
-- how about
...where error = 547 and...

Bingo! Now CREATE and ALTER SESSION work first time and every time. Don't know why this should happen, but at least there's a syntax correction that works. For the folks at the talk, all my event session code will be up on the SQLskills website shortly, under the "Past Conferences" section. With ONE-PART attribute names in predicates.

BTW, a few other things to remember to save you some head-scratching.

When using the ETW target, your SQL Server service account (which DOES have least-privileges...right?) needs to be a member of the Performance Monitor Users and Performance Log Users Windows groups.

When using a file-based target, the service account has to have permissions on the directory where you're writing the file. This sounds obvious, but folks forget that the SQL Server service account isn't all-powerful. Or at least *shouldn't be*, if you're running SQL Server as local system or administrator, FIX IT, using SQL Server Configuration Manager (not Control Panel/Services).

Use an asynchronous target with default event latency rather than a synchronous target, for less overhead, better throughput. However the ETW target is synchronous only.

All of the events, targets, etc are sharable, mix-n-match, across all packages WITH ONE EXCEPTION. The items in the SecAudit package are private. Folks always want to use SecAudit.asynchronous_security_audit_event_log_target, ... you can't. It's for the new Auditing feature only.

All that said, Happy extended eventing!

I remember hearing during the SQL Server 2008 beta that there would be diagnostic stored procedures that would produce information about spatial indexes and help in troubleshooting why an index was not being used. In addition, these procedures would help in determining the best spatial index. A few weeks ago I had the good fortune to run across these procedures.

sp_help_spatial_geometry_index and sp_help_spatial_geography_index put out information as columns
sp_help_spatial_geometry_index_xml and sp_help_spatial_geography_index_xml put out XML format

Each procedure takes a table name, index name, and sample query. There's an option for verbose or basic properties. The parameters and output are doc'd in BOL. Rather than repeat the doc, I'd like to walk through a problem and show how you'd interpret the output.

Say that you are trying to decide between an index with high and medium density to run a query similar to the following:

Declare @geometry geometry =
 Geometry::STGeomFromText('POLYGON((500000 300000,500000 340000,540000 340000,540000 300000,500000 300000))', 0)
exec sp_executesql N'select * from spatial_table where LOCATION.STIntersects(@g) = 1', N'@g geometry', @geometry

Create both indexes and run the procedure against each one, using the sample query. There is a lot of very detailed information about the index itself because I was running in verbose mode, including information about the number of cells at each level of tesselation. Before looking at this information, it would be good to review Isaac's excellent blog post on how tesselation works. The most useful info though, was contained in the "Number of rows selected by ... filter" properties and filter efficiency properties. In my case there they were

High density index:
Number_Of_Rows_Selected_By_Primary_Filter:  2931
Number_Of_Rows_Selected_By_Internal_Filter: 2399
Number_Of_Times_Secondary_Filter_Is_Called:  532

Medium density index:

Number_Of_Rows_Selected_By_Primary_Filter:  2931
Number_Of_Rows_Selected_By_Internal_Filter: 2666
Number_Of_Times_Secondary_Filter_Is_Called:  265

Note that "primary filter" in this case is the spatial index in question, the goal is to have the rows selected by the primary or internal filter rather than run the secondary filter (the actual Intersects operation). Note that there can be false positives, the actual number of rows returned was not 2931 but 2779.

The best way to judge the effectiveness of the index, however, is with the property Internal_Filter_Efficiency or Primary_Filter_Efficiency.  Internal_Filter_Efficiency, for example, showed the high density index at 86% and the medium denisty index at 96%. Clearly, the medium density index is a better choice for this query. In addition, looking at the statistics on the index itself shows that the medium density index is a smaller index with less rows and index row per base row. The figures were:

High density:
Total_Primary_Index_Rows: 5273534 rows
Average_Number_Of_Index_Rows_Per_Base_Row : 10

Medium density:
Total_Primary_Index_Rows: 1965865 rows
Average_Number_Of_Index_Rows_Per_Base_Row : 3

There is also a property, Total_Number_Of_ObjectCells_In_Level0_For_QuerySample, that will tell you if the index *can* be used. A non-zero value indicates this and properties that will assist in determining whether to adjust the bounding box on a geometry index.

Hope this will give you a good head start in working with these excellent informational and diagnostic stored procedures.

 

I'm not usually one to post product announcements, but I had to point out that the Service Broker External Activator shipped with the latest (Oct 2008) version of SQL Server 2008 feature pack. It's available here.

Doing a quick "visual diff" with the August 2008 feature pack, I also notice some new PowerShell extensions, SQLXML 4.0 SP1, and a number of items for Reporting Services (including, of course, Report Builder 2.0), Analysis Services (including ADOMD.NET and some interesting datamining report viewer controls) and the SAP BI connector.

Lots of new items.

It's almost November and I'm in the final phase of getting ready for a road trip. In the next three weeks I'll be speaking on SQL Server 2008 at:

TechEd EMEA ITPro - talks on Extended Events and on T-SQL performance improvements
TechEd EMEA Developers - talks on T-SQL perf, SQLCLR, Filestream programming, Sparse Columns, and SQL Server/PowerShell
SQLPASS - a preconference day of all SQL Server 2008 developer features and a spotlight session on programming SQL Server spatial data types

If you're at any of those places, please stop by and say hi. I'll also be hanging out at the ask the experts booth some of that time. Looking forward to seeing you there.

Categories:
SQL Server 2008

In the past year or so, a few installations had begun to experience problems with the SQL Server 2005 security cache (aka TokenAndPermUserStore) growing too large over time. Some manifestations are connection and query timeouts and queries that take a long time.

The folks at PSS published the canonical blog entry about this problem, including knowledge base articles and suggestions for SQL Server 2005, and also mention of the name of the SQL Server 2008 parameters to configure the size of this cache at the end of the article.

In SQL Server 2008 the size of TokenAndPermUserStore is configurable as mentioned in the SQL Server 2008 Books Online. I happened across it yesterday trying in researching a user inquiry. The configuration options, access check cache quota and access check cache bucket count, are mentioned in the books online without much fanfare or much description. But a knowledge base article, (KB955644) published the day that SQL Server 2008 shipped, describes the defaults and suggest some guidelines for configuring these parameters. Excellent.

A friend of mine was asking about the affect of multi-targeting in Visual Studio 2008 on SQLCLR. For an explanation of how multi-targeting works, reference David Kean's blog entry on Visual Studio 2008 multi-targeting and FXCop.

I happened to have an instance of SQL Server 2005 RTM (which uses .NET framework version 2.0.50727.42) on hand to try. I took two assemblies as a test. Just for fun, one assembly contained a user-defined function that works on SQL Server 2005, but not on SQL Server 2008 (reference my blog posting on 'Semantic (possibly breaking) change in SQLCLR TVFs') but contains NO new 3.5-specific functionality. It simply reads the event log and returns a table in a SQLCLR TVF. One assembly used System.DateTimeOffset (a type in 2.0.50727.3053 but not in 2.0.50727.42) internally (ie, not as an input or output parameter, but inside of a method I intended to try as a SQLCLR UDF).

I compiled both assemblies on a system with Visual Studio 2008 (.NET 2.0.50727.3053) targeting the .NET Framework 3.5. As an aside, when I tried targeting .NET 2.0, running code analysis gave a warning about DateTimeOffset. Move both 3.5-targeted assemblies to my SQL Server 2005 RTM system.

The assembly with no 3.5-specific functionality cataloged and ran fine in SQL Server 2005 RTM. Attempting to catalog the assembly that included DateTimeOffset failed CREATE ASSEMBLY with the error message:

Msg 6218, Level 16, State 2, Line 1
CREATE ASSEMBLY for assembly 'UseDateTimeOffset' failed because assembly 'UseDateTimeOffset' failed verification. Check if the referenced assemblies are up-to-date and trusted (for external_access or unsafe) to execute in the database. CLR Verifier error messages if any will follow this message [ : UserDefinedFunctions::GetTimeSpan][mdToken=0x6000001] Type load failed.

where UseDateTimeOffset.dll is my assembly and UserDefinedFunctions.GetTimeSpan is the public static method that would have used it.

Hope that clarifies things. You can use 3.5-targeted assemblies on SQL Server 2005 as long as they don't use functionality (the type System.DateTimeOffset is in mscorlib.dll) that don't exist in the SQL Server 2005 machines's version of .NET.

BTW, you CAN use any of the new .NET functionality in SQL Server 2005 as long as you install .NET 3.5 on the SQL Server machine. Obviously that doesn't mean you can use DateTimeOffset as a stored procedure/UDF parameter, but you can use it inside a stored procedure called by SQLCLR. Just like you can use Array in your implementation, but not expose it to T-SQL.  I mentioned one cavaet to installing .NET 3.5 on a SQL Server 2005 machine in a previous blog posting earlier this year. Also, SQL Server 2005 SP2 won't recognize System.Core.dll as a "safe" assembly (as it is in SQL Server 2008) but that's to be expected.

Categories:
SQL Server 2008 | SQLCLR

Although being able to use a GUI tool like SQL Server Management Studio is a nice feature by itself, it's unusual that fixing something in a GUI makes a feature (that hasn't changed) more useable. The "feature" is defaults on SQLCLR procedure parameters.

You can't have defaults on parameters in .NET, but you can specify them in the CREATE PROCEDURE DDL, for example:

CREATE PROCEDURE addwithdefaults (@x int, @z int out, @y int = 5)
AS EXTERNAL NAME sampleasm.StoredProcedures.AddWithDefaults

These work just like defaults in T-SQL procedures; if you don't specify the parameter, the SQLCLR code receives the default value when it is called. Dandy...it's always worked this way.

But, in SQL Server 2005 SSMS (and SMO) the SQLCLR procedure was always displayed as "No Default". Even though there's perfectly good information in the SQL Server metadata about the default. Some folks were convinced that, even though default parameters worked as they should, it was dangerous to use them. Say a DBA was to move the procedure from test to production by scripting the CREATE PROCEDURE statement from SSMS (or SMO). They'd lose the default and when the procedure was defined with the generated script, code that worked in test (depended on the default value) would break in production.

I just noticed in SQL Server 2008 SSMS that defaults ARE displayed and procedures ARE scripted correctly. Great. Let a thousand SQLCLR procedures with default parameters bloom...I guess.

Categories:
SQL Server 2008 | SQLCLR | SMO

Extended Events is one of my favorite SQL Server 2008 features and I'll be speaking about how to use them for problem diagnosis, at the Portland (Oregon) SQL Server User's Group's September meeting. Check out http://www.pdxvbug.com/pdxuser.asp for details.

Meeting is 6:30 on Thursday Sept 25. See you there.

Categories:
SQL Server 2008

After reading Isaac's recent blog posting about spatial queries and index hinting, I thought I might add some information based on a query I was working with this week. Sometimes if you have a query that's too syntactically complex, hinting won't work. In these cases you can cause the index to be used (or at least hintable) by breaking up the query. Warning. Protracted code example follows. 

If the spatial query is somewhat complex, it's useful to write it out step by step, so we start with the query written out in steps. We're using three tables, a county table, highway table, and POI (points of interest) table. Without pondering the necessity of clipping to the county boundary twice, here's the original logic. 

--Find all banks within 2km of I66 as it runs through Fairfax County
DECLARE @g geography
DECLARE @h geography
DECLARE @i geography
DECLARE @j geography
DECLARE @k geography
SELECT @g = geog from va where fips = 51059 -- Fairfax County
SELECT @h = geog from us_hwys where route_num = 'I66' -- I66 Hwy
SELECT @i = @h.STIntersection(@g)-- I66 within county
SELECT @j = @i.STBuffer(2000)-- buffer around I66
SELECT @k = @j.STIntersection(@g)-- clip buffer to county boundary
SELECT geog from POI
  WHERE geog.STIntersects(@k)=1 and Description = 'bank'

We thought that it might be better for the overall query performance to write this as a single SQL query, so, doing simple substitution, one subquery at a time, we come up with these two equivalent query sets at the end of the process:  

-- A. Almost there... 2-query process
DECLARE @k geography
select @k =
((SELECT geog from us_hwys where route_num = 'I66').STIntersection(
  (SELECT geog from va where fips = 51059)
)).STBuffer(2000).STIntersection(
  (SELECT geog from va where fips = 51059)
)
-- uses the spatial index without a hint, total elapsed time for both steps 2 seconds
SELECT geog from POI -- with (index(spatial_idx_2))
where geog.STIntersects(@k)=1 and Description = 'bank'
go

-- B. Done... 1 query
-- But, doesn't use the spatial index
-- Total elapsed time, 48 minutes 53 seconds!
SELECT geog from POI  -- with (index(spatial_idx_2))  hint doesn't work
where geog.STIntersects(
  ((SELECT geog from us_hwys where route_num = 'I66').STIntersection(
    (SELECT geog from va where fips = 51059)
  )).STBuffer(2000).STIntersection(
    (SELECT geog from va where fips = 51059)
  )
)=1 and Description = 'bank'

The resulting query is too complex for the query processor to even "think of" a plan that uses the spatial index. Even hinting doesn't work. The simpler STIntersects query using the 2-query process can be hinted, but we don't need the hint. The query processor is smart enough to use the spatial index with the simplified version. Oh...

So the moral of the story is not only to be on the lookout for index-hinting opportunities, but that sometimes, if the query is too complex the query processor won't take the hint. The error in this case was:

Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because of the hints defined in this query.
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

In cases like this, it may be necessary to break the query into pieces in order to apply the hint. In this case, it happens with an STIntersects() that used four subqueries, the exact "complexity point" may vary with the query itself.

Of course, in this case, because of the repeating "clipping" steps (clip to a particular county), we can rewrite the query to use common table expressions. This not only makes it easier to read but, by removing the redundant subquery, the query processor decides to use the index with the hint.

WITH GetCounty as (SELECT geog from va where fips = N'51059'),
     GetHighway as(SELECT geog from us_hwys where route_num = 'I66')
SELECT POI.geog from POI with (index(spatial_idx_2)),  GetCounty C,  GetHighway H
WHERE POI.geog.STIntersects(
  H.geog.STIntersection(C.geog).STBuffer(2000).STIntersection(C.geog)
)=1 and Description = 'bank'

So watch out for introduction of excess complexity. Hope this was helpful.

When SQL Server 2008 shipped, Visual Studio 2008 SP1 and .NET 3.5 SP1 shipped a few days later. Visual Studio SP1 contains some neat enhancements that allow you to use SQL Server 2008 databases in Server Explorer and the related designers. There's support for SQL Server change tracking in the Sync Services designer. The EDM and LINQ to SQL designers know about DateTimeOffset (datetimeoffset) and TimeSpan (time) in SQL Server 2008. Server Explorer also knows about geometry, geography, and hierarchyid. Very nice.

I often use the SQLCLR projects (that's Database Project under your programming language) that provide templates, auto-deployment of assemblies and CLR-based database objects, and an "Add Reference" dialog that allows adding references only to system libraries that are classified as "safe to use" (i.e. have been tested with SQL Server) and assemblies that already exist in your auto-deploy target database. Makes the base functionality more accessible to programmers. When I wrote samples of SQL Server 2008 functionality I didn't use those projects, because they didn't yet support some of the new functionality. So I thought I'd go back in and check.

Another reason I checked is because I sometimes get feedback from programmers on newsgroups/forums that, if its not exposed in Visual Studio (or in SSMS Object Explorer to give another example) the base functionality must not actually exist. You CAN program SQLCLR objects that use ALL of these features...just use an ordinary Class Library project and write your DDL by hand.

First thing I noticed is that there is no special SQLCLR project for C++. In VS2005 it was called "SQL Server Project" but in VS2008 it's gone. That leaves us with projects for C# and VB.NET. But not a lot of people coded SQLCLR in managed C++.

Next, I tried out my SQL Server 2008-specific features. Here's a rundown.
  The new supported libraries System.Core and System.Xml.Linq don't appear in Add References dialog.
  Autodeploy doesn't recognize the system .NET-based data types SqlHierarchyId, SqlGeometry, SqlGeography.
  Autodeploy knows about the mapping of DateTimeOffset in SQL Server 2008, but doesn't know about TimeSpan to SQL Server's time data type.
  Can't autodeploy a multi-input user-defined aggregate.
  Can't autodeploy a large user-defined aggregate (in fact it doesn't seem to like any UDAgg with Format.UserDefined).
  Autodeploy always maps the .NET type DateTime to SQL Server's datetime. But in 2008 DateTime can map to SQL Server's datetime, datetime2, or date. Perhaps an enhancement to SqlFacet.
  I can't specify "order by" for an ordered TVF during autodeploy.
  Using a nullable type as a parameter fails in autodeploy.

Bear in mind that these SQL Server 2008 features do work as advertised if you use Class Library projects and manual deployment (ie, roll your own DDL).

I did file a bug on Connect for these. Maybe it should have been an enhancement request instead, but I really thought SQL Server 2008 support would include these and they just weren't in the beta yet.

Categories:
SQL Server 2008 | SQLCLR

One of my favorite new features of SQL Server 2008 is extended events. I've written a bunch of blog entries on 'em (use the search, type in Extended Events). So a few days ago, I recieved an email from Jonathan Kehayias directing me to his new program on Codeplex, the SQL 2008 Extended Events Manager, asking for my opinion and suggestions.

Well, my opinion is "I like it a lot". And one of my first suggestions was a starter help file, because those of us who are sometimes "GUI challanged" might miss features upon first glance. And every new dialog I discover enforces my appreciation for the program's usefulness.

So here's a short starter walkthrough.

1. When you bring up the program, an empty window appears. Choose File-New Connection from the menu to get a connection to an instance. You can change connection but you can only have one connection open at a time.
2. A treeview appears in the lefthand pane of the main window. It shows information about your current event sessions. An event session consists of one or more events. Events contain event fields, actions, and can contain predicates. Each event session has a target with options related to the target specified.
3. Each event session has a context menu. You can Edit or Drop the event session, Script the event session for CREATE or DROP, and Stop and Start the event session.
4. Choosing New Event Session from the context menu on the (top-level) instance node of the tree or choosing Edit Event Session on an existing event session brings you to the Session Editor dialog.
  a. For a new session, you need to enter the session name.
  b. Clicking the hyperlink for Add Event brings you to the Event Editor dialog. Here you can choose events, actions, and predicates with the help of "search terms" that help you locate the event you want. There's even a Predicate Editor.
  c. Clicking the hyperlink for Add Target brings you to the Target Editor.
  d. Saving an event session in the Session Editor creates it immediately or you can script the event session.
5. The Extended Events Metadata Viewer is available from the content menu of the (top-level) instance node as well. This dialog lets you browse graphically through the Extended Event metadata.

BTW, the program consists of two pieces, the GUI program and the ExtendedEventsManager library. The library is meant to be as SMO-like as possible (there currently are no SMO classes for Extended Events). This means that you could even load the library into...let's say PowerShell..and use it there also.

Post enhancement requests, bug reports, etc to the Codeplex project page.

Folks have always had trouble with the fact that ring orientation is required with spatial instances if you're using SQL Server 2008's geography data type, but not with the geometry data type. For an explanation of the need for this, reference Isaac Kunen's blog entry here and Ed Katibah's blog entry (link in next paragraph).

In Ed's blog entry, he provides a neat way to fix spatial instances that have the wrong ring orientation for geography by using the geometry type and calling a method that forces the required ring orientation for geography. You should read his entry from yesterday for background.

I thought this was such a cool mechanism that I coded up a SQLCLR UDF that encapsulates this behavior. It accepts almost any WKT and produces a "proper" geography type, regardless of the ring orientation in the WKT. The code is included with this post. The function is called GeographyFromAnyWKT. Well, almost any. There are certain edge conditions (such as a ring that actually would exceed a single hemisphere) that will produce and error, but at least you shouldn't have to worry about ring orientation. Thanks Ed!

As an aside, although this is a Visual Studio SQLCLR autodeploy database project, the function can't be autodeployed because it returns a SqlGeograhy type and this is not covered by the autodeployer code. So I've included a deploy script and a couple of test cases in the project. Although I define the UDF as RETURNS NULL ON NULL INPUT, I've also included (redundant) null checking in the function itself, just in case you want to change the function a bit. Enjoy!

GeographyValidator.zip (96.73 KB)

I came across the following interesting behavior while testing a SQLCLR table-valued function that did work in SQL Server 2005 SP2 but doesn't work at all in SQL Server 2008. It appears to be by design, because the error message (in 2008) clearly indicates what's wrong. But the code worked in SQL Server 2005 and if you depend on this code behavior, it's a breaking change. And I haven't seen this in a readme file or BOL "What's New" section.

First, some background. .NET procedures are allowed to do any kind of "data access" including getting the Windows Identity, accessing the local database instanace, etc. .NET user-defined functions, however, are not permitted data access unless they are marked with a SqlFunction attribute specifying DataAccess=DataAccessKind.Read. Accessing certain session information requires SystemDataAccess=SystemDataAccessKind.Read as well.

.NET table-valued functions consist of a UDF function method and also a FillRowMethod. The UDF function method must return an instance of a .NET type that implements IEnumerable or IEnumerator. This can be a class that you provide or one of the build-in .NET types such as System.Array. SQL Server will call its Enumerator and call back to FillRowMethod once for every time the enumerator returns true.

In SQL Server 2005, you can do "data access" in the UDF method, the FillRowMethod, or any of the other methods in the class (like the enumerator's MoveNext method). Only the UDF method must be marked DataAccess=DataAccessKind.Read and only the UDF method CAN be marked with the SqlFunction attribute and produce the desired effect.

In SQL Server 2008, attempting to do data access in the FillRowMethod now throws an exception. Perhaps the behavior change was required to implement a new SQLCLR feature, ordered TVFs, but I'm only guessing that ordered TVFs are the reason. Perhaps it was never intended to work. The error message in 2008 is pretty clear:

"System.InvalidOperationException: Data access is not allowed in this context.  Either the context is a function or method not marked with DataAccessKind.Read or SystemDataAccessKind.Read, is a callback to obtain data from FillRow method of a Table Valued Function, or is a UDT validation method."

There's no workaround for this if you choose a table-valued function, except to do all your data access in the TVF method only. But SQLCLR provides another way to way to stream a rowset of data you synthesize yourself, using SqlMetaData, SqlDataRecord, and SqlPipe methods in a SQLCLR stored procedure. You can get almost the same result (streamed rowset) in such a stored procedure and "data access" is always allowed in SQLCLR stored procedure code.

Categories:
SQL Server 2008 | SQLCLR

For attendees of last Thursday's talk on SQL Server Spatial for the masses, the demo code is located here. Thanks for coming and for your participation!

Categories:
SQL Server 2008

.NET (and therefore SQLCLR) divides up running its code (even within the same process like the sqlserver.exe process) into appdomains. The appdomain is like a lightweight process used to enforce isolation between running .NET code within the same Windows process. SQLCLR (.NET code running in SQL Server) uses appdomains to isolate execution of .NET code on a per database and per assembly owner basis.

SQLCLR uses appdomains for two reasons: for running .NET user code like functions and procedures, and for running DDL to create and alter assemblies. You can see appdomains being created and destroyed in the SQL Server log, as well as query the current appdomains by using the sys.dm_clr_appdomains dynamic management view.

In SQL Server 2005, the SQL Server log only showed user code-running appdomains, the messages look like this:

AppDomain 4 (testdb.dbo[runtime].3) created.
AppDomain 4 (testdb.dbo[runtime].3) is marked for unload due to common language runtime (CLR) or security data definition language (DDL) operations.
AppDomain 4 (testdb.dbo[runtime].3) unloaded.

Is these messages there's a appdomain being created for running code in the testdb database for assemblies owned by dbo. It's a runtime appdomain. The last two messages show the appdomain (some time later) being unloaded.

In SQL Server 2005 it was difficult to "see" DDL appdomains, because their presence was not logged in the SQL Server log. You could deduce they existed by noting that the AppDomain numbers (e.g. AppDomain 4 above) seemed to contain skips in the number range. Or if you were extremely lucky, by querying sys.dm_clr_appdomains at *exactly* the right time.  In SQL Server 2008, it is. You don't see them being created, but I just executed some DDL to create a SQLCLR assembly and SQL Server log reports:

AppDomain 5 (testdb.dbo[ddl].4) unloaded.

This should make it a little easier to diagnose appdomain-related problems or see when assemblies have been created or altered.

Categories:
SQL Server 2008 | SQLCLR

I've been able to coordinate being home and the Portland (Oregon) SQL Server user group's meeting schedule, so next week I'll be speaking "at home" for a change. I'm giving a talk on one of my favorite features for developers in SQL Server 2008, Spatial Data support. Because it's relatively common these days to see GPS not only in cars but also in "carry along" devices, cell phones, cameras, and other hi-tech toys I don't personally possess, I called this talk "SQL Server spatial data for the masses". If you're at all interested in SQL Server or storing, processing, and visualizing spatial data and databases, I'll see you there.

WHEN:
Thursday, July 24th - 6:30 PM

WHERE:
SQLSoft+
1500 NW Bethany Blvd.
Suite 285
Beaverton, Oregon OR 97007

You may want to RVSP to Ken Starnes at kstarnes@kdsa.com so he can save you a seat.

Categories:
SQL Server 2008

I've noticed that some folks have written PowerShell scripts that execute against a list of servers. In the scripts, they read the names of the servers from XML files. But the SQL Server PowerShell provider in RC0 has a useful "component" called SQLRegistration; its "path" is SQLSERVER:\SQLRegistration that they can use instead.

This path permits enumeration and manipulation of the (SQL) servers and server groups that are defined using SQL Server Management Studio. The SQLRegistration path is not specific to server or instance, but to the SSMS user that's signed on. For example, on the same physical machine, SQLRegistration for the Windows user bobb reflects bobb's settings in SSMS. SQLRegistration for user mary (on the same machine) would reflect mary's SSMS settings. There is no SQLSERVER:\SQLRegistration\{machinename}\{instancename} path, just SQLSERVER:\SQLRegistration.

SQLRegistration contains two "subdirectories", called "Central Management Server Group" and "Database Engine Server Group". These are equivalent to SSMS' "Central Management Servers" and "Local Server Groups" folders, respectively.

So rather than write custom XML files that hold information about groups of servers you can use the built-in Registration store. You can import and export registration information to keep team members' view of SQL Servers and groups consistant using SSMS.

Because I've been writing PowerShell scripts against the provider's Policy and DataCollection stores, I thought it would be fun to program SQLRegistration. I found the Microsoft.SqlServer.Management.Registration namespace in BOL (which is RC0 is really called Microsoft.SqlServer.Management.Smo.Registration, BOL says this will change before RTM) and was about to code against the RegisteredServer and ServerGroup classes, when I realized that Registration was a simple hierarchies of servers and groups. Just like directories and files. And the built-in groups each have a "mode" property that's value is "d". Registered Servers have a blank mode property. Hmmm...

So, at the PowerShell prompt
> cd 'SQLSERVER:\sqlregistration\Database Engine Server Group'
> new-item MyNewGroup

creates a new directory (ServerGroup) named MyNewGroup. Well then, a registered server should be a file.

> new-item MyNewServer -itemtype file

New-Item : SQL Server PowerShell provider error: This provider only supports the
creation of new Server Groups or Server Registrations. Please specify "directory"
to create a new Server Group or "registration" to create a new Server Registration.

So this should work?

> new-item MyNewServer -itemtype registration

New-Item : SQL Server PowerShell provider error: Please specify a connection string using the -Value parameter.

> new-item MyNewServer -itemtype registration -Value "server=mynewserver;integrated security=true"
>

Yep. This creates a new server registration using the connection parameters that you specify. How cool is that? No other part of the SQL SERVER PowerShell provider that I'm aware of supports the new-item cmdlet/operation. Because the SQLRegistration represent fairly simple items and hierarchies new-item is supported here. No custom programming needed (although you can use the SMO classes if you'd rather), just treat it as you would file system.

A couple of caveats. In RC0 SSMS and PowerShell SQLRegisgtration sync up when you open SSMS but if you're using both at once, the sync to SSMS isn't immediate. Sometimes refresh doesn't appear to show the changes you made in PowerShell. Eventually it will show them if you refresh multiple times or restart SSMS. And after deleting a Server Registration in SSMS, I still had to run "rm MyNewServer" in PowerShell to get rid of it there. Also, I use "integrated security=sspi" rather than "integrated security=true" in my ADO.NET connection strings. This seems to bother SSMS which reports an error in configuration. Use "true" rather than "sspi" when creating registrations through PowerShell.

Categories:
PowerShell | SQL Server 2008

I received a question today about whether I'd converted my Policy-Based Management examples using SMO (see the multi-part "Programming Policy-Based Management with SMO" series, starting here) from C# to PowerShell yet. I did do this a while ago; they're available as a script download on the SQLskills website (look on the "Past Conferences" page under TechEd 2008). But...

Since then RC0 has changed PBM a bit. The multipart name policy can't use ExecutionMode.Enforce any more, so I changed it to ExecutionMode.None which equates to "On Demand" in the SSMS dialog. There was a change to one of the enumerated constant names too. So the updated scripts for RC0 are posted as part of this blog entry.

As long as I was working on parts of the PowerShell provider for SQL Server and SMO, I decided to put together an example of using the DataCollection APIs as well. The script creates a custom DataCollection Collection Set. That's pretty straightforward. Using the CollectionStore instance just use the correct location in the provider hierarchy and get a CollectionStore instance.

# set a collectionstore object for the default instance on local machine
$colpath = ('SQLSERVER:\DataCollection\' + (get-item env:\computername).Value + '\default')
$col = get-item $colpath

Then create a CollectionSet instance with (your CollectionStore $col is specifed as a parameter to new-object) and one or more CollectionItems in the CollectionSet. Then set the CollectionSet and CollectionItem proprties and call Create. This script enclosed as well, named CreateCustomCollectionSet.ps1.

As an aside, I've enclosed a script called start_smo_sql.ps1. This is a script that I call at PowerShell startup if I know the SQL Server provider is already installed. It sets convenience variables for long SMO namespace names, paths to the default Server, CollectionStore, etc. It's not invoked from my main PowerShell profile %UserProfile%\My Documents\WindowsPowerShell\profile.ps1 which is executed by all PowerShell shells/this user. Instead it's invoked from %UserProfile%\My Documents\WindowsPowerShell\Microsoft.SqlServer.Management.PowerShell.sqlps_profile.ps1. This profile is only executed when I'm using SQLPS.exe, the SQL Server-specific custom shell. It's quite cool that PowerShell accomodates separate profiles for custom shells.

The toughest part in creating a custom collection set is that the XML schema for collection set properties has been updated in one of the CTPs, but not updated in Books Online RC0. The updated XML schema can be obtained by executing:

use msdb
select * from syscollector_collector_types

and using the parameter_schema column to get the schema you need to figure out how to create a validatable parameter. My T-SQL custom collection set used a parameter with an XML namespace on the root element (only) that looked like this:

<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
        <Query>
          <Value>select * from sys.dm_exec_query_stats</Value>
          <OutputTable>dm_exec_query_stats</OutputTable>
        </Query>
</ns:TSQLQueryCollector>

ps_demo_scripts.zip (3.23 KB)

Last week at TechEd Developers, I gave a talk on PowerShell and SQL Server. I mentioned some upcoming changes in RC0, and have just had a chance to check them out.

The PowerShell provider for SQL Server has been expanded to handle not only a "SQL" subdirectory (which enumerates database objects) and "SQLPolicy" (which enumerates the policy-based management objects) but also two new "directories": SQLRegistration and DataCollection.

SQLRegistration covers the groups and members of "Registered Servers" and before you ask, yes this is the same Registered Servers that show up in SQL Server Management Studio. So if you're managing hundreds of servers, you can easier script against any server or group of "Registered Servers". You don't have to navigate to a different machine and instance by referencing the name, as I described in an earlier blog entry. Be aware that the registration service takes a while to refresh registrations. If you define a new registered server in SSMS, it won't show up in the PowerShell list immediately. Just wait a minute or so...

The DataCollection "subdirectory" allows you to enumerate the database objects and properties for the data collection feature that is used to populate and control the performance data warehouse in SQL Server 2008. You navigate through these collections/values the same way you'd navigate SQL and SQLPolicy. This one isn't doc'd as being part of the PowerShell provider in  BOL yet.

A couple of other miscellanous changes I noticed:
1. There are no longer separate custom drives for SQL and SQLPolicy as in previous versions. If you liked these custom drives its easy to add them:
New-PSDrive -Name SQL -Root SQLSERVER:\SQL
New-PSDrive -Name SQLPolicy -Root SQLSERVER:\SQLPolicy
..etc

2. Collection Names are no longer case-sensitive. So:

dir SQLServer:\SQL\{Computer}\{Instance}\databases

(where you replace {Computer} and {Instance} with real names) works now with "databases" being case insensitive.

Categories:
PowerShell | SQL Server 2008

SQL Server 2008 RC0 comes with a Readme file and a Release Notes file. It's always good to read both. In this RC, the readme file contains information on installation and upgrade, and the Release Notes file contains information about things that have changed from previous releases.

In RC0, they've changed the syntax/format of a few relevent new features. The changes affect T-SQL MERGE, the HierarchyID data type's methods, and the Geometry data type's Latitude/Longitude order. These won't be doc'd in BOL under What's New, because the What's New section lists new features, not features that have changed since the last CTP. Reading up on features that have changed will (hopefully) reduce the number of "why doesn't this code example that I downloaded from a reputable source work?" questions.

In addition to the two new performance features I just blogged about, RC0 also contains the new "Spatial Builder" API. Isaac has an excellent blog post about this API, with a (now-)working example. Check out his blog, I didn't find this feature mentioned in either "What's New" or "Release Notes".

Categories:
SQL Server 2008

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

Another performance-related feature (actually its additional information) allows you to have better visibility into performance-affecting queries. This information is available as columns query_hash and query_plan_hash in the sys.dm_exec_query_stats and sys.dm_exec_requests DMV. There's a really nice illustrative example in Books Online, so I won't repeat it here. You can use the queries in my last blog post to experiment with this as well.

The queries:

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30

and

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 40

are similar enough that they could be parameterized (although they aren't autoparameterized). Parameterization would save query plan entry storage, rather than allocating a separate plan for each incantation of the similar query. You can also use this feature, along with the additional information in sys.dm_exec_query_stats to produce cumulative statistics (such as avg IOs, elasped time, etc. BOL has a nice query to accumulate similar plans, the relavent part is simply "GROUP BY query_hash".

But because you have similar queries, should you always have a parameterized query? What if "...having  sum(qty) = 30" has a completely different plan than "...having sum(qty) = 130"? You can find this information by looking at query_plan_hash. If two plans have similar structure but a different query plan, the query_hash value will be identical but query_plan_hash will be different. That's an indication that parameterization might not be the way to go; the first query to executed will cache it's plan and the other query will use the same plan. This is because of a SQL Server behavior known as parameter sniffing.

Because parameterizing queries is usually one of the first things programmers can do to affect performance, it helps to know when to parameterize. In addition, many folks like to start query tuning by tuning the queries (including similar queries with different parameter values) that are executed most frequently. Query_plan_hash and query_hash gives you visibility into this important information.

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

There's a couple of features that show up in RC0 that are performance related. One is a server setting 'optimize for ad hoc workloads'. This one tells SQL Server to save a "compiled plan stub" in the query cache for adhoc queries, rather than the query plan that's usually saved.

You can see these stubs as cacheobjtype 'Compiled Plan Stub' in sys.dm_exec_cached_plans. And you can see them because trying to resolve a plan_handle to a query plan (using sys.dm_exec_cached_plans or sys.dm_exec_query_stats) doesn't work for stubs, like it does for "real" plan handles. Saving a stub rather than the entire plan saves precious plan cache memory.

Here's a little script to try it out.

sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',1
reconfigure
go
-- don't use on a production system, clears whole query cache
DBCC FREEPROCCACHE
go
use pubs
go
-- background plans to produce this DMV, note there is a stub for one of these plans
select * from sys.dm_exec_cached_plans
go
-- not autoparameterized, stub produced
SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30
go
-- you can see the stub
select * from sys.dm_exec_cached_plans
go
-- autoparameterized, whole query plan for parameterized version
SELECT *
FROM titles
WHERE price = 19.99
go

-- stub for the non-parameterized version, plan for the parameterized version
SELECT usecounts, cacheobjtype, objtype, [text]
FROM sys.dm_exec_cached_plans P
   CROSS APPLY sys.dm_exec_sql_text(plan_handle)
   WHERE [text] NOT LIKE '%dm_exec%'
ORDER BY p.usecounts DESC
-- query plan for stub query handle is not saved returns NULL
SELECT sql.text, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) sql
CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
WHERE text NOT LIKE '%sys_dm_exec%' AND text NOT LIKE '%msparam_0%'
ORDER BY qs.EXECUTION_COUNT DESC
go

-- set it back, same experiment produces plans not stubs
sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',0
reconfigure
go

In a blog posting about a few weeks ago, I'd written about noticing a DataSet being created in an Entity Framework program by using the ADO.NET client trace facility. Entity Framework programs do (indirectly) cause a DataSet to be created, but its only for resolving the ProviderInvariantName of the underlying data provider. NOT for anything related to the functioning of the EntityClient or ObjectContext in any way. I've revised the original blog posting to remove the DataSet reference, but just in case you already read it....

The connection string for the EntityClient data provider contains a parameter that refers to the ProviderInvariantName of the underlying data provider (e.g. System.Data.SqlClient). Entity Framework validates the underlying data provider name by using the ADO.NET provider enumerator class DbProviderFactories. DbProviderFactories creates the DataSet to store the provider information obtained from the machine.config file.

I might have guessed this...as I'd written a whitepaper on the functionality of the ADO.NET provider model, "Generic Coding with the ADO.NET 2.0 Base Classes and Factories" not more than a few years ago. Using the DbProviderFactories class is the only way to enumerate the ADO.NET data providers installed on your machine.

Come to my "End-To-End Tracing with SQL Server 2008 and ADO.NET" talk at TechEd next week and I'll show you the trace.

I'm getting ready to head out to TechEd in Orlando. I'll be doing a full day of SQL Server 2008 for Developers on Monday, a talk on SQL Server client and server-side tracing (including SQL Server 2008 Extended Events), one on PowerShell and SQL Server, and finishing up with a talk on SQL Server Security for Developers and Architects. I'll also be hanging around the SQL Server Technical Learning Center (on the showfloor) at least part of the time I'm there.

If you're around, stop by and say hi...

Categories:
SQL Server 2008

It's a good idea when talking to a database to save on network roundtrips. The table-valued parameter in SQL Server 2008 is an example of a feature that can reduce them in the "1 order, 1-n detail items" use case.

So its always been mildly irritating that in order to insert a row with a filestream column, you'd need to make 2 database roundtrips. One roundtrip is to execute the INSERT statement, inserting an empty value in the filestream column. This causes the file to be created, a NULL value won't cause file creation, and you need the PathName to open a file handle or use the new SqlFileStream .NET class. The second roundtrip is used to get the FILESTREAM_TRANSACTION_CONTEXT and PathName. Then you have the info you need for the handle/SqlFileStream.

However, with the OUTPUT clause introduced in SQL Server 2005 you can do it all in one roundtrip. Here's the SQL statement, based on the BOL filestream example. You can run this code using an ADO.NET DataReader to retrieve the values, and construct a SqlFileStream instance to write to.

INSERT INTO dbo.student
OUTPUT Inserted.Resume.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
VALUES (newid (), 'Mary', CAST ('' as varbinary(max)))";

Of course, if you're writing a large value into a file anyway, the savings of one database roundtrip in the overall scheme of things is questionable, but using minimum roundtrips is a good habit to get into in any case.

This item was mentioned in other blogs too, but just to mention it again (so my SQLCE and Sync Services friends won't think I overlooked it)...

Visual Studio 2008 SP1 Beta contains direct support for using SQL Server 2008 Change Tracking. When you use ADO.NET 1.0 Sync Services with a SQL Server 2008 database, the Sync Services designer (that's Add/New Item/Local Database Data Cache) adds a checkbox that allows you simply to "Use SQL Server Change Tracking". No extra triggers, no tombstone tables, change tracking does it all for you. Visual Studio generates some scripts to enable change tracking at a database level and at a table level for the tables you select.

There's some nice generated code that warns you if you need to resync because you haven't synchronized often enough. And you can tweak the SQL scripts for more control. You can specify a CHANGE_RETENTION (the default is 2 days) and whether or not AUTO_CLEANUP is ON (it's on by default) in your ALTER DATABASE statement. And if you want to write your own, more granular synchronization or conflict resolution code, you can specify WITH TRACK_COLUMNS_UPDATED = ON (default is OFF).

Something that puzzled me was, although the SQL Server Books Online states: "Using snapshot isolation is the easiest way for you to help ensure that all change tracking information is consistent. For this reason, we strongly recommend that snapshot isolation be set to ON for the database.", the ALTER DATABASE DDL generated by Visual Studio did not include the option to SET ALLOW_SNAPSHOT_ISOLATION ON. Enabling snapshot isolation in SQL Server does involve a non-trivial amount of overhead, so it looks like they set it to the least overhead option and let you decide if you need snapshot isolation after reading the arguments/scenarios where it's useful in Books Online. Probably a wise choice.

Synchronization Framework is an exciting technology, given that there's so many possible uses of synchronization with "Local Data Caches" and, with the support for Sync Services in mobile devices. The next release of the Synchronization Framework (2.0, there's a beta out already) even contains support for Peer-to-Peer synchronization through the model. Perhaps this will fit some additional "replication via services" in databases. We'll have to wait and see if the Peer-to-Peer sync provider even supports databases, I guess. Always something interesting going on...

Last fall I'd asked the folks on the data access team about the possibility of including the new SQL Server 2008 data types in the new data access object mapping layers. At that time they said "no" but they also asked "why do you want it?" and "are people expecting it?". Someone must have had some compelling arguments.

I'm sure you've read this in other blogs, but in Visual Studio 2008 SP1 Beta, which arrived Monday, there was...drumroll please...support for all four date and time types in not only LINQ to SQL, but in ADO.NET Entity Framework. Thanks! I'm still a bit surprised about the inclusion of DateTimeOffset in EF (its a SQL Server-specific data type and EF is more platform-neutral than LINQ to SQL), but it was a cool surprise to have them there. 

Everyone's blog entry also points out there is some special support for Filestream storage (including a System.Data.Types.SqlFilestream class), but they identify Filestream as a data type. Filestream is a storage format rather than a data type, the actual data type is varbinary(max). Minor terminology nit. You can read and write FILESTREAM based types without resorting to PInvoke. Cool.

Of course, there were some SQL Server 2008 data types (the SQLCLR-based types) that aren't "in there". These are the spatial types (Geometry and Geography) and the HierarchyID data type. Perhaps for EF/LINQ to SQL version 2. These types are correctly identified in Server Explorer in a table but attempting to add a table containing them to an EF diagram fails (it actually leaves the type out of the diagram) and in LINQ to SQL throws an error "the data type is not supported".

That means if you have one of these types in your SQL Server 2008 database, you need to do something funky like treat it as a varbinary(max) and deserialize it (think IBinarySerialize) on the client yourself. Because the SQLCLR-based data types would have a one-to-one mapping to .NET classes (Geometry = Microsoft.SqlServer.Types.SqlGeometry, Geography = SqlGeography, HierarchyID = SqlHierarchyID) these would possibly be straightforward to implement in the next release. They WOULD be a potential model-breaker in that you would have "columns" (entity properties) that have their own properties and methods. Another whole level of indirection.

I've said before that IMHO, spatial support is the "killer developer feature" in SQL Server 2008. Let's hope for more API support in future. But THANKS for the inclusion of the date and time types.

After posting the last blog entry, I realized that I hadn't blogged in about a month. Hmmmm...

Well, since last month I posted the demos for my SQLConnections talks on the SQLskills website (under PastConferences). I really had a good time at SQLConnections, its been a while since I'd been invited to speak there. Thanks to those folks who showed up and said hi.

I've been updating my classes for SQL Server 2008 while waiting for the next CTP. Don't know when that's coming, but everyone's guess is "sometime in June". And I've found a few more neat things in SQL Server 2008 that I'll get around to writing about some day soon. And did some experimenting with the Visual Studio 2008 SP1 Beta (see previous post).

And I've done a lot of yardwork and gardening. I've been outside almost every day; might even lose that pastey-white "programmer who never sees the sun" look. But its 97 degrees in Portland today (yes, 97, really) and its a bit hot for gardening.

Next stop...TechEd Developer Week in June. I'm doing a precon there on SQL Server 2008 and a couple of other interesting talks. Wonder if it will be 97 degrees in Orlando... If you're in Orlando stop by and say hi.

Categories:
SQL Server 2008

Every once in a while I'll get inquires on a paper I wrote in 2004 about the ETW trace providers for ADO.NET (named "Tracing Data Access"). I got an inquiry today, and since I'd recently installed Visual Studio 2008 SP1 Beta, I thought I'd try it out on a LINQ to SQL program and an Entity Framework program.

If you haven't read the paper, the original is still available on MSDN, but an embellished version (including SNAC tracing in addition to ADO.NET) was released in 2006. I'm also doing a talk at TechEd on "End-to-End tracing in ADO.NET and SQL Server 2008". If you're at all interested in client tracing or SQL Server 2008 Extended Events, it will be worth your while to attend.

Unless the LINQ to SQL program has its own ETW provider that I missed, I got no LINQ to SQL specific trace events. Since LINQ to SQL is a thin layer over System.Data.SqlClient, I do get the events for SqlClient however.

Entity Framework however, is another matter. I not only get EF-specific trace events, but there appears to be some new trace namespace abbreviations for EF as well. These are the ones I've seen while going through one trace of one simple EF program. I may have missed some.

dobj = System.Data.Objects
ec   = System.Data.EntityClient
cqt  = System.Data.Common.CommandTrees
pc   = System.Data.Query.PlanCompiler

There's quite a bit of trace information for some parts of EF, like the CommandTree portion. But after a whole half-hour and a few traces, it appears that not all items in EF are as thoroughly instrumented with trace events as, say, SqlClient. On the other hand, EF provider for SQL Server will show lots of SqlClient-specific and System.Data.Common events, as does LINQ to SQL.

Categories:
SQL Server 2008

I'm heading off to SQLConnections in Orlando early tomorrow morning. I'll see starting with a preconference "Day of SQL Server 2008 for Developers" and covering, well, every new feature that could interest a developer. I'll also being doing a series of talks about everything from Spatial data support to Extended Events to PowerShell in SQL Server to XML for DBAs.

If you'll be in Orlando for the conference, stop by and say hi.

Categories:
SQL Server 2008

One of the uses for sparse columns will likely be replacing entity-attribute-value designs. In an EAV design, the "sparse attribute" values are represented by a separate table. The attribute value can either be declared as (n)varchar or sql_variant. An example I've always used is the hardware store that wants to keep track of its products in a "products" table, but each product has different attributes. If you use the minimum number of columns to simplify the example, the EAV design looks like this.

create table products -- base table
(
 id int primary key,
 name varchar(max)
)
go
create table properties -- sparse attribute (name-value pair) table
(
 id int,
 name varchar(50),
 value varchar(max),
 CONSTRAINT PK_properties PRIMARY KEY (id, name),
 FOREIGN KEY (id) REFERENCES products (id)
)
go

Here's a straightforward way to convert the table to use SQL Server 2008's sparse columns. It uses dynamic SQL, but in this case there's no user input (SQL injection worries).

declare @tab nvarchar(max),
        @sql nvarchar(max)

set @tab=N'create table products2 (id int primary key, name nvarchar(max) '
select  @tab=@tab+','+ name + ' varchar(max) sparse' from properties
group by name
set @tab += ' ,col_values xml column_set for all_sparse_columns);'
-- select @tab
exec(@tab)

Populating it is also straghtforward using the pivot operator, introduced in SQL Server 2005. In the case where each item has only one of each sparse property (the table constraint enforces this), and our properties table has only three columns, there's no real aggregation with pivot. The aggregate is just required by the pivot operator syntax. This populates the table:

declare @col nvarchar(max),
        @sql nvarchar(max)

set @cols=N''
select  @cols=@cols+','+ name from properties
group by name
 select @cols
set @cols=substring(@cols,2,datalength(@cols)/2 - 1)
set @sql=N'insert into products2 (id, name,' + @cols + ') select p.id, p.name, '+@cols+
      N' from (
      select id, name, value
      from properties
      ) as q
      pivot
      (
      max(value)
      for name in ('+@cols+
      ')
      ) as PivotTable'
set @sql += ' join products p on PivotTable.id = p.id'
-- select @sql
exec (@sql)

What remains to be done now is to choose better data types for the sparse columns, if the data isn't really a string. If you haven't enforced value type correctness in the application, this may require some data cleansing. Full example as an attachment. Remember that currently (in CTP6) you can only have 1024 total columns in a table; but the limit will be increased to 30000 sparse columns before RTM.

eav_to_sparse.zip (1.23 KB)

It's easier than you'd think... When you use the SQL Server 2008 CREATE EVENT SESSION DDL statement with extended events, you specify:

Events to be captured (e.g. sqlserver.error_reported)
Actions to be fired to add more information (e.g. sqlserver.sql_text)
Predicates for conditional capture (e.g. where sqlserver.error_reported.error = 547)
Event target to collect the events (e.g. package0.ring_buffer)
Addtional options (e.g. MAX_MEMORY)

In general, the way you look for events, actions, predicates, etc to use is to query against the metadata views:

select p.name + '.' + o.name as [Full Name],
       o.description s [Description]
from sys.dm_xe_objects o
join sys.dm_xe_packages p on o.package_guid = p.guid
-- @type can be 'event', 'action', etc
where o.object_type = @type order by p.name, o.name

Two of the actions listed have to do with activity (causality): package0.attach_activity_id and package0.attach_activity_id_xfer. In addition there is an option TRACE_CAUSALITY. I thought you'd have to set the TRACE_CAUSALITY option and add the actions to collect activity information. But attempting to add either action produced an error.

It's easier than that. Simply turning on the TRACE_CAUSALITY option is enough. The actions can't be specified in CREATE EVENT SESSION (or ALTER EVENT SESSION) because they are for internal use. The activity_ids show up at the target without explicitly naming the actions.

It appears that there are some changes in .NET 3.5 System.Transactions (or System.Transactions.dll and System.Data.dll version 2.0.50727.1433 if you're looking at assembly versions). Florin Lazar blogs about a change to the syntax here. And Alazel Acheson blogs about changes to SqlClient changes to accomodate using lightweight transactions with less promotion to distributed transactions in the ADO.NET Team blog.

It is interesting to read how the latest version of SqlClient can combine automatic transactions with connection pooling to use promotion to a distributed only when necessary. It was reminiscent of how COM+ used the connection pool; the difference is that COM+ always used distibuted transactions. Some of the changes use a transaction-aware connection reset mode that's only supported in SQL Server 2008. So server changes were required as well.

At the end of the blog entry Alazel mentions that you can't use this facility with two open connections at the same time; it requires that there is a free connection with the appropriate transaction scope in the pool. I immediately thought of SQL Server's sp_getbindtoken and sp_bindsession, the manual way to allow two connections to share a single transaction context. But sp_getbindtoken and sp_bindsession are on the deprecation list (to be removed in a future version of SQL Server) in SQL Server 2008. The books online entry for these calls recommends "using MARS or distributed transactions instead". But why MARS?

MARS (multiple active resultsets) made its appearence in SQL Server 2005 and its data access stacks. It permits you to have multiple interleaved sessions while using a single SQL Server connection. These multiple sessions share the same transaction, modulo an interesting but escoteric behavior known as batch-scoped transactions. MARS also does not support named savepoints.

What MARS and sp_getbindtoken/sp_bindsession have in common is that they are both solutions to the same "problem". I prefer calling this a SQL Server "behavior" because its not technicall a problem, just how the underlying network stack works. SQL Server does not allow other activity on a connection (like an UPDATE or a second SELECT) while reading a rowset is in progress. Pre-MARS, if you wanted to have multiple commands in a single transaction scope, you would use two connections and "bind" the transaction scopes together with the transaction token. With the MARS capability this is no longer necessary.

So, the OpenConnction1/DoCommand1/Close then OpenConnection2/DoCommand2/Close pattern is now acommodated by System.Transactions and SQL Server 2008 without transaction promotion. To do two commands with the same transaction scope without closing the connection, use a single connection and multiple sessions with MARS. With either data access pattern, only a single physical connection to SQL Server is needed.

Short post this evening...

Just in case anyone else flails around looking for this feature, its right under your nose. When you use the PowerShell SQL Server 2008 provider, you have visibility to a single, local machine (and all its SQL Server instances you can access with integrated security) by default. To get access to multiple machines using the provider, simply reference a SQL provider path that contains that machine name. If the (Windows) principal has access to the other machine's SQL Server instance, it will open a connection using Windows auth.

For example, say that I'm on a machine named zmv20. I have access to machine zmv21's SQL Server instance as well.

>cd SQLSERVER:\SQL
>dir

MachineName
-----------
zmv20

>dir zmv21  <---- makes a connection to zmv21

Instance Name  (on zmv21)
-------------
DEFAULT

>dir           <---- now you can "see" zmv21 too

MachineName
-----------
zmv20
zmv21

Using the test-path cmdlet also works.

>test-path SQLSERVER:\SQL\zmv22  <---- can I login to this machine too?

But bear in mind that this is subject to a connection timeout lag. The error message indicates that its first using WMI to obtain the machine connection.

Of course, all this is doc'd in SQL Server Books Online. Where I missed it a few times...I was looking for the equivalent of a "connect" command. You don't need one.

Categories:
PowerShell | SQL Server 2008

I'm still getting used to the new sparse column feature in SQL Server 2008.

I'd just read in the BOL definition of ALTER TABLE that you can add a column_set to an existing table. I was converting a sample app from an EAV (entity attribute value) design to sparse columns. I used the existing "attribute-value" table to create the sparse columns, then created the table. I then went back to add the column_set with ALTER TABLE after the fact.

alter table sparsetest3 add spcolset xml column_set for all_sparse_columns

Received this error message:
Msg 1734, Level 16, State 1, Line 1
Cannot create the sparse column set 'spcolset' in the table 'sparsetest3' because the table already contains a sparse column set. A table cannot have more than one sparse column set.

Huh? I figured that I should be able to add the column_set and filed the lack of this capability as a bug. It came back as "by design".  And doc'd in BOL that way. The fact that its by design makes perfect sence and helped to solidify in my mind how the column_set works.

In a "normal" table (even with sparse columns) without a column_set, "select * from table" returns all of the columns. If a table has a sparse column_set, "select * from..." behaves differently, and returns only the non-sparse columns and the column_set column. NOT the individual sparse columns. You can INSERT or UPDATE this table by using only the column set. When you update using a column_set, all of the sparse columns that you don't specify are set to NULL. You can even update to column_set to NULL itself, which NULLs out all of the sparse columns.

The reason that "A column set cannot be added to a table if that table already contains sparse columns" (BOL exact wording under "Guidelines for using sparse columns") is that it could break existing code that uses "select * from...". Imagine:

create table sparsetab (
 id int identity primary key,
 col1 int,
 spcol2 int sparse
);

select * from sparsetab -- returns spcol2
-- add column_set (this is disallowed, but imagine it DID work)
select * from sparsetab -- doesn't return spcol2, only the column_set, can break code that relies ont spcol2

And BOL is right in ALTER TABLE as well.

create table sparsetab2 (
 id int identity primary key,
 col1 int
);
go
-- add first sparse column and column_set at the same time, works fine.
alter table sparsetab2
 add spcol1 int sparse,
       spcolset xml column_set for all_sparse_columns
go

You can even, as BOL indicates, add a column_set to a table that does not yet have a sparse column.

create table sparsetab4 (
 id int identity primary key,
 spcolset xml column_set for all_sparse_columns
);
go
-- Now you can add sparse columns, they use the column_set

The ONLY issue I have is with the error message 1734 at the beginning of the post. It's misleading, because I DON'T already have a column_set. But I DO already have sparse columns.

In last sparse column correction from a long-ago post. I'd heard (early on) that you would be able to have over 4 million sparse columns (actually sizeof(int) of them). This turns out to be incorrect, it was announced lately that the limit will be 30,000 sparse columns. And, in the current CTP6 you can only have 1024 total columns (in as previous versions); the sparse column limit will be changed sometime before RTM.

Amazingly, when I quoted the 30,000 column limit to a class last week, there was a groan. One student told me his EAV table already had over 60,000 unique attributes. 30,000 wouldn't be enough...must be a HUGE EAV table. That's the motivation for sparse column. Also, his EAV table had the "value" column defined as SQL_VARIANT. That's the other motivation, sparse columns are strongly typed. Although inserting through the column_set always uses a string (nvarchar) as the value, and attempts to convert string to the definied data type for specific columns.

This is the last part of a series on programming policy-based management. The series starts here.

In the previous installment, I created a policy that was constrained to a single database. To accomplish this, I used a Condition that called out the database by name, and tied it to the TargetSet using TargetSet's SetLevelCondition method.

An alternative consists of creating a policy as part of a PolicyCategory. Each Policy is a member of exactly one PolicyCategory. The default PolicyCategory is the only one that "ships with the system", but you can define your own. If you don't specify otherwise in the code (SSMS has a Category dropdown on the Description page of the new Category dialog), your policy is a member of the default category.

Each PolicyCategory has a property that indicates whether it's manditory that a database subscribe to that category. If this property is true, each database has an implicit subscription to the category. If not, a database must explicitly subscribe. The code to define a PolicyCategory is straightforward:

static void CreateCategory(PolicyStore ps)
{
    PolicyCategory cat = new PolicyCategory(ps, "MyNewCategory");
    cat.MandateDatabaseSubscriptions = false;
    cat.Create();
}

To create a Policy that's a member of the PolicyCategory, simply use the aptly-named PolicyCategory property. If you're using a named PolicyCategory you may not want to restrict that policy to a specific database. Here's the changes to the Policy definition.

// No Condition On This One, applies to all databases, but must be subscribed to
//ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

// Name the category
p2.PolicyCategory = "MyNewCategory";

To create a PolicyCategorySubscription (for those categories that are not Manditory), the only interesting part is that you need a SqlSmoObject. You can use a concrete subclass (like Database) or make up a SqlSmoObject by using a URN. Dan Sullivan's and my book "Developer's Guide to SQL Server 2005" covers both ways to make an SqlSmoObject. Here's the code for PolicyCategorySubscription.

static void CreatePolicyCategorySubscription(PolicyStore ps)
{
    Server svr = new Server(); // open a connection to default instance, local server
    Database db = new Database(svr, "pubs");
    PolicyCategorySubscription subs = new PolicyCategorySubscription(ps, db);
    subs.PolicyCategory = "MyNewCategory";
    subs.Create();
}

This means that only the pubs database now follows the policies in "MyNewCategory". BTW, there's currently the PowerShell provider in CTP6 throws an error when attempting list a PolicyCategorySubscription (its in the hierarchy at the same level as Policy). The PolicyCategorySubscription still works as advertised though.

This concludes the series on Programming Policy-Based Management with SMO. Hope it was useful.

This post is part of a series on programming policy-based management. The series begins here.

So, we were working with a policy that required an ObjectSet. ObjectSets contain TargetSets. For example, the ObjectSet for the naming policy (IMultipartName) we were working on needs a TargetSet for Procedure, Synonym, Table, Function, Type, View, and XmlSchemaCollection.

Note that this collection is similar to what you'd see for a MultipartName policy in the SSMS designer dialog. The title for it in SSMS is "Against Targets:". We want our policy to apply only to Tables. After fumbling around for a while attempting to define TargetSets and add them to the ObjectSet's collection of them, I found that the seven TargetSets I needed were *already* defined. This reduces enabling only the table's TargetSet to two lines of additional code. Notice that you can reference a specific TargetSet in an ObjectSet by using an indexer. The indexer is an SMO URL.

TargetSet ts1 = os1.TargetSets["Server/Database/Table"];
ts1.Enabled = true;

But how to restrict this policy to a single database? For this we need a Condition to name the database. Because this condition is simple we can use ExpressionNode.Parse(). Here's the Condition code.

// Create a condition to enforce
Condition con = new Condition(ps, "FinanceDB");
con.Facet = "Database";
// Note: Using Parse() treats the string as an SMO URL. Only works for simplest cases
string s = "@Name = 'finance'";    
// try-catch code omitted for brevity
con.ExpressionNode = ExpressionNode.Parse(s);
con.Create();

Back to our TargetSet. We restrict the TargetSet to a specific database by using TargetSet.SetLevelCondition. SetLevelCondition takes a TargetSetLevel, and we get the appropriate TargetSetLevel (which is prepopulated) by.... you guessed it...using a SMO URL. After ts1.Enabled, this limits the policy to a single database defined by our Condition.

// FinanceDB is the name of our Condition that "limit/defines" this policy only to Finance
ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

Having initialized the ObjectSet correctly, we now create the ObjectSet, tie it to the Policy and voila...

os1.Create();
p2.ObjectSet = "CheckFinanceTab_ObjectSet";
p2.Create();

Check this policy by using the following code in a query window:

use finance
go
create table dbo.foo (id int);
go

You get the expected error:
Policy 'CheckFinanceTab' has been violated by '/Server/(local)/Database/finance/Table/dbo.foo'.
This transaction will be rolled back.
Policy description: ''
Additional help: '' : ''.
Msg 3609, Level 16, State 1, Procedure sp_syspolicy_dispatch_event, Line 50
The transaction ended in the trigger. The batch has been aborted.

This is part of a series on programming policy-based management. The series starts here.

Now, we'll tackle programming a little bit more complicated policy. The table-naming standard that applies to a set of database objects. For this, we need three items:
1. Condition for defining the policy itself.
2. Policy that uses the condition and contains....
3. Condition that specifies a set of database objects to which the policy should be applied.

The first condition is straightforward

Condition con = new Condition(ps, "TablePattern");
con.Facet = "IMultipartNameFacet";
ExpressionNode exp = new ExpressionNodeOperator(OperatorType.LIKE,
    new ExpressionNodeAttribute("Name"),
    new ExpressionNodeConstant("fintbl%")
    );
con.ExpressionNode = exp;
con.Create();

Note that, as with the complex condition we specified previously (part 3), you can't specify 'LIKE' by using the Parse() method, but 'LIKE' does appear when you access the condition by using ToString(). Perhaps, in future, Parse and ToString will be reflexive on an ExpressionNode. Or maybe I'm just missing something.

The first part of the policy is straightforward too:

Policy p2 = new Policy(ps, "CheckFinanceTab");
p2.Condition = "TablePattern";
p2.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.Enforce;
p2.Enabled = true;

But we want this policy to apply to to the TABLES (not views, for example) and only those tables in the finance database. For this, we need an ObjectSet.

ObjectSet os1 = new ObjectSet(ps, "CheckFinanceTab_ObjectSet");
os1.Facet = "IMultipartNameFacet";

Note that ObjectSets reference Facets too. Note also that ObjectSets are a "top-level" object, that is, in the PowerShell provider they appear at the same level as Conditions and Policies. Note that Facets do NOT appear at that level.

Before we go any further, cavaet... there is no function in SSMS to define an ObjectSet standalone. Therefore, no way to delete a "half-baked" ObjectSet from SSMS if you don't want it. To delete an ObjectSet you don't want, you need to go into the PowerShell provider, navigate to SQLSERVER:\SQLPolicy\{server}\{instance}\ObjectSets and delete it. "del MyObjectSet". In SSMS, deleting a policy that refers to an ObjectSet deletes the ObjectSet too. Not sure if you can have multiple policies refer to the same ObjectSet yet, but if you could I assume you'd have to delete all the policies before an ObjectSet would go away.

This is part three of a series on programming policy-based management. The series starts here.

So, to initialize my Condition's ExpressionNode I need more than just a string. It doesn't look like, at this time, every ExpressionNode CAN initialized with a string. But we can use the subclasses. The ones I need here are Operator, Attribute, and Function. I need: DatabaseMailEnabled (Attribute), Equals (Operator) and false (Function). BTW, ExpressionNodeFunction appears to be what you are programming when you use the "Advanced Functions" dialog in the SSMS Condition dialog.

Putting one expression together first looks like this:

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );

Note that, unlike in SSMS, you don't need '@' before the attribute name. But I need to specify both DatabaseMail and SqlMail. That's just a little more complex.

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp2 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("SqlMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp_both = new ExpressionNodeOperator(OperatorType.AND, exp1, exp2);

con1.ExpressionNode = exp_both;
con1.Create();

And, nice as you please. Looks just like the one defined in SSMS using the GUI. Make sure that you remove (or comment out) the original ExpressionNode.Parse statement.

Now, on to the policy.

// Create a policy that uses the condition we just created
Policy p1 = new Policy(ps, "OffByDefaultSMO");
p1.Condition = "MailOffSMO";
p1.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.None;
p1.Enabled = false;
p1.Create();

Looks like the one created by the GUI and works like it too.

This is a pretty simple policy, because it can only be applied at the instance level. Next, we'll look at a policy that can be applied to a set of database objects, e.g. All tables in a particular database. This requires that we investigate ObjectSets.

Cavaet. It's pretty easy to make a mistake, sans docs. Some mistakes produce a generic "I can't do this"-type message. So ALWAYS drill into the INNER exception if you get an error. That is:

try
{
    con1.ExpressionNode = exp_both;
    con1.Create();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

Stack trace might even be helpful.

This is the second part in a series about programming policy-based management. The series starts here.

To build our MailOffByDefault policy we need:
   Condition that specifies properties and settings
   Policy that uses the condition

Condition first. This looks pretty straighforward.

Condition con1 = new Condition(ps, "MailOffSMO");
con1.Facet = "ISurfaceAreaFacet";
con1.ExpressionNode = ExpressionNode.Parse(
                        "@DatabaseMailEnabled = 0 and @SqlMailEnabled = 0");
con1.Create();

The Condition class uses PolicyStore instance (ps) and names the condition (MailOffSMO). You initialize a facet with a string. So where did the string come from? It is named in the SMO library but I "found" it by making an equivalent condition with SSMS and inspecting it.

foreach (Condition c in ps.Conditions)
 // ... look at c in the VS visualizer

ExpressionNode should "define" my condition, and since there is no ExpressionNode constructor, I first tried Parse(). This created the Condition, but it was unusable in SSMS. SSMS wanted "false" not "0". I searched around for how to specify "false" in the parse string for a while, then came upon something better.

ExpressionNode has six subclasses that can be used in combination to specify any set of expressions that you need. These are
  ExpressionNodeAttribute
  ExpressionNodeConstant
  ExpressionNodeChildren (with subclasses)
    ExpressionNodeFunction, ExpressionNodeGroup, and ExpressionNodeOperation

That's next.

I've been looking at the new Policy-Based Management (was: Declarative Management Framework) in SQL Server BOL. All of the BOL examples use the SSMS user-interface to define and maintain the policy store. While there will likely be 3 ways to configure PBM as there is with, say, replication (SSMS, SMO, and system stored procs), I thought I'd take a try at programming it using the new SMO libraries.

Because the docs are sparse (there are listing of the new classes but no description of what they do in BOL), I figured I'd start by writing C# code, because I like the visualizers (those components that allow you to drill into a heavily nested structures while debugging) in Visual Studio. Later on, I'll port these to PowerShell. I can also use PowerShell reflection capabilities and the new SQL Server provider to get a quick look a the structures.

Be aware of the fact that, since this API is so sparsely documented it could change by RTM. Always a consideration.

The new classes live, for the most part, in two libraries:
  Microsoft.SqlServer.Dmf
  Microsoft.SqlServer.Management.Sdk.Sfc

I also added references in the project to:
  Microsoft.SqlServer.ConnectionInfo
  Microsoft.SqlServer.PolicyEnum
  Microsoft.SqlServer.Smo

I mean to start by replicating the two examples in the PBM (is that its new acronym?) books online tutorial. But first, we need a starting point. It's the PolicyStore class. The PolicyStore is also the machine-root of the PowerShell provider drive SQLSERVER:SQLPolicy\{machine}\{instance}.

You can initialize the PolicyStore's connection with an instance of SqlStoreConnection from the ...Management.Sdk.Sfc namespace. Not sure what Sfc stands for, but being an old C++ programmer, perhaps its SQL Foundation Classes ;-) ? No matter. Luckily you can initialize a SqlStoreConnection with a plain old SqlConnection. So, lets connect to the store.

SqlConnection conn = new SqlConnection("server=zmv32;integrated security=sspi");
PolicyStore ps = new PolicyStore();
ps.SqlStoreConnection = new SqlStoreConnection(conn);
conn.Open();

Where to go from here? In the PowerShell provider, the subdirectories of the policy store are Conditions, Policies, ObjectSets, PolicyCategories, and PolicyCategorySubscriptions. To create the first policy (MailOffByDefault) we need a Condition and a Policy. That's next.

This is the last post in the series, at least for now. I'll update it (or post more on the topics discussed here) as the products involved evolve and mature. This one's about:

LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

T-SQL is a declarative language, allowing you the ability to rewrite queries for better performance. A whole cottege industry has grown up around this (I teach it myself), and it usually consist of changing the SQL to get the plan you want, based on your intimate knowledge of the (current) data and the (current) use cases. As one of the simplest examples, you can switch between joins, correlated subqueries, and nested subqueries to see which one gives best performance. Or use EXISTS rather than a correlated subquery or IN clause.

Because the queries (LINQ and/or ESQL) are programmatically transformed in SQL queries there is not (that I'm currently aware of) the ability to "rephrase" LINQ/ESQL queries to produce subtlely different SQL queries and thus better performance. If you can produce rephrased SQL by changing a LINQ/ESQL query (not just rewriting a LINQ/EF query to produce different results that are more optimal), I'd be interested in hearing about it. Perhaps another cottege industry awaits...

BTW, although most/many SQL queries can be rewritten (sometimes many different ways) and tested for best generated query plan/best performance, the limitation is that, in future, the query processor can get smarter, thus making your past years' work unncessary. Usually though, you've benefited from rewriting SQL for that extra 6 mos-5 years until the query processor changed anyway.

Besides query rewrites, you can also "hint" queries, in most dialects of SQL I've seen. This helps when the query processor chooses a suboptimal plan (uncommon, but not unheard of) and you have intimate knowledge or data and use cases. Or when you're trying to service different use cases with the same query; SQL queries only have one plan at a time (modulo parallelized plans) and you might have to satisfy different use case by differently hinting the same query. Because the translation to SQL is deeply imbedded in the LINQ/EF source code, if I find a performance problem that can be helped with a hint, I can't hint in the LINQ/ESQL code. This means going back to using stored procedures (they work with hints) and away from the model.

Hinting is usually not preferred over rewriting the SQL because hints "tie the query processor's hands", i.e. if the statistics change so that a different plan would work better, the query processor can't use this information because you've told it how to accomplish the query. You've changed SQL from a declarative language to an imperative language. It's best not to put query hints in code, but separate them to a separate layer. SQL Server 2005 calls this separate layer plan guides. The plan guide is a named database object that relates a hint to a query without changing the underlying code. You can add/drop plan guides or turn them on/off at will. Or re-evalute them when things (the statistics or use cases) change.

Can you use plan guides with LINQ/EF queries? Two things to keep in mind. First, a plan guide for a SQL statement requires an exact match on a batch-by-batch basis. Machine-generated SQL will likely make exact match easier, but you will have to check that the guides are being used each time LINQ/EF libraries changes. Second, plan guides work best if you have a limited number of them in your database. They're meant to be special-case... not to add another level of complexity to an already complex (and getting more so as the layers of abstraction increase) situation. So use these with care.

So, is this an issue worth worrying about? I think we'll need to wait and see. Fix a few "bad (generated) SQL or bad queries" problems before giving up entirely. Or, fix performance problems (in the generated SQL) by going to stored procedures and see how many procs you have after a year. Are the folks who are licking their chops in anticipation of LINQ/EF related perf problems justified? Well, its not me that thinks optimizing declarative languages will always have its place.

MHO.

Hope you enjoyed this series. As implementations of these models take hold, I'll be watching for items that would change my opinions. Or prove them...


This post is part of a series about worries when implementing LINQ to SQL or ADO.NET Entity Framework from a SQL database-centric programmer's perspective. The last two worries are related. It's mostly about either level of abstraction being one level away from the "real SQL code" that's being executed.

First off...
LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

A stored procedure is always thought of by stored procedure afficianados as representing a "contract" between consumer and provider. That is, the database metadata tells me exactly what I'm going to get. Although the database metadata does indicate number, type, etc of parameters, this is absolutely not true for rowsets returned by stored procedures. There is NO database metadata that records anything about the returned rowsets, or even how many rowsets a stored procedure will return. Actually the number of rowsets returned is part of the ANSI standard but SQL Server implement it. In addition, errors that might happen in the middle of a stored procedure might result of rowsets being missing. And than there's always the possibility of returning multiple and/or different rowsets by using a stored procedure with conditional code... Not much of a rowset contract at all.

One way to ameliorate this problem (in SQL Server) is to use multistatement table-valued functions to return one rowset with known metadata. The main hassle with this is performance; a multistatement table-valued function is the equivalent of filling a table variable in code and then returning it. There are I/O considerations (the I/O of reading the base tables + I/O of reading the table variable at the consumer) The are also performance considertations as SQL Server table variables have no statistics, if the table-valued function is used as a row source in a larger query (composable queries), there is no way to estimate the number of rows returned by the TVF.

SQL Server's strongly typed table-valued parameters in SQL Server 2008 would be an analogous concept, but currently these are limited in being "input only" in procedures. No strong typed results yet. Oracle is an exception to this "no contract for rowsets" concept. Because Oracle doesn't return rowsets from stored procedures, they introduced a special parameter type called refcursor. Refcursors can appear in database APIs as a parameter (of type Refcursor or more generically "table"). And you can have strongly typed Refcursors, providing the needed contract. We'll have to wait for Oracle's (or DataDirect Technologies') EF provider or LINQ abstraction product to see how they use this.

So now that we've determined that there is no more of a rowset contract for stored procedures than ad-hoc SQL (the difference really is in SQL encapsulation and support of ownership chains, but that's another story), what about extentions that ESQL doesn't support? There are database-specifc extensions like SQL Server's PIVOT operator, or ANSI SQL standards, like ranking and windowing.

LINQ folks are quick to talk about implementation thorugh "extension methods" but the long and short of this is that these are a LINQ-ism, unrelated to LINQ to SQL. That is, the LINQ construct to SQL dialect statement mapping is fixed and embedded in the LINQ product. Using extensions to the *SQL statement mapping* (ie changing what SQL statement is produced) require either going deep into the framework (if this can be done at all) or implementing equivalent concepts on the client side, leaving the generated database code alone.

EF may have a little better story with this because each provider-writer implements the ESQL to query mapping, conceivably you could write a custom provider to encapsulate the supplied provider with extensions. However, the ESQL language itself does not have to capability of ODBC-like "escape clauses", so there'd be no way to express this extended SQL-based functionality in ESQL.

So I'd classify the "subset of SQL" and "stored procedure rowset is an anonymous type" problem as something that might be worked out in future releases of databases and frameworks. Until LINQ to SQL or EF provides escape clauses in the framework, the easiest way out is the ultimate escape clause, using the stored procedure that returns (anonymous) rowsets. And the more stored procedures are used (not CUD procedures, which enhance the model, but rowset-returning procedures), the farther away from the model you get. Which interferes with the usefulness of the model in general.

MHO.

This post covers LINQ to SQL and EF worry #4. That is:

LINQ to SQL and EF will write code that gets too much or too little data at a time. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

I really put this in for completeness. Both LINQ to SQL and EF have good mechanisms to deal with this one. In addition, its not necessarily (only) an ORM problem. In a file system graphic user interface, do you prefetch all of the (perhaps thousands) or files' information throughout the entire file system when someone wants to look at the content of the C drive? The answer with this one is "it depends". If you know you're going to eventually display all of the related entities' information you likely do want to get them. If not, perhaps you want to get related entities all at one, when the first child entity is selected. Or get the children one at a time when each child entity is selected.

LINQ to SQL addresses this by implementing a property on the DataContext, the DeferredLoadingEnabled property. It is True by default, retrieving only the Customer object when the Customer has Orders. The related Orders objects are retrieve with an extra roundtrip to the database, one row at a time, when the Customers' Orders property is accessed in code. There is a related property, LoadOptions, on the DataContext that takes a DataLoadOptions instance that allows you to control exactly how much related data is retrieved. That is, do I want only related Orders or Orders, OrderDetails, and associated Products in a single round trip? The DataLoadOptions also allows you to filter the amount of data you get from related tables, that is, I want each Customers' associated Orders, but only OrderID and OrderDate.

ADO.NET Entity Framework does this a little differently. They don't have a property for whether deferred loading is enabled, they just load deferred by default. In order to load associated entites, there is a separate Load method, and an IsLoaded property that you can check before loading. EF also has an Include property of the query of also you to specify which related entities can be loaded, if eager loading is desired. With EF you can also use Entity-Splitting in your design if you know you always want to retrieve OrderID and OrderDate, but no other properties, from the Orders table. Object purists may frown on composing object based only on commonly-used queries, however.

You can also retrieve only certain columns from an object (ie all the fields in Customers but the Customers' picture) with either a related or anonymous type. And, of course, you can always specify a join that returns an anonymous type is desired to get just the properties you need from related tables.

So I'd say that this worry not only is completely unwarrented, but that LINQ to SQL and EF make programmers think more about lazy loading vs eager loading, and make it clearer and more maintainable than a join, which always returns an "anonymous rowset" with columns from all tables interspersed. That is, you know exactly what related data (at an object level) is being requested and retrieved.

MHO.

Categories:
SQL Server 2008

SQL Server 2005 introduced some new features that left "traditional" SQL folks puzzled. What's the hidden use for SQLCLR? Or the XML data type? Or Service Broker? Besides being available to you as a programmer, these are all used internally. 

I once had a conversation with a database person (not SQL Server) who, when asked about the viability of certain features in his database product for the general database developer, responded that "although there are only a handful of users using feature X, we use it internally in the product." So its always interesting to observe the SQL Server team using its own features to implement other features. Here's a couple of examples from SQL Server 2008.

SQL Server 2008 includes PowerShell integration and a PowerShell provider for SQL Server that allows you to navigate the database and policy object models using a file paradigm. Where else are they using PowerShell in the product?

A glance at the BOL reveals that there are a few new SMO (SQL Server Management Object) libraries, with names ending in DMF (e.g. Microsoft.SqlServer.Management.Dmf). These are used to setup and manage policy. According to this Policy-Based Management webcast, the libraries are used both internally (via SQLCLR) and externally (hosted by PowerShell) to define and check policy. Remember DMF stands for declarative management framework, the original name for the feature now known as Policy-Based Management.

Another use of PowerShell in SQL Server 2008 is as a subystem for SQL Agent jobs. In addition to the SQL Server 2005 Agent subsystems, you can use the PowerShell subsystem with jobs. Create a new job and jobstep to see PowerShell as a choice.

These (PowerShell and Policy) are also used together when you use an "on-schedule" policy execution mode. Choosing this mode schedules a SQL Agent job that uses a job step hosted by PowerShell that invokes a PowerShell cmdlet named Evaluate-Policy. The help file for this cmdlet says it "supports the SQL Server 2008 infrastructure". By the way, "On change - log only" policy execution mode uses Service Broker for its implementation.

Another example of synergy between new features involves the extended event feature. Extended events are implemented via an event engine built into SQL Server. Individual events are lightweight to fire and the infrastructure is amazingly flexible. Events can be associated with actions that pick up addition info, filtered via predicates, and dispatched synchronously or asynchronously to a variety of event targets.

Besides the events,actions,target, etc used for diagnosis, Extended Events are also used to implement to new SQL Server 2008 Auditing feature. Auditing has its own package of events,actions,data types, and targets which are used when you define an AUDIT in DDL, but because they are just "ordinary" Extended Event objects, you can also use them in conjunction with any of the other Extended Event facilities.

That's just a couple of examples of "feature synergy" in SQL Server 2008. I've blogged about others in SQL Server 2008 (e.g. SQLCLR UDTs for Spatial data types and hierarchyID) and I'll blog about more examples as I come across them.

Categories:
SQL Server 2008

This post is part of a series, see parts 1 and 2. Sorry to be so long getting back to this series.

This post covers LINQ to SQL and EF worry #3. That is:

LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing SQL UPDATE/DELETE statements that are set-based.

Neither LINQ to SQL or Entity Framework currently contains an insert/update/delete language. Entity SQL could contain DML in the future, LINQ (Language Intergrated QUERY) to SQL doesn't have one. But both APIs can affect INSERT/UPDATE/DELETE operations on the database. You create or manipulate object instances then call SaveChanges (EF) or SubmitChanges (LINQ to SQL).

The manipulate objects and save method works well in LINQ to SQL and reasonably well in EF. The distinction is that in EF, if there are related entities (e.g. a title row contains an integrity constraint that mandates that is title's publisher must exist in the publisher's table), you must fetch the related entity first. Or synthsize a reference using EntityKey (see associated post here) to save the database roundtrip involved in fetching the related entity. See this associated post about deleting a single row without fetching. But how about UPDATE?

SaveChanges and SubmitChanges can boxcar multiple INSERT/UPDATE/DELETE operations in a single roundtrip. But let's consider the number of database roundtrips involved to change a single customer row. This requires one roundtrip to "GET" the row and another to update the row. And what about a searched update in SQL (UPDATE...WHERE) that updates multiple rows? Or an updated based on a SQL join condition (my favorite example, using update over a recusrive CTE, gets all employees reporting to a certain manager and gives them all a raise)? The number of fetches required JUST to do the update increases. Maybe not the roundtrips required to get the rows, but the sheer number of fetches (network/datebase traffic) required.

Let's address the general "GET then UPDATE" pattern first. I worried about this one until I realized that, in most applications I've worked on, you don't usually do a "blind" UPDATE or DELETE. A customer web application fetches a row (and related rows), a pair of eyes inspects the row to ensure this IS indeed the right row, and then presses the gadget that causes an UPDATE/DELETE. So "GET then UPDATE" is an integral part of most applications anyway. OK for now. If UPDATE/DELETE of a row affects related rows, this can be accomplished with cascading UPDATE/DELETE in the database.

But how about multiple, searched, updates without inspecting/fetching ALL the rows involved? Neither LINQ to SQL or EF has a straighforward way to deal with this. AlexJ wrote an excellent 4-part blog series (start here) about rolling your own searched update in EF with an underlying SQL Server using .NET extension methods getting the SQL query text and string handling to turn it into an Update, but its not necessarily what I'd call straightforward. Maybe, wrapped in a library to encapsulate the details... It also looks SQL Server-dependent, and I thought EF wasn't supposed to be. So you'd need to replicate this for each provider.

LINQ to SQL contains the ultimate fallback method for this case. DataContext.ExecuteCommand() lets you execute any SQL command, including parameters. EF doesn't have the equivalent because (remember) your data store is an object model over a conceptual data source, not the data source itself.

I think this is one place (searched UPDATE/DELETE) that I'd suggest/mandate using stored procedures. The blind searched operation is accomplished in a single database roundtrip, and you can even (in SQL Server at least) use the OUTPUT clause in SQL DML to obtain information in rowset form as to exactly what got deleted. Since this is a database-specific operation, stored procedure sounds like a good workaround for this problem.

MHO: With stored procedures as needed and the realization that most apps use "GET then UPDATE" anyway, I think I'll dismiss this worry.

Categories:
SQL Server 2008

Recently I had the occasion to load .NET 3.5 on to a machine that had an existing instance of SQL Server 2005. .NET 3.5 does not work by versioning the "main .NET assemblies" (e.g. there is no "version 3.5" of mscorlib.dll, System.dll) but by replacing the 2.0 versions of them. You can observe this by inspecting the 4-part version number. For example System.dll (in \windows\Microsoft.NET, Framework\2.0.50727 as well as in the GAC) changed from version 2.0.50727.42 to 2.0.50727.1433 when I installed .NET 3.5. As an aside, when I installed Vista SP1 recently (I think this is what triggered it) System.dll went to version 2.0.50727.1434.

SQLCLR (using SQL Server as a .NET runtime host) will load a series of "approved" assemblies by directly using the fusion APIs. However there are less than 20 approved assemblies for SQL Server 2005 and sometimes people will write SQLCLR code using assemblies that aren't of the approved list. The two most common cases I've seen include using System.DirectoryServices.dll or some of the WCF client assemblies. Neither is on the approved list.

Although using system assemblies outside the approved list is discouraged, it IS possible. The resulting (user) assembly must be cataloged as UNSAFE and the CREATE ASSEMBLY process will catalog not only the original user assembly and referenced "system" assembly but, because of the way assembly dependancies work, quite a few (I've seen up to 20) dependent system assemblies. You can see the user assembly and all dependent system assemblies in the SQL Server metadata table "sys.assemblies". You also receive warnings that "assembly xyz has not been tested in this environment".

I've always said that if you use system assemblies that are not in the "approved" list, you now shift the responsibility of managing those assemblies to *you*, the SQL Server application programmer and/or DBA.

So what does this have to do with .NET 3.5 and SQL Server? It's a little known fact that when SQL Server loads a "user" assembly, it will check to see if a similar assembly (ie assembly with the same strong name and version) exists in the GAC (.NET global assembly cache). If the user assembly in SQL Server and the user assembly version in the GAC have the same .NET version number but a different MVID, SQL Server will refuse to load the assembly (from SQL Server's system catalog) and throw an exception.

As pre-requisite information, all of the "main" system assemblies (like System.dll) have a GAC version number of 2.0.0.0 even though the actual version number may be 2.0.50727.42. Or 2.0.50727.1433. You can see this by inspecting the GAC. And...an MVID is a .NET module version identifier. This is different from the version number, even non-strong named assemblies have MVIDs. The module version identifier is a GUID that changes *each time* an assembly is reassembled.

You can prove that the GAC/MVID check occurs by cataloging a user assembly to SQL Server and to the GAC, then recompiling/changing the assembly in the GAC (only MVID, not version number) but not the version in SQL Server. The resulting error after the the GAC'd version is changed is:

Msg 10314, Level 16, State 11, Line 1
An error occurred in the Microsoft .NET Framework while trying to load assembly id 65536. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE.... (rest of message elided).

But again, what's this got to do with .NET 3.5? I cataloged two assemblies to SQL Server 2005 prior to installer .NET 3.5. One assembly simply did mathematics (catalog as SAFE), one used an assembly that wasn't from the "approved" list, System.Remoting.dll (catalog as UNSAFE, it brings in about 15 dependent system assemblies that are not on the approved list). User-defined functions as these assemblies were also cataloged and worked fine.

Now install .NET 3.5. This changes the MVIDs (but not the version) of some of the System GAC'd assemblies. The SAFE assemblies that refer only to system assemblies on the "approved list are unaffected". My mathematics assembly/function still worked great. My UseRemoting assembly/function WAS affected. The resulting error is:

Msg 6522, Level 16, State 1, Procedure doIt, Line 0
A .NET Framework error occurred during execution of user-defined routine or aggregate "doIt":
System.IO.FileLoadException: Could not load file or assembly 'System.Runtime.Remoting, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. Assembly in host store has a different signature than assembly in GAC.

(Exception from HRESULT: 0x80131050)
System.IO.FileLoadException:
   at UseRemoting.Class1.DoIt()

So what's the moral of the story? If you stick to the assemblies on the approved list you are fine. Fusion loader will get the latest version for you, no changes are needed, although you want to test your user assembly again in the new environment. But by using assemblies not on the approved list YOU take responsibility for the assembly. After installing a new version on .NET that changes system assemblies that you use, don't forget to DROP your user assembly (this should also drop all the old dependent system assemblies), recompile and retest your user assembly (always a good practice) and then run CREATE ASSEMBLY again. All should be well, because your "unapproved" assemblies are the same version the GAC uses.

As an aside, but an advance warning, SQL Server 2005 does NOT need .NET 3.5. But SQL Server 2008 DOES use .NET 3.5 and installs it as part of SQL Server 2008 install. See previous paragraph.

Hope this helps.

Categories:
SQL Server 2008

It's good to see Ed Katibah open up a blog on SQL Server spatial data. Ed is sort of the "dad" of spatial data in SQL Server; he's got a ton of experience and history in this space. Be sure to catch his posting on what's new in spatial for CTP6.  I've been trying out the new functions and the more I use the spatial types and functions, the more I understand the reasoning behind the dual Geometry - Geography types in SQL Server.

Issac's back at blogging too about spatial indexes, really makes for good reading.

Subscribed!

Categories:
SQL Server 2008

I've been working with Filestream storage in SQL Server 2008 since it appeared in CTP5. The way I've always set it up is to use sp_filestream_configure. During the CTP6 setup process, I noticed you could now configure Filestream as part of setup. Because I knew how to use sp_filestream_configure I skipped that part of setup. And everything just worked as expected when I used it. A friend of mine, not wanting to miss anything, configured Filestream as part of setup. When he tried to use it, the following "Catch 22-like error messages occurred":

EXEC sp_filestream_configure @enable_level = 3;

"The FILESTREAM feature is already configured to the specified level. No change has been made."

CREATE DATABASE ... with a filegroup for FILESTREAM

Msg 5591, Level 16, State 1, Line 1
FILESTREAM feature is disabled.

Huh? After checking the short list of usual suspects (e.g. NTFS file system, running SSMS as admin under Vista) we were both puzzled. Especially because it "worked for me" as it always had. There's nothing more frustrating to hear during problem resolution then "Well, I'm not sure what you did wrong because it works for me". Grrrr...

There is finally a resolution thanks to the storage engine team (confrmed and expounded on by Joanna's blog entry on March 3), and it turns out that the way filestream is configured changed in CTP6 for some very good reasons. You see, sp_filestream_configure sets up Filestream in the OS (requires OS privs) and SQL Server (requires SQL Server privs). Unless you're running sp_filestream_configure as an OS Admin who's also a SQL Server sysadmin, it may not work completely. But sp_filestream_configure is "IN" for one last CTP (CTP6). Gone before RTM.

So filestream configuration was broken into two parts:
1. Configuring filestream at an OS level is moved into setup. Or use SQL Server Configuration Manager. Or WMI scripting (with SMO/WMI) is you like scripting (I do). In the SQL Server Configuration Manager GUI, you configure it by select SQL Server Services (left pane), right-click on your SQL Server service instance and choose "properties" and use the FILESTREAM tab. Note that the equivalent FILESTREAM tab does NOT appear when using Services Control Panel applet. I hope you switched (as you should have) to SQL Server Configuration Manager back in SQL Server 2005 days.

2. Configuring filestream at a SQL Server instance level requires EXEC sp_configure 'filestream access level', '2'. Note that the access level choices in SQL Server are 0,1,2.

3. BOTH configurations steps/setting (OS and SQL Server instance) must be compatible for filestream to work. If, for example, filestream is enabled at the OS level but disabled in SQL Server, you'll get message 5591. See above.

So how did my friend get the "Catch 22 errors"? In CTP6 (but not in future), setup performed step 1, but not step 2. In future setup will do both steps. In CTP6, sp_filestream_configure (to be removed before RTM) will do both steps. But before sp_filestream_configure does anything, it CHECKS to see if either configuration job is already done. Thus the "filestream feature is already configured" message.

Got it? So get used to configuring filestream in both places using SQL Server Configuration Manager AND sp_configure. That's the way of the future. The fact that both exist in one CTP can be frustrating but is understandable. Reminds me of a similar shift when configuring HTTP endpoints (which also requires both OS and SQL configuration) during the SQL Server 2005 CTPs.

Hope this post saves someone a few "grrrr... moments" when using this new useful feature.

Quite a bit of activity last week in the Visual Studio and ORM spaces.

Visual Studio released a CTP of Visual Studio 2008 support for SQL Server 2008 CTP6. It allows you to connect to CTP6 with Server Explorer, which enables quite a few other features to work. check with Overview document on the download page. SQLCLR projects and T-SQL and SQLCLR debugging work too.

Curiously the Overview document says "using the LINQ to SQL Designer with SQL Server 2008 databases" is specifically not supported. But I've used SQL Server 2008 WITH NO new data types (e.g. DATE, TIME, etc) and it does seems to work. EF beta3 designer too. Maybe they are referring to lack of support for the new data types. Speaking of which...

Faisal Mohamood announced on the ADO.NET team blog that they ARE working on support for the four new date and time types in SQL Server 2008. Hooray. Now let's keep our fingers crossed for SQL Server UDT support.

Julie Lerman wants these types in EF too, but I'm not sure I agree. DATE and TIME would be nice because they're ANSI standard types. DATETIME2 maps favorably to DateTime in EF, actually better coverage of the value space than DATETIME. Maybe what she really means is that she wants the SSDL and MDL and the designer to be aware of them and generate the closest possible mapping. Even if that's "string".

The distinction is that the ADO.NET Entity Framework is meant to be database neutral, not SQL Server specific, and supports a discrete subset of database data types not a 1-1 mapping of EF types to SQL Server types. LINQ to SQL, OTOH, is SQL Server-specific and should support a 1-1 mapping. At this point I'm really thinking they should have called it "LINQ to SQL Server". Not that the *concept* can't be extended to other databases (LINQ to DB2, LINQ to Oracle, etc) but these databases will have to implement their own. The fact is that LINQ to SQLCE does exist and is separate from LINQ to SQL (Server). And other vendors may follow with their own, database-dependent implementation of a light LINQ layer. But speaking of SQLCE...

The SQLCEBLOG announced the beta of SQLCE 3.5 SP1 with ADO.NET Entity Framework Beta 3. There was support for EF in SQLCE a while ago, but it was postponed in the 3.5 release which sports LINQ support. Now it looks like Entity Framework support is back. Hooray. BUT, although a common programming model between SQLCE and a subset of SQL Server functionality is a great idea, IMHO the best and most lightweight programming model for SQLCE uses SqlResultset. Still the winner.

Oh yes, BTW, SQL Sevrer 2008 CTP6 was released last week. But that's been posted on EVERY SQL Server related blog in the world. Congrats for another CTP folks. Sparse columns, filtered indexes, and filtered statistics are in as well as quite a few other features. 

Sometime before SQL Server 2008 RTM, the libraries for LINQ and LINQ to XML (that is System.Core.dll and System.Xml.Linq.dll) but NOT LINQ to SQL will be added to the SQLCLR "approved" assembly list. The approved assembly list is the list of .NET Framework libraries that have been tested in a SQL Server-hosted environment, annotated their code with HostProtectionAttributes where needed, and can be used in user assemblies that can be cataloged with "PERMISSION_SET = SAFE". That means you can use 'em in SQLCLR user-defined functions, procs, and other SQLCLR database objects.

Being curious, I asked "why LINQ and LINQ to XML but not LINQ to SQL"? The answers I got were "LINQ to SQL not tested in this environment" and "not sure there should be yet another way to access the database with .NET". Fair enough. Although, its never seems to be a problem to provide yet another way to mix SQL Server and XML. The number of different ways now exceeds the number of fingers on one hand.

OTOH, SMO doesn't appear to be added to the approved assembly list in SQL Server 2008. Nor has WCF. Oh well.

Enjoy...

I often hear those who identify themselves as "database programmers" (sometimes I think these may be folks who program only inside-database code), say: "I'd give learning LINQ and/or LINQ to SQL/Entities or Entity SQL (eSQL) a go in my sparse time, but I don't have a good book or a tool.

I haven't read a book on either of these yet, spending most of my time in BOL. BOL is a good start (but not comprehensive) for LINQ and LINQ to SQL. eSQL docs are pretty sparse, but that's because its not a shipping product yet. Maybe things will improve. (Note: if anyone wants to send me a copy of their book to read in my spare time, I'll give it a go ;-).

Following LINQ is more difficult than EF (modulo LINQ to Entities) because there's many implementations/dialects. There are operators that LINQ (over objects) can do, but LINQ to SQL or LINQ to Entities cannot.

On to tools:

For LINQ, I like LINQPad by Joseph Albahari. Simple to set up, although the queries are not *exactly* the same as you'd write in a program. And it doesn't support LINQ to Entities yet (or LINQ to DataSet for that matter). But, all in all, an excellent tool.

For eSQL, there's no comparison to Zlatko Michailov's eSqlBlast. This is not only an eSQL practice query processor, but also a visualizer for the results you'd get back. You just type in the eSQL. Even comes with source code. Latest version is on CodePlex. This is straight eSQL, no LINQ to Entities support here either.

Neither one of these has Intellisence or statement completion, I believe.

Enjoy...

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

This entry is a continuation of the previous posting on how to use SQL Server 2008 spatial data and Virtual Earth. This entry discusses your web service and SQL Server code choices.

The point of the web service is to translate one of the output formats of SQL Server 2008 spatial to a format this Virtual Earth can use. The spatial data types can be output in:
1. Well-known text format
2. "Native" format - that is, return a Geometry/Geography type to a .NET client as the SqlGeometry/SqlGeography native .NET types
3. GML - Geography markup language

Remember that Virtual Earth needs:
1. GEORSS or
2. KML
3. Custom XML-based format (and custom Javascript code to process it)
4. Javascript strings (to pass to eval())

There is not a one-to-one correspondence between any of these methods, so there is going to be a need for a transformation somewhere. In addition, a SQL Server Geometry or Geography type can contain a collection of points, linestrings, or polygons. There may also need to be a one-to-many translation. This is facilitated by methods STNumGeometries/STGeometryN and STNumPoints/STPointN.

Finally, there could be a pushpin (point) associated with the center of a polygon and/or text associated with the pushpin. Where the text comes from will be application-dependent.

Because there are many ways to process XML in SQL Server and in web services, that's lots of choices. Some that come to mind are:
1. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to a custom XML format
   Javascript code in the web page processes the custom format.
2. Using SQL Server's XQuery to transform GML into GEORSS or KML
   This is returned to the web service as a SqlXml data type.
   Consumed directly as a layer in Virtual Earth.
3. Using an XSLT transform (or LINQ to XML) to transform GML to GEORSS or KML. This can occur in the web service or even a SQLCLR procedure, although this type of processing should really be accomplished in the web service.
4. Using SELECT ... FOR XML PATH to construct GEORSS or KML directly.
   Similar processing to choice #2 above.
5. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to Javascript strings.
   Web page simply calls eval()

Hopefully this posting will elucidate the choices you have for this rendering. Depending on which method you choose, there could be quite a few "moving parts" (a la Rube Goldberg) in the solution.
- Web pages with custom Javascript code
- Web service to supply the data
- HTTP Handler to allow Virtual Earth to use remote GEORSS/KML
- Data access code in SQL Server
- Transformation code in web service or SQL Server

So.....what's your favorite method for using SQL Server 2008 spatial data with Virtual Earth?

Last week at the SQL Server 2008 Jumpstart I showed an example of using SQL Server 2008 spatial data types with Virtual Earth. I showed a single coding style. There really is a plethora of coding styles to using these together, and I'd like to describe some of the most common ones. I'd like to do this without much code for now, because an end-to-end description sometimes results in the graphic aspects (which other controls to use, how to build Ajax web pages, etc) seem to get in the way.

I'm not even going to mention using Virtual Earth to serve tiles directly, but only the Virtual Earth web control. Start with the Virtual Earth Interactive SDK for coding information.

In general, what we're after is:
   Page contains Virtual Earth control
   Page makes a request to a Web Service for data (usually Ajax)
   Web Service get and processes data from database, returns data
   Page uses Virtual Earth calls to present the data

The first two are straightforward, although the Virtual Earth control is available only online via a URL. This makes testing when you don't have internet access virtually impossible. The Ajax request (that is, call a URL with an XML payload using a browser-specific mechanism, wait asynchronously for the response) normally doesn't use SOAP-based Web Services, because you'd have to add and remove the SOAP headers. It's usually XML-in, XML-out, sans SOAP headers.

The Virtual Earth calls depend on what version of Virtual Earth control you're using. I'll show this using version 6.
1. You can render using shapes directly
2. You can import GEORSS into a shape layer
3. You can import KML into a shape layer

Virtual Earth supports visualizing shapes directly (#1 above) by using:
1. AddPushpin - to visualize a point or the centroid of a polygon
2. AddPolyline - that's Linestring in SQL Server 2008
3. AddPolygon - for polygons

For shape layers (#2 and #3 above), the code looks something like this:

var l = new VEShapeLayer();           
var veLayerSpec = new VEShapeSourceSpecification(type, local.xml, l);

map.ImportShapeLayerData(veLayerSpec, onFeedLoad);

The "type" in the VEShapeSourceSpecification constructor can either be GEORSS or ImportXML (that's KML). You can also have a Live Search Maps Collection but likely that one won't come from SQL Server.  Bear in mind that the VEShapeSourceSpecification constructor needs a *local* XML file URL. If you don't want to use a local XML file, you can build an ASP.NET HTTPHandler that uses redirection to pretend a remote XML file is local (see Mike McDougall's article and code to accomplish this).

One last twist that I've seen is that you can build your Javascript code directly in each page OR make the Web Service program reformat your data into Javascript calls directly. Once you're back at the page (the Web Service passes back a string of Javascript) you simply call Javascript's "eval" function. This makes the web page client-side code easy, but the Web Service code lots more complex, as you're generating code, not points, lines, and polygons.

Web Service choices follow next....

SQL Server 2005 introduced the concept of endpoints. Every connection entry point into SQL Server is abstracted as a endpoint. You can see all the endpoints on your SQL Server instance by using the sys.endpoints metadata view. In addition to those you'd expect (for TCP/IP, Named Pipes, Shared Memory) there are also endpoints defined for the dedicated admin connection, Service Broker, Database Mirroring...and HTTP.

The endpoint concept is still with us, but HTTP endpoints, those endpoints that allow you to expose SQL Server procs and functions directly as HTTP Web Services will be deprecated in SQL Server 2008. Upon creating an HTTP endpoint (aka SOAP endpoint) in SQL Server 2008, you'll be warned that "this feature will be removed in a future version. You should not use these in new development work." This was announced at the SQL Server 2008 Jumpstart last week.

So what replaces this functionality (and how many folks actually used it in SQL Server 2005). The likely successor seems to be ADO.NET Data Services, a mechanism to produce RESTful services over any type of data, using Entity Data Model or LINQ to SQL. The model you expose through data services doesn't really have to be a database at all.

And for those of you that liked the idea of exposing your stored procedures as services, ADO.NET Data Services supports "Service Operations", which can be parameterized and contain any type of logic you wish. Service Operations aren't limited to stored procedures, but could be used to encapsulate them.

Categories:
SQL Server 2008

Wow. It's already Tuesday and I'm still recovering from last week. I've been laying low for the last six months or so, writing a few (OK, quite a few) blog entries on SQL Server 2008 topics. But I'd really been working on my next SQL Server course (on 2008, of course), that premiered last week as part of a program known as SQL Server 2008 Jumpstart.

Simon Sabin (the newest SQLskills associate) and I presented to a good-sized crowd of Microsofties and trainers from everywhere. We did the developer track; there were also DBA and BI tracks by my well-known SQLskills associates. You'll see this course in a lot of different venues in the next few months, but it won't always be us that's doing the presenting. Just so many places we can be at once...

We covered everything from new T-SQL features to Spatial Data and Indexes and everything in-between. In addition to the SQL Server features, there was an entire day devoted to client APIs and dev tools; not only ODBC (yes, I did an ODBC demo in C++) and ADO.NET, but also the Entity Framework, ADO.NET Data Services, Occasionally Connected Apps.

So, for the next few weeks, I'm going to be writing down my thoughts on questions (some of them rather pointed) that arose about the topics we presented. Some of the post will be facts, answers to questions...but they'll be some opinion pieces as well (marked with MHO, that's my humble opinion).

Categories:
SQL Server 2008

In a previous blog entry a while ago, I wrote about writing a single SQL statement that did a SQL MERGE operation, used the OUTPUT clause to put out a rowset and directed the rowset into an INSERT statement. This happens in one statement without the need of explicitly defining a temporary table and using multiple SQL statements.

Turns out that this feature has a name: Composable DML. I've also heard it called (in SQL Server Books Online) "DML table source".

I've also heard the term composable queries, both in references to both Entity Framework functions and LINQ to SQL. They even have an "IsComposable" attribute in the Function definition) In this context, it means that the output of a function that produces a rowset (in this case a SQL Server table-valued function) or code that produces a rowset can be combined with further queries that do more filtering or projection, or even retrieve related rows. When the queries are submitted to the database, they are composed into a single SQL statement.

The point is pretty similar, reduce the resources required to accomplish a set of operations by reducing the number of statements or database round-trips needed.

More about Service Broker priority in SQL Server 2008.

The books online states, when setting a initiator/target priority as local service name/remote service name that the priority affects:

1. Sends from the initiator queue
2. Receives from the initiator queue
3. Getting the next conversation group from the initiator queue.

And specifies the mirror image priority for target/initiator as local service name/remote service name affects:

1. Receives from the target queue
2. Sends from the target service
3. Getting the next conversation group from the target queue

So, in my previous example, where

The initiator/target as local service name/remote service name is priority 7
The target/initiator as local service name/remote service name is priority 6

Let's send a message from the initiator that "sticks" in sys.transmission_queue. Suppose that either the network is slow or we don't have the appropriate setup. But the message is floundering in the transmission queue. Already SENT, waiting to be delivered.

"select * from sys.transmission_queue;" shows the message BEING SENT (and transmitted) at priority 7. And, when the message is finally DELIVERED, the priority in the receiveq "select * from receiveq" is priority 6.

Normally, you'd likely make both priorities in the initiator/target pair the same to avoid confusion, but you can make them different for greater granularity. And it DOES help to illustrate where and when priority is used.

Conversation priority is a new feature with SQL Server 2008. In a previous blog post, I talked about how to simply set one up. But you'd usually not want to set up a priority for all messages. So, lets set up a simple service and then define a "Premier Customer" priority. Before trying this you need to make sure the database is set to honor broker priority.

ALTER DATABASE pubs SET HONOR_BROKER_PRIORITY ON;

Suppose I had a simple (one database) service pair. A service called "sender" (note lower case) using senderq is the initiator and a service called "receiver" using receiverq is the target. They use the [default] contract. So we can set up a "Premier Customer" by contract.

CREATE CONTRACT premier ([DEFAULT] SENT BY ANY);
GO

Now, let's ensure that the target can also use the premier contract in addition to the DEFAULT contract.

ALTER SERVICE receiver ON QUEUE receiveq (ADD CONTRACT premier);

We need to set up the priority.

CREATE BROKER PRIORITY PremierCustomer
    FOR CONVERSATION
    SET (CONTRACT_NAME = premier,
         LOCAL_SERVICE_NAME = sender,
         REMOTE_SERVICE_NAME = 'receiver',
         PRIORITY_LEVEL = 7);
GO

The interesting thing is, this priority ONLY covers messages sent and received (and get conversation group) by the sender/senderq, IT DOESN'T cover the receives in the receiver/receiveq queue. So we'll set up a mirror image priority, and make the priority level still higher than the default (5) but gratuitously different than the other priority

CREATE BROKER PRIORITY PremierCustomer2
    FOR CONVERSATION
    SET (CONTRACT_NAME = premier,
         LOCAL_SERVICE_NAME = receiver,   -- local service is receiver
         REMOTE_SERVICE_NAME = 'sender', -- remote service is sender
         PRIORITY_LEVEL = 6);                   -- note level 6, not 7
GO

Now, begin a conversation using sender and receiver and the premier contract and send a message. Note that when you issue "SELECT * FROM sys.conversation_endpoints;" the sender (initiator's) endpoint is priority 7, the receiver's endpoint is priority 6. And, looking at the message in the receiveq its set to priority 6.

If you'd send a return message from the receiver (target) back to the sender (initiator), the return message in the senderq would be priority 7.

So when setting up priorities, if you want all messages on both sides to be affected, don't forget to set up a priority for EACH side. Here's the entire demo.

BrokerPriorityByContractDemo.zip (1.57 KB)

One last SQL syntax post for the evening...

We've all heard about SQL Server 2008 row constructors. They allow syntax like this to work:

CREATE TABLE name_table (name varchar(20), age int);
go
INSERT INTO name_table VALUES ('Bob', 54), ('Mary', 30), ('Sam', 15), ('Buddy', 9);
go

But how about using them as a table source:

SELECT n.name, n.age, tab.species
FROM name_table n
JOIN
(
  VALUES ('Bob', 'person'), ('Mary', 'person'), ('Sam', 'cat'), ('Buddy', 'cat')
) tab (name, species)
ON n.name = tab.name;

You specify a table alias and name the columns, and its just another (synthesized on the fly) table.

To round out the new plan guide-related features in SQL Server 2008, there is a new way to express a table hint that increases the plan guide's reach.

The sp_create_plan_guide procedure requires a hint as the last parameter. This can be in a form OPTION (hint), just an XML query plan (in SQL Server 2008 "OPTION (USE PLAN)" isn't required), or NULL. Specifying NULL can be used to "subtract" a hint from an existing query where the hint is hardcoded in source code you don't have the ability to change.

This requirement limited plan guides to hints that could be used in an OPTION clause, and table hints didn't qualify because they are inline. In SQL Server 2008 this limitation is removed. You can specify a table hint by using the syntax OPTION (TABLE HINT (table_alias, hint)). This makes table hints usable in plan guides. You can even specify multiple hints on a per-table basis.

OPTION (TABLE HINT (table_alias1, hint1), (table_alias2, hint2)...)

BOL indicates in a few places that this syntax exists specifically to make table hints accessible to plan guides. Cool.

Another use for an analogous syntax that's not in SQL Server 2008 would be to be able to specific per-table JOIN hints the same way. You can currently specify only one JOIN hint in the option clause. An analogous way to specify multiple, per-join JOIN hints in an OPTION clause would be nice. But, if you're controlling the plan that closely, perhaps its best to resort to plan forcing.

BTW, manditory disclaimer: Hints are (usually) evil and should be used only as a last resort. Using a USE PLAN hint seems, at least to me, similar to writing custom navigation code for each query in IMS/DB, as I did in the '80s. You're throwing the query algebrizer/optimizer away...or at least pinning its hands.

SQL Server 2005 introduced the OUTPUT clause in DML statements. With the OUTPUT clause, you can output column values to a table variable, a table, or return these values to the user. You can even use multiple OUTPUT clauses to push the values into both a table variable and a table from the same statement.

In SQL Server 2008 there is an additional option. You can use your OUTPUT column values directly in an INSERT-SELECT statement. Here's what it would look like, using MERGE with an OUTPUT clause (and an example from one of the early webcasts):

CREATE TABLE Stock (Stock VARCHAR(10) PRIMARY KEY, Qty INT CHECK (Qty > 0));
CREATE TABLE Trades (Stock VARCHAR(10) PRIMARY KEY, Delta INT CHECK (Delta <> 0));
CREATE TABLE AuditChanges (Action varchar(6), Stock VARCHAR(6), Qty INT);
GO

INSERT Stock VALUES('MSFT', 10), ('BOEING', 5);
INSERT Trades VALUES('MSFT', 5), ('BOEING', -5), ('GE', 3);
GO

INSERT INTO AuditChanges
SELECT * FROM
(
MERGE Stock S
  USING Trades T
  ON S.Stock = T.Stock
  WHEN MATCHED AND (Qty + T.Delta = 0) THEN
    DELETE
  WHEN MATCHED THEN
    UPDATE SET Qty += T.Delta
  WHEN NOT MATCHED THEN
    INSERT VALUES(Stock, T.Delta)
  OUTPUT $action, T.Stock, inserted.Qty
) tab (action, stock, qty);
GO

Notice how the OUTPUT clause requires a table alias (in this case "tab") and needs to name the columns returned from OUTPUT.

One of the nifty new SQL Server 2005 features was called "plan forcing". You could acheive plan forcing by using the USE PLAN query hint but this made such a hideous query that the better way to do this is to use a plan_guide. A plan_guide is a named database object (like table or view) that associates a query hint with a particular query.

In SQL Server 2008 there have been some enhancements made for plan_guides and plan forcing. The biggest change is that you can create a plan from the plan cache directly, rather than using the rather long set of steps that you needed in SQL Server 2005. You only need a few lines of boilerplate code, setting up and invoking sp_create_plan_guide_from_cache, and the plan guide is automatically created. See SQL Server 2008 Nov 2007 for details and copious examples. The concept of creating a plan guide from a query plan already in cache is known as "plan freezing".

You can also validate that a plan guide will still be used (after dropping an index, for example) by using the new sys.fn_validate_plan_guide function. In addition, its easy to see if a Plan Guide is being used or not with the profiler trace events "Plan_guide_successful" and "Plan_guide_unsuccessful", located under the "Performance" category. These can also be monitored over time with two new perfmon counters "Guided Plan Executions/sec" and, what has to be the most humorous perfmon counter "Misguided Plan Executions/sec".

One other plan guide enhancement that doesn't seem to be mentioned in BOL is that you can use plan guides for INSERT, UPDATE, DELETE, or even MERGE statements in SQL Server 2008. This feature works in the SQL Server 2008 Nov CTP (CTP5).

As always the standard disclaimer is "query hints are (usually) evil" but if you're desperate enough to need plan forcing or plan freezing, its good to know that its there.

One reason why I've seen people use plan freezing is as a hedge against plan instability that can result from what's known as "parameter sniffing". Parameter sniffing is usually a good thing; the query processor sniffs the parameter values the first time you execute a parameterized stored procedure and uses the statistics based on those parameters accordingly. However, if your "first time execution" was based on uncommon parameter values, you're now stuck with a suboptimal plan. Plan freezing can be use to freeze a plan that's most optimal and "override" parameter sniffing. You just have to catch such a plan in the cache.

The drawback is, of course, that you are stuck with the same plan even if the statistics change. I'm sometimes accused of liking plan guides and plan forcing too much, but perhaps it's because I'd oftentimes get a call when parameter sniffing (or a service pack: yes, it does happen every now and again) changed to a query plan that is suboptimal. For those trouble calls (ie, "my query just starting running 4 times slower today and the plan is different from yesterday") plan_guides/forcing let you change the plan without changing the underlying code. When the fix arrives, you just disable the plan guide to let the optimizer go on its merry way.

Since its appearance in SQL Serve 2008 CTP5, folks have been wondering about the reason for SQL Server's new Change Tracking feature, and how it differs from Change Data Capture (CDC). The CTP5 BOL has a nice writeup about how it differs from CDC, so I'd encourage you to start there. But about "why it's there...".

The main reason for change tracking, as far as I can see, is to be used with ADO.NET Sync Services. ADO.NET Sync Services is part of the Microsoft Sync Framework and is a way to do (I'm trying to summarize here) "client directed programmable replication". For database folks ADO.NET Sync Services can be thought of as providing synchronization between SQL Server Compact Edition and any relational datastore, but the model is extensible. ADO.NET Sync Services 1.0 shipped with SQL Server Compact Edition 3.5 in Visual Studio 2008 (there may be other ship vehicles I'm unaware of) and provides 2-tier, 3-tier, or service-oriented synchronization. It provides hub-and-spoke synchronization. ADO.NET Sync Services version 2.0 will add (the CTP is out now) peer-to-peer synchronization. Microsoft Sync Framework (CTP 1 is out now) includes sync support for data stores that aren't necessarily databases. The best place to go to find some highly informative examples that illustrate the synchronization patterns supported by the model is "The Synchronizer's" (Rafik's) blog.

What's this all have to do with SQL Server 2008 change tracking? Although ADO.NET Sync Services 1.0 is a good start (and there's GUI-based designers in VS2008), it usually means adding timestamp or datetime columns and tombstone tables (tables that track the primary key of deleted rows) to existing database tables and triggers to populate the information sync services needs. That's fairly intrusive, especially with packed applications. SQL Server 2008 change tracking takes care of all of this for you.

You turn on change tracking on a database with ALTER DATABASE and on individual tables with ALTER TABLE. Change tracking does the rest. You can access the information you need for Sync Services applications using the CHANGETABLE table-valed function and a few related functions. You usually want to turn on ALLOW_SNAPSHOT_ISOLATION in the database as well, because change tracking works by tracking when a transaction is committed, rather than when its started.

Change tracking provides the information ADO.NET Sync Services needs to answer request such as "what rows have changed since my particular client (change originator is tracked by change tracking as well) last synchronized" and "have I synchronized with the main database so long ago (change tracking info has a DBA-specified retention) that I need to sync the entire table because the incremental info I need is no longer available".

Currently the VS2008 designers (they're accessed by Added a "Local Database Cache" item to a programming project) don't have an option to indicate "just use SQL Server 2008 and I'm using change tracking", so you have to code the synchronization procedures to use change tracking with Sync Services by hand. Perhaps such a feature is in the works for ADO.NET Sync Services 2.0.

And how about SQL Server Compact Edition (currently the only "ClientProvider" that ship with Sync Services 1.0). Well the SQL Server Compact Edition, Sync Services "just works". No special setup is required, although you may (I haven't determined this) need SQL Server Compact Edition 3.5. This version of SQLCE ships with VS2008. One last thing...ADO.NET Sync Services 1.0 isn't available for compact devices yet, so when you sync with SQL Server Compact Edition, it must be deployed on the desktop.

One last SQLCLR feature I'd forgotton about but was quite highly publicized. This is extension of SQLCLR UDT and UDAgg maximum size from 8000 bytes to 2gb. You just the MaxByteSize of -1 in the appropriate attribute, like this:

[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize=-1,IsNullIfEmpty=true)]

The serialization format is going to have to be UserDefined, the limt for Format.Native is still 8000 bytes.

Just catalog the UDT or UDAgg and use it as you would any other TYPE/AGGREGATE. I crufted up a proof-of-concept one pretty easily, and the new spatial data types (GEOGRAPHY and GEOMETRY) are also large (system) UDTs, so you know this works as of CTP5.

Now you can have your "infinite" UDT that holds an array or "infinite" string concatenation UDAgg. Enjoy.

Categories:
SQL Server 2008 | SQLCLR

Another cool SQLCLR feature in SQL Server 2008 is the ability to define a table-valued function with an ORDER clause. If you have intimate knowledge that the function always returns rows in sorted order, you declare your SQLCLR UDF with an ORDER clause in the CREATE FUNCTION DDL statement. Let's experiment with this using the cheap and easy Fibonacci sequence function from Dan Sullivan and my SQL Server 2005 Developer's Guide.

create assembly orderedtvf from 'C:\temp\OrderedTVF.dll'
go

-- no order clause
create function FibonacciUnOrdered (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
as external name orderedtvf.FData.Fibonacci
go

The Fibonacci sequences generated are always in ascending order because that's how the function is implemented. In fact, they are in order by both the "next" and "prev" column. Because there is no limit to the number of method signatures we can have over the same SQLCLR UDF, we use the same SQLCLR method, just changing the DDL statement and function name.

create function FibonacciByNext (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (next asc) -- this is new
as external name orderedtvf.FData.Fibonacci
go

create function FibonacciByPrev (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (prev asc) -- same concept, different ordering column
as external name orderedtvf.FData.Fibonacci
go

Now let's do some testing. The query plan iterators and plan cost are shown as comments.

-- cost: 0.0279081
-- TVF -> Sort -> Select
select * from dbo.FibonacciUnordered(3,4,5)
order by next

With an ordered TVF, there is no SORT iterator, but there are extra query plan steps to operate on the ordered set. AND... the query cost is over 10x lower.

-- cost: 0.0023802
-- TVF -> Segment -> Sequence Project -> Assert -> Select
select * from dbo.FibonacciByNext(3,4,5)
order by next

-- cost: 0.0023802
-- TVF -> Segment -> Sequence Project -> Assert -> Select
select * from dbo.FibonacciByPrev(3,4,5)
order by prev

Note that it IS best to have a different TVF name for each sort order, and "no order", if you plan to use different ORDER BY clauses. This one has a Sort AND its cost is greater than the function that's declared UnOrdered

-- Different order
-- cost: 0.0292881 (more than Unordered = 0.0279081)
-- TVF -> Segment -> Sequence Project -> Assert -> Sort -> Select
select * from dbo.FibonacciByNext(3,4,5)
order by prev

In addition, the query plan guarentees that you don't lie in your order clause. Here's proof.

create function FibonacciWrong (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (prev desc) -- THEY ARE IN ASCENDING ORDER, NOT DESCENDING
as external name orderedtvf.FData.Fibonacci
go

-- Error:
-- The order of the data in the stream does not conform to the ORDER hint
-- specified for the CLR TVF 'dbo.FibonacciWrong'.
-- The order of the  data must match the order specified in the ORDER hint for a CLR TVF.
-- Update the ORDER hint to reflect the order in which the input data is ordered,
-- or update the CLR TVF to match the order specified by the ORDER hint.
select * from dbo.FibonacciWrong(3,4,5)
order by prev

Remember, you're not only saving a SORT iterator in the query plan, you're saving memory too. The SORT iterator requires a memory grant. And because there are no stats for SQL Server to use in these "opaque to SQL" functions, the memory grant for the SORT iterator in FibonacciUnOrdered is 1024K. And, we hope that the rather generous memory grant is enough, else memory is being allocated during query execution.

So, declaring ordered TVFs is worth it.

Categories:
SQL Server 2008 | SQLCLR

One thing that I didn't find in the BOL What's New page is some of the new SQLCLR functionality in SQL Server 2008. The first one that intrigued me is support of multi-input user-defined aggregates. Suppose I wanted to implement Oracle's COVAR_POP aggregate, an analytic function that returns the population covariance between two expressions. The signature is COVAR_POP(expr1, expr2) and I want the signature to stay the same in SQL Server.

All that I need to do this is to use the "template" for a .NET UDAgg struct/class, replacing the Accumulate method that take one parameter with a 2-parameter method, like this:

public void Accumulate([SqlFacet(Precision = 20, Scale = 10)] Nullable<decimal> Value1, 
                               [SqlFacet(Precision = 20, Scale = 10)] Nullable<decimal> Value2)
{
  // code here
}

My CREATE AGGREGATE DDL statement would change a bit:

CREATE ASSEMBLY multiparmagg FROM 'C:\temp\multiparmagg.dll'
go

CREATE AGGREGATE dbo.covar_pop(@expr1 decimal(20,10), @expr2 decimal(20,10))
RETURNS decimal(20,10)
EXTERNAL NAME multiparmagg.CovarPop;
go

To invoke:

create table dbo.test_covar (
  i1 decimal(20,10),
  i2 decimal(20,10)
);
go
-- fill with data, then...
select dbo.covar_pop(i1, i2) from dbo.test_covar;

Happy aggregating.

Categories:
SQL Server 2008 | SQLCLR

With any new product or CTP, I've always gotten fast results by starting with the readme.txt file that comes with the product/CTP. Although its now called ReadmeSQL2008.htm, its still worth reading. This usually gives you answers about install scenarios and last-minute changes.

For example, this readme file answers two of the (so-far) FAQs about CTP5.
  What happened to Surface Area Configuration Utility?
  Can I install SQL Server 2008 CTP on Windows Server 2008? (ie, is it officially supported)

You'll need to RTFR for answers to these.

Right after the readme file, the first thing to do (after you've installed it) is to take a gander at the SQL Server Books Online, in the "What's New" section. Here's a bookmark for those without the facility to look in the TOC. mshelp://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10sq_GetStart/html/6a428023-e3cc-4626-a88a-4c13ccbd7db0.htm.

There's WAY TOO MANY new features in this CTP to describe in a few sentences. And the BOL writers have really had their hands full with this CTP, by the looks of the sheer volume of new material. Enough to keep anyone reading and experiementing for weeks. However...

Every once in a while something slips through the readme and What's new, or isn't finished yet, like the "What's new in SQL Server Installation" section. Or folks think of new ways to use features the BOL writers hadn't thought of. Or want to expound on the repercussions of feature XYZ. Stay tuned for these.

Categories:
SQL Server 2008

SQL Server 2008 adds the concept of priority for conversations. It's setup using special DDL statements, priority cannot be specified on the CREATE DIALOG CONVERSATION or SEND/RECEIVE DML statements. The DDL statements are CREATE/ALTER/DROP BROKER PRIORITY.

To specify a priority, you associate a BROKER PRIORITY object with combinations of the qualifiers LOCAL_SERIVCE_NAME/REMOTE_SERVICE_NAME/CONTRACT and the priority is associated with all messages and conversation endpoints that match that combination. The wildcard 'ALL' (or leaving the qualifier out entirely) is permitted for any or all of the qualifiers, and matching precidence is specifying in BOL under the CREATE BROKER PRIORITY syntax. The set of defined priorities is stored in sys.conversation_priorities metadata view.

In addition to this setup, the database has to be set to use priorities, with the "ALTER DATABASE...SET HONOR_BROKER_PRIORITY ON" DDL statement. The default behavior is not to honor priorities.

Once you define BROKER PRIORITY(s) and set the database to honor them, priority (default is 5) will be set on:
 sys.conversation_endpoints
 each message in the queue 
 sys.transmission_queue messages

Service Broker priorities are a much asked-for feature and its good to see it implemented. Because this is a new feature in CTP5, there are still some rough edges that need fixing. You can't set honor_broker_priority on CREATE or ATTACH database. And using the SMO scripter (Script As/CREATE on the database in Object Explorer) doesn't set honor_broker_priority either.

There are some really nice examples of the syntax and the concept in BOL. I've enclosed a really simple starter script that changes the default priority and illustrates the metadata. Enjoy.

broker_priority.sql (2.29 KB)

SQL Server 2008 adds support for the 'let' clause in FLWOR expressions. The for and let clauses have a similar purpose, to bind content (tuples) to variables.  Either one can begin a FLWOR expression:

declare @x xml = '';
select @x.query('
for $i in (1,2,3)
return $i
');
> returns 1 2 3

declare @x xml = '';
select @x.query('
let $i := (1,2,3)
return $i
');
> returns 1 2 3

The distinction is that let is an assignment clause, in the simple statement using 'let' above, $i refers to the entire sequence (1,2,3). The for clause sets up an iterator. The simple statement above using 'for' loops 3 times and each time through the loop $i refers to a single member of the sequence. So, if I add an 'order by' clause, the results are quite different.

declare @x xml = '';
select @x.query('
for $i in (1,2,3)
order by $i descending
return $i
');
> returns 3 2 1

declare @x xml = '';
select @x.query('
let $i := (1,2,3)
order by $i descending
return $i
');
> error:
> XQuery [query()]: 'order by' requires a singleton (or empty sequence), found operand of type 'xs:integer +'

One limitation on the XQuery let clause is that it does not support constructed elements. So this statement works fine:

declare @x xml = '';
select @x.query('
let $x := 1
return $x
');
> returns 1

but this statement does not:

declare @x xml = '';
select @x.query('
let $x := ( <foo>2</foo>, <bar>2</bar> )
return $x
');
> error:
XQuery [query()]: 'let' is not supported with constructed XML

So Let the use of the let clause begin...no longer do I have to explain what a "FWOR" expression is, hooray. However, nota bene. When 'let' is used inside a loop, it's evaluated each time around the loop:

declare @x xml = '';
select @x.query('
for $i in (1,2,3)
let $j := 42
return ($i, $j)
');
> returns 1 42 2 42 3 42
> $j is evaluated three times

 

The upcoming SQL Server 2008 provider for Powershell was a big hit and sparked quite a bit of discussion when I talked about and demonstrated it during TechEd/Developers last week. At the time I thought that this might be a good discussion topic for database administators, and, after asking around, a room was found, and I'll be re-presenting this session, entitled "Using Windows Powershell with the SQL Server 2008 Provider and SMO" here at TechEd/ITForum tomorrow (Friday) at 13:30-14:45 in room 131.

If you do administrative tasks with Powershell or are interested in SQL Server, drop by and bring your opinions. See you there.

Categories:
SQL Server 2008

Actually yesterday, but today was the first time I'd had a chance to write about it.

Having done quite a bit with what's now being called "traditional web services", my first impression of REST were, I'll have to admit, the thought that it was web services without schema. I kinda like my metadata; it's always been irritating that stored procedures do not store ANY metadata on the number or shapes of the rowsets returned, only metadata on the parameters are stored. The closest that ANSI comes is allowing specification of the number of rowsets, a piece of the standard the SQL Server doesn't implement. So web services without schema seemed less useful than web services with.

But one of the up-and-coming data access technologies at Microsoft is Astoria, which is described as "a REST-ful set of interfaces to relational database (and other) data". It's causing a lot of excitement.

After the conference I mentioned this to an old friend, Jon Flanders, as he was wearing his "real programmers care about URIs" (or something close) teeshirt. Asked him to explain the zen of REST. To summarize, I got the impression it was all about the location specification and using HTTP verbs like GET/POST/DELETE to effect "state transfer" operations. And, although its not very common, metadata can be specified using WADL (Web Application Description Language). So specifying the location is perhaps like a "connection string" to the service? And to the data the application interacts with?

I'm still a bit skeptical of the (seeming) typelessness and contract-lessness of it all, the "HTTP is the only protocol"-ishness, and IIRC, I can find out the location of a traditional web service using WSDL's soap:address element's "location" attribute in the service portion. But at least I have an somewhat of an understanding of what all the buzz is about.

More on Astoria in future posts.

I like free, standard, sample databases. One that I've been looking at for quite a while is the Mondial database currently available at Institute for Informatics Georg-August-Universität Göttingen. This set of data (from an old CIA World factbook and other sources), is available not only as a relational database, but as XML, RDF, and even F-logic.

The problem with using this with SQL Server has been the lack of a DATE datatype with the appropriate value range. The DATE datatype is used as "Date of Independence" and some countries have independence dates before 1753. So you'd have to represent it as a VARCHAR. Yuk.

With the new datatypes in SQL Server 2008, this is do-able without compromise, so here it is. In addition, since many of the locations (in two tables) come with latitude and longitude columns, I've added a GEOGRAPHY column, so when CTP5 comes out we'll have something to use GEOGRAPHY with.

Source scripts included. Enjoy!

mondial-sqlserver2008.zip (163.73 KB)

Categories:
SQL Server 2008

Ever since the hierarchyid data type was introduced in SQL Server 2008 CTP4, there's been a fair amount of discussion about renaming some of the methods of the data type to make them a bit more intuitive. In addition, there's been discussion about "missing" methods that don't exist in the hierarchyid data type. Well...

Because hierarchyid is a .NET-based system UDT, you can implement these "missing" methods yourself. There's a couple of ways to go about this:

1. Use your own assembly that uses the hierarchyid (that's Microsoft.SqlServer.Types.SqlHierarchyId to you) as parameters. Because its a "normal" SQL Server type, you can use you use it anywhere you can use a "native" SQL Server data type, like DATE.
2. Code your own UDT that inherits from SqlHierarchyId. Perhaps call it HIERARCHYID2, to further upset those folks who are offended by DATETIME2.

Wow. Did you say inheritence from a system UDT? Before we go down this path, bear in mind that this works with the CTP4 version of SQL Server 2008, but there's no guarentees about it working in the next CTP or release. I've heard nothing to that effect, but you never know. I've coded up a little stub and it seems to work, but...

I thought that T-SQL doesn't support UDT inheritence. Technically, it doesn't. That is, the SQL Server system catalogs (sys.types, etc) don't track UDT inheritence. This means that in order to allow T-SQL to "see" methods and properties in the base class, you need to write methods that do nothing but delegate to the base class. Because all the methods that you care about are public (by definition) you can do this. Simply override the methods that you don't want to pass through. I wrote about this a long time ago, see this blog post from 2004.

A few other considerations. Your UDT must be a class, not a struct, of course. Your UDT must implement IBinarySerialize and use UserDefined serialization, because the base class does. Microsoft.SqlServer.Types allows partially trusted callers, so your assembly should be able to work with permission_set safe. Finally, remember that SQL Server does not allow you to expose overloaded methods in assemblies, although you can use them in your internal implementation.

I'm working to expand the stub implementation, so if anyone has some neat ideas for derived methods and properties, I'd be interested in hearing from you. And, when CTP5 comes out...GEOGRAPHY2 anyone? If it doesn't do exactly what you want, change it.

By the way, I've wondered if deriving from a system data type isn't analogous to using undocumented system stored procedures, a practice which everyone I meet seems to disapprove of, but they do it anyway. I'm think at this point that it's more analogous to using a documented/supported system stored procedure, like sp_spaceused, in your own script. They can't change HIERARCHYID post-SQL Server 2008 without break all existing code. So you're safe, I think. Other opinions?

Categories:
SQL Server 2008

There's been some rumors going 'round about the immenent release of SQL Server 2008 CTP5. OK, maybe I've been asking around too. The main reason for the rumors is that there was a published CTP timetable at one point and its getting to be about that time. The other reason is the plethora of sessions, both here in Barcelona at TechEd/Developers and TechEd/ITForum and at DevConnections in Las Vegas (where some of my SQLskills collegues are this week) about features that won't appear until CTP5. Looking at the calendar of talks I can easily spot talks on Spatial Data and Filestream storage, two of the most compelling features in the next release, due in CTP5. I'll even be doing a spot of my own on Spatial data next week at TechEd/ITForum.

There have also been a number of postings by Microsoft employees about the upcoming CTP5 features, some of them including code. If you're at TechEd, don't miss Johannes Kebeck's talk on Virtual Earth, which may have some tidbits on usage of the SQL Server 2008 spatial data types. He's been working with the spatial data types since they were just a gleam in the SQL Server team's eye. Unfortunately, I'll miss it, because I've got a talk at the same time. Maybe Johannes can fill me in. And Michael Rys' talk on spatial data in SQL Server 2008.

But back to CTP5...the answer so far is just "soon, wait and see". But do attend the sessions and I believe you'll agree its something worth waiting for.

Categories:
SQL Server 2008

In the last few years, I've done a few talks at various conferences on the integration of SMO (SQL Server Management Objects) and Powershell. My friend and co-author of the SQL Server 2005 books, Dan Sullivan, got me into using Powershell and SMO and has written quite a number of excellent blog entries and articles about it. Because SMO is just another loadable .NET library, they're a perfect fit. For TechEd US, I even wrote a simple powershell provider that makes SQL Server look at a file system (NavigationCmdletProvider through the database objects), and showed the code.

I'll be doing another chalktalk at TechEd Developers, Europe, next week... with a twist. A few weeks ago, I'd been informed of plans to include a Powershell NavigationCmdletProvider provider that's shipped as part of SQL Server 2008! Really! It won't be in the very next CTP (due soon, according to the original official CTP schedule from long ago), but, barring untowed circumstances, it will be in the final product.

So if you have any interest in SQL Server and Powershell, I'll see you in Barcelona at the chalktalk. It will be a good time, promise.

Categories:
PowerShell | SQL Server 2008

I've been talking with folks for (it seems like) a long time about modeling sparse attributes in a relational database. Seems like I run into a new design where there's a need for sparse attributes every few weeks. If you remember the "hardware store" example (where each new sales item may have unqiue properties), that's just the tip of the iceburg. Basically, your choices boil down to: Sparse Columns (new column for each new attribute), Sparse Tables (new table for each new set of related attributes, if they are related), Entity-Attribute-Value (the "traditional" design, often eshewed because of scalability concerns), and XML (attributes model that sparse attributes, elements model the common attributes).

Last week I ran into a person with a modeling decision like this. He also informed me that he'd tried the sparse table design and ran into SQL Server's hard limit of 256 tables in a join (if you want all the sparse attributes for all products?). Wow. I can't image what a 256 table join would look like, and how the query processor would have time to load all the statistics for this one. He'd also run into the 1024 column limit with sparse columns. I told him to wait on that one; SQL Server 2008 will have sparse columns.

The XML design is interesting too, because you can do a search to which rows have which sparse attributes and spit out the right data.  It's what the XML VALUE index was designed for. And hoist the common attributes to persisted computed columns for best query perf. Seems that sparse columns may have that covered too, if its implemented like what was shown at TechEd US. There would be an optional column defined as "XML COLUMN_SET FOR ALL_SPARSE_COLUMNS". A value index on this should do the trick for a fast search too.

I (and quite a few other folks, if my networking is correct) can hardly wait...

I'll have to admit it, when I first saw that SQL Server 2008 was adding spatial data support, I thought of it as a niche. The province of geographers, cartographers, and maybe a few others. Complex, involving a lot of higher mathematics, each province having their own geographic encoding, and so on... And that level exists, to me it's the production of spatial reference data. When I think of spatial reference data, I think of map data you'd buy from ESRI and data posted by government agencies. Or made available by utilities, so you don't hit a power cable while digging in your garden. As opposed to spatial line of business data.

Now, before you go searching your LOB application for latitude and longitude columns, how about looking for columns that contain "address". It's a short hop from address to lat/long by using a geocoder. The one I used is the MapPoint web service. Now you have line-of-business spatial data. I'll bet every app has a field that contains address. And how about looking for the nearest salesperson for a potential customer? Or the nearest warehouse? Mapping programs like Virtual Earth, Google Earth, and Yahoo Maps can give you general business information and maps but how about encoding information in your own business?

I'm quite excitied over this upcoming "niche" feature and think it could make its way into each and every application. That's spatial data "for the masses" (so I'm not the greatest at sloganizing, forgive me).

A couple of people have been asking and, in preparing for my upcoming talk on spatial data on SQL Server 2008 at ITForum in Barcelona in a few weeks I got to ask about using the new system UDTs types on the client. In an upcoming CTP release there will be an MSI installer file specifically to install these types on clients. The appropriate files are installed and assembly (Microsoft.SqlServer.Types.dll) registered in the GAC.

Since these are .NET data types, these (HierarchyID, Geography, Geometry) are easily usable in SQLCLR functions and procedures too. No possible data type mismatches or nullability (the new types implement INullable and have a static property to return a NULL instance) concerns.

Categories:
SQL Server 2008 | SQLCLR

Just saw Aaron Bertrand's post from PASS on SQL Server 2008 Change Tracking vs Change Data Capture (change data capture (or CDC) is in the current CTP; change tracking is not). I'm not at PASS this week myself, but home while the house is being re-roofed. His post seems to confirm something I'd suspected all along.

Change tracking seems to go hand-in-hand with Sync Services for ADO.NET. I've been following Sync Services for a while; it's a set of libraries for controlling and implementing synchronization for disconnected database applications. Its current shipping vehicle is the Visual Studio Orcas Beta 2 release, along with SQL Server Compact Edition version 3.5. The fact that SQL Server 2008 Change Tracking provides a mechanism to keep track of which user (the sync OriginatorID) made a change (CDC doesn't) and also seems to provide automatic change table management for DELETEs (CHANGE_RETENTION) and a mechanism to "get the set of changes that have occured from a baseline" (the "sync_last_received_anchor" in sync services) makes Change Tracking line right up with what Sync Services requires.

Although Aaron mentions "offline stores like Outlook in cache mode", SQLCE is an exciting offline store because its currently deployed in places like Windows Media Player, Zune, Media Center PC, and more. SQLCE runs on mobile devices and desktops. Sync Services isn't available for mobile devices yet, but is said to be "in progress".

When you set up a Sync Service app, you currenly must make changes to the database (triggers, "tombstone tables for deletes", and such) referred to by the "ServerProvider", in order to track the information Sync needs. The Sync "ServiceProvider" architecture layers over/shares concepts with the ADO.NET provider model. But you don't have to make changes for the SQLCE 3.5-side (SQLCE is the only current "ClientProvider" that Sync Services supports) because, "support for sync is built in". Well...maybe it's built in to the server too, with SQL Server 2008. Bet we'll see (at least one) demo with Sync Services when the Change Tracking feature ships.

I'll be at TechEd and ITForum in Barcelona in November, doing sessions on SQL Server 2005 and SQL Server 2008. One of the TechEd sessions is called "To ODBC or to OLE DB?" and is a discussion of using ODBC and OLE DB is applications these days, whether one or the other is "better", if you should convert existing applications, etc. On that note...

Back at TechEd US, I'd spoken with Chris Lee, who's in charge of SQL Native Client. That's the OLE DB provider and ODBC driver that ship with SQL Server. SQL Server 2008 has a new version of the provider and driver, and when I'd asked if support for the new features (like the new DATE/TIME data types and table-valued parameters) Chris had not only replied "yes", but given me a demo of using table-valued parameters from ODBC to show off and post if I wanted to.

Here's the code. To build and run it:

1. Use the Visual C++ compiler that comes with Visual Studio 2005 or Visual Studio 2008 beta. I used VS2008 Beta 2.
2. Convert the project if needed. I ignored the warnings about 1 source file not being converted.
3. Make sure that sqlncli.h and sqlncli10.lib are available to the compiler. They're in C:\Program Files\Microsoft SQL Server\100\SDK\Includes and Lib, respectively.
4. Install SQLNCLI from the SQL Server 2008 CTP distibution on the client machine. Just run the SQLNCLI.msi in Servers\Setup.
5. Setup an ODBC System DSN from Control Panel/ODBC Administrator named TVPDemo. It must use the SQL Server Native Client 10.0 ODBC driver. Use the database of your choice, the demo will run DDL to create the database objects you need.
6. Compile the demo, set breakpoints and walk through the code

Cheers! See you in Barcelona!

 

ODBC TVP Sample code1.zip (1008.79 KB)

I received mail today from the SQL Server Compact Edition folks at Microsoft on my blog post on using SQL Server CE 3.5 beta, Visual Studio Beta 2, LINQ, and EDM. The current plan is:

1. Visual Studio 2008 will not ship with LINQ to SQLCE designer support. SQLMetal works just fine with SQLCE, though, as I'd mentioned.
2. There is planned future support for using SQLCE with EDM and LINQ to Entities when EDM ships after Visual Studio 2008 (VS 2008 SP1?).
3. There is a fix in the works for the FK issue that I had with the SQLCE Northwind sample database and SQLMetal.

Along these lines, I asked about the integration of LINQ to SQLCE with the updateable, scrollable, cursor-like behavior of SqlResultSet. Because SQLCE is an embedded database ("the engine" loads into your application) using the DataSet with SQLCE programming adds a layer of buffering (read: memory allocation and data copying) between the data and you. SqlResultSet is a perf win over using DataSet, and using LINQ to SQLCE or EDM to SQLCE, although they don't use the DataSet, doesn't allow in-place updating like SqlResultSet does. And they do use memory allocations rather than read directly from the SQLCE database. The current LINQ to SQLCE doesn't support SqlResultSet-like behavior yet, but perhaps in future...

So, people always ask... now that .NET Framework 3.0 installed on my system and there's going to be a new version of .NET that includes LINQ, what version of the framework will SQL Server load now, in SQL Server 2008? Does 2.0 still load? Or does it load 3.0 or 3.5?

Theoretically, and I've written this in books and previous blog entries, SQL Server loads the most recent version of the .NET framework. .NET service packs therefore affect SQLCLR. Especially if they contain updates to say, System.Data.dll. Or System.Xml.dll. Or [your favorite "approved" library goes here].

At present, one .exe can load one and only load version of the .NET framework. ASP.NET, for example, supports multiple versions with multiple worker-processes, that is, multiple .exe-s. Perhaps in .NET 4.0, we'll be able to run multiple .NETs in a single process. Perhaps not. The "main" .NET assembly is mscorlib.dll; this contains quite a few of the "main" .NET namespaces/classes. Namespaces don't necessarily correlate to DLLs, in Visual Studio 2008 version of .NET "System.Data" will be spread across a few DLLs. To me, there is "a new version of .NET" when there is a new version of mscorlib.dll. That's not the way its represented normally.

.NET 3.0 did not replace mscorlib.dll. Or System.Data, System.Xml, ASP.NET, or others. In my C:\WINDOWS\Microsoft.NET\Framework\v3.0 directory there is no DLLs, only three subdirectories: Windows Communication Foundation, Windows Workflow Foundation, and WPF. So, if you have a machine that "comes with .NET 3.0" it also must come with (at least) .NET 2.0.

My machine (with SQL Server 2008 and Visual Studio 2008 Beta 2) has a .NET 3.5 directory in it. And a subdirectory named "Microsoft .NET Framework 3.5 (Pre-Release Version)". Neither one has an mscorlib.dll in it. It's mostly Visual Studio-related DLLs and utilities.

The main additonal assemblies for 3.0 and 3.5 (like LINQ, for example) actually live in C:\Program Files\Reference Assemblies\Microsoft\Framework. V3.0 and V3.5. That's were the main action is, except that this is all mostly hidden by the presence of and registration in the global assembly cache (GAC).

So, an instance of SQL Server 2008 will load mscorlib.dll, version 2.0.50727.42, as always. However, in SQL Server 2008 (and 2005 for that matter, if it ships as a .NET upgrade through Windows Update), you'll see a new version of System.Data.dll, version 2.0.50727.1378. Or, perhaps, we'll be able to hold the versions constant, at the possible expense of interop between SQL Server 2005 features that use .NET. Like SSIS, SSMS, SSRS, and so on.

Got it? ;-)

Categories:
SQL Server 2008 | SQLCLR

Let's talk about clients and SQL Server 2008. First, a little history...

Every new version of SQL Server seems to include a new feature that requires a change to the protocol that SQL Server uses to talk to clients, the TDS protocol. TDS stands for tabular data stream, and is a propriatary protocol used by both SQL Server and Sybase. Since version 4.21, SQL Server's version of TDS and Sybase's version have "grown apart". Because TDS includes version negotiation you can still use old libraries to talk to newer versions, albeit at reduced functionality.

Even if they don't update the TDS protocol, there will always be new features that require changes to the client stack at some level.

SQL Server comes with support for the following client stacks:
ODBC - Open Database Connectivity, related to the ANSI SQL standard, vendor neutral
OLE DB - a COM-based vendor neutral library
ADO.NET - a .NET-based vendor neutral library

ADO (classic) is an IDispatch-friendly library over OLE DB
LINQ to SQL and Entity Data Model use ADO.NET to talk to the database. Remember LINQ to SQL is actually LINQ to SQL Server at present.

In addition, Microsoft ships a JDBC (which does not stand for Java Database Connectivity, the docs say so ;-) driver separately. DBLIB also still works, but not shipped with SQL Server any more. It's functionality is limited to features that existed in SQL Server 6.5, IIRC. Other vendors (e.g. DataDirect) ship SQL Server clients too. DataDirect licenses the TDS stack.

Before SQL Server 2005, providers and drivers were part of MDAC. MDAC (Microsoft Data Access Components) was once a separate install, but now its part of the OS. TDS libraries were separate DLLs (e.g. DBMSSOCN.dll). This stopped in

SQL Server 2005, and there are two main distribution vehicles:
SQL Native Client (SQLNCLI): OLE DB, ODBC, and network libs
ADO.NET System.Data.dll: SqlClient provider and network libs

So how does this happen in SQL Server 2008? For OLE DB and ODBC, there is a new version of SQL Native Client, version 10. Since MDAC is now part of the operating system, if you install SQL Server 2008 over SQL Server 2005, you'll now see three ODBC drivers, on my CTP4 + VS 2008 beta 2 system:

SQL Server          version 2000.86.1830.00
SQL Native Client   version 2005.90.3042.00
SQL Server Native Client 10.0 version 2007.100.1049.14

For OLE DB there are three providers:
Microsoft OLE DB Provider for SQL Server
SQL Native Client
SQL Server Native Client 10.0

For ADO.NET, the situation is a bit more interesting. There's a revision to System.Data.dll to include the new functionality. The new version currently ships with Visual Studio 2008 Beta 2 and its version number currently is 2.0.50727.1378. It simply replaces the version that's installed on the system (and in the GAC) at the time. There's only one ".NET 2.0" version registered in the GAC, as "Version Number 2.0.0.0". Hmmm....seems very "MDAC-like with MDAC as part of the operating system" (that is, the operating system includes .NET 2.0).

If you want to use the new SQL Server 2008 functionality, like date, time, datetime2, dateoffset, table-valued parameters and large UDTs/UDAggs (and perhaps FILESTREAM support when it arrives) with OLE DB or ODBC, you need to use the new driver/provider. This means changing the connection string. AND RETESTING. This also applies to SSIS packages, Reporting Services reports and anywhere else you used OLE DB or ODBC.

If you only install the operating system and SQL Server 2008, you won't have the original (version 9.0) provider and driver. These will be provided as a separate download, for folks that have installed only 2008 but haven't restested.

BTW, once upon a time, there was talk of "multiple versions of SQL Native Client running side-by-side, using the fusion SxS loader". That didn't happen. The new providers are separate and are registered in the registry separately. No SxS magic needed.

So, Microsoft.SqlServer.Types lives in the resource database and runs in its own appdomain when its used by system functions, like SELECT * FROM sys.assemblies. Let's try an experiment with the following setup.

I have two user assemblies in a database named "test". One doesn't access any .NET types, its called datetest. The new DATE/TIME-related data type series are not .NET-based, but you can use them in SQLCLR procs, as you can use NVARCHAR data type. There are some restrictions on DATE/TIME series, more about that later. My second user assembly is named hiertest. It uses the HierarchyID data type in SQLCLR code. That's (of course) OK too. Both my user assemblies are owned by DBO. Both catalog as SAFE_ACCESS.  There are no user assemblies in pubs database. Turn on SQL Profiler.

In pubs: SELECT * FROM sys.assemblies;

In profiler:
Assembly Load event:
Assembly Load Failed for requested assembly system.enterpriseservices, version=2.0.0.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a, processorarchitecture=x86 with Error code: 0x80070002
In SQL Server Log:
 Common language runtime (CLR) functionality initialized using...
 AppDomain 2 (32767.sys[runtime].1) created
   
Still in pubs: declare @h hierarchyid; select HierarchyID::GetRoot(); -- invoke static method of hierarchyid data type

In profiler:
 Assembly Load Succeeded for Microsoft.SqlTypes.Types
In SQL Server Log:
 Unsafe assembly 'microsoft.sqlserver.types..." loaded into AppDomain 2 (32767.sys[runtime].1)

USE test
GO

In test: SELECT * FROM sys.assemblies;
In SQL Server log:
Assembly Load Failed for requested assembly system.enterpriseservices, version=2.0.0.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a, processorarchitecture=x86 with Error code: 0x80070002
  
In test: EXEC dbo.somedateproc -- my SQLCLR proc that uses date.
In SQL Server log:
 AppDomain 3 (test.dbo[runtime].2) created
In profiler:
 Assembly Load Succeeded for datetimetest
  
In test: SELECT * FROM dbo.AncestorAndSelf('/'); -- my SQLCLR UDF that uses HierarchyID
In SQL Server log:
 Unsafe assembly 'microsoft.sqlserver.types..." loaded into AppDomain 3 (test.dbo[runtime].2)
In profiler:
 Assembly Load Succeeded for Microsoft.SqlTypes.Types
 Assembly Load Succeeded for hiertest

So what happened??

There is only one resource database appdomain, it gets created when system functions (sys.assemblies) use system assemblies. The actual assembly that contains the system .NET data types loads only when it's required, into the resource database's appdomain. By the way, the declaration of a (NULL) hierarchyID variable isn't enough to load the assembly, you must actually use the variable. If you use only TSQL and the new data types, only this one appdomain is needed, regardless of how many different databases use them.

Additonal appdomains are created on a per-database, per-assembly owner basis as in SQL Server 2005. Each "user" appdomain will also load the system assembly Microsoft.SqlTypes.Types, if and only if it needs it. That is, if you use HierarchyID (or Geometry/Geography) in a SQLCLR procedure.

One last bit. Why are the appdomains referred to as: "AppDomain 2 (32767.sys[runtime].1)" and what happened to AppDomain 1? When .NET is loading into any executing process, there is a single appdomain created, the default appdomain. AppDomain 1. SQL Server doesn't load Microsoft.SqlServer.Types into this default appdomain, but starts another for the resource database. That's AppDomain 2 (into the process). The ".1" in "32767.sys[runtime].1" is first user appdomain.

Database administrators like to know about everything going on in "their" database, and with good reason...if the database fails (or even runs slowly) its (s)he who gets the first phone call for help. SQLCLR is still relatively new. Hope this was helpful in explaining what's going on. But remember, the exact appdomain implementation could be refined further in later releases. Cheers.

Categories:
SQL Server 2008 | SQLCLR

I'm running with SQLCLR on, because I'd like to see the interaction between system SQLCLR code and appdomains. See the previous blog posting for an explanation of why this doesn't affect my system functions. First, I bring up SQL Server 2008 "fresh", open SQL log in SSMS, and start a profiler trace to catch Assembly Loading events. SQLCLR is nowhere to be seen.

USE TEMPDB
GO

SELECT * FROM sys.assemblies;
GO

The assembly list contains an entry for Microsoft.SqlServer.Types, that's the assembly that contains the system UDTs. It actually lives in the resource database, but shows up in system metadata lists for every database. In the SQL Server log, just listing the assemblies in a database produces:

Common language runtime (CLR) functionality initialized using...
AppDomain 2 (32767.sys[runtime].1) created

Database 32767 is the resource database, although its not for the most part directly visible in SQL Server 2005 metadata or in SSMS.

.NET appdomains are created on a per-database basis, currently one appdomain per assembly owner. So this functionality runs under an appdomain created for the owner "sys". The assembly is actually owned by principal_id 4, which is INFORMATION_SCHEMA according to sys.database_principals. But sys.schemas indicates that the sys schema is owned by principal_id 4 as well, INFORMATION_SCHEMA schema is owned by principal 3. They're likely referring to principal_id in the resource database, not the current database. Sys.assemblies also has this assembly marked as is_user_defined = 0 (false). And, the assembly has a safety level of UNSAFE. Hmmm...

Knowing how SQLCLR exception escalation works, I was concerned by the system assembly being UNSAFE. It is  running in its own appdomain. However, when I looked at the SQL Server log for the SQLCLR message, another seemingly unrelated message attracted my attention:

Using xpstar.dll version 2007.100.1049 to execute extended stored procedure xp_instance_regread...

SQL Server internals have always included extended stored procedures to perform system functions. User-written extended stored procedures can cause problems if poorly written, but this one (xp_instance_regread) was written by the SQL Server team, its part of SQL Server itself. Hmmm...so is Microsoft.SqlServer.Types part of SQL Server. And .NET code has more built in safeguards than unmanaged code. And BOL has indicated since SQL Server 2005 betas:

"This feature (i.e. extended stored procedures) will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use CLR Integration instead."

So I guess its OK, they're taking their own advise. And this assembly has been tested with SQL Server for hardening, it doesn't produce the error message that cataloging, say, System.Runtime.Remoting, does.

One final item. SQL Profiler does not load Microsoft.SqlServer.Types.dll just because I execute "select * from sys.assemblies", but profiler reports:

Assembly Load Failed for requested assembly system.enterpriseservices, version=2.0.0.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a, processorarchitecture=x86 with Error code: 0x80070002

Perhaps there's a stray assembly dependency somewhere in CTP4. System.EnterpriseServices is not on the list of tested assemblies either, so it shouldn't load automatically.

Categories:
SQL Server 2008 | SQLCLR

I was listening to a replay of the webcast recording on the HierarchyID by Michael Wang (thanks, Michael) and as he mentioned the considerations for the CLR-based type with respect to DDL, I thought it would be interesting to go back and see how this type showed up in the various facilities that we have for monitoring what SQLCLR is doing. These facilities are:

1. Ability to see assemblies registed in each database
2. Watch code-running appdomains (but not transient DDL-only appdomains) being created/torn down in SQL log
3. SQL Profiler event for Assembly.Load
4. Monitor memory and CLR-related processes using DMVs and perfmon

Before reporting the results obtained with SQL Server 2008 CTP4, we need a few clarifications. First, the way SQLCLR manages appdomains is an implementation detail and subject to change in future releases, service packs, or can even change before SQL Server 2008 ships. I'm just observing. Second, let's talk about what exactly the "sp_configure 'clr enabled', 0" does. You can also set this option using SQL Server Service Area Configuration utility. This option indicates whether or not its possible to run *user-written SQLCLR code*, that is, SQLCLR stored procedures, UDFs, trigger, UDTs, and UDAggs written by programmers.

The switch *does not*:

1. Keep SQLCLR from loading in SQL Server's process. This always loads the first time that you need it.
2. Prevent DBAs from using SQLCLR-related DDL, such as CREATE/ALTER/DROP ASSEMBLY, and define SQLCLR objects.
3. Prevent system functions that use SQLCLR from running. In SQL Server 2005, there were no SQLCLR system functions that I was aware of.

In SQL Server 2008, system functions that use SQLCLR that immediately come to mind include:
   a. The HierarchyID and upcoming spatial data types, Geometry and Geography
   b. Change Data Capture and the Dynamic Management Framework

So, its not that using system functions written in SQLCLR is "a trick" that bypasses an established control mechanism. The reality is that the mechanism was never defined to prevent SQLCLR loading, DDL, or system functions.

One things that will prevent SQLCLR from running is to enable lightweight pooling, or "fiber mode scheduling". This is also a configuration option, and its incompatible with SQLCLR. The CLR is not "fiber aware", although it may be implemented sometime in the future. A few other SQL Server system features are incompatible with fiber mode as well.

Categories:
SQL Server 2008 | SQLCLR

I've been trying out LINQ for SQL and the Entity Data Model (EDM) latest betas with some "different" data sources. This post covers using SQLCE 3.5. The latest version of SQLCE 3.5 comes with Visual Studio 2008 Beta 2.

Using SQLCE 3.5 data sources doesn't work with the built-in LINQ for SQL designer in Visual Studio 2008 beta 2 or the CTP EDM designer released shortly after beta 2. You can add a Data Connection for a SQLCE 3.5 database. I used the Northwind.sdf sample database that was supplied. But dragging a SQLCE table on to the LINQ to SQL designer produces "The selected object(s) use an unsupported provider". The EDM Wizard doesn't even list the SQLCE ADO.NET data provider as a choice. So there's no LINQ to SQL or EDM designer support for SQLCE 3.5.

SQLCE works quite well with SQLmetal, although my attempt to generate a .dbml for the Northwind.sdf sample database produces the error message:

error DBML1055: The DeleteOnNull attribute of the Association element 'Order Det
ails_FK01' can only be true for singleton association members mapped to non-null
able foreign key columns.

Removing the "Order Details" table provides a 'hit it with a hammer' solution; I haven't figured out a nicer way to solve this problem yet. Then, the SQLMetal-generated classes worked fine.

Using EDMgen against the Northwind.sdf file produced the following error:

error 7001: Could not load System.Data.SqlServerCe.Entity.dll. Reinstall SQL Server Compact.
Could not load file or assembly 'System.Data.SqlServerCe.Entity, Version
=3.5.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its depend
encies. The system cannot find the file specified.

Hmmm...the version of SQLCE that came with Visual Studio 2008 B1 did come with this DLL. And, I had it working with EDM then. But the version in VS2008 B2 doesn't. As a last resort, I installed the SQLCE 3.5 version from VS 2008 B1, it's out on the web as a separate download. This changed the error message to:

error 7001: Method 'CreateDbCommandDefinition' in type 'System.Data.SqlServerCe.
SqlCeProviderServices' from assembly 'System.Data.SqlServerCe.Entity, Version=3.
5.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' does not have an implementation.

Sure enough, the required implementation of the ProviderServices class required for EDM support has changed between VS Beta1 and VS Beta2. I confirmed this by checking the methods in the SqlProviderServices in System.Data.Entity.dll, they did change recently, adding CreateDbCommandDefinition. Unless I can find an updated System.Data.SqlServerCe.Entity.dll somewhere, it doesn't look like SQLCE is usable with EDM at this point in beta-time.

That's all for now.

I've been trying out LINQ for SQL and the Entity Data Model (EDM) latest betas with some "non-traditional" data sources. Namely SQL Server 2008 (CTP4) and SQL Server Compact Edition 3.5. I'll cover using SQL Server 2008 CTP in this blog entry, and using SQLCE 3.5 in the next one.

Using SQL Server 2008 data sources doesn't work with the built-in designers in Visual Studio 2008 beta 2. Adding new "LINQ to SQL classes" produces an empty design surface onto which you can drag items from Server Explorer. Problem is, you can't make a Server Explorer Data Connection for a SQL Server 2008 database "only versions 2005 and before are currently supported. The EDM Wizard actually includes "Use this selection to connect to Microsoft SQL Server 2000 or 2005..." when you attempt to create a Data Connection. Not SQL Server 2008. So there's no LINQ to SQL or EDM designer support. Or Server Explorer support, for that matter.

But each of the mapping products comes with a command line tool as well. LINQ to SQL uses SQLMetal, while the EDM uses EDMgen.

Using the command line tools, I was able to generate the appropriate artifacts (and use them) for SQL Server 2008. I can even load the artifacts (.dbml or .edmx) files into the Visual Studio designer after they were generated from the command line and edit them. Very cool.

I then did an experiment to see if LINQ to SQL or EDM supports the new data types in SQL Server 2008, namely, the new date/time-related types and the hierarchyID, which is implemented as a SQLCLR UDT.

SQLmetal did not produce an error or warning on the date/time types, but it did create definitions for the new data types as "NVarChar" with appropriate lengths. Using these definitions in LINQ code prints database-correct values, albeit as strings. Haven't tried an insert/update/delete yet. I had less success with hierarchyID, this produces the message:

warning SQM1021: Unable to extract column 'c1' of Table 'dbo.h1' from SqlServer
because the column's DbType is a user-defined type (UDT).

SQLmetal then error'd out and didn't generate a dbml file at all.

Trying date/time-related data types with EDMgen produces errors indicating that the date, time, datetime2, datetimeoffset aren't supported. Neither are SQLCLR UDTs, hierarchyID is reported as date type <unknown>.

Well, these are still beta. That's all for now.

Just found out about this one today. This SQLCLR function works right now in CTP4 of SQL Server 2008.

public static Nullable<int> AddTwo(Nullable<int> x, Nullable<int> y)
{ return x+y; }

create function dbo.addtwo (@x int, @y int)
returns int
as
external name asmname.[Mynamespace.Class1].AddTwo;

select dbo.addtwo(2, null);
-> null

Great! Turns out, I suggested this back in 2005...after SQL Server 2005 shipped. And they listened. Actually, going back over the list from 2005... SSMS does display NULL for a NULL UDT already, as of SP1. And large UDTs and UDAggs are also on the agenda for SQL Server 2008.

That may also explain why there's no System.Data.SqlTypes.TimeSpan/DateTimeOffset in Visual Studio Orcas Beta2. They're not needed.

Categories:
SQL Server 2008 | SQLCLR

Last blog entry on SQL Server Extended Events for a while. But...a couple of questions came up since I wrote my first blog entry on SQL Server Extended Events.
  What are the major advantages to SQL Server Extended Events?
  Is this really using the Crimson event system?

There's a couple of reasons that come to mind as an answer for the first one. First, and maybe foremost, this eventing system has an ETW target and therefore allows end-to-end tracing. ETW is a provider-based tracing system that is integrated throughout Windows. With the providers available ("logman -query providers" from the command line) you could trace from your ASP.NET app (by way of your IIS server), into SqlClient, across the network (providing that you can decode a TDS trace), into SQL Server, and back. That's a lot of power.

The main hassle I've always had with ETW is the dearth of post-processing tools available. You can use the tracerpt utility to process the ETW output into a comma-separated value file, but where you go from there depends on how well how you post-process the CSV. There is a logreader utility that can do simple SQL-like queries against a variety of log file formats. At one point, I made up a simple SSIS job to load SqlClient ETW output into SQL Server to do T-SQL queries, but where you go from there depends on your ability to decode the variable "data" fields in each event. Although the .mof files allow you to decode the binary format into columns based on the data type, CSVs aren't usually self-describing either, you have to know what each bit means.

Next reason why I was intrigued was the granularity of the eventing. At first glance, you can:
1. Create arbitrary groupings/rankings (buckets) on the event data with the bucketizer
2. Pair alloc/dealloc of most any type of resource with the pair_matcher
3. Add extra data (actions) to events. They even added a mechanism to determine causality.
4. Use as many targets as you wish (targets are separate from events and actions)
5. Add events and targets to a running session
6. Specify how much resources (like memory, dispatch latency) your trace should take (see CREATE EVENT SESSION DDL)
7. Use synchronous or asynchronous event collection, and event buffer retention
8. Specify memory partitioning by CPU or NUMA node

One of the good things about a trace is to attempt to balance "intrusiveness" (which slows things down) with thoroughness (you ARE usually tracing because there's a problem, after all).

The other question concerns the Crimson eventing system. Crimson is a really old codename for Windows Unified Eventing (Windows Eventing 6.0). You can collect your events in XML format and it uses an XML config file for registration, hence my possible confusion with the tern "XEvent" which I'd heard used for SQL Server's Exgtended Events. Matt Pietrick describes it as "an attempt to unify event log and ETW tracing". It's available on Vista and Longhorn OS's only. Here's a couple more references:

http://msdn.microsoft.com/msdnmag/issues/07/05/SecurityBriefs/
http://msdn.microsoft.com/msdnmag/issues/07/04/ETW/

Don't search for XEvent like I did, you'll get a lot of hits for the XWindow system XEvent (remember XWindows?). Or Crimson, you'll get a lot of hits on University of Alabama.

All of the articles refer to using the wevtutil utility to list event providers, like logman lists ETW providers. So I installed SQL Server 2008 on Longhorn Server (Windows Server 2008) beta3 and looked for "new" event providers and events. I didn't see any, so I'm not sure that SQL Server Extended Events will register anything more than ETW with Windows Unified Eventing. And the bucketizer/pairer targets write to dynamic management views, not to the event log. There is one more target, the asynch file system target, but that's not in this CTP. So stay tuned, perhaps this is an investing towards future "unified eventing".

BTW, I began to wonder: is unifying the event log with ETW data is really a good idea. Event logs are the "normal" messages that are emitted, tracing seems to me to be a "special occasion" messaging with possible very high message volumes. I don't think the two are actually mixed in the same physical location in Vista/Longhorn, but...what do you think about this unification?

This won't be as long of an entry because I'm trying to finish describing the items that you can use in an XEvent session, that is, the items that exist in a package.
  Events
  Targets
  Actions
  Predicates
  Maps
  Types
 
Let's do types and maps. A type is simply a data type, a simple type like Int16 or complex type like 'SOS_context'. Almost all the types live in package0, there's only one each in sqlos and sqlserver packages. Maps are enumerated constants.

See types:
select * from sys.dm_xe_objects where object_type = 'type'

and map:
select * from sys.dm_xe_objects where object_type = 'map'

and legal enumeration (map_key) values:
select map_value, map_key from sys.dm_xe_map_values where name = 'keyword_map'

Predicates: for predicates (think filters in SQL Profiler), you need a predicate source and a predicate comparator.
select * from sys.dm_xe_objects where object_type like 'pred%' order by object_type

There's probably more to it than this, there are customizable event attributes that can be SET in ADD EVENT, and predicates can use event fields for filtering (but not actions). But this will get you started.

One last thing that bears mentioning is the pair_matching target. This target allows you to specify a pair of events (like lock_acquired, lock_released) and after you've run the workload a while, it will show you (the relevent fields in the XML structure exposed by target_data (as in, SELECT CAST(target_data as xml) FROM sys.dm_xe_session_targets;) those events that do not match. That is, the locks that have been acquired but not yet released. VERY cool.

Once again BOL shows an example that takes advantage of knowledge of the pair_matching target's XML data structure. The XML used for target_data appear to be schema-less, i.e. they don't go by a named XML schema, i.e. you have to know what the structure items (elements, attributes, and values) mean. BTW, I keep referring to the BOL because I very much like the info in the BOL, as far as it goes, especially at this early stage. I come to (hopefully) elucidate and expound upon the BOL, not to complain about it. THANKS Buck, Alan, Steve, and all...

You can specify begin and end events, begin and end matching_columns and matching_actions. This is from:

SELECT * FROM sys.dm_xe_object_columns
WHERE column_type = 'customizable'
WHERE object_name = 'pair_matching'

Hope this was useful. Happy event tracing.

It's another rainy day in Portland in summer, so I thought I'd stay inside and write more about SQL Server 2008 Extended Events.

I wanted to finish things off by talking about actions and predicates. Need to make a detour at targets, too. I noticed the BOL examples (my point is to try not to repeat things you can find in the BOL) don't contain an example of actions in DDL. So we'll start with them. An action is an additional piece of data that you can tack on to an event. Like a stack trace, or even a causality ID. Or sql_text.

The available actions can be seen with:
SELECT * FROM sys.dm_xe_events WHERE type = 'action'

So let's try sql_text with our existing EVENT SESSION.

CREATE EVENT SESSION PubsLocksETW
ON SERVER
ADD EVENT sqlserver.lock_acquired
   (ACTION (sqlserver.sql_text)
    WHERE sqlserver.database_id = 12) -- (pubs database), add a predicate too
ADD TARGET package0.etw_classic_sync_target
   (SET default_etw_session_logfile_path = N'C:\temp\sqletw.etl' )
GO

But the sql_text doesn't appear in the ETW file. I did this a few times, because I thought I got it wrong. Maybe it will appear in the async file target, which isn't in this CTP. The BOL also says that not every action is valid for every event. Hmmm... the metadata (sys tables) seemed to be happy, but it ain't there. But I can use it with the bucketizer and pairer targets.

The bucketizer makes ...er' buckets (groups) of different "readings" (events) on a single data object. As in, group by lock type or group by cpu time. You can control how many buckets it makes. In fact that bucketizer target needs syntax like the following (from BOL).

CREATE EVENT SESSION MostLocks
ON SERVER
ADD EVENT sqlserver.lock_acquired (where sqlserver.database_id = 12) -- (pubs)
-- this means "create buckets based on object_id (object being locked in this case)"
ADD TARGET package0.synchronous_bucketizer
    (SET filtering_event_name='sqlserver.lock_acquired', source_type=0, source='object_id')
GO

But how did they figure out what to put after "SET"? Where does 'filtering_event_name' come from?

SELECT * FROM sys.dm_xe_object_columns
WHERE column_type = 'customizable'
WHERE object_name = 'synchronous_bucketizer'

There they are... and the description field suggests a use for 'action'.

CREATE EVENT SESSION PubsLockByText
ON SERVER
ADD EVENT sqlserver.lock_acquired
   (ACTION (sqlserver.sql_text)
    WHERE sqlserver.database_id = 12) -- (pubs database)

-- source_type= 1 means buckets by action, not by event
ADD TARGET package0.synchronous_bucketizer
   ( set filtering_event_name='sqlserver.lock_acquired', source_type=1, source='sqlserver.sql_text')
GO

Aha... now I have buckets created based on the text of the SQL statement that caused the lock, rather than by object_id.

This is getting to be too much for a single blog entry. But one last thing for now. You may not have noticed that I switched from using the "package.asynchronous_bucketizer" as the BOL does to using "package0.synchronous_bucketizer". Why? Because I want to do a simple, controlled experiment and I may not want to wait for the buffer to be full and async bucketizer to write out. BTW, for a simple controlled experiment, you can do:

USE pubs
GO

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
UPDATE authors SET au_fname = 'bob';
ROLLBACK  -- You don't really want everyone named 'bob', do you?

Leave the EVENT SESSION running to see the buckets. BOL has a cool query against the XML structure but to see the raw XML, if this is your only EVENT SESSION running...

SELECT CAST(target_data as xml) FROM sys.dm_xe_session_targets;

 

It's summertime in Oregon and its been nice and warm out (75-85F) lately. But today its raining (or specifically, the skies are quite ominous right now), so it's time to write more about my latest favorite subject: SQL Server 2008 Extended Events.

The SQL Server 2008 extended events introduce quite a bit of new terminology, but in investigating the specifics you come across some familiar themes.

Extended events are all contained in packages. An event package is identified by a GUID and a name. Three packages are provided and you can see brief descriptions by querying sys.dm_xe_packages. A package is just a container for all of the other objects (like events and targets) you'll refer to in event sessions. The grouping has no effect of EVENT SESSIONs; you can specify any object from any package in a single event session. The package names are: sqlserver, sqlos, and package0.

Two of the object types that packages contain are events and targets. Events name the information you can collect, these are defined in event_columns. Targets define where the event information is captured and how its processed before being collected. So what can you capture, already?

SELECT * FROM sys.dm_xe_objects WHERE type = 'event'

Only sqlos and sqlserver packages contain events. The events in sqlos are 40 low-level operating system-interaction events, as you might guess. An example is async_io_requested. The sqlserver packages contains over 80 events. These events seems to correspond to SQL Server counters you would see in performance monitor, rather than SQL Profiler trace events that EVENT NOTIFICATIONs use in SQL Server 2005, although there is some overlap. Many of these events only collect one event-specific column, a counter.

You can get a list of all the available events and the event-specific columns they collect by using:

SELECT convert(varchar(55),o.name) as [Object Name]
      ,convert(varchar(25),c.name) as [Column Name]
      ,c.column_id as [Column ID]
      ,convert(varchar(12),c.column_type) as [Column Type]
FROM sys.dm_xe_objects o
JOIN sys.dm_xe_object_columns c
ON o.name = c.object_name
WHERE o.object_type = 'event' AND c.column_type != 'readonly' -- readonly columns are common to most events
ORDER BY [Object Name]

So, to put this all together in an event session, lets use a variation of the BOL example:

CREATE EVENT SESSION test0
ON SERVER
ADD EVENT sqlserver.checkpoint_begin
ADD EVENT sqlserver.checkpoint_end
ADD TARGET package0.etw_classic_sync_target
    (SET default_etw_session_logfile_path = N'C:\temp\sqletw.etl')
go

See the previous blog entry for information about getting ETW working. Note that, in a single event session, we're using items from two different packages, sqlserver and package0. Start the session, using ALTER SESSION, then take a few checkpoints (or produce whatever event you decide to collect), ALTER SESSION to stop the session. Then you transform the (binary) ETL file to a .csv file by using tracerpt.exe.

When starting out with XEvent support, I thought it would be good to start with the ETW target, although you can capture and catagorize events in buckets with the async bucketizer target, and pair related events (like obtain lock/release lock) with the pair matching target. Both VERY cool. But I just wanted a raw, vanilla trace, to start out. And I wrote a paper on ADO.NET and ETW once. So easy one first, I thought...

It turns out that you need privileges to start an ETW session. The ETW session is started for you (rather than you using the logman utility and starting it yourself) when you issue an ALTER EVENT SESSION...STATE=START. But mine never started. It always produced:

Msg 25602, Level 17, State 17, Line 1
The target, "CE79811F-1A80-40E1-8F5D-7445A3F375E7.package0.etw_classic_sync_target", encountered a configuration error during initialization.  Object cannot be added to the event session.

BTW, the guid before the name of the package is the package module id. You don't have to use it in CREATE EVENT SESSION...ADD TARGET...

The reason for this (for those of you that aren't reading the Katmai forums) is that the SQL Server service account is used to activate the ETW session. In order for this to work, the service account must be a member of the groups "Performance Monitor Users" and "Performance Log Users". Mine wasn't a member of "Performance Log Users". Make it a member of the group and this target "works a treat", as my UK friends would say. I'm tracing events to ETW as I write this. THANKS to Jerome Halmans for this information, its not yet in BOL that I could find.

I've always installed SQL Server (since 2005) by creating a simple account that's a member of only the USERS group in Windows machine/domain. During install SQL Server gives this account all the privs (and only the privs) it needs. Mostly it does this by creating a group SQLServerMSSQLUser[machine][instance]. But it also makes the user you specify (I call it SQLService) members of groups (like "Performance Monitor Users") when it requires group membership. It's a good idea to pick a service account this way for principal of least privilege, rather than running SQL Server as something else, like Admin or LocalSystem. See the security best practices whitepaper for details.

It's an interesting observation that not all the privs you need are tied to that single group, created at installation. That's (one of a few reasons) why its always best to use SQL Server Configuration Manager to change the service account rather than the "Services" control panel applet.

I don't know if they're going to add "Performance Log Users" to the list of things that the installer does. If they don't add it automatically, don't forget to add it yourself for this feature (that is: ETW target in SQL Server XEvent). And don't forget to point the ETL file to a directory that the service account has permission to write to.

When I'm doing problem solving, its always good to have too much information rather than too little. With this in mind, I was quite interested in looking at SQL Server Extended Events (XEvent support) in SQL Server 2008.

You could always get diagnostic information in SQL Server through DBCC and SQL Trace/SQL Profiler. In SQL Server 2005 there are enhancements to SQL Profiler, dynamic management views (which enhanced and in some cases superceded DBCC information), DDL Triggers, and Event Notifications. There is also a WMI provide for events which uses event notifications internally. Event Notifications are sent to a SQL Server service broker queue and the events that are exposed are the same ones DDL triggers handle and most of the ones that SQL Profiler sees. In addition to all this info, there is an ETW (event tracing for Windows) provider for SQL Server. ETW support also appeared in System.Data.dll and the SNAC OLE DB provider/ODBC driver in SQL Server 2005/ADO.NET 2.0.

SQL Server 2008 adds support for XEvent (was codenamed Crimson), the new event system in Windows. The BOL provides info on this support, which works by creating and activating EVENT SESSIONs with DDL statements. Event sessions deal with items from event packages: events, targets, actions, types, predicates, and maps. You can mix and match the items from different packages in an EVENT SESSION.

So how do you get started? Create an event session (with CREATE SESSION DDL) and add items from the packages to your session (either in CREATE SESSION or in ALTER SESSION). You need at least one event and one target. You start/stop collecting by using ALTER EVENT SESSION...STATE=START/STOP.

- SQL Server 2008 ships with three packages: sqlserver, sqlos, and package0.
- There are lots of events. You can find them in sys.dm_xe_objects where object_type = 'event'.
- There are four targets, three of which work in the July CTP. The ones that work are:
    package0.asynchronous_bucketizer
    package0.pair_matching
    package0.etw_classic_sync_target

The first two targets write their info to sys.dm_xe_session_targets. You can join this to sys.dm_xe_sessions (after starting a session and collecting events) and look around. The fields in these DMVs are doc'd in BOL. Start with sys.dm_xe_session_targets.target_data.

The third target writes to an ETW session/file, providing compatibility with ETW. More on this one next.

I've been looking at the mapping of the new Katmai date/time data types to .NET types, both from the point of view of SQL Server ADO.NET clients and of SQLCLR procedures/functions/triggers. There are a couple of things that "interesting", if I'm not misunderstanding something obvious.

There are no new System.Data.SqlTypes to correspond to the new SQL Server data types. The beta Visual Studio docs state this as a fact. So, unless SQLCLR supports the generic Nullable types as parameters in Katmai, you'll pass these in and out as strings. Or only use them in UDFs with RETURNS NULL ON NULL INPUT. No word yet (that I'm aware of) on Nullable type support in SQL Server 2008, BUT the new HierarchyID type (which is .NET based) supports INullable, like SqlTypes do. So I can pass in a NULL HierarchyID to SQLCLR, but not a NULL TIME parameter.

SQL Server's TIME data type is mapped to TimeSpan, which is a time interval. data type. This might confuse folks that know SQL Server doesn't support SQL-99's date/time interval data types. But the TimeSpan seems to have appropriate semantics as long as you stay away from the "Days" related properties and methods.

There is a new .NET data type for DATETIMEOFFSET, System.DateTimeOffset. That's good.

To distinguish between SQL Server 2000/2005/2008 usage/mappings, there is a (client-side) connection string parameter: "Type System Version". You have a choice of 2008, 2005, 2000, or "Latest". This was already used in SQL Server 2005 to distinguish between 2005 (XML data type exists) vs 2000 (XML data type is a long string). And other distinctions.

I wonder if LINQ for SQL and Entity Framework will support these... ;-)

I've been trying out table-valued parameters along with ADO.NET support in Orcas and came across an interesting dilemma. It centers around INSERTs using TVP against a table with an identity column. ADO.NET can use DataTable, IDataReader or IList<SqlDataRecord> to represent a TVP parameter. Let's say that I want a TVP and a procedure for insert that looks like this:

CREATE TYPE dbo.JobsTableWithIdentity AS TABLE (
  job_id smallint identity primary key,
  job_desc varchar(50),
  min_lvl tinyint,
  max_lvl tinyint
);

The "insert proc" would look like this:

CREATE PROCEDURE dbo.InsertJobsID (@tvp1 dbo.JobsTableWithIdentity readonly)
as
INSERT INTO dbo.Jobs (job_desc, min_lvl, max_lvl)
  SELECT job_desc, min_lvl, max_lvl from @tvp1;

Using this in ADO.NET (with either DataTable or IDataReader as a parameter) produces the error: "INSERT into an identity column not allowed on table variables. The data for table-valued parameter "@tvp1" doesn't conform to the table type of the parameter." But I didn't DO an insert into an identity column in the proc. And this works just fine in T-SQL:

DECLARE @t dbo.JobsTableWithIdentity;
INSERT @t VALUES('hi', 10, 10);
INSERT @t VALUES('hi2', 10, 10);
EXEC InsertJobsID @t;

So its ADO.NET "deciding" this is an error. The ADO.NET workaround (if I do want to start with a DataTable that contains the identity column, add rows to it, and call Update) is this:

CREATE TYPE dbo.JobsTableWithoutIdentity AS TABLE (
  -- job_id smallint identity primary key,
  job_desc varchar(50),
  min_lvl tinyint,
  max_lvl tinyint
);

-- sproc dbo.InsertJobsNoID changed accordingly

// and then, in ADO.NET code
// DataTable "t" contains a real jobs table, to which I've added rows

DataTable added = t.GetChanges(DataRowState.Added);
added.Columns.Remove("job_id");
da.InsertCommand.CommandText = "dbo.InsertJobsNoID";
da.InsertCommand.CommandType = CommandType.StoredProcedure;
da.InsertCommand.Parameters.AddWithValue("@tvp1", added);

But, should I have to do this? Or modify the T-SQL code, given I've not used the IDENTITY column? But, I will need this column to UPDATE (or MERGE) using the TVP. A dilemma...

I saw Dan Jones' posting that the SQL Server 2008 July CTP (aka CTP4) was available on the Connect website. This one has some good things in it (like the new date/time datatypes as well as the hierarchyid data type) that should keep me busy for a while. There's much more new stuff than that, but the connect website has also the detalis.

One thing that is included that isn't intuatively obvious is ADO.NET client support. The Visual Studio Orcas Beta 2 release contained a new version (well, its still called version 2.0.50727, hmmm...) of System.Data.dll with support for the new date/time data types and also for table-valued parameters. The only problem was that it didn't even *connect* to CTP3 (there was a "network protocol" error). But it connects to the July CTP just fine. One things that not working in Orcas Beta 2 is SQLCLR projects against a SQL Server 2008 database, but typing CREATE ASSEMBLY is a small price to pay.

The new OLE DB provider and ODBC driver have been in place in the last CTP, but this is the first I've seen of .NET client functionality.

There are two items (listed in the readme file) that will not be in the SQL Server 2008 release. One is SQL Server Notification Services, which made its first appearence as an web release add-in to SQL Server 2000. Its not shipping in SQL Server 2008, and (some/most of) its functionality will eventually appear in Reporting Services. Another (removed from the installer) is a less-well known add-in that also debuted as a web release, SQLXML 4.0 (NOT to be confused with the XML data type, which is alive and well and has new xsd:date etc support). This used to be known as the SQLXML Web Release (V1,V2,V3) for SQL Server 2000 and SQLXML 4.0 (mostly) shipped "in the box" in SQL Server 2005. Some of its functionality was superceded by native XML and Web Services support in SQL Server 2005. It will be removed from the installer and shipped as a separate component instead, like all versions previous to version 4.0 were.

Categories:
SQL Server 2008

SQL Server 2008 will contain an ANSI SQL standard MERGE statement with extensions. Listening to the webcast last Friday, there's a fairly straightforward way to describe how this works.

Let's go back to first principals. Relational databases support two ways to do UPDATE and DELETE; positioned updates and searched updates. In a positioned update, you open an updatable cursor over a set of rows, navigate to the row you want, and issue an "UPDATE...WHERE CURRENT OF" statement. The searched update (UPDATE foo SET x=y WHERE z=1) is a SELECT and UPDATE in one statement. You specify the rows you want and also update them. The way you specify/gather the rows to update is mostly up to you, SQL Server even supports using a JOIN to do this.

update t
set t.name = s.name, t.age  = s.age
from [target] t
join [source] s on t.id = s.id;
go

MERGE in SQL Server 2008 is a searched operation that can operate on up to three different rowsets depending on how you specify the MERGE statement. These are actually the three rowsets you can get from a FULL OUTER JOIN. The statement:

select [target] t
inner join [source] s on t.id = s.id;

gets the rows in table T with a matching id value in table S. Let's call this rowset1. A left outer join will also include the rows in T that do not match S (rowset2); right outer join includes the rows in T that don't match S (rowset3), and FULL OUTER JOIN contains all three rowsets.

In MERGE, you can get up to all three rowsets (INNER, LEFT, and RIGHT join rowsets)

merge [target] t
 using [source] s on t.id = s.id
 when matched then update t.name = s.name, t.age = s.age -- use "rowset1"
 when not matched then insert values(id,name,age) -- use "rowset2"
 when source not matched then delete; -- use "rowset3"

The query processor will do the appropriate type of join to gather the rowsets that you need, as all clauses are not required. This optimizes performance over multiple statements, as you only have to gather the rows to process once.

Even, if you use only "when matched", MERGE is an improvement over our first "update using a join". If more than one row in the source matches one row in the target...

insert into t values(1, 'Fred', 42)
insert into s values(1, 'Buddy', 43)
insert into s values(1, 'Sam', '95)

The update using a join non-deterministically picks a row in the source to do the update, it could be Buddy or Sam. MERGE throws the following exception.

Msg 8672, Level 16, State 1, Line 1
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to  ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

MERGE can actually do more than three operations using predicates in the "match/no match clauses", but that's it for now.

Many of you have already heard the "hardware store" story. What's the best way to model products in a hardware store, where new items arrive at the hardware store each day. Each item has a "short list" of similar properties (e.g. UPC, price) and a long list of dissimilar properties (e.g. paint has color, type, amount and curtain rods have width, metal, etc). How to model the dissimilar properties for each item in relational table(s)?

This isn't as unusual of a problem as you might think, examples I've heard lately include:
 Items in a directory system (like AD)
 Readings for lab test results
 Attributes for Sharepoint items

I've always thought of the main contenders as:
1. Sparse tables - one per product
2. Sparse columns - 90% of the column values would be NULL
3. Model as XML - similar properties are subelements, sparse properties are attributes
4. Entity-attribute-value (EAV) - also known as open schema. A separate "properties" table with name-value pairs.

EAV is one of the most popular solutions, even supposedly endorsed by standard schemas in some industries. Many relational purists detest EAV because its non-relational. It's main drawbacks are that the "name-value pair" table gets huge fast, with the corresponding lack of performance, the need for careful editing (color and colour would be two different attributes), and the fact that the "value" column of name-value must have a data type of nvarchar or SQL-variant.

SQL Server 2005 added the PIVOT keyword. One use for PIVOT is the change the EAV tables into something that looks like sparse tables.

I even had the opportunity to ask Joe Celko (no fan of EAV) which he prefers, trying to ease him towards the "model as XML" mechanism. He stood up for sparse tables or sparse columns.

SQL Server 2008 will include support for sparse columns. You can designate a column as

SPARSE in the DDL, like this:

CREATE TABLE products (product_num int, item_num int, price decimal(7,2), ...,
                       color char(5) SPARSE, width float SPARSE...)

You can have a huge number of sparse columns per table, although the number of non-sparse columns remains at 1024. In addition, SQL Server 2008 will support sparse indexes (aka filtered indexes) defined like:

CREATE INDEX coloridx ON products(color) WHERE product_num IN (21,22,42...)

Finally, you can have an XML "COLUMN SET" column for each table; this exposes the sparse properties (or perhaps a subset of them?) for each item as a collection of XML elements, for those folks that like to model these as XML.

ALTER TABLE products ADD COLUMN properties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

It's an interesting idea; the proof will be in the perf as well as the usability.

ORDPATH is a hierarchical labeling scheme used in the internal implementation of the XML data type in SQL Server 2005. It's meant to provide optimized representation of hierarchies, simplify insertion of nodes at arbitrary locations in a tree, and also provide document order. It's described in the academic whitepaper "ORDPATHs: Insert-Friendly XML Node Labels". In addition to being used internally when the XML data type is stored, its also part of the key of the PRIMARY XML INDEX, used to speed up XQuery. It's an implementation/specialization of the path enumeration model of representing hierarchies in relational databases, mentioned in Joe Celko's book "Trees and Hierarchies in SQL".

In SQL Server 2008, there are additional uses of ORDPATH. There is a new system data type HierarchyID, that will likely use ORDPATH in its implementation. This allows simply hierarchies to be represented as relational column and provides methods that optimize common hierarchical operations (like parent, child, sibling, ancestors, descendants) without being concerned about the intricacies of elements and attributes.

In addition to representing and indexing XML and hierarchies, Michael Rys mentioned at his TechEd chalktalk on spatial data that the spatial data types may be indexed using a multi-level grid system and that these indexes would also use ORDPATH. Since neither HierarchyID or spatial types are in the current CTP of SQL Server 2008, we'll have to wait a bit to see if this is truly "ordpath everywhere".

SQL Server 2008 will contain, if my count is correct, 7 new data types. Note: none of these are in the current CTP.

DATE - ANSI-compliant date data type
TIME - ANSI-compliant time data type with variable precision
DATETIMEOFFSET - timezone aware/preserved datetime
DATETIME2 - like DATETIME, but with variable precision and large date range

GEOMETRY - "flat earth" spatial data type
GEOGRAPHY - "round earth" spatial data type
HIERARCHYID - represents hierarchies using path enumeration model

The first four (date/time series) are NOT implemented/exposed as .NET system UDTs, but the last three are exposed as .NET system UDTs. This means that, for the first time, .NET will be used as part of SQL Server. In SQL Server 2005, there were originally DATE and TIME data types implemented in .NET, but implementing temporal data is an intricate process. After much wailing and gnashing of teeth by some members of the user community, these were removed.  One of the complaints was the .NET implementation. Hmmm...

One of the nice side-effects of implementing spatial and hierarchyid as .NET types is that these will be shipped as a separate assembly, and that the types will be available for client-side and middle-tier use as well as in the database. So if you want to do some massive number crunching of spatial sequences on a computation server and the network traffic from database to computation server is acceptable, you can do so.

In general, SQLCLR makes the "logic in database or middle-tier" argument easier to deal with. Although there's no "run on database or run on server" switch in VS, with minimal code changes you can move your logic, or even duplicate the logic between tiers if need be. You can't do this with T-SQL; although its faster and better for data access on the database, it doesn't run outside the database. Unless you want to use SQL Server Express Edition as an application server. But that's a discussion for another time.

Had to write about another thing that "caught my ear" at TechEd during a chalktalk by Rick Negrin about Service Broker usage patterns.

Service Broker supports "internal activation", that is, associate a stored procedure that gets invoked when a queue has messages to process as well as "external activiation". When using external activation, an event notification occurs when a queue has messages to process, and this notification is picked up by an external application; the external application processes the queue messages, out-of-process to SQL Server.

There is an SSMS template (see template explorer) for an internal activator procedure and an engine sample implementation of a class library to support SQLCLR activator procs. In addition, Remus Rusanu has written some excellent blog entries on activation program message processing patterns.

There is a sample implementation of an external activator as well. Rick mentioned "productizing the external activator for SQL Server 2008". This would mean making the activator part of the SQL Server product, along with the extensive testing and support that go along with it.

That's great news, and a supported, standard, configurable, external activator would make a great addition for SQL Server 2008. Now, about improved broker diagnostic and configuration utilities...and that SQLCLR support library....and improved SSMS support. Those would help things out too (some people are never satisfied).

The June CTP of SQL Server 2008 contains support for table-valued parameters. Here's a usage scenario for these that has been around for a while.

Imagine you are running an online store and deal with (among other data) orders and order detail lines. You'd like to have a stored procedure that can add an entire order in one server round trip, regardless of the number of items that I order (that is, 1 order header and 1->n order detail lines). You'd even settle for two round trips, one for the order header, one for the details. Before SQL Server 2008, there is no built-in mechanism that supports this. In the past, I've seen some pretty interesting workarounds, such as:

1. Compose an arbitrarily large SQL batch as a single "command text" by using string concatenation on the client/middle-tier. All SQL Server database APIs support one (and only one) batch per Command. ADO classic did something like this when you inserted/updated/deleted multiple rows in a disconnected Recordset and called for a "batch update".

2. Create a stored procedure with some "static" parameters and an arbitrarily large number of repeating parameters, most of which will always be NULL. The limit to the number of parameters in a stored procedure is 2100. Both this method and the previous one make for some pretty hideous-looking code.

3. Use multiple parameter sets. OLE DB does support multiple parameter sets and some databases can optimize inserts that use multiple parameter sets. The SQL Server providers, at least last time I looked with SQL Profiler, turn multiple parameter sets into multiple calls, that is, one round trip per parameter set. That's not what I wanted.
 
SQL Server 2008's solution to this age-old problem is table-valued parameters (TVPs, for short). You start using a TVP by creating a custom type, using the CREATE TYPE statement, like this:

CREATE TYPE lineitem_type (line_number INT, order_id INT, product INT, quantity INT);

Information about these table types appear in sys.types and also in a new metadata view, sys.table_types. Once you've created such a table type, you can use it in T-SQL like this:

CREATE PROC new_order (@order_id INT, @line_items lineitem_type)
AS
-- silly table names used for clarity
INSERT INTO orders_table VALUES (@order_id ... ) ;
INSERT INTO line_items_table
  SELECT * FROM @line_items;

One round trip. Compact, clean code. Nice.

SQL Server 2008 will include GROUPING SETS; a (very) short explanation is that these allow the equivalent of multiple GROUP BY clauses in a single SQL statement. The result is a UNION ALL of the resultsets. SQL Server 2008 also contains/allows standard syntax for ROLLUP and CUBE, which have been in SQL Server for a while.

One way to use (or to think of) grouping sets is that, while ROLLUP with N columns produces a UNION of N+1 results and CUBE produces N-squared -1 results, grouping sets can produce an intermediate number of results, when not all the dimensions produced by CUBE are needed. Grouping sets should allow better optimization of this type of dimensional query.

SQL Server 2008 Reporting Services will contain a new type of control, called the TABLIX. The SQL Server 2008 CTP BOL defines a TABLIX as: "A Reporting Services RDL data region that contains rows and columns resembling a table or matrix, possibly sharing characteristics of both." Grouping sets sound like a good fit with this component. When TABLIX is available, a short profiler session would confirm this.

Of course, in addition to performance benefits and TABLIX support, GROUPING SETS are part of the ISO-ANSI SQL-2006 spec. Another plus.

There was a fairly well-known Powerpoint slide that attempted to summarize the new SQL Server 2005 features in bullet points of a single slide. By the release, there were so many new features, the feature list had to be rendered in a 5-point font to fit. At TechEd 2007, Microsoft presented the new features of SQL Server 2008 (was: SQL Server Katmai) in an analogous format. Although it's not yet down to a 5-point font, there are quite a few impressive new features on tap. Here's the list, modulo (my own) potential spelling errors. There is currently a CTP of SQL Server 2008 as well as beta1 of the next version of Visual Studio.NET (Orcas) and the ADO.NET synchronization framework. The features I can find in the current betas have asterisks. [Note: Asterisks updated for the July CTP]

Transparent Data Encryption
External Key Management
Data Auditing
Pluggable CPU
Transparent Failover for Database Mirroring
Declarative Management Framework*
Server Group Management*
Streamlined Installation*
Enterprise System Management*
Performance Data Collection*
System Analysis*
Data Compression
Query Optimization Modes
Resource Governor
Entity Data Model*
LINQ*
Visual Entity Designer*
Entity Aware Adapters

SQL Server Change Tracking*
Synchronized Programming Model*
Visual Studio Support*
SQL Server Conflict Detection
FILESTREAM data type
Integrated Full Text Search
Sparse Columns
Large User Defined Types*
Date/Time Data Type*
LOCATION data type
SPATIAL data type
Virtual Earth Integration
Partitioned Table Parallelism
Query Optimizations*
Persistent Lookups
Change Data Capture*
Backup Compression
MERGE SQL Statement*
Data Profiling
Star Join*

Enterprise Reporting Engine*
Internet Report Deployment
Block Computations
Scale out Analysis
BI Platform Management
Export to Word and Excel
Author reports in Word and Excel
Report Builder Enhancements
TABLIX
Rich Formatted Data
Personalized Perspectives
… and many more

It should be straightforward to eventually produce a 5-point slide because these features haven't made "the list" yet.

Grouping Sets*
Table-valued parameters*
Table-valued constructors*
"Delighters" (1-stmt variable declaration/assignment, increment operators)*
HierarchyID data type*
Large User Defined Aggregates
New versions of ODBC driver*, OLE DB provider*, and ADO.NET data provider
SQLCE Version 3.5*
Database Mirroring Enhancements*
Data Mining Algorithm Enhancements*
SSIS support for the new date/time data types*
SSIS VSTA support*

That's an impressive set of features. And I'm pretty certain, since this is a summary, that's there are more to come.

Categories:
SQL Server 2008

SQL Server "Katmai", which some folks are already calling "SQL Server 2008" because of its scheduled release date, was officially annonuced yesterday. I've been following things and noticed, a while back, that there are a number of Katmai sessions scheduled at TechEd in Orlando, in June. Now that there is an official announcement, I guess I can blog about these without possibly hurting anyone's feelings. Note that, as with the official announcement, the descriptions are pretty sparse. So you'll have to come and see for yourself...

DAT201 - The Next Release of Microsoft SQL Server: Overview
DAT202 - The Next Release of Microsoft SQL Server: Beyond Relational
DAT303 - The Next Release of Microsoft SQL Server: Data Warehousing Enhancements
DAT204 - The Next Release of Microsoft SQL Server: Manageability Overview

DAT17-TLC - Spatial Support in Microsoft SQL Server
DAT06-TLC - New T-SQL System Types in the Next Release of Microsoft SQL Server
DAT18-TLC - Overview of New T-SQL Programmability Features in the Next Release of Microsoft SQL Server

Categories:
SQL Server 2008

Theme design by Nukeation based on Jelle Druyts