Saturday, March 01, 2008

This post is part of a series, see parts 1 and 2. Sorry to be so long getting back to this series.

This post covers LINQ to SQL and EF worry #3. That is:

LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing SQL UPDATE/DELETE statements that are set-based.

Neither LINQ to SQL or Entity Framework currently contains an insert/update/delete language. Entity SQL could contain DML in the future, LINQ (Language Intergrated QUERY) to SQL doesn't have one. But both APIs can affect INSERT/UPDATE/DELETE operations on the database. You create or manipulate object instances then call SaveChanges (EF) or SubmitChanges (LINQ to SQL).

The manipulate objects and save method works well in LINQ to SQL and reasonably well in EF. The distinction is that in EF, if there are related entities (e.g. a title row contains an integrity constraint that mandates that is title's publisher must exist in the publisher's table), you must fetch the related entity first. Or synthsize a reference using EntityKey (see associated post here) to save the database roundtrip involved in fetching the related entity. See this associated post about deleting a single row without fetching. But how about UPDATE?

SaveChanges and SubmitChanges can boxcar multiple INSERT/UPDATE/DELETE operations in a single roundtrip. But let's consider the number of database roundtrips involved to change a single customer row. This requires one roundtrip to "GET" the row and another to update the row. And what about a searched update in SQL (UPDATE...WHERE) that updates multiple rows? Or an updated based on a SQL join condition (my favorite example, using update over a recusrive CTE, gets all employees reporting to a certain manager and gives them all a raise)? The number of fetches required JUST to do the update increases. Maybe not the roundtrips required to get the rows, but the sheer number of fetches (network/datebase traffic) required.

Let's address the general "GET then UPDATE" pattern first. I worried about this one until I realized that, in most applications I've worked on, you don't usually do a "blind" UPDATE or DELETE. A customer web application fetches a row (and related rows), a pair of eyes inspects the row to ensure this IS indeed the right row, and then presses the gadget that causes an UPDATE/DELETE. So "GET then UPDATE" is an integral part of most applications anyway. OK for now. If UPDATE/DELETE of a row affects related rows, this can be accomplished with cascading UPDATE/DELETE in the database.

But how about multiple, searched, updates without inspecting/fetching ALL the rows involved? Neither LINQ to SQL or EF has a straighforward way to deal with this. AlexJ wrote an excellent 4-part blog series (start here) about rolling your own searched update in EF with an underlying SQL Server using .NET extension methods getting the SQL query text and string handling to turn it into an Update, but its not necessarily what I'd call straightforward. Maybe, wrapped in a library to encapsulate the details... It also looks SQL Server-dependent, and I thought EF wasn't supposed to be. So you'd need to replicate this for each provider.

LINQ to SQL contains the ultimate fallback method for this case. DataContext.ExecuteCommand() lets you execute any SQL command, including parameters. EF doesn't have the equivalent because (remember) your data store is an object model over a conceptual data source, not the data source itself.

I think this is one place (searched UPDATE/DELETE) that I'd suggest/mandate using stored procedures. The blind searched operation is accomplished in a single database roundtrip, and you can even (in SQL Server at least) use the OUTPUT clause in SQL DML to obtain information in rowset form as to exactly what got deleted. Since this is a database-specific operation, stored procedure sounds like a good workaround for this problem.

MHO: With stored procedures as needed and the realization that most apps use "GET then UPDATE" anyway, I think I'll dismiss this worry.

Saturday, March 01, 2008 2:52:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 

Recently I had the occasion to load .NET 3.5 on to a machine that had an existing instance of SQL Server 2005. .NET 3.5 does not work by versioning the "main .NET assemblies" (e.g. there is no "version 3.5" of mscorlib.dll, System.dll) but by replacing the 2.0 versions of them. You can observe this by inspecting the 4-part version number. For example System.dll (in \windows\Microsoft.NET, Framework\2.0.50727 as well as in the GAC) changed from version 2.0.50727.42 to 2.0.50727.1433 when I installed .NET 3.5. As an aside, when I installed Vista SP1 recently (I think this is what triggered it) System.dll went to version 2.0.50727.1434.

SQLCLR (using SQL Server as a .NET runtime host) will load a series of "approved" assemblies by directly using the fusion APIs. However there are less than 20 approved assemblies for SQL Server 2005 and sometimes people will write SQLCLR code using assemblies that aren't of the approved list. The two most common cases I've seen include using System.DirectoryServices.dll or some of the WCF client assemblies. Neither is on the approved list.

Although using system assemblies outside the approved list is discouraged, it IS possible. The resulting (user) assembly must be cataloged as UNSAFE and the CREATE ASSEMBLY process will catalog not only the original user assembly and referenced "system" assembly but, because of the way assembly dependancies work, quite a few (I've seen up to 20) dependent system assemblies. You can see the user assembly and all dependent system assemblies in the SQL Server metadata table "sys.assemblies". You also receive warnings that "assembly xyz has not been tested in this environment".

I've always said that if you use system assemblies that are not in the "approved" list, you now shift the responsibility of managing those assemblies to *you*, the SQL Server application programmer and/or DBA.

So what does this have to do with .NET 3.5 and SQL Server? It's a little known fact that when SQL Server loads a "user" assembly, it will check to see if a similar assembly (ie assembly with the same strong name and version) exists in the GAC (.NET global assembly cache). If the user assembly in SQL Server and the user assembly version in the GAC have the same .NET version number but a different MVID, SQL Server will refuse to load the assembly (from SQL Server's system catalog) and throw an exception.

As pre-requisite information, all of the "main" system assemblies (like System.dll) have a GAC version number of 2.0.0.0 even though the actual version number may be 2.0.50727.42. Or 2.0.50727.1433. You can see this by inspecting the GAC. And...an MVID is a .NET module version identifier. This is different from the version number, even non-strong named assemblies have MVIDs. The module version identifier is a GUID that changes *each time* an assembly is reassembled.

You can prove that the GAC/MVID check occurs by cataloging a user assembly to SQL Server and to the GAC, then recompiling/changing the assembly in the GAC (only MVID, not version number) but not the version in SQL Server. The resulting error after the the GAC'd version is changed is:

Msg 10314, Level 16, State 11, Line 1
An error occurred in the Microsoft .NET Framework while trying to load assembly id 65536. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE.... (rest of message elided).

But again, what's this got to do with .NET 3.5? I cataloged two assemblies to SQL Server 2005 prior to installer .NET 3.5. One assembly simply did mathematics (catalog as SAFE), one used an assembly that wasn't from the "approved" list, System.Remoting.dll (catalog as UNSAFE, it brings in about 15 dependent system assemblies that are not on the approved list). User-defined functions as these assemblies were also cataloged and worked fine.

Now install .NET 3.5. This changes the MVIDs (but not the version) of some of the System GAC'd assemblies. The SAFE assemblies that refer only to system assemblies on the "approved list are unaffected". My mathematics assembly/function still worked great. My UseRemoting assembly/function WAS affected. The resulting error is:

Msg 6522, Level 16, State 1, Procedure doIt, Line 0
A .NET Framework error occurred during execution of user-defined routine or aggregate "doIt":
System.IO.FileLoadException: Could not load file or assembly 'System.Runtime.Remoting, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. Assembly in host store has a different signature than assembly in GAC.

(Exception from HRESULT: 0x80131050)
System.IO.FileLoadException:
   at UseRemoting.Class1.DoIt()

So what's the moral of the story? If you stick to the assemblies on the approved list you are fine. Fusion loader will get the latest version for you, no changes are needed, although you want to test your user assembly again in the new environment. But by using assemblies not on the approved list YOU take responsibility for the assembly. After installing a new version on .NET that changes system assemblies that you use, don't forget to DROP your user assembly (this should also drop all the old dependent system assemblies), recompile and retest your user assembly (always a good practice) and then run CREATE ASSEMBLY again. All should be well, because your "unapproved" assemblies are the same version the GAC uses.

As an aside, but an advance warning, SQL Server 2005 does NOT need .NET 3.5. But SQL Server 2008 DOES use .NET 3.5 and installs it as part of SQL Server 2008 install. See previous paragraph.

Hope this helps.

Saturday, March 01, 2008 1:31:15 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, February 29, 2008

It's good to see Ed Katibah open up a blog on SQL Server spatial data. Ed is sort of the "dad" of spatial data in SQL Server; he's got a ton of experience and history in this space. Be sure to catch his posting on what's new in spatial for CTP6.  I've been trying out the new functions and the more I use the spatial types and functions, the more I understand the reasoning behind the dual Geometry - Geography types in SQL Server.

Issac's back at blogging too about spatial indexes, really makes for good reading.

Subscribed!

Friday, February 29, 2008 11:28:28 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, February 28, 2008

I've been working with Filestream storage in SQL Server 2008 since it appeared in CTP5. The way I've always set it up is to use sp_filestream_configure. During the CTP6 setup process, I noticed you could now configure Filestream as part of setup. Because I knew how to use sp_filestream_configure I skipped that part of setup. And everything just worked as expected when I used it. A friend of mine, not wanting to miss anything, configured Filestream as part of setup. When he tried to use it, the following "Catch 22-like error messages occurred":

EXEC sp_filestream_configure @enable_level = 3;

"The FILESTREAM feature is already configured to the specified level. No change has been made."

CREATE DATABASE ... with a filegroup for FILESTREAM

Msg 5591, Level 16, State 1, Line 1
FILESTREAM feature is disabled.

Huh? After checking the short list of usual suspects (e.g. NTFS file system, running SSMS as admin under Vista) we were both puzzled. Especially because it "worked for me" as it always had. There's nothing more frustrating to hear during problem resolution then "Well, I'm not sure what you did wrong because it works for me". Grrrr...

There is finally a resolution thanks to the storage engine team (confrmed and expounded on by Joanna's blog entry on March 3), and it turns out that the way filestream is configured changed in CTP6 for some very good reasons. You see, sp_filestream_configure sets up Filestream in the OS (requires OS privs) and SQL Server (requires SQL Server privs). Unless you're running sp_filestream_configure as an OS Admin who's also a SQL Server sysadmin, it may not work completely. But sp_filestream_configure is "IN" for one last CTP (CTP6). Gone before RTM.

So filestream configuration was broken into two parts:
1. Configuring filestream at an OS level is moved into setup. Or use SQL Server Configuration Manager. Or WMI scripting (with SMO/WMI) is you like scripting (I do). In the SQL Server Configuration Manager GUI, you configure it by select SQL Server Services (left pane), right-click on your SQL Server service instance and choose "properties" and use the FILESTREAM tab. Note that the equivalent FILESTREAM tab does NOT appear when using Services Control Panel applet. I hope you switched (as you should have) to SQL Server Configuration Manager back in SQL Server 2005 days.

2. Configuring filestream at a SQL Server instance level requires EXEC sp_configure 'filestream access level', '2'. Note that the access level choices in SQL Server are 0,1,2.

3. BOTH configurations steps/setting (OS and SQL Server instance) must be compatible for filestream to work. If, for example, filestream is enabled at the OS level but disabled in SQL Server, you'll get message 5591. See above.

So how did my friend get the "Catch 22 errors"? In CTP6 (but not in future), setup performed step 1, but not step 2. In future setup will do both steps. In CTP6, sp_filestream_configure (to be removed before RTM) will do both steps. But before sp_filestream_configure does anything, it CHECKS to see if either configuration job is already done. Thus the "filestream feature is already configured" message.

Got it? So get used to configuring filestream in both places using SQL Server Configuration Manager AND sp_configure. That's the way of the future. The fact that both exist in one CTP can be frustrating but is understandable. Reminds me of a similar shift when configuring HTTP endpoints (which also requires both OS and SQL configuration) during the SQL Server 2005 CTPs.

Hope this post saves someone a few "grrrr... moments" when using this new useful feature.

Thursday, February 28, 2008 9:58:50 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Tuesday, February 26, 2008

Quite a bit of activity last week in the Visual Studio and ORM spaces.

Visual Studio released a CTP of Visual Studio 2008 support for SQL Server 2008 CTP6. It allows you to connect to CTP6 with Server Explorer, which enables quite a few other features to work. check with Overview document on the download page. SQLCLR projects and T-SQL and SQLCLR debugging work too.

Curiously the Overview document says "using the LINQ to SQL Designer with SQL Server 2008 databases" is specifically not supported. But I've used SQL Server 2008 WITH NO new data types (e.g. DATE, TIME, etc) and it does seems to work. EF beta3 designer too. Maybe they are referring to lack of support for the new data types. Speaking of which...

Faisal Mohamood announced on the ADO.NET team blog that they ARE working on support for the four new date and time types in SQL Server 2008. Hooray. Now let's keep our fingers crossed for SQL Server UDT support.

Julie Lerman wants these types in EF too, but I'm not sure I agree. DATE and TIME would be nice because they're ANSI standard types. DATETIME2 maps favorably to DateTime in EF, actually better coverage of the value space than DATETIME. Maybe what she really means is that she wants the SSDL and MDL and the designer to be aware of them and generate the closest possible mapping. Even if that's "string".

The distinction is that the ADO.NET Entity Framework is meant to be database neutral, not SQL Server specific, and supports a discrete subset of database data types not a 1-1 mapping of EF types to SQL Server types. LINQ to SQL, OTOH, is SQL Server-specific and should support a 1-1 mapping. At this point I'm really thinking they should have called it "LINQ to SQL Server". Not that the *concept* can't be extended to other databases (LINQ to DB2, LINQ to Oracle, etc) but these databases will have to implement their own. The fact is that LINQ to SQLCE does exist and is separate from LINQ to SQL (Server). And other vendors may follow with their own, database-dependent implementation of a light LINQ layer. But speaking of SQLCE...

The SQLCEBLOG announced the beta of SQLCE 3.5 SP1 with ADO.NET Entity Framework Beta 3. There was support for EF in SQLCE a while ago, but it was postponed in the 3.5 release which sports LINQ support. Now it looks like Entity Framework support is back. Hooray. BUT, although a common programming model between SQLCE and a subset of SQL Server functionality is a great idea, IMHO the best and most lightweight programming model for SQLCE uses SqlResultset. Still the winner.

Oh yes, BTW, SQL Sevrer 2008 CTP6 was released last week. But that's been posted on EVERY SQL Server related blog in the world. Congrats for another CTP folks. Sparse columns, filtered indexes, and filtered statistics are in as well as quite a few other features. 

Tuesday, February 26, 2008 11:48:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, February 14, 2008

Sometime before SQL Server 2008 RTM, the libraries for LINQ and LINQ to XML (that is System.Core.dll and System.Xml.Linq.dll) but NOT LINQ to SQL will be added to the SQLCLR "approved" assembly list. The approved assembly list is the list of .NET Framework libraries that have been tested in a SQL Server-hosted environment, annotated their code with HostProtectionAttributes where needed, and can be used in user assemblies that can be cataloged with "PERMISSION_SET = SAFE". That means you can use 'em in SQLCLR user-defined functions, procs, and other SQLCLR database objects.

Being curious, I asked "why LINQ and LINQ to XML but not LINQ to SQL"? The answers I got were "LINQ to SQL not tested in this environment" and "not sure there should be yet another way to access the database with .NET". Fair enough. Although, its never seems to be a problem to provide yet another way to mix SQL Server and XML. The number of different ways now exceeds the number of fingers on one hand.

OTOH, SMO doesn't appear to be added to the approved assembly list in SQL Server 2008. Nor has WCF. Oh well.

Enjoy...

Thursday, February 14, 2008 4:58:31 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I often hear those who identify themselves as "database programmers" (sometimes I think these may be folks who program only inside-database code), say: "I'd give learning LINQ and/or LINQ to SQL/Entities or Entity SQL (eSQL) a go in my sparse time, but I don't have a good book or a tool.

I haven't read a book on either of these yet, spending most of my time in BOL. BOL is a good start (but not comprehensive) for LINQ and LINQ to SQL. eSQL docs are pretty sparse, but that's because its not a shipping product yet. Maybe things will improve. (Note: if anyone wants to send me a copy of their book to read in my spare time, I'll give it a go ;-).

Following LINQ is more difficult than EF (modulo LINQ to Entities) because there's many implementations/dialects. There are operators that LINQ (over objects) can do, but LINQ to SQL or LINQ to Entities cannot.

On to tools:

For LINQ, I like LINQPad by Joseph Albahari. Simple to set up, although the queries are not *exactly* the same as you'd write in a program. And it doesn't support LINQ to Entities yet (or LINQ to DataSet for that matter). But, all in all, an excellent tool.

For eSQL, there's no comparison to Zlatko Michailov's eSqlBlast. This is not only an eSQL practice query processor, but also a visualizer for the results you'd get back. You just type in the eSQL. Even comes with source code. Latest version is on CodePlex. This is straight eSQL, no LINQ to Entities support here either.

Neither one of these has Intellisence or statement completion, I believe.

Enjoy...

Thursday, February 14, 2008 4:24:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

Thursday, February 14, 2008 3:55:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Wednesday, February 13, 2008

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

Wednesday, February 13, 2008 5:47:20 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 

This entry is a continuation of the previous posting on how to use SQL Server 2008 spatial data and Virtual Earth. This entry discusses your web service and SQL Server code choices.

The point of the web service is to translate one of the output formats of SQL Server 2008 spatial to a format this Virtual Earth can use. The spatial data types can be output in:
1. Well-known text format
2. "Native" format - that is, return a Geometry/Geography type to a .NET client as the SqlGeometry/SqlGeography native .NET types
3. GML - Geography markup language

Remember that Virtual Earth needs:
1. GEORSS or
2. KML
3. Custom XML-based format (and custom Javascript code to process it)
4. Javascript strings (to pass to eval())

There is not a one-to-one correspondence between any of these methods, so there is going to be a need for a transformation somewhere. In addition, a SQL Server Geometry or Geography type can contain a collection of points, linestrings, or polygons. There may also need to be a one-to-many translation. This is facilitated by methods STNumGeometries/STGeometryN and STNumPoints/STPointN.

Finally, there could be a pushpin (point) associated with the center of a polygon and/or text associated with the pushpin. Where the text comes from will be application-dependent.

Because there are many ways to process XML in SQL Server and in web services, that's lots of choices. Some that come to mind are:
1. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to a custom XML format
   Javascript code in the web page processes the custom format.
2. Using SQL Server's XQuery to transform GML into GEORSS or KML
   This is returned to the web service as a SqlXml data type.
   Consumed directly as a layer in Virtual Earth.
3. Using an XSLT transform (or LINQ to XML) to transform GML to GEORSS or KML. This can occur in the web service or even a SQLCLR procedure, although this type of processing should really be accomplished in the web service.
4. Using SELECT ... FOR XML PATH to construct GEORSS or KML directly.
   Similar processing to choice #2 above.
5. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to Javascript strings.
   Web page simply calls eval()

Hopefully this posting will elucidate the choices you have for this rendering. Depending on which method you choose, there could be quite a few "moving parts" (a la Rube Goldberg) in the solution.
- Web pages with custom Javascript code
- Web service to supply the data
- HTTP Handler to allow Virtual Earth to use remote GEORSS/KML
- Data access code in SQL Server
- Transformation code in web service or SQL Server

So.....what's your favorite method for using SQL Server 2008 spatial data with Virtual Earth?

Wednesday, February 13, 2008 3:53:53 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Last week at the SQL Server 2008 Jumpstart I showed an example of using SQL Server 2008 spatial data types with Virtual Earth. I showed a single coding style. There really is a plethora of coding styles to using these together, and I'd like to describe some of the most common ones. I'd like to do this without much code for now, because an end-to-end description sometimes results in the graphic aspects (which other controls to use, how to build Ajax web pages, etc) seem to get in the way.

I'm not even going to mention using Virtual Earth to serve tiles directly, but only the Virtual Earth web control. Start with the Virtual Earth Interactive SDK for coding information.

In general, what we're after is:
   Page contains Virtual Earth control
   Page makes a request to a Web Service for data (usually Ajax)
   Web Service get and processes data from database, returns data
   Page uses Virtual Earth calls to present the data

The first two are straightforward, although the Virtual Earth control is available only online via a URL. This makes testing when you don't have internet access virtually impossible. The Ajax request (that is, call a URL with an XML payload using a browser-specific mechanism, wait asynchronously for the response) normally doesn't use SOAP-based Web Services, because you'd have to add and remove the SOAP headers. It's usually XML-in, XML-out, sans SOAP headers.

The Virtual Earth calls depend on what version of Virtual Earth control you're using. I'll show this using version 6.
1. You can render using shapes directly
2. You can import GEORSS into a shape layer
3. You can import KML into a shape layer

Virtual Earth supports visualizing shapes directly (#1 above) by using:
1. AddPushpin - to visualize a point or the centroid of a polygon
2. AddPolyline - that's Linestring in SQL Server 2008
3. AddPolygon - for polygons

For shape layers (#2 and #3 above), the code looks something like this:

var l = new VEShapeLayer();           
var veLayerSpec = new VEShapeSourceSpecification(type, local.xml, l);

map.ImportShapeLayerData(veLayerSpec, onFeedLoad);

The "type" in the VEShapeSourceSpecification constructor can either be GEORSS or ImportXML (that's KML). You can also have a Live Search Maps Collection but likely that one won't come from SQL Server.  Bear in mind that the VEShapeSourceSpecification constructor needs a *local* XML file URL. If you don't want to use a local XML file, you can build an ASP.NET HTTPHandler that uses redirection to pretend a remote XML file is local (see Mike McDougall's article and code to accomplish this).

One last twist that I've seen is that you can build your Javascript code directly in each page OR make the Web Service program reformat your data into Javascript calls directly. Once you're back at the page (the Web Service passes back a string of Javascript) you simply call Javascript's "eval" function. This makes the web page client-side code easy, but the Web Service code lots more complex, as you're generating code, not points, lines, and polygons.

Web Service choices follow next....

Wednesday, February 13, 2008 3:52:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, February 12, 2008

SQL Server 2005 introduced the concept of endpoints. Every connection entry point into SQL Server is abstracted as a endpoint. You can see all the endpoints on your SQL Server instance by using the sys.endpoints metadata view. In addition to those you'd expect (for TCP/IP, Named Pipes, Shared Memory) there are also endpoints defined for the dedicated admin connection, Service Broker, Database Mirroring...and HTTP.

The endpoint concept is still with us, but HTTP endpoints, those endpoints that allow you to expose SQL Server procs and functions directly as HTTP Web Services will be deprecated in SQL Server 2008. Upon creating an HTTP endpoint (aka SOAP endpoint) in SQL Server 2008, you'll be warned that "this feature will be removed in a future version. You should not use these in new development work." This was announced at the SQL Server 2008 Jumpstart last week.

So what replaces this functionality (and how many folks actually used it in SQL Server 2005). The likely successor seems to be ADO.NET Data Services, a mechanism to produce RESTful services over any type of data, using Entity Data Model or LINQ to SQL. The model you expose through data services doesn't really have to be a database at all.

And for those of you that liked the idea of exposing your stored procedures as services, ADO.NET Data Services supports "Service Operations", which can be parameterized and contain any type of logic you wish. Service Operations aren't limited to stored procedures, but could be used to encapsulate them.

Tuesday, February 12, 2008 5:08:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

Wow. It's already Tuesday and I'm still recovering from last week. I've been laying low for the last six months or so, writing a few (OK, quite a few) blog entries on SQL Server 2008 topics. But I'd really been working on my next SQL Server course (on 2008, of course), that premiered last week as part of a program known as SQL Server 2008 Jumpstart.

Simon Sabin (the newest SQLskills associate) and I presented to a good-sized crowd of Microsofties and trainers from everywhere. We did the developer track; there were also DBA and BI tracks by my well-known SQLskills associates. You'll see this course in a lot of different venues in the next few months, but it won't always be us that's doing the presenting. Just so many places we can be at once...

We covered everything from new T-SQL features to Spatial Data and Indexes and everything in-between. In addition to the SQL Server features, there was an entire day devoted to client APIs and dev tools; not only ODBC (yes, I did an ODBC demo in C++) and ADO.NET, but also the Entity Framework, ADO.NET Data Services, Occasionally Connected Apps.

So, for the next few weeks, I'm going to be writing down my thoughts on questions (some of them rather pointed) that arose about the topics we presented. Some of the post will be facts, answers to questions...but they'll be some opinion pieces as well (marked with MHO, that's my humble opinion).

Tuesday, February 12, 2008 4:48:59 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, January 21, 2008

In a previous blog entry a while ago, I wrote about writing a single SQL statement that did a SQL MERGE operation, used the OUTPUT clause to put out a rowset and directed the rowset into an INSERT statement. This happens in one statement without the need of explicitly defining a temporary table and using multiple SQL statements.

Turns out that this feature has a name: Composable DML. I've also heard it called (in SQL Server Books Online) "DML table source".

I've also heard the term composable queries, both in references to both Entity Framework functions and LINQ to SQL. They even have an "IsComposable" attribute in the Function definition) In this context, it means that the output of a function that produces a rowset (in this case a SQL Server table-valued function) or code that produces a rowset can be combined with further queries that do more filtering or projection, or even retrieve related rows. When the queries are submitted to the database, they are composed into a single SQL statement.

The point is pretty similar, reduce the resources required to accomplish a set of operations by reducing the number of statements or database round-trips needed.

Monday, January 21, 2008 4:30:03 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

More about Service Broker priority in SQL Server 2008.

The books online states, when setting a initiator/target priority as local service name/remote service name that the priority affects:

1. Sends from the initiator queue
2. Receives from the initiator queue
3. Getting the next conversation group from the initiator queue.

And specifies the mirror image priority for target/initiator as local service name/remote service name affects:

1. Receives from the target queue
2. Sends from the target service
3. Getting the next conversation group from the target queue

So, in my previous example, where

The initiator/target as local service name/remote service name is priority 7
The target/initiator as local service name/remote service name is priority 6

Let's send a message from the initiator that "sticks" in sys.transmission_queue. Suppose that either the network is slow or we don't have the appropriate setup. But the message is floundering in the transmission queue. Already SENT, waiting to be delivered.

"select * from sys.transmission_queue;" shows the message BEING SENT (and transmitted) at priority 7. And, when the message is finally DELIVERED, the priority in the receiveq "select * from receiveq" is priority 6.

Normally, you'd likely make both priorities in the initiator/target pair the same to avoid confusion, but you can make them different for greater granularity. And it DOES help to illustrate where and when priority is used.

Monday, January 21, 2008 3:32:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Conversation priority is a new feature with SQL Server 2008. In a previous blog post, I talked about how to simply set one up. But you'd usually not want to set up a priority for all messages. So, lets set up a simple service and then define a "Premier Customer" priority. Before trying this you need to make sure the database is set to honor broker priority.

ALTER DATABASE pubs SET HONOR_BROKER_PRIORITY ON;

Suppose I had a simple (one database) service pair. A service called "sender" (note lower case) using senderq is the initiator and a service called "receiver" using receiverq is the target. They use the [default] contract. So we can set up a "Premier Customer" by contract.

CREATE CONTRACT premier ([DEFAULT] SENT BY ANY);
GO

Now, let's ensure that the target can also use the premier contract in addition to the DEFAULT contract.

ALTER SERVICE receiver ON QUEUE receiveq (ADD CONTRACT premier);

We need to set up the priority.

CREATE BROKER PRIORITY PremierCustomer
    FOR CONVERSATION
    SET (CONTRACT_NAME = premier,
         LOCAL_SERVICE_NAME = sender,
         REMOTE_SERVICE_NAME = 'receiver',
         PRIORITY_LEVEL = 7);
GO

The interesting thing is, this priority ONLY covers messages sent and received (and get conversation group) by the sender/senderq, IT DOESN'T cover the receives in the receiver/receiveq queue. So we'll set up a mirror image priority, and make the priority level still higher than the default (5) but gratuitously different than the other priority

CREATE BROKER PRIORITY PremierCustomer2
    FOR CONVERSATION
    SET (CONTRACT_NAME = premier,
         LOCAL_SERVICE_NAME = receiver,   -- local service is receiver
         REMOTE_SERVICE_NAME = 'sender', -- remote service is sender
         PRIORITY_LEVEL = 6);                   -- note level 6, not 7
GO

Now, begin a conversation using sender and receiver and the premier contract and send a message. Note that when you issue "SELECT * FROM sys.conversation_endpoints;" the sender (initiator's) endpoint is priority 7, the receiver's endpoint is priority 6. And, looking at the message in the receiveq its set to priority 6.

If you'd send a return message from the receiver (target) back to the sender (initiator), the return message in the senderq would be priority 7.

So when setting up priorities, if you want all messages on both sides to be affected, don't forget to set up a priority for EACH side. Here's the entire demo.

BrokerPriorityByContractDemo.zip (1.57 KB)
Monday, January 21, 2008 3:31:03 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, December 27, 2007

One last SQL syntax post for the evening...

We've all heard about SQL Server 2008 row constructors. They allow syntax like this to work:

CREATE TABLE name_table (name varchar(20), age int);
go
INSERT INTO name_table VALUES ('Bob', 54), ('Mary', 30), ('Sam', 15), ('Buddy', 9);
go

But how about using them as a table source:

SELECT n.name, n.age, tab.species
FROM name_table n
JOIN
(
  VALUES ('Bob', 'person'), ('Mary', 'person'), ('Sam', 'cat'), ('Buddy', 'cat')
) tab (name, species)
ON n.name = tab.name;

You specify a table alias and name the columns, and its just another (synthesized on the fly) table.

Thursday, December 27, 2007 11:07:25 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

To round out the new plan guide-related features in SQL Server 2008, there is a new way to express a table hint that increases the plan guide's reach.

The sp_create_plan_guide procedure requires a hint as the last parameter. This can be in a form OPTION (hint), just an XML query plan (in SQL Server 2008 "OPTION (USE PLAN)" isn't required), or NULL. Specifying NULL can be used to "subtract" a hint from an existing query where the hint is hardcoded in source code you don't have the ability to change.

This requirement limited plan guides to hints that could be used in an OPTION clause, and table hints didn't qualify because they are inline. In SQL Server 2008 this limitation is removed. You can specify a table hint by using the syntax OPTION (TABLE HINT (table_alias, hint)). This makes table hints usable in plan guides. You can even specify multiple hints on a per-table basis.

OPTION (TABLE HINT (table_alias1, hint1), (table_alias2, hint2)...)

BOL indicates in a few places that this syntax exists specifically to make table hints accessible to plan guides. Cool.

Another use for an analogous syntax that's not in SQL Server 2008 would be to be able to specific per-table JOIN hints the same way. You can currently specify only one JOIN hint in the option clause. An analogous way to specify multiple, per-join JOIN hints in an OPTION clause would be nice. But, if you're controlling the plan that closely, perhaps its best to resort to plan forcing.

BTW, manditory disclaimer: Hints are (usually) evil and should be used only as a last resort. Using a USE PLAN hint seems, at least to me, similar to writing custom navigation code for each query in IMS/DB, as I did in the '80s. You're throwing the query algebrizer/optimizer away...or at least pinning its hands.

Thursday, December 27, 2007 10:55:03 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

SQL Server 2005 introduced the OUTPUT clause in DML statements. With the OUTPUT clause, you can output column values to a table variable, a table, or return these values to the user. You can even use multiple OUTPUT clauses to push the values into both a table variable and a table from the same statement.

In SQL Server 2008 there is an additional option. You can use your OUTPUT column values directly in an INSERT-SELECT statement. Here's what it would look like, using MERGE with an OUTPUT clause (and an example from one of the early webcasts):

CREATE TABLE Stock (Stock VARCHAR(10) PRIMARY KEY, Qty INT CHECK (Qty > 0));
CREATE TABLE Trades (Stock VARCHAR(10) PRIMARY KEY, Delta INT CHECK (Delta <> 0));
CREATE TABLE AuditChanges (Action varchar(6), Stock VARCHAR(6), Qty INT);
GO

INSERT Stock VALUES('MSFT', 10), ('BOEING', 5);
INSERT Trades VALUES('MSFT', 5), ('BOEING', -5), ('GE', 3);
GO

INSERT INTO AuditChanges
SELECT * FROM
(
MERGE Stock S
  USING Trades T
  ON S.Stock = T.Stock
  WHEN MATCHED AND (Qty + T.Delta = 0) THEN
    DELETE
  WHEN MATCHED THEN
    UPDATE SET Qty += T.Delta
  WHEN NOT MATCHED THEN
    INSERT VALUES(Stock, T.Delta)
  OUTPUT $action, T.Stock, inserted.Qty
) tab (action, stock, qty);
GO

Notice how the OUTPUT clause requires a table alias (in this case "tab") and needs to name the columns returned from OUTPUT.

Thursday, December 27, 2007 10:23:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

One of the nifty new SQL Server 2005 features was called "plan forcing". You could acheive plan forcing by using the USE PLAN query hint but this made such a hideous query that the better way to do this is to use a plan_guide. A plan_guide is a named database object (like table or view) that associates a query hint with a particular query.

In SQL Server 2008 there have been some enhancements made for plan_guides and plan forcing. The biggest change is that you can create a plan from the plan cache directly, rather than using the rather long set of steps that you needed in SQL Server 2005. You only need a few lines of boilerplate code, setting up and invoking sp_create_plan_guide_from_cache, and the plan guide is automatically created. See SQL Server 2008 Nov 2007 for details and copious examples. The concept of creating a plan guide from a query plan already in cache is known as "plan freezing".

You can also validate that a plan guide will still be used (after dropping an index, for example) by using the new sys.fn_validate_plan_guide function. In addition, its easy to see if a Plan Guide is being used or not with the profiler trace events "Plan_guide_successful" and "Plan_guide_unsuccessful", located under the "Performance" category. These can also be monitored over time with two new perfmon counters "Guided Plan Executions/sec" and, what has to be the most humorous perfmon counter "Misguided Plan Executions/sec".

One other plan guide enhancement that doesn't seem to be mentioned in BOL is that you can use plan guides for INSERT, UPDATE, DELETE, or even MERGE statements in SQL Server 2008. This feature works in the SQL Server 2008 Nov CTP (CTP5).

As always the standard disclaimer is "query hints are (usually) evil" but if you're desperate enough to need plan forcing or plan freezing, its good to know that its there.

One reason why I've seen people use plan freezing is as a hedge against plan instability that can result from what's known as "parameter sniffing". Parameter sniffing is usually a good thing; the query processor sniffs the parameter values the first time you execute a parameterized stored procedure and uses the statistics based on those parameters accordingly. However, if your "first time execution" was based on uncommon parameter values, you're now stuck with a suboptimal plan. Plan freezing can be use to freeze a plan that's most optimal and "override" parameter sniffing. You just have to catch such a plan in the cache.

The drawback is, of course, that you are stuck with the same plan even if the statistics change. I'm sometimes accused of liking plan guides and plan forcing too much, but perhaps it's because I'd oftentimes get a call when parameter sniffing (or a service pack: yes, it does happen every now and again) changed to a query plan that is suboptimal. For those trouble calls (ie, "my query just starting running 4 times slower today and the plan is different from yesterday") plan_guides/forcing let you change the plan without changing the underlying code. When the fix arrives, you just disable the plan guide to let the optimizer go on its merry way.

Thursday, December 27, 2007 2:26:20 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Wednesday, December 26, 2007

After getting insert/update/delete stored procedures going with a simple standalone table (no relationships), I started thinking about how I'd do this in a many-many relationship. Let's say that I have students and classes, with a many-many relationship represented by a studentclass table with only a student_id and class_id. EDM maps this to a many-many relationship, leaving the "join table" out of the conceptual model (no studentclass entity). That's what I'd want. But...how to map insert/update/delete on the studentclass table, because there is no entity for it?

Turns out that this is OK, because mapping insert/update/delete to sprocs is not exposed on the entity (CSD layer), but on the mapping (MSD) layer. And, sure enough, there is an ModificationFunctionMapping element exposed under AssociationSetMapping, as well as under EntityTypeMapping. The designer doesn't support this yet, but the raw XML schema does.

In addition, the Entity Framework "Help Overview" file does mention "Mapping Association Sets to stored procedures". It doesn't help matters by showing an example of Categories and Products in the Northwind sample database, where the Categories and Products tables don't have a many-many relationship or a join table. Do I *need* stored procedures on a zero-or-one-many relationship as well as on the "base" tables that comprise the relationship? I can insert and delete a relationship between an existing product and an existing category, so I might need these in addition to the stored procs mapping the base tables, in case these aren't already enforced by key constraints in the database and model? Doesn't seem like I'd need them at all, unless I specifically left the appropriate key constraints out of the database, but wanted them in the mapping and conceptual model.

But the Help Overview specifically mentions "mapped to a join table in a relationship", so many-many sounds like to the relationship type (AssociationSet cardinality) I'd need stored procedure mappings for.

So there IS a way to do it with many-many.

Wednesday, December 26, 2007 2:13:40 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

I've been working with the new ADO.NET Entity Framework designer for a few weeks now, and I've got to like it. If you're used to the LINQ To SQL Object Relational designer, in which all the action takes place in the "diagram pane", it takes some getting used to. The ADO.NET Entity Framework designer actually uses three panes, the Entity Designer pane, the Model Browser pane (which docs itself in the Solution Explorer group at the right in my layout), and the Mapping Details pane (which docs itself in the Error List group at the bottom in my layout) pane.

In general, you can work with Entities (the CSDL) in the diagram pane or Model Browser. You can work with the store (database, the SSDL) in Model Browser, and the mapping (MSL) in Mapping Detalis. The Properties Window displays the property of the currently selected item, as with most things in Visual Studio.

Knowing which pane to work in makes it easier to figure out how to perform operations like mapping INSERT/UPDATE/DELETE to stored procedures. In the download announcement (which lists the features by pane) one the the new features was "View/Edit mappings using stored procedures (insert/update/delete)". The Entity Model Tools document seemed to indicate "not yet supported". I remember doing this "by hand" in earlier betas and it was fairly painful to hand-edit the XML file(s). I used the Model Browser to create a Function Import for my stored procedures and was ready to go...well...

Mapping the stored procedures to the insert/update/delete procedures in the Entity Designer (my LINQ to SQL experience got in the way here) and trying the Model Browser because the Mapping Details didn't seem to know about it, I finally located the little buttons on the extreme left side of the Mapping Details pane. The bottom button, "Map Entity to Functions" brought up an alternate Mapping Details pane, where I could map the stored procedures. Whew.

Maybe I'm "GUI impaired" but after I thought about it a while, where else would this function fit in the designer? Like I said, the EF/EDM is rich enough to make the designer quite complex. But after adding an ordinary "read" stored procedure (ie, not for insert/update/delete) to the model, and the XML hand-editing that this entails, I'm HAPPY to have to designer. Keep adding those new features...

Wednesday, December 26, 2007 11:55:13 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, December 20, 2007

Since its appearance in SQL Serve 2008 CTP5, folks have been wondering about the reason for SQL Server's new Change Tracking feature, and how it differs from Change Data Capture (CDC). The CTP5 BOL has a nice writeup about how it differs from CDC, so I'd encourage you to start there. But about "why it's there...".

The main reason for change tracking, as far as I can see, is to be used with ADO.NET Sync Services. ADO.NET Sync Services is part of the Microsoft Sync Framework and is a way to do (I'm trying to summarize here) "client directed programmable replication". For database folks ADO.NET Sync Services can be thought of as providing synchronization between SQL Server Compact Edition and any relational datastore, but the model is extensible. ADO.NET Sync Services 1.0 shipped with SQL Server Compact Edition 3.5 in Visual Studio 2008 (there may be other ship vehicles I'm unaware of) and provides 2-tier, 3-tier, or service-oriented synchronization. It provides hub-and-spoke synchronization. ADO.NET Sync Services version 2.0 will add (the CTP is out now) peer-to-peer synchronization. Microsoft Sync Framework (CTP 1 is out now) includes sync support for data stores that aren't necessarily databases. The best place to go to find some highly informative examples that illustrate the synchronization patterns supported by the model is "The Synchronizer's" (Rafik's) blog.

What's this all have to do with SQL Server 2008 change tracking? Although ADO.NET Sync Services 1.0 is a good start (and there's GUI-based designers in VS2008), it usually means adding timestamp or datetime columns and tombstone tables (tables that track the primary key of deleted rows) to existing database tables and triggers to populate the information sync services needs. That's fairly intrusive, especially with packed applications. SQL Server 2008 change tracking takes care of all of this for you.

You turn on change tracking on a database with ALTER DATABASE and on individual tables with ALTER TABLE. Change tracking does the rest. You can access the information you need for Sync Services applications using the CHANGETABLE table-valed function and a few related functions. You usually want to turn on ALLOW_SNAPSHOT_ISOLATION in the database as well, because change tracking works by tracking when a transaction is committed, rather than when its started.

Change tracking provides the information ADO.NET Sync Services needs to answer request such as "what rows have changed since my particular client (change originator is tracked by change tracking as well) last synchronized" and "have I synchronized with the main database so long ago (change tracking info has a DBA-specified retention) that I need to sync the entire table because the incremental info I need is no longer available".

Currently the VS2008 designers (they're accessed by Added a "Local Database Cache" item to a programming project) don't have an option to indicate "just use SQL Server 2008 and I'm using change tracking", so you have to code the synchronization procedures to use change tracking with Sync Services by hand. Perhaps such a feature is in the works for ADO.NET Sync Services 2.0.

And how about SQL Server Compact Edition (currently the only "ClientProvider" that ship with Sync Services 1.0). Well the SQL Server Compact Edition, Sync Services "just works". No special setup is required, although you may (I haven't determined this) need SQL Server Compact Edition 3.5. This version of SQLCE ships with VS2008. One last thing...ADO.NET Sync Services 1.0 isn't available for compact devices yet, so when you sync with SQL Server Compact Edition, it must be deployed on the desktop.

Thursday, December 20, 2007 12:08:30 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, December 08, 2007

As a continuation of the previous discussion, here's a DELETE of a title row without fetching it from the database:

pubsEntities model = new pubsEntities();
titles deletetitle = new titles();
deletetitle.title_id = "BU9994";
deletetitle.EntityKey = new EntityKey("pubsEntities.titles", "title_id", "BU9994");
model.Attach(deletetitle);
model.DeleteObject(deletetitle);
model.SaveChanges();

In this case, you need to populate both the property (title_id) that corresponds to the primary key and also populate the EntityKey itself. Attach the object to the store (note that all the original values need not be specified), delete the object from the store, and save the changes. This bit of trickery convinces the ObjectStateManager that the object was in the store all the time, that you'd fetched it from the database. Then you deleted it from the store, so they'll delete it from the database.

Of course, if you have associated titleauthor rows, you'll get an error trying to delete that title. As you would in the relational database. Of course, you can put an ON DELETE CASCADE in the database or delete all of the associated rows first. SQL Server 2005 also allows ON DELETE SET NULL and ON DELETE SET DEFAULT, but you relational data model has to allow this; be careful about using these. You can synthesize and delete all the associated rows and the parent in a single round trip if know what the rows keys are in their entirety, but how do you delete a set of rows in a single statement without fetching or knowing all of the primary key pieces (like the associated au_id in titleauthor table)? Better yet...

UPDATE without fetch is even going to be even more interesting, because you need to know the original row values. Or how about updating/deleting a set of rows based on a non-key column, like "UPDATE titles SET royalty = NULL WHERE ytd_sales IS NULL"? Alex James, Metadata PM of the EF team has posted the first in a series of blog entries on doing just these types of things with EF...it will be interesting to see how far you can take this.

Saturday, December 08, 2007 8:36:17 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

The ADO.NET Entity Framework Beta 3 was released this week. For details, see the ADO.NET team blog.

One of the things that bothered me originally about EDM was that Entity SQL in V1 doesn't contain INSERT, UPDATE, or DELETE statements. Only SELECT. So to update or delete a row, I'd have to fetch it first. Let's start with something simple, like an INSERT with a foreign key constaint?

All of the "add" examples I've seen always insert a new customer and new order and new order details at the same time. But what if your customer already exists?

As an example, the titles table in the pubs database requires a pub_id of an existing publisher. But EDM represents this constraint as a publishers class field in the titles class and an additional  publishersReference instance as a member also. My first attempt to create a new title with an existing publisher (pub_id 1389) went like this.

titles t = new titles();
t.title_id = "BU9994";
// rest of columns elided for clarity
// statement to set the pub_id to 1389
model.AddTotitles(t);
model.SaveChanges();

My first "statement to set the pub_id to 1389" was:

publishers pub = new publishers();
pub.pub_id = "1389";
t.publishers = pub;

This one tries to add the publisher row too and gets a duplicate key violation.

This worked
t.publishers = model.publishers.First(pub => pub.pub_id == "1389");

BUT it causes a SELECT to get the publisher from the database. I wondered at this point if I really needed to do a SELECT to INSERT a row with an existing foreign key row. With some help from Danny Simmons, a much nicer way is to use the publishersReference class and synthesize an EntityKey.

t.publishersReference = new System.Data.Objects.DataClasses.EntityReference<publishers>();
t.publishersReference.EntityKey = new EntityKey("PubsTestEntities.publishers", "pub_id", "1389");

This does the insert correctly with no gratuitous SELECT. Confirmed by SQL Profiler. Good.

Saturday, December 08, 2007 7:39:11 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, November 21, 2007

One last SQLCLR feature I'd forgotton about but was quite highly publicized. This is extension of SQLCLR UDT and UDAgg maximum size from 8000 bytes to 2gb. You just the MaxByteSize of -1 in the appropriate attribute, like this:

[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize=-1,IsNullIfEmpty=true)]

The serialization format is going to have to be UserDefined, the limt for Format.Native is still 8000 bytes.

Just catalog the UDT or UDAgg and use it as you would any other TYPE/AGGREGATE. I crufted up a proof-of-concept one pretty easily, and the new spatial data types (GEOGRAPHY and GEOMETRY) are also large (system) UDTs, so you know this works as of CTP5.

Now you can have your "infinite" UDT that holds an array or "infinite" string concatenation UDAgg. Enjoy.

Wednesday, November 21, 2007 7:39:46 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Tuesday, November 20, 2007

Another cool SQLCLR feature in SQL Server 2008 is the ability to define a table-valued function with an ORDER clause. If you have intimate knowledge that the function always returns rows in sorted order, you declare your SQLCLR UDF with an ORDER clause in the CREATE FUNCTION DDL statement. Let's experiment with this using the cheap and easy Fibonacci sequence function from Dan Sullivan and my SQL Server 2005 Developer's Guide.

create assembly orderedtvf from 'C:\temp\OrderedTVF.dll'
go

-- no order clause
create function FibonacciUnOrdered (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
as external name orderedtvf.FData.Fibonacci
go

The Fibonacci sequences generated are always in ascending order because that's how the function is implemented. In fact, they are in order by both the "next" and "prev" column. Because there is no limit to the number of method signatures we can have over the same SQLCLR UDF, we use the same SQLCLR method, just changing the DDL statement and function name.

create function FibonacciByNext (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (next asc) -- this is new
as external name orderedtvf.FData.Fibonacci
go

create function FibonacciByPrev (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (prev asc) -- same concept, different ordering column
as external name orderedtvf.FData.Fibonacci
go

Now let's do some testing. The query plan iterators and plan cost are shown as comments.

-- cost: 0.0279081
-- TVF -> Sort -> Select
select * from dbo.FibonacciUnordered(3,4,5)
order by next

With an ordered TVF, there is no SORT iterator, but there are extra query plan steps to operate on the ordered set. AND... the query cost is over 10x lower.

-- cost: 0.0023802
-- TVF -> Segment -> Sequence Project -> Assert -> Select
select * from dbo.FibonacciByNext(3,4,5)
order by next

-- cost: 0.0023802
-- TVF -> Segment -> Sequence Project -> Assert -> Select
select * from dbo.FibonacciByPrev(3,4,5)
order by prev

Note that it IS best to have a different TVF name for each sort order, and "no order", if you plan to use different ORDER BY clauses. This one has a Sort AND its cost is greater than the function that's declared UnOrdered

-- Different order
-- cost: 0.0292881 (more than Unordered = 0.0279081)
-- TVF -> Segment -> Sequence Project -> Assert -> Sort -> Select
select * from dbo.FibonacciByNext(3,4,5)
order by prev

In addition, the query plan guarentees that you don't lie in your order clause. Here's proof.

create function FibonacciWrong (@prevstart int, @start int, @rows int)
returns table (next int, prev int)
order (prev desc) -- THEY ARE IN ASCENDING ORDER, NOT DESCENDING
as external name orderedtvf.FData.Fibonacci
go

-- Error:
-- The order of the data in the stream does not conform to the ORDER hint
-- specified for the CLR TVF 'dbo.FibonacciWrong'.
-- The order of the  data must match the order specified in the ORDER hint for a CLR TVF.
-- Update the ORDER hint to reflect the order in which the input data is ordered,
-- or update the CLR TVF to match the order specified by the ORDER hint.
select * from dbo.FibonacciWrong(3,4,5)
order by prev

Remember, you're not only saving a SORT iterator in the query plan, you're saving memory too. The SORT iterator requires a memory grant. And because there are no stats for SQL Server to use in these "opaque to SQL" functions, the memory grant for the SORT iterator in FibonacciUnOrdered is 1024K. And, we hope that the rather generous memory grant is enough, else memory is being allocated during query execution.

So, declaring ordered TVFs is worth it.

Tuesday, November 20, 2007 4:39:05 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

One thing that I didn't find in the BOL What's New page is some of the new SQLCLR functionality in SQL Server 2008. The first one that intrigued me is support of multi-input user-defined aggregates. Suppose I wanted to implement Oracle's COVAR_POP aggregate, an analytic function that returns the population covariance between two expressions. The signature is COVAR_POP(expr1, expr2) and I want the signature to stay the same in SQL Server.

All that I need to do this is to use the "template" for a .NET UDAgg struct/class, replacing the Accumulate method that take one parameter with a 2-parameter method, like this:

public void Accumulate([SqlFacet(Precision = 20, Scale = 10)] Nullable<decimal> Value1, 
                               [SqlFacet(Precision = 20, Scale = 10)] Nullable<decimal> Value2)
{
  // code here
}

My CREATE AGGREGATE DDL statement would change a bit:

CREATE ASSEMBLY multiparmagg FROM 'C:\temp\multiparmagg.dll'
go

CREATE AGGREGATE dbo.covar_pop(@expr1 decimal(20,10), @expr2 decimal(20,10))
RETURNS decimal(20,10)
EXTERNAL NAME multiparmagg.CovarPop;
go

To invoke:

create table dbo.test_covar (
  i1 decimal(20,10),
  i2 decimal(20,10)
);
go
-- fill with data, then...
select dbo.covar_pop(i1, i2) from dbo.test_covar;

Happy aggregating.

Tuesday, November 20, 2007 3:24:11 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

With any new product or CTP, I've always gotten fast results by starting with the readme.txt file that comes with the product/CTP. Although its now called ReadmeSQL2008.htm, its still worth reading. This usually gives you answers about install scenarios and last-minute changes.

For example, this readme file answers two of the (so-far) FAQs about CTP5.
  What happened to Surface Area Configuration Utility?
  Can I install SQL Server 2008 CTP on Windows Server 2008? (ie, is it officially supported)

You'll need to RTFR for answers to these.

Right after the readme file, the first thing to do (after you've installed it) is to take a gander at the SQL Server Books Online, in the "What's New" section. Here's a bookmark for those without the facility to look in the TOC. mshelp://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10sq_GetStart/html/6a428023-e3cc-4626-a88a-4c13ccbd7db0.htm.

There's WAY TOO MANY new features in this CTP to describe in a few sentences. And the BOL writers have really had their hands full with this CTP, by the looks of the sheer volume of new material. Enough to keep anyone reading and experiementing for weeks. However...

Every once in a while something slips through the readme and What's new, or isn't finished yet, like the "What's new in SQL Server Installation" section. Or folks think of new ways to use features the BOL writers hadn't thought of. Or want to expound on the repercussions of feature XYZ. Stay tuned for these.

Tuesday, November 20, 2007 11:49:10 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

SQL Server 2008 adds the concept of priority for conversations. It's setup using special DDL statements, priority cannot be specified on the CREATE DIALOG CONVERSATION or SEND/RECEIVE DML statements. The DDL statements are CREATE/ALTER/DROP BROKER PRIORITY.

To specify a priority, you associate a BROKER PRIORITY object with combinations of the qualifiers LOCAL_SERIVCE_NAME/REMOTE_SERVICE_NAME/CONTRACT and the priority is associated with all messages and conversation endpoints that match that combination. The wildcard 'ALL' (or leaving the qualifier out entirely) is permitted for any or all of the qualifiers, and matching precidence is specifying in BOL under the CREATE BROKER PRIORITY syntax. The set of defined priorities is stored in sys.conversation_priorities metadata view.

In addition to this setup, the database has to be set to use prior