Friday, February 29, 2008

It's good to see Ed Katibah open up a blog on SQL Server spatial data. Ed is sort of the "dad" of spatial data in SQL Server; he's got a ton of experience and history in this space. Be sure to catch his posting on what's new in spatial for CTP6.  I've been trying out the new functions and the more I use the spatial types and functions, the more I understand the reasoning behind the dual Geometry - Geography types in SQL Server.

Issac's back at blogging too about spatial indexes, really makes for good reading.

Subscribed!

Friday, February 29, 2008 11:28:28 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, February 28, 2008

I've been working with Filestream storage in SQL Server 2008 since it appeared in CTP5. The way I've always set it up is to use sp_filestream_configure. During the CTP6 setup process, I noticed you could now configure Filestream as part of setup. Because I knew how to use sp_filestream_configure I skipped that part of setup. And everything just worked as expected when I used it. A friend of mine, not wanting to miss anything, configured Filestream as part of setup. When he tried to use it, the following "Catch 22-like error messages occurred":

EXEC sp_filestream_configure @enable_level = 3;

"The FILESTREAM feature is already configured to the specified level. No change has been made."

CREATE DATABASE ... with a filegroup for FILESTREAM

Msg 5591, Level 16, State 1, Line 1
FILESTREAM feature is disabled.

Huh? After checking the short list of usual suspects (e.g. NTFS file system, running SSMS as admin under Vista) we were both puzzled. Especially because it "worked for me" as it always had. There's nothing more frustrating to hear during problem resolution then "Well, I'm not sure what you did wrong because it works for me". Grrrr...

There is finally a resolution thanks to the storage engine team (confrmed and expounded on by Joanna's blog entry on March 3), and it turns out that the way filestream is configured changed in CTP6 for some very good reasons. You see, sp_filestream_configure sets up Filestream in the OS (requires OS privs) and SQL Server (requires SQL Server privs). Unless you're running sp_filestream_configure as an OS Admin who's also a SQL Server sysadmin, it may not work completely. But sp_filestream_configure is "IN" for one last CTP (CTP6). Gone before RTM.

So filestream configuration was broken into two parts:
1. Configuring filestream at an OS level is moved into setup. Or use SQL Server Configuration Manager. Or WMI scripting (with SMO/WMI) is you like scripting (I do). In the SQL Server Configuration Manager GUI, you configure it by select SQL Server Services (left pane), right-click on your SQL Server service instance and choose "properties" and use the FILESTREAM tab. Note that the equivalent FILESTREAM tab does NOT appear when using Services Control Panel applet. I hope you switched (as you should have) to SQL Server Configuration Manager back in SQL Server 2005 days.

2. Configuring filestream at a SQL Server instance level requires EXEC sp_configure 'filestream access level', '2'. Note that the access level choices in SQL Server are 0,1,2.

3. BOTH configurations steps/setting (OS and SQL Server instance) must be compatible for filestream to work. If, for example, filestream is enabled at the OS level but disabled in SQL Server, you'll get message 5591. See above.

So how did my friend get the "Catch 22 errors"? In CTP6 (but not in future), setup performed step 1, but not step 2. In future setup will do both steps. In CTP6, sp_filestream_configure (to be removed before RTM) will do both steps. But before sp_filestream_configure does anything, it CHECKS to see if either configuration job is already done. Thus the "filestream feature is already configured" message.

Got it? So get used to configuring filestream in both places using SQL Server Configuration Manager AND sp_configure. That's the way of the future. The fact that both exist in one CTP can be frustrating but is understandable. Reminds me of a similar shift when configuring HTTP endpoints (which also requires both OS and SQL configuration) during the SQL Server 2005 CTPs.

Hope this post saves someone a few "grrrr... moments" when using this new useful feature.

Thursday, February 28, 2008 9:58:50 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Tuesday, February 26, 2008

Quite a bit of activity last week in the Visual Studio and ORM spaces.

Visual Studio released a CTP of Visual Studio 2008 support for SQL Server 2008 CTP6. It allows you to connect to CTP6 with Server Explorer, which enables quite a few other features to work. check with Overview document on the download page. SQLCLR projects and T-SQL and SQLCLR debugging work too.

Curiously the Overview document says "using the LINQ to SQL Designer with SQL Server 2008 databases" is specifically not supported. But I've used SQL Server 2008 WITH NO new data types (e.g. DATE, TIME, etc) and it does seems to work. EF beta3 designer too. Maybe they are referring to lack of support for the new data types. Speaking of which...

Faisal Mohamood announced on the ADO.NET team blog that they ARE working on support for the four new date and time types in SQL Server 2008. Hooray. Now let's keep our fingers crossed for SQL Server UDT support.

Julie Lerman wants these types in EF too, but I'm not sure I agree. DATE and TIME would be nice because they're ANSI standard types. DATETIME2 maps favorably to DateTime in EF, actually better coverage of the value space than DATETIME. Maybe what she really means is that she wants the SSDL and MDL and the designer to be aware of them and generate the closest possible mapping. Even if that's "string".

The distinction is that the ADO.NET Entity Framework is meant to be database neutral, not SQL Server specific, and supports a discrete subset of database data types not a 1-1 mapping of EF types to SQL Server types. LINQ to SQL, OTOH, is SQL Server-specific and should support a 1-1 mapping. At this point I'm really thinking they should have called it "LINQ to SQL Server". Not that the *concept* can't be extended to other databases (LINQ to DB2, LINQ to Oracle, etc) but these databases will have to implement their own. The fact is that LINQ to SQLCE does exist and is separate from LINQ to SQL (Server). And other vendors may follow with their own, database-dependent implementation of a light LINQ layer. But speaking of SQLCE...

The SQLCEBLOG announced the beta of SQLCE 3.5 SP1 with ADO.NET Entity Framework Beta 3. There was support for EF in SQLCE a while ago, but it was postponed in the 3.5 release which sports LINQ support. Now it looks like Entity Framework support is back. Hooray. BUT, although a common programming model between SQLCE and a subset of SQL Server functionality is a great idea, IMHO the best and most lightweight programming model for SQLCE uses SqlResultset. Still the winner.

Oh yes, BTW, SQL Sevrer 2008 CTP6 was released last week. But that's been posted on EVERY SQL Server related blog in the world. Congrats for another CTP folks. Sparse columns, filtered indexes, and filtered statistics are in as well as quite a few other features. 

Tuesday, February 26, 2008 11:48:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, February 14, 2008

Sometime before SQL Server 2008 RTM, the libraries for LINQ and LINQ to XML (that is System.Core.dll and System.Xml.Linq.dll) but NOT LINQ to SQL will be added to the SQLCLR "approved" assembly list. The approved assembly list is the list of .NET Framework libraries that have been tested in a SQL Server-hosted environment, annotated their code with HostProtectionAttributes where needed, and can be used in user assemblies that can be cataloged with "PERMISSION_SET = SAFE". That means you can use 'em in SQLCLR user-defined functions, procs, and other SQLCLR database objects.

Being curious, I asked "why LINQ and LINQ to XML but not LINQ to SQL"? The answers I got were "LINQ to SQL not tested in this environment" and "not sure there should be yet another way to access the database with .NET". Fair enough. Although, its never seems to be a problem to provide yet another way to mix SQL Server and XML. The number of different ways now exceeds the number of fingers on one hand.

OTOH, SMO doesn't appear to be added to the approved assembly list in SQL Server 2008. Nor has WCF. Oh well.

Enjoy...

Thursday, February 14, 2008 4:58:31 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I often hear those who identify themselves as "database programmers" (sometimes I think these may be folks who program only inside-database code), say: "I'd give learning LINQ and/or LINQ to SQL/Entities or Entity SQL (eSQL) a go in my sparse time, but I don't have a good book or a tool.

I haven't read a book on either of these yet, spending most of my time in BOL. BOL is a good start (but not comprehensive) for LINQ and LINQ to SQL. eSQL docs are pretty sparse, but that's because its not a shipping product yet. Maybe things will improve. (Note: if anyone wants to send me a copy of their book to read in my spare time, I'll give it a go ;-).

Following LINQ is more difficult than EF (modulo LINQ to Entities) because there's many implementations/dialects. There are operators that LINQ (over objects) can do, but LINQ to SQL or LINQ to Entities cannot.

On to tools:

For LINQ, I like LINQPad by Joseph Albahari. Simple to set up, although the queries are not *exactly* the same as you'd write in a program. And it doesn't support LINQ to Entities yet (or LINQ to DataSet for that matter). But, all in all, an excellent tool.

For eSQL, there's no comparison to Zlatko Michailov's eSqlBlast. This is not only an eSQL practice query processor, but also a visualizer for the results you'd get back. You just type in the eSQL. Even comes with source code. Latest version is on CodePlex. This is straight eSQL, no LINQ to Entities support here either.

Neither one of these has Intellisence or statement completion, I believe.

Enjoy...

Thursday, February 14, 2008 4:24:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

Thursday, February 14, 2008 3:55:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Wednesday, February 13, 2008

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

Wednesday, February 13, 2008 5:47:20 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 

This entry is a continuation of the previous posting on how to use SQL Server 2008 spatial data and Virtual Earth. This entry discusses your web service and SQL Server code choices.

The point of the web service is to translate one of the output formats of SQL Server 2008 spatial to a format this Virtual Earth can use. The spatial data types can be output in:
1. Well-known text format
2. "Native" format - that is, return a Geometry/Geography type to a .NET client as the SqlGeometry/SqlGeography native .NET types
3. GML - Geography markup language

Remember that Virtual Earth needs:
1. GEORSS or
2. KML
3. Custom XML-based format (and custom Javascript code to process it)
4. Javascript strings (to pass to eval())

There is not a one-to-one correspondence between any of these methods, so there is going to be a need for a transformation somewhere. In addition, a SQL Server Geometry or Geography type can contain a collection of points, linestrings, or polygons. There may also need to be a one-to-many translation. This is facilitated by methods STNumGeometries/STGeometryN and STNumPoints/STPointN.

Finally, there could be a pushpin (point) associated with the center of a polygon and/or text associated with the pushpin. Where the text comes from will be application-dependent.

Because there are many ways to process XML in SQL Server and in web services, that's lots of choices. Some that come to mind are:
1. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to a custom XML format
   Javascript code in the web page processes the custom format.
2. Using SQL Server's XQuery to transform GML into GEORSS or KML
   This is returned to the web service as a SqlXml data type.
   Consumed directly as a layer in Virtual Earth.
3. Using an XSLT transform (or LINQ to XML) to transform GML to GEORSS or KML. This can occur in the web service or even a SQLCLR procedure, although this type of processing should really be accomplished in the web service.
4. Using SELECT ... FOR XML PATH to construct GEORSS or KML directly.
   Similar processing to choice #2 above.
5. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to Javascript strings.
   Web page simply calls eval()

Hopefully this posting will elucidate the choices you have for this rendering. Depending on which method you choose, there could be quite a few "moving parts" (a la Rube Goldberg) in the solution.
- Web pages with custom Javascript code
- Web service to supply the data
- HTTP Handler to allow Virtual Earth to use remote GEORSS/KML
- Data access code in SQL Server
- Transformation code in web service or SQL Server

So.....what's your favorite method for using SQL Server 2008 spatial data with Virtual Earth?

Wednesday, February 13, 2008 3:53:53 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Last week at the SQL Server 2008 Jumpstart I showed an example of using SQL Server 2008 spatial data types with Virtual Earth. I showed a single coding style. There really is a plethora of coding styles to using these together, and I'd like to describe some of the most common ones. I'd like to do this without much code for now, because an end-to-end description sometimes results in the graphic aspects (which other controls to use, how to build Ajax web pages, etc) seem to get in the way.

I'm not even going to mention using Virtual Earth to serve tiles directly, but only the Virtual Earth web control. Start with the Virtual Earth Interactive SDK for coding information.

In general, what we're after is:
   Page contains Virtual Earth control
   Page makes a request to a Web Service for data (usually Ajax)
   Web Service get and processes data from database, returns data
   Page uses Virtual Earth calls to present the data

The first two are straightforward, although the Virtual Earth control is available only online via a URL. This makes testing when you don't have internet access virtually impossible. The Ajax request (that is, call a URL with an XML payload using a browser-specific mechanism, wait asynchronously for the response) normally doesn't use SOAP-based Web Services, because you'd have to add and remove the SOAP headers. It's usually XML-in, XML-out, sans SOAP headers.

The Virtual Earth calls depend on what version of Virtual Earth control you're using. I'll show this using version 6.
1. You can render using shapes directly
2. You can import GEORSS into a shape layer
3. You can import KML into a shape layer

Virtual Earth supports visualizing shapes directly (#1 above) by using:
1. AddPushpin - to visualize a point or the centroid of a polygon
2. AddPolyline - that's Linestring in SQL Server 2008
3. AddPolygon - for polygons

For shape layers (#2 and #3 above), the code looks something like this:

var l = new VEShapeLayer();           
var veLayerSpec = new VEShapeSourceSpecification(type, local.xml, l);

map.ImportShapeLayerData(veLayerSpec, onFeedLoad);

The "type" in the VEShapeSourceSpecification constructor can either be GEORSS or ImportXML (that's KML). You can also have a Live Search Maps Collection but likely that one won't come from SQL Server.  Bear in mind that the VEShapeSourceSpecification constructor needs a *local* XML file URL. If you don't want to use a local XML file, you can build an ASP.NET HTTPHandler that uses redirection to pretend a remote XML file is local (see Mike McDougall's article and code to accomplish this).

One last twist that I've seen is that you can build your Javascript code directly in each page OR make the Web Service program reformat your data into Javascript calls directly. Once you're back at the page (the Web Service passes back a string of Javascript) you simply call Javascript's "eval" function. This makes the web page client-side code easy, but the Web Service code lots more complex, as you're generating code, not points, lines, and polygons.

Web Service choices follow next....

Wednesday, February 13, 2008 3:52:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, February 12, 2008

SQL Server 2005 introduced the concept of endpoints. Every connection entry point into SQL Server is abstracted as a endpoint. You can see all the endpoints on your SQL Server instance by using the sys.endpoints metadata view. In addition to those you'd expect (for TCP/IP, Named Pipes, Shared Memory) there are also endpoints defined for the dedicated admin connection, Service Broker, Database Mirroring...and HTTP.

The endpoint concept is still with us, but HTTP endpoints, those endpoints that allow you to expose SQL Server procs and functions directly as HTTP Web Services will be deprecated in SQL Server 2008. Upon creating an HTTP endpoint (aka SOAP endpoint) in SQL Server 2008, you'll be warned that "this feature will be removed in a future version. You should not use these in new development work." This was announced at the SQL Server 2008 Jumpstart last week.

So what replaces this functionality (and how many folks actually used it in SQL Server 2005). The likely successor seems to be ADO.NET Data Services, a mechanism to produce RESTful services over any type of data, using Entity Data Model or LINQ to SQL. The model you expose through data services doesn't really have to be a database at all.

And for those of you that liked the idea of exposing your stored procedures as services, ADO.NET Data Services supports "Service Operations", which can be parameterized and contain any type of logic you wish. Service Operations aren't limited to stored procedures, but could be used to encapsulate them.

Tuesday, February 12, 2008 5:08:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

Wow. It's already Tuesday and I'm still recovering from last week. I've been laying low for the last six months or so, writing a few (OK, quite a few) blog entries on SQL Server 2008 topics. But I'd really been working on my next SQL Server course (on 2008, of course), that premiered last week as part of a program known as SQL Server 2008 Jumpstart.

Simon Sabin (the newest SQLskills associate) and I presented to a good-sized crowd of Microsofties and trainers from everywhere. We did the developer track; there were also DBA and BI tracks by my well-known SQLskills associates. You'll see this course in a lot of different venues in the next few months, but it won't always be us that's doing the presenting. Just so many places we can be at once...

We covered everything from new T-SQL features to Spatial Data and Indexes and everything in-between. In addition to the SQL Server features, there was an entire day devoted to client APIs and dev tools; not only ODBC (yes, I did an ODBC demo in C++) and ADO.NET, but also the Entity Framework, ADO.NET Data Services, Occasionally Connected Apps.

So, for the next few weeks, I'm going to be writing down my thoughts on questions (some of them rather pointed) that arose about the topics we presented. Some of the post will be facts, answers to questions...but they'll be some opinion pieces as well (marked with MHO, that's my humble opinion).

Tuesday, February 12, 2008 4:48:59 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

Theme design by Jelle Druyts

Pick a theme: