Saturday, April 05, 2008

It appears that there are some changes in .NET 3.5 System.Transactions (or System.Transactions.dll and System.Data.dll version 2.0.50727.1433 if you're looking at assembly versions). Florin Lazar blogs about a change to the syntax here. And Alazel Acheson blogs about changes to SqlClient changes to accomodate using lightweight transactions with less promotion to distributed transactions in the ADO.NET Team blog.

It is interesting to read how the latest version of SqlClient can combine automatic transactions with connection pooling to use promotion to a distributed only when necessary. It was reminiscent of how COM+ used the connection pool; the difference is that COM+ always used distibuted transactions. Some of the changes use a transaction-aware connection reset mode that's only supported in SQL Server 2008. So server changes were required as well.

At the end of the blog entry Alazel mentions that you can't use this facility with two open connections at the same time; it requires that there is a free connection with the appropriate transaction scope in the pool. I immediately thought of SQL Server's sp_getbindtoken and sp_bindsession, the manual way to allow two connections to share a single transaction context. But sp_getbindtoken and sp_bindsession are on the deprecation list (to be removed in a future version of SQL Server) in SQL Server 2008. The books online entry for these calls recommends "using MARS or distributed transactions instead". But why MARS?

MARS (multiple active resultsets) made its appearence in SQL Server 2005 and its data access stacks. It permits you to have multiple interleaved sessions while using a single SQL Server connection. These multiple sessions share the same transaction, modulo an interesting but escoteric behavior known as batch-scoped transactions. MARS also does not support named savepoints.

What MARS and sp_getbindtoken/sp_bindsession have in common is that they are both solutions to the same "problem". I prefer calling this a SQL Server "behavior" because its not technicall a problem, just how the underlying network stack works. SQL Server does not allow other activity on a connection (like an UPDATE or a second SELECT) while reading a rowset is in progress. Pre-MARS, if you wanted to have multiple commands in a single transaction scope, you would use two connections and "bind" the transaction scopes together with the transaction token. With the MARS capability this is no longer necessary.

So, the OpenConnction1/DoCommand1/Close then OpenConnection2/DoCommand2/Close pattern is now acommodated by System.Transactions and SQL Server 2008 without transaction promotion. To do two commands with the same transaction scope without closing the connection, use a single connection and multiple sessions with MARS. With either data access pattern, only a single physical connection to SQL Server is needed.

Saturday, April 05, 2008 2:14:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, March 24, 2008

Short post this evening...

Just in case anyone else flails around looking for this feature, its right under your nose. When you use the PowerShell SQL Server 2008 provider, you have visibility to a single, local machine (and all its SQL Server instances you can access with integrated security) by default. To get access to multiple machines using the provider, simply reference a SQL provider path that contains that machine name. If the (Windows) principal has access to the other machine's SQL Server instance, it will open a connection using Windows auth.

For example, say that I'm on a machine named zmv20. I have access to machine zmv21's SQL Server instance as well.

>cd SQLSERVER:\SQL
>dir

MachineName
-----------
zmv20

>dir zmv21  <---- makes a connection to zmv21

Instance Name  (on zmv21)
-------------
DEFAULT

>dir           <---- now you can "see" zmv21 too

MachineName
-----------
zmv20
zmv21

Using the test-path cmdlet also works.

>test-path SQLSERVER:\SQL\zmv22  <---- can I login to this machine too?

But bear in mind that this is subject to a connection timeout lag. The error message indicates that its first using WMI to obtain the machine connection.

Of course, all this is doc'd in SQL Server Books Online. Where I missed it a few times...I was looking for the equivalent of a "connect" command. You don't need one.

Monday, March 24, 2008 11:21:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Thursday, March 20, 2008

I'm still getting used to the new sparse column feature in SQL Server 2008.

I'd just read in the BOL definition of ALTER TABLE that you can add a column_set to an existing table. I was converting a sample app from an EAV (entity attribute value) design to sparse columns. I used the existing "attribute-value" table to create the sparse columns, then created the table. I then went back to add the column_set with ALTER TABLE after the fact.

alter table sparsetest3 add spcolset xml column_set for all_sparse_columns

Received this error message:
Msg 1734, Level 16, State 1, Line 1
Cannot create the sparse column set 'spcolset' in the table 'sparsetest3' because the table already contains a sparse column set. A table cannot have more than one sparse column set.

Huh? I figured that I should be able to add the column_set and filed the lack of this capability as a bug. It came back as "by design".  And doc'd in BOL that way. The fact that its by design makes perfect sence and helped to solidify in my mind how the column_set works.

In a "normal" table (even with sparse columns) without a column_set, "select * from table" returns all of the columns. If a table has a sparse column_set, "select * from..." behaves differently, and returns only the non-sparse columns and the column_set column. NOT the individual sparse columns. You can INSERT or UPDATE this table by using only the column set. When you update using a column_set, all of the sparse columns that you don't specify are set to NULL. You can even update to column_set to NULL itself, which NULLs out all of the sparse columns.

The reason that "A column set cannot be added to a table if that table already contains sparse columns" (BOL exact wording under "Guidelines for using sparse columns") is that it could break existing code that uses "select * from...". Imagine:

create table sparsetab (
 id int identity primary key,
 col1 int,
 spcol2 int sparse
);

select * from sparsetab -- returns spcol2
-- add column_set (this is disallowed, but imagine it DID work)
select * from sparsetab -- doesn't return spcol2, only the column_set, can break code that relies ont spcol2

And BOL is right in ALTER TABLE as well.

create table sparsetab2 (
 id int identity primary key,
 col1 int
);
go
-- add first sparse column and column_set at the same time, works fine.
alter table sparsetab2
 add spcol1 int sparse,
       spcolset xml column_set for all_sparse_columns
go

You can even, as BOL indicates, add a column_set to a table that does not yet have a sparse column.

create table sparsetab4 (
 id int identity primary key,
 spcolset xml column_set for all_sparse_columns
);
go
-- Now you can add sparse columns, they use the column_set

The ONLY issue I have is with the error message 1734 at the beginning of the post. It's misleading, because I DON'T already have a column_set. But I DO already have sparse columns.

In last sparse column correction from a long-ago post. I'd heard (early on) that you would be able to have over 4 million sparse columns (actually sizeof(int) of them). This turns out to be incorrect, it was announced lately that the limit will be 30,000 sparse columns. And, in the current CTP6 you can only have 1024 total columns (in as previous versions); the sparse column limit will be changed sometime before RTM.

Amazingly, when I quoted the 30,000 column limit to a class last week, there was a groan. One student told me his EAV table already had over 60,000 unique attributes. 30,000 wouldn't be enough...must be a HUGE EAV table. That's the motivation for sparse column. Also, his EAV table had the "value" column defined as SQL_VARIANT. That's the other motivation, sparse columns are strongly typed. Although inserting through the column_set always uses a string (nvarchar) as the value, and attempts to convert string to the definied data type for specific columns.

Thursday, March 20, 2008 10:03:07 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, March 17, 2008

This is the last part of a series on programming policy-based management. The series starts here.

In the previous installment, I created a policy that was constrained to a single database. To accomplish this, I used a Condition that called out the database by name, and tied it to the TargetSet using TargetSet's SetLevelCondition method.

An alternative consists of creating a policy as part of a PolicyCategory. Each Policy is a member of exactly one PolicyCategory. The default PolicyCategory is the only one that "ships with the system", but you can define your own. If you don't specify otherwise in the code (SSMS has a Category dropdown on the Description page of the new Category dialog), your policy is a member of the default category.

Each PolicyCategory has a property that indicates whether it's manditory that a database subscribe to that category. If this property is true, each database has an implicit subscription to the category. If not, a database must explicitly subscribe. The code to define a PolicyCategory is straightforward:

static void CreateCategory(PolicyStore ps)
{
    PolicyCategory cat = new PolicyCategory(ps, "MyNewCategory");
    cat.MandateDatabaseSubscriptions = false;
    cat.Create();
}

To create a Policy that's a member of the PolicyCategory, simply use the aptly-named PolicyCategory property. If you're using a named PolicyCategory you may not want to restrict that policy to a specific database. Here's the changes to the Policy definition.

// No Condition On This One, applies to all databases, but must be subscribed to
//ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

// Name the category
p2.PolicyCategory = "MyNewCategory";

To create a PolicyCategorySubscription (for those categories that are not Manditory), the only interesting part is that you need a SqlSmoObject. You can use a concrete subclass (like Database) or make up a SqlSmoObject by using a URN. Dan Sullivan's and my book "Developer's Guide to SQL Server 2005" covers both ways to make an SqlSmoObject. Here's the code for PolicyCategorySubscription.

static void CreatePolicyCategorySubscription(PolicyStore ps)
{
    Server svr = new Server(); // open a connection to default instance, local server
    Database db = new Database(svr, "pubs");
    PolicyCategorySubscription subs = new PolicyCategorySubscription(ps, db);
    subs.PolicyCategory = "MyNewCategory";
    subs.Create();
}

This means that only the pubs database now follows the policies in "MyNewCategory". BTW, there's currently the PowerShell provider in CTP6 throws an error when attempting list a PolicyCategorySubscription (its in the hierarchy at the same level as Policy). The PolicyCategorySubscription still works as advertised though.

This concludes the series on Programming Policy-Based Management with SMO. Hope it was useful.

Monday, March 17, 2008 6:58:47 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post is part of a series on programming policy-based management. The series begins here.

So, we were working with a policy that required an ObjectSet. ObjectSets contain TargetSets. For example, the ObjectSet for the naming policy (IMultipartName) we were working on needs a TargetSet for Procedure, Synonym, Table, Function, Type, View, and XmlSchemaCollection.

Note that this collection is similar to what you'd see for a MultipartName policy in the SSMS designer dialog. The title for it in SSMS is "Against Targets:". We want our policy to apply only to Tables. After fumbling around for a while attempting to define TargetSets and add them to the ObjectSet's collection of them, I found that the seven TargetSets I needed were *already* defined. This reduces enabling only the table's TargetSet to two lines of additional code. Notice that you can reference a specific TargetSet in an ObjectSet by using an indexer. The indexer is an SMO URL.

TargetSet ts1 = os1.TargetSets["Server/Database/Table"];
ts1.Enabled = true;

But how to restrict this policy to a single database? For this we need a Condition to name the database. Because this condition is simple we can use ExpressionNode.Parse(). Here's the Condition code.

// Create a condition to enforce
Condition con = new Condition(ps, "FinanceDB");
con.Facet = "Database";
// Note: Using Parse() treats the string as an SMO URL. Only works for simplest cases
string s = "@Name = 'finance'";    
// try-catch code omitted for brevity
con.ExpressionNode = ExpressionNode.Parse(s);
con.Create();

Back to our TargetSet. We restrict the TargetSet to a specific database by using TargetSet.SetLevelCondition. SetLevelCondition takes a TargetSetLevel, and we get the appropriate TargetSetLevel (which is prepopulated) by.... you guessed it...using a SMO URL. After ts1.Enabled, this limits the policy to a single database defined by our Condition.

// FinanceDB is the name of our Condition that "limit/defines" this policy only to Finance
ts1.SetLevelCondition(ts1.GetLevel("Server/Database"), "FinanceDB");

Having initialized the ObjectSet correctly, we now create the ObjectSet, tie it to the Policy and voila...

os1.Create();
p2.ObjectSet = "CheckFinanceTab_ObjectSet";
p2.Create();

Check this policy by using the following code in a query window:

use finance
go
create table dbo.foo (id int);
go

You get the expected error:
Policy 'CheckFinanceTab' has been violated by '/Server/(local)/Database/finance/Table/dbo.foo'.
This transaction will be rolled back.
Policy description: ''
Additional help: '' : ''.
Msg 3609, Level 16, State 1, Procedure sp_syspolicy_dispatch_event, Line 50
The transaction ended in the trigger. The batch has been aborted.

Monday, March 17, 2008 2:44:58 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is part of a series on programming policy-based management. The series starts here.

Now, we'll tackle programming a little bit more complicated policy. The table-naming standard that applies to a set of database objects. For this, we need three items:
1. Condition for defining the policy itself.
2. Policy that uses the condition and contains....
3. Condition that specifies a set of database objects to which the policy should be applied.

The first condition is straightforward

Condition con = new Condition(ps, "TablePattern");
con.Facet = "IMultipartNameFacet";
ExpressionNode exp = new ExpressionNodeOperator(OperatorType.LIKE,
    new ExpressionNodeAttribute("Name"),
    new ExpressionNodeConstant("fintbl%")
    );
con.ExpressionNode = exp;
con.Create();

Note that, as with the complex condition we specified previously (part 3), you can't specify 'LIKE' by using the Parse() method, but 'LIKE' does appear when you access the condition by using ToString(). Perhaps, in future, Parse and ToString will be reflexive on an ExpressionNode. Or maybe I'm just missing something.

The first part of the policy is straightforward too:

Policy p2 = new Policy(ps, "CheckFinanceTab");
p2.Condition = "TablePattern";
p2.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.Enforce;
p2.Enabled = true;

But we want this policy to apply to to the TABLES (not views, for example) and only those tables in the finance database. For this, we need an ObjectSet.

ObjectSet os1 = new ObjectSet(ps, "CheckFinanceTab_ObjectSet");
os1.Facet = "IMultipartNameFacet";

Note that ObjectSets reference Facets too. Note also that ObjectSets are a "top-level" object, that is, in the PowerShell provider they appear at the same level as Conditions and Policies. Note that Facets do NOT appear at that level.

Before we go any further, cavaet... there is no function in SSMS to define an ObjectSet standalone. Therefore, no way to delete a "half-baked" ObjectSet from SSMS if you don't want it. To delete an ObjectSet you don't want, you need to go into the PowerShell provider, navigate to SQLSERVER:\SQLPolicy\{server}\{instance}\ObjectSets and delete it. "del MyObjectSet". In SSMS, deleting a policy that refers to an ObjectSet deletes the ObjectSet too. Not sure if you can have multiple policies refer to the same ObjectSet yet, but if you could I assume you'd have to delete all the policies before an ObjectSet would go away.

Monday, March 17, 2008 2:04:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is part of a series on programming policy-based management. The series starts here.

So, to initialize my Condition's ExpressionNode I need more than just a string. It doesn't look like, at this time, every ExpressionNode CAN initialized with a string. But we can use the subclasses. The ones I need here are Operator, Attribute, and Function. I need: DatabaseMailEnabled (Attribute), Equals (Operator) and false (Function). BTW, ExpressionNodeFunction appears to be what you are programming when you use the "Advanced Functions" dialog in the SSMS Condition dialog.

Putting one expression together first looks like this:

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );

Note that, unlike in SSMS, you don't need '@' before the attribute name. But I need to specify both DatabaseMail and SqlMail. That's just a little more complex.

ExpressionNode exp1 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("DatabaseMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp2 = new ExpressionNodeOperator(OperatorType.EQ,
    new ExpressionNodeAttribute("SqlMailEnabled"),
    new ExpressionNodeFunction(ExpressionNodeFunction.Function.False)
    );
ExpressionNode exp_both = new ExpressionNodeOperator(OperatorType.AND, exp1, exp2);

con1.ExpressionNode = exp_both;
con1.Create();

And, nice as you please. Looks just like the one defined in SSMS using the GUI. Make sure that you remove (or comment out) the original ExpressionNode.Parse statement.

Now, on to the policy.

// Create a policy that uses the condition we just created
Policy p1 = new Policy(ps, "OffByDefaultSMO");
p1.Condition = "MailOffSMO";
p1.AutomatedPolicyExecutionMode = AutomatedPolicyExecutionMode.None;
p1.Enabled = false;
p1.Create();

Looks like the one created by the GUI and works like it too.

This is a pretty simple policy, because it can only be applied at the instance level. Next, we'll look at a policy that can be applied to a set of database objects, e.g. All tables in a particular database. This requires that we investigate ObjectSets.

Cavaet. It's pretty easy to make a mistake, sans docs. Some mistakes produce a generic "I can't do this"-type message. So ALWAYS drill into the INNER exception if you get an error. That is:

try
{
    con1.ExpressionNode = exp_both;
    con1.Create();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

Stack trace might even be helpful.

Monday, March 17, 2008 1:15:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This is the second part in a series about programming policy-based management. The series starts here.

To build our MailOffByDefault policy we need:
   Condition that specifies properties and settings
   Policy that uses the condition

Condition first. This looks pretty straighforward.

Condition con1 = new Condition(ps, "MailOffSMO");
con1.Facet = "ISurfaceAreaFacet";
con1.ExpressionNode = ExpressionNode.Parse(
                        "@DatabaseMailEnabled = 0 and @SqlMailEnabled = 0");
con1.Create();

The Condition class uses PolicyStore instance (ps) and names the condition (MailOffSMO). You initialize a facet with a string. So where did the string come from? It is named in the SMO library but I "found" it by making an equivalent condition with SSMS and inspecting it.

foreach (Condition c in ps.Conditions)
 // ... look at c in the VS visualizer

ExpressionNode should "define" my condition, and since there is no ExpressionNode constructor, I first tried Parse(). This created the Condition, but it was unusable in SSMS. SSMS wanted "false" not "0". I searched around for how to specify "false" in the parse string for a while, then came upon something better.

ExpressionNode has six subclasses that can be used in combination to specify any set of expressions that you need. These are
  ExpressionNodeAttribute
  ExpressionNodeConstant
  ExpressionNodeChildren (with subclasses)
    ExpressionNodeFunction, ExpressionNodeGroup, and ExpressionNodeOperation

That's next.

Monday, March 17, 2008 12:46:24 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I've been looking at the new Policy-Based Management (was: Declarative Management Framework) in SQL Server BOL. All of the BOL examples use the SSMS user-interface to define and maintain the policy store. While there will likely be 3 ways to configure PBM as there is with, say, replication (SSMS, SMO, and system stored procs), I thought I'd take a try at programming it using the new SMO libraries.

Because the docs are sparse (there are listing of the new classes but no description of what they do in BOL), I figured I'd start by writing C# code, because I like the visualizers (those components that allow you to drill into a heavily nested structures while debugging) in Visual Studio. Later on, I'll port these to PowerShell. I can also use PowerShell reflection capabilities and the new SQL Server provider to get a quick look a the structures.

Be aware of the fact that, since this API is so sparsely documented it could change by RTM. Always a consideration.

The new classes live, for the most part, in two libraries:
  Microsoft.SqlServer.Dmf
  Microsoft.SqlServer.Management.Sdk.Sfc

I also added references in the project to:
  Microsoft.SqlServer.ConnectionInfo
  Microsoft.SqlServer.PolicyEnum
  Microsoft.SqlServer.Smo

I mean to start by replicating the two examples in the PBM (is that its new acronym?) books online tutorial. But first, we need a starting point. It's the PolicyStore class. The PolicyStore is also the machine-root of the PowerShell provider drive SQLSERVER:SQLPolicy\{machine}\{instance}.

You can initialize the PolicyStore's connection with an instance of SqlStoreConnection from the ...Management.Sdk.Sfc namespace. Not sure what Sfc stands for, but being an old C++ programmer, perhaps its SQL Foundation Classes ;-) ? No matter. Luckily you can initialize a SqlStoreConnection with a plain old SqlConnection. So, lets connect to the store.

SqlConnection conn = new SqlConnection("server=zmv32;integrated security=sspi");
PolicyStore ps = new PolicyStore();
ps.SqlStoreConnection = new SqlStoreConnection(conn);
conn.Open();

Where to go from here? In the PowerShell provider, the subdirectories of the policy store are Conditions, Policies, ObjectSets, PolicyCategories, and PolicyCategorySubscriptions. To create the first policy (MailOffByDefault) we need a Condition and a Policy. That's next.

Monday, March 17, 2008 12:43:55 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, March 07, 2008

This is the last post in the series, at least for now. I'll update it (or post more on the topics discussed here) as the products involved evolve and mature. This one's about:

LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

T-SQL is a declarative language, allowing you the ability to rewrite queries for better performance. A whole cottege industry has grown up around this (I teach it myself), and it usually consist of changing the SQL to get the plan you want, based on your intimate knowledge of the (current) data and the (current) use cases. As one of the simplest examples, you can switch between joins, correlated subqueries, and nested subqueries to see which one gives best performance. Or use EXISTS rather than a correlated subquery or IN clause.

Because the queries (LINQ and/or ESQL) are programmatically transformed in SQL queries there is not (that I'm currently aware of) the ability to "rephrase" LINQ/ESQL queries to produce subtlely different SQL queries and thus better performance. If you can produce rephrased SQL by changing a LINQ/ESQL query (not just rewriting a LINQ/EF query to produce different results that are more optimal), I'd be interested in hearing about it. Perhaps another cottege industry awaits...

BTW, although most/many SQL queries can be rewritten (sometimes many different ways) and tested for best generated query plan/best performance, the limitation is that, in future, the query processor can get smarter, thus making your past years' work unncessary. Usually though, you've benefited from rewriting SQL for that extra 6 mos-5 years until the query processor changed anyway.

Besides query rewrites, you can also "hint" queries, in most dialects of SQL I've seen. This helps when the query processor chooses a suboptimal plan (uncommon, but not unheard of) and you have intimate knowledge or data and use cases. Or when you're trying to service different use cases with the same query; SQL queries only have one plan at a time (modulo parallelized plans) and you might have to satisfy different use case by differently hinting the same query. Because the translation to SQL is deeply imbedded in the LINQ/EF source code, if I find a performance problem that can be helped with a hint, I can't hint in the LINQ/ESQL code. This means going back to using stored procedures (they work with hints) and away from the model.

Hinting is usually not preferred over rewriting the SQL because hints "tie the query processor's hands", i.e. if the statistics change so that a different plan would work better, the query processor can't use this information because you've told it how to accomplish the query. You've changed SQL from a declarative language to an imperative language. It's best not to put query hints in code, but separate them to a separate layer. SQL Server 2005 calls this separate layer plan guides. The plan guide is a named database object that relates a hint to a query without changing the underlying code. You can add/drop plan guides or turn them on/off at will. Or re-evalute them when things (the statistics or use cases) change.

Can you use plan guides with LINQ/EF queries? Two things to keep in mind. First, a plan guide for a SQL statement requires an exact match on a batch-by-batch basis. Machine-generated SQL will likely make exact match easier, but you will have to check that the guides are being used each time LINQ/EF libraries changes. Second, plan guides work best if you have a limited number of them in your database. They're meant to be special-case... not to add another level of complexity to an already complex (and getting more so as the layers of abstraction increase) situation. So use these with care.

So, is this an issue worth worrying about? I think we'll need to wait and see. Fix a few "bad (generated) SQL or bad queries" problems before giving up entirely. Or, fix performance problems (in the generated SQL) by going to stored procedures and see how many procs you have after a year. Are the folks who are licking their chops in anticipation of LINQ/EF related perf problems justified? Well, its not me that thinks optimizing declarative languages will always have its place.

MHO.

Hope you enjoyed this series. As implementations of these models take hold, I'll be watching for items that would change my opinions. Or prove them...

Friday, March 07, 2008 3:08:08 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, March 06, 2008


This post is part of a series about worries when implementing LINQ to SQL or ADO.NET Entity Framework from a SQL database-centric programmer's perspective. The last two worries are related. It's mostly about either level of abstraction being one level away from the "real SQL code" that's being executed.

First off...
LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

A stored procedure is always thought of by stored procedure afficianados as representing a "contract" between consumer and provider. That is, the database metadata tells me exactly what I'm going to get. Although the database metadata does indicate number, type, etc of parameters, this is absolutely not true for rowsets returned by stored procedures. There is NO database metadata that records anything about the returned rowsets, or even how many rowsets a stored procedure will return. Actually the number of rowsets returned is part of the ANSI standard but SQL Server implement it. In addition, errors that might happen in the middle of a stored procedure might result of rowsets being missing. And than there's always the possibility of returning multiple and/or different rowsets by using a stored procedure with conditional code... Not much of a rowset contract at all.

One way to ameliorate this problem (in SQL Server) is to use multistatement table-valued functions to return one rowset with known metadata. The main hassle with this is performance; a multistatement table-valued function is the equivalent of filling a table variable in code and then returning it. There are I/O considerations (the I/O of reading the base tables + I/O of reading the table variable at the consumer) The are also performance considertations as SQL Server table variables have no statistics, if the table-valued function is used as a row source in a larger query (composable queries), there is no way to estimate the number of rows returned by the TVF.

SQL Server's strongly typed table-valued parameters in SQL Server 2008 would be an analogous concept, but currently these are limited in being "input only" in procedures. No strong typed results yet. Oracle is an exception to this "no contract for rowsets" concept. Because Oracle doesn't return rowsets from stored procedures, they introduced a special parameter type called refcursor. Refcursors can appear in database APIs as a parameter (of type Refcursor or more generically "table"). And you can have strongly typed Refcursors, providing the needed contract. We'll have to wait for Oracle's (or DataDirect Technologies') EF provider or LINQ abstraction product to see how they use this.

So now that we've determined that there is no more of a rowset contract for stored procedures than ad-hoc SQL (the difference really is in SQL encapsulation and support of ownership chains, but that's another story), what about extentions that ESQL doesn't support? There are database-specifc extensions like SQL Server's PIVOT operator, or ANSI SQL standards, like ranking and windowing.

LINQ folks are quick to talk about implementation thorugh "extension methods" but the long and short of this is that these are a LINQ-ism, unrelated to LINQ to SQL. That is, the LINQ construct to SQL dialect statement mapping is fixed and embedded in the LINQ product. Using extensions to the *SQL statement mapping* (ie changing what SQL statement is produced) require either going deep into the framework (if this can be done at all) or implementing equivalent concepts on the client side, leaving the generated database code alone.

EF may have a little better story with this because each provider-writer implements the ESQL to query mapping, conceivably you could write a custom provider to encapsulate the supplied provider with extensions. However, the ESQL language itself does not have to capability of ODBC-like "escape clauses", so there'd be no way to express this extended SQL-based functionality in ESQL.

So I'd classify the "subset of SQL" and "stored procedure rowset is an anonymous type" problem as something that might be worked out in future releases of databases and frameworks. Until LINQ to SQL or EF provides escape clauses in the framework, the easiest way out is the ultimate escape clause, using the stored procedure that returns (anonymous) rowsets. And the more stored procedures are used (not CUD procedures, which enhance the model, but rowset-returning procedures), the farther away from the model you get. Which interferes with the usefulness of the model in general.

MHO.

Thursday, March 06, 2008 12:39:03 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post covers LINQ to SQL and EF worry #4. That is:

LINQ to SQL and EF will write code that gets too much or too little data at a time. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

I really put this in for completeness. Both LINQ to SQL and EF have good mechanisms to deal with this one. In addition, its not necessarily (only) an ORM problem. In a file system graphic user interface, do you prefetch all of the (perhaps thousands) or files' information throughout the entire file system when someone wants to look at the content of the C drive? The answer with this one is "it depends". If you know you're going to eventually display all of the related entities' information you likely do want to get them. If not, perhaps you want to get related entities all at one, when the first child entity is selected. Or get the children one at a time when each child entity is selected.

LINQ to SQL addresses this by implementing a property on the DataContext, the DeferredLoadingEnabled property. It is True by default, retrieving only the Customer object when the Customer has Orders. The related Orders objects are retrieve with an extra roundtrip to the database, one row at a time, when the Customers' Orders property is accessed in code. There is a related property, LoadOptions, on the DataContext that takes a DataLoadOptions instance that allows you to control exactly how much related data is retrieved. That is, do I want only related Orders or Orders, OrderDetails, and associated Products in a single round trip? The DataLoadOptions also allows you to filter the amount of data you get from related tables, that is, I want each Customers' associated Orders, but only OrderID and OrderDate.

ADO.NET Entity Framework does this a little differently. They don't have a property for whether deferred loading is enabled, they just load deferred by default. In order to load associated entites, there is a separate Load method, and an IsLoaded property that you can check before loading. EF also has an Include property of the query of also you to specify which related entities can be loaded, if eager loading is desired. With EF you can also use Entity-Splitting in your design if you know you always want to retrieve OrderID and OrderDate, but no other properties, from the Orders table. Object purists may frown on composing object based only on commonly-used queries, however.

You can also retrieve only certain columns from an object (ie all the fields in Customers but the Customers' picture) with either a related or anonymous type. And, of course, you can always specify a join that returns an anonymous type is desired to get just the properties you need from related tables.

So I'd say that this worry not only is completely unwarrented, but that LINQ to SQL and EF make programmers think more about lazy loading vs eager loading, and make it clearer and more maintainable than a join, which always returns an "anonymous rowset" with columns from all tables interspersed. That is, you know exactly what related data (at an object level) is being requested and retrieved.

MHO.

Thursday, March 06, 2008 11:41:33 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, March 03, 2008

SQL Server 2005 introduced some new features that left "traditional" SQL folks puzzled. What's the hidden use for SQLCLR? Or the XML data type? Or Service Broker? Besides being available to you as a programmer, these are all used internally. 

I once had a conversation with a database person (not SQL Server) who, when asked about the viability of certain features in his database product for the general database developer, responded that "although there are only a handful of users using feature X, we use it internally in the product." So its always interesting to observe the SQL Server team using its own features to implement other features. Here's a couple of examples from SQL Server 2008.

SQL Server 2008 includes PowerShell integration and a PowerShell provider for SQL Server that allows you to navigate the database and policy object models using a file paradigm. Where else are they using PowerShell in the product?

A glance at the BOL reveals that there are a few new SMO (SQL Server Management Object) libraries, with names ending in DMF (e.g. Microsoft.SqlServer.Management.Dmf). These are used to setup and manage policy. According to this Policy-Based Management webcast, the libraries are used both internally (via SQLCLR) and externally (hosted by PowerShell) to define and check policy. Remember DMF stands for declarative management framework, the original name for the feature now known as Policy-Based Management.

Another use of PowerShell in SQL Server 2008 is as a subystem for SQL Agent jobs. In addition to the SQL Server 2005 Agent subsystems, you can use the PowerShell subsystem with jobs. Create a new job and jobstep to see PowerShell as a choice.

These (PowerShell and Policy) are also used together when you use an "on-schedule" policy execution mode. Choosing this mode schedules a SQL Agent job that uses a job step hosted by PowerShell that invokes a PowerShell cmdlet named Evaluate-Policy. The help file for this cmdlet says it "supports the SQL Server 2008 infrastructure". By the way, "On change - log only" policy execution mode uses Service Broker for its implementation.

Another example of synergy between new features involves the extended event feature. Extended events are implemented via an event engine built into SQL Server. Individual events are lightweight to fire and the infrastructure is amazingly flexible. Events can be associated with actions that pick up addition info, filtered via predicates, and dispatched synchronously or asynchronously to a variety of event targets.

Besides the events,actions,target, etc used for diagnosis, Extended Events are also used to implement to new SQL Server 2008 Auditing feature. Auditing has its own package of events,actions,data types, and targets which are used when you define an AUDIT in DDL, but because they are just "ordinary" Extended Event objects, you can also use them in conjunction with any of the other Extended Event facilities.

That's just a couple of examples of "feature synergy" in SQL Server 2008. I've blogged about others in SQL Server 2008 (e.g. SQLCLR UDTs for Spatial data types and hierarchyID) and I'll blog about more examples as I come across them.

Monday, March 03, 2008 9:49:28 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Saturday, March 01, 2008

This post is part of a series, see parts 1 and 2. Sorry to be so long getting back to this series.

This post covers LINQ to SQL and EF worry #3. That is:

LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing SQL UPDATE/DELETE statements that are set-based.

Neither LINQ to SQL or Entity Framework currently contains an insert/update/delete language. Entity SQL could contain DML in the future, LINQ (Language Intergrated QUERY) to SQL doesn't have one. But both APIs can affect INSERT/UPDATE/DELETE operations on the database. You create or manipulate object instances then call SaveChanges (EF) or SubmitChanges (LINQ to SQL).

The manipulate objects and save method works well in LINQ to SQL and reasonably well in EF. The distinction is that in EF, if there are related entities (e.g. a title row contains an integrity constraint that mandates that is title's publisher must exist in the publisher's table), you must fetch the related entity first. Or synthsize a reference using EntityKey (see associated post here) to save the database roundtrip involved in fetching the related entity. See this associated post about deleting a single row without fetching. But how about UPDATE?

SaveChanges and SubmitChanges can boxcar multiple INSERT/UPDATE/DELETE operations in a single roundtrip. But let's consider the number of database roundtrips involved to change a single customer row. This requires one roundtrip to "GET" the row and another to update the row. And what about a searched update in SQL (UPDATE...WHERE) that updates multiple rows? Or an updated based on a SQL join condition (my favorite example, using update over a recusrive CTE, gets all employees reporting to a certain manager and gives them all a raise)? The number of fetches required JUST to do the update increases. Maybe not the roundtrips required to get the rows, but the sheer number of fetches (network/datebase traffic) required.

Let's address the general "GET then UPDATE" pattern first. I worried about this one until I realized that, in most applications I've worked on, you don't usually do a "blind" UPDATE or DELETE. A customer web application fetches a row (and related rows), a pair of eyes inspects the row to ensure this IS indeed the right row, and then presses the gadget that causes an UPDATE/DELETE. So "GET then UPDATE" is an integral part of most applications anyway. OK for now. If UPDATE/DELETE of a row affects related rows, this can be accomplished with cascading UPDATE/DELETE in the database.

But how about multiple, searched, updates without inspecting/fetching ALL the rows involved? Neither LINQ to SQL or EF has a straighforward way to deal with this. AlexJ wrote an excellent 4-part blog series (start here) about rolling your own searched update in EF with an underlying SQL Server using .NET extension methods getting the SQL query text and string handling to turn it into an Update, but its not necessarily what I'd call straightforward. Maybe, wrapped in a library to encapsulate the details... It also looks SQL Server-dependent, and I thought EF wasn't supposed to be. So you'd need to replicate this for each provider.

LINQ to SQL contains the ultimate fallback method for this case. DataContext.ExecuteCommand() lets you execute any SQL command, including parameters. EF doesn't have the equivalent because (remember) your data store is an object model over a conceptual data source, not the data source itself.

I think this is one place (searched UPDATE/DELETE) that I'd suggest/mandate using stored procedures. The blind searched operation is accomplished in a single database roundtrip, and you can even (in SQL Server at least) use the OUTPUT clause in SQL DML to obtain information in rowset form as to exactly what got deleted. Since this is a database-specific operation, stored procedure sounds like a good workaround for this problem.

MHO: With stored procedures as needed and the realization that most apps use "GET then UPDATE" anyway, I think I'll dismiss this worry.

Saturday, March 01, 2008 2:52:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 

Recently I had the occasion to load .NET 3.5 on to a machine that had an existing instance of SQL Server 2005. .NET 3.5 does not work by versioning the "main .NET assemblies" (e.g. there is no "version 3.5" of mscorlib.dll, System.dll) but by replacing the 2.0 versions of them. You can observe this by inspecting the 4-part version number. For example System.dll (in \windows\Microsoft.NET, Framework\2.0.50727 as well as in the GAC) changed from version 2.0.50727.42 to 2.0.50727.1433 when I installed .NET 3.5. As an aside, when I installed Vista SP1 recently (I think this is what triggered it) System.dll went to version 2.0.50727.1434.

SQLCLR (using SQL Server as a .NET runtime host) will load a series of "approved" assemblies by directly using the fusion APIs. However there are less than 20 approved assemblies for SQL Server 2005 and sometimes people will write SQLCLR code using assemblies that aren't of the approved list. The two most common cases I've seen include using System.DirectoryServices.dll or some of the WCF client assemblies. Neither is on the approved list.

Although using system assemblies outside the approved list is discouraged, it IS possible. The resulting (user) assembly must be cataloged as UNSAFE and the CREATE ASSEMBLY process will catalog not only the original user assembly and referenced "system" assembly but, because of the way assembly dependancies work, quite a few (I've seen up to 20) dependent system assemblies. You can see the user assembly and all dependent system assemblies in the SQL Server metadata table "sys.assemblies". You also receive warnings that "assembly xyz has not been tested in this environment".

I've always said that if you use system assemblies that are not in the "approved" list, you now shift the responsibility of managing those assemblies to *you*, the SQL Server application programmer and/or DBA.

So what does this have to do with .NET 3.5 and SQL Server? It's a little known fact that when SQL Server loads a "user" assembly, it will check to see if a similar assembly (ie assembly with the same strong name and version) exists in the GAC (.NET global assembly cache). If the user assembly in SQL Server and the user assembly version in the GAC have the same .NET version number but a different MVID, SQL Server will refuse to load the assembly (from SQL Server's system catalog) and throw an exception.

As pre-requisite information, all of the "main" system assemblies (like System.dll) have a GAC version number of 2.0.0.0 even though the actual version number may be 2.0.50727.42. Or 2.0.50727.1433. You can see this by inspecting the GAC. And...an MVID is a .NET module version identifier. This is different from the version number, even non-strong named assemblies have MVIDs. The module version identifier is a GUID that changes *each time* an assembly is reassembled.

You can prove that the GAC/MVID check occurs by cataloging a user assembly to SQL Server and to the GAC, then recompiling/changing the assembly in the GAC (only MVID, not version number) but not the version in SQL Server. The resulting error after the the GAC'd version is changed is:

Msg 10314, Level 16, State 11, Line 1
An error occurred in the Microsoft .NET Framework while trying to load assembly id 65536. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE.... (rest of message elided).

But again, what's this got to do with .NET 3.5? I cataloged two assemblies to SQL Server 2005 prior to installer .NET 3.5. One assembly simply did mathematics (catalog as SAFE), one used an assembly that wasn't from the "approved" list, System.Remoting.dll (catalog as UNSAFE, it brings in about 15 dependent system assemblies that are not on the approved list). User-defined functions as these assemblies were also cataloged and worked fine.

Now install .NET 3.5. This changes the MVIDs (but not the version) of some of the System GAC'd assemblies. The SAFE assemblies that refer only to system assemblies on the "approved list are unaffected". My mathematics assembly/function still worked great. My UseRemoting assembly/function WAS affected. The resulting error is:

Msg 6522, Level 16, State 1, Procedure doIt, Line 0
A .NET Framework error occurred during execution of user-defined routine or aggregate "doIt":
System.IO.FileLoadException: Could not load file or assembly 'System.Runtime.Remoting, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. Assembly in host store has a different signature than assembly in GAC.

(Exception from HRESULT: 0x80131050)
System.IO.FileLoadException:
   at UseRemoting.Class1.DoIt()

So what's the moral of the story? If you stick to the assemblies on the approved list you are fine. Fusion loader will get the latest version for you, no changes are needed, although you want to test your user assembly again in the new environment. But by using assemblies not on the approved list YOU take responsibility for the assembly. After installing a new version on .NET that changes system assemblies that you use, don't forget to DROP your user assembly (this should also drop all the old dependent system assemblies), recompile and retest your user assembly (always a good practice) and then run CREATE ASSEMBLY again. All should be well, because your "unapproved" assemblies are the same version the GAC uses.

As an aside, but an advance warning, SQL Server 2005 does NOT need .NET 3.5. But SQL Server 2008 DOES use .NET 3.5 and installs it as part of SQL Server 2008 install. See previous paragraph.

Hope this helps.

Saturday, March 01, 2008 1:31:15 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, February 29, 2008

It's good to see Ed Katibah open up a blog on SQL Server spatial data. Ed is sort of the "dad" of spatial data in SQL Server; he's got a ton of experience and history in this space. Be sure to catch his posting on what's new in spatial for CTP6.  I've been trying out the new functions and the more I use the spatial types and functions, the more I understand the reasoning behind the dual Geometry - Geography types in SQL Server.

Issac's back at blogging too about spatial indexes, really makes for good reading.

Subscribed!

Friday, February 29, 2008 11:28:28 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, February 28, 2008

I've been working with Filestream storage in SQL Server 2008 since it appeared in CTP5. The way I've always set it up is to use sp_filestream_configure. During the CTP6 setup process, I noticed you could now configure Filestream as part of setup. Because I knew how to use sp_filestream_configure I skipped that part of setup. And everything just worked as expected when I used it. A friend of mine, not wanting to miss anything, configured Filestream as part of setup. When he tried to use it, the following "Catch 22-like error messages occurred":

EXEC sp_filestream_configure @enable_level = 3;

"The FILESTREAM feature is already configured to the specified level. No change has been made."

CREATE DATABASE ... with a filegroup for FILESTREAM

Msg 5591, Level 16, State 1, Line 1
FILESTREAM feature is disabled.

Huh? After checking the short list of usual suspects (e.g. NTFS file system, running SSMS as admin under Vista) we were both puzzled. Especially because it "worked for me" as it always had. There's nothing more frustrating to hear during problem resolution then "Well, I'm not sure what you did wrong because it works for me". Grrrr...

There is finally a resolution thanks to the storage engine team (confrmed and expounded on by Joanna's blog entry on March 3), and it turns out that the way filestream is configured changed in CTP6 for some very good reasons. You see, sp_filestream_configure sets up Filestream in the OS (requires OS privs) and SQL Server (requires SQL Server privs). Unless you're running sp_filestream_configure as an OS Admin who's also a SQL Server sysadmin, it may not work completely. But sp_filestream_configure is "IN" for one last CTP (CTP6). Gone before RTM.

So filestream configuration was broken into two parts:
1. Configuring filestream at an OS level is moved into setup. Or use SQL Server Configuration Manager. Or WMI scripting (with SMO/WMI) is you like scripting (I do). In the SQL Server Configuration Manager GUI, you configure it by select SQL Server Services (left pane), right-click on your SQL Server service instance and choose "properties" and use the FILESTREAM tab. Note that the equivalent FILESTREAM tab does NOT appear when using Services Control Panel applet. I hope you switched (as you should have) to SQL Server Configuration Manager back in SQL Server 2005 days.

2. Configuring filestream at a SQL Server instance level requires EXEC sp_configure 'filestream access level', '2'. Note that the access level choices in SQL Server are 0,1,2.

3. BOTH configurations steps/setting (OS and SQL Server instance) must be compatible for filestream to work. If, for example, filestream is enabled at the OS level but disabled in SQL Server, you'll get message 5591. See above.

So how did my friend get the "Catch 22 errors"? In CTP6 (but not in future), setup performed step 1, but not step 2. In future setup will do both steps. In CTP6, sp_filestream_configure (to be removed before RTM) will do both steps. But before sp_filestream_configure does anything, it CHECKS to see if either configuration job is already done. Thus the "filestream feature is already configured" message.

Got it? So get used to configuring filestream in both places using SQL Server Configuration Manager AND sp_configure. That's the way of the future. The fact that both exist in one CTP can be frustrating but is understandable. Reminds me of a similar shift when configuring HTTP endpoints (which also requires both OS and SQL configuration) during the SQL Server 2005 CTPs.

Hope this post saves someone a few "grrrr... moments" when using this new useful feature.

Thursday, February 28, 2008 9:58:50 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Tuesday, February 26, 2008

Quite a bit of activity last week in the Visual Studio and ORM spaces.

Visual Studio released a CTP of Visual Studio 2008 support for SQL Server 2008 CTP6. It allows you to connect to CTP6 with Server Explorer, which enables quite a few other features to work. check with Overview document on the download page. SQLCLR projects and T-SQL and SQLCLR debugging work too.

Curiously the Overview document says "using the LINQ to SQL Designer with SQL Server 2008 databases" is specifically not supported. But I've used SQL Server 2008 WITH NO new data types (e.g. DATE, TIME, etc) and it does seems to work. EF beta3 designer too. Maybe they are referring to lack of support for the new data types. Speaking of which...

Faisal Mohamood announced on the ADO.NET team blog that they ARE working on support for the four new date and time types in SQL Server 2008. Hooray. Now let's keep our fingers crossed for SQL Server UDT support.

Julie Lerman wants these types in EF too, but I'm not sure I agree. DATE and TIME would be nice because they're ANSI standard types. DATETIME2 maps favorably to DateTime in EF, actually better coverage of the value space than DATETIME. Maybe what she really means is that she wants the SSDL and MDL and the designer to be aware of them and generate the closest possible mapping. Even if that's "string".

The distinction is that the ADO.NET Entity Framework is meant to be database neutral, not SQL Server specific, and supports a discrete subset of database data types not a 1-1 mapping of EF types to SQL Server types. LINQ to SQL, OTOH, is SQL Server-specific and should support a 1-1 mapping. At this point I'm really thinking they should have called it "LINQ to SQL Server". Not that the *concept* can't be extended to other databases (LINQ to DB2, LINQ to Oracle, etc) but these databases will have to implement their own. The fact is that LINQ to SQLCE does exist and is separate from LINQ to SQL (Server). And other vendors may follow with their own, database-dependent implementation of a light LINQ layer. But speaking of SQLCE...

The SQLCEBLOG announced the beta of SQLCE 3.5 SP1 with ADO.NET Entity Framework Beta 3. There was support for EF in SQLCE a while ago, but it was postponed in the 3.5 release which sports LINQ support. Now it looks like Entity Framework support is back. Hooray. BUT, although a common programming model between SQLCE and a subset of SQL Server functionality is a great idea, IMHO the best and most lightweight programming model for SQLCE uses SqlResultset. Still the winner.

Oh yes, BTW, SQL Sevrer 2008 CTP6 was released last week. But that's been posted on EVERY SQL Server related blog in the world. Congrats for another CTP folks. Sparse columns, filtered indexes, and filtered statistics are in as well as quite a few other features. 

Tuesday, February 26, 2008 11:48:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, February 14, 2008

Sometime before SQL Server 2008 RTM, the libraries for LINQ and LINQ to XML (that is System.Core.dll and System.Xml.Linq.dll) but NOT LINQ to SQL will be added to the SQLCLR "approved" assembly list. The approved assembly list is the list of .NET Framework libraries that have been tested in a SQL Server-hosted environment, annotated their code with HostProtectionAttributes where needed, and can be used in user assemblies that can be cataloged with "PERMISSION_SET = SAFE". That means you can use 'em in SQLCLR user-defined functions, procs, and other SQLCLR database objects.

Being curious, I asked "why LINQ and LINQ to XML but not LINQ to SQL"? The answers I got were "LINQ to SQL not tested in this environment" and "not sure there should be yet another way to access the database with .NET". Fair enough. Although, its never seems to be a problem to provide yet another way to mix SQL Server and XML. The number of different ways now exceeds the number of fingers on one hand.

OTOH, SMO doesn't appear to be added to the approved assembly list in SQL Server 2008. Nor has WCF. Oh well.

Enjoy...

Thursday, February 14, 2008 4:58:31 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

I often hear those who identify themselves as "database programmers" (sometimes I think these may be folks who program only inside-database code), say: "I'd give learning LINQ and/or LINQ to SQL/Entities or Entity SQL (eSQL) a go in my sparse time, but I don't have a good book or a tool.

I haven't read a book on either of these yet, spending most of my time in BOL. BOL is a good start (but not comprehensive) for LINQ and LINQ to SQL. eSQL docs are pretty sparse, but that's because its not a shipping product yet. Maybe things will improve. (Note: if anyone wants to send me a copy of their book to read in my spare time, I'll give it a go ;-).

Following LINQ is more difficult than EF (modulo LINQ to Entities) because there's many implementations/dialects. There are operators that LINQ (over objects) can do, but LINQ to SQL or LINQ to Entities cannot.

On to tools:

For LINQ, I like LINQPad by Joseph Albahari. Simple to set up, although the queries are not *exactly* the same as you'd write in a program. And it doesn't support LINQ to Entities yet (or LINQ to DataSet for that matter). But, all in all, an excellent tool.

For eSQL, there's no comparison to Zlatko Michailov's eSqlBlast. This is not only an eSQL practice query processor, but also a visualizer for the results you'd get back. You just type in the eSQL. Even comes with source code. Latest version is on CodePlex. This is straight eSQL, no LINQ to Entities support here either.

Neither one of these has Intellisence or statement completion, I believe.

Enjoy...

Thursday, February 14, 2008 4:24:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

Thursday, February 14, 2008 3:55:51 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Wednesday, February 13, 2008

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

Wednesday, February 13, 2008 5:47:20 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]  | 

This entry is a continuation of the previous posting on how to use SQL Server 2008 spatial data and Virtual Earth. This entry discusses your web service and SQL Server code choices.

The point of the web service is to translate one of the output formats of SQL Server 2008 spatial to a format this Virtual Earth can use. The spatial data types can be output in:
1. Well-known text format
2. "Native" format - that is, return a Geometry/Geography type to a .NET client as the SqlGeometry/SqlGeography native .NET types
3. GML - Geography markup language

Remember that Virtual Earth needs:
1. GEORSS or
2. KML
3. Custom XML-based format (and custom Javascript code to process it)
4. Javascript strings (to pass to eval())

There is not a one-to-one correspondence between any of these methods, so there is going to be a need for a transformation somewhere. In addition, a SQL Server Geometry or Geography type can contain a collection of points, linestrings, or polygons. There may also need to be a one-to-many translation. This is facilitated by methods STNumGeometries/STGeometryN and STNumPoints/STPointN.

Finally, there could be a pushpin (point) associated with the center of a polygon and/or text associated with the pushpin. Where the text comes from will be application-dependent.

Because there are many ways to process XML in SQL Server and in web services, that's lots of choices. Some that come to mind are:
1. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to a custom XML format
   Javascript code in the web page processes the custom format.
2. Using SQL Server's XQuery to transform GML into GEORSS or KML
   This is returned to the web service as a SqlXml data type.
   Consumed directly as a layer in Virtual Earth.
3. Using an XSLT transform (or LINQ to XML) to transform GML to GEORSS or KML. This can occur in the web service or even a SQLCLR procedure, although this type of processing should really be accomplished in the web service.
4. Using SELECT ... FOR XML PATH to construct GEORSS or KML directly.
   Similar processing to choice #2 above.
5. Transforming a SqlGeometry/SqlGeography from a SqlDataReader to Javascript strings.
   Web page simply calls eval()

Hopefully this posting will elucidate the choices you have for this rendering. Depending on which method you choose, there could be quite a few "moving parts" (a la Rube Goldberg) in the solution.
- Web pages with custom Javascript code
- Web service to supply the data
- HTTP Handler to allow Virtual Earth to use remote GEORSS/KML
- Data access code in SQL Server
- Transformation code in web service or SQL Server

So.....what's your favorite method for using SQL Server 2008 spatial data with Virtual Earth?

Wednesday, February 13, 2008 3:53:53 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Last week at the SQL Server 2008 Jumpstart I showed an example of using SQL Server 2008 spatial data types with Virtual Earth. I showed a single coding style. There really is a plethora of coding styles to using these together, and I'd like to describe some of the most common ones. I'd like to do this without much code for now, because an end-to-end description sometimes results in the graphic aspects (which other controls to use, how to build Ajax web pages, etc) seem to get in the way.

I'm not even going to mention using Virtual Earth to serve tiles directly, but only the Virtual Earth web control. Start with the Virtual Earth Interactive SDK for coding information.

In general, what we're after is:
   Page contains Virtual Earth control
   Page makes a request to a Web Service for data (usually Ajax)
   Web Service get and processes data from database, returns data
   Page uses Virtual Earth calls to present the data

The first two are straightforward, although the Virtual Earth control is available only online via a URL. This makes testing when you don't have internet access virtually impossible. The Ajax request (that is, call a URL with an XML payload using a browser-specific mechanism, wait asynchronously for the response) normally doesn't use SOAP-based Web Services, because you'd have to add and remove the SOAP headers. It's usually XML-in, XML-out, sans SOAP headers.

The Virtual Earth calls depend on what version of Virtual Earth control you're using. I'll show this using version 6.
1. You can render using shapes directly
2. You can import GEORSS into a shape layer
3. You can import KML into a shape layer

Virtual Earth supports visualizing shapes directly (#1 above) by using:
1. AddPushpin - to visualize a point or the centroid of a polygon
2. AddPolyline - that's Linestring in SQL Server 2008
3. AddPolygon - for polygons

For shape layers (#2 and #3 above), the code looks something like this:

var l = new VEShapeLayer();           
var veLayerSpec = new VEShapeSourceSpecification(type, local.xml, l);

map.ImportShapeLayerData(veLayerSpec, onFeedLoad);

The "type" in the VEShapeSourceSpecification constructor can either be GEORSS or ImportXML (that's KML). You can also have a Live Search Maps Collection but likely that one won't come from SQL Server.  Bear in mind that the VEShapeSourceSpecification constructor needs a *local* XML file URL. If you don't want to use a local XML file, you can build an ASP.NET HTTPHandler that uses redirection to pretend a remote XML file is local (see Mike McDougall's article and code to accomplish this).

One last twist that I've seen is that you can build your Javascript code directly in each page OR make the Web Service program reformat your data into Javascript calls directly. Once you're back at the page (the Web Service passes back a string of Javascript) you simply call Javascript's "eval" function. This makes the web page client-side code easy, but the Web Service code lots more complex, as you're generating code, not points, lines, and polygons.

Web Service choices follow next....

Wednesday, February 13, 2008 3:52:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, February 12, 2008

SQL Server 2005 introduced the concept of endpoints. Every connection entry point into SQL Server is abstracted as a endpoint. You can see all the endpoints on your SQL Server instance by using the sys.endpoints metadata view. In addition to those you'd expect (for TCP/IP, Named Pipes, Shared Memory) there are also endpoints defined for the dedicated admin connection, Service Broker, Database Mirroring...and HTTP.

The endpoint concept is still with us, but HTTP endpoints, those endpoints that allow you to expose SQL Server procs and functions directly as HTTP Web Services will be deprecated in SQL Server 2008. Upon creating an HTTP endpoint (aka SOAP endpoint) in SQL Server 2008, you'll be warned that "this feature will be removed in a future version. You should not use these in new development work." This was announced at the SQL Server 2008 Jumpstart last week.

So what replaces this functionality (and how many folks actually used it in SQL Server 2005). The likely successor seems to be ADO.NET Data Services, a mechanism to produce RESTful services over any type of data, using Entity Data Model or LINQ to SQL. The model you expose through data services doesn't really have to be a database at all.

And for those of you that liked the idea of exposing your stored procedures as services, ADO.NET Data Services supports "Service Operations", which can be parameterized and contain any type of logic you wish. Service Operations aren't limited to stored procedures, but could be used to encapsulate them.

Tuesday, February 12, 2008 5:08:32 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  |