Monday, August 30, 2004

There are a number of different ways that SQL Server 2005 checks to see whether on not you're trying to catalog or execute dangerous .NET code. This checking is enforced by SQL Server at CREATE ASSEMBLY time and also at execution time.

During CREATE ASSEMBLY, SQL Server reflects on the assembly and check for certain coding practices than can "cause problems" at runtime. It's well-known that SQL Server does not allow you to catalog an asssembly as SAFE or EXTERNAL_ACCESS if the assembly contains finalizers or mutable statics.

It's also well-known that SQL Server will throw a SecurityException at runtime if you attempt to access classes or methods marked with a Host Protection Attribute. HostProtectionAttribute was added to the .NET 2.0 runtime to allow class-library authors to annotate their code in order to inform the host (in this case SQL Server) of behaviors such as "MayLeakOnAbort" or "SelfAffectingThreading". HPAs provide protection at execution time, not catalog time.

It's a little-known fact that SQL Server also checks for the presence of certain attributes on assembly code at CREATE ASSEMBLY time. If your code contains dangerous attributes CREATE ASSEMBLY does not succeed. This is different checking than what HPAs allow. This list of "dangerous attributes" is checked at catalog time. The attributes SQL Server considers dangerous include:
  System.STAThreadAttribute and System.MTAThreadAttribute
  System.ThreadStaticAttribute
  System.ContextStaticAttribute
  System.DllImportAttribute

Doubtless there are others. Attempting to catalog an assembly that contains one of these attributes produces an error message like this:
Msg 6237, Level 16, State 1, Line 1
CREATE ASSEMBLY failed because method 'ProcWithAttr' on type 'StoredProcedures' in assembly 'SomeAssm' has invalid custom attribute 'System.STAThreadAttribute'.

Interestingly, I was able to find one that slipped by; SQL Server will allow cataloging an assembly with the UnverifiableCodeAttribute as SAFE. Maybe it was due to the fact that my module did not actually *contain* unverifiable code, just an attribute that said it did.

I'm amazed at the thoroughness with which SQL Server checks for code that can cause it problems as a .NET runtime host. .NET reflection is truly its friend; try to implement this type of checking on an extended stored procedure!

Monday, August 30, 2004 9:26:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Sunday, August 29, 2004

Because of SQL Server 2005's support of XQuery, a lot of folks have been asking about good XQuery-specific books. I've found two excellent books, each with a different focus.

"XQuery From The Experts" is a compendium of essays by the folks in the W3C XQuery working group. It's an excellent treatise of how XQuery is meant to work, the problems its meant to solve, and the details of implementation. Since most of the writers are also XQuery processor implementers, you're getting the "inside story". I especially like Jonathan Robie's XQuery tutorial chapter and Michael Rys' chapter on implementing an XML data type. Michael's XML data type implementation theory is not identical to the SQL Server 2005 implementation (although he's the PM for this feature), but its close enough in concept to make it a great read if you want to understand this feature.

If you're more interested in XQuery from a user perspective, Michael Brundage's "XQuery: The XML Query Language" is right up your alley. Michael covers the syntax and usage in depth, with lots of examples. The XQuery specs lack all but the most cursory examples (it's not their charter) so Mr. Brundage's book is a must. Although Michael works at Microsoft, his book doesn't cover the System.Xml implementation in .NET 2.0, but the language as defined in the spec. So you'll be doing deltas with the SQL Server 2005 or System.Xml implementations (which are closely aligned) in any case.

Remember that both books are based on a standard-in-progress, although now it's close to done. Happy XQuery-ing.

Sunday, August 29, 2004 11:09:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, August 25, 2004

I'm at the Ascend program launch this week; it's being launched as part of a larger event. Yes, I finally got to meet Roger Doherty. I'm doing a day-long (actually a very long day-long) presentation of SQL Server 2005 new features for developers tomorrow. Today, I'm listening to Larry Chesnut speak about "DBA and scalability topics" as I write this. It's interesting to see what people consider "DBA topics" as opposed to "developer topics", because in the SQL Server culture, many small shop developers are also their own DBA. In addition, SQL Server runs some of the world's largest installtions.

It's also interesting that I've met up with quite a few folks here who took my class in beta 1 timeframe. They're not coming for the most part because they've "already seen it". If you look back at the blog entry for my "20 favorite beta 2 features" about a month ago, there's many new features even since beta 1. So I have at least 25 new things to talk about; the list of 20 were only favorites and I've "found" more since then.

I've also been asked here if our book "First Look at SQL Server for Developers" is based on beta2/.NET 2.0 beta 1 or the earlier Yukon beta1. The answer is Beta2 with qualifications. There's a lead time in publishing and we wanted the book to ship with the beta. So we took the information that we could get, did experiments intermediate versions and wrote what we thought would be in beta2 at the time. The server folks were pretty settled by that point, but there may be little nuances since that I've found through experimentation. The client-side folks made some big changes since we declared done, we caught as many of them as we could. That said, I'm keeping up to date with the latest on the book's website. It's difficult to write "in depth" book as the product's changing. I was inspired by Eric Gunnerson's excellent C# book, which shipped early but followed the product as it evovled via the book's website.

Wednesday, August 25, 2004 11:13:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, August 23, 2004

If you're developing procedural code, UDTs, or UDAggs in SQLCLR for SQL Server 2005, Visual Studio 2005 has these nice auto-deploy projects called SQL Server projects that you can use. Comes with templates for each SQL Server 2005 item you can develop, just choose a SQL Server and database at project creation and hit “Deploy” menu entry when you're ready. F5 debugging is even built in to them. VERY cool.

But...there are some folks that don't want any help. Or have an existing class library they want to try. Or want to start from first principals and show others how to install a .NET assembly “by hand”. In Visual Studio Alpha/SQL Server 2005 Beta1 you could do this with a default class library project. In VS Beta1, you can't because there are mutable statics in the class library project “by default”. You have to install it as UNSAFE. Or....

1. Create a new class library project.
2. Click "show all files".
3. Look under Properties (C# project) or MyProject (VB.NET project)
4. Delete Resources.resx and Settings.settings. If you are working in VB.NET project you do NOT have to delete MyApplication.myapp.
5. You're good to go. Just code it up and manually deploy as SAFE (the default).

Monday, August 23, 2004 3:01:23 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

To squeeze every ounce of performance out of the SqlServer data provider, the rule is to allocate the least number of ancillary object instances possible. In SQLCLR Optimizations 1, I mentioned a mechanism to keep from creating a SqlCommand and associated objects, just to initialize a SqlDefinition.

Another performance increase comes by not allocating/using extra buffers. When you execute a statement that produces a rowset, this means using SqlDataReader only when you need it. You need it only when you are  going to consume the results IN the .NET procedure and do some processing with them. You don't need a SqlDataReader just to pass a rowset back to the client. As an example, if I want to emulate this trivial T-SQL proc:

CREATE PROCEDURE getauthors
AS
SELECT * FROM authors


this .NET code looked pretty straightforward to me:

public static void GetAuthors1()
{
  // create and initialize command
  SqlCommand cmd = SqlContext.GetCommand();
  cmd.CommandText = "select * from authors";

  // get data reader
  SqlDataReader rdr = cmd.ExecuteReader();

  // pass results to client
  SqlContext.GetPipe().Send(rdr);
}

This actually allocates enough of a buffer in your procedure to hold a row and reads into the buffer, just to turn around and hand it to the SqlPipe. That buffer is pure overhead. A more performant way, that basically points the rowset at the SqlPipe, is to use SqlPipe.Execute. You can Execute a SqlCommand or use a SqlExecutionContext.

The SqlExecutionContext is the class that represents the volitile parts of SqlCommand, the parameters, and contains methods to execute SQL and return results. Here's the same procedure, using both SqlDefinition/SqlExecutionContext and SqlPipe.Execute.

// initialized once during class construction
readonly static SqlDefinition def =
  new SqlDefinition("select * from authors", null, null);

public static void GetAuthors2()
{
  // lookup static part of command
  // in this example, the command is entirely static
  SqlExecutionContext ec =
    SqlContext.GetConnection().CreateExecutionContext(def);

  // execute it. no SqlDataReader needed, results to client
  SqlContext.GetPipe().Execute(ec);
}

At first glance, you look at System.Data.SqlServer and see a few classes that look strange, but most everything looks the same as with the SqlClient provider. The big perf improvements, however, come from using the classes that look strange and are different. That's why they are there, to allow efficiencies that you can't get (like not allocating a buffer to receive a rowset) on the client. But you wouldn't send a rowset back to the "client" from the client, this only happens in a procedure inside SQL Server. On the server it's a whole different ballgame.

 

Sunday, August 22, 2004 11:14:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, August 20, 2004

What's error 6522? It's the error that you always get from an unhandled exception in a SQLCLR stored procedure. If I write a SQLCLR error that throws an unhandled exception (let's write one on purpose):

[SqlProcedure]
public static void ErrorExecute()
{
    SqlCommand cmd = SqlContext.GetCommand();
    // everyone knows Bob can't type
    cmd.CommandText = "select * from authorss";
    SqlContext.GetPipe().Execute(cmd);
}

using it from T-SQL:

execute errorexecute
print @@error

Yields:

Msg 208, Level 16, State 1, Line 0
Invalid object name 'authorss'.
Msg 6522, Level 16, State 1, Procedure ErrorExecute, Line 1
A .NET Framework error occurred during execution of user defined routine or aggregate 'ErrorExecute':
System.Data.SqlServer.SqlException: Invalid object name 'authorss'.
   at System.Data.SqlServer.Internal.StandardEventSink.HandleErrors()
   at System.Data.SqlServer.Internal.RequestExecutor.HandleExecute(EventTranslator eventTranslator, SqlConnection conn, SqlTransaction tran, ClrLevelContext clrlvlctxtPipe, UrtExecutionType uetType, InternalResultSetOptions irsoOpts, CommandBehavior eBehavior, Object& objResult)
   at System.Data.SqlServer.Internal.RequestExecutor.ExecuteToPipe(SqlConnection conn, SqlT.
6522

Not that @@ERROR returns 6522, NOT 208. If you write an equivalent bad T-SQL proc:

create procedure errortsql
as
-- cant type any better in TSQL
select * from authorss
go

execute errortsql
print @@error

returns 208.

Why does this matter? Say that I have a T-SQL stored procedure (not this one, obviously) that I want to replace with a SQLCLR equivalent that say, runs faster. Say the procedure is used in 20 places in my application, each with semantics that check for specific values of @@ERROR. Won't work the same.

You can't catch the exception in SQLCLR and throw the "correct" one (using a SqlCommand with CommandText of RAISERROR(...)) either, because you can't throw a 208 error (or any other system error) with RAISERROR. So I'll have to change each one of my caller procs too when I switch to SQLCLR. The only current workaround is to change to the new T-SQL try-catch syntax

BEGIN TRY
  EXECUTE errorexecute
END TRY
BEGIN CATCH
  -- prints 208, whew...
  PRINT CONVERT(varchar(10), error_number())
END CATCH

Hopefully this will be changed or a workaround in SQLCLR will be available in the next beta. I want @@ERROR to somehow return 208, not 6522. Or this will be a great motivator for everyone to change to T-SQL TRY-CATCH.

Friday, August 20, 2004 3:14:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Good T-SQL programmers are always looking to optimize their procedural code. Little optimizations can end up as big savings if the code will be executed many times or if it's used in many places in the application. Good SQLCLR programmers will be doing this too. Recently I ran into a micro-optimization that has promise.

Everyone who's used SQLCLR in earnest knows by now that it's useful to separate SqlCommand into two pieces: SqlDefinition and SqlExecutionContext. SqlDefinition is the static portion of the command, SqlExecutionContext contains the execution methods like ExecuteNonQuery and friends. Last time I'd heard rumors, about a 20% performance improvement, although it's not usually useful (or not allowed in betas) to report performance numbers. It's beta software after all, possibly with extra unoptimized code that will be removed/optimized at release. Especially SQLCLR, a brand new feature. So I didn't confirm the rumored numbers; sounded like a reasonable assumption.

SqlDefinitions are best initialized in the class constructor and stashed in readonly static (Shared Readonly in VB.NET) variables. I've done it like this:

static readonly SqlDefinition def = null;

static MyClass() {
  SqlCommand cmd = SqlContext.GetCommand();
  cmd.CommandText =
      "select * from authors where au_id like @au_id";
  cmd.Parameters.Add("@au_id", SqlDbType.Varchar);
  cmd.Parameters[0].Size = 50;
  def = new SqlDefinition(cmd);
}

Lately, I've noticed there's an even better way. There's a constructor for SqlDefinition that takes a CommandText, CommandType, array of SqlMetaData, and array of ParameterDirection. I can even put the initialization of these in the variable declarations, like this:

readonly static SqlMetaData[] md =
    new SqlMetaData[1] { new SqlMetaData("@au_id", SqlDbType.VarChar, 50) };
readonly static ParameterDirection[] pd =
    new ParameterDirection[1] { ParameterDirection.Input };
readonly static SqlDefinition def = new SqlDefinition(
    "select * from authors where au_id like @au_id",
    CommandType.Text, md, pd);

The variables must be defined in this order, as the SqlDefinition depends on the other variables being initialized first. Or I could initialize them explicitly in the class constructor.

Doing it this way saves me from creating object instances, then throwing them away to the garbage collector. Namely:
1. 1 SqlCommand
2. 1 SqlParameterCollection
3. N SqlParameters where N is the number of parms in the query. SqlParameter actually contains SqlMetaData as a member, plus "other stuff"

Because I'm only executing this ONCE (the first time the class is instantiated) in an entire SQL Server run, is this a micro-optimization? Probably. But what if I have 100 or 1000 such SqlDefinitions? Maybe not as "micro" then... what do you think?

Friday, August 20, 2004 9:54:48 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, August 18, 2004

I've been trying out the new session support in HTTP endpoints. You manipulate this through SOAP headers, and looking through the list of SOAP headers supported allowed me to understand this feature from an entirely different perspective. It looks like the session support makes this into more of an alternate client stack than a mechanism to serve web services on the internet or (more likely) intranet.

The HTTP endpoints session support is meant to overcome the limitation that HTTP is a stateless protocol, while TDS/SNI/Netlibs is a stateful one. That is, you normally login to SQL Server perform multiple operations and logoff. With HTTP endpoint sessions, you can do this. There are SOAP headers for "connection string parameters" like initial catalog and application name, there are initiate (login) and terminate (logoff) attributes. There are headers exposing concepts like database transactions and prepared statements. There are even headers for new SQL Server 2005 features like database mirroring partner (for failover) and notification request (for query notifications).

I'll post some samples to show what's possible. They use the SoapHeader definition ("include file") starter (tweaked a bit) from the BOL. Bear in mind that if you define the endpoint on your own machine (you can't come over to my house and use my machine, sorry) you need to update the web reference. Doing so wipes out the SOAP Headers that you need in the proxy class (Reference.cs) to make this work. So don't forget to add 'em back. Save the original code in case you forget. For a more detailed explanation than that, refer to BOL or come to the class and I'll show you how. Or come to Windev and Dan Sullivan will, he's doing the talk on this feature. And have a think about why this is useful, maybe I'll discuss this next.

Wednesday, August 18, 2004 11:02:55 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, August 16, 2004

When you load XML into an ADO.NET DataSet using it's ReadXml() method, ADO.NET uses a fixed set of rules to determine how to infer a relational structure for the XML. These rules are doc'd in the .NET framework docs. One of the most frustrating rules was that you couldn't have the same attribute name or subelement name in more than one different element. Doing so cause the ReadXml() method to produce the well-known error “The same table (Table name) cannot be the child table in two nested relations.”

The algorithm is improved so this type of XML can be loaded in .NET 2.0. It can load my test semistructured (also known as “tags and text”) document, as well as more mundane XML documents that caused this error in previous versions. The document:

<book>
<chapter>
<title>Testing your <noun>typewriter</noun></title>
<p>The quick brown <noun>fox</noun> jumps over
the lazy <noun>dog</noun></p>
</chapter>
</book> 

Produces 4 tables, named book, chapter, p, and noun. It even serializes out using WriteXml() the same way it looked when it was loaded, adding only an extra unused namespace declaration for xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance".

This will be a help for developers using this method of XML to relational mapping.

Monday, August 16, 2004 9:32:25 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, August 12, 2004

I was talking with Dan Sullivan this week about UDTs in SQL Server 2005 and he's made an interesting discovery. About a week ago, I'd happened to mention to Dan that UDTs and UDAggs (user-defined aggregates) are supposed to have a pattern of "deserialize-do action-serialize". He thought I'd misunderstood. Then he went off to write an improved UDT/UDAgg for his talk at Windev. Dan has an engineering background and is excited about using UDTs for data types like Linear Dimension (he wrote the LDim example in our book).

What Dan discovered in attempting to implement some initialization inside the constructor is that, inside SQL Server operations, the constructor is never called! His UDT used Format.UserDefined, which means he implements the IBinarySerialize interface to Read and Write the UDT state. The engine just allocates some bare memory for an instance with the IL “newobj” instruction, then calls IBinarySerialize.Read to fill it out. What this means is that any internal fields that you would initialize inside a constructor must also be initialized when IBinarySerialize.Read is called as well. Dan found this out the hard way, reference types he'd expected to be initialized in the constructor were null when Read is called.

He's made a common implementation pattern in his UDTs: always have an Initialize or Init method and call it both in the constructor (for "ordinary" or client usage) and in IBinarySerialize.Read (for use inside SQL Server 2005). A good thing to keep in mind if you're writing a UDT.

When I asked if this wasn't against the "rules", his response was that C# always calls the constructor after newobj, but it's not a .NET requirement. Niels Berglund mentioned that this behavior is also true in Format.Native too, but all its fields are value types, so they are all initialized to zero by the runtime.

Niels also mentioned an interesting pattern he observed by tracing through a UDAgg he'd written with Format.UserDefined.UDAggs implement an Init() method instead of a constructor so that instances can be reused, for example in GROUP BY statements. The pattern Niels observed was:

1. Init
2. Write (but Accumulate hasn't been called yet)
3. Read-Accumulate-Write (for each row processed)
4. Read-Terminate

This pattern had puzzled him at the time, but alongside Dan's UDT observations this makes sense. I'd originally thought that the Read-Accumulate-Write pattern was for parallel processing, but it always happens even on a single processor. At this point we're all thinking it might a hedge based on how thread scheduling works in SQL Server.

So don't forget Dan's "Initialize in Read or constructor" pattern when you write Format.UserDefined UDTs. I think I have the good fortune to work with some pretty observant folks.

Thursday, August 12, 2004 10:18:03 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

As Mark Fussell will tell you, one of the coolest things in System.Xml 2.0 is the static creator methods on XmlReader and XmlWriter. It made his top 10 list, albeit at number 10.

I've been looking into how the client side handles SQL Server 2005's XML data type and was bemoaning the fact that System.Data.SqlTypes.SqlXml had a CreateReader method that probably didn't set what I wanted set - ConformanceLevel.Fragment. Sounded like just the ticket for SQL Server 2005. Checked with Reflector, and lo and behold, they *do* use XmlReader.Create with ConformanceLevel.Fragment. Very cool, and the information was helpful to Kent Tegels on the beta newsgroup as well, just this morning. My research timing karma seems pretty good lately.

Now, I want an overload that looks like SqlXml.CreateReader(XmlReaderSettings). And something that handles sequences of atomic types. Maybe in .NET 2.0 beta2. Hey Mark... maybe I'll hit him up for it at WinDev.

Thursday, August 12, 2004 2:12:17 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, August 10, 2004

Came across something recently that I thought was odd (or a bug) when dealing with UDTs in SQL Server Management Studio.

If I have a UDT, say ComplexNumber, I deploy it to SQL Server 2005. Use it as a column in a UDT table, complextab. Insert a row:

INSERT complextab VALUES('1:1i')

Then attempt to SELECT all of the values:

SELECT * FROM complextab

I get the following error message:

An error occurred while executing batch. Error message is: File or assembly name 'ComplexNumber, Version=1.0.0.0, Culture=neutral, PublicKeyToken=17177e16a4b86577', or one of its dependencies, was not found.

Scratched my head and thought about it for a few minutes. Reason for this is that SQL Server Management Studio (SSMS) is just another client of SQL Server that uses ADO.NET. In order to use user-defined types with ADO.NET, you must deploy the assembly to the client. So putting the ComplexNumber assembly in the GAC or in the directory that contains SSMS fixes this "problem". It's not a problem, it’s by design. The SSMS will call ToString() ON THE CLIENT SIDE and display the value just fine.

Note that:
1. If you change it to: "SELECT complexcol.ToString() FROM complextab" it works because ToString is being called on the server, not on the client.
2. If you run the same command from SQLCMD it display the binary value of the ComplexNumber column. That's because it uses OLE DB to talk to SQL Server, not ADO.NET. Interestingly, this is what SQL Workbench in Beta 1 did also. Guess the folks at SSMS decided to nicely format it for Beta 2.

Tuesday, August 10, 2004 1:55:02 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 

I’m a person who always likes to know where things live in the OS, database, or whatever product I’m dealing with. Being able to point to a specific DLL or configuration file or registry entry gives me something tangible to hang my hat on, rather than think that things happen “by magic”.

 

So I was intrigued by the apparent disappearance of the network libraries in SQL Server 2005. Start with these points:

1. SQL Server 2005 B2 does not ship a new version of MDAC
2. SQL Server 2005 B2 does ship with SQL Native Client (codename was SNAC) with new OLE DB provider and ODBC driver. It also appears to cover network connectivity.  In the new client config utility, all (four) supported protocols list netlibs as “SQLNCLI.dll”. That's SQL Native Client.
3. System.Data.SqlClient does not use MDAC any more
4. System.Data.SqlClient did use DBNETLIB.dll to talk to SQL Server in the past

 

So does .NET SqlClient use SQL Native Client? Dan, Niels, and I did some experiments and it doesn’t appear that SqlClient calls SNAC at all. We even installed .NET 2.0 on a machine without SNAC and did tests that include MARS and async (these require new protocol features). Worked fine, no SQLNCLI or DBNETLIB loaded or in sight. So how’d they do that?

 

Looking in System.Data.dll, it appears that there are some calls that start with the letters SNI. There are no calls to DBNETLIB (as there was in System.Data.dll in .NET 1.1). A quick search of Books Online yielded only references to SNI associated with Service Broker protocol (as in “the SNI connection string for Service Broker...”).

 

So it appears that both System.Data.dll and SQLNCLI.dll implement SNI calls directly rather than calling a specific library. And that the SNI model is used by more than just client-server protocols. It is possibly used for everything that is covered by CREATE ENDPOINT. System.Data.SqlClient doesn’t use MDAC or SNAC, but both System.Data.SqlClient and SNAC are compatible with previous versions of SQL Server, because I could use the OLE DB Native Client provider with SQL Server 2000 just fine. Hmmm.

 

Stay tuned...

Tuesday, August 10, 2004 9:29:59 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Monday, August 09, 2004

I was writing slides and labs last week. I read Dare's discussion of “The Sex and Cash theory“ about doing what you think is cool vs doing what you do for cash (work) that may not be as “sexy“. Didn't feel so bad after that. My discussions with others revolved around the “official” words and terminolgy for various SQL Server 2005-related items.

SQL Server Management Studio - its too long to fit on a slide. The obvious acronym (SMS) is already “taken” and would be confusing. Then I saw SSMS as an acronym posted on a newsgroup. Cool acronym. Done.

Did you also notice the suffixes for SQL Server Management Studio projects? They absolutely win the aware for longest Windows file suffix ever. “MyProject.ssmssln is for the solution file“ and “MyProject.ssmssqlproj“ for the file that lists the subfiles in a project. Wow. Well the file associations work and I guess we've gone way past 8.3 names.

SQL Server's native XML data type - I refer to this everywhere as the “XML data type” . An XML aficionado pointed out that I am preempting  a term usually used to refer to XML types defined in an XML schema and that I should cease and desist. So I did a little research.

The SQL Server 2005 Books Online refers to it as “xml data type”, but with a lower-case “xml”. Michael Rys, the PM of the SQL Server XML group refers to it in his “XQuery From The Experts“ book chapter on an XML native data type in a SQL database as “XML datatype”. Upper case XML. Another database vendor refers to it as XMLType. Couldn't find it in the ANSI-SQL 2003 spec, part 14 at all (it's mainly about mapping). And the ANSI-SQL 2003 spec refers to types defined in an XML Schema as “XML Schema Data Types”, not XML data type at all.

“XML folks“ don't usually like to see the acronym XML in small letters and “SQL folks“ have a convention of capitalizing SQL keywords and data types, they might be offended if it is in lower case. XML is now a T-SQL keyword. Oh well...In the end, I decided to stick with I have. And now I use the term “XML Schema Data Types”. If you ever read anything on this subject that I've written, now you'll know what I mean.

Monday, August 09, 2004 10:11:34 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Sunday, August 01, 2004

I've been looking at the SMO (SQL Server Management Objects) model recently. Because the amount of data they retrieve is potentially large and potentially expensive to produce (say sp_spaceused on every database), they use most/all of the *data transfer optimization* techniques: both sparse and eager population, lazy instanciation, caching, deferred/chunky writeback, and locality of reference. Reasonable, documented defaults with progammable overrides.

What's this got to do with object-relational mapping? Well, nothing. And everything. These techniques are data optimization techniques. They are the same whether you talk about them in terms of SQL SELECT statements or objects. The fact that pieces of data in a database are mapped to properties of classes in an object model (data model mapping) is only peripherally related to data access optimization, at best.

Before I go forward, I'll state that I don't consider myself an O/R mapping (or XML-relational or XML-object) bigot on either side, either for it or against it. Really. But many folks, especially when they are bashing mapping, use the data transfer optimization techniques (along with another data access concept - locality of reference) to justify their arguments. That's like me blaming the inventor of the knife and fork for my current extra weight. Data optimization, badly implemented, will kill a data-centric object model. But you can accomplish the same thing with bare SQL statements and DataReaders/Rowsets.

For an easy to picture example, imagine paging through a large resultset in ASP.NET. You can:
1. Cache the entire resultset (eager population + caching)
2. Use TOP queries to SELECT on a per-page basis using a custom ASP.NET pager
3. Send the whole bunch to client for paging (eager population + locality of reference)
4. Fetch entire resultset each time and throw all but one-page away. (bad perf).

You can accomplish all this, including the bad design, with DataReaders only, using DataSets for your cache. Not an object-mapped-to-a-relation in sight. Another easy-to-picture example is the Windows file explorer, with plus signs next to each directory. Sparse population/lazy instanciation. The “show details“ setting: sparse property population choice.

Here's a few more quick examples, using ObjectSpaces (technology on-hold, but this was the latest implementation I'd studied) and SQL statements.

// baseline
ObjectReader<Customers> or = os.GetObjectReader<Customer>("");

SqlCommand cmd = new SqlCommand("select * from customers");
SqlDataReader rdr = cmd.ExecuteReader();

// eager population
ObjectReader<Customers> or = os.GetObjectReader<Customer>("", "Orders, Details");

SqlCommand cmd = new SqlCommand(
 "select a.*, b.*, c.* from customers a
  inner join orders b on a.custid = b.custid
  inner join details c on b.orderid = c.orderid");
SqlDataReader rdr = cmd.ExecuteReader();

// sparse population
// customers contains a customer photo
// sparse population in mapping file, can be overriden on query
ObjectReader<Customers> or = os.GetObjectReader<Customer>("");

// leave customer photo out of the SELECT
SqlCommand cmd = new SqlCommand("select customerid, customer_name... from customers");
SqlDataReader rdr = cmd.ExecuteReader();

SO...is mapping among data models a bad idea, or is not knowing/designing for/using the data optimization techniques a bad idea? Mapping objects to data *badly* (or having a model so narrow there are no choices) is just as bad as writing bad SELECTs. You have to know where your data lives, how expensive it is to produce it, and how you are going to use it in ANY case. O/R mapping DOESN'T ABSOLVE YOU from that responsibility. But bad designs are the result of uninformed designers, not a specific technology. Thinking that a specific technology is a panacea and now you don't have to consider design, is also dangerous.

Sunday, August 01, 2004 1:23:35 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

There have been a number of questions recently (well OK, three) on the beta newsgroups about SqlDependency problems. It's been suggested that SqlDependency doesn't work in Whidbey Beta 1.

The reason for this is that SqlDependency (and, of course, its lower-level cousin SqlNotificationRequest) requires a “valid” notify-able SQL query to work. What's a “valid”  notify-able query for a NotificationRequest? Subscriptions for query notification in SQL Server 2005 use the same underlying mechanism to be notified of resultset changes as indexed views do. Therefore the rules are the same as the rules for indexed views.

Since the example in our “First Look at SQL Server 2005 for Developers” book also uses an invalid query (it doesn't use a two-part table name), I posted the rules for indexed views (and therefore for SqlNotificationRequest/SqlDependency on the book website, under “changes since the book shipped/chapter 12”. It seemed that you could “get away with” not always following the rules until SQL Server 2005 Beta2. The list of rules is from the SQL Server 2005 Beta 2 Books Online.

By the way, if you submit an invalid SELECT statement with a SqlDependency, you'll get an immediate notification with the reason (in SqlNotificationEventArgs) “invalid”.

Sunday, August 01, 2004 8:24:38 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

Theme design by Jelle Druyts

Pick a theme: