Wednesday, July 21, 2004

In recent posts I'd mentioned mapping a few times, it's time to get back and explore it in earnest. There appears to be three major data models in use by programmers today:

  • Relational - that's where the majority of the corporate data is stored. SQL and its product-specific dialects is the main programming language that follows this model.
  • Objects - really defined as state and behavior, the state being the data, the behavior being what you use to manipulate that data. Object-oriented databases peaked and waned in the 90s, but object-oriented programming languages are what's used by the majority of programmers today
  • XML - XML has accomodations for markup and data, a schema language and query languages. XML-specific databases are on the rise, albeit slowly. Instead, most/all mainstream relational database vendors are adding accomodations for XML data in their databases today (ANSI-SQL 2003), just as they added accomodations for objects in the 90s (ANSI-SQL 1999).

If you program using more than one of these models, it becomes necessary to map between these models. For example, object-oriented programmers are never satisfied with my answer when asking about representing data as objects: “the familiar connection-command-rowset paradigm IS an object-oriented implementation of data access“. They don't want rowsets and rowset metadata; they want business objects. They are just as unlikely to be happy with the answer that the XML DOM or streaming apis can be implemented using an object model. So mapping (or dealing with the reality that the "other side" of the protocol pipe may be using mapping) is a necessity if you can't mandate what all of the users of your data will do. When you speak of object-oriented programmers using web services (which uses XML as a marshaling format and for just about everything else) you're even talking mapping for data marshaling. Whether you stick with composition and simple types or go all out to make relational or XML represent complex types, it's just a different refinement of the same mapping. So we're talking two way:

Relational to object mapping
Relational to XML mapping
XML to object mapping

Currently, there are Microsoft implementations of mapping in the .NET libraries in System.Xml.Serialization and the remoting SoapSerializer. The SOAP serializer is based on mapping type systems to XML based on a pre-XML schema attempt in SOAP 1.0 section 6. With the completion of the XML Schema spec, fewer people each day remember that SOAP originally stood for Simple Object Access Protocol. It's acronym status has even been revoked. The schema-centric model (document-literal format in WSDL) is what most modern web services use.

The experimental COmega language made this mapping an implicit part of the programming language, consuming metadata with two utilities SQL2COmega and XML2COmega, and making SELECT a language keyword. This be an interesting approach with a future.

I was intrigued by the inclusion of a three-part mapping schema format that made a brief appearance in .NET 2.0 alpha 1, and seems to have faded off into the future. The format was based on the presumption that you had existed XML schemas, relational databases and object models and you didn't want them to change. Current technologies like SQLXML mapping schemas in SQL Server's SQLXML use annotated schemas, with annotations that mapping XML elements and attributes to table columns or special formats like UpdateGrams and DiffGrams. This meant you had to annotate the schemas. The mapping format left XSD schemas alone, but introduced XML representations of relational schemas (known as RSD format) and object schemas (known as OSD format). Mapping any type to any other was accomplished via a mapping schema format (known as MSD) that could represent constructs outside the boundary of the "original" data model. It was also flexible in that it did not mandate a one-to-one mapping, implemented declarative mapping (for offline code generation) and could expose multiple overlapping mappings over the same data.

This mapping format made its appearance in two guises: System.Data.SqlXml (XML/relational) and ObjectSpaces (object/relational). ObjectSpaces has been postponed until "the next release" of .NET, .NET 2.0 SqlXml mapping has also been pushed off into the future. I couldn't even find a current link to this work on MSDN. System.Xml.Serialization didn't choose to implement this mapping model, keeping .NET attributes as its model to acheive mapping. So it will be intresting to see what (if anything) becomes of this model.

Bottom line is, unless you develop in a cloistered environment where you can safely ignore other data models, you're gonna have to map some data. Embedded SQL-92 development is such an environment, so is using XML web services with the native XML apis, simple types, and query languages only. The rest of everyone will have to use toolkits or roll their own mapping for now. And stay away from complicated domain-specific constructs, because the other side may be using a different domain model.

Wednesday, July 21, 2004 3:59:43 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Friday, July 16, 2004

Sometimes you have to hit me over the head to make me aware that something's changed. I've been working with Visual Studio 2005 beta 1 for about 10 days now, and just noticed that creating a SQLCLR project (that's [Language of Your Choice]/Database/SQL Server Project) no longer includes a reference to Microsoft.VisualStudio.DataTools.SqlAttributes.dll.

This was the DLL that included deployment attributes for Visual Studio to use when it auto-deployed its SQLCLR assemblies to SQL Server. I only realized this when I saw the [SqlFunction] attribute decorating my new user-defined function, created by Add/NewItem. No more explaining the difference between [SqlFunc] and [SqlFunction]. Hooray! Besides combining those two attributes, all of the attributes are now in System.Data.dll in the System.Data.Sql namespace. And there's a new one (knew about that), SqlFacet.

There are some other interesting things which bear mentioning:
1. The SQLCLR classes generated with add/new item are now partial classes in C#. There's a nice MSDN article that talks about what partial classes are.
2. The security level of the assembly (SAFE/EXTERNAL_ACCESS/UNSAFE) is now a setting in project properties. No more hardcoded EXTERNAL_ACCESS.
3. "Add References" produces a dialog with a subset of base CLR assemblies; they look to be only the CLR assemblies that are "allowed to load" in SQL Server. All other assemblies must be referenced through the "Project" tab, if you want them.

The only drawback to this is that it requires that I write a few more entries in the
"SQL Server 2005 changes since book publication (by chapter)" pages on the "First Look At SQL Server 2005 for Developers" website. Fell off the bleeding edge again, geez. But the degree of improvement in the changes make it worth it.

 

Friday, July 16, 2004 3:16:25 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

Since you've probably seen this on every other blog in the world already, I'm not going to post that the language that was known as X# and Xen has now shipped as COmega. It's not a commercial product, but available at Microsoft Research. (Well, I guess I did post about it then, didn't I?).

I've heard so much about this language that I had to download it, install it and play with it. It installs the compiler, two tools, SQL2COmega and XSD2COmega, some samples, and Visual Studio template projects. The tools' purpose is to create COmega classes to be used with the XML data and SQL Server data of your choice. Very cool. This language is at least partially (if not mostly) about mapping between data models (Classes<->XML and Classes<->relational databases), a theme I keeping coming back to.

In keeping with the concept of rite of passage programs in the last post, I had to write my "select * from authors" program. BTW, it is also required that you write these in the middle of the night, preferably after 2am. Yes, I know COmega comes with a Northwind Sample project, but it *has* to be authors, OK? I already noticed that COmega installs a set of Visual Studio templates and a "New Project" got me a new COmega console app. Let's try that out...

You can run SQL2COmega from inside Visual Studio by clicking "add reference". This creates a library and adds a reference to it in your program. First thing to notice is that you're better off naming columns (not even sure if you can do "select * from...") because the "anonymous structure" classes you use are strongly typed. The other oddity the "select" statement is part of the language, so the table must be DB.[TableName], the compiler is comiling this as well. Reminiscent of embedded SQL, if I say so myself.

Well, it works. Cool. I'll leave it up to Mark to write "99 bottles of beer". It should look amazingly similar to his XQuery version.

One wish for COmega is to have a .NET 2.0/Visual Studio 2005 B1 compatible version. It may run under .NET 2.0, but I haven't tried this because the installer complains about not having Visual Studio 2003 installed. It isn't installed on my "new technologies" partition. And just maybe... that it might be a "non-research project" some day.

Here's the program. Next step (with any new database API, especially mapping technologies) is to run this process with SQL Profiler on. And, of course, to do this without Visual Studio, and figure out if its doing any "magic". Didn't seem to be any. And to reflect upon SQL2COmega and XSD2COmega to figure out how they're doing mapping.

using System;
using System.Data.SqlTypes;  // for SqlString
using pubs;                  // for referenced library from SQL2COmega

public class Test {

    static void Main() {

      // au is an "anonymous structure"
      // select is a built-in language keyword
      // DB is a property of Database class
      // generated by SQL2COmega
      // NOT a database schema name
      struct{SqlString au_id; SqlString au_lname;}* au
          = select au_id, au_lname from DB.authors;

      foreach( row in au )
      {
         Console.WriteLine("{0} {1}", row.au_id, row.au_lname);
      }
  }
}

Friday, July 16, 2004 2:46:23 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 

When you get a brand new tool, programming language, or API, the first thing most programmers use it for is to write their favorite "rite of passage" program. This is an example that is so simple, it's only purpose is to illustrate the that compiler or tools are installed and working correctly and that "the system knows about them". One of my old friends would call them "the canonical examples".

Two of these that I always revert back to are "Hello, World" and "Add two numbers together". Hello World is the first thing that you write with any new programming language. Popularized by K&R, perhaps, maybe before. Add two numbers together is the starter distributed technologies example, the "original" DCE/RPC came with this one, on the premise that, if you had to add two numbers together, it would be quicker to do it on the Cray Supercomputer across the world than to entrust it to the wimpy little CPU on your workstation. Another such program I'd just heard of (so I'm culturally deprived), via a Mark Fussell blog posting is the "99 bottles of beer" program, especially useful for recusrive languages. And of course, my habit of writing a new data provider when the model changes.

Being a database programmer, when using SQL Server, the rite of passage program is to do what amounts to "select * from authors" in the pubs sample database. I was saddened to hear that the pubs sample database wouldn't ship by default with SQL Server 2005, but already had my "instpubs.sql" script stashed away for a rainy day.

Here's my three favorite "rite of passage" programs, written as SQLCLR stored procedures/user-defined functions.

[SqlProcedure]
public static void SayHello()
{   // the 'H'  and 'W' must be capitalized. Exclamation point is required.
    SqlContext.GetPipe().Send("Hello World!");
}

[SqlFunction]
public static SqlInt32 AddTwo(SqlInt32 x, SqlInt32 y)
{
    return x + y;
}

[SqlProcedure]
public static void GetAuthors()
{
   SqlCommand cmd = SqlContext.GetCommand();
   cmd.CommandText = "select * from authors";
   SqlContext.GetPipe().Execute(cmd);
}

and the T-SQL to invoke them:

USE pubs
GO

EXECUTE SayHello
GO

DECLARE @answer int
SET @answer = dbo.AddTwo(2,3)
PRINT @answer
GO

EXECUTE GetAuthors
GO

Friday, July 16, 2004 2:28:24 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, July 14, 2004

About a month ago, there was another chapter in the long discussion about DataSets and web services. An old cohort Scott Hanselman fired off the first salvo, which was taken up by Ted Neward. Doug Purdy responds that Purchase Orders are root of all evil and lots of other chimed in. I'm just catching up...

This comes up again as I work with SQL Server 2005 web services. They too produce “SqlRowset == Schema + Any" in WSDL. Both XML and object afficianados tend to dislike this, but it exactly reflects the relational resultset, rowset, whatever you'd like to call it.

Relational results are columns and rows, where each row contains the same number of columns and each column contains the same data type for each row. Been around for a while now. The resultset is a "generic container class" or special 2-dimensional array if you want to think in those terms, in which each instance can contain different types, but the shape is rectangular. Resultsets, looking at an Open Data Services program which makes the protocol explicit, are preceeded by a sequence of describe packets that contain the name, data type, and other information about each column. So "Schema + Any" means exactly that. *Instance* schema, followed by the instance it describes. Corresponds quite nicely with relational databases, where the majority of corporate data is stored.

The problems stem from the fact that lots of web services toolkits, that are doing XML-to-Object mapping, don't recognize this paradigm. They are looking for an individual (static) schema, not an instance (dynamic) schema. In addition, the XML Schema spec states that how you find a schema from an instance of an XML document is implementation dependent. It mentions a hint (xsi:schemaLocation) and some location strategies for schemas with namespaces that are URLs, but says processors don't have to support the hint. Most WSDL that uses "Schema + Any" doesn't provide the hint. With Microsoft XML processors, this may be because "Schema + Any" is part of the implementation, an "implementation dependent" location strategy.

There's a few ways to resolve this, if you're looking for strongly typed objects at compile time. One way is strongly-typed DataSets, which fetch out the information at compile time, make a DataSet wrapper class, and, for web services, write WSDL that amounts to "Any from a specific namespace", and import the namespace schema, which includes the strong type. Another way is to write your own WSDL for each resultset.

If you write a stored procedure in SQL Server that looks like this:

CREATE PROCEDURE getauthors
AS
SELECT au_id, au_fname, au_lname FROM authors

you know exactly what that resultset will look like. Unless the DBA changes the column type, in which case you'll have other problems. Even if you use the "SELECT * FROM ..." concept (which is not usually recommended) this works unless the DBA adds or deletes a column, or changes the order of definition. I haven't been able to find a metadata table in SQL Server (or in the ANSI/SQL spec) that lists the shape of all resultsets returned by stored procedures (there is one for *parameter* information) , so you have to use your knowledge of the resultset to handcraft the WSDL. In ADO.NET, you can use the DataSet's "WriteXmlSchema" method as a starter. Or use "SET FMTONLY ON;SET NO_BROWSETABLE ON;EXECUTE...". SQL Server 2005 web services that map stored procedure invocation to web service methods will let you specify a custom WSDL-generating stored procedure. Statically store the metadata (instance schemas) in the database or dynamic generate them at WSDL generation time.

BTW, this mapping of resultsets to XML schemas amounts to relational-to-XML mapping. So web service toolkits can accomplish their XML-to-object mapping. I'll return to this theme in future...

Wednesday, July 14, 2004 12:35:10 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Tuesday, July 13, 2004

I collegue of mine is studying SQL Server Analysis Services and asks about APIs for it. I have the unique (wierd?) perspective of having come at OLAP through the APIs: OLE DB for OLAP, ADOMD, and XMLA. However, he's interrested in .NET APIs and asks: “Can the OleDb data provider support the MSOLAP OLE DB provider”? My answer is “not exactly” with the qualification that the OleDb data provider doesn't support the OLE DB Dataset. Thought I've have to explain that one.

Back when I was teaching OLE DB in earnest, I took it upon myself to learn every one of the 60-some-odd OLE DB interfaces. Could tell you the difference between IRowsetView and IViewRowset. A varitable fountain of now-mostly-obsolete (unless you use linked servers or maintain OLE DB code) esoteric information. OLE DB for OLAP (and later OLE DB for Data Mining) added more interfaces and OLE DB for OLAP even added a new cotype called the Dataset. I taught a class to OLE DB for OLAP provider writers and learned these too. The OLE DB for OLAP Dataset is unrelated to the ADO.NET DataSet.

The OLE DB for OLAP Dataset is a refinement to the OLE DB Rowset, adding multidimensional features like multiple values per cell. This Dataset (lower-case 's') is what's not supported in ADO.NET (it would be a specialization of DataReader in the model, sort of).

Most people programmed ADO MD (ADO for multidimensional) instead. It encapsulated the OLE DB for OLAP Dataset in a class called Cellset. AND there is a recent release that supports .NET, called ADOMD for .NET. Cool.

After going through Resultset, Recordset, Rowset, DataReader and others, there's just a limit to how many new names you can come up with for “the object that encapsulates columns and rows”. Even when they are cells. Terminology reuse is inevitable, but confusing.

Glad we got THAT out of the way.

Tuesday, July 13, 2004 8:21:54 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Thursday, July 08, 2004

After writing DevelopMentor's Essential OLE DB class and teaching it to provider writers (and detail-oriented consumer writers) for a few years, I always thought that a good way to get myself acquainted with the new data model was to write a new database client provider/driver/etc whenever the model changed. I wrote a simple .NET data provider during .NET alpha 1 and published an article in MSDN magazine about it in Dec 2001. Well, the model changed again.

ADO.NET 2.0 enhanced the data provider model in some pretty revolutionary ways. It added something that was always needed, a factory for providers. This class (System.Data.Common.ProviderFactories) gets a list of providers or a provider-specific factory class (e.g. SqlClientFactory). The provider-specific factory can be used to get provider-specific Connection, Command, etc, instances. Providers that want to participate now have a special place in the config file. The 4 Microsoft providers are listed in machine.config.

In addition to this change, the new provider model is based on base classes rather than interfaces (although interfaces are still there for backward compatibility). Provider writers implement classes that derive from DbConnection, DbCommand, etc. This allows adding new functionality without breaking backward compatibility.

So I've updated the provider and posted the update on my website. The updated provider uses base classes and also has a Provider Factory class. There's also a sample program that uses it. I didn't put the provider in machine.config, but have an app.config that adds it.

An overview of the new model is provided in this MSDN online article, and two more are in the works. Look for them in MSDN online.

Enjoy.

Thursday, July 08, 2004 7:30:55 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Tuesday, July 06, 2004

If you've recently installed the SQL Express version of SQL Server 2005, you may or may not have noticed SQL Native Client (SNAC). If you want to use the new features of SQL Server 2005 like multiple active resultsets or snapshot isolation from OLE DB, ADO, or ODBC, you're going to need SNAC.

SNAC is meant to separate SQL Server's OLE DB provider and ODBC driver from MDAC. Currently the SQL Server OLE DB provider (SQLOLEDB) and ODBC driver ship as part of MDAC (Microsoft Data Access). A new version of Microsoft Data Access (MDAC 9.0) was planned in order to support the new SQL Server 2005 functionality. MDAC 9.0 won't happen. Instead, the current components of MDAC (2.8 SP1?) will ship as part of the Windows family of operating systems. MDAC will change infrequently, changes will not be tied to new SQL Server functionality. SNAC will be versioned with new versions of SQL Server and ship separately.

Once you've installed SNAC, you should see a new OLE DB provider (SQL Native Client) and a new ODBC driver (SQL Native Client). You need to use them instead of the older versions to get the SQL Server 2005-specific functionality. Note that they need to be coded into the connection string (OLE DB, ADO, or ODBC) or ODBC DSN. The SQLNCLI provider does not support XML like the SQLOLEDB provider does.

If you're using ADO.NET's SqlClient data provider, it has been rearchitected so it doesn't use MDAC. Or SNAC, as far as I know. If you've used SqlClient with SQL Server 2005 beta 1, you won't need "USE MDAC9" in connection strings as MDAC 9.0 isn't shipping with .NET 2.0 beta 1/SQL Server 2005 beta 2. SNAC is provided for vendors that still use OLE DB, ADO, and ODBC in the products. Nice feature. I have a soft spot in my heart for OLE DB (having taught provider writers for a few year), and having MDAC as a part of the OS is overdue. Not shipping MDAC for SQL Server changes will be a good thing.

Tuesday, July 06, 2004 8:23:43 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Sunday, July 04, 2004

I was checking some of the SQL Server 2005 features on the way over on the plane. The latest build that I have is SQL Express build, fairly recent. Many of the XQuery features have been updated to Nov 2003 XQuery spec. Very cool. But comments wasn't one of them. A nit, I know.

The {-- a comment --} style comments still work when using SQL Server XQuery but the new style, smiley-face comments (: a comment :) don't. They generate a syntax error.

Now I *liked* the old style comments (don't like to see my code grin at me, call me sensitive), but the smiley-face ones are part of the spec revision, unless I'm missing something. And they are supported in the client-side System.Data.SqlXml's XQueryCommand dialect of XQuery. That's even more puzzling because the two dialects (SQL Server and System.Data.SqlXml) are synchronized in other ways. For example, System.Data.SqlXml no longer supports the XQuery “LET” verb and has compile-time strong type checking like SQL Server.

I lag in reporting things I find, because I know beta 2 is still a work in progress. Thought it would be added. All of the book examples have the smiley face comments, just do “find and replace” if you run 'em. Beta 2 still isn't out yet, maybe... :)

Sunday, July 04, 2004 2:21:29 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 

The first blog entry got me thinking. For someone supposedly cloistered writing a book, I have been doing some heavy traveling. For the last year, I've been teaching Yukon beta 1 quite a bit. Many students were participants in the Ascend program phase 1. They are folks from software companies who write application and system software that runs on SQL Server. They came to the class, Yukon for Developers, to see what all the buzz was about, not only with T-SQL, but with revolutionary features like .NET procedural code support, user-defined types, XML and XQuery, service broker, and others. They were joined by support people and folks from MCS (Microsoft Consulting Services).

It was great fun. I met and hung out with the Ascend guys: Andy Gammuto, Don Petersen, and Roger Doherty. I actually haven't met Roger yet, he's just a voice on the phone so far. We took the show to such faraway places as Sydney and Munich; Niels did a gig in Nice.

The best part was seeing what folks thought of the new features. I've always met some people using XML in databases on real-world problems in phase 1. They enjoyed trying out the XQuery support. Phase 2 promises to be just as exciting.

So I guess that's not exactly hiding. More like “where in the world is Bob”. I've been around the block a few times. Might even qualify for 1k airplane miles this year. But that's another story.

Sunday, July 04, 2004 2:20:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, July 01, 2004

I've finally decided to take the plunge and start this up. I've been in hiding for the last year and three quarters or so, working on a SQL Server 2005 (was Yukon) course and a companion book. The book, entitled “A First Look at SQL Server 2005 for Developers” shipped on Monday. I have a single advance copy. Amazon doesn't think it's shipped yet though, has the book's old name (with Yukon in it) and is missing an author (oh well).

Dan Sullivan and Niels Berglund are my co-authors. The book has been a long time in coming. The SQL Server folks were nice enough to let us release it (the material is based on beta 2) even though Beta 2 isn't out yet. Modulo SQL Express, which was released yesterday on MSDN. So I have a (very) few spare cycles.

I'm going to write about adventures in Ascend program phase 2, and any technical insights and info I pick up along the way. I may not be writing much right away though, I'm on my way to Tokyo for a teaching gig tomorrow morning.

Thursday, July 01, 2004 7:03:21 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 

Theme design by Jelle Druyts

Pick a theme: