On mapping between data models

In recent posts I'd mentioned mapping a few times, it's time to get back and explore it in earnest. There appears to be three major data models in use by programmers today:

  • Relational – that's where the majority of the corporate data is stored. SQL and its product-specific dialects is the main programming language that follows this model.
  • Objects – really defined as state and behavior, the state being the data, the behavior being what you use to manipulate that data. Object-oriented databases peaked and waned in the 90s, but object-oriented programming languages are what's used by the majority of programmers today
  • XML – XML has accomodations for markup and data, a schema language and query languages. XML-specific databases are on the rise, albeit slowly. Instead, most/all mainstream relational database vendors are adding accomodations for XML data in their databases today (ANSI-SQL 2003), just as they added accomodations for objects in the 90s (ANSI-SQL 1999).

If you program using more than one of these models, it becomes necessary to map between these models. For example, object-oriented programmers are never satisfied with my answer when asking about representing data as objects: “the familiar connection-command-rowset paradigm IS an object-oriented implementation of data access“. They don't want rowsets and rowset metadata; they want business objects. They are just as unlikely to be happy with the answer that the XML DOM or streaming apis can be implemented using an object model. So mapping (or dealing with the reality that the "other side" of the protocol pipe may be using mapping) is a necessity if you can't mandate what all of the users of your data will do. When you speak of object-oriented programmers using web services (which uses XML as a marshaling format and for just about everything else) you're even talking mapping for data marshaling. Whether you stick with composition and simple types or go all out to make relational or XML represent complex types, it's just a different refinement of the same mapping. So we're talking two way:

Relational to object mapping
Relational to XML mapping
XML to object mapping

Currently, there are Microsoft implementations of mapping in the .NET libraries in System.Xml.Serialization and the remoting SoapSerializer. The SOAP serializer is based on mapping type systems to XML based on a pre-XML schema attempt in SOAP 1.0 section 6. With the completion of the XML Schema spec, fewer people each day remember that SOAP originally stood for Simple Object Access Protocol. It's acronym status has even been revoked. The schema-centric model (document-literal format in WSDL) is what most modern web services use.

The experimental COmega language made this mapping an implicit part of the programming language, consuming metadata with two utilities SQL2COmega and XML2COmega, and making SELECT a language keyword. This be an interesting approach with a future.

I was intrigued by the inclusion of a three-part mapping schema format that made a brief appearance in .NET 2.0 alpha 1, and seems to have faded off into the future. The format was based on the presumption that you had existed XML schemas, relational databases and object models and you didn't want them to change. Current technologies like SQLXML mapping schemas in SQL Server's SQLXML use annotated schemas, with annotations that mapping XML elements and attributes to table columns or special formats like UpdateGrams and DiffGrams. This meant you had to annotate the schemas. The mapping format left XSD schemas alone, but introduced XML representations of relational schemas (known as RSD format) and object schemas (known as OSD format). Mapping any type to any other was accomplished via a mapping schema format (known as MSD) that could represent constructs outside the boundary of the "original" data model. It was also flexible in that it did not mandate a one-to-one mapping, implemented declarative mapping (for offline code generation) and could expose multiple overlapping mappings over the same data.

This mapping format made its appearance in two guises: System.Data.SqlXml (XML/relational) and ObjectSpaces (object/relational). ObjectSpaces has been postponed until "the next release" of .NET, .NET 2.0 SqlXml mapping has also been pushed off into the future. I couldn't even find a current link to this work on MSDN. System.Xml.Serialization didn't choose to implement this mapping model, keeping .NET attributes as its model to acheive mapping. So it will be intresting to see what (if anything) becomes of this model.

Bottom line is, unless you develop in a cloistered environment where you can safely ignore other data models, you're gonna have to map some data. Embedded SQL-92 development is such an environment, so is using XML web services with the native XML apis, simple types, and query languages only. The rest of everyone will have to use toolkits or roll their own mapping for now. And stay away from complicated domain-specific constructs, because the other side may be using a different domain model.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.