This post is an example of the "bring your own problem" approach in the classroom, inspired by upcoming SQLskills Immersion Events in August. See previous posting for the reason behind the blog post title.

SQL Server is a relational database that supports XML. But just because your data is received in XML format doesn't necessarily mean you should store it that way for querying. And if you intend to do any serious querying of your XML, be prepared to use at least a primary XML index.

I'd had students who planned to use XML in SQL Server for a big application with lots of queries (in the hundreds of thousands per day) and lots of inserts. Think of your typical OLTP system does with XML documents instead of multi-table joins. A single document might contain information from 4-5 or more relational tables. Is it a good idea to store what's usually relational data as XML?

XML is stored in SQL Server in a propriatary binary format. SQL Server's XQuery is not standalone, but always combined with SQL statements. To be useable in these statements the XML is decomposed (or the part of its that's needed to satisfy the XML predicate is) into relatonal in-memory structures. See the TVF steps XmlReader or XmlReader with XPath Filter iterators in the query plan. That's more-or-less the structure that is persisted on disk (as an internal table) when you create XML indexes. The primary XML index and secondary XML indexes are simply clustered and nonclustered indexes over the internal table. With these indexes SQL Server has a reasonably fast implimentation of XQuery, for a relational database, because its using the relational engine. But "raw" XML has no statistics on the element and attribute values to help the relational engine select a best plan. You'll see huge costs for this decomposition in SQL query plans, but bear in mind that the costs may be overestimated because of overestimating of cardinality, even with schema-valid XML.

So you can do partial decomposition once-or-more per-query or decompose once (during insert of a row with an XML column) and query the structure using the XML index many times. Either way, you're using relational queries… but the overhead of the decomposition is "non-trivial". And the table structure is non-trivial, taking up between 2-6 times the space of the original XML. 

If you want to store XML but need to query only one or two element or attribute values, consider persisted computed columns, based on XQuery. You can even persist a subset of the XML document into a separate table (think instead-of trigger) and query that, to cut down on the size of the XML index(es). Remember though, you will be doing one or more XML queries against non-indexed data with each insert into the base table.

The upshot of all this is, if you have data that can be modeled as relational, use relational tables. If you'd thought of using XML columns and queries because it is a ubiquitous format that everyone uses (the "its cool" factor), well…you'll thank me later. And if you do go with XQuery, don't think "I won't use XML indexes, they are too big". Use 'em…and/or persisted computed columns.

Does everybody get that?