Choosing Paths for Selective XML Index – Part 1

About a month ago, I wrote a series of blog entries on the Selective XML Index introduced in SQL Server 2008 SP1. Just to summarize before I start up again, the Selective XML Index (I’ve abbreviate as SXI) is a feature introduced to allow indexing of only certain paths in an XML column. It uses side-tables using sparse column technology (that was introduced in SQL Server 2008) to effectively index pieces of XML documents while keeping the size of the index relatively low. It is unrelated to the “original” XML index technology (CREATE PRIMARY XML INDEX, etc) that creates large indexes (2-5 times size of the original documents) but indexes every element, attribute, and text node. Either XML indexing technology may be used independently or both may be used together. In addition to the (primary) SXI, you can have “secondary” SXIs, which are simply nonclustered indexes over specific columns in the side table.

Since I wrote the original series, Books Online topics for SXI have appeared. So I thought I’d work through a strategy for deciding to define these indexes and show how these indexes affect query plans. As implied by the name of the feature, since you index specific paths, deciding which paths to index is key. Each path takes up space in the side-table. In addition, you can specify “keywords” on the paths, such as XQuery data type, SQL data type, and whether this path is a singleton. I’ve also found four new Extended events to help in my decision. These events fire when you have a selective XML index in general, but the path may be missed or specified incorrectly. Namely,

selective_xml_index_no_compatible_sql_type – Occurs when a value() method is used in a query, specifying a sql type that differs from the type specified during selective XML index creation for that path.

selective_xml_index_no_compatible_xsd_types – Occurs when the user specifies a different resulting XSD type in a query, than the one that was used to promote the path during selective XML index creation.

selective_xml_index_path_not_indexed – Occurs when the path given in the user query is not promoted in the selective XML index, and thus the index is not used.

selective_xml_index_path_not_supported – Occurs when selective XML index is not used for a XML query due to the user using an unsupported path type.

I’ve created and started an XEvent session with these events, also including the text of the SQL Server statement involved. So let’s start with a simple example. After enabling SXI on the database (see my original posting), I took this one from the BOL and define the table as:

CREATE TABLE tbl (  id int identity primary key,  xmlcol XML );

Note that xmlcol is untyped XML, meaning we don’t have an associated XML Schema Collection. We’re looking to optimize, as an exemplar query:
SELECT FROM tbl T WHERE T.xmlcol.exist(‘/a/b[c = “43”]’) = 1;

And add a single row that will return true:
INSERT T values(‘<a><b><c>43</c></b></a>’);

Let’s try this query first without any index. The query plan is supplied as a file (because it’s fairly large), but consists of:

1. Clustered index (CI) scan of base table
2. Two “XML Reader with XPath Filter” steps (to select the nodes we want)    a. One step for the path outside the XPath predicate (a/b)    b. One step for the path outside the XPath predicate (c as child of a/b)
3. One “XML Reader” to get the value of text node c.

For such a simple query against a 1-row table, the estimated subtree cost is over 1. (1.09117). We can do better.

So let’s start by trying BOL suggested SXI:

CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’,
path124 =  ‘/a/b/c’

Unfortunately, this returns an error: Msg 6394, Level 16, State 1, Line 1 Found non leaf element indexed for the value with path ‘simple_sxi’ for selective XML index ‘path123’.

It’s “upset” because I have the path /a/b. It won’t index any path that not a “terminating path”, i.e. that has other XML underneath (in this case the “c” element). This is interesting for a few reasons. First off, I can’t index “mixed content” with the SXI. Mixed content in XML is content which contains both tags and text, for example an HTML page. Here’s a simple example.

— The “value” of node b contains both text and tags (the <i></i> tags).
INSERT tbl values(‘<a><b>This is some text. This word is <i>italicized</i>.</b></a>’);

Since storing/querying mixed content (e.g. WordXML) is a common use case, the fact that it won’t work with XSI is good to know. But, in this case, I just want to look for the existance of the /a/b node. BTW, existance in this case isn’t related to the “XML exist” method, but to my filter in my XML Reader With XPath Filter step. Is there a way to do this? Turns out, this is exactly what the “node()” specifier is for. So let’s try that.

CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’ as XQUERY ‘node()’,
path124 =  ‘/a/b/c’

No error on creating the SXI now. That’s good. And how about the query plan? Ahh…that’s much better.

1. Clustered index scan on the SXI side-table. Filter on value 43.
2. Clustered index seek on side-table for primary key of base table.
3. Clustered index seek to join to base table.

Query cost: 0.0098608. That’s a little better.

…And this post is getting a little long. So I’ll continue on with further investigations in the next post. But what about the XEvents? We haven’t seen any events yet. When we don’t have an SXI, it doesn’t suggest “missing paths”. And the SXI we did use contained all the appropriate paths, so there’s no XEvents on that query. So we’ll wait on those for now.


Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.