Selective XML Index – Implementation Details

A while after I posted part 1, there was a twitter message from Aaron (@AaronBertrand), a well-known "smart person" in the SQL Server space, about using feature packs as "ship vehicles", especially when they (possibly) have the effect of requiring keeping all instances in-sync at the service pack level to keep things working and possibly even making SP feature relevent during a restore. I answered that "it must be a compelling feature". And then we reminisced about features like "vardecimal" and "15k partitions". So I'll discuss why it is a compelling feature (or at least why I happen to think it is) later on in the process.

But for now, about how it's implemented…

Before this even. You can add or remove path specs on a Selective XML Index. I found this one by "reading into an error message" and trying syntax until it worked.

— add a pathspec
alter index fooidx on foo
for
(
add another_pathspec = 'foo/bar2'
);

–remove the pathspec we just added
alter index fooidx on foo
for
(
remove another_pathspec
);

As Micheal said, the selective XML index is implemented using sparse columns. The first column (or columns) is the primary key of the base table, which they always named pk1…n. Then a row_id column, which is incremented if there are multiple matching nodes in the same row.

Each pathspec consists of at least two columns: a path column and 1-2 value column(s). Path column is named path_[n]_id (which is always varbinary(900) and sparse).

Value column is named either "[pathspec_name]_[n]_value" or "[pathspec_name]_[n]_sql_value" (for SQL type pathspec). If the pathspec contains a wildcard, there is an additional column named path_[n]_hid (varbinary(900)) where HID stands for hierarchyid (i.e. the path in the document). A pathspec with an XPath/XQuery wildcard would look like this: '/foo/*/baz2'.

As an aside, remember that the SQL Server XML data type (node paths) and hierarchyid data type use the same encoding scheme, known as Ordpath, see my old blog entrry "Ordpath, ordpath, everywhere".

All of the columns (except pk1 and row_id) are sparse. As far as data types (because the nice thing is that SPARSE columns can preserve strong typing) as typed as:
 Type 1: xs:untypedAtomic = varbinary(max)
 Type 2: The closest SQL data type to XSD type (e.g. xs:double = SQL FLOAT)
 Type 3: The SQL type specified in the "pathspec AS SQL…" specification. Including length and collation.

The length of all of the sparse columns are based on the length of the column (either express or implied) in the pathspec.

So, what's this all mean?

It means that, depending on how selective each pathspec is, how many pathspecs are designation in the Selective XML Index definition and the sparseness of each of the column values, the Selective XML Index can be MUCH, MUCH smaller than the (original) PRIMARY XML INDEX (which is implemented as a 12-column side-table, one row per XML node). Because you're only indexing those values your care about. And individual node data CAN be sparse in XML (element text can be, but XML attribute nodes are even more likely to be sparse).

And it also means that the limitations of SQL Server 2008 SPARSE columns apply. See this starting point in Books Online to understand these limits in detail. And, because there isn't a XML columnset column, you are limited to theoretical maximum of 511 pathspecs, if none of them are wildcards and there's a 1 column primary key (1024 maximum columns in table without a columnset, – 2 / 2. I don't know what the real limit is yet.

@bobbeauch

2 thoughts on “Selective XML Index – Implementation Details

Comments are closed.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.