How To Return Document Property Values in Full-Text Search

About a week ago I got involved with a question via Twitter that was posted originally to Stack Exchange. The question was "Is it possible to list document properties via FTS"? The question refers to SQL Server 2012's new support or property-based search with a syntax that looks like this:

SELECT name DocumentName, file_stream.GetFileNamespacePath() Path
FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Title'), 'data OR SQL');

This functionality has nice synergy with another SQL Server 2012 feature, the filetable feature. So you can store documents in a filetable and search with FTS. Including document properties. There's just one thing missing: no way to get the actual property values after you've searched for them. So, as I wrote in this suggestion on Connect:

"Suppose I have three documents that have authors of "Bob Beauchemin", "Bob Newhart" and "Bob Gupta". I can do a CONTAINS-based search for documents with authors that have a keyword "Bob", but can't bring back the entire propery value, which would be very useful.

Currently, the closest I can get is using sys.dm_fts_index_keywords_by_property. But think only produces the keywords, not the property values. So, in the case above, I wouldn't know if the actual author name was "Bob Newhart" or "Newhart Bob". and keywords wouldn't include noisewords."

Well, now there IS a solution. I'd just written a blog post about a standalone component for full-text search called ThinkHighlight. Wrote to the folks over at Interactive Thoughts and… voila! Their latest version supports returning not only highlighted text, but an XML format column that includes highlighted text AND property name-value pairs called HitHighlightEx. It includes all the properties as the canonical propertyset/propertyid, and if you're FTS index uses a SEARCH PROPERTY LIST, it will resolve the propertyset/propertyid to the property name from the SEARCH PROPERTY LIST.

As an example, let's use the demo script from the FTS blog. I can search for the keyword "compression" and return highlighted text and properties:

DECLARE @query nvarchar(50) = 'compression';
DECLARE @context varbinary(8000) = dbo.HitHighlightContext('Documents', 'file_stream', @query, -1);
SELECT TOP 10 Name, dbo.HitHighlightEx(@context, 'top-fragment', 200, stream_id) as description
FROM Documents
WHERE CONTAINS(file_stream, @query);

But better, I can take the property XML apart and return individual columns as properties:

DECLARE @query nvarchar(50) = 'compression';
DECLARE @context varbinary(8000) = dbo.HitHighlightContext('Documents', 'file_stream', @query, -1);
SELECT Name,
  Description.value('(/document/properties/property[@name="Title"]/@value)[1]', 'nvarchar(max)') as Title,
  Description.value('(/document/properties/property[@name="Author"]/@value)[1]', 'nvarchar(max)') as Author,
  Description.value('(/document/properties/property[@name="Tags"]/@value)[1]', 'nvarchar(max)') as Tags
FROM
(
SELECT TOP 10 Name, dbo.HitHighlightEx(@context, 'top-fragment', 200, stream_id) as description
FROM Documents
WHERE CONTAINS(file_stream, @query)
) a;

And even get properties that aren't in my SEARCH PROPERTY LIST. So, to get the "LastAuthor" property (Property/Details in Windows Explorer calls this "Last Saved By"):

DECLARE @query nvarchar(50) = 'compression';
DECLARE @context varbinary(8000) = dbo.HitHighlightContext('Documents', 'file_stream', @query, -1);
SELECT Name,
  Description.value('(/document/properties/property[@name="Title"]/@value)[1]', 'nvarchar(max)') as Title,
  Description.value('(/document/properties/property[@name="Author"]/@value)[1]', 'nvarchar(max)') as Author,
  Description.value('(/document/properties/property[@name="Tags"]/@value)[1]', 'nvarchar(max)') as Tags,
  Description.value('(/document/properties/property[@set="f29f85e0-4ff9-1068-ab91-08002b27b3d9" and @id="8"]/@value)[1]', 'nvarchar(max)') as LastAuthor
FROM
(
SELECT TOP 10 Name, dbo.HitHighlightEx(@context, 'top-fragment', 200, stream_id) as description
FROM Documents
WHERE CONTAINS(file_stream, @query)
) a;

As always, you can find a complete list of canonical Windows properties here.

So there you have it. To get in on the beta and make suggestions, download the ThinkHighlight beta here. BTW: I have NO affliation of any kind with Interactive Thoughts. But I DO like the way they respond to enhancement requests.

@bobbeauch

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.