Thinking about Hekaton? Then think about collations.

Among the new features in SQL Server 2014, the most compelling, at least from my point of view, is the in-memory tables and procedures known as project Hekaton. Because the tables are organized completely differently than traditional SQL Server tables there are, naturally, some limitations around Hekaton tables and procedures. In looking over the list in Kalen Delaney’s CTP1 whitepaper, I’m sure most relational designers would be struck by the “no check constraints, no foreign keys, no triggers” limits. I was intrigued by the repercussions with respect to collation. Hekaton tables must use a BIN2 collation on all character columns that participate in indexes. In addition, the natively compiled procedures only support BIN2 collations for comparisons, sorting, and grouping. Since, BIN2 collations are unusual in my experience, I decided to look into this. Especially since you don’t need a copy of SQL Server 2014 to look into this now.

BIN2 uses a sorting/grouping/comparison algorithm based on Unicode codepoints. This means that sorting/grouping/comparison (I’ll use just the word “sorting” as a shortcut for a while) is, by definition, case-sensitive and accent-sensitive. Any upper-case character sorts before all lower-case characters. So if I create a BIN2-version of the authors table in the pubs database, the last name “del Castillo” sorts at the end of the list using “ORDER BY au_lname”. This would be surprising to most end-users and, as Michael Kaplan wrote in his “International Features in SQL Server 2000” whitepaper: “in general, one of the most effective ways to alienate end users of an application is to get a task such as basic sorting wrong.”

Maybe I’m worrying too much about this, as many/most folks in places other than US may be used to using the COLLATE clause on expressions *everywhere*, includng string literals. As Qingsong Yao writes, “A linguistic collation usually can sort one language correctly, but not all languages. For example, the Latin1_General_CI_AS collation does not follow the rule of French language.” But for those that aren’t familiar with COLLATE clause, unless you like binary sort, get used to the COLLATE clause. The equivalent for “ORDER BY au_lname” with COLLATE would be “ORDER BY au_lname COLLATE Latin1_General_CI_AS_KS_WS ASC”, replacing “Latin1_General…” by whatever collation you’re already using to sort. And we couldn’t use this with a natively compiled procedure.

Finally, I thought it would be nice to come up with recommendations for collation on an instance and database level. Qingsong Yao has some in his blog articles, such as:

1. Don’t alter a database collation (of an existing database)
2. Changing the collation of a column is not so trivial
3. Got Collation conflict, How to avoid this?

So after reading his suggestions, how about…

1. If you’re changing some tables in a DB to Hekaton, use a collation on columns in those tables.
2. If you plan to be going mostly or all Hekaton in a DB, create a new copy of the DB with a BIN2 collation at a database level
3. Do remember that tempdb uses the instance collation, so you can specify the COLLATE on temp tables or use the COLLATE DATABASE_DEFAULT syntax Kimberly mentions here.
4. If you’re going to have an all-Hekaton instance (not sure I’d do this because of limits on linked servers, etc. with compiled procs)
you could try a BIN2 collation at an instance level.
5. Don’t specify some columns with BIN2 and some non-BIN2 in the same table for Hekaton, because it may drive your programmers crazy, unless they’re already used to it.

Feel free to argue any of these points with me; if I think you have a better argument, I’ll change it.

But, to summarize this, before going to Hekaton, MAKE SURE and try things out (especially reports) with a BIN2 collation, and add the COLLATE clause when necessary to change things back to what users are expecting. Users DO like things fast, but the also like things “correct”.

Hope this makes your implementation smoother. And thanks to SQL Server Books Online, Michael Kaplan, Qingsong Yao, and Kalen Delaney’s “Inside SQL Server 2008 – Internals” for helping to provide some clarity on BIN2 collations and collations in general.

Cheers, Bob

Back to blogging, after a rather long hiatus

I was about to add a new blog entry, (coming soon) and realized that it’s been *months* since I blogged last, I’ve been rather busy lately, moving house after 15 years, and doing quite a bit of traveling (quite a bit for *me*, anyway) lately. I don’t think I’ve gone a month without a blog entry since early 2000’s. So, I’m getting back to it and more blog entries are coming. If you subscribe, I’ll try and make it worth your while.

Cheers, Bob

Speaking on Windows Azure SQL Database this month in Portland

The last time I spoke at the Portland SQL Server User Group was the first time I’d ever seen taken any questions on Windows Azure SQL Database, Microsoft’s Platform-as-a-Service (PaaS) offering that is a variation of SQL Server. So this month, on March 28, I’ll be doing a presentation on “Windows Azure SQL Database for the SQL Server Professional”.

I’ve been working with Windows Azure SQL Database almost since its inception, but, unlike the “box” version, they’ll be no discussion of versions or of what features were introduced when. Because there is only one version of Windows Azure SQL Database that you can be running…the current one!

Come and bring your questions. See you there.


I’m speaking on SQL Server service accounts next week

Just to let you know. The Portland (Oregon) SQL Server User Group has invited me to speak at the meeting on Thursday, Jan 24 at 6:30pm. I’ll be speaking on SQL Server service accounts, entitled “Who’s running my SQL Server services?”. I’ll be covering the evolution of service account choices and management in later versions of SQL Server and address how to manage these accounts and exactly what they control.

Full description of the talk and directions to the meeting are at the group’s website: Hope to see you there!


Getting Windows Azure SQL Database diagnostics using Windows Azure cmdlets

I’ve been trying out the new supported Windows Azure PowerShell Cmdlets this week and I’d have to say I’m impressed. I’ve only used the ones for setup (Get-AzureSettingsFile and friends) and the ones for Windows Azure SQL Database (WASD) so far, and there’s a new one for WASD I really like. The original Codeplex WAAPS cmdlets only supported WASD Server and Firewall rule manipulation, the new ones also support a series of *-*AzureSqlDatabase ones, like Get-AzureSqlDatabase for example. I thought I should be able to “navigate” from what’s returned from the “Server” cmdlets down to the database, but unless I’ve missed something, that’s not possible as the Server cmdlets return SqlDatabaseServerContext. To get to Database cmdlets you need an IServerDataServiceContext and SqlDatabaseServerContext doesn’t implement that interface. Instead you need to use the cmdlet New-AzureSqlDatabaseServerContext, which requires a Server name and a PSCredential. The database commands work well, but…

The more interesting bit is what you can obtain through the context. You see, New-AzureSqlDatabaseServerContext does return a class that implements IServerDataServiceContext but that class is really ServerDataServiceSqlAuth. You should check that this is true in your script before using it by using the PowerShell “-is” operator. The ServerDataServiceSqlAuth includes 15 properties that return enumerations (actually they are WCF Data Services DataServiceQuery<T> objects) of interesting things. In fact there is an entire WCF Data Services object model for this which you can retrieve using RetrieveMetadata method on ServerDataServiceSqlAuth. Some of the interesting things are:

EventLogs – WASD event logs were recently introduced
Servers and ServerMetrics
Databases and DatabaseMetrics
Operations and OperationDetails – these are DAC operations you’ve done

So you can “monitor” your WASD through PowerShell, without ever hitting the Windows Azure Portal. Very Nice. If you didn’t understand that whole alphabet soup of class library references, here’s what it looks like in code. Remember, first you must use Get-AzureSettingsFile and Import-AzureSettingsFile to “configure” your administrative certificate.

# Get a SQLDataServicesSqlAuth (context), substitute name of your server
$cred = Get-Credential
$ctx = New-AzureSqlDatabaseServerContext -ServerName “[myserver]” -credential $cred

# check to see if you have the right class before going further
if ($ctx -is [Microsoft.WindowsAzure.Management.SqlDatabase.Services.Server.ServerDataServiceSqlAuth]) {

# gets EventLogs enumeration
# or ServerMetrics enumeration

Since this is really an “enumeration”, you could filter it in a loop like this:

# filter only events for master database
$ctx.EventLogs | Foreach-Object -Process { if ($_.DatabaseName -eq “master”) { $_ } }  #or whatever logic you want
# or even simpler
$ctx.EventLogs | where { $_.DatabaseName -eq “master” }

} # end of check for right class

So what else could one want? Well, if a had to say, it might be additional operations you can do with the portal, like wrapping the DAC Import-Export service (although I’ve been able to do this by hand, it would be nice to have a supported cmdlet) or setting up Windows Azure SQL Data Sync. Those may show up later on in time. Or the ability (this is probably do-able with WCF Data Services, I haven’t looked that hard yet) to filter the EventLogs or other enumerations before the fact. In the last line of code above, I’m fetching all the event log lines and filtering them at the client, rather than asking WCF Data Services for a subset. But for now, I’m happy to have the set of useful information and also the base functionality that these cmdlets provide. But of course, I do have an aversion to using web-based GUI for repeatable tasks.

Happy scripting, Bob.


I’m doing Expertseminarium in Stockholm in March

I’m happy to announce that I’ll be in Stockholm Mar 12-13 as a speaker in AddSkills’ Expertseminarium program. I’ll be doing 2 days full of SQL Server query optimization and internals information, troubleshooting, and analysis techniques. We’ll be diving into SQL Server using tools like Extended Events to get a good look into what’s going on with your system. Hope to see you there!

Cheers, Bob


Choosing Paths for Selective XML Index – Part 2 – Using the SXI XEvents

In part 1 of this series, we tried a simple example from the Books Online without any SXI index at all, and an SXI with all the paths covered. No XEvents were emitted in either case. Now let’s change the XSI around (by dropping and recreating, although we could use ALTER, so envision a drop in between each example) to see what triggers the XEvents and what the query plan effects are.

— first let’s back off one path
CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’ as XQUERY ‘node()’
— path124 =  ‘/a/b/c’

— now, you get XML Reader, XML Reader with XPath Filer, CI Seek on SIDX, CI Scan on base table
— and “selective_xml_index_path_not_indexed” XEvent for /a/b/c
SELECT FROM tbl T WHERE T.xmlcol.exist(‘/a/b[c = “43”]’) = 1

— how about adding the SINGLETON keyword and XQuery Type?
CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’ as XQUERY ‘node()’,
path124 =  ‘/a/b/c’ as XQUERY ‘xs:string’ SINGLETON

— now, you get a single CI Seek on SIDX and CI Scan on base table
SELECT FROM tbl T WHERE T.xmlcol.exist(‘/a/b[c = “43”]’) = 1

— how about an XML.value query instead of XML.exists?
— get XML Reader with XPath Filter, XML Reader, and CI Scan on base table
— and “selective_xml_index_no_compatible_sql_type” XEvent for /a/b/c
SELECT FROM tbl T WHERE T.xmlcol.value(‘(/a/b/c)[1]’, ‘nvarchar(10)’) = ’43′;

— For this to use SXI, we need:
ALTER INDEX simple_sxi ON Tbl FOR (add path124s = ‘/a/b/c’ as SQL nvarchar(10) SINGLETON);

— SXI CI Seek and CI Scan on base table, no XEvents
SELECT FROM tbl T WHERE T.xmlcol.value(‘(/a/b/c)[1]’, ‘nvarchar(10)’) = ’43′;

BTW, if you use ‘varchar(10)’ rather than ‘nvarchar(10)’ in the query, the SXI won’t be used. One last one, let’s see if the SXI will be used if we specify an incorrect XQuery type:

CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’ as XQUERY ‘node()’,
path124 =  ‘/a/b/c’ as XQUERY ‘xs:double’ SINGLETON

— We get “selective_xml_index_no_compatible_xsd_types” XEvent for /a/b/c
— And XML Reader with XPath Filter, XML Reader, CI Seek on SIDX and CI Scan on base table
SELECT FROM tbl T WHERE T.xmlcol.exist(‘/a/b[c = “43”]’) = 1;

So, to summarize, in this post and the last post we’ve shown:
The SXI cannot be used to search for mixed content.
The ‘AS XQUERY node()’ specification can be used to check for existance of non-terminating node.
The XEvents for SXI can guide you to understanding which nodes to specify.
You must use “As SQL…” hint in order for SXI to be used in XML.value method.
You use “As XQUERY…” hint to use SXI for XQuery predicates or paths in the XML.exist method.

To go back to the BOL examples and test your understanding, write a selective XML index create statement for the following:

SELECT T.record,
T.xmldata.value(‘(/a/b/c/d/e[./f = “SQL”]/g)[1]’, ‘nvarchar(100)’)

Cheers, Bob


Choosing Paths for Selective XML Index – Part 1

About a month ago, I wrote a series of blog entries on the Selective XML Index introduced in SQL Server 2008 SP1. Just to summarize before I start up again, the Selective XML Index (I’ve abbreviate as SXI) is a feature introduced to allow indexing of only certain paths in an XML column. It uses side-tables using sparse column technology (that was introduced in SQL Server 2008) to effectively index pieces of XML documents while keeping the size of the index relatively low. It is unrelated to the “original” XML index technology (CREATE PRIMARY XML INDEX, etc) that creates large indexes (2-5 times size of the original documents) but indexes every element, attribute, and text node. Either XML indexing technology may be used independently or both may be used together. In addition to the (primary) SXI, you can have “secondary” SXIs, which are simply nonclustered indexes over specific columns in the side table.

Since I wrote the original series, Books Online topics for SXI have appeared. So I thought I’d work through a strategy for deciding to define these indexes and show how these indexes affect query plans. As implied by the name of the feature, since you index specific paths, deciding which paths to index is key. Each path takes up space in the side-table. In addition, you can specify “keywords” on the paths, such as XQuery data type, SQL data type, and whether this path is a singleton. I’ve also found four new Extended events to help in my decision. These events fire when you have a selective XML index in general, but the path may be missed or specified incorrectly. Namely,

selective_xml_index_no_compatible_sql_type – Occurs when a value() method is used in a query, specifying a sql type that differs from the type specified during selective XML index creation for that path.

selective_xml_index_no_compatible_xsd_types – Occurs when the user specifies a different resulting XSD type in a query, than the one that was used to promote the path during selective XML index creation.

selective_xml_index_path_not_indexed – Occurs when the path given in the user query is not promoted in the selective XML index, and thus the index is not used.

selective_xml_index_path_not_supported – Occurs when selective XML index is not used for a XML query due to the user using an unsupported path type.

I’ve created and started an XEvent session with these events, also including the text of the SQL Server statement involved. So let’s start with a simple example. After enabling SXI on the database (see my original posting), I took this one from the BOL and define the table as:

CREATE TABLE tbl (  id int identity primary key,  xmlcol XML );

Note that xmlcol is untyped XML, meaning we don’t have an associated XML Schema Collection. We’re looking to optimize, as an exemplar query:
SELECT FROM tbl T WHERE T.xmlcol.exist(‘/a/b[c = “43”]’) = 1;

And add a single row that will return true:
INSERT T values(‘<a><b><c>43</c></b></a>’);

Let’s try this query first without any index. The query plan is supplied as a file (because it’s fairly large), but consists of:

1. Clustered index (CI) scan of base table
2. Two “XML Reader with XPath Filter” steps (to select the nodes we want)    a. One step for the path outside the XPath predicate (a/b)    b. One step for the path outside the XPath predicate (c as child of a/b)
3. One “XML Reader” to get the value of text node c.

For such a simple query against a 1-row table, the estimated subtree cost is over 1. (1.09117). We can do better.

So let’s start by trying BOL suggested SXI:

CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’,
path124 =  ‘/a/b/c’

Unfortunately, this returns an error: Msg 6394, Level 16, State 1, Line 1 Found non leaf element indexed for the value with path ‘simple_sxi’ for selective XML index ‘path123′.

It’s “upset” because I have the path /a/b. It won’t index any path that not a “terminating path”, i.e. that has other XML underneath (in this case the “c” element). This is interesting for a few reasons. First off, I can’t index “mixed content” with the SXI. Mixed content in XML is content which contains both tags and text, for example an HTML page. Here’s a simple example.

— The “value” of node b contains both text and tags (the <i></i> tags).
INSERT tbl values(‘<a><b>This is some text. This word is <i>italicized</i>.</b></a>’);

Since storing/querying mixed content (e.g. WordXML) is a common use case, the fact that it won’t work with XSI is good to know. But, in this case, I just want to look for the existance of the /a/b node. BTW, existance in this case isn’t related to the “XML exist” method, but to my filter in my XML Reader With XPath Filter step. Is there a way to do this? Turns out, this is exactly what the “node()” specifier is for. So let’s try that.

CREATE SELECTIVE XML INDEX simple_sxi ON Tbl(xmlcol)
path123 =  ‘/a/b’ as XQUERY ‘node()’,
path124 =  ‘/a/b/c’

No error on creating the SXI now. That’s good. And how about the query plan? Ahh…that’s much better.

1. Clustered index scan on the SXI side-table. Filter on value 43.
2. Clustered index seek on side-table for primary key of base table.
3. Clustered index seek to join to base table.

Query cost: 0.0098608. That’s a little better.

…And this post is getting a little long. So I’ll continue on with further investigations in the next post. But what about the XEvents? We haven’t seen any events yet. When we don’t have an SXI, it doesn’t suggest “missing paths”. And the SXI we did use contained all the appropriate paths, so there’s no XEvents on that query. So we’ll wait on those for now.


New XEvent events in SQL Server 2012 SP1

I’ve been looking a bit more into the SQL Server 2012 SP1 release, mostly to find diagnostics related to the Selective XML Index (SXI). I found some additional extended events for SXI (there are four of them in the “index” category) and I’ll cover them in, as well as exercise them, the next posting. But in my travels, I also came across five more new-to-SP1-from-RTM extended events. They occur in the “warnings” category when you’re using the XEvent GUI, and correspond, for the most part, to warnings that were added to query plans in SQL Server 2012 RTM. So if you liked these as warnings you’ll probably like them as events. They are:

plan_affecting_convert – Occurs when a type convert issue affects the plan.  The expression value indicates the conversion that can cause inaccurate Cardinality Estimation or prevent from using a seek based plan in the query plan choice.  If performance is affected, rewriting the query could help.

spatial_guess – Occurs when the optimizer cannot get enough join selectivity information.  Use this event to look at the queries where spatial index could be used and the optimizer guesses the spatial selectivity. (Note: I *always* seem to see this hint, even when there should be enough information for the optimizer to guess the spatial selectivity, and filed a Connect item for this one.)

unmatched_filtered_indexes – Occurs when the optimizer cannot use a filtered index due to parameterization.  For each such index an event is fired.  The unmatched_database_name, unmatched_schema_name, unmatched_table_name, unmatched_index_name fields give details of the filtered index that could not be used.

The “original” new SQL Server 2012 plan warnings, hash_warning and sort_warning exist in RTM in the “errors” category; they came over to extended events when all the SQL Profiler events were moved over.

There is also one additional warning.

optimizer_timeout – Occurs when the optimizer times out either due to spending too much time or hitting a memory limit.  Use this event to look at all the queries that are impacted by the optimizer timeout in a particular workload. This can be very useful when tuning a particular workload.

Finally, there is an additional replication XEvent in the replication category:

logreader_start_scan – Outputs the replnextlsn value from the database table that the first replcmds scan uses as a starting point.

Happy event-ing!


Speaking at DevWeek London in March

Lucky me, I’ll be speaking at DevWeek in London again this year. The festivities start off with pre-cons on March 4, I’ll do a whole day on writing best-performing T-SQL and application code. After that, I’ll be doing breakout talks on a variety of topics, from SQL Server Extended Events, to Data Movement in Windows Azure SQL Database, to SQLCLR internal, and more.

Hope to see some familiar faces at this one and make some new friends as well. Stop by and say hi…