Integrating SQL Server 2014 constructs into the T-SQL canon

So I was going through the features in SQL Server 2014, seeing how and where they fit into the canon of SQL Server’s T-SQL. I’m going to add some of this for my advanced T-SQL Immersion Event at the end of September. SQL Server 2014 is supposed to be a release with no T-SQL changes at all. Not windowing enhancements, not time interval data types, nothing. So the effect on the T-SQL programmer should be minimal, right? Well, maybe not minimal.

The most visible new feature in SQL Server 2014 is known as Hekaton. In a perfect world, Hekaton tables would simply be a deployment decision by the DBA. The miracle “go faster” switch, just buy some more memory move your tables over. But it turns out that the surface area of tables and statements on tables in wide and deep. And in “V1″ of memory-optimized tables there are a number of constructs that they don’t support. These include some data types, as well as constraints and triggers. So T-SQL developers need to “make up” for lack of these by coming up with their own scheme, no mean feat although there are some examples of this type of code in the Books Online. And that’s just the beginning, the list of unsupported items means that the move of a table to memory-optimized won’t be a simple change to DDL.

As another example, the change from pages to hash buckets in the in-memory engine gives new credibility (well, gives credibility) to the concept of uniqueidentifier as primary key. With hash buckets, spreading out the key values won’t cause fragmentation, it should actually be as fast or faster. It is still a bigger data type though, which means more to keep in memory. Just buy a bit more memory and a fast disk for the log.

T-SQL compiled stored procedures that calls table-specific code is needed to make memory-optimized tables run as fast as possible. In fact, compiled stored procedures can only access memory-optimized tables. These aren’t your traditional stored procedures. They must be defined with a transaction isolation level, as well as other options. And these stored procedures also have a number of limitations, including limiting usage of some built-in functions, tempdb, and collation choice. They also compile on first use (when you create them initially and when you bring up SQL Server) and don’t recompile because of statistics changes. Finally, these procedures need error handling because, depending on isolation level, they might rollback due to the multi-version engine. Ah, the multi-version engine.. That may affect T-SQL programmers a little.

There’s more than just rowstore and linked list memory-optimized tables. This release can use columnstore (xVelocity) as the primary storage vehicle, rather than as an additional copy of your warehouse data as in SQL Server 2012. And the columnstore is updatable now. And because columnstore is “V2″ now, programmers can “undo” almost every workaround that was needed to take full advantage of “Batch Mode” processing because constructs like outer joins will now use Batch Mode automatically.

Let’s not forget the new cardinality estimation. If you’re fearing plan regressions, you can always opt out, but sounds like some tuning, as well as better estimates leading to better plans, will be going on. And there’s always parallelism in SELECT..INTO too.

So ever if they never add another T-SQL feature to SQL Server 2014 by RTM, there’s lots to talk about that pertains to the way you’ll write T-SQL in future as well as, of course, lots about how we write it for “best results” today. See you later this month.


An implied (not implying) permissions function

I’d always wondered why they didn’t make the “ImplyingPermissions” function (code is part of the books online) part of the product. I originally thought it was because the function was subtly (or not so subtly) misnamed. What the function does, is “given a permission in a permission class, return a table of which built-in permissions grant (or deny) you that input permission by permission inheritance/covering”. To use their example, if the input permission is “ALTER” in the “SCHEMA” class, the returned table includes the “ALTER” permission in the “DATABASE” class, meaning that granting someone “ALTER DATABASE” also grants them “ALTER SCHEMA” in every schema in that database.

I always wanted the corollary function, “ImpliedPermissions”. That is, given the permission “ALTER” in the “DATABASE” class, what other permissions at lower levels am I granted by hierarchy/covering? There’s probably someone that’s already written that function, but searching for “SQL Server Implied Permissions” always brings me back to the “ImplyingPermissions” function. Which is where I got the idea about its being misnamed. It’s non-trivial to write, given that there are two types of permission “inheritance” – inheritance and covering. But I came upon this quick way that uses the ImpliedPermission function and CROSS APPLY:

CREATE FUNCTION dbo.ImpliedPermissions(@class nvarchar(64), @permission_name nvarchar(64))
SELECT a.*, b.height, b.RANK
FROM sys.fn_builtin_permissions(”) AS a
CROSS APPLY dbo.ImplyingPermissions(a.class_desc, a.permission_name) AS b
WHERE b.CLASS = @class AND b.permname = @permission_name

Note that this is not the same as creating a LOGIN or USER, assigning it permissions, then impersonating the principal and using sys.fn_my_permissions. The “My permissions” function (which must have been invented when Windows was calling everything “My…”) requires a *specific* securable and class, not a class and permission. So it can tell you which permissions (multiple) you’ve been granted on schemaX.tableX, and one level and type of inheritance between table and column (SELECT on a table gives you SELECT on all the columns), but not functionality like “SELECT” in the “OBJECT” class. And the Implying/ImpliedPermissions functions themselves don’t deal with individuals, just classes.

So now there is a corollary function too. I did notice that it was a bit slow, especially trying to use it with semi-joins and anti semi-joins. So, if you’re using this function a lot, you may want to materialize the permission hierarchy table (builtin_permissions CROSS APPLY ImplyingPermissions) somewhere. And now I can tell you, “if you give someone class X/permission Y, what else are you giving them? Well, at least to the extent that sys.fn_builtin_permissions() accurately reflects the permission hierarchy/covering (I think I’ve found some edge cases). Enjoy…


Batch Mode improvement for columnstores in SQL Server 2014 CTP1

This post is a continuation of my previous posts on columnstore indexes in SQL Server 2014 CTP1 and query plans. The test bed is the same as in previous posts, based on an enlarged version of a fact table in AdventureWorks2012DW.

One of the things that improves query speed when using columnstore indexes is the usage of a new batch mode, as opposed to “traditional” row mode. Batch mode refers to the fact that individual iterators execute in batches rather than row-by-row. You can observe this in query plan iterators; each one contains the properties “Actual Execution Mode” and “Estimated Execution Mode”.

In SQL Server 2014, usage of Batch Mode will be more mainstream. We’ve actually already seen evidence of improvements from SQL Server 2012 to SQL Server 2014 in the first post. In SQL Server 2012, The hash join implementation consists of two iterators: a build iterator and an actual join iterator. In SQL Server 2014, there is not a separate build iterator; everything is pushed into the join iterator. There were also other limitations in SQL Server 2012 for batch mode. For example, batch mode was only supported in inner joins and the hash table must fit entirely in memory. In addition, query patterns such as scalar aggregation, multiple distinct aggregations and IN clauses and other patterns could not use Batch Mode. A series of query rewrites for common patterns to enable queries to use Batch Mode were posted on the Technet Wiki. SQL Server 2014 removes some, if not most, of these limitations.

I’ll look at one of these queries today to see how things have improved, that’s the outer join. We’ll use the following query as a test:

select ProductAlternateKey, SUM(SalesAmount) as SumSales
from DimProduct
left outer join FactInternetSalesNewCCI fact
on DimProduct.ProductKey = fact.ProductKey
group by ProductAlternateKey
order by SumSales desc;

In SQL Server 2012, because of the outer join, none of the iterators (including the nonclustered columnstore scan) in the query use Batch Mode. In SQL Server 2014, the left outer join iterator, as well as the following HashMatch partial aggregate and the columnstore scan use Batch Mode. People who invested in the nonclustered columnstore index in SQL Server 2012 will be glad to note that these improvements work with nonclustered columnstore as well as clustered columnstore.

In addition, I tried this query in SQL Server using both the old and new cardinality estimation. The new estimation was right on the money, with a final estimate of 504 Estimated and Actual Rows v. the old estimator’s estimate of 376 Estimated v. 504 Actual. Of course, the higher, more accurate estimate results in a plan with a higher estimated cost and memory grant (the more accurate estimate is higher after all), but we’ll know that we’re using the “right” plan because of the more accurate estimates. In this case, the different estimate did not change the shape of the plan.

So, to summarize:
1. The cases in which Batch Mode will be used with Columnstore iterators in plans have been expanded in SQL Server 2014, leading to even faster queries.
2. The Batch Mode improvements work with the nonclustered as well as the clustered columnstore indexes.
3. The new cardinality estimator almost always results in more accurate cardinality estimates in SQL Server 2014.

Cheers, Bob

As always, query plans…

Columnstore indexes and cardinality estimates in SQL Server 2014 CTP1

This post is a continuation of my previous post about SQL Server 2014 Columnstore and Query Plans. I’ll take a look at how the new cardinality estimates affects columnstore index query plans. I’ve written a bit about the new cardinality estimator in SQL Server 2014, as well as looked into how to tell what the estimator is doing by using XEvent trace. Today, I’ll apply that to columnstore as well as a data warehouse query against a traditional rowstore.

First off, since I used the AdventureWorks2012DW database and attached it in SQL Server 2014 rather than building it “by hand, from scratch”, I’m using the “old” cardinality estimator by default. This applies even to new tables in databases restored/attached from pre-2014 versions, as we’re using here. I can get queries to use the “new” estimator on a query-by-query basis by using the query hint “OPTION(QUERYTRACEON 2312)”. You can also set the trace flag at other levels of granularity (session and ‌instance-level) of course. But it’s easier to compare the two using the query hints. You can confirm that this works by referencing the “main” (e.g. SELECT, in this case) iterator’s CardinalityMode110 is OFF (old estimator) or ON (new estimator).

With the sample query from the previous post, all queries (traditional bitmap, nonclustered columnstore, and clustered columnstore) got the same estimate of 100.8 estimated rows v. 119 actual rows. Although the query shapes were different between “traditional” bitmap plans and columnstore plans, the estimates were consistent at the end. Because of the different plan shapes, it’s not straightforward to determine where the estimates started to diverge. SQL Server 2012 SP1 plans for traditional bitmap and nonclusted columnstore got the same eventual estimate as the analogous queries in SQL Server 2014 CTP1. Trace flag 2312 has no
effect on cardinality in SQL Server 2012, as you would expect (although yes, I did try it).

When the new cardinality estimates are turned on, the eventual estimate for all variations of the table is slightly worse at the end, 189.842 estimated rows v. 119 actual rows. With the “traditional” bitmap plan this doesn’t change the plan but does result in the estimated plan having a slightly higher cost for the new cardinality estimator (54%-46% of the batch). The memory grant is slightly higher as well, as you would expect. With the clustered and nonclustered columnstore the higher estimated rows results in a different plan choice (HashMatch (aggregate) – old v. Sort and StreamAggregate – new). This different plan choice results in a plan with much higher cost (71%-29% of the batch), although the choice of StreamAggregate results in a slightly lower memory grant for the new plan.

To summarize:
1. The new cardinality estimator in SQL Server 2014 works with queries against columnstores as well as rowstores.
2. The cardinality estimates are consistent between columnstores and rowstores, even though the plan shapes might be different.
3. The new cardinality estimator can change a plan. As in any query optimizer change, there can be regressions, but I’ve mostly seen improvements with the new estimator. As well as improvement with a different columnstore plan, coming up.

Cheers, Bob

Query plans are included.

Columnstore Indexes and Query Plans in SQL Server 2014 CTP1

Today I thought I’d try out some of the new, highly-touted columnstore improvements in SQL Server 2014. First off, to figure out what’s new I used three main sources: one is the “canon”, the whitepaper, “Enhancements to SQL Server Column Stores“, presented at Sigmod 2013. The second source is the SQL Server 2014 CTP1 Books Online page, that documents and illustrates the DDL to define and work with clustered (and nonclustered) columnstore indexes. The third source is Niko Neugebauer’s excellent blog post series and experiments on Clustered Columnstore Indexes, that starts here.

Rather than repeat what those folks have already implemented, documented, and experimented with respectively, I went off on a different tangent. I’d like to find out how the new clustered columnstore index affects query plans, and look for changes in queries between SQL Server 2012 and 2014 CTP1. I started by restoring AdventureWorks2014 to a SQL Server 2014 instance and a SQL Server 2012 SP1 instance. And taking the FactInternetSales table up to over a million (about 1.15 million) rows using a combination of select…into and insert…select.

On the SQL Server 2012 instance a made in copy of the table without a nonclustered columnstore index and one with a nonclustered columnstore index. The nonclustered columnstore index contained all of the columns in the table. SQL Server 2014 had a third copy of this table, one that used a clustered columnstore index.

With each table (named FactInternetSalesNew, FactInternetSalesNewNCI, and FactInternetSalesNewCCI so I could keep trace of them), I did the relevant variation of the following query:

select ProductAlternateKey, CalendarYear, SUM(SalesAmount) as SumSales
from FactInternetSalesNew fact
join DimDate on fact.OrderDateKey = DateKey
join DimProduct on DimProduct.ProductKey = fact.ProductKey
where CalendarYear BETWEEN 2006 AND 2007
and ProductAlternateKey LIKE ‘BK%’
group by ProductAlternateKey, CalendarYear;

The query plans were interesting is their differences and similarities.

1.The “traditional” row-based storage used a HashJoin with a Bitmap for one dimension (DimDate) and a MergeJoin to join in the Product dimension. This was the same in SQL Server 2012 SP1 and 2014. This is a different plan shape than any of the columnstore plans, with the dimensions being joined in a different order and the aggregation happening at a different point in the plan.
2.The SQL Server 2012 nonclustered columnstore index used BatchHashTableBuild iterators to build hash tables for each dimension, then HashJoins for the join with each dimension. The scan of the fact table, BatchHashTableBuilds and HashJoins, and a HashMatch (partial aggregation) iterator after the joins used Batch Mode; the rest of the iterators used Row mode.
3. The SQL Server 2014 nonclustered columnstore index did NOT use BatchHashTableBuild iterators, instead the join between the Fact table and DateDim used a HashMatch. However, the HashMatch iterator did contain the new (for 2014) property “BitmapCreator=True”. The HashMatch iterator to join in the Product dimension did use a HashMatch, but the BitmapCreator property was not present. All iterators EXCEPT the Exchange, DimensionTable scans, and the last HashMatch (aggregate) used Batch Mode.
4. The SQL Server 2014 clustered columnstore index plan was essentially the same as the nonclustered columnstore index plan.

For clarity’s sake, I’ll continue the series and see how the cardinality estimation works with columnstore indexes in the next post.
For expediency’s sake, the plans are posted as an attachment. I’ll hopefully get back to replace them with pictures soon.

So, to summarize:
1. The query plan for nonclustered columnstore index changes for the example query between SQL Server 2012 SP1 and 2014 CTP1. All experiments are performed on my 2-proc VM.
2. In SQL Server 2014 CTP1, the plan for nonclustered and clustered columnstore indexes are the same. The savings of the clustered columnstore index is that you don’t have to store your actual data twice.

Cheers, Bob

Tracing The New Cardinality Estimation in SQL Server 2014

According to the whitepaper “Mission-Critical Performance and Scale with SQL Server and Windows Server” there’s a new cardinality estimator in SQL Server 2014. The whitepaper doesn’t say a lot about how it works, but does mention “trace information will be provided”.

That’s XEvent trace of course, and the trace event is “query_optimizer_estimate_cardinality”. I decided to see what kind of information there is. Created an event session with that single event, and added an action of sql_text and named it, unsurprisingly, “Cardinality”.

Before starting it up, you need to make sure you’re using the new cardinality estimator on your queries; you can check this by looking in the query plan at the value of property “CardinalityEstimationModel110″. You should confirm that the value is “ON”. (Note: this a strange to me because, if the new estimator was introduced in SQL Server 2014, shouldn’t it be CardinalityEstimationModel120?) Databases that are restored from earlier versions’ backups don’t seem to have that property on, databases created and populated “from scratch” on SQL Server 2014 will. There’s also a trace flag 2312 to enable the new estimation model. You also need to be creating a plan, so make sure you don’t already have the plan in cache. Not getting the new model with restored databases from previous versions lets you compared the cardinality estimates and query plans between 2014 CTP1 and previous SQL Server versions.

Note: This post is already way too long to answer the question “What does a cardinality estimator in a database do?” To get a feel for it type “Cardinality Estimation SQL Server” into your favorite search engine and I’ll meet you back here in a while…(hint…it estimates cardinality).

This event fires, naturally, multiple times during the creation of a query plan. I started off with a “simple” 5-table join with a GROUP BY clause and received 54 of these events. Each event contains three major pieces of diagnostic information:
–InputRelation – Logical operator and part of a parse tree. Some ops contain cardinality numbers.
–Calculator – Information about the cardinality calculators used.
–StatsCollection – Stats read for this estimation step and estimates.

After switching back and forth between the chunks of XML using SSMS, I decided to factor the most useful pieces of information into columns, for easier reading. However the XML can be arbitrarily complex, so I left the original XML columns intact as well. Some calculators provide different estimate information, so these should be factored out into a case-based columns. After reading these sets of rows for a while, things become as intuitive as when you first started reading and understanding query plans.

For example, my 5-table join used the Northwind view “Order Subtotals”. This view looks like:

SELECT “Order Details”.OrderID, Sum(CONVERT(money,(“Order Details”.UnitPrice*Quantity*(1-Discount)/100))*100) AS Subtotal
FROM “Order Details”
GROUP BY “Order Details”.OrderID

My first row for this query was for the global aggregate logical operator (LogOp_GbAgg). I sifted out the following pieces of information:
From InputRelation:
–Operator: LogOp_GbAgg
–Table1: Order Details
–Cardinality1: 2155
From Calculator:
–CalculatorType: DistinctCountCalculator
–CalculatorName: CDVCPlanLeaf
–SingleColumnStat: OrderID
–CoveringStatDensity: 2155
From StatsCollection:
–StatsCollectionName: CStCollGroupBy
–StatsCollectionCardinality: 830
–StatsInfoObject1: Order Details
–StatsId: 1

Which means that we expect to read 2155 rows from “Order Details” and group it by OrderID into 830 groups. This is what’s estimated for the query plan and works out to be exactly the same as the acual number of rows. Comparing it to the “old” (pre-2014) estimate, its exactly the same. For this one. Other estimates that I’ve looked at are usually closer in the “new” estimate than the “old” estimate and, in many cases result in different query plans. As in every query optimizer change, there are also some regressions. I’ve only looked at less than 20 queries so far.

I’m providing the query I used to take apart the XEvent trace along with this article. You’re welcome to add more columns, and if you come up with more interesting ways to present the information, please let me know. BTW, the  SQL query heavily uses SQL Server XQuery, so it’s going to be slow for large trace files unless you materialize intermediate results as tables and add XML indexes.

I also “consulted the binary” for lists of enumerated values of Operator, CalculatorType, CalculatorName, and StatsCollectionName, but they are too many to list here; they’re included in the zip file. It’s interesting to see what kind of query clauses result in a special calculator (e.g. CSelCalcTrieBased or CSelCalcExpressionComparedToExpression) or special statistics collection names (my favorite amusing ones are CStCollFudge and CStCollBlackBox).

Hope this adds to your estimation investigation enjoyment.
Cheers, Bob


New query plan items in SQL Server 2014 CTP1

Every new version of SQL Server includes new or expanded features that effect query plans. Most (sometimes not all) of the query plan changes (and most of the functionality changes) can be discovered by “diff-ing” the XML schema for query plans between the current/new version and the previous version. I’ve done this before to enumerate the plan changes from SQL Server 2008 R2 to SQL Server 2012.

Here’s the changes discovered from diff-ing the plan schemas from SQL Server 2012 SP1 to SQL Server 2014 CTP1.

First, the BaseStatementInfoType (these show up as the properties on the far left iterator (e.g. SELECT iterator) has some new attributes. Note that not all of them show up in the CTP1 plans I’ve looked at, the ones I’ve seen in real plans are marked with an asterisk.
CardinalityEstimationModel110* – does this use the new SQL Server 2014 cardinality estimate model or the “legacy” model?

There’s also a new reason for non-parallelized plan: NoParallelForMemoryOptimizedTables. I didn’t find this in the diff, but noticed this in real CTP1 plans while looking for the other attributes.

In the QueryPlanType structure, there’s an additional attribute, EffectiveDegreeOfParallelism. This is the maximum parallelism used by a columnstore index build. Parallel columnstore index builds is a new feature.

There’s a few for the Hekaton feature:
– ObjectType/StorageType – new enumeration member: MemoryOptimized*
– TableScanType or RowsetType: New Attribute on (Predicate/PartitionId/IndexedViewInfo) – IsMemoryOptimized*
– IndexKindType enumeration now includes:
NonClusteredHash – Hekaton index*
SelectiveXML and SecondarySelectiveXML – for Selective XML Index feature introduced in SQL Server 2012 SP1. Don’t know why these weren’t in the 2012 SP1 schema.

New Attribute on HashType: BitmapCreator
New Attribute on RelOpType: StatsCollectionId

Note that, as I mentioned before, I haven’t actually seen all these new attributes yet and the XML Schema doesn’t always make discovering where I should look for them easy. But I now know approximately what to look for and looking at query plans in XML format (rather than “picture” format) will help. As time goes on, we’ll find out what causes these attributes to surface. For example, the CardinalityEstimationModel110 attribute appears in most/every query plan, but the EffectiveDegreeOfParallelism requires collecting a query plan for a columnstore index build.

Feel free to let me know if you find any more items. Happy splunking…

Hekaton, read committed isolation, and statement-level inconsistency

It’s stated in SQL Server 2014 Books Online that “The following isolation levels are supported for transactions accessing memory-optimized tables: SNAPSHOT, REPEATABLE READ, and SERIALIZABLE”. But later on, on the same page, it’s stated the “The isolation level READ COMMITTED is supported for memory optimized tables with autocommit transactions”. “Autocommit transaction” is just another way to say that individual SQL statements are always atomic, and so an “autocommit transaction” is a single SQL statement that either completely commits or completely rolls back. There’s no way, for example, for a single SQL UPDATE statement to update 10 rows successfully but fail to update 1 row. Either all 11 rows are updated or none of them are.

I had trouble reconciling these two seemingly conflicting statements about Hekaton tables’ isolation level and set out to find out if Hekaton tables can implement “traditional” SQL Server read committed semantics, or if they *support* read committed, but implement it with snapshot semantics. Read committed means that “the latest committed version of a row is read”.

A well-known behavior of SQL Server tables under read committed is known as “statement-level inconsistency”. You can see this behavior by executing a single SELECT statement in a client program running under a debugger, reading a few rows and then stopping the program, and, in a different session, updating, deleted, or inserting rows that have not yet been read (but remember that SQL Server uses 8000 byte buffers for reading and sending rows to the client side). The final resultset usually will reflect the changes that happen *during* the execution of the statement, which includes the reading of the rows from table to buffer to client. If you haven’t seen this behavior demonstrated, come to a class or place where I’m speaking and ask.

Back to Hekaton tables. There are four use cases, and let’s assume the session’s isolation level is read committed, the default.
1. Running in a compiled stored procedure – the isolation level must be specified in the stored procedure definition. Read committed is not supported.
2. Explicit user transactions (i.e. BEGIN TRAN…etc) – attempting to perform a SELECT statement on a Hekaton table without an isolation level hint or with a query hint of readcommitted produces an error.
3. Implicit user transactions (i.e. SET IMPLICIT_TRANSACTIONS ON) – work the same as explicit user transactions with respect to Hekaton tables.
4. Autocommit transactions – attempting to perform a SELECT on a Hekaton table without an isolation level hint succeeds. You aren’t allowed to use the readcommitted query hint even in an autocommit transaction.

We’re interested in use case 4. And, with that use case, in CTP1, using a SELECT statement without a hint produces SNAPSHOT semantics, rather than READ COMMITTED semantics. There is no statement-level inconsistency for Hekaton tables Also with READ COMMITTED SNAPSHOT isolation, the behavior of the READ COMMITTED isolation level (RCSI) is indistinguishable from SNAPSHOT.

Given those experimental results (and perhaps I need to do more experimentation with queries with more complex plans), I’m prepared to conclude until convinced otherwise that Hekaton tables support READ COMMITTED (and RCSI) with autocommit transactions, but *implement* that level with SNAPSHOT semantics. What’s the difference?

With a multi-version storage engine, each version is read as of a well-defined logical read time. With SNAPSHOT isolation, the logical read time is the beginning of the transaction. READ COMMITTED isolation would read the appropriate version with a logical read time equal to the current time, according to the whitepaper “High-Performance Concurrency Control Mechanisms for Main-Memory Databases“. In autocommit transactions, the logical read time always appears to be the beginning of the (one and only) SQL statement. SNAPSHOT and READ COMMITTED and (RCSI) are indistinguishable.

To wrap this up, most folks would consider the Hekaton behavior to be an improvement, as READ COMMITTED statement-level inconsistancy is not usually considered to be a desirable behavior. But it was worth asking the question… So do Hekaton tables really implement only “SNAPSHOT, REPEATABLE READ, and SERIALIZABLE” isolation?

Cheers, Bob

Speaking on Hekaton at Portland Oregon SQL Server User Group

As you can probably tell by the last couple of blog postings, I’ve be doing a bit of experimenting with the SQL Server 2014 Hekaton feature in the past few weeks. I’ll be doing a hopefully demo-heavy (but I do have a few slides) talk next Thursday July 25 at the Portland SQL Server User Group at 6:30 to share some observations. If you’re at all interested (and you’re nearby, this is Portland Oregon, after all) show up with your questions and your intuition and we’ll all share, maybe try out some things I haven’t thought of yet. Hope to see you there.

Cheers, Bob

Transaction dependencies and speculative reads with memory-optimized tables

In reading the whitepaper about “High-Performance Concurrency Control Mechanisms for Main-Memory Databases”, I was intrigued by the discussion of speculative reads and transaction dependencies. It’s not always good to use information from an academic whitepaper as details of an implementation, because the real implementation might be differ slightly from the description of the whitepaper’s implementation. In addition, bear in the that the observations here are based on the CTP1 version of SQL Server 2014, and the details may change by RTM.

The whitepaper describes two implementations of the multiversion storage engine, one using optimistic concurrency with no locking and one using locking. Because the description of the Hekaton feature mentioned “lock-free structures” as one of the pillars, I looked at the optimistic implementation as possibly close to the CTP1 implementation.

To paraphrase the whitepaper, there are at least two ways the speculative reads can take place.
1. Transaction A determines if it can read Version V. Version V’s begin timestamp is another transaction’s (TB) ID. TB is in the preparing state. Use transaction B’s end timestamp (obtained right before TB-prepare) as V’s begin time. Speculatively read V. Transaction A has a commit dependency of Transaction B.
2. Transaction A determines if it can read Version V.  Version V’s end timestamp is another transaction’s (TE) ID. TE is in the preparing state. Use transaction E’s end timestamp (obtained right before TE-prepare) as V’s end time. Speculatively ignore V. Transaction A has a commit dependency of Transaction E.

These descriptions mention dependencies happening in the transaction A’s Active phase, but in addition, there is mention that “Transaction A may acquire additional commit dependencies during validation but only if it speculatively ignores a version.” For information about transaction phases, reference the previous blog entry or the whitepaper.

After noticing the extended events dependency_acquiredtx_event and waiting_for_dependenciestx_event, I set out to look for those dependencies. Because tx A can only acquire a dependency on tx B when B is in the preparing phase (state) and, in most cases, the preparing state is usually pretty short, the dependency sounded almost like a race condition. Making the preparing state as long as possible would give me a better chance.

In the implementation of pessimistic concurrency, the whitepaper mentions two “read sets” that are checked during the prepare phase. The ReadSet contains pointers to every version read and the ScanSet stores information needed to repeat every scan. The whitepaper also describes WriteSet but that’s outside my scope. During the prepare phase ReadSets are checked to ensure consistency in isolation level Repeatable Read or Serializable. In addition, ScanSets are checked to guard against phantoms in isolation level serializable. Serializable isolation with a large ScanSet seemed to be the best choice to lengthen the prepare phase.

I declared a two column memory-optimized table with a primary key/hash index on one column (I called it id) and no index at all on the other column (c1). Added 250000 rows. And figured that running the following batch inside a “traditional transaction” (to slow things down even more compared to a compiled stored procedure) from multiple clients should produce the behavior.

begin tran
select top(150000) * from txtest with (serializable);
update txtest with (serializable)  set c1 = 1
where c1 in (select top(10) c1 from txtest with (serializable) order by newid()); — update 10 random rows

Running betwen 5-10 of these clients simultaneously for 20 iterations of each client produced the “transaction dependency” behavior consistently. Tracing this with an extended events session that included:

Besides observing the behavior, I was able to make some interesting observations from the event session.
1. You can have “chains” of dependent transactions, e.g. tx A depends on tx B, tx B depends on tx C, etc.
2. You can have multiple dependent transactions on the same transaction, e.g. txs A, B, and C all depend on tx D.
3. You can have multiple dependency_acquiredtx_event for the same two transactions, e.g. two different occurrences of the event for the dependency tx A depends on tx B.

I also noticed one other interesting behavior. In my tests with 10 clients x 20 iterations (200 transactions total) between 60% and 80% of the transactions routinely abort. That’s not surprising, the test was set up to produce read conflicts. What was surprising is that, although transaction commit wasn’t the norm, every transaction that had dependency(ies) on it ended up committing. 22 of 200 transactions in one test. And the dependent ones also committed. That’s surprising, but perhaps the fact that transactions that have dependencies and those that they depend on all eventually commit is just an artifact of the test batch I chose.

So, from this test you can deduce that:
1. Transaction dependencies and speculative reads are real (can be observed) in the current implementation.
2. Large scans and transaction isolation levels higher than snapshot should only be used with memory-optimized tables when *absolutely* necessary. 3. That’s especially true for transaction isolation level serializable.

Cheers, Bob