An implied (not implying) permissions function

I’d always wondered why they didn’t make the “ImplyingPermissions” function (the code is part of Books Online) part of the product. I originally thought it was because the function was subtly (or not so subtly) misnamed. What the function does is: given a permission in a permission class, return a table of which built-in permissions grant (or deny) you that input permission by permission inheritance/covering. To use their example, if the input permission is “ALTER” in the “SCHEMA” class, the returned table includes the “ALTER” permission in the “DATABASE” class, meaning that granting someone “ALTER DATABASE” also grants them “ALTER SCHEMA” in every schema in that database.

I always wanted the corollary function, “ImpliedPermissions”. That is, given the permission “ALTER” in the “DATABASE” class, what other permissions at lower levels am I granted by hierarchy/covering? There’s probably someone who’s already written that function, but searching for “SQL Server Implied Permissions” always brings me back to the “ImplyingPermissions” function. Which is where I got the idea about its being misnamed. It’s non-trivial to write from scratch, given that there are two types of permission “inheritance” – inheritance and covering. But I came upon this quick way that uses the ImplyingPermissions function and CROSS APPLY:

CREATE FUNCTION dbo.ImpliedPermissions(@class nvarchar(64), @permission_name nvarchar(64))
RETURNS TABLE
AS RETURN
SELECT a.*, b.height, b.[rank]
FROM sys.fn_builtin_permissions('') AS a
CROSS APPLY dbo.ImplyingPermissions(a.class_desc, a.permission_name) AS b
WHERE b.class = @class AND b.permname = @permission_name;

Note that this is not the same as creating a LOGIN or USER, assigning it permissions, then impersonating the principal and using sys.fn_my_permissions. The “My permissions” function (which must have been invented when Windows was calling everything “My…”) requires a *specific* securable and class, not a class and permission. So it can tell you which permissions (multiple) you’ve been granted on schemaX.tableX, and one level and type of inheritance between table and column (SELECT on a table gives you SELECT on all the columns), but not functionality like “SELECT” in the “OBJECT” class. And the Implying/ImpliedPermissions functions themselves don’t deal with individuals, just classes.
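For contrast, here’s the shape of a sys.fn_my_permissions call; schemaX.tableX is just a placeholder name here, and the point is that the function wants a concrete securable, not a permission:

```sql
-- fn_my_permissions requires a specific securable and its class;
-- schemaX.tableX is a placeholder, substitute a real table
SELECT * FROM sys.fn_my_permissions(N'schemaX.tableX', N'OBJECT');

-- NULL means "the current database" (or "the server") for those classes
SELECT * FROM sys.fn_my_permissions(NULL, N'DATABASE');
```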

So now there is a corollary function too. I did notice that it was a bit slow, especially when used with semi-joins and anti-semi-joins. So, if you’re using this function a lot, you may want to materialize the permission hierarchy table (builtin_permissions CROSS APPLY ImplyingPermissions) somewhere. And now I can tell you, “if you give someone class X/permission Y, what else are you giving them?” Well, at least to the extent that sys.fn_builtin_permissions() accurately reflects the permission hierarchy/covering (I think I’ve found some edge cases). Enjoy…
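A one-time materialization might look like this; the table name and the index are my choices, and it assumes the Books Online dbo.ImplyingPermissions function already exists:

```sql
-- Materialize builtin_permissions CROSS APPLY ImplyingPermissions once
SELECT a.class_desc, a.permission_name,
       b.class, b.permname, b.height, b.[rank]
INTO dbo.PermissionHierarchy
FROM sys.fn_builtin_permissions('') AS a
CROSS APPLY dbo.ImplyingPermissions(a.class_desc, a.permission_name) AS b;

-- Index to speed up the semi-join/anti-semi-join lookups
CREATE CLUSTERED INDEX cix_PermissionHierarchy
ON dbo.PermissionHierarchy(class, permname);
```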


Batch Mode improvement for columnstores in SQL Server 2014 CTP1

This post is a continuation of my previous posts on columnstore indexes in SQL Server 2014 CTP1 and query plans. The test bed is the same as in previous posts, based on an enlarged version of a fact table in AdventureWorks2012DW.

One of the things that improves query speed when using columnstore indexes is the use of a new batch mode, as opposed to “traditional” row mode. Batch mode refers to the fact that individual iterators execute in batches rather than row-by-row. You can observe this in query plan iterators; each one contains the properties “Actual Execution Mode” and “Estimated Execution Mode”.
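If you’d rather not click through each iterator, a sketch like the following pulls the mode for every iterator out of the cached plans; EstimatedExecutionMode is the showplan XML attribute behind the “Estimated Execution Mode” property:

```sql
-- List each iterator's estimated execution mode for cached plans
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT n.value('@PhysicalOp', 'nvarchar(60)')             AS iterator,
       n.value('@EstimatedExecutionMode', 'nvarchar(10)') AS execution_mode
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp') AS t(n);
```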

In SQL Server 2014, usage of batch mode will be more mainstream. We’ve actually already seen evidence of improvements from SQL Server 2012 to SQL Server 2014 in the first post. In SQL Server 2012, the hash join implementation consisted of two iterators: a build iterator and the actual join iterator. In SQL Server 2014, there is no separate build iterator; everything is pushed into the join iterator. There were also other limitations on batch mode in SQL Server 2012. For example, batch mode was only supported for inner joins, and the hash table had to fit entirely in memory. In addition, query patterns such as scalar aggregation, multiple distinct aggregations, and IN clauses could not use batch mode. A series of query rewrites to enable common patterns to use batch mode was posted on the TechNet Wiki. SQL Server 2014 removes some, if not most, of these limitations.

I’ll look at one of these queries today to see how things have improved: the outer join. We’ll use the following query as a test:

select ProductAlternateKey, SUM(SalesAmount) as SumSales
from DimProduct
left outer join FactInternetSalesNewCCI fact
on DimProduct.ProductKey = fact.ProductKey
group by ProductAlternateKey
order by SumSales desc;

In SQL Server 2012, because of the outer join, none of the iterators (including the nonclustered columnstore scan) in the query use Batch Mode. In SQL Server 2014, the left outer join iterator, as well as the following HashMatch partial aggregate and the columnstore scan use Batch Mode. People who invested in the nonclustered columnstore index in SQL Server 2012 will be glad to note that these improvements work with nonclustered columnstore as well as clustered columnstore.

In addition, I tried this query in SQL Server 2014 using both the old and new cardinality estimation. The new estimator was right on the money, with a final estimate of 504 estimated rows v. 504 actual, compared with the old estimator’s 376 estimated v. 504 actual. Of course, the higher, more accurate estimate results in a plan with a higher estimated cost and memory grant, but we know that we’re using the “right” plan because of the more accurate estimates. In this case, the different estimate did not change the shape of the plan.

So, to summarize:
1. The cases in which Batch Mode will be used with Columnstore iterators in plans have been expanded in SQL Server 2014, leading to even faster queries.
2. The Batch Mode improvements work with the nonclustered as well as the clustered columnstore indexes.
3. The new cardinality estimator almost always results in more accurate cardinality estimates in SQL Server 2014.

Cheers, Bob

As always, query plans…

Columnstore indexes and cardinality estimates in SQL Server 2014 CTP1

This post is a continuation of my previous post about SQL Server 2014 Columnstore and Query Plans. I’ll take a look at how the new cardinality estimator affects columnstore index query plans. I’ve written a bit about the new cardinality estimator in SQL Server 2014, as well as looked into how to tell what the estimator is doing by using XEvent trace. Today, I’ll apply that to columnstore as well as a data warehouse query against a traditional rowstore.

First off, since I used the AdventureWorks2012DW database and attached it in SQL Server 2014 rather than building it “by hand, from scratch”, I’m using the “old” cardinality estimator by default. This applies even to new tables in databases restored/attached from pre-2014 versions, as we’re doing here. I can get queries to use the “new” estimator on a query-by-query basis by using the query hint “OPTION(QUERYTRACEON 2312)”. You can also set the trace flag at other levels of granularity (session- and instance-level) of course, but it’s easier to compare the two using query hints. You can confirm that this works by checking whether the “main” (e.g. SELECT, in this case) iterator’s CardinalityEstimationModel110 property is OFF (old estimator) or ON (new estimator).
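A side-by-side comparison therefore looks something like this, using the fact table from the earlier posts:

```sql
-- Default in a database attached from a pre-2014 version: old estimator
SELECT ProductAlternateKey, SUM(SalesAmount) AS SumSales
FROM FactInternetSalesNewCCI AS fact
JOIN DimProduct ON DimProduct.ProductKey = fact.ProductKey
GROUP BY ProductAlternateKey;

-- Same query, new estimator, scoped to this query only
SELECT ProductAlternateKey, SUM(SalesAmount) AS SumSales
FROM FactInternetSalesNewCCI AS fact
JOIN DimProduct ON DimProduct.ProductKey = fact.ProductKey
GROUP BY ProductAlternateKey
OPTION (QUERYTRACEON 2312);
```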

With the sample query from the previous post, all queries (traditional bitmap, nonclustered columnstore, and clustered columnstore) got the same estimate of 100.8 estimated rows v. 119 actual rows. Although the query shapes were different between “traditional” bitmap plans and columnstore plans, the estimates were consistent at the end. Because of the different plan shapes, it’s not straightforward to determine where the estimates started to diverge. SQL Server 2012 SP1 plans for traditional bitmap and nonclustered columnstore got the same eventual estimate as the analogous queries in SQL Server 2014 CTP1. Trace flag 2312 has no effect on cardinality in SQL Server 2012, as you would expect (although yes, I did try it).

When the new cardinality estimates are turned on, the eventual estimate for all variations of the table is slightly worse at the end, 189.842 estimated rows v. 119 actual rows. With the “traditional” bitmap plan this doesn’t change the plan but does result in the estimated plan having a slightly higher cost for the new cardinality estimator (54%-46% of the batch). The memory grant is slightly higher as well, as you would expect. With the clustered and nonclustered columnstore the higher estimated rows results in a different plan choice (HashMatch (aggregate) – old v. Sort and StreamAggregate – new). This different plan choice results in a plan with much higher cost (71%-29% of the batch), although the choice of StreamAggregate results in a slightly lower memory grant for the new plan.

To summarize:
1. The new cardinality estimator in SQL Server 2014 works with queries against columnstores as well as rowstores.
2. The cardinality estimates are consistent between columnstores and rowstores, even though the plan shapes might be different.
3. The new cardinality estimator can change a plan. As with any query optimizer change, there can be regressions, but I’ve mostly seen improvements with the new estimator, including an improvement from a different columnstore plan, coming up.

Cheers, Bob

Query plans are included.

Columnstore Indexes and Query Plans in SQL Server 2014 CTP1

Today I thought I’d try out some of the new, highly-touted columnstore improvements in SQL Server 2014. First off, to figure out what’s new I used three main sources: one is the “canon”, the whitepaper “Enhancements to SQL Server Column Stores”, presented at SIGMOD 2013. The second source is the SQL Server 2014 CTP1 Books Online page that documents and illustrates the DDL to define and work with clustered (and nonclustered) columnstore indexes. The third source is Niko Neugebauer’s excellent blog post series and experiments on clustered columnstore indexes, which starts here.

Rather than repeat what those folks have already implemented, documented, and experimented with, respectively, I went off on a different tangent. I’d like to find out how the new clustered columnstore index affects query plans, and look for changes in queries between SQL Server 2012 and 2014 CTP1. I started by restoring AdventureWorks2012DW to a SQL Server 2014 instance and a SQL Server 2012 SP1 instance, then took the FactInternetSales table up to over a million (about 1.15 million) rows using a combination of SELECT…INTO and INSERT…SELECT.

On the SQL Server 2012 instance I made one copy of the table without a nonclustered columnstore index and one with a nonclustered columnstore index. The nonclustered columnstore index contained all of the columns in the table. SQL Server 2014 had a third copy of this table, one that used a clustered columnstore index.

With each table (named FactInternetSalesNew, FactInternetSalesNewNCI, and FactInternetSalesNewCCI so I could keep track of them), I ran the relevant variation of the following query:

select ProductAlternateKey, CalendarYear, SUM(SalesAmount) as SumSales
from FactInternetSalesNew fact
join DimDate on fact.OrderDateKey = DateKey
join DimProduct on DimProduct.ProductKey = fact.ProductKey
where CalendarYear BETWEEN 2006 AND 2007
and ProductAlternateKey LIKE 'BK%'
group by ProductAlternateKey, CalendarYear;

The query plans were interesting in their differences and similarities.

1. The “traditional” row-based storage used a HashJoin with a Bitmap for one dimension (DimDate) and a MergeJoin to join in the Product dimension. This was the same in SQL Server 2012 SP1 and 2014. This is a different plan shape than any of the columnstore plans, with the dimensions being joined in a different order and the aggregation happening at a different point in the plan.
2. The SQL Server 2012 nonclustered columnstore index used BatchHashTableBuild iterators to build hash tables for each dimension, then HashJoins for the join with each dimension. The scan of the fact table, the BatchHashTableBuilds and HashJoins, and a HashMatch (partial aggregation) iterator after the joins used batch mode; the rest of the iterators used row mode.
3. The SQL Server 2014 nonclustered columnstore index did NOT use BatchHashTableBuild iterators; instead, the join between the fact table and DimDate used a HashMatch. That HashMatch iterator did contain the new (for 2014) property “BitmapCreator=True”. The HashMatch that joins in the Product dimension did not have the BitmapCreator property. All iterators EXCEPT the Exchange, the dimension table scans, and the last HashMatch (aggregate) used batch mode.
4. The SQL Server 2014 clustered columnstore index plan was essentially the same as the nonclustered columnstore index plan.

For clarity’s sake, I’ll continue the series and see how the cardinality estimation works with columnstore indexes in the next post.
For expediency’s sake, the plans are posted as an attachment. I’ll hopefully get back to replace them with pictures soon.

So, to summarize:
1. The query plan for nonclustered columnstore index changes for the example query between SQL Server 2012 SP1 and 2014 CTP1. All experiments are performed on my 2-proc VM.
2. In SQL Server 2014 CTP1, the plan for nonclustered and clustered columnstore indexes are the same. The savings of the clustered columnstore index is that you don’t have to store your actual data twice.

Cheers, Bob

Tracing The New Cardinality Estimation in SQL Server 2014

According to the whitepaper “Mission-Critical Performance and Scale with SQL Server and Windows Server” there’s a new cardinality estimator in SQL Server 2014. The whitepaper doesn’t say a lot about how it works, but does mention “trace information will be provided”.

That’s XEvent trace, of course, and the trace event is “query_optimizer_estimate_cardinality”. I decided to see what kind of information there is, so I created an event session with that single event, added an action of sql_text, and named it, unsurprisingly, “Cardinality”.
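The session DDL is short; something like this (the file target name and location are my choices):

```sql
CREATE EVENT SESSION Cardinality ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'Cardinality.xel');
GO
ALTER EVENT SESSION Cardinality ON SERVER STATE = START;
```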

Before starting it up, you need to make sure you’re using the new cardinality estimator on your queries; you can check this by looking in the query plan at the value of the property “CardinalityEstimationModel110”. You should confirm that the value is “ON”. (Note: this seems strange to me because, if the new estimator was introduced in SQL Server 2014, shouldn’t it be CardinalityEstimationModel120?) Databases that are restored from earlier versions’ backups don’t seem to have that property on; databases created and populated “from scratch” on SQL Server 2014 will. There’s also a trace flag, 2312, to enable the new estimation model. You also need to be creating a plan, so make sure you don’t already have the plan in cache. Not getting the new model with databases restored from previous versions lets you compare the cardinality estimates and query plans between 2014 CTP1 and previous SQL Server versions.

Note: This post is already way too long to answer the question “What does a cardinality estimator in a database do?” To get a feel for it type “Cardinality Estimation SQL Server” into your favorite search engine and I’ll meet you back here in a while…(hint…it estimates cardinality).

This event fires, naturally, multiple times during the creation of a query plan. I started off with a “simple” 5-table join with a GROUP BY clause and received 54 of these events. Each event contains three major pieces of diagnostic information:
–InputRelation – Logical operator and part of a parse tree. Some ops contain cardinality numbers.
–Calculator – Information about the cardinality calculators used.
–StatsCollection – Stats read for this estimation step and estimates.

After switching back and forth between the chunks of XML using SSMS, I decided to factor the most useful pieces of information into columns for easier reading. However, the XML can be arbitrarily complex, so I left the original XML columns intact as well. Some calculators provide different estimate information, so these should be factored out into case-based columns. After reading these sets of rows for a while, things become as intuitive as query plans were when you first started reading and understanding them.

For example, my 5-table join used the Northwind view “Order Subtotals”. This view looks like:

SELECT "Order Details".OrderID, Sum(CONVERT(money,("Order Details".UnitPrice*Quantity*(1-Discount)/100))*100) AS Subtotal
FROM "Order Details"
GROUP BY "Order Details".OrderID

My first row for this query was for the global aggregate logical operator (LogOp_GbAgg). I sifted out the following pieces of information:
From InputRelation:
–Operator: LogOp_GbAgg
–Table1: Order Details
–Cardinality1: 2155
From Calculator:
–CalculatorType: DistinctCountCalculator
–CalculatorName: CDVCPlanLeaf
–SingleColumnStat: OrderID
–CoveringStatDensity: 2155
From StatsCollection:
–StatsCollectionName: CStCollGroupBy
–StatsCollectionCardinality: 830
–StatsInfoObject1: Order Details
–StatsId: 1

Which means that we expect to read 2155 rows from “Order Details” and group them by OrderID into 830 groups. This is what’s estimated for the query plan and works out to be exactly the same as the actual number of rows. Comparing it to the “old” (pre-2014) estimate, it’s exactly the same, for this one. Other estimates that I’ve looked at are usually closer in the “new” estimate than the “old” estimate and, in many cases, result in different query plans. As with every query optimizer change, there are also some regressions. I’ve only looked at fewer than 20 queries so far.

I’m providing the query I used to take apart the XEvent trace along with this article. You’re welcome to add more columns, and if you come up with more interesting ways to present the information, please let me know. BTW, the SQL query heavily uses SQL Server XQuery, so it’s going to be slow for large trace files unless you materialize intermediate results as tables and add XML indexes.
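To give you the flavor of it, here is a simplified sketch of the shredding query. The event field names (input_relation, calculator, stats_collection) and the file target are assumptions; check them against your own trace:

```sql
-- Shred the trace into the three major XML chunks plus the statement text;
-- the data element names here are assumptions, verify against your trace
SELECT
    ed.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text,
    ed.query('event/data[@name="input_relation"]/value/*')   AS InputRelation,
    ed.query('event/data[@name="calculator"]/value/*')       AS Calculator,
    ed.query('event/data[@name="stats_collection"]/value/*') AS StatsCollection
FROM (
    SELECT CONVERT(xml, event_data) AS ed
    FROM sys.fn_xe_file_target_read_file(N'Cardinality*.xel', NULL, NULL, NULL)
) AS f;
```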

I also “consulted the binary” for lists of enumerated values of Operator, CalculatorType, CalculatorName, and StatsCollectionName, but they are too many to list here; they’re included in the zip file. It’s interesting to see what kind of query clauses result in a special calculator (e.g. CSelCalcTrieBased or CSelCalcExpressionComparedToExpression) or special statistics collection names (my favorite amusing ones are CStCollFudge and CStCollBlackBox).

Hope this adds to your estimation investigation enjoyment.
Cheers, Bob


New query plan items in SQL Server 2014 CTP1

Every new version of SQL Server includes new or expanded features that affect query plans. Most (though sometimes not all) of the query plan changes (and most of the functionality changes) can be discovered by “diff-ing” the XML schema for query plans between the current/new version and the previous version. I’ve done this before to enumerate the plan changes from SQL Server 2008 R2 to SQL Server 2012.

Here are the changes discovered from diff-ing the plan schemas from SQL Server 2012 SP1 to SQL Server 2014 CTP1.

First, the BaseStatementInfoType (these show up as properties on the far-left iterator, e.g. the SELECT iterator) has some new attributes. Note that not all of them show up in the CTP1 plans I’ve looked at; the ones I’ve seen in real plans are marked with an asterisk.
CardinalityEstimationModel110* – does this use the new SQL Server 2014 cardinality estimate model or the “legacy” model?

There’s also a new reason for a non-parallelized plan: NoParallelForMemoryOptimizedTables. I didn’t find this in the diff, but noticed it in real CTP1 plans while looking for the other attributes.

In the QueryPlanType structure, there’s an additional attribute, EffectiveDegreeOfParallelism. This is the maximum parallelism used by a columnstore index build. Parallel columnstore index builds are a new feature.

There are a few for the Hekaton feature:
– ObjectType/StorageType – new enumeration member: MemoryOptimized*
– TableScanType or RowsetType: New Attribute on (Predicate/PartitionId/IndexedViewInfo) – IsMemoryOptimized*
– IndexKindType enumeration now includes:
NonClusteredHash – Hekaton index*
SelectiveXML and SecondarySelectiveXML – for the Selective XML Index feature introduced in SQL Server 2012 SP1. I don’t know why these weren’t in the 2012 SP1 schema.

New Attribute on HashType: BitmapCreator
New Attribute on RelOpType: StatsCollectionId

Note that, as I mentioned before, I haven’t actually seen all these new attributes yet and the XML Schema doesn’t always make discovering where I should look for them easy. But I now know approximately what to look for and looking at query plans in XML format (rather than “picture” format) will help. As time goes on, we’ll find out what causes these attributes to surface. For example, the CardinalityEstimationModel110 attribute appears in most/every query plan, but the EffectiveDegreeOfParallelism requires collecting a query plan for a columnstore index build.

Feel free to let me know if you find any more items. Happy splunking…

Hekaton, read committed isolation, and statement-level inconsistency

It’s stated in SQL Server 2014 Books Online that “The following isolation levels are supported for transactions accessing memory-optimized tables: SNAPSHOT, REPEATABLE READ, and SERIALIZABLE”. But later on, on the same page, it’s stated that “The isolation level READ COMMITTED is supported for memory optimized tables with autocommit transactions”. “Autocommit transaction” is just another way to say that individual SQL statements are always atomic, and so an “autocommit transaction” is a single SQL statement that either completely commits or completely rolls back. There’s no way, for example, for a single SQL UPDATE statement to update 10 rows successfully but fail to update 1 row. Either all 11 rows are updated or none of them are.

I had trouble reconciling these two seemingly conflicting statements about Hekaton tables’ isolation level and set out to find out if Hekaton tables can implement “traditional” SQL Server read committed semantics, or if they *support* read committed, but implement it with snapshot semantics. Read committed means that “the latest committed version of a row is read”.

A well-known behavior of SQL Server tables under read committed is known as “statement-level inconsistency”. You can see this behavior by executing a single SELECT statement in a client program running under a debugger, reading a few rows and then stopping the program, and, in a different session, updating, deleting, or inserting rows that have not yet been read (but remember that SQL Server uses 8000 byte buffers for reading and sending rows to the client side). The final resultset usually will reflect the changes that happen *during* the execution of the statement, which includes the reading of the rows from table to buffer to client. If you haven’t seen this behavior demonstrated, come to a class or place where I’m speaking and ask.

Back to Hekaton tables. There are four use cases, and let’s assume the session’s isolation level is read committed, the default.
1. Running in a compiled stored procedure – the isolation level must be specified in the stored procedure definition. Read committed is not supported.
2. Explicit user transactions (i.e. BEGIN TRAN…etc) – attempting to perform a SELECT statement on a Hekaton table without an isolation level hint or with a query hint of readcommitted produces an error.
3. Implicit user transactions (i.e. SET IMPLICIT_TRANSACTIONS ON) – work the same as explicit user transactions with respect to Hekaton tables.
4. Autocommit transactions – attempting to perform a SELECT on a Hekaton table without an isolation level hint succeeds. You aren’t allowed to use the readcommitted query hint even in an autocommit transaction.
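A quick sketch of use cases 2 and 4, assuming a memory-optimized table named dbo.hk (a hypothetical name) and the default read committed session level:

```sql
-- Use case 2: explicit user transaction
BEGIN TRAN;
SELECT * FROM dbo.hk;                        -- error: isolation hint required
SELECT * FROM dbo.hk WITH (SNAPSHOT);        -- succeeds
COMMIT;

-- Use case 4: autocommit transaction
SELECT * FROM dbo.hk;                        -- succeeds
SELECT * FROM dbo.hk WITH (READCOMMITTED);   -- error, even in autocommit
```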

We’re interested in use case 4. And, with that use case, in CTP1, using a SELECT statement without a hint produces SNAPSHOT semantics, rather than READ COMMITTED semantics. There is no statement-level inconsistency for Hekaton tables. Also, with READ COMMITTED SNAPSHOT isolation (RCSI), the behavior of the READ COMMITTED isolation level is indistinguishable from SNAPSHOT.

Given those experimental results (and perhaps I need to do more experimentation with queries with more complex plans), I’m prepared to conclude until convinced otherwise that Hekaton tables support READ COMMITTED (and RCSI) with autocommit transactions, but *implement* that level with SNAPSHOT semantics. What’s the difference?

With a multi-version storage engine, each version is read as of a well-defined logical read time. With SNAPSHOT isolation, the logical read time is the beginning of the transaction. READ COMMITTED isolation would read the appropriate version with a logical read time equal to the current time, according to the whitepaper “High-Performance Concurrency Control Mechanisms for Main-Memory Databases”. In autocommit transactions, the logical read time always appears to be the beginning of the (one and only) SQL statement, so SNAPSHOT, READ COMMITTED, and RCSI are indistinguishable.

To wrap this up, most folks would consider the Hekaton behavior to be an improvement, as READ COMMITTED statement-level inconsistency is not usually considered to be a desirable behavior. But it was worth asking the question… So do Hekaton tables really implement only “SNAPSHOT, REPEATABLE READ, and SERIALIZABLE” isolation?

Cheers, Bob

Speaking on Hekaton at Portland Oregon SQL Server User Group

As you can probably tell by the last couple of blog postings, I’ve been doing a bit of experimenting with the SQL Server 2014 Hekaton feature in the past few weeks. I’ll be doing a hopefully demo-heavy (but I do have a few slides) talk next Thursday, July 25, at 6:30 at the Portland SQL Server User Group to share some observations. If you’re at all interested (and you’re nearby; this is Portland, Oregon, after all) show up with your questions and your intuition and we’ll all share, maybe try out some things I haven’t thought of yet. Hope to see you there.

Cheers, Bob

Transaction dependencies and speculative reads with memory-optimized tables

In reading the whitepaper “High-Performance Concurrency Control Mechanisms for Main-Memory Databases”, I was intrigued by the discussion of speculative reads and transaction dependencies. It’s not always good to use information from an academic whitepaper as details of an implementation, because the real implementation might differ slightly from the whitepaper’s description. In addition, bear in mind that the observations here are based on the CTP1 version of SQL Server 2014, and the details may change by RTM.

The whitepaper describes two implementations of the multiversion storage engine, one using optimistic concurrency with no locking and one using locking. Because the description of the Hekaton feature mentioned “lock-free structures” as one of the pillars, I looked at the optimistic implementation as possibly close to the CTP1 implementation.

To paraphrase the whitepaper, there are at least two ways the speculative reads can take place.
1. Transaction A determines if it can read Version V. Version V’s begin timestamp is another transaction’s (TB’s) ID. TB is in the preparing state. Use transaction B’s end timestamp (obtained right before TB-prepare) as V’s begin time. Speculatively read V. Transaction A has a commit dependency on Transaction B.
2. Transaction A determines if it can read Version V. Version V’s end timestamp is another transaction’s (TE’s) ID. TE is in the preparing state. Use transaction E’s end timestamp (obtained right before TE-prepare) as V’s end time. Speculatively ignore V. Transaction A has a commit dependency on Transaction E.

These descriptions mention dependencies happening in transaction A’s active phase, but in addition, there is mention that “Transaction A may acquire additional commit dependencies during validation but only if it speculatively ignores a version.” For information about transaction phases, see the previous blog entry or the whitepaper.

After noticing the extended events dependency_acquiredtx_event and waiting_for_dependenciestx_event, I set out to look for those dependencies. Because tx A can only acquire a dependency on tx B when B is in the preparing phase (state) and, in most cases, the preparing state is usually pretty short, the dependency sounded almost like a race condition. Making the preparing state as long as possible would give me a better chance.

In the implementation of optimistic concurrency, the whitepaper mentions two “read sets” that are checked during the prepare phase. The ReadSet contains pointers to every version read, and the ScanSet stores information needed to repeat every scan. The whitepaper also describes a WriteSet, but that’s outside my scope. During the prepare phase, ReadSets are checked to ensure consistency in isolation levels repeatable read and serializable. In addition, ScanSets are checked to guard against phantoms in isolation level serializable. Serializable isolation with a large ScanSet seemed to be the best choice to lengthen the prepare phase.

I declared a two-column memory-optimized table with a primary key/hash index on one column (I called it id) and no index at all on the other column (c1), and added 250,000 rows. I figured that running the following batch inside a “traditional transaction” (to slow things down even more compared to a compiled stored procedure) from multiple clients should produce the behavior.

begin tran
select top(150000) * from txtest with (serializable);
update txtest with (serializable) set c1 = 1
where c1 in (select top(10) c1 from txtest with (serializable) order by newid()); -- update 10 random rows
commit tran;
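For reference, the table behind this batch can be sketched like so; the bucket count and durability setting are my assumptions:

```sql
CREATE TABLE dbo.txtest (
    id int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 524288),
    c1 int NOT NULL            -- deliberately not indexed, to force scans
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```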

Running between 5 and 10 of these clients simultaneously for 20 iterations each produced the “transaction dependency” behavior consistently. I traced this with an extended events session that included the dependency_acquiredtx_event and waiting_for_dependenciestx_event events.

Besides observing the behavior, I was able to make some interesting observations from the event session.
1. You can have “chains” of dependent transactions, e.g. tx A depends on tx B, tx B depends on tx C, etc.
2. You can have multiple dependent transactions on the same transaction, e.g. txs A, B, and C all depend on tx D.
3. You can have multiple dependency_acquiredtx_event for the same two transactions, e.g. two different occurrences of the event for the dependency tx A depends on tx B.

I also noticed one other interesting behavior. In my tests with 10 clients x 20 iterations (200 transactions total) between 60% and 80% of the transactions routinely abort. That’s not surprising, the test was set up to produce read conflicts. What was surprising is that, although transaction commit wasn’t the norm, every transaction that had dependency(ies) on it ended up committing. 22 of 200 transactions in one test. And the dependent ones also committed. That’s surprising, but perhaps the fact that transactions that have dependencies and those that they depend on all eventually commit is just an artifact of the test batch I chose.

So, from this test you can deduce that:
1. Transaction dependencies and speculative reads are real (can be observed) in the current implementation.
2. Large scans and transaction isolation levels higher than snapshot should only be used with memory-optimized tables when *absolutely* necessary.
3. That’s especially true for transaction isolation level serializable.

Cheers, Bob

A more complex discussion of user transactions and memory-optimized tables

In the last blog post, I tried out a simple example to show the difference between the three transaction isolation levels supported by memory-optimized tables. I only used user-transactions in T-SQL, and only showed the behavior of transactions consisting of SELECT statements, when INSERT/UPDATE/DELETE activity was taking place in other sessions running concurrently. But of course, things are not always *that* simple, and there are differences regarding:
1. Transactions that do concurrent INSERT/UPDATE/DELETE in multiple sessions
2. Transactions running in or using compiled stored procedures (which must be defined as atomic and declare their isolation level)
3. Autocommit transactions (i.e. single atomic SQL statements)

Today, I’ll expand the discussion to cover some additional cases.

One thing I did mention in the previous post is WHEN a user transaction fails. The previous post originally mentioned (as a simplification) that transactions with memory-optimized tables “…fail the user transaction at commit time, if there’s a conflict detected. SQL Server’s SNAPSHOT against “traditional” tables fail during user transactions at the statement where the conflict is detected.”

That’s not always true with concurrent *actions* in a user transaction. User transactions can fail at different points in a transaction’s lifetime, that is:

1. Normal Processing phase – during the lifetime of a transaction
2. Preparation phase – transaction decides whether it can commit or must abort

According to the definition in the whitepaper “High-Performance Concurrency Control Mechanisms for Main-Memory Databases” (sections 2 and 3), there are a total of 3 transaction processing phases:

Tx created
1. Normal Processing
2. Preparation
3. Postprocessing
Tx terminated

The preparation phase concludes with writes to the transaction log, if the transaction commits. The postprocessing phase consists of timestamp fix up, whether the transaction commits or aborts. It’s also good to remember that, in a T-SQL user transaction, the transaction begins with the first statement after “BEGIN TRANSACTION” that touches data, rather than during the T-SQL “BEGIN TRANSACTION” statement.

All of the conditions I looked at last week failed in the preparation phase.

It’s obvious that, if two concurrent INSERTs with the same primary key happen, one of them must fail. If the later insert happens during a transaction, that transaction fails at commit time (in the preparation phase), rather than during normal processing.

begin tran
insert dbo.t1 with (snapshot) values(5, 'Fred')
-- insert row with same primary key in another session and commit other session (insert, commit in other session works)
commit -- Msg 41325, Level 16, State 0, Line 66
-- The current transaction failed to commit due to a serializable validation failure.

However, WRITE conflicts (the transaction updates or deletes a row already updated/deleted by another session AFTER the transaction starts) fail at normal processing time. This is what we’re used to with conflict detection in SNAPSHOT isolation level with “ordinary” SQL Server tables. I’ve rephrased the previous post from the simplified original “…fail at commit time” to “…CAN fail at commit time” to indicate that write conflicts fail the way that we’re “used to”.

begin tran
select * from dbo.t2 with (snapshot) -- starts the transaction
-- update row 2 in another transaction and commit other session
update dbo.t1 with (snapshot) set c2 = 'Sam' where c1 = 2
-- Msg 41302, Level 16, State 110, Line 120
-- The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.
-- Msg 3998, Level 16, State 1, Line 120
-- Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
commit -- transaction already rolled back (above)

begin tran
select * from dbo.t2 with (snapshot) -- starts the transaction
-- update row 2 in another transaction and commit
delete dbo.t1 with (snapshot) where c1 = 2
-- Msg 41302, Level 16, State 111, Line 130
-- The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.
-- Msg 3998, Level 16, State 1, Line 130
-- Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
-- The statement has been terminated.
commit -- transaction already rolled back (above)

Just to confirm what I mentioned before, the following shows that the transaction starts when the first data is touched, rather than at BEGIN TRANSACTION. If I remove the SELECT from the (unrelated) table dbo.t2, and update and commit row 2 in another session before this transaction accesses any data at all, the transaction is consistent (I’m just updating data that was already updated and committed before the transaction started) and the UPDATE succeeds.

begin tran
-- update row 2 in another transaction and commit
update dbo.t1 with (snapshot) set c2 = 'Sam' where c1 = 2 -- tx starts here, row 2 is already updated/committed to its new value
commit -- OK

In traditional SNAPSHOT isolation, two INSERTers with the same primary key, or two UPDATE/DELETEers, when both sessions are in transactions, cause the transaction with the later action to wait. With multi-version memory-optimized tables, we don’t want to wait. Rather than waiting, transactions can take dependencies on other transactions, continuing under the assumption that the transactions “before” them will commit. These dependencies are resolved by waiting, if needed, during the preparation phase rather than during normal processing. A best practice with memory-optimized tables is to keep transactions short, to minimize waits on transaction dependencies. Refer to the whitepaper for a description of the conditions that can cause transaction dependencies.
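While a test workload like this is running, you can get a rough view of the in-flight memory-optimized transactions from a DMV. This is a sketch; the column set of sys.dm_db_xtp_transactions varies by build, so check the documentation for your version.

```sql
-- Snapshot of in-flight XTP (memory-optimized) transactions.
-- state_desc/result_desc show, e.g., whether a transaction is still in
-- normal processing or validating, and how it ended up.
SELECT xtp_transaction_id, session_id, state_desc, result_desc
FROM sys.dm_db_xtp_transactions;
```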

There are places where reading or writing inside a transaction can cause other sessions to wait, however. In repeatable read isolation level and above, but not in snapshot isolation level, a SELECT statement on a memory-optimized table, issued AS A T-SQL STATEMENT within a user transaction, will cause another session attempting to DROP the table to wait.

begin tran
select * from dbo.t1 with (snapshot)
– drop table t1 in the other session, it works
select * from dbo.t1 with (snapshot) — invalid object name dbo.t1
commit — OK

begin tran
select * from dbo.t1 with (repeatableread)
– drop table t1 in the other session, this waits
select * from dbo.t1 with (repeatableread) — OK
commit — OK, (and drop works after this tx committed)

However, reads from within a repeatable read isolation level COMPILED PROCEDURE, and writes (INSERT/UPDATE/DELETE) from within a COMPILED PROCEDURE at any isolation level, allow a DROP TABLE statement from another session to proceed, and cause a validation error at preparation time. Writes in a T-SQL statement at any isolation level, by contrast, cause the DROP TABLE to wait, as the next example shows.

begin tran
insert into dbo.t1 with (snapshot) values(5, 'Fred')
-- drop table t1 in the other session - the drop statement hangs
commit -- OK, (and drop works after this tx committed)

create procedure dbo.insert_t1(@c1 int, @c2 varchar(10))
with native_compilation, schemabinding, execute as owner
as begin atomic with (transaction isolation level=snapshot, language=N'us_english')
insert into dbo.t1 values(@c1, @c2)
end

begin tran
execute dbo.insert_t1 5, 'Fred'
-- drop procedure dbo.insert_t1 and table t1 in the other session (this works without hanging/waiting)

-- note that we must drop the procedure (and any other procedures that reference table t1)
-- first, because compiled procedures are schemabound.
-- attempting to drop the table without dropping the procedure would fail with a schemabinding error.
commit -- Msg 41305, Level 16, State 0, Line 211
-- The current transaction failed to commit due to a repeatable read validation failure.

So we’ve succeeded in muddying the waters a bit by illustrating that:
1. With memory-optimized tables, transactions can fail during normal processing as well as in the preparation phase
2. The same transactional activity can cause different behavior in other sessions depending on whether we use a compiled procedure or a T-SQL statement.
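Because failures can surface either mid-transaction or at commit, application code against memory-optimized tables should be prepared to retry. Here’s a minimal sketch of a retry loop against the dbo.t1 table used above; the error-number list (update conflict, repeatable read, serializable, and dependency validation failures) is my assumption of the common transient cases, so verify it for your build.

```sql
-- Hedged sketch: retry transient conflict errors on a memory-optimized table.
DECLARE @retry int = 5;
WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.t1 WITH (SNAPSHOT) SET c2 = 'Sam' WHERE c1 = 2;
        COMMIT TRANSACTION;
        SET @retry = 0;        -- success, leave the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        -- 41301: dependency failure, 41302: update conflict,
        -- 41305: repeatable read validation, 41325: serializable validation
        IF ERROR_NUMBER() IN (41301, 41302, 41305, 41325)
            SET @retry -= 1;   -- transient; try again
        ELSE
            THROW;             -- anything else is a real error
    END CATCH
END
```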

With memory-optimized tables, these are not your mother’s/father’s traditional transactions. AND…we haven’t covered all the differences and edge-cases yet, either.

Cheers, Bob