Getting close to SQLskills developer immersion event in February

I’m gearing up for my first SQLskills Immersion Event of the year, Immersion Event for Developers (IEDEV) in Tampa on Feb 17-21.

This one is getting kind of jam-packed with material because, in addition to coverage of T-SQL and its associated topics like query and procedural code performance and transactions and locking, this event will round out your toolbox. There’ll be coverage of specialized techniques such as coding with SQLCLR (.NET procedures, types, and aggregates), SQL and XML, Service Broker, Spatial, and Full-Text Search, as well as designing for tracking changes. As with T-SQL, when using these special techniques, the emphasis is always on performance and using the right tool for the job. We’ll see a lot of query plans and delve into details about specialized index usage. I’ve also included some information and demos on how SQL Server 2014 features fit into the big picture.

Registering soon will give you time to gear up too, for the sheer amount of information we’ll cover. And, as always, there’ll be some after-hours events and giveaways. Arrive well-rested… If you have specific questions about the coverage, feel free to send me mail beforehand. Hope to see you in sunny Tampa in February!

Cheers, Bob

Compiled T-SQL? Don’t throw away all your SQLCLR code just yet

The speed increase from memory-optimized tables in SQL Server 2014 is nice, but I thought that, to get the most bang for your buck, you really need to use compiled stored procedures in addition. After looking at some of the C code these produce, it almost looked like I was doing direct vector branching into the table’s C-based specific access routines. But, in the meantime, I’d worked with a friend of mine who was trying to convert some custom SQLCLR code to compiled T-SQL. He was using table variables in place of CLR objects and, although the code ran way faster than the non-compiled T-SQL code, it was still 90% slower than his SQLCLR equivalent.

In general, you’ll get the biggest speed increase from a compiled sproc if you do lots of logic in code, or use nested loops in plans that return lots of rows. I thought I had just the use case.

Once upon a time, someone had given me code that did a lot of calculations in SQLCLR and in T-SQL. The SQLCLR code uses 2- and 3-dimensional arrays as well as some data access. The T-SQL code was an unoptimized, line-by-line port of the SQLCLR code that used multiple temporary tables and a cursor as the outer iterator. You could remove the outer cursor, and still SQLCLR performed 100 iterations (100 products in this case) 7-8 times faster than T-SQL performed 1 iteration (1 product). In other words, SQLCLR was about 700-800X faster than T-SQL. Folks used to tell me this was “cheating” because of the large number of SQL queries the T-SQL code did (100s of queries, many using INSERT INTO a temp table). Nevertheless, it was worth a try.

Converting the code was pretty straightforward (a rough sketch of the resulting pattern follows the list).
Replace temp tables with strongly-typed in-memory table variables
Make sure each table variable has a key and index (they’re required in in-memory table variables)
Change a subquery into another select into table variable + main query (compiled sprocs don’t support subqueries yet)
Make sure everything follows the rules for schemabinding
Didn’t use the outer cursor, so the compiled T-SQL code either performed a single iteration or called the compiled proc from within non-compiled T-SQL cursor code
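
Here’s that sketch. The type, table, and column names (dbo.ProductSalesWork, dbo.FactSales_inmem, and so on) are made up for illustration, not from the code I actually ported, and you may need to adjust it for the V1 surface-area limits:

-- memory-optimized table type in place of a temp table; a key/index is required
CREATE TYPE dbo.ProductSalesWork AS TABLE
(
    SalesOrderId int           NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Amount       decimal(18,2) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON);
GO
CREATE PROCEDURE dbo.CalcOneProduct @ProductId int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    DECLARE @work dbo.ProductSalesWork;
    -- the subquery becomes its own INSERT...SELECT into the table variable...
    INSERT INTO @work (SalesOrderId, Amount)
    SELECT SalesOrderId, SalesAmount
    FROM dbo.FactSales_inmem
    WHERE ProductKey = @ProductId;
    -- ...followed by the main query
    SELECT SUM(Amount) AS TotalSales FROM @work;
END;
GO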

I also had to wait until CTP2, because the procedure used DATEADD. That wasn’t supported until CTP2 (good choice of what to support sooner, SQL team, thanks for adding date functions).

The results for switching to in-memory tables were encouraging:
SQLCLR – standard tables – 5 sec
TSQL – standard tables – 37 sec (x100 = 370 sec) – 740X slower
TSQL – in-memory tables – 10 sec (x100 = 100 sec) – 200X slower
SQLCLR doesn’t support in-memory tables through context connection. I didn’t try it with an external connection back into the same instance.

Switching to compiled sproc (and pre-compiling the sprocs to factor out the first-time compile overhead for each):
TSQL – in-memory tables – 10 sec (x100 = 100 sec)
Compiled TSQL (which only supports in-memory tables) – 10 sec (x100 = 100 sec)

So for this particular use case (just a straight port, didn’t try to optimize the code), the difference was minimal. BTW, I’m not pretending this is a benchmark or anywhere near, just a quick test and some round numbers for comparison.

Then I took a harder look at my code. The SQL code was doing a large *number* of SQL statements, each of which used and returned a small number of rows. The computations were done, not in program logic, but with iterations over SQL statements (not a good way to do custom computations in ANY case). Horrible use case, I guess. Lessons learned:
-In-memory tables are not multidimensional arrays or custom “objects” as far as speed is concerned.
-The SQL statement still has overhead over in-memory operations, even with simple SQL in a compiled sproc.
-Iterating with SQL statements doesn’t substitute for aggregations and computations done in a single SQL statement (I knew that one already).
-Remember where the sweet spot is for compiled stored procedures.

So folks, don’t throw away your custom SQLCLR sprocs that do custom computations just yet. But do give it a test. And push as much of the computation as possible into the SQL statement and optimize those SQL statements.

Cheers, Bob

Correlating SQL Server 2014 cardinality events and query plan

I’ve been doing some investigation of the new cardinality estimator in SQL Server 2014. I’ve always thought the best way to see how the estimation process works is through the query_optimizer_estimate_cardinality XEvent. I wrote about this event in an earlier CTP1 post. Joe Sack used this event to show how the new estimator uses techniques to improve the query plan; I’d encourage you to read his series of posts. If you don’t know what a cardinality estimator is or why you’d care, Ben Nevarez’s article is a nice intro. And, as Joe says and I agree, documentation on the XEvent is sparse.

Lately, I thought it would be nice to demonstrate how the cardinality information emitted by the XEvent shows up in the query plan, and I’ve been studying a number of plans v. event traces to find a pattern by brute force. But it was right in front of my face… While setting up the XEvent trace (again), I happened upon this description of one of the event’s fields.

stats_collection_id – “ID of the stats collection generated for the input. This same ID will also appear in ShowplanXML as an attribute of the RelOp element when the event is enabled.”

Ohhh. So, you need to enable the event (i.e. turn on the XEvent trace), then run the query with the query plan turned on. Turning on either an estimated plan or actual plan will work, but actual plan is the best because you can see the estimated v. actual rows affected, and see not only whether the estimate was good, but how the cardinality estimator arrived at the estimate.
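
For reference, here’s a minimal way to turn the event on. The session name and the ring_buffer target are my choices for illustration, not something from the original post:

CREATE EVENT SESSION CardinalityTrace ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION CardinalityTrace ON SERVER STATE = START;
-- ...now run the query with the actual execution plan turned on...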

And when you do this, each plan iterator contains a new property, StatsCollectionId, which corresponds to stats_collection_id in your XEvent trace. You need to show the Properties window (click on the iterator to select it, then hit F4) to see this property. Or look in the XML directly (and we know how much fun that is…). So, graphic showplan and F4 is your friend.

The first time I tried this, with a really simple query, I was really confused. The plan consisted of a single ClusteredIndexScan, and that iterator had a StatsCollectionId of 1. The XEvent trace had 5 events with stats_collection_id of 2,2,3,3 and 5. Hmmm…

Then I tried a more complex plan, figuring that I had more chances to see some correlation. More iterators to try. Sure enough. There are no cardinality estimate rows for a scan, because there’s no need to estimate cardinality for a scan; they already know how many rows there are in each table or index. So you won’t find correlation between XEvent and scan iterator, or at least I haven’t seen it so far. On every other iterator in the plan, there’s nice correlation. OK, now that we have correlation, let’s extract the information from the event.

There are three main fields, defined in the event description like this:
1. input_relation – “The input relation on which cardinality is being updated”. That is, which plan logical operator (e.g. LogOp_GbAgg) is being estimated and what statistics were used as input.
2. calculator – “The strategy used for the estimate”. Which calculator was used to calculate cardinality. BTW, it is perfectly valid for this field to be empty (an empty Calculator element); I guess that means they didn’t need a cardinality estimation strategy.
3. stats_collection – “The stats collection generated for the input”. What type of stats were used and which stats were loaded.

So each record in the trace represents Input -> Calculator -> Output …And you can read your way through a whole plan/events pair this way.
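
If you collected into a ring_buffer target like the sketch above, a query along these lines pulls stats_collection_id and the three fields out of the trace. It’s only a sketch: the XML paths assume the usual event/data/value layout for XEvent data, and the session name matches the one I made up earlier:

;WITH ring AS
(
    SELECT CAST(t.target_data AS xml) AS target_data
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
      ON s.address = t.event_session_address
    WHERE s.name = N'CardinalityTrace' AND t.target_name = N'ring_buffer'
)
SELECT
    e.value('(data[@name="stats_collection_id"]/value)[1]', 'bigint') AS stats_collection_id,
    e.query('data[@name="input_relation"]/value/*')   AS input_relation,
    e.query('data[@name="calculator"]/value/*')       AS calculator,
    e.query('data[@name="stats_collection"]/value/*') AS stats_collection
FROM ring AS r
CROSS APPLY r.target_data.nodes('RingBufferTarget/event') AS x(e);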

This post is getting kind of long (there will be more to come) and I’ll end with the observation that there are lots more XEvents in the trace than there are iterators in the plan. What are the extra ones for? At this point I’m thinking that, since this information is gathered as the plan is being produced, these events represent intermediate steps in the estimation process, and only one step is tied to the plan iterator. But that’s a guess for now.

For your enjoyment, I’ve included an example with a SQL query, query plan, XEvent trace, and SQL/XML query to take apart the XEvent trace and look at what I thought were relevant subelements and attributes of the event fields. There are also some samples and additional information in my previous post, and in Joe’s posts, to get you started.

Cheers, @bobbeauch

In-memory OLTP – read_set, write_set, and scan_set

I was looking through some in-memory OLTP DMVs in SQL Server 2014 and came across the READ_SET_LOOKASIDE, WRITE_SET_LOOKASIDE, and SCAN_SET_LOOKASIDE memory consumers in the DMV sys.dm_xtp_system_memory_consumers.

Thought it might be nice to see how these work, but first I’ll explain what they’re about. By the way, a lookaside is just a cache, sometimes of pointers to memory addresses. You’ve probably heard of the L1 and L2 caches built into CPUs.

These sets and their corresponding lookasides have to do with how transactions work against multi-version concurrency control (MVCC) tables (memory-optimized tables in SQL Server 2014 use MVCC). The three basic transaction isolation levels supported by MVCC tables (lowest to highest isolation) are snapshot, repeatable read, and serializable. In all of these isolation levels, reads are performed as of the beginning of the transaction. So any of these levels will read rows where the begin-time of the transaction is between the begin-timestamp and end-timestamp of the row.

Snapshot transactions require no additional processing; they simply read the correct versions. Repeatable read means that the versions you’d read at the end of the transaction must be the same versions you read at the beginning. Serializable means repeatable read and, in addition, if you’re doing a scan (e.g. select rows where id is between 10 and 20), no new rows in your range were added between the beginning and end of your transaction. READ_SET and SCAN_SET have to do with repeatable read and serializable transactions, respectively.

SQL Server MVCC accomplishes the additional checks by **re-reading the rows at commit time**. Sounds expensive. It needs to make this re-reading fast, so instead of doing the whole operation again, repeatable read transactions keep a pointer to each row they’ve read in a READ_SET and check the pointer again at commit time to see if they’re the same. You can see this behavior on a system with no other activity, in sys.dm_xtp_system_memory_consumers.

-- execute sys.dm_xtp_system_memory_consumers and check the READ_SET_LOOKASIDE amount
select * from sys.dm_xtp_system_memory_consumers;
-- now repeatable read transaction using AdventureWorks memory-optimized table sample
begin transaction
select * from Sales.SalesOrderHeader_inmem with (repeatableread);
-- note READ_SET_LOOKASIDE should be bigger
select * from sys.dm_xtp_system_memory_consumers;

For a serializable transaction you need to ensure read consistency with the READ_SET and also ensure, with the SCAN_SET, that scans will produce the same rows and no additional rows. Choose a different table if you want to see them both increase because, if there are no other changes in Sales.SalesOrderHeader_inmem (from the first experiment), the READ_SET_LOOKASIDE will retain the same rows/same size:

-- execute sys.dm_xtp_system_memory_consumers and check the READ_SET_LOOKASIDE and SCAN_SET_LOOKASIDE amount
select * from sys.dm_xtp_system_memory_consumers;
-- now serializable transaction using AdventureWorks memory-optimized table sample
begin transaction
select * from Production.Product_inmem with (serializable);
-- note READ_SET_LOOKASIDE and SCAN_SET_LOOKASIDE should be bigger
select * from sys.dm_xtp_system_memory_consumers;

So what about the WRITE_SET then? Transactions that update MVCC rows replace the end-timestamp of the rows they update with the transaction ID while the transaction is running. A bit indicates whether the information is an end-timestamp or a transaction ID. The transaction ID also prohibits concurrent updates of the same row (it’s not a lock; transactions that attempt a concurrent update will simply fail and roll back). They’ll also insert new rows with the transaction ID as the begin-timestamp.

Transaction commit happens like this:
Acquire an end-timestamp
Validation (see READ_SET and SCAN_SET above)
Harden to the transaction log
Update the timestamps in the updated/new rows

The WRITE_SET is used to make updating the timestamps fast. Because this entry is getting a bit long, I’ll leave it to the reader to observe WRITE_SET activity using the DMV. And I’ll also leave it to you to figure out how the transaction commit series of events guarantees transactionally consistent data. Hint: edge cases and additional information are available in my previous blog entries on the subject.
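
If you want a starting point, the pattern mirrors the snippets above. This is only a sketch: the column names (SalesOrderID, Status) are assumed from the AdventureWorks sample, and I haven’t verified exactly when the lookaside grows, so check the DMV before and after the commit:

-- check the WRITE_SET_LOOKASIDE amount
select * from sys.dm_xtp_system_memory_consumers;
-- update some rows in a memory-optimized table
begin transaction
update Sales.SalesOrderHeader_inmem with (snapshot)
    set Status = Status
    where SalesOrderID < 43700;
-- check WRITE_SET_LOOKASIDE again, then commit and check once more
select * from sys.dm_xtp_system_memory_consumers;
commit;
select * from sys.dm_xtp_system_memory_consumers;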

Cheers, @bobbeauch

SQL Server 2014 memory-optimized table variables – metadata and garbage collection

I’d been doing some experimentation with memory-optimized table types in SQL Server 2014 and thought it was about time to write about it. Yesterday, however, there was an excellent post on them on the SQL Server team blog. So I’ll stay away from restating their points for the most part, and encourage you to read their post.

First off, although you can use memory-optimized table variables outside of compiled stored procedures, I think the biggest use for them will be in compiled procedures, mainly because compiled procedures don’t support #temporary tables. So, everywhere folks use and abuse temp tables, they’ll be using memory-optimized table variables as a replacement. Be careful with this, however, because memory-optimized table variables, like “ordinary” table variables, have no statistics and, absent an OPTION (RECOMPILE) hint, SQL statements always use an estimate of one row for them.

Memory-optimized table variables must be strongly typed, so you start by creating a table type with the CREATE TYPE DDL statement. You can’t use a durability specification in this statement, but that only makes sense, because memory-optimized table variables are non-durable by definition. They don’t participate in availability group failover. The compiled code module for them is created at CREATE TYPE time and uses a naming convention with a ‘v’ in the name (e.g. xtp_v_15_277576027.dll), rather than ‘t’ for memory-optimized table modules or ‘p’ for compiled procedure modules. Because there is no data load at SQL Server startup, this module isn’t re-created after an instance restart until the first time the type is used to create a variable.
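
A minimal example of the DDL, plus a way to spot the generated module. The type name and column are mine, and the sys.dm_os_loaded_modules filter simply assumes the xtp_v naming convention described above:

CREATE TYPE dbo.OrderIds AS TABLE
(
    OrderId int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024)
) WITH (MEMORY_OPTIMIZED = ON);
GO
-- look for the generated 'v' module
SELECT name, description
FROM sys.dm_os_loaded_modules
WHERE name LIKE N'%xtp_v_%';
GO
-- and use the type like any other table type
DECLARE @ids dbo.OrderIds;
INSERT INTO @ids (OrderId) VALUES (1), (2), (3);
SELECT COUNT(*) AS cnt FROM @ids;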

It’s important to realize that these table variables don’t live in tempdb; they live in the memory-optimized object space of the database in which they are created. This takes pressure off tempdb, but they can compete for memory with other memory-optimized objects in the same database. In a short test, as I was doing a bunch of inserts into a memory-optimized table, I instantiated a memory-optimized table variable and filled it with a large number of rows. The inserts into the table (not the table variable) started failing because the combination exceeded my memory allocation for optimized objects in that database. It’s also possible for a series of memory-optimized table variables that are active at the same time to exceed the memory allocation and fail due to lack of memory. Granted, because table variables are usually kept to a small number of rows and have a limited lifetime, this may not be a big issue; my concern was the proliferation that could result because they replace temp tables in compiled stored procedures.

There are a few more surprises. Individual memory-optimized table variables have no metadata, just like “ordinary” table variables, so you can’t tell how many you have active at a time by consulting the metadata. They do appear in sys.dm_db_xtp_memory_consumers as a memory_consumer_type of PGPOOL. This consumer is “used for all table variables and includes usage for serializable scans” according to the BOL description of that DMV. However, the pool doesn’t appear in sys.dm_db_xtp_table_memory_stats at all. This means that it doesn’t appear in the SSMS “Memory used by memory-optimized objects” report, even under the “system allocated memory” category. This is useful to know if you use the report or sys.dm_db_xtp_table_memory_stats to monitor usage.
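
A quick way to watch that pool (a sketch; run it in the database that owns the memory-optimized objects):

SELECT memory_consumer_type_desc, memory_consumer_desc, allocated_bytes, used_bytes
FROM sys.dm_db_xtp_memory_consumers
WHERE memory_consumer_type_desc = N'PGPOOL';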

Finally, about memory-optimized table variables and garbage collection. Although these variables may participate in garbage collection during their lifetime (e.g. for updates or deletes), when the variable goes out of scope, the memory it used is released in its entirety (sys.dm_db_xtp_memory_consumers/used_bytes goes to zero). There is a small amount of memory (up to ~62.5 MB or so, in my tests) that remains allocated in this pool after the last variable goes out of scope, but under low-memory conditions, even this is given back.

Bear in mind that this is all based upon observed behavior in SQL Server 2014 CTP2 and may change by RTM. If it does, I’ll update this post.

Cheers, @bobbeauch

Tracking execution stats of SQL Server 2014 natively-compiled sprocs

Haven’t had time to blog much lately. Finding out strange and interesting things about SQL Server 2014 to present at some seminars and conferences next year. “Shameless self-promotion” posts for those events will be forthcoming. 😉 A few days ago, an experiment with tracking compiled stored procedures turned up some “interesting” results. Thought I’d write about that.

You normally get information about query execution at a query level with sys.dm_exec_query_stats, and info at a stored procedure level with sys.dm_exec_procedure_stats. However, with in-memory OLTP, the point is to make things run as quickly as possible and, as such, statistics (such as worker_time and elapsed_time) don’t get collected. By default *nothing* appears in these DMVs about compiled sprocs. Not even a count of the number of times one has been executed.

However, Books Online mentions using sys.sp_xtp_control_query_exec_stats and sys.sp_xtp_control_proc_exec_stats to enable collection. Although sys.sp_xtp_control_proc_exec_stats controls a global setting, sys.sp_xtp_control_query_exec_stats controls collecting query information on a per-proc basis. That is, you can turn it on/off for individual compiled stored procedures.
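
A sketch of what enabling looks like. The parameter names are as I recall them from Books Online for CTP2, and the per-proc example just points at one of the sample procedures mentioned below; double-check the signatures before relying on this:

-- instance-wide, procedure-level collection
EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 1;
-- query-level collection for a single natively compiled proc
DECLARE @db int = DB_ID(), @obj int = OBJECT_ID(N'Sales.usp_InsertSalesOrder_inmem');
EXEC sys.sp_xtp_control_query_exec_stats
     @new_collection_value = 1,
     @database_id = @db,
     @xtp_object_id = @obj;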

I needed a test bed to try this out. Since there is a nice, new in-memory-OLTP sample on CodePlex that layers on top of the AdventureWorks2012 database, and being the lazy person I am, I decided to use this one. Shouldn’t have been so lazy, perhaps. I downloaded the sample, downloaded AdventureWorks2012, and found interesting behaviors right off. After attaching the data (mdf) file with Object Explorer, I found that the database had no owner (owner is blank). That ruined the sample setup, so I changed the database owner to ‘sa’. Then the sample setup ran. Next, I issued the statement:

select * from sys.sql_modules where uses_native_compilation = 1;

This produced output naming two stored procedures (good) and four triggers (not so good, triggers can’t be natively compiled). That’s one for the Connect site.

Strangely, the native procedures were named Sales.usp_InsertSalesOrder_inmem and Sales.usp_UpdateSalesOrderShipInfo_native. While I wondered why they weren’t both suffixed with _inmem or _native, I found additional stored procedures in the demo that ended in “_inmem” and WEREN’T compiled stored procedures; they just referenced in-memory tables. Something for me to remember when looking at the results.

I started by clearing the procedure cache (DBCC FREEPROCCACHE, standard disclaimer not to do this on production) and inserting 10000 rows and updating additional rows with collection off (the default). Took 7 seconds. After this experiment there were no rows for those procedures/queries in the two DMVs. Good. Now turn the global settings on for both types of collection, run DBCC FREEPROCCACHE, and run again. Executes in 4 seconds. Something needed some “warmup” because I would have expected “with collection” to take longer. OK, let’s look at the DMVs.

For query_stats, there are 5 queries in the two compiled sprocs. I get sql_handle (for the query), start/end offset, creation/last execution time, and the worker_time and elapsed_time buckets. Nothing else. I don’t know what possessed me to think there’d be something in read/write counts, but these were zero, because of the way memory-optimized tables are stored. They’re not stored in pages. Procedure_stats had the same sparse information, which I was used to by then.

In case you were wondering what the query plan looked like, there wasn’t one exposed. Plan_handle was zeros for those queries. In SQL Server 2014 CTP2 you still can’t get an actual query plan (with actual counts), but you can get an estimated plan. The estimated plans are really sparse as far as information goes, compared to what you’re used to.

One final surprise: I thought I’d run multiple iterations of the test, using DBCC FREEPROCCACHE in between to zero the numbers. Not only does turning off data collection not zero the numbers, but running DBCC FREEPROCCACHE doesn’t get rid of the rows, either. These rows remain in the DMVs until I drop and re-create the natively compiled proc. Or restart SQL Server, of course. That’s something I wasn’t expecting, but logic-ing it out a bit, perhaps this information isn’t retrieved from the plan cache. Sure enough, looking at sys.dm_exec_cached_plans yielded no rows for these plans, because they’re not traditional plans.

That’s enough fun for now, hopefully. More, on related topics, perhaps, coming…sometime.


Integrating SQL Server 2014 constructs into the T-SQL canon

So I was going through the features in SQL Server 2014, seeing how and where they fit into the canon of SQL Server’s T-SQL. I’m going to add some of this to my advanced T-SQL Immersion Event at the end of September. SQL Server 2014 is supposed to be a release with no T-SQL changes at all. No windowing enhancements, no time interval data types, nothing. So the effect on the T-SQL programmer should be minimal, right? Well, maybe not minimal.

The most visible new feature in SQL Server 2014 is known as Hekaton. In a perfect world, Hekaton tables would simply be a deployment decision by the DBA. The miracle “go faster” switch: just buy some more memory and move your tables over. But it turns out that the surface area of tables, and of statements on tables, is wide and deep. And in “V1” of memory-optimized tables there are a number of constructs that they don’t support. These include some data types, as well as constraints and triggers. So T-SQL developers need to “make up” for the lack of these by coming up with their own schemes, no mean feat, although there are some examples of this type of code in Books Online. And that’s just the beginning; the list of unsupported items means that the move of a table to memory-optimized won’t be a simple change to DDL.

As another example, the change from pages to hash buckets in the in-memory engine gives new credibility (well, gives credibility) to the concept of uniqueidentifier as primary key. With hash buckets, spreading out the key values won’t cause fragmentation; it should actually be as fast or faster. It is still a bigger data type though, which means more to keep in memory. Just buy a bit more memory and a fast disk for the log.

T-SQL compiled stored procedures that call table-specific code are needed to make memory-optimized tables run as fast as possible. In fact, compiled stored procedures can only access memory-optimized tables. These aren’t your traditional stored procedures. They must be defined with a transaction isolation level, as well as other options. And these stored procedures also have a number of limitations, including limits on some built-in functions, tempdb usage, and collation choice. They also compile on first use (when you create them initially and when you bring up SQL Server) and don’t recompile because of statistics changes. Finally, these procedures need error handling because, depending on isolation level, they might roll back due to the multi-version engine. Ah, the multi-version engine… That may affect T-SQL programmers a little.
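
To make the error-handling point concrete, here’s a sketch (not from the product documentation) of the kind of retry wrapper you end up writing in interpreted T-SQL around a hypothetical natively compiled procedure, catching the multi-version conflict/validation errors:

DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        EXEC dbo.usp_SomeCompiledProc @OrderId = 42;  -- hypothetical proc and parameter
        SET @retries = 0;                             -- success; leave the loop
    END TRY
    BEGIN CATCH
        -- 41302 (write conflict), 41305 (repeatable read validation),
        -- 41325 (serializable validation), 41301 (dependency failure)
        IF ERROR_NUMBER() IN (41301, 41302, 41305, 41325) AND @retries > 1
            SET @retries -= 1;   -- retry
        ELSE
            THROW;
    END CATCH;
END;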

And there’s more than just rowstore-and-linked-list memory-optimized tables. This release can use columnstore (xVelocity) as the primary storage vehicle, rather than as an additional copy of your warehouse data as in SQL Server 2012. And the columnstore is updatable now. And because columnstore is “V2” now, programmers can “undo” almost every workaround that was needed to take full advantage of “Batch Mode” processing, because constructs like outer joins will now use Batch Mode automatically.

Let’s not forget the new cardinality estimation. If you’re fearing plan regressions, you can always opt out, but it sounds like some tuning, as well as better estimates leading to better plans, will be going on. And there’s always parallelism in SELECT…INTO too.

So even if they never add another T-SQL feature to SQL Server 2014 by RTM, there’s lots to talk about that pertains to the way you’ll write T-SQL in the future as well as, of course, lots about how we write it for “best results” today. See you later this month.


An implied (not implying) permissions function

I’d always wondered why they didn’t make the “ImplyingPermissions” function (the code is part of Books Online) part of the product. I originally thought it was because the function was subtly (or not so subtly) misnamed. What the function does is: given a permission in a permission class, return a table of which built-in permissions grant (or deny) you that input permission by permission inheritance/covering. To use their example, if the input permission is “ALTER” in the “SCHEMA” class, the returned table includes the “ALTER” permission in the “DATABASE” class, meaning that granting someone “ALTER DATABASE” also grants them “ALTER SCHEMA” in every schema in that database.

I always wanted the corollary function, “ImpliedPermissions”. That is, given the permission “ALTER” in the “DATABASE” class, what other permissions at lower levels am I granted by hierarchy/covering? There’s probably someone who’s already written that function, but searching for “SQL Server Implied Permissions” always brings me back to the “ImplyingPermissions” function. Which is where I got the idea about its being misnamed. It’s non-trivial to write, given that there are two types of permission “inheritance” – inheritance and covering. But I came upon this quick way that uses the ImplyingPermissions function and CROSS APPLY:

CREATE FUNCTION dbo.ImpliedPermissions(@class nvarchar(64), @permission_name nvarchar(64))
RETURNS TABLE
AS RETURN
SELECT a.*, b.height, b.RANK
FROM sys.fn_builtin_permissions(DEFAULT) AS a
CROSS APPLY dbo.ImplyingPermissions(a.class_desc, a.permission_name) AS b
WHERE b.CLASS = @class AND b.permname = @permission_name;
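
For example (assuming you’ve created the function above, and the BOL ImplyingPermissions function it depends on), this should list the built-in permissions implied by holding ALTER at the DATABASE level:

SELECT class_desc, permission_name
FROM dbo.ImpliedPermissions(N'DATABASE', N'ALTER')
ORDER BY class_desc, permission_name;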

Note that this is not the same as creating a LOGIN or USER, assigning it permissions, then impersonating the principal and using sys.fn_my_permissions. The “my permissions” function (which must have been invented when Windows was calling everything “My…”) requires a *specific* securable and class, not a class and permission. So it can tell you which permissions (multiple) you’ve been granted on schemaX.tableX, and it covers one level and type of inheritance between table and column (SELECT on a table gives you SELECT on all the columns), but it can’t answer a question like “SELECT” in the “OBJECT” class. And the Implying/ImpliedPermissions functions themselves don’t deal with individual principals, just classes.

So now there is a corollary function too. I did notice that it was a bit slow, especially when trying to use it with semi-joins and anti semi-joins. So, if you’re using this function a lot, you may want to materialize the permission hierarchy table (builtin_permissions CROSS APPLY ImplyingPermissions) somewhere. And now I can tell you, “if you give someone class X/permission Y, what else are you giving them?” Well, at least to the extent that sys.fn_builtin_permissions() accurately reflects the permission hierarchy/covering (I think I’ve found some edge cases). Enjoy…


Batch Mode improvement for columnstores in SQL Server 2014 CTP1

This post is a continuation of my previous posts on columnstore indexes in SQL Server 2014 CTP1 and query plans. The test bed is the same as in previous posts, based on an enlarged version of a fact table in AdventureWorks2012DW.

One of the things that improves query speed when using columnstore indexes is the usage of a new batch mode, as opposed to “traditional” row mode. Batch mode refers to the fact that individual iterators execute in batches rather than row-by-row. You can observe this in query plan iterators; each one contains the properties “Actual Execution Mode” and “Estimated Execution Mode”.

In SQL Server 2014, usage of Batch Mode will be more mainstream. We’ve actually already seen evidence of improvements from SQL Server 2012 to SQL Server 2014 in the first post. In SQL Server 2012, the hash join implementation consists of two iterators: a build iterator and an actual join iterator. In SQL Server 2014, there is not a separate build iterator; everything is pushed into the join iterator. There were also other limitations in SQL Server 2012 for batch mode. For example, batch mode was only supported for inner joins, and the hash table had to fit entirely in memory. In addition, query patterns such as scalar aggregation, multiple distinct aggregations, and IN clauses, among other patterns, could not use Batch Mode. A series of query rewrites for common patterns to enable queries to use Batch Mode was posted on the TechNet Wiki. SQL Server 2014 removes some, if not most, of these limitations.

I’ll look at one of these queries today to see how things have improved: the outer join. We’ll use the following query as a test:

select ProductAlternateKey, SUM(SalesAmount) as SumSales
from DimProduct
left outer join FactInternetSalesNewCCI fact
on DimProduct.ProductKey = fact.ProductKey
group by ProductAlternateKey
order by SumSales desc;

In SQL Server 2012, because of the outer join, none of the iterators (including the nonclustered columnstore scan) in the query use Batch Mode. In SQL Server 2014, the left outer join iterator, as well as the following HashMatch partial aggregate and the columnstore scan use Batch Mode. People who invested in the nonclustered columnstore index in SQL Server 2012 will be glad to note that these improvements work with nonclustered columnstore as well as clustered columnstore.

In addition, I tried this query in SQL Server 2014 using both the old and new cardinality estimation. The new estimation was right on the money, with a final estimate of 504 Estimated and Actual Rows v. the old estimator’s estimate of 376 Estimated v. 504 Actual. Of course, the higher, more accurate estimate results in a plan with a higher estimated cost and memory grant, but we know that we’re using the “right” plan because of the more accurate estimates. In this case, the different estimate did not change the shape of the plan.

So, to summarize:
1. The cases in which Batch Mode will be used with Columnstore iterators in plans have been expanded in SQL Server 2014, leading to even faster queries.
2. The Batch Mode improvements work with the nonclustered as well as the clustered columnstore indexes.
3. The new cardinality estimator almost always results in more accurate cardinality estimates in SQL Server 2014.

Cheers, Bob

As always, query plans…

Columnstore indexes and cardinality estimates in SQL Server 2014 CTP1

This post is a continuation of my previous post about SQL Server 2014 Columnstore and Query Plans. I’ll take a look at how the new cardinality estimator affects columnstore index query plans. I’ve written a bit about the new cardinality estimator in SQL Server 2014, as well as looked into how to tell what the estimator is doing by using an XEvent trace. Today, I’ll apply that to columnstore as well as to a data warehouse query against a traditional rowstore.

First off, since I used the AdventureWorks2012DW database and attached it in SQL Server 2014 rather than building it “by hand, from scratch”, I’m using the “old” cardinality estimator by default. This applies even to new tables in databases restored/attached from pre-2014 versions, as we’re doing here. I can get queries to use the “new” estimator on a query-by-query basis by using the query hint “OPTION (QUERYTRACEON 2312)”. You can also set the trace flag at other levels of granularity (session and instance level), of course. But it’s easier to compare the two using the query hint. You can confirm that this works by checking whether the “main” (e.g. SELECT, in this case) iterator’s CardinalityMode110 property is OFF (old estimator) or ON (new estimator).
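
For illustration, this is where the hint goes. It’s not the sample query from the previous post, just a stand-in against the same enlarged fact table:

SELECT ProductKey, SUM(SalesAmount) AS SumSales
FROM dbo.FactInternetSalesNewCCI
GROUP BY ProductKey
OPTION (QUERYTRACEON 2312);  -- use the new cardinality estimator for this query only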

With the sample query from the previous post, all queries (traditional bitmap, nonclustered columnstore, and clustered columnstore) got the same estimate of 100.8 estimated rows v. 119 actual rows. Although the query shapes were different between “traditional” bitmap plans and columnstore plans, the estimates were consistent at the end. Because of the different plan shapes, it’s not straightforward to determine where the estimates started to diverge. SQL Server 2012 SP1 plans for traditional bitmap and nonclustered columnstore got the same eventual estimate as the analogous queries in SQL Server 2014 CTP1. Trace flag 2312 has no effect on cardinality in SQL Server 2012, as you would expect (although yes, I did try it).

When the new cardinality estimator is turned on, the eventual estimate for all variations of the table is slightly worse at the end: 189.842 estimated rows v. 119 actual rows. With the “traditional” bitmap plan this doesn’t change the plan, but it does result in the estimated plan having a slightly higher cost under the new cardinality estimator (54%-46% of the batch). The memory grant is slightly higher as well, as you would expect. With the clustered and nonclustered columnstore, the higher row estimate results in a different plan choice (HashMatch (aggregate) – old v. Sort and StreamAggregate – new). This different plan choice results in a plan with a much higher estimated cost (71%-29% of the batch), although the choice of StreamAggregate results in a slightly lower memory grant for the new plan.

To summarize:
1. The new cardinality estimator in SQL Server 2014 works with queries against columnstores as well as rowstores.
2. The cardinality estimates are consistent between columnstores and rowstores, even though the plan shapes might be different.
3. The new cardinality estimator can change a plan. As in any query optimizer change, there can be regressions, but I’ve mostly seen improvements with the new estimator. As well as improvement with a different columnstore plan, coming up.

Cheers, Bob

Query plans are included.