The CSelCalcAscendingKeyFilter Calculator

This post is a continuation of the SQL Server 2014 Cardinality Estimator enhancements exploration series (yeah – I’ve decided to call this a series, since there will be quite a bit to talk about, time permitting).

So for this post, I created a nonclustered index on the charge_dt column from the charge table in the Credit sample database and then inserted 10 new rows as follows:

INSERT INTO [dbo].[charge]
([member_no]
,[provider_no]
,[category_no]
,[charge_dt]
,[charge_amt]
,[statement_no]
,[charge_code])
SELECT TOP 10 [member_no]
,[provider_no]
,[category_no]
,GETDATE()
,[charge_amt]
,[statement_no]
,[charge_code]
FROM dbo.charge
GO
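
For reference, the nonclustered index on charge_dt was created beforehand along these lines (the index name here is my own – the original statement isn’t shown):

CREATE NONCLUSTERED INDEX [IX_charge_charge_dt]  -- index name is illustrative
ON [dbo].[charge] ([charge_dt]);
GO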

Notice I’m using GETDATE() for the charge_dt value, and if you look at the last few steps of the histogram for charge_dt you’ll see that nothing in the year 2013 is represented (and I’m only inserting 10 new rows, and haven’t manually updated statistics recently):

[Figure: last few histogram steps for charge_dt – nothing in 2013 is represented]

In SQL Server 2012, what row estimate would you expect if I executed the following query?

SELECT [charge_no]
FROM dbo.[charge]
WHERE [charge_dt] = '2013-11-17 12:46:44.307'
OPTION (RECOMPILE);

In this case, the plan tree (via SQL Sentry Plan Explorer) is as follows:

[Figure: plan tree in SQL Sentry Plan Explorer – 1 row estimated, 10 actual]

The estimate is for 1 row (actual is 10 rows).

What are my estimates if I switch the database to compatibility mode 120 and re-execute the same exact query?

[Figure: plan tree under compatibility level 120 – 10 rows estimated]

In 2014 CTP2 – my estimate is for 10 rows, and re-checking the histogram, the maximum RANGE_HI_KEY is still 1999-10-13 10:51:19.317 and not a more recent value.
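
Re-checking the histogram here just means looking at the statistics behind the new index – something along these lines, using my assumed index name from above:

DBCC SHOW_STATISTICS ('dbo.charge', IX_charge_charge_dt)  -- statistics name is illustrative
WITH HISTOGRAM;
GO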

What does the query_optimizer_estimate_cardinality event show?

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="-1.000"
CalculatorFailed="true" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" />

<FilterCalculator CalculatorName="CSelCalcAscendingKeyFilter" Selectivity="0.000"
TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" UseAverageFrequency="true" StatId="7" />
</CalculatorList>

I highlighted a couple of interesting areas – the CalculatorFailed="true" associated with CSelCalcColumnInInterval, and then what seems to be the application of a second calculator, CSelCalcAscendingKeyFilter – which, I would assume, relates to the classic ascending key issue – along with UseAverageFrequency="true".

If this is average frequency, then we’re looking at density times the row count, which would mean 1,600,010 rows times the 1.006087E-05 density from that particular statistics object:

[Figure: density vector for the charge_dt statistics object]

However, if you do that calculation, you’ll get 16.0974926087 instead of the 10-row estimate.
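
As a quick sanity check, that back-of-the-napkin multiplication in T-SQL:

-- Average frequency = density * table cardinality
SELECT 1.006087E-05 * 1600010 AS [avg_frequency];  -- ~16.0974926087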

Looking further in the query_optimizer_estimate_cardinality output (still associated with the prior calculator) I see the following:

<StatsCollection Name="CStCollFilter" Id="2" Card="10.00">
<LoadedStats>
<StatsInfo DbId="6" ObjectId="229575856" StatsId="7" />
</LoadedStats>
</StatsCollection>

Continuing the search, I loaded 90 more rows with that same ‘2013-11-17 12:46:44.307’ value and re-executed the query to see the query_optimizer_estimate_cardinality event information.
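
The load was just a variation of the original insert, with the literal value in place of GETDATE() – sketched here rather than copied from the original:

INSERT INTO [dbo].[charge]
([member_no], [provider_no], [category_no], [charge_dt],
[charge_amt], [statement_no], [charge_code])
SELECT TOP 90 [member_no], [provider_no], [category_no],
'2013-11-17 12:46:44.307', [charge_amt], [statement_no], [charge_code]
FROM dbo.charge;
GO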

This time the estimates bumped up as follows (16 rows instead of a 10 row estimate, with 100 actual rows):

[Figure: plan tree – 16 rows estimated, 100 actual]

There was still no change to the statistics (no additional step for the year 2013 data).

As for query_optimizer_estimate_cardinality event information, I saw the following:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="-1.000" CalculatorFailed="true" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" />
<FilterCalculator CalculatorName="CSelCalcAscendingKeyFilter" Selectivity="0.000" TableName="[Credit].[dbo].[charge]" ColumnName="charge_dt" UseAverageFrequency="true" StatId="7" />
</CalculatorList>

<StatsCollection Name="CStCollFilter" Id="2" Card="16.10">
<LoadedStats>
<StatsInfo DbId="6" ObjectId="229575856" StatsId="7" />
</LoadedStats>
</StatsCollection>

The 16.10 reflects the average frequency value I expected (16.0974926087, rounded for display).  But regarding the original estimate of 10, I wondered what would happen if I deleted 99 out of the 100 rows with the 2013 value.  Would I still see Card="10.00" or something else?
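
The delete itself was nothing fancy – something along these lines:

DELETE TOP (99)
FROM [dbo].[charge]
WHERE [charge_dt] = '2013-11-17 12:46:44.307';
GO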

This is what I saw:

[Figure: plan tree – 16 rows estimated, 1 actual]

The estimate was for 16 rows again – with 1 actual row returned.

Then I thought about reverting to the beginning of my test – restoring from backup to the vanilla version of the Credit database and then flipping it to 2014 mode again.  I did this, re-created the index on charge_dt and then re-ran the initial 10 row insert and follow-up query against the 2013 date value that didn’t exist as a histogram step.  I was able to repeat the 10 row estimate effect.

I also tried a reset of the test with just 2 rows initially inserted and got an estimate of 2 rows, with Card="2.00" in the stats_collection output for query_optimizer_estimate_cardinality.  Same thing with 15 rows (a 15-row estimate).  With 16 rows, the same pattern – an estimate of 16 instead of 16.10.

And when I reset the test with 17 rows (which is above the average frequency), the estimate used is the average frequency of 16.10 and not 17.  Interesting!

Much more to explore on this overall subject.

“CSelCalcCombineFilters_ExponentialBackoff” Calculator – Part II

At the end of my last post, “CSelCalcCombineFilters_ExponentialBackoff” Calculator, I gave the following homework:

Homework assignment – add another member column to the mix that has a 0.100 selectivity (for example – county) and compare the estimates using a pre-2014 and 2014 CTP2 compatibility mode.

Here is the answer key.  I’ll start off by adding a new column called “county” to the member table (and yes – county gets a little involved for some cities like NYC, but let’s just go with a uniform county association to city/state for now):

ALTER TABLE [dbo].[member]
ADD [county] varchar(15) NULL;

And then next I’ll revise my previous city/state update script to include county:

UPDATE  [dbo].[member]
SET     [city] = 'Minneapolis',
[state_prov] = 'MN',
[county] = 'Hennepin'
WHERE   [member_no] % 10 = 0;

UPDATE  [dbo].[member]
SET     [city] = 'New York',
[state_prov] = 'NY',
[county]='Manhattan'
WHERE   [member_no] % 10 = 1;

UPDATE  [dbo].[member]
SET     [city] = 'Chicago',
[state_prov] = 'IL',
[county]='Cook'
WHERE   [member_no] % 10 = 2;

UPDATE  [dbo].[member]
SET     [city] = 'Houston',
[state_prov] = 'TX',
[county]='Harrisburg'
WHERE   [member_no] % 10 = 3;

UPDATE  [dbo].[member]
SET     [city] = 'Philadelphia',
[state_prov] = 'PA',
[county]='Philadelphia'
WHERE   [member_no] % 10 = 4;

UPDATE  [dbo].[member]
SET     [city] = 'Phoenix',
[state_prov] = 'AZ',
[county]=''
WHERE   [member_no] % 10 = 5;

UPDATE  [dbo].[member]
SET     [city] = 'San Antonio',
[state_prov] = 'TX',
[county]='Bexar'
WHERE   [member_no] % 10 = 6;

UPDATE  [dbo].[member]
SET     [city] = 'San Diego',
[state_prov] = 'CA',
[county]='San Diego'
WHERE   [member_no] % 10 = 7;

UPDATE  [dbo].[member]
SET     [city] = 'Dallas',
[state_prov] = 'TX',
[county]='Dallas'
WHERE   [member_no] % 10 = 8;
GO

Now, what row estimate will we get for the member table pre-2014 if we execute the following query?

SELECT  m.[lastname],
m.[firstname],
SUM(c.[charge_amt]) AS [Total_Charge_amt]
FROM    [dbo].[member] AS [m]
INNER JOIN [dbo].[charge] AS [c] ON
m.[member_no] = c.[member_no]
WHERE   [city] = 'Minneapolis' AND
[state_prov] = 'MN' AND
[county] = 'Hennepin'
GROUP BY m.[lastname],
m.[firstname]
OPTION (RECOMPILE);
GO

We see the following estimated rows versus actual (keeping in mind that there are no multi-column stats to help us out):

[Figure: estimated vs. actual rows for the member table – 10 estimated, 1,000 actual]

Estimated number of rows is 10, but actual is 1000.

Given that we know the three predicates are correlated, but SQL Server thinks they are independent (no multi-column stats), we can see the calculation is as follows:

[Figure: pre-2014 calculation – the three selectivities multiplied under the independence assumption]
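
In other words, the legacy cardinality estimator simply multiplies the three 10% selectivities against the table cardinality:

-- Pre-2014: 10,000 rows * 0.1 * 0.1 * 0.1
SELECT 10000 * 0.1 * 0.1 * 0.1 AS [legacy_estimate];  -- 10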

Now if I switch to 2014 compatibility mode, I see the following estimated number of rows:

[Figure: estimated number of rows under compatibility level 120 – 177.828]

And in the query_optimizer_estimate_cardinality event output, we see the following for the 2014 version of the plan:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcCombineFilters_ExponentialBackoff" Conjunction="true">
<SubCalculator>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="state_prov" StatId="5" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="county" StatId="8" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="city" StatId="4" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

So updating the formulas, how is the 177.828 derived?

[Figure: exponential backoff calculation arriving at 177.828]

So we see that in 2014 CTP2, instead of multiplying the selectivity of each predicate (again, assuming independence), the Exponential Backoff calculator blunts the impact: 10000 * 0.1 * POWER(0.1, 1/2) * POWER(0.1, 1/4).
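
That expression can be checked directly (casting to float so POWER doesn’t truncate the fractional-exponent result):

-- 2014 CTP2: 10,000 rows * 0.1 * 0.1^(1/2) * 0.1^(1/4)
SELECT 10000 * 0.1 * POWER(CAST(0.1 AS float), 0.5)
             * POWER(CAST(0.1 AS float), 0.25) AS [backoff_estimate];  -- ~177.828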

Still a skew – but less so: 177.828 estimated versus the pre-2014 estimate of 10, against 1,000 actual rows.  And think it doesn’t really matter?

For this example on my test system, pre-2014 the Clustered Index Scan was serial and fed a Distribute Streams exchange operator; with the new estimate, the scan runs in parallel.

Before:

[Figure: pre-2014 plan – serial Clustered Index Scan feeding a Distribute Streams operator]

After:

[Figure: 2014 plan – parallel Clustered Index Scan]

Plenty more to explore in this area…

“CSelCalcCombineFilters_ExponentialBackoff” Calculator

One more post for the day, just while I have the blogging bug…

Per my last post, we saw that the query_optimizer_estimate_cardinality event can provide some interesting information in SQL Server 2014 around how cardinality estimates get calculated.

I have a setup I’ve used in the past to demonstrate column correlation challenges (see my post, Exploring Column Correlation and Cardinality Estimates, for the full example).  I’ll use that setup for this post as well, using the Credit database on SQL Server 2014 with the database compatibility level set to 120:

USE [Credit];
GO

-- Modifying the Credit data to set up the DBA's scenario
UPDATE  [dbo].[member]
SET     [city] = 'Minneapolis',
        [state_prov] = 'MN'
WHERE   [member_no] % 10 = 0;

UPDATE  [dbo].[member]
SET     [city] = 'New York',
[state_prov] = 'NY'
WHERE   [member_no] % 10 = 1;

UPDATE  [dbo].[member]
SET     [city] = 'Chicago',
[state_prov] = 'IL'
WHERE   [member_no] % 10 = 2;

UPDATE  [dbo].[member]
SET     [city] = 'Houston',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 3;

UPDATE  [dbo].[member]
SET     [city] = 'Philadelphia',
[state_prov] = 'PA'
WHERE   [member_no] % 10 = 4;

UPDATE  [dbo].[member]
SET     [city] = 'Phoenix',
[state_prov] = 'AZ'
WHERE   [member_no] % 10 = 5;

UPDATE  [dbo].[member]
SET     [city] = 'San Antonio',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 6;

UPDATE  [dbo].[member]
SET     [city] = 'San Diego',
[state_prov] = 'CA'
WHERE   [member_no] % 10 = 7;

UPDATE  [dbo].[member]
SET     [city] = 'Dallas',
[state_prov] = 'TX'
WHERE   [member_no] % 10 = 8;
GO

So let’s look at the cardinality estimate for a specific operator in the following query (using RECOMPILE to get fresh query_optimizer_estimate_cardinality events on each execution):

SELECT  m.[lastname],
m.[firstname],
SUM(c.[charge_amt]) AS [Total_Charge_amt]
FROM    [dbo].[member] AS [m]
INNER JOIN [dbo].[charge] AS [c] ON
m.[member_no] = c.[member_no]
WHERE   [city] = 'Minneapolis' AND
[state_prov] = 'MN'
GROUP BY m.[lastname],
m.[firstname]
OPTION (RECOMPILE);
GO

Below is the abridged plan and the operator and estimates I’m interested in:

[Figure: abridged query plan]

[Figure: Clustered Index Scan against member – 316.228 rows estimated, 1,000 actual]

In SQL Server 2014 CTP2 – we estimate 316.228 rows for the Clustered Index Scan against the member table – and the actual number of rows is 1,000.

Now if I flip my database compatibility mode to pre-120, I’ll see the following actual vs. estimated instead:

[Figure: estimated vs. actual rows in pre-120 compatibility mode – 100 estimated, 1,000 actual]

Before SQL Server 2014, the estimate for “[city] = ‘Minneapolis’ AND [state_prov] = ‘MN’” is 100 rows – and if we added supporting multi-column statistics or an index on city and state_prov, we would get a better estimate (but there are limitations to how useful this can be – another topic altogether, so see this post).  We know that, in this example, the two columns are correlated.  But we would need to help SQL Server if we didn’t want it to assume each predicate was independent.
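
For example, a multi-column statistics object like the following (the name is my own) would give the optimizer density information across the column combination:

CREATE STATISTICS [stat_member_city_state]  -- statistics name is illustrative
ON [dbo].[member] ([city], [state_prov]);
GO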

Anyhow – the original issue pre-SQL Server 2014 was that our city and state_prov columns were treated as independent, and so our estimates reflected this (10% selectivity for city multiplied by 10% selectivity for state_prov, applied to the 10,000-row table).  So we wind up with an underestimate.

Now in SQL Server 2014 CTP2, with the Credit database compatibility mode put back to 120, we see the skew still exists, but the gap isn’t as extreme – showing 316.228 instead of 100 as an estimate.  Still incorrect, but less so.

Now putting this in Excel, we can see the following:

[Figure: selectivity calculation worked out in Excel]

Earlier we updated the data so that each city/state combination covers 10% of the 10,000 rows, so what’s up with the 32% for the state predicate selectivity?

That 32% is, using Excel formula style, =POWER((10%),1/2).
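
Or, translating the Excel formula into T-SQL for the full 316.228 estimate (again casting to float so POWER keeps its precision):

-- 10,000 rows * 0.1 (city) * 0.1^(1/2) (state_prov, backed off)
SELECT 10000 * 0.1 * POWER(CAST(0.1 AS float), 0.5) AS [backoff_estimate];  -- ~316.228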

And if we look at the query_optimizer_estimate_cardinality for the 2014 CTP2 version, we’ll see the following calculator reference (abridged output):

[Figure: abridged query_optimizer_estimate_cardinality output showing the calculator reference]

And here is the full calculator blurb for the CSelCalcCombineFilters_ExponentialBackoff:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcCombineFilters_ExponentialBackoff" Conjunction="true">
<SubCalculator>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="state_prov" StatId="5" />
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.100" TableName="[m]" ColumnName="city" StatId="4" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

We see the selectivity of state_prov and city are both 0.100.  And yet, unlike the pre-2014 version that multiplies the two selectivity percentages, we’re not doing that in 2014 CTP2.  The calculator name references “ExponentialBackoff” – which, according to Wikipedia (yeah, I know), is defined as “an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate”.

What does this mean in this specific case?

In the absence of multi-column statistics for city and state_prov, our row estimate is not underestimated nearly as much as it used to be pre-2014.

Homework assignment – add another member column to the mix that has a 0.100 selectivity (for example – county) and compare the estimates using a pre-2014 and 2014 CTP2 compatibility mode.

A first look at the query_optimizer_estimate_cardinality XE event

This weekend I set up a Windows Azure virtual machine running SQL Server 2014 CTP2 (12.0.1524).  I had a local copy of SQL Server 2014 CTP2 in VMware as well, but thought it would be more fun to explore it on Windows Azure (killing two explorations with one stone).  On a side note, I’m really digging having the Azure credits to play with each month.  If you have an MSDN subscription, be sure to take advantage of this benefit.

In this post I’m just sharing my initial exploration steps regarding the query_optimizer_estimate_cardinality XE event.  I’m not entirely sure how well this event will be documented, but I’m definitely interested in learning more about it.

For my test scenario, I attached a version of AdventureWorksLT2012, set it to compatibility level 120 and then created the following session:

CREATE EVENT SESSION [XE_Card_Calculator] ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
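
I didn’t script a target for the session above; you can watch the live data stream in SSMS, or add a target and start the session with something like the following (the file name is just an example):

ALTER EVENT SESSION [XE_Card_Calculator] ON SERVER
ADD TARGET package0.event_file (SET filename = N'XE_Card_Calculator.xel');
GO

ALTER EVENT SESSION [XE_Card_Calculator] ON SERVER
STATE = START;
GO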

If you’re looking for query_optimizer_estimate_cardinality in the GUI, keep in mind that it is in the Debug channel (so you’ll need to select that channel in order to see it).  This also implies that there is likely nontrivial overhead to enabling this event, so while there isn’t an explicit warning for it like there is for other, more invasive events, I would still use it with caution.

As for the description of this event in the GUI, it is as follows:

“Occurs when the query optimizer estimates cardinality on a relational expression.”

Okay – no big deal, right?  Why care?

So in keeping things simple, I executed the following query against a single table (using the RECOMPILE so I can get the event each time I tested it out):

SELECT AddressLine1
FROM [SalesLT].[Address]
WHERE AddressID = 9
OPTION (RECOMPILE);

The actual query execution plan had a Clustered Index Seek with an estimate of 1 row.  And gathering the query_optimizer_estimate_cardinality output, I saw two events surface.

The first event had the following information:

calculator <CalculatorList>
<FilterCalculator CalculatorName="CSelCalcUniqueKeyFilter" />
</CalculatorList>
creation_time 2013-11-16 16:56:35.6666666
input_relation <Operator Name="LogOp_Select" ClassNo="32">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="450.00" TableName="SalesLT.Address" />
<Operator Name="ScaOp_Comp" ClassNo="100">
<CompInfo CompareOp="EQ" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="AddressID" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="int" Value="(9)" />
</Operator>
</Operator>
</Operator>
query_hash 13158512245962950952
stats_collection <StatsCollection Name="CStCollFilter" Id="2" Card="1.00" />
stats_collection_id 2

The second event had the following information:

calculator <CalculatorList />
creation_time 2013-11-16 16:56:35.6666666
input_relation <Operator Name="LogOp_SelectIdx" ClassNo="43">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="450.00" TableName="SalesLT.Address" />
<Operator Name="ScaOp_Comp" ClassNo="100">
<CompInfo CompareOp="EQ" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="AddressID" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="int" Value="(9)" />
</Operator>
</Operator>
</Operator>
query_hash 13158512245962950952
stats_collection <StatsCollection Name="CStCollFilter" Id="2" Card="1.00" />
stats_collection_id 2

So there is a lot here to dig through, but I highlighted a couple of values that stood out. And I know that AddressID happens to be my clustered, unique, primary key column for this table.

What happens if I reference a non-unique key value that is covered by an index (such as StateProvince)?

SELECT AddressID
FROM [SalesLT].[Address]
WHERE StateProvince = 'Arizona'
OPTION (RECOMPILE);

This query uses an Index Seek as I expected, and this time for query_optimizer_estimate_cardinality I saw a new calculator value:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcColumnInInterval" Selectivity="0.029" TableName="[AdventureWorksLT2012].[SalesLT].[Address]" ColumnName="StateProvince" StatId="5" />
</CalculatorList>

The stats_collection value was as follows:

<StatsCollection Name="CStCollFilter" Id="2" Card="13.00">
<LoadedStats>
<StatsInfo DbId="5" ObjectId="69575286" StatsId="5" />
</LoadedStats>
</StatsCollection>

Also – for a scenario where I didn’t have stats – and disabled them from being auto-created (to simulate a wild guess scenario), I saw the following calculator list:

<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcPointPredsFreqBased">
<SubCalculator Role="DistinctCountPlan">
<DistinctCountCalculator CalculatorName="CDVCPlanLeaf" Guesses="1" CoveringStatId="4" CoveringStatDensity="450.000" />
</SubCalculator>
</FilterCalculator>
</CalculatorList>

The “Guesses” part looks promising (thinking magic numbers/selectivity guesses/heuristics, whatever you like to call it).
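
For reference, setting up that no-stats scenario is roughly the following – turn off auto-created statistics at the database level and check for (and drop) any existing column statistics on the predicate column.  This is a sketch of one way to do it, not necessarily the exact script:

ALTER DATABASE [AdventureWorksLT2012] SET AUTO_CREATE_STATISTICS OFF;
GO

-- Find any auto-created column statistics that might still cover the predicate column
SELECT s.[name] AS [stats_name], c.[name] AS [column_name]
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.[object_id] = sc.[object_id] AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.[object_id] = c.[object_id] AND sc.column_id = c.column_id
WHERE s.[object_id] = OBJECT_ID('SalesLT.Address')
AND s.auto_created = 1;
GO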

When executing a query that kicked off auto-stats operations, I saw the following operator information:

<Operator Name="LogOp_GbAgg" ClassNo="31">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="AncOp_PrjList" ClassNo="137">
<Operator Name="AncOp_PrjEl" ClassNo="138">
<Operator Name="ScaOp_AggFunc" ClassNo="90">
<AggFuncInfo AggType="STATMAN" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" />
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>

And I saw the following calculator information (for the auto-stats operations):

<CalculatorList>
<DistinctCountCalculator CalculatorName="CDVCPlanTrivial" />
</CalculatorList>

And lastly, I tried a query with a bad practice (fiddling with the column reference via concatenation) to see what steps would be taken:

SELECT [CustomerID]
FROM [SalesLT].[Customer]
WHERE LastName + ' ' = 'Gates'
OPTION (RECOMPILE);

This query plan just had a Clustered Index Scan, but spawned five query_optimizer_estimate_cardinality events associated with it (and I tested this a few times to see if the 5-event output was consistent):

Event 1

calculator:
<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" />
</CalculatorList>

input_relation:
<Operator Name="LogOp_Select" ClassNo="32">
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="ScaOp_Comp" ClassNo="100">
<CompInfo CompareOp="EQ" />
<Operator Name="ScaOp_Arithmetic" ClassNo="87">
<ArithmeticInfo Operation="ADD" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(1)" Value="N' '" />
</Operator>
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(5)" Value="N'Gates'" />
</Operator>
</Operator>
</Operator>

Event 2

calculator:
<CalculatorList />

input_relation:
<Operator Name="LogOp_Project" ClassNo="29">
<OpProjectInfo />
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="AncOp_PrjList" ClassNo="137">
<Operator Name="AncOp_PrjEl" ClassNo="138">
<Operator Name="ScaOp_Arithmetic" ClassNo="87">
<ArithmeticInfo Operation="ADD" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(1)" Value="N' '" />
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>

Event 3

calculator:
<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" />
</CalculatorList>

input_relation:
<Operator Name="LogOp_Select" ClassNo="32">
<StatsCollection Name="CStCollProject" Id="3" Card="847.00" />
<Operator Name="ScaOp_Comp" ClassNo="100">
<CompInfo CompareOp="EQ" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo ColumnName="Expr1002" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(5)" Value="N'Gates'" />
</Operator>
</Operator>
</Operator>

Event 4

calculator:
<CalculatorList />

input_relation:
<Operator Name="LogOp_Project" ClassNo="29">
<OpProjectInfo />
<StatsCollection Name="CStCollBaseTable" Id="1" Card="847.00" TableName="SalesLT.Customer" />
<Operator Name="AncOp_PrjList" ClassNo="137">
<Operator Name="AncOp_PrjEl" ClassNo="138">
<Operator Name="ScaOp_Arithmetic" ClassNo="87">
<ArithmeticInfo Operation="ADD" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo TableName="[AdventureWorksLT2012].[SalesLT].[Customer]" ColumnName="LastName" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(1)" Value="N' '" />
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>

Event 5

calculator:
<CalculatorList>
<FilterCalculator CalculatorName="CSelCalcHistogramComparison" Selectivity="0.002" ComparisonType="Interval" />
</CalculatorList>

input_relation:
<Operator Name="LogOp_Select" ClassNo="32">
<StatsCollection Name="CStCollProject" Id="5" Card="847.00" />
<Operator Name="ScaOp_Comp" ClassNo="100">
<CompInfo CompareOp="EQ" />
<Operator Name="ScaOp_Identifier" ClassNo="99">
<IdentifierInfo ColumnName="Expr1002" />
</Operator>
<Operator Name="ScaOp_Const" ClassNo="98">
<ConstInfo Type="nvarchar(5)" Value="N'Gates'" />
</Operator>
</Operator>
</Operator>

Lots of scenarios to mull over and dig through as time permits.

Why care?

Many query performance issues (and associated query plan quality issues) are due to cardinality estimate skews.    It would be great to have a way to more efficiently point to how the various estimates are being calculated and why the estimates are off.

I’m not sure how in-depth this event and associated calculators will be documented by Microsoft, and my assumption is that we’ll need to figure it out via collective reverse-engineering.  But in the meantime this new XE event might prove to be quite useful for troubleshooting the more mysterious cardinality estimates.

New Course: “SQL Server: Temporary Objects”

This is my first blog post since going independent. As mentioned in my previous post, I’ll still be blogging periodically on my original SQLskills blog.

As for what I’ve been up to these last few weeks, I’ve been working on-site with a local Minneapolis client – spending about half of my time on the DBA side of things and half embedded with a developer team.  I’ve been helping out on longer-term projects and I’m definitely enjoying it so far.   

So on to the subject of this brief blog post – today Pluralsight published my new course, “SQL Server: Temporary Objects”.

This is a short course on how to effectively leverage SQL Server temporary objects in your T-SQL code. The decision as to whether to use temporary tables or table variables can have a significant impact on your workload performance and scalability, and this course walks through several specific scenarios so that you can see first-hand the nuances of this design decision.

There and Back (to Minneapolis) Again

Exactly two years ago I started my journey with SQLskills. It has been an incredible experience and I’m grateful for my time with everyone in the company and also grateful for all the people I got to meet along the way. I’ve gained friendships and working relationships that will last a lifetime.

While this has been a great adventure, I found myself feeling over the last two years that I’m a bit of a Hobbit at heart (hence the blog post title reference). I prefer the comforts of home and have missed the longer-term, multi-month engagements and projects that I’ve worked on in the past in other roles. While I do a lot of consulting work at SQLskills, that work is mostly shorter engagements spread over many clients, so I realized a transition would be needed at some point.

Paul and I worked out the plan last May, so this plan has been in-the-works for a while. Nothing dramatic to report and no regrets on either side.

Where am I going next?

I set up my own company and booked my first local client engagement in Minneapolis, starting October 1st, working with a local Fortune 500 company full time for at least the next few months, if not longer.

I’ll still be partnering with SQLskills when the opportunity arises, be it a consulting engagement, guest-blog post or Pluralsight course.

Not to be sentimental – and yet I am – so I’ll share just a few of my favorite SQLskills memories. It was hard to choose, since there were so many and they may seem like simple experiences, but it is the simple stuff that means the most to me in life:

  • Playing chess and drinking Prosecco late at night with Paul after class. We had the best conversations and sometimes Paul would even let me win (or rather, I would wait for him to get sleepy enough so that I would start winning).
  • During the Immersion Events when not teaching that day, Kimberly would invite me to tag along to pick up the books for the next week of class at Kinkos. We had our best conversations during these mini-road trips.
  • Visiting Jonathan in Florida and spending time with him and his incredible family. Jonathan is fiercely loyal and big-hearted (but don’t tell him I said that).
  • Laughing at Glenn’s stealth jokes and wicked-dry humor. Favorite Glenn catch phrases include “that’s just silly” – and how he gets all excited about his favorite processor chipset.
  • Weekly discussions with Erin have been incredible – discussing both life and SQL Server with her – and I always got something from the conversation.

So next week I’ll be teaching in Seattle (IE2: Immersion Event on Performance Tuning). It will be a great way to end this two-year adventure.

So that’s all for now. But you’ll still be hearing from me on this blog and elsewhere. Thanks!

Workloads that generate the 0x0000000000000000 query hash value?

In the last SQLskills Insider demo video I went through a scenario where you can use the query_hash column from sys.dm_exec_query_stats to aggregate statistics for queries that only differ by literal values.
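
A simplified sketch of that kind of query_hash aggregation looks something like this (columns abbreviated relative to the demo):

SELECT  qs.query_hash,
        COUNT(*) AS [statement_count],
        SUM(qs.execution_count) AS [execution_count],
        SUM(qs.total_worker_time) AS [total_worker_time]
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY SUM(qs.total_worker_time) DESC;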

I received an interesting question from someone who watched the video, ran the demo query in production and saw a 0x0000000000000000 value as the highest consumer of worker time (it also had the highest execution count).

I asked this person to show me the query text where [query_hash] = 0x0000000000000000, and it ended up being a series of FETCH NEXT and FETCH API_CURSOR calls, for example:

FETCH API_CURSOR00000000072ADF22
FETCH API_CURSOR00000000072A0B6E
FETCH API_CURSOR000000000729B7B2
FETCH API_CURSOR000000000729030C
FETCH API_CURSOR000000000729030C

There may be other circumstances where 0x0000000000000000 can manifest – but I haven’t witnessed them or heard of them.  If you have non-cursor scenarios where 0x0000000000000000 occurs, please share in the comments.
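
If you want to check your own workload for this, a query along these lines returns the statement text behind the zero hash:

SELECT  qs.execution_count,
        qs.total_worker_time,
        st.[text] AS [statement_text]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.query_hash = 0x0000000000000000;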

Thanks!

New Article on SQLPerformance.com “Exploring Low Priority Lock Wait Options in SQL Server 2014 CTP1”

My 10th guest blog post was published today on SQLPerformance.com:

Exploring Low Priority Lock Wait Options in SQL Server 2014 CTP1

In this post I further explore (from the last SQLPerformance.com post) online options, and specifically low priority lock wait improvements being included in SQL Server 2014 CTP1.

New Article on SQLPerformance.com “Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1”

My 9th guest blog post was published today on SQLPerformance.com:

Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1

In this post I explore the online, single-partition rebuild improvement being introduced in SQL Server 2014 CTP1.

We can handle the truth

Today I attended the MCM call with Microsoft Learning (MSL).

I won’t get personal here, because in spite of everything, I do imagine that the folks in MSL are under quite a bit of stress right now (yes – so is the community, but more on that later).  I myself remember getting chewed out back in November of 2010 when we announced that SQL MCM was removing the training requirement.  While I received some support, I received a good share of hate-email and comments – and I cannot recall a more stressful period of time in my career.  Communicating change is tough – and there is definitely a right way to do it and a wrong way to do it.  I think the first mistake is to think your audience can’t handle the truth.

With that said, here, in my dream world, is what I wished MSL would have said on today’s call.  They might have expressed some variations on a few of these items – and I’ll save it for MSL to communicate – but otherwise this is just an imaginary list of talking points:

<imaginary MSL talking points>

  • I’m sorry about how and when we communicated the program cancellation.  It was incredibly ungracious and we really regret it.
  • For anyone who has invested in the program in the last X number of months, we’ll be providing full refunds and will work through each scenario on a case-by-case basis.
  • We will extend the ability to take exams for X number of months.  We agree it was unreasonable and unfair to give a 1 month notice.
  • We ended the MCM program because we never really knew how to make it work.  Our organization isn’t structured to support programs like this – programs that are strategic but don’t generate direct-revenue.
  • We wanted to model the programs after what Cisco does, but we didn’t actually do much of what we should have to make it more like Cisco.
  • We wanted MCM to have industry-wide recognition, but we didn’t invest in long-term marketing.
  • We don’t really plan on making an MCM\MCA replacement, hence the cancellation.
  • When we say “pause” – we mean cancel and retire.  There will be a new “top” tier certification, but a much broader audience and it will not resemble MCM.
  • Even if we ask the product team to protect these programs, they have other priorities right now and aren’t in the certification business.
  • We will move all distribution lists and NDA-access related benefits to someone on the MVP community team to manage.  They have budget and know how to handle very large technical communities. They will manage this moving forward and you will be a member of the community and will be grandfathered in as appreciation of your time investment.
  • Business is business, but we’ll throw in what perks we can to soften the blow (MSDN subscriptions, PASS tickets, we’ll use our imagination).

</imaginary MSL talking points>

Now back to my own, non-imaginary voice for a bit.  A few thoughts and opinions:

  • I do really hope that anyone in the pipeline gets a chance to complete what they started if they choose to do so.
  • I do also hope that people are reimbursed according to each situation.
  • I hold out a very, very small hope that the various product teams will re-adopt each MCM/MCA program.
  • I hope that everyone will be civil and not resort to bullying the people in MSL.  Be tough.  Be honest.  Be vocal.  But don’t be vicious or get personal please.  Keep perspective.
  • I know we don’t need an acronym to be masters with the product.  The biggest benefit of being an MCM was the community and also the process of achieving it.
  • We’ll all be okay.

Lastly, I of course remain fiercely loyal to SQL Server.  It is the horse I bet on 16 years ago and I have no regrets.   But as for the SQL Server certification program, quite a bit will need to happen before I would feel comfortable advocating for them again.