The post Control Flow vs. Data Flow Demo appeared first on Joe Sack.
Let’s start by dropping clean buffers (on a test system, please):
DBCC DROPCLEANBUFFERS;
USE [Credit];
GO
SELECT COUNT(*) AS [page_count]
FROM [sys].[dm_os_buffer_descriptors] AS bd
WHERE [bd].[database_id] = DB_ID()
	AND [bd].[allocation_unit_id] = 15045483298816;
GO
This returns 0 rows for the allocation_unit_id associated with the table we’re about to query from the Credit database:
SELECT [charge].[charge_no]
FROM [dbo].[charge]
ORDER BY [charge].[charge_no];
GO
The actual plan shows the following (via SQL Sentry Plan Explorer):
Nothing fancy, just a Clustered Index Scan. And in terms of page counts from sys.dm_os_buffer_descriptors, we see 9,303 data pages in cache now.
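That page count came from re-running the buffer descriptor query from the start of the post (remember that the allocation_unit_id value is specific to my copy of the Credit database):

```sql
SELECT COUNT(*) AS [page_count]
FROM [sys].[dm_os_buffer_descriptors] AS bd
WHERE [bd].[database_id] = DB_ID()
	AND [bd].[allocation_unit_id] = 15045483298816;
```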
Now let’s drop the clean buffers and execute a query returning just the top 100 rows:
DBCC DROPCLEANBUFFERS;
USE [Credit];
GO
SELECT TOP 100 [charge].[charge_no]
FROM [dbo].[charge]
ORDER BY [charge].[charge_no];
GO
The actual plan is as follows:
Re-executing the query against sys.dm_os_buffer_descriptors, this time we see just 13 data pages.
The original question/discussion was with regard to the storage engine – and whether all data pages still get loaded into memory even with a TOP. As we saw in this scenario, that was not the case.
The post Redundant Query Plan Branches appeared first on Joe Sack.
CREATE VIEW [dbo].[basic_member]
AS
SELECT
	[member].[member_no],
	[member].[lastname],
	[member].[firstname],
	[member].[middleinitial],
	[member].[street],
	[member].[city],
	[member].[state_prov],
	[member].[mail_code],
	[member].[phone_no],
	[member].[region_no],
	[member].[expr_dt],
	[member].[member_code]
FROM [dbo].[member]
WHERE [member].[member_no] NOT IN
	(SELECT [corp_member].[member_no]
	 FROM [dbo].[corp_member]);
GO
A simple SELECT from this view returns 8,498 rows and has the following plan shape (I’m boxing in an “area of interest” via SQL Sentry Plan Explorer’s rendering of the plan):
We see that the view has a predicate on member_no NOT IN the corp_member table. But what happens if the original report writer doesn’t look at the view definition and decides they need this same predicate applied at the view reference scope (not realizing this was already taken care of)? For example:
SELECT
	[basic_member].[member_no],
	[basic_member].[lastname],
	[basic_member].[firstname],
	[basic_member].[middleinitial],
	[basic_member].[street],
	[basic_member].[city],
	[basic_member].[state_prov],
	[basic_member].[mail_code],
	[basic_member].[phone_no],
	[basic_member].[region_no],
	[basic_member].[expr_dt],
	[basic_member].[member_code]
FROM [dbo].[basic_member]
WHERE [basic_member].[member_no] NOT IN
	(SELECT [corp_member].[member_no]
	 FROM [dbo].[corp_member]);
Like the previous query against the view, we see 8,498 rows. But unlike the previous query, we see the following plan:
Notice the redundancy – even though the result set is identical between the two versions. The tables I’m using are small, but you can still see the difference in scan count and logical reads.
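The scan count and logical read numbers below came from STATISTICS IO output; a minimal way to reproduce the comparison (selecting a single column for brevity) would be:

```sql
SET STATISTICS IO ON;
GO
-- Version 1: rely on the view's own predicate
SELECT [basic_member].[member_no]
FROM [dbo].[basic_member];

-- Version 2: redundantly repeat the predicate at the view reference scope
SELECT [basic_member].[member_no]
FROM [dbo].[basic_member]
WHERE [basic_member].[member_no] NOT IN
	(SELECT [corp_member].[member_no] FROM [dbo].[corp_member]);
GO
SET STATISTICS IO OFF;
```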
Query Against View
Table ‘member’. Scan count 2, logical reads 305, physical reads 2, read-ahead reads 294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘corporation’. Scan count 1, logical reads 8, physical reads 1, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query with Redundant Predicate
Table ‘member’. Scan count 3, logical reads 325, physical reads 2, read-ahead reads 294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘corporation’. Scan count 2, logical reads 16, physical reads 1, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
And there is additional I/O overhead associated with the second plan of course. For the Credit database, the scale is small, but imagine the difference for a very large table.
You cannot always count on easily identifying redundant areas. The Query Optimizer may find an optimal plan in spite of the way it was written – but if you do see repeating branches in a query execution tree associated with a performance issue, you may want to explore the possibility of overlapping/redundant logic further.
The post What Does the Future Hold for Cardinality Estimates and Cost Models in Windows Azure SQL Database? appeared first on Joe Sack.
First of all, I saw a blog post from Grant Fritchey where he noticed that the estimated costs for a few plan operators were different.
Secondly, I read a paper called “Testing Cardinality Estimation Models in SQL Server” by Campbell Fraser, Leo Giakoumakis, Vikas Hamine, and Katherine F. Moore-Smith. This was a for-fee article, but the non-member price of $15.00 was worth it. One particularly interesting quote was as follows:
“The new CE model is planned as a future service release of the Microsoft SQL Azure service.”
That quote was a tipping point for further investigation, so I collaborated with Jonathan Kehayias: we discussed a testing approach and set up two different Azure databases, one on Web Edition and the other on Business Edition. The intention wasn’t to perform “formal” tests, but I did want to sniff around and see what variations in cost and cardinality estimates I could find (if any) between SQL Azure (version 11.0.2006) and SQL Server 2012 (version 11.0.2316) across various types of queries. I used the Credit database for a variety of test queries – with identical schema and data in all three databases (one engine DB and two Azure DBs).
One thing I’ve learned so far is to watch out for misinterpreting cost differences between “identical” databases. Even if you load the exact same schema and rows, you will likely end up with a different data page count between the engine and Azure (think of the Azure fill factor and RCSI behavior). For example, after loading my dbo.member table in SQL Azure, it had 159 pages versus 142 pages in my SQL Server 2012 version. So testing an initial Clustered Index Scan query against that table showed me an estimated I/O of 0.1075694 in SQL Server 2012 versus 0.120162 in SQL Azure. Assuming one random I/O and the rest sequential, I see that the SQL Azure cost is still calculated the same way for the Clustered Index Scan:
-- Random I/O cost:     0.003125
-- Sequential I/O cost: 0.000740741
SELECT 0.003125 +
	0.000740741 * (159 - 1);
-- = 0.120162078, matching the SQL Azure estimate of 0.120162
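The same formula with the engine-side page count of 142 reproduces the SQL Server 2012 estimate as well:

```sql
SELECT 0.003125 +
	0.000740741 * (142 - 1);
-- = 0.107569481, matching the SQL Server 2012 estimate of 0.1075694
```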
So the key will be to make sure I’m looking at true apples-to-apples comparisons here. I’ll be testing when I have a few spare moments between other tasks – but in the meantime I’m very interested to learn more about what new changes will come to SQL Azure in the future. I’ll share anything interesting I find on the blog – and if you find noteworthy QO items, please share in the comments of this blog as well.
Thanks!
The post Capturing Transient Query Plan Changes appeared first on Joe Sack.
– What were the wait stats associated with the unpredictable query?
– What did the query execution plan look like in the “good” versus “bad” condition?
To address the query wait stats question, I set up an Extended Events session to track the query’s accumulated wait stats for each execution. The performance issue would only happen a couple of times a day, so I scheduled a job to loop the execution – without disconnecting, so the session ID referenced in the Extended Events session stayed constant. I set up a separate table to track each test run (begin/end time). This way the long-duration periods could be associated back to the Extended Events session.
We caught a few instances of the long running query, and the associated wait stats were primarily related to PAGEIOLATCH_SH. That opened up other considerations which I don’t cover in this post, but I was still interested in seeing the execution plan for the long-running time periods versus the “steady” state.
For example, let’s say you have the following query:
EXEC sp_executesql N'SELECT p.ProductLine, SUM(f.SalesAmount) TotalSalesAmount
FROM [dbo].[FactInternetSales] f
INNER JOIN [dbo].[DimProduct] p ON f.ProductKey = p.ProductKey
GROUP BY p.ProductLine
ORDER BY p.ProductLine';
GO
This particular query leverages parallelism and a columnstore index in my database. I can search for the query hash and query plan hash as follows (after executing on my test system):
SELECT TOP 1
	last_execution_time,
	query_hash,
	query_plan_hash,
	p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE t.text LIKE '%SUM(f.SalesAmount)%'
ORDER BY last_execution_time DESC;
GO
I order by last execution time because I want the most recent execution version (and you have to be careful when grabbing the query_hash to make sure the text match in your WHERE clause isn’t capturing another query). This is obviously easier to do on an isolated test system – but it’s not impossible in production if you pay attention to execution count statistics. The following shows example results from the previous query:
I was interested in the query_hash and also the query_plan – which for this example, is the parallel plan:
Now what if I re-execute my query – but with parallelism disabled at the server scope? No query change – but now the plan must change since it can’t run in batch execution mode. Will tracking it based on my query_hash still work?
SELECT query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE qs.query_hash = 0x3759AECF09255926;
GO
In this case, I haven’t changed my query, but the plan changed, and so my query_hash allowed me to see the new serial query plan:
In the particular consulting scenario I mentioned at the beginning of this post, I wrote the most recent query plan to a separate table for each test run so that when the long performance happened on the test environment, we could go back and compare the different plans based on the same query hash value.
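A minimal sketch of that capture step (the logging table is illustrative – not the actual client’s schema – and the query hash is the one from the earlier example):

```sql
-- Illustrative logging table for captured plans
CREATE TABLE dbo.PlanCapture
(
	capture_time    DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
	query_hash      BINARY(8) NOT NULL,
	query_plan_hash BINARY(8) NOT NULL,
	query_plan      XML NULL
);
GO

-- Capture the most recent plan for the tracked query hash
INSERT dbo.PlanCapture (query_hash, query_plan_hash, query_plan)
SELECT TOP 1 qs.query_hash, qs.query_plan_hash, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS p
WHERE qs.query_hash = 0x3759AECF09255926
ORDER BY qs.last_execution_time DESC;
```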
The post SQL Server 2012’s Information on Parallel Thread Usage appeared first on Joe Sack.
The new thread statistics show information on the number of concurrent execution paths within an execution plan, the count of used threads, and the count of reserved threads per NUMA node. It’s definitely a useful feature, giving you visibility into actual thread reservation and not just the number of concurrently executing workers.
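This information surfaces in the actual plan XML as a ThreadStat element under QueryPlan – for example (the values here are illustrative, not from a specific plan):

```xml
<ThreadStat Branches="2" UsedThreads="9">
  <ThreadReservation NodeId="0" ReservedThreads="16" />
</ThreadStat>
```

Branches maps to the concurrent execution paths, UsedThreads to the actual thread count, and ThreadReservation to the reserved threads per NUMA node.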
The post Memory Grant Execution Plan Statistics appeared first on Joe Sack.
Here is an example of MemoryGrantInfo from an actual SQL Server 2012 execution plan:
<MemoryGrantInfo SerialRequiredMemory="5632" SerialDesiredMemory="11016" RequiredMemory="47368" DesiredMemory="52808" RequestedMemory="52808" GrantWaitTime="0" GrantedMemory="52808" MaxUsedMemory="4312" />
The unit of measurement is in KB – and in the below perfmon screenshot you can see that the “52808” value in the GrantedMemory attribute matches the Granted Workspace Memory (KB) performance monitor counter:
Examining the latest Showplan Schema (last updated March 2012) under the MemoryGrantType section, you’ll find a documentation element which states the following:
“Provide memory grant estimate as well as actual runtime memory grant information. Serial required/desired memory attributes are estimated during query compile time for serial execution. The rest of attributes provide estimates and counters for query execution time considering actual degree of parallelism.”
To test the serial plan related memory stats, I re-ran my original query using MAXDOP 1 and got the following results in my actual execution plan:
<MemoryGrantInfo SerialRequiredMemory="1536" SerialDesiredMemory="1600" RequiredMemory="1536" DesiredMemory="1600" RequestedMemory="1600" GrantWaitTime="0" GrantedMemory="1600" MaxUsedMemory="328" />
The serial memory (desired and required) numbers shifted downwards. My original MemoryGrantInfo was for a query referencing a columnstore index executing in batch mode (with parallelism, as required).
Here is the original plan:
By capping parallelism at MAXDOP 1, the QO still chose to reference the columnstore index scan operator, but the plan operator requirements changed and thus so did the overall memory requirements (including a change from a Hash Match to a Merge Join):
The Granted Workspace Memory (KB) perfmon counter – which showed 1600 KB for the serial plan – matched the 1600 value for SerialDesiredMemory, DesiredMemory and RequestedMemory.
While the MemoryGrantInfo element doesn’t replace the need for the sys.dm_exec_query_memory_grants DMV when trying to evaluate request level concurrent memory grant activity, having this additional information in the execution plan is much more convenient when you’re trying to identify requirements scoped to a specific query.
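For request-level monitoring, that DMV exposes the same counters while queries are executing – a quick sketch (column list abbreviated):

```sql
SELECT session_id, requested_memory_kb, granted_memory_kb,
	max_used_memory_kb, wait_time_ms
FROM sys.dm_exec_query_memory_grants;
```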
The post SQL Server 2012’s RetrievedFromCache Attribute appeared first on Joe Sack.
Let’s say I execute the following query immediately after executing DBCC FREEPROCCACHE:
SELECT p.ProductLine, SUM(f.SalesAmount) AS TotalSalesAmount
FROM [dbo].[FactInternetSales] AS f
INNER JOIN [dbo].[DimProduct] AS p ON f.ProductKey = p.ProductKey
GROUP BY p.ProductLine
ORDER BY p.ProductLine;
What value for RetrievedFromCache would you expect to see? In this example, I saw the following attribute value (with the attribute highlighted and StmtSimple abridged for clarity):
<StmtSimple StatementCompId="2" StatementEstRows="5" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="57.0909" RetrievedFromCache="true">
This value is also “true” for scenarios where you use sp_recompile on a module – as it just means that object will be recompiled on the next run and retrieved from cache.
What if I add a RECOMPILE query hint?
SELECT p.ProductLine, SUM(f.SalesAmount) AS TotalSalesAmount
FROM [dbo].[FactInternetSales] AS f
INNER JOIN [dbo].[DimProduct] AS p ON f.ProductKey = p.ProductKey
GROUP BY p.ProductLine
ORDER BY p.ProductLine
OPTION (RECOMPILE);
This time, I saw a “false” for RetrievedFromCache:
<StmtSimple StatementCompId="1" StatementEstRows="5" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="57.0909" RetrievedFromCache="false">
And what about scenarios where you have “optimize for ad hoc workloads” enabled for the SQL Server instance?
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
I executed DBCC FREEPROCCACHE and then executed the following query (which should be “stubbed” given the server option):
SELECT p.ProductLine, SUM(f.SalesAmount) AS TotalSalesAmount
FROM [dbo].[FactInternetSales] AS f
INNER JOIN [dbo].[DimProduct] AS p ON f.ProductKey = p.ProductKey
GROUP BY p.ProductLine
ORDER BY p.ProductLine;
GO
Sure enough – RetrievedFromCache is “false”:
<StmtSimple StatementCompId="1" StatementEstRows="5" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="57.0909" RetrievedFromCache="false">
And if I execute the SELECT query a second time without clearing the cache, it turns to “true”:
<StmtSimple StatementCompId="1" StatementEstRows="5" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="57.0909" RetrievedFromCache="true">
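You can confirm the stub-versus-full-plan behavior by looking at the plan cache directly (a sketch – the LIKE filter is just a convenient way to find this particular query):

```sql
SELECT cp.cacheobjtype, cp.objtype, cp.usecounts, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) AS t
WHERE t.text LIKE '%FactInternetSales%';
-- First execution under "optimize for ad hoc workloads" shows
-- cacheobjtype = "Compiled Plan Stub"; the second execution
-- replaces it with a full "Compiled Plan".
```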
Now if I disable “optimize for ad hoc workloads” – what changes?
EXEC sp_configure 'optimize for ad hoc workloads', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
As expected – after executing DBCC FREEPROCCACHE and executing the SELECT query, I see a RetrievedFromCache value of “true” in contrast to “false” when optimize for ad hoc workloads is enabled.
The post SpillToTempDb warning and SpillLevel’s mapping to single versus multiple pass appeared first on Joe Sack.
This post shows a few examples of Sort related SpillToTempDb execution plan warnings and the associated SpillLevel attribute.
This blog post is based on SQL Server 2012, version 11.0.2316 and I’m using the AdventureWorksDW2012 database and creating a separate version of the FactInternetSales table called FactInternetSales_Spill:
SELECT
	ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
	CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber,
	RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct,
	DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
	Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
INTO [dbo].[FactInternetSales_Spill]
FROM [dbo].[FactInternetSales];
I started off with 60,398 rows and no indexes. I then created a clustered index, with Include Actual Execution Plan enabled:
ALTER TABLE [dbo].[FactInternetSales_Spill]
ADD CONSTRAINT [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber_Spill]
PRIMARY KEY CLUSTERED
(
	[SalesOrderNumber] ASC,
	[SalesOrderLineNumber] ASC
) WITH (ONLINE = OFF) ON [PRIMARY];
GO
The associated execution plan had no spill warnings:
I pumped up the size of this table to 664,378 rows:
INSERT [dbo].[FactInternetSales_Spill]
	(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
	 CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber,
	 RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct,
	 DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
	 Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate)
SELECT
	ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
	CurrencyKey, SalesTerritoryKey, LEFT(CAST(NEWID() AS NVARCHAR(36)), 20),
	SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount,
	UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost,
	SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber,
	OrderDate, DueDate, ShipDate
FROM [dbo].[FactInternetSales];
GO 10
Dropping and re-creating the index, I still didn’t see spill warnings, so I pumped it up to 7,912,138 rows:
INSERT [dbo].[FactInternetSales_Spill]
	(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
	 CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber,
	 RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct,
	 DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
	 Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate)
SELECT
	ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
	CurrencyKey, SalesTerritoryKey, LEFT(CAST(NEWID() AS NVARCHAR(36)), 20),
	SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount,
	UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost,
	SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber,
	OrderDate, DueDate, ShipDate
FROM [dbo].[FactInternetSales];
GO 120
Creating the clustered index on this larger table caused the spill warning to be raised for the Sort iterator (screen shot from the graphical plan, properties of the Sort iterator and the blurb from the XML showplan):
Now the SpillLevel="8" value was interesting to me. I was also running “old-school” Profiler AND an Extended Events session at the time to see what they had to say about these warnings.
In Profiler, I saw 9 sort warning events, 8 of which were “2 – Multiple pass”. A single pass means the sort table was written to disk and only a single pass over the data was required to produce the sorted output (but as you see, there were multiple spill events). A multiple pass means that multiple passes over the spilled data were needed in order to obtain the sorted output:
As I expected, Extended Events tells me the same thing:
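The Extended Events side of that capture can be as simple as a session on the sort_warning event (a sketch – the session name and target choice are up to you):

```sql
CREATE EVENT SESSION [sort_warnings] ON SERVER
ADD EVENT sqlserver.sort_warning
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [sort_warnings] ON SERVER STATE = START;
GO
```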
What was also interesting is the behavior if I set ONLINE = ON for my index creation:
ALTER TABLE [dbo].[FactInternetSales_Spill]
ADD CONSTRAINT [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber_Spill]
PRIMARY KEY CLUSTERED
(
	[SalesOrderNumber] ASC,
	[SalesOrderLineNumber] ASC
) WITH (ONLINE = ON) ON [PRIMARY];
GO
Each event sub class now shows as “Single Pass”:
But that’s not all… The execution plan for my ONLINE = ON index creation shows a spill level of “1” – not “8” or “9”. So the single passes – even though nine occurred – show up as just SpillLevel="1":
Now, my index creation was executing with parallelism, and indeed 8 threads were involved in the Sort iterator execution:
Removing parallelism from the picture, aside from my index creation taking significantly longer, I see only one spill warning now:
And setting the maximum degree of parallelism to “4” – I see a total of 5 warnings:
And the plan itself shows a spill level of “4”:
So the “multiple pass” increases the spill level one-for-one. If I have multiple “single pass” events for the same sort, it shows up as a spill level “1”.
This was just for my particular scenario. Have you seen other behaviors as well? If so, please share here.
The post Partitions Accessed and Partition Range in the Query Execution Plan appeared first on Joe Sack.
“Actual Partition Count” shows a value of 1 and “Actual Partitions Accessed” shows a value of 50. The “Actual Partitions Accessed” property name could cause confusion though, since what you’re actually looking at is the partition numbers accessed (not the count of partitions accessed).
I prefer the XML naming convention instead:
<PartitionsAccessed PartitionCount="1">
<PartitionRange Start="50" End="50" />
</PartitionsAccessed>
The name mapping is as follows from Graphical-to-XML Plan formats:
“Actual Partition Count” = the PartitionsAccessed element’s PartitionCount attribute
“Actual Partitions Accessed” = the PartitionRange element(s)
If I modify the query to access two partitions, I see the following (graphical and XML plan output):
<PartitionsAccessed PartitionCount="2">
<PartitionRange Start="1" End="1" />
<PartitionRange Start="50" End="50" />
</PartitionsAccessed>
And here is an example accessing all partitions in the table:
<PartitionsAccessed PartitionCount="63">
<PartitionRange Start="1" End="63" />
</PartitionsAccessed>
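If you want to double-check which partition numbers a predicate maps to without reading the plan XML, the $PARTITION function can help (the table and partition function names below are hypothetical):

```sql
SELECT $PARTITION.pf_order_date([OrderDateKey]) AS [partition_number],
	COUNT(*) AS [row_count]
FROM [dbo].[FactSales_Partitioned]
WHERE [OrderDateKey] BETWEEN 20120101 AND 20120131
GROUP BY $PARTITION.pf_order_date([OrderDateKey]);
```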
Once you realize the mapping, it’s no big deal to understand what’s going on, although I do see it causing confusion (hence this blog post).