SQL Server Maintenance Plans and Parallelism – Index Rebuilds

In my previous post, SQL Server Maintenance Plans and Parallelism – CHECKDB, we looked at the degree of parallelism used when CHECKDB is run.  For CHECKDB, it ultimately depends on SQL Server Edition and the max degree of parallelism setting for the instance; index rebuilds (today’s topic, as you probably surmised!) give you a bit more control.

Index Rebuilds

The max degree of parallelism can be configured for index rebuilds using WITH (MAXDOP = n):

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
     REBUILD WITH (MAXDOP = 8);
GO

If this option is included, it overrides the max degree of parallelism value configured for the instance. For example, I can rebuild the IX_SalesOrderDetail_ProductID index on Sales.SalesOrderDetail with MAXDOP set to 8, even though MAXDOP is set to 4 for the instance.  If WITH (MAXDOP = n) is not specified for an ALTER INDEX … REBUILD statement, then SQL Server will use the MAXDOP value set for the instance.

Now, unfortunately, parallel index operations are only permitted in Enterprise Edition.  If you’re running Standard Edition, you’re stuck with single-threaded rebuilds, just like you’re stuck with single-threaded integrity checks.  Despite this sad news, I thought I’d run through a demo that shows the max degree of parallelism used during the index rebuild. I’m going to run ALTER INDEX REBUILD for a selected index in the AdventureWorks2012 database, and I’ll use Extended Events to capture each statement executed (sp_statement_completed event), and the actual query plan for the statement (query_post_execution_showplan event).

**Important note here again: it is NOT recommended to capture the query_post_execution_showplan event against a live, production system.  This event generates significant performance overhead, and you are warned of this when configuring the session via the GUI.  If you repeat any of the demos here, please make sure to execute them against a test environment.  It’s very important to me that you do not bring down your production environment.**

Here are the statements to create the event session, start it, run the ALTER INDEX … REBUILD statements, then stop the event session.  As in my previous post, I am using a file target to capture the output, and the path is C:\temp.  You may need to modify this path for your environment.  I still have max degree of parallelism set to 4 for my instance, but we’ll set it before we run anything just for good measure.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', 4;
GO
RECONFIGURE WITH OVERRIDE;
GO

CREATE EVENT SESSION [CapturePlans] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
     ACTION(sqlserver.plan_handle,sqlserver.sql_text)),
ADD EVENT sqlserver.sp_statement_completed(
     ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\temp\CapturePlans.xel'),
ADD TARGET package0.ring_buffer(SET max_memory=(102400))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
     MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO

ALTER EVENT SESSION [CapturePlans]
     ON SERVER
     STATE=START;
GO

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD WITH (MAXDOP = 8);
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD;
GO

ALTER EVENT SESSION [CapturePlans]
     ON SERVER
     STATE=STOP;
GO

Note that I used a different version of the SalesOrderDetail table named SalesOrderDetailEnlarged.  This table has over 4 million rows in it and was populated using Jonathan’s Create Enlarged AdventureWorks Table script to ensure I’d have a table large enough to warrant a parallel rebuild.  After I stopped the event session I opened the .xel file from C:\temp in Management Studio and added the sql_text column to the display so I could easily find the ALTER INDEX statements.
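
If you prefer T-SQL to the graphical viewer, the file target can also be read with sys.fn_xe_file_target_read_file; this is just a sketch, and it assumes the same C:\temp path used in the session definition above.

-- Read the event_file target with T-SQL instead of the SSMS viewer; adjust the path as needed.
SELECT
    [object_name] AS "Event",
    CAST([event_data] AS XML) AS "Event Data"
FROM sys.fn_xe_file_target_read_file (N'C:\temp\CapturePlans*.xel', NULL, NULL, NULL);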

The screen shot below is from the ALTER INDEX statement with MAXDOP = 8 included.  The query_post_execution_showplan event is highlighted, you can see the sql_text, and I hovered over the showplan_xml to show the first part of the xml version of the plan.  Note the red box around QueryPlan DegreeOfParallelism…it’s 8, as expected:

[Screenshot: ALTER INDEX … REBUILD WITH (MAXDOP = 8)]

If you’re playing along at home in your test environment, you can click on the Query Plan to see the graphical view, or double-click the XML to view that plan that way.  Now check out the screen capture below, which is for the ALTER INDEX statement that did not include the MAXDOP option:

[Screenshot: ALTER INDEX … REBUILD (default option)]

The max degree of parallelism for the plan is 4 because if the MAXDOP option is not included, SQL Server uses the max degree of parallelism set for the instance.  Note that this holds true when parallelism is disabled for an instance (max degree of parallelism = 1):

sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

ALTER EVENT SESSION [CapturePlans]
 ON SERVER
 STATE=START;
GO

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD;
GO

ALTER EVENT SESSION [CapturePlans]
 ON SERVER
 STATE=STOP;
GO

[Screenshot: ALTER INDEX … REBUILD (default option) – MAXDOP = 1 for instance]

The plan shows a DegreeOfParallelism of 0 – this means that the query did not use parallelism – and that the plan includes a NonParallelPlanReason* of “MaxDOPSetToOne”.  Therefore, if MAXDOP is set to 1 for an instance, and the default ALTER INDEX … REBUILD statements are used to rebuild indexes – where the MAXDOP option is not included – then rebuilds will be single-threaded.  For some well-known applications (e.g. SharePoint, SAP, BizTalk) it is recommended to set the max degree of parallelism to 1 for the instance.  While that option may be appropriate for application-specific queries, it means that your index rebuild operations may run longer than if parallelism were enabled.  It may be worth modifying your index maintenance script to include the MAXDOP option for ALTER INDEX REBUILD statements.
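
As a rough illustration of that last point, here is a sketch that builds the MAXDOP option into a dynamically constructed rebuild statement; the @MaxDop variable and the hard-coded index name are just placeholders for whatever your maintenance script already loops over.

-- Placeholder index and table names; a real maintenance script would supply these from its own loop.
DECLARE @MaxDop tinyint = 4,
        @SQL    nvarchar(max);

SET @SQL = N'ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] '
         + N'ON [Sales].[SalesOrderDetailEnlarged] '
         + N'REBUILD WITH (MAXDOP = ' + CAST(@MaxDop AS nvarchar(3)) + N');';

EXEC sp_executesql @SQL;
GO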

In the event that you have a max degree of parallelism value above 1 specified for the instance, but you’re not sure what the “right” MAXDOP value should be for your index rebuilds, you can let SQL Server decide.  If you include the WITH (MAXDOP = 0) option in your rebuild syntax, then the optimizer will determine how many CPUs to use, which could be anywhere from 1 to all of the CPUs available to SQL Server.  This is the recommended setting per Books Online, but I would caution you to use this option only if you’re comfortable with SQL Server potentially using all CPUs for a rebuild.  If you happen to be running other tasks or processes in the database while the rebuilds run – not ideal, but for a 24×7 solution you often don’t have a choice – then you should specify a MAXDOP value below the total number of CPUs available.
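
Purely as a syntax sketch, letting SQL Server decide looks like this:

-- MAXDOP = 0 lets SQL Server determine how many CPUs to use for the rebuild.
ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD WITH (MAXDOP = 0);
GO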

Finally, in case you’re wondering about parallelism and reorganizing indexes…the WITH (MAXDOP = n) option is not available for ALTER INDEX REORGANIZE, as index reorganization is always a single-threaded operation.  The final post in this series will cover parallelism and the UPDATE STATISTICS command, and if you’re manually managing statistics and specifying the sample, you don’t want to miss it!

*If you’re interested, Joe talks about the NonParallelPlanReason attribute  in his post, SQL Server 2012 Execution Plan’s NonParallelPlanReason, which may be useful when you’re digging into execution plans in SQL Server 2012 and higher.

SQL Server Maintenance Plans and Parallelism – CHECKDB

Many posts and articles that discuss parallelism and SQL Server revolve around query performance and query optimization. Parallelism can affect performance, and some DBAs and database developers spend a great amount of time trying to find the “right” max degree of parallelism (MAXDOP) setting for an instance. Finding that right value is a science and an art, but understanding what degree of parallelism is used by SQL Server for standard maintenance tasks is straight science.

Parallelism and Instance Configuration

To start, I’m working under the assumption that you are familiar with parallelism and understand the implications of leaving max degree of parallelism set to 0, changing it to 1, or setting it to another value. Note: If you’re looking for background reading, I recommend Adam Machanic’s SQL University Parallelism Week posts (Part 1, Part 2, and Part 3), and Paul White’s article, Understanding and Using Parallelism in SQL Server. To understand what impact max degree of parallelism has on maintenance tasks, we need to know its value for an instance, and the easiest way to find it is via sys.configurations:

SELECT [name], [value], [value_in_use]
FROM [sys].[configurations]
WHERE [name] = 'max degree of parallelism';

In my environment, it’s set to 4 for purposes of this demo (I have 8 logical processors on my laptop):

[Screenshot: max degree of parallelism configuration]

The three maintenance tasks I want to cover in this series are database consistency checks, index rebuilds, and statistic updates.  As you might have guessed from the title, this first post will cover consistency checks.

Database Consistency Checks

In November of last year I blogged about DBCC CHECKDB and parallel checks due to an inconsistency I found in Books Online. The inconsistency has since been fixed in BOL (hooray for Connect items!), and the short story is that any consistency check (CHECKDB, CHECKTABLE, CHECKFILEGROUP, and CHECKCATALOG) is single-threaded in Standard Edition. Regardless of the number of cores in your server and the MAXDOP setting for the instance, any CHECK command will be single-threaded.

In Enterprise Edition, checks can be done in parallel, and the max degree of parallelism is determined by the Query Processor. The Query Processor respects the max degree of parallelism setting for the instance, therefore since I have MAXDOP = 4 for my instance, up to four processors can be used by a CHECK command. There is no MAXDOP option that can be included with a CHECK command; it always respects the value set for the instance.  We can prove this using Extended Events and the sp_statement_completed and query_post_execution_showplan events.

**Important note here: it is NOT recommended to capture the query_post_execution_showplan event against a live, production system. This event generates significant performance overhead, and you are warned of this when configuring the session via the GUI. If you repeat any of the demos here, please make sure to execute them against a test environment. I really do not want you to bring down your production environment.**

Here are the statements to create the event session, start it, run CHECKDB, then stop the event session. I am using a file target to capture the output, and the path is C:\temp. You may need to modify this path for your environment.

CREATE EVENT SESSION [CapturePlans] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.plan_handle,sqlserver.sql_text)),
ADD EVENT sqlserver.sp_statement_completed(
ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\temp\CapturePlans.xel'),
ADD TARGET package0.ring_buffer(SET max_memory=(102400))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=START;
GO

DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=STOP;
GO

After stopping the event session, open the .xel file from C:\temp in Management Studio and add the sql_text column to the display to easily find the CHECKDB statement.  In the screen shot below the query_post_execution_showplan event is highlighted, you can see the sql_text, and I hovered over the showplan_xml to show the first part of the xml version of the plan. Note the red box around QueryPlan DegreeOfParallelism…it’s 4, as expected because it’s set to that for the instance.  The query plan is also included in the screen shot to show the full plan.

[Screenshot: DBCC CHECKDB with parallelism]

To view the full XML for the plan, just double-click the showplan_xml value and it will open in a new window. Just for fun, if we change MAXDOP for the instance to 1 (which I’ve seen in some OLTP environments) note that CHECKDB now runs single-threaded, regardless of SQL Server Edition.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=START;
GO

DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=STOP;
GO

Again, if we open the most recent .xel file in C:\temp and add the sql_text column, we can see that CHECKDB runs single-threaded:

[Screenshot: DBCC CHECKDB single-threaded]

Realize that if max degree of parallelism is set to 1 for the instance, and you’re running Enterprise Edition, your consistency checks could be taking longer than necessary.  The only way to work around this in Enterprise Edition is to change the max degree of parallelism for the instance, run the check, then change it back.  Alternatively, in some 24×7 enterprise environments it may not be desirable to have CHECK commands use a high number of threads, and lowering max degree of parallelism for the instance while the CHECKs run may not be an option because you don’t want to impact the performance of your regular queries.  Instead, you could disable parallel checking for CHECK commands entirely using trace flag 2528, or you could use Resource Governor to limit CPU use. And one final, interesting note (thanks Paul): CHECKALLOC is always single-threaded, regardless of Edition.
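
If the trace flag route appeals to you, a minimal sketch looks like the following; as with the other demos, please only run this against a test system.

-- Disable parallel checking with trace flag 2528 for this run, then turn it back off.
-- Use DBCC TRACEON (2528, -1) if you want the flag enabled globally rather than for the session.
DBCC TRACEON (2528);
GO
DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO
DBCC TRACEOFF (2528);
GO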

In the next post we’ll look at parallelism and index maintenance.

SQL Server Baselines Series on SQLServerCentral.com

Today the fourth article in my baselines series was published on SQLServerCentral.com.  Listed below are the links to all of the articles, and I’ve decided to continue the series on my blog here.  I truly believe that every SQL Server environment can benefit from having baselines.  I say that all the time, I know, and people seem to agree with me :)  And yet there are many systems out there that don’t have baseline data.  The biggest roadblock that I see is deciding what to capture, and then writing the scripts to capture what you want.  The goal of the series I wrote for SQLServerCentral.com was to provide background about why baselines are important, and provide a set of scripts to get started.  The value of baselines may seem obvious to you, but for new DBAs the benefits may not be as apparent, so taking a few minutes to step through them is useful.  The scripts I provide are straightforward and there’s a reason for that: it’s not rocket science.  Capturing baseline data is really easy to do, and my goal is to provide scripts so DBAs can quickly and easily start capturing data, and hopefully once people realize how simple it is, they will develop their own scripts.

But just because I’ve provided a few scripts doesn’t mean I’m done.  I am going to continue to develop and post scripts here that DBAs can use to set up baselines in their own environment. I recognize that there are some excellent applications available from third party vendors that will capture baseline data, and if your company has the budget to purchase such an application, I would highly recommend doing so. My scripts are not a replacement for any of those applications. But for those of you who don’t have the budget, or perhaps need some data in order to make a case for why you need one of those applications, these scripts will get you started.

If you have any feedback about the scripts, or requests for what I write next, feel free to email me! The scripts are a work in progress and I plan to continually work to make them better.

5 Reasons You Must Start Capturing Baseline Data

Back to Basics: Capturing Baselines on Production SQL Servers

Capturing Baselines on SQL Server: Where’s My Space?

Capturing Baselines on SQL Server: Wait Statistics

Understanding When Statistics Will Automatically Update

During the PASS Summit in November I presented a session on Demystifying Database Statistics (if you attended Summit but missed it, you can stream it from here). During the session I went through a demo that showed the automatic update of a statistic. The database had the Auto Update Statistics option enabled, and I had added almost 25,000 rows to a table with 121,000 rows. The number of rows added via the bulk import was just over the limit to trigger an auto-update of statistics for the next query, which I showed. I had many people ask why statistics did not update immediately after the bulk load of the 25,000 rows completed, so I want to step through the demo that I did at Summit to clarify when an automatic update will occur.

The Setup

Start with a copy of the AdventureWorks2012 database, which you can download from CodePlex. Ensure that the Auto Update Statistics option is enabled:

IF (SELECT COUNT(*) FROM [sys].[databases] WHERE [name] = 'AdventureWorks2012' AND [is_auto_update_stats_on] = 0) = 1
BEGIN
ALTER DATABASE [AdventureWorks2012] SET AUTO_UPDATE_STATISTICS ON;
END;

We’re going to use a copy of the Sales.SalesOrderDetail table for the demo. After we create the table, we will check to see when statistics last updated. We can use various methods to check statistics date, such as DBCC SHOW_STATISTICS or STATS_DATE, but since the release of SP1 for SQL Server 2012, I have exclusively used sys.dm_db_stats_properties to get this information.

USE [AdventureWorks2012];
GO
SELECT *
INTO [Sales].[TestSalesOrderDetail]
FROM [Sales].[SalesOrderDetail];
GO
CREATE CLUSTERED INDEX [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] ON [Sales].[TestSalesOrderDetail] ([SalesOrderID], [SalesOrderDetailID]);
GO
CREATE UNIQUE NONCLUSTERED INDEX [AK_TestSalesOrderDetail_rowguid] ON [Sales].[TestSalesOrderDetail] ([rowguid]);
GO
CREATE NONCLUSTERED INDEX [IX_TestSalesOrderDetail_ProductID] ON [Sales].[TestSalesOrderDetail] ([ProductID]);
GO
SELECT
OBJECT_NAME([sp].[object_id]) AS "Table",
[sp].[stats_id] AS "Statistic ID",
[s].[name] AS "Statistic",
[sp].[last_updated] AS "Last Updated",
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS "Modifications"
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail');

Here’s the output:

[Screenshot: sys.dm_db_stats_properties after initially creating the table]

This is my go-to DMV for statistics because in addition to including the date that statistics were last updated and row information, I also get a count of modifications since the last statistic update. I just created the table, so the Last Updated date is current, and I have not made any changes so the modification count is 0.
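
If you’re on a build without sys.dm_db_stats_properties (it was added in SQL Server 2008 R2 SP2 and SQL Server 2012 SP1), a rough fallback for just the last updated date is STATS_DATE, though you lose the row and modification counts:

-- STATS_DATE returns the last update date only; no rows_sampled or modification_counter here.
SELECT
    OBJECT_NAME([s].[object_id]) AS "Table",
    [s].[name] AS "Statistic",
    STATS_DATE([s].[object_id], [s].[stats_id]) AS "Last Updated"
FROM [sys].[stats] AS [s]
WHERE [s].[object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail');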

Invalidating Statistics

For my demo I want to bulk insert enough rows to invalidate the statistics. SQL Server has pre-determined thresholds where it considers statistics to be out-of-date and therefore invalid. The technical article, Statistics Used by the Query Optimizer in Microsoft SQL Server 2008, documents these thresholds. For a regular table, statistics are out of date when:

  • The table size has gone from 0 to >0 rows (test 1).

  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).

  • The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).

The Sales.SalesOrderDetail table has 121317 rows:

(121317 * 0.20) + 500 = 24763.4
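
If you’d rather not do the arithmetic by hand, here is a small sketch that computes the same threshold from the current row count; sys.dm_db_partition_stats is just one convenient place to get that count.

-- Compute the 500 + 20% auto-update threshold from the current row count (heap or clustered index only).
SELECT
    SUM([row_count]) AS "Rows",
    SUM([row_count]) * 0.20 + 500 AS "Approximate Modification Threshold"
FROM [sys].[dm_db_partition_stats]
WHERE [object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail')
AND [index_id] IN (0,1);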

The bulk insert below loads 24775 rows, which should be enough to invalidate statistics. Download the .txt file for the bulk insert here.

BULK INSERT AdventureWorks2012.Sales.TestSalesOrderDetail
FROM 'C:\SQLStuff\Statistics\Data\sod.txt'
WITH
(
DATAFILETYPE = 'native',
TABLOCK
);
(24775 row(s) affected)

After the bulk load completes, re-run the query against sys.dm_db_stats_properties and review the output:

[Screenshot: sys.dm_db_stats_properties after the bulk import]

The statistics have not updated, but the modification counter has changed, as expected. The statistics are now out of date based on the threshold defined previously, and we would expect that a query or data modification against Sales.TestSalesOrderDetail would trigger an update of statistics. But before we try that, let’s review what causes the automatic update.

The aforementioned article states:

The statistics auto update is triggered by query optimization or by execution of a compiled plan, and it involves only a subset of the columns referred to in the query.

When a query is first compiled, if the optimizer needs a particular statistics object, and that statistics object exists, the statistics object is updated if it is out of date. When a query is executed and its plan is in the cache, the statistics the plan depends on are checked to see if they are out of date. If so, the plan is removed from the cache, and during recompilation of the query, the statistics are updated. The plan also is removed from the cache if any of the statistics it depends on have changed.

To be clear, if a query plan exists in cache and that plan uses specific statistics, when the query executes SQL Server checks to see if any of the statistics used in the plan are out of date. If they are, then the automatic update of those statistics occurs.

If a plan does not exist in cache for a query, then if the optimizer uses a statistics object that is out of date when the plan compiles, SQL Server will automatically update those statistics.

Invoking the Automatic Update

We have not run any query against Sales.TestSalesOrderDetail except our bulk insert. At the time that the query compiled for the bulk insert, no statistics for Sales.TestSalesOrderDetail were out of date; therefore no statistics required an automatic update.

Now let’s issue an update against Sales.TestSalesOrderDetail that will change the ProductID for a specific SalesOrderID, and then query sys.dm_db_stats_properties:

UPDATE Sales.TestSalesOrderDetail SET ProductID = 717 WHERE SalesOrderID = 75123;
GO
(3 row(s) affected)

Output:

[Screenshot: sys.dm_db_stats_properties after the update]

We can see that the PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID statistic object automatically updated. We could simply assume that the optimizer used this object in the plan. However, in SQL Server 2012 we can look at the plan XML and confirm this.

In his post, Statistics used in a cached query plan, Fabiano Neves Amorim describes a method to capture statistics information from the plan. Please note the following:

  • This information is only available in SQL Server 2012 and higher.
  • This query requires use of an undocumented trace flag. I do not recommend using this trace flag in a Production environment.
  • This query interrogates the plan cache. The plan cache may be very large for your system, depending on the amount of memory on the server. Before querying the plan, I recommend setting the transaction isolation level to READ UNCOMMITTED, and also recommend using OPTION (MAXDOP 1) to limit CPU utilization. The query may take longer to execute, but it reduces the impact on other queries executing concurrently.

If we query the plan cache using Fabio’s query, we get the following information:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
DBCC TRACEON (8666);
GO
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT qt.text AS SQLCommand,
qp.query_plan,
StatsUsed.XMLCol.value('@FieldValue','NVarChar(500)') AS StatsName
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
CROSS APPLY query_plan.nodes('//p:Field[@FieldName="wszStatName"]') StatsUsed(XMLCol)
WHERE qt.text LIKE '%UPDATE%'
AND qt.text LIKE '%ProductID%';
GO
DBCC TRACEOFF(8666);
GO

Output:

[Screenshot: statistic object used in the query plan]

The output confirms that the optimizer used the PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID statistic object in the query plan, and because it was out of date, SQL Server automatically updated it.

If you are running earlier versions of SQL Server, you can see what statistics the optimizer loads using the undocumented trace flags 9292 and 9204. Paul White has an excellent post, How to Find the Statistics Used to Compile an Execution Plan, which I recommend reading if you’re interested.

One thing I want to point out: the IX_TestSalesOrderDetail_ProductID statistic, which has ProductID as its key, did not update automatically when the UPDATE query executed. This is expected. Even though the statement modified ProductID for three rows (the modification counter for the statistic increased from 24775 to 24778), the optimizer did not use that statistic in the plan. If the plan does not use a statistic, the statistic will not automatically update, even if the query modifies columns in said statistic key.
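
If you did want that statistic refreshed without waiting for a plan to use it, you could always update it manually; this is just a sketch, and FULLSCAN is my choice here rather than a requirement.

-- Manually refresh the statistic the optimizer did not use; FULLSCAN is optional.
UPDATE STATISTICS [Sales].[TestSalesOrderDetail] [IX_TestSalesOrderDetail_ProductID]
     WITH FULLSCAN;
GO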

Summary

As I stated initially, I wanted to write this post to clear up any confusion surrounding automatic updates for statistics. Even if a statistic becomes outdated as the result of a modification, it will not automatically update after the modification completes. The statistic will automatically update the next time a query plan uses it.

To reiterate (yes, this might be overkill), SQL Server will automatically update a statistic when:

  • A query compiles for the first time, and a statistic used in the plan is out of date
  • A query has an existing query plan, but a statistic in the plan is out of date

For those that attended my session at Summit, I hope this helps address any questions you might have still had. If not, please leave a comment and I will get back to you!

Trending Database Growth From Backups

When I start working with a client, one question I always ask is whether they are collecting baselines of their SQL Server environment (a shocker, I know). If they are not, I explain why it’s a good idea to start capturing them. And even though I think it’s an easy argument to make, I find I make a better case when I have data to back it up. But how do you make the argument for baseline data, when you don’t have any real data to show?

There is data in SQL Server that you can mine; you just have to know where to find it. If I look at a client system and notice that maintenance tasks keep taking longer and longer, then I might assume it’s due to database growth. Now, if it’s just database integrity checks that are taking longer and longer, that might be a sign that something is wrong. However, that’s out of scope for this post, so let’s stick with the assumption that the database is growing larger over time because data is rarely deleted, only added. Depending on the client’s current storage and the duration of the tasks, I may have some concerns about how much disk space they’re going to need down the road. I really want to trend database growth, among other things, over time. And one way I can approximate growth is by using information from full backups.

When you back up a database, every page that is allocated in the database is copied to the backup.  This means you could have a 100GB database with a backup of only 50GB, because only 50GB’s worth of pages are allocated.  If my database files are pre-sized, as they hopefully are, then looking at backup size will not tell me anything about the current size of the database.  However, it will tell me about the growth of it – which is really what I’m after.
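
If you want to see that difference for yourself, here is a quick sketch that compares total data file size to allocated space for the demo database; FILEPROPERTY reports SpaceUsed in 8KB pages, as does the size column.

-- Compare total data file size to allocated space; size and SpaceUsed are both in 8KB pages.
USE [AdventureWorks];
GO
SELECT
    SUM([size]) * 8 / 1024.0 AS "Total Data File Size MB",
    SUM(FILEPROPERTY([name], 'SpaceUsed')) * 8 / 1024.0 AS "Allocated Space MB"
FROM [sys].[database_files]
WHERE [type_desc] = 'ROWS';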

Backup information is stored in msdb, and while it should be removed on a regular basis via a scheduled maintenance task, it is not unusual for at least three to six months of data to exist, if not more. Everything I need for this example I can capture from dbo.backupset, which has one row for every successful backup operation. Here’s my query*:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],DATEPART(mm,[backup_start_date]);

In this query I’m filtering on a specific database, and I’m only looking at full backups (type = ‘D’). Log backups would be interesting to examine as well, but that’s for another post. I’m also aggregating all the full backups for one month. Whether you’re running full backups daily or weekly, I would recommend aggregating the data by month. Trying to look at the changes day-by-day or even week-by-week is too detailed. We want to look at the big picture, and a monthly summary gives us that. Here’s the output for my AdventureWorks database:

[Screenshot: query output for the AdventureWorks database]

Notice that the backup size increases over time, but it’s not linear. If I graph it in Excel, I can really see the trend:

[Chart: backup size by month, graphed in Excel]

Further analysis is natural from this point on – what’s the percent increase each month? Each quarter? Which month had the largest increase? When is the database going to fill up the storage we have allocated currently? In my case, I just want to be able to show that we can get this kind of information, plus a lot more, from SQL Server if we just capture it. And this data supports my point very well. If you want to dig deeper into database growth analysis, I say run with it. :)
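
As a starting point for that kind of analysis, here is a sketch (SQL Server 2012 and higher, since it uses LAG) that layers month-over-month percent growth on top of the same aggregation, grouping by year and month so that months from different years don’t get combined:

-- Month-over-month percent growth of the average full backup size.
WITH [MonthlyBackups] AS
(
    SELECT
        DATEPART(year,[backup_start_date]) AS [Year],
        DATEPART(month,[backup_start_date]) AS [Month],
        AVG([backup_size]/1024/1024) AS [Backup Size MB]
    FROM msdb.dbo.backupset
    WHERE [database_name] = N'AdventureWorks'
    AND [type] = 'D'
    GROUP BY DATEPART(year,[backup_start_date]), DATEPART(month,[backup_start_date])
)
SELECT
    [Year],
    [Month],
    [Backup Size MB],
    ([Backup Size MB] - LAG([Backup Size MB]) OVER (ORDER BY [Year],[Month])) * 100.0
        / NULLIF(LAG([Backup Size MB]) OVER (ORDER BY [Year],[Month]), 0) AS [Percent Growth]
FROM [MonthlyBackups]
ORDER BY [Year],[Month];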

Hopefully you now see how easy it is to use data from SQL Server to make your life easier: the information the above query provides can help you understand database growth and start basic capacity planning. I also hope this information helps to convince you (or your manager) that collecting baseline data can be extremely beneficial, and now’s the time to start. If you need more background, or some queries to get you started, please check out my Baselines series on SQL Server Central. Good luck!

EDIT: *For those of you running SQL Server 2005 and below, you will need to exclude compression information:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],DATEPART(mm,[backup_start_date]);

DBCC CHECKDB Parallel Checks and SQL Server Edition

It’s been a few weeks since the PASS Summit but I’m still getting the occasional email from people who attended one of my sessions. I consider this a good thing – I believe that if someone follows up with me a few weeks after they attended a session, then something I said stuck with them and it was worth their time to follow up. Some people have had questions about something I said during a session, and others are following up on discussions we had during the week.

I had one question about minimizing the resources used by DBCC CHECKDB, and one of the suggestions I provided was to reduce MAXDOP when CHECKDB was running, to limit the number of processors used by CHECKDB. However, I want to point out that CHECKDB only runs in parallel on Enterprise Edition. You may be aware of this already, as Paul mentions it in one of his posts, and also in the SQL Server 2008 Internals book. But, as I was perusing the DBCC CHECKDB entry online, I discovered that it contains incorrect information. From BOL (emphasis mine):

“Checking Objects in Parallel

By default, DBCC CHECKDB performs parallel checking of objects. The degree of parallelism is automatically determined by the query processor. The maximum degree of parallelism is configured just like parallel queries. To restrict the maximum number of processors available for DBCC checking, use sp_configure. For more information, see Configure the max degree of parallelism Server Configuration Option. Parallel checking can be disabled by using trace flag 2528. For more information, see Trace Flags (Transact-SQL).”

The first sentence is incorrect. DBCC CHECKDB does not perform parallel checking of objects in Standard Edition. At all. CHECKDB is always single-threaded in Standard Edition and I confirmed this with the SQL Server development team.

For Enterprise Edition, CHECKDB respects the maximum degree of parallelism setting configured for the instance. If you have MAXDOP set to 0 for the instance, but want to restrict CHECKDB to only use 2 processors, you need to either change MAXDOP for the instance to 2, or you need to use Resource Governor (also an Enterprise-only feature).

I did enter a Connect item for this, and hopefully the documentation will be fixed soon. I hope this clears up any confusion that may exist.

What Checks Update dbccLastKnownGood?

One of the methods I’ve used to check when a client last ran DBCC CHECKDB on their database is the dbccLastKnownGood value on the boot page (page 9 in the PRIMARY file).  When working with a client a couple of weeks ago, the value for dbccLastKnownGood was from 2006, but the customer stated they were running integrity checks.  When I investigated their method for checks, I discovered that due to the database size they were running DBCC CHECKALLOC and DBCC CHECKCATALOG weekly, and checking different user tables with DBCC CHECKTABLE at night.  Breaking out the checks between the different commands does not update dbccLastKnownGood – running CHECKDB is the only thing that does.  Let’s have a look…

I have a copy of the AdventureWorks2012 database installed, and I can look at the boot page using DBCC PAGE or DBCC DBINFO.  Both commands are undocumented, but as Paul notes in the links provided, they are safe to run.  I’ll use DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('AdventureWorks2012');
GO

The text below is a snippet of the output to show only what’s relevant:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

If we run DBCC CHECKALLOC, DBCC CHECKCATALOG, and DBCC CHECKTABLE for all system and user tables, you can see that dbccLastKnownGood does not change (please don’t holler because I’m using a cursor…it does what I need for this example!):

DBCC CHECKALLOC ('AdventureWorks2012') WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO

DBCC CHECKCATALOG ('AdventureWorks2012');
GO

USE [AdventureWorks2012];
GO

DECLARE @schemaname varchar(500);
DECLARE @tablename varchar(500);
DECLARE @fulltable varchar(1000);
DECLARE @string varchar(5000);

DECLARE TableList CURSOR FOR
SELECT [ss].[name],[so].[name]
FROM [sys].[objects] AS so
JOIN [sys].[schemas] AS ss ON [so].[schema_id]=[ss].[schema_id]
WHERE [so].[type] in ('S','U')
ORDER BY [so].[name]; 

BEGIN
OPEN TableList

FETCH NEXT FROM TableList
INTO @schemaname, @tablename;

WHILE @@FETCH_STATUS = 0
BEGIN
SET @fulltable = @schemaname + '.' + @tablename;
SET @string = N'DBCC CHECKTABLE (''' + @fulltable + ''') WITH ALL_ERRORMSGS, NO_INFOMSGS;'
EXEC (@string)
FETCH NEXT FROM TableList
INTO @schemaname, @tablename;
END 

CLOSE TableList;
DEALLOCATE TableList;
END

DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

No change in dbccLastKnownGood.  What happens if we run DBCC CHECKFILEGROUP?  I created a second filegroup in the AdventureWorks2012 database and created one table in the filegroup, then ran the check on that filegroup:

ALTER DATABASE [AdventureWorks2012]
ADD FILEGROUP [TestFG]; 

ALTER DATABASE [AdventureWorks2012]
ADD FILE ( NAME = N'TestFile', FILENAME = N'D:\Databases\SQL2012\AdventureWorks2012\TestFile.ndf' ,
SIZE = 4096KB , FILEGROWTH = 1024KB )
TO FILEGROUP [TestFG];

CREATE TABLE [dbo].[test] (
[col1] INT,
[col2] INT
)
ON [TestFG]; 

INSERT INTO [dbo].[test] (
col1,
col2
)
VALUES
(1,2),
(3,4); 

DBCC CHECKFILEGROUP (2) WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO
DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 2012-11-15 12:25:34.090
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

Look at that, it updated.  And finally, we know that CHECKDB will update the value, but what happens if we include the WITH PHYSICAL_ONLY option, where the checks are not quite as involved? (See Paul’s post on check options for VLDBs for additional notes on WITH PHYSICAL_ONLY.)

DBCC CHECKDB ('AdventureWorks2012') WITH PHYSICAL_ONLY;
GO
DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 2012-11-15 12:45:21.173
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

The dbccLastKnownGood value does update, as it does with CHECKDB without the PHYSICAL_ONLY.  Therefore, if you’re relying on dbccLastKnownGood to tell you when CHECKDB was last run, realize it may not give you the entire story.
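
If you want to grab the value without scanning the full DBINFO output, here is a sketch that captures it into a temp table using the WITH TABLERESULTS option; DBCC DBINFO remains undocumented, so treat this as a convenience for test systems rather than a supported interface.

-- Capture dbi_dbccLastKnownGood into a temp table via DBCC DBINFO ... WITH TABLERESULTS.
IF OBJECT_ID('tempdb..#dbinfo') IS NOT NULL
     DROP TABLE #dbinfo;

CREATE TABLE #dbinfo (
     [ParentObject] nvarchar(255),
     [Object] nvarchar(255),
     [Field] nvarchar(255),
     [Value] nvarchar(255)
);

INSERT INTO #dbinfo
EXEC ('DBCC DBINFO (''AdventureWorks2012'') WITH TABLERESULTS');

SELECT [Field], [Value]
FROM #dbinfo
WHERE [Field] = 'dbi_dbccLastKnownGood';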

Note: post edited 11/16/2012 to replace screen shots of DBCC DBINFO output with text, as the images did not render well in all browsers.

T-SQL Tuesday #36: What Does Community Mean?

 

I find this month’s T-SQL Tuesday topic quite appropriate, as it follows the 2012 PASS Summit, a time when I get to see so many members of the SQL Server Community.  Chris Yates (@YatesSQL) is hosting this month and asks, “What Does Community Mean to You?”  My short answer? #sqlfamily, #sqlhelp and opportunity.

When I think of the SQL Community, it is the people that come to mind – so many people that I have met and become friends with, and so much of it is due to Twitter.  When I think of the SQL Community, I think of Twitter.  I do.  It may seem crazy, and I know that there is more to the community than conversations on Twitter – there is the Summit, there are User Groups, there are SQLSaturdays, there are blogs – the community encompasses so much.  But how do I communicate with everyone?  So frequently, so quickly?  How have I met so many people from all over the US?  All over the world?  Twitter.

Our community is incredibly unique.  None of my friends or family have this kind of cohort – they don’t have the friendships that span the globe.  They don’t regularly converse with colleagues all around the world, working together to answer questions and solve problems.  I’ve lost so many hours of sleep due to chats with Rob Farley (I curse that 17.5 hour time difference!) and Jonathan Kehayias (I’m not convinced that he sleeps) but it’s all worth it.

I have found some really close friends in the Community.  Friends that I would spend time with even if they didn’t work with SQL Server.  Friends that I know I can call or email at any time for anything.  Just over a year ago I blogged about #sqlfamily, and what I wrote then still holds true today.

The SQL Community is also about helping others.  The generosity of individuals continually astounds me.  So many members of the community volunteer for PASS, helping to make the organization so successful.  Many people help others solve technical problems via the #sqlhelp hashtag on Twitter.  And when it’s too much for 140 characters, blog posts are written, questions posted to forums or emails exchanged.  I’ve seen it happen to others, it’s happened to me.  The people of this community share information and provide help, and then celebrate success.

And finally, the SQL Community can provide significant opportunities for people.  The opportunity to meet a favorite blogger at Summit, the opportunity to talk to someone at Microsoft to understand how it really works, the opportunity to meet someone who will become that friend you call when you have life changing news.  And yes, opportunity can be a job opening.  How many jobs have been found through the community?  Too many to count.  I found my team through the SQL Community.  If I were not involved, I do not believe I would have the job I have today.  I love what I do.  I work with five amazing individuals.  People that I respect personally and professionally.  My job, just like the Community, is exhausting, rewarding, and a whole lot of fun.

If you’re not involved, I recommend you think about it.  You never know where it will take you.  And for those of you in this community – I thank you for making it what it is today, and I cannot wait to see where we go in the future.

PASS Summit 2012: Day 2

Happy second day of Summit!  Holy buckets, yesterday was a busy day.  But it was great.  After the keynote I headed off to present my DBCC session which went well.  Thank you to everyone that attended and to all of those who had questions and provided feedback.  I am presenting again today, same time (10:15 AM PST) and same room (618-620), but this time it’s Demystifying Database Statistics.  Statistics is another of my favorite topics, I expect the time will fly by just as it did yesterday.

I just finished my PASS TV interview which was short but fun and I am at the blogger’s table again this morning, quite curious to see what the keynote brings.  I know we will hear from Douglas McDowell, Thomas LaRock and Quentin Clark.  Douglas and Thomas are Vice Presidents on the PASS Board of Directors and Quentin is Corporate Vice President, SQL Program Management at Microsoft.

And we’re off…

8:22 AM Douglas takes the stage first and mentions that he ran into many people yesterday who could not decide what session to attend for a time slot, because there were multiple that were of interest.  Douglas’ suggestion?  Buy the DVDs.  It’s a good recommendation, although remember that if you’ve attended the Summit, you’ll also have access to the sessions online.  If you’ll want to watch the sessions when you’re not on the internet, you’ll need to buy the DVDs.

Douglas is talking about finance, as that’s his role, and points out that this year they’ve hired an additional Community Evangelist.  Karla Landrum was the first Community Evangelist, but she has been so busy that another person was needed.  This past quarter Niko Neugebauer was hired and I know he is thrilled about joining the team.  Karla and Niko provide amazing support to the SQL Server community – if you see one of them this week please say thank you!

Douglas has spent some time talking about the revenue and expenses for PASS. The Summit is the largest revenue generator for PASS – not surprising with over 3000 attendees here this week.

8:35 AM Tom LaRock takes the stage and he gets to announce the PASSion Award Winner, who is Jen Stirrup.  Congratulations Jen!!

Next year’s Summit will be in Charlotte, NC, from October 15-18. Early registration is already available at a very discounted rate.

Tonight is the Community Appreciation Party at Seattle’s Experience Music Project and it starts at 7 PM (ends at 10 PM).  I have never been to EMP and look forward to checking it out!

8:55 AM Quentin Clark has taken the stage and now we’re getting a demo of some new PDW functionality.  I know that big data is a hot topic and of interest right now…but I’m still an engine girl at heart.  Julie Strauss from Microsoft is on stage for a demo, and she’s talking about movie data (I want that data).  There’s a side conversation with Quentin – there’s obviously an ongoing joke about how it’s hard for her to pronounce some words (she’s a Viking).  And then she says, “We don’t talk we just do.”  Love it.

Ok, I need to head off to get ready for my session.  Did I mention that I went to Bob Ward’s session on the SQLOS yesterday?  It was three hours of fantastic technical knowledge. :)

PASS Summit 2012: Day 1

Happy first day of Summit everyone!  Ok, if you attended any pre-conference sessions then today probably does not feel like the first day, but it’s the first day with a keynote and general sessions.  My day started with a 3 mile-ish #sqlrun down along the waterfront.  It was a great way to kick off the day, huge thanks to my roommate Jes Borland for organizing!  Today I present my DBCC Commands: The Quick and the Dangerous session at 10:15 AM PST in room 618-620 and I hope to see you there!

I was fortunate to have been selected to sit at the blogger’s table this week, so I will post through the keynote sessions today and tomorrow.  There is no keynote session on Friday, but I plan to attend Dr. DeWitt’s session, Big Data Meets SQL Server, and will try to live blog that as long as I have decent connectivity.

For those of you not in Seattle, don’t despair!  You can watch this morning’s keynote, and catch many other sessions (yes, actual sessions!) and events from the Summit on PASS TV.  I kid you not: PASS TV.  Open up your favorite browser and go to the main PASS TV page and you can connect from there.  I will be on PASS TV tomorrow morning before the keynote so make sure to tune in!  I wonder if they will have someone to do my hair and make up?  Can I get a rider that includes coffee and M&Ms? ;)

And if you find the PASS site a little busy, try the Microsoft SQL Server site, which will also live stream today and tomorrow’s keynotes, and Dr. DeWitt’s session on Friday.  If you’re reading this later in the day, check out the main page for the Summit to catch up on what you missed.

Finally…before we start, check out what I noticed on the back of my Summit Program Guide:

[Photo: the back of my Summit Program Guide]

Interesting stuff!  A Business Analytics conference hosted by PASS in April – the PASS teams are definitely keeping busy.  I will post more details as I have them.

What’s on tap for today?  I know we will hear from PASS President Bill Graziano, as well as Ted Kummert, Corporate Vice President, Data Platform Group, Microsoft.  Now, I have also been told that there is a huge announcement coming about SQL Server.  This is not a shocker, there is always a huge announcement during the keynote.  Any guesses?  It will probably be up first, so stay tuned! 

8:15 AM – lights have dimmed, and here we go!  Testimonials from various members of the BoD are played first (I am a sucker for video montages, BTW)…

8:18 AM – Bill Graziano takes the stage to welcome everyone.  There are 3894 Summit attendees this year, at the 14th Summit, with 57 different countries represented.  Bill’s highlighting PASS TV and the fact that it will stream the keynotes, sessions and interviews with different members of the community.  There are 120,000 members of PASS, and Bill has challenged everyone to speak in the next year.  If that’s something that interests you – I encourage you to reach out to local speakers (whether you know them personally or not) and ask for help getting started.  As a speaker, I would be more than willing to work with someone to help start presenting within the SQL Server community.  Bill mentions that on Friday there will be a Board of Directors Q&A in room 401 at 9:45 AM.

8:24 AM – Bill mentions the Virtual Chapters – woohoo!  We are up to 20 different Virtual Chapters which is fantastic.  They are a great resource for members of the community and I recommend checking them out.  And of course, in addition to Local Chapters, there are SQLSaturday opportunities for training.  There have been 79 SQLSaturdays this year, with many of them held overseas.  PASS is growing – in just the couple of years I’ve been involved, I am amazed at the increased number of events and members.

Bill confirms the Business Analytics Conference in April!  Registration is now open…but I cannot find the site…yet…

Hm, Bill mentions an amendment to the by-laws that would designate certain spots on the board based on location around the world.  Look for a proposal for these changes in the next 60 days. 

Bill thanks the sponsors for Summit.  As I mentioned before, definitely stop by the Exhibit Hall if you can.  It’s a great place to try out technology and software that might help your SQL Server solution, and again, the sponsors help make the Summit happen.  Bill highlights Microsoft’s involvement at Summit, which includes the SQL Server Clinic, hosted by the Customer Advisory Team (CAT), Developer Chalk Talks, Hands on Labs, Certification Training, Focus Groups and Solutions Theatre.  There are over 300 engineers from Microsoft here at Summit.  If you’re having technical issues in your solution, I highly recommend taking advantage of Microsoft’s presence here at Summit.

8:39 AM – Ted Kummert takes the stage.  I’m going to guess the big announcement is coming soon, and I’m thinking it’s going to be about Hekaton, and you can find more information here as well.  Ted mentions that SP1 for SQL Server 2012 is now available.  There was no applause from the crowd.  Interesting.  Ted’s talking about big data and approaching the tipping point (which of course reminds me of Kimberly Tripp and statistics…stay focused on big data, no stats!).

Ted’s talking about how if everything fits in memory, and you design around that assumption, huge transformations in business process are possible.  It’s one thing to accelerate queries, but if you want to accelerate business process, you have to consider all workloads involved.  Desktop all the way to the data warehouse.  You have to think about the entire architecture as it relates to the business process.  Today they’re announcing an in-memory transactional capability to SQL Server – and there you go, Ted says Hekaton and that it will ship in the next major release of SQL Server.  There’s the big announcement kids.

8:54 AM – Let’s see some demos!  Let’s see what Hekaton does when you have a highly transactional, in memory database.    We’re starting with OLTP…demo processes a bunch of sales orders.  Starts around 2000 transactions/sec, with CPU around 33% but there’s a ton of latches.  Switching over to Hekaton (note, what we’re seeing is NOT the final experience).  You can optimize based on table…SalesOrderDetail gets converted so it can run in memory with no application changes.  But all the data for the SalesOrderDetail table and indexes are in memory.  Running the tool again, looking at performance increase and it’s 10x improvement (with no change in the application or hardware).  Using more CPU (near 90%) but no latches with 10,000 transactions/sec.    Hekaton can also look at code (like Stored Procedures) to see if they can be optimized.  The tool picks a specific SP to optimize – so we migrate it over to Hekaton which recompiles so the SP runs native in memory (no code changes).  And now…performance is over 50,000 transactions/sec (but we don’t get to see CPU).  Oh, here comes information about Columnstore Indexes…  Before SQL 2012 there were lots of data warehouse queries that would result in full table scans.  Columnstore Indexes provide huge performance gains (taking a one plus minute query down to a second).  And in the next major release of SQL Server these indexes will be UPDATABLE (woohoo this is HUGE!) AND they can be CLUSTERED.  Ohhh, shiny.  Ohhhh, the opportunity to incorrectly architect.  Just because you can create the updatable Columnstore Indexes and use Hekaton doesn’t mean you still don’t have to consider schema design.  You still need to architect!

9:03 AM – Customer testimonial from BWIN, who has been working with Hekaton – they are the largest online gaming site in the world and require the best technology speed to stay at the top of their game.  First test run took transactions/sec from 15,000 to over 250,000.  Holy cow.  The next release of PDW will be available in the first half of 2013.  Time for a demo of PDW from Christian Kleinerman…and he logs in as sa.  *sigh*  But he’s funny.  The changes in PDW take a query with a terabyte of data from 2 minutes to seconds.  The technologies used here are Columnstore and PDW.  Now Christian is talking about PolyBase, the ability to query Hadoop data through an interface in PDW.  Interesting.

9:24 AM – PolyBase is also going to be in the next release of PDW – it is built for big data – along with xVelocity and Columnstore improvements.

9:32 AM – Amir Netz takes the stage for some BI demos.  I love Amir’s demos because he uses movie data.  You all know how much I love movies.  I really, really want a copy of the database he uses…  Oh, and Amir is funny, and he talks fast, and he is dynamic, and has a great accent.  He is fun to watch on stage (I know that’s not technical, but I appreciate speakers who keep the audience engaged).  I do love Amir’s demos, though I know so little about PowerView, BI and so on.

9:45 AM – ok friends, I have a session in a half hour so I am ducking out to get ready.  I’ll try to update a bit more later, but otherwise, catch you tomorrow morning!  And a quick reminder that tonight is the Exhibitor Reception, which starts at 6:15 PM and goes until 9:15 PM.  Again, please take time to visit the sponsors throughout the week – they provide amazing support for the PASS Summit and for numerous other events in the SQL Server Community throughout the year.