T-SQL Tuesday #41: Presenting and Loving It

I’ve been on a TSQL Tuesday hiatus, which was completely unintentional.  When I read Bob Pusateri’s topic for this month I knew I had to end my sabbatical and get back in the swing of these posts.  Bob’s question was, “How did you come to love presenting?”  Oh Bob…how much time do you have?  :)

It goes back to graduate school.  I’ve blogged before about my mentor, Susan Brown, and in my original homage I mentioned that I would not be the speaker that I am today, were it not for her.  And I said that then, and still believe it now, because she found and created opportunities for me to speak publicly, and she provided feedback and encouragement – two things absolutely vital for any speaker to improve and succeed.

During my first year of graduate school the School of Kinesiology held a research day, designed to bring the entire department together to showcase our research efforts.  It’s very easy to have no idea what other researchers are doing not just within the University, but even within a small department like Kinesiology.  The idea was to explain our research, what we’d learned, and share ideas.  I gave a 10 minute session on the research we were doing with botulinum toxin (yes, Botox before it was cool for cosmetic reasons) and its effects on upper limb function in children with spasticity.  I was terrified.  I had spoken in front of groups before – I took a Communications (read: public speaking) class my junior year, I was a leader in my sorority (yes, you read that right) and spoke often, and I had done campus tours during my senior year (Bob has a great story about tours in his post).  But speaking to hundreds of people, who were my peers and professors?  That was a whole new ballgame.

I can’t remember how many slides I created, at least 10, before Susan told me that she typically used one slide for each 10 minutes of a talk.  I remember thinking she was crazy…talking for 10 minutes in front of the entire department (and many other researchers from different areas of the University) seemed like an eternity.  [What’s ironic is that I can’t always finish recording a SQLskills Insider Video in less than 10 minutes these days.]

At any rate, I remember standing at the front of the room in the Michigan League Ballroom feeling incredibly uncomfortable.  Not only were there hundreds of people there, but I was wearing a dress (if you know me, you’re laughing).  I made it through my 10 minutes with one slight timing issue – I had someone play a video, which taught me the importance of having the entire presentation under my control – and I cannot remember if it was great or horrible.  But I didn’t walk away thinking, “I’ll never do this again.”

Soon after, Susan asked if I would like to take over teaching the Motor Control portion of the introductory Movement Science course required for all Kinesiology students.  The course was broken into three sections, Motor Control, Biomechanics, and Exercise Physiology, with students rotating between the sections and a different instructor for each.  This meant I would teach the same material three times in a semester, which sounds boring but was ideal for a first-time instructor.  And I would get paid.  I said yes.

Susan gave me all of her materials, and I converted all of her overheads (yes, overheads) to PowerPoint.  Then I started reading.  While I had taken the same class myself as a sophomore, had taken many advanced Motor Control classes since then, and was getting a master’s degree in Motor Control, teaching the course was something else entirely.  You have to know the information at a different level.  During those early days I often thought of the learning model followed in medicine, “See one, do one, teach one.”  I’d learned it, I’d been tested on it, now it was time for me to teach it.

Some may state that teaching is not the same as presenting.  If you get down into the details, that’s probably true, but it’s not a topic I want to debate here.  To me, they are one and the same.  For me, when I present at a User Group, a SQLSaturday or a conference like the PASS Summit, I am teaching.

And that is what I love: teaching.  And I realized it in graduate school, when I was teaching that introductory Movement Science course.  It happened in the first semester, in the very first rotation.  I cannot remember the name of the student, but she grasped what I was teaching, she understood.  She asked questions during class, she took notes, and she aced the quizzes and the test.  I taught, she learned.  That was pretty cool.

Now…do I believe that I had that much of an impact on her learning?  No.  Do I believe that if I weren’t such a fantastic teacher that she wouldn’t have done so well?  Absolutely not.  She was a smart kid, motivated, and interested in the material.  She would have done well no matter what.  But in those few weeks I realized that I had something to teach those who wanted to learn, and I realized that I wanted to be good at teaching – for them.

As a student, I appreciated good instructors.  Not every instructor is fully “on” every single day – teaching is hard, and the semester is long.  But there were many instructors whose classes I enjoyed, not just for the material, but for the way they explained it.  Susan was that type of instructor.  I wanted to be that type of instructor.  So I worked at it.  For some, teaching and presenting come naturally.  For many, we have to work at it.  And to work at it, you practice.  I taught that same section of that same course for two years.  Yes, 12 times.  But that experience established a foundation upon which I’ve been building ever since.

In my first technology job I wore many hats, and one of them was software trainer.  In my next job, I sought out opportunities to teach others, and eventually I found the SQL Community and realized that I could present at User Groups, SQLSaturdays and conferences, like so many others.  And here I am.  I still love teaching.  I love it when you see the light bulb go on for someone, and I love it when you hear that someone took what they learned, applied it to their environment, and then learned something new.  And I really appreciate it when attendees come back and tell me what they learned – as I have not seen every use case and am always, always learning myself.

One of the things that I value most about SQL Server is that it’s vast and it’s always changing.  As such, my learning never ends, and the opportunity to find new things to teach never ends.  As my good friend Allen White ( b | t ) always says, “I can learn from all of you” (and that means you, dear reader).  If you want to share what you learn, I encourage you to teach.  Don’t think of it as presenting – that word can be scary.  Think of it as teaching.  Everyone has great and interesting experiences.  Learn how to tell a story, and share what you know.

SQL Server Maintenance Plans and Parallelism – Index Rebuilds

In my previous post, SQL Server Maintenance Plans and Parallelism – CHECKDB, we looked at the degree of parallelism used when CHECKDB is run.  For CHECKDB it ultimately depends on the SQL Server Edition and the max degree of parallelism setting for the instance, with no per-command override – which is not the case for index rebuilds (today’s topic, as you probably surmised!).

Index Rebuilds

The max degree of parallelism can be configured for index rebuilds using WITH (MAXDOP = n):

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
     REBUILD WITH (MAXDOP = 8);
GO

If this option is included, it overrides the max degree of parallelism value configured for the instance. For example, I can rebuild the IX_SalesOrderDetail_ProductID index on Sales.SalesOrderDetail with MAXDOP set to 8, even though MAXDOP is set to 4 for the instance.  If WITH (MAXDOP = n) is not specified for an ALTER INDEX … REBUILD statement, then SQL Server will use the MAXDOP value set for the instance.

Now, unfortunately, parallel index operations are only permitted in Enterprise Edition.  If you’re running Standard Edition, you’re stuck with single-threaded rebuilds, just like you’re stuck with single-threaded integrity checks.  Despite this sad news, I thought I’d run through a demo that shows the max degree of parallelism used during the index rebuild.  I’m going to run ALTER INDEX REBUILD for a selected index in the AdventureWorks2012 database, and I’ll use Extended Events to capture each statement executed (sp_statement_completed event), and the actual query plan for the statement (query_post_execution_showplan event).
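Since the behavior differs by Edition, it may be worth confirming what you’re running before you test.  This is just a quick check using SERVERPROPERTY (nothing specific to this demo):

SELECT SERVERPROPERTY('Edition') AS "Edition",
     SERVERPROPERTY('EngineEdition') AS "EngineEdition";  -- 3 = Enterprise (includes Developer and Evaluation)
GO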

**Important note here again: it is NOT recommended to capture the query_post_execution_showplan event against a live, production system.  This event generates significant performance overhead, and you are warned of this when configuring the session via the GUI.  If you repeat any of the demos here, please make sure to execute them against a test environment.  It’s very important to me that you do not bring down your production environment.**

Here are the statements to create the event session, start it, run the ALTER INDEX … REBUILD statements, then stop the event session.  As in my previous post, I am using a file target to capture the output, and the path is C:\temp.  You may need to modify this path for your environment.  I still have max degree of parallelism set to 4 for my instance, but we’ll set it before we run anything just for good measure.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', 4;
GO
RECONFIGURE WITH OVERRIDE;
GO

CREATE EVENT SESSION [CapturePlans] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
     ACTION(sqlserver.plan_handle,sqlserver.sql_text)),
ADD EVENT sqlserver.sp_statement_completed(
     ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\temp\CapturePlans.xel'),
ADD TARGET package0.ring_buffer(SET max_memory=(102400))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
     MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO

ALTER EVENT SESSION [CapturePlans]
     ON SERVER
     STATE=START;
GO

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD WITH (MAXDOP = 8);
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD;
GO

ALTER EVENT SESSION [CapturePlans]
     ON SERVER
     STATE=STOP;
GO

Note that I used a different version of the SalesOrderDetail table named SalesOrderDetailEnlarged.  This table has over 4 million rows in it and was populated using Jonathan’s Create Enlarged AdventureWorks Table script to ensure I’d have a table large enough to warrant a parallel rebuild.  After I stopped the event session I opened the .xel file from C:\temp in Management Studio and added the sql_text column to the display so I could easily find the ALTER INDEX statements.
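If you’d rather query the event file than browse it in Management Studio, something along these lines works too.  This is only a sketch – it assumes the same C:\temp path used for the file target above – and the DegreeOfParallelism attribute is inside the showplan XML it returns:

SELECT
     [xed].[event_data].value('(event/@name)[1]', 'varchar(50)') AS "Event",
     [xed].[event_data].value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS "SQL Text",
     [xed].[event_data].query('(event/data[@name="showplan_xml"]/value/*)[1]') AS "Showplan XML"
FROM
(
     SELECT CAST([event_data] AS xml) AS [event_data]
     FROM sys.fn_xe_file_target_read_file(N'C:\temp\CapturePlans*.xel', NULL, NULL, NULL)
) AS [xed]
WHERE [xed].[event_data].value('(event/@name)[1]', 'varchar(50)') = N'query_post_execution_showplan'
     AND [xed].[event_data].value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') LIKE N'%ALTER INDEX%';
GO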

The screen shot below is from the ALTER INDEX statement with MAXDOP = 8 included.  The query_post_execution_showplan event is highlighted, you can see the sql_text, and I hovered over the showplan_xml to show the first part of the XML version of the plan.  Note the red box around QueryPlan DegreeOfParallelism…it’s 8, as expected:

ALTER INDEX … REBUILD WITH (MAXDOP = 8)

If you’re playing along at home in your test environment, you can click on the Query Plan to see the graphical view, or double-click the XML to view the plan that way.  Now check out the screen capture below, which is for the ALTER INDEX statement that did not include the MAXDOP option:

ALTER INDEX … REBUILD (default option)

The max degree of parallelism for the plan is 4 because if the MAXDOP option is not included, SQL Server uses the max degree of parallelism set for the instance.  Note that this holds true when parallelism is disabled for an instance (max degree of parallelism = 1):

sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

ALTER EVENT SESSION [CapturePlans]
 ON SERVER
 STATE=START;
GO

USE [AdventureWorks2012];
GO

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD;
GO

ALTER EVENT SESSION [CapturePlans]
 ON SERVER
 STATE=STOP;
GO
ALTER INDEX … REBUILD (default option) – MAXDOP = 1 for instance

The plan shows a DegreeOfParallelism of 0 – this means that the query did not use parallelism – and that the plan includes a NonParallelPlanReason* of “MaxDOPSetToOne”.  Therefore, if MAXDOP is set to 1 for an instance, and the default ALTER INDEX … REBUILD statements are used to rebuild indexes – where the MAXDOP option is not included – then rebuilds will be single-threaded.  For some well-known applications (e.g. SharePoint, SAP, BizTalk) it is recommended to set the max degree of parallelism to 1 for the instance.  While that setting may be appropriate for application-specific queries, it means that your index rebuild operations may run longer than they would if parallelism were enabled.  It may be worth modifying your index maintenance script to include the MAXDOP option for ALTER INDEX REBUILD statements.

In the event that you have a max degree of parallelism value above 1 specified for the instance, but you’re not sure what the “right” MAXDOP value should be for your index rebuilds, you can let SQL Server decide.  If you include the WITH (MAXDOP = 0) option in your rebuild syntax, then the optimizer will determine how many CPUs to use, which could be anywhere from 1 to all of the CPUs available to SQL Server.  This is the recommended setting per Books Online, but I would caution you to use this option only if you’re comfortable with SQL Server potentially using all CPUs for a rebuild.  If you happen to be running other tasks or processes in the database while the rebuilds run – not ideal, but for a 24×7 solution you often don’t have a choice – then you should specify a MAXDOP value below the total number of CPUs available.
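For example, a sketch using the demo index from above – with MAXDOP = 0, the optimizer chooses the degree of parallelism at execution time:

ALTER INDEX [IX_SalesOrderDetailEnlarged_ProductID] ON [Sales].[SalesOrderDetailEnlarged]
     REBUILD WITH (MAXDOP = 0);  -- let SQL Server decide how many CPUs to use
GO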

Finally, in case you’re wondering about parallelism and reorganizing indexes…the WITH (MAXDOP = n) option is not available for ALTER INDEX REORGANIZE, as index reorganization is always a single-threaded operation.  The final post in this series will cover parallelism and the UPDATE STATISTICS command, and if you’re manually managing statistics and specifying the sample, you don’t want to miss it!

*If you’re interested, Joe talks about the NonParallelPlanReason attribute  in his post, SQL Server 2012 Execution Plan’s NonParallelPlanReason, which may be useful when you’re digging into execution plans in SQL Server 2012 and higher.

SQL Server Maintenance Plans and Parallelism – CHECKDB

Many posts and articles that discuss parallelism and SQL Server revolve around query performance and query optimization. Parallelism can affect performance, and some DBAs and database developers spend a great amount of time trying to find the “right” max degree of parallelism (MAXDOP) setting for an instance. Finding that right value is a science and an art, but understanding what degree of parallelism is used by SQL Server for standard maintenance tasks is straight science.

Parallelism and Instance Configuration

To start, I’m working under the assumption that you are familiar with parallelism and understand the implications of leaving max degree of parallelism set to 0, changing it to 1, or setting it to another value. Note: If you’re looking for background reading, I recommend Adam Machanic’s SQL University Parallelism Week posts (Part 1, Part 2, and Part 3), and Paul White’s article, Understanding and Using Parallelism in SQL Server. To understand what impact max degree of parallelism has on maintenance tasks, we need to know its value for an instance, and the easiest way to find it is via sys.configurations:

SELECT [name], [value], [value_in_use]
FROM [sys].[configurations]
WHERE [name] = 'max degree of parallelism';

In my environment, it’s set to 4 for purposes of this demo (I have 8 logical processors on my laptop):

max degree of parallelism configuration

The three maintenance tasks I want to cover in this series are database consistency checks, index rebuilds, and statistic updates.  As you might have guessed from the title, this first post will cover consistency checks.

Database Consistency Checks

In November of last year I blogged about DBCC CHECKDB and parallel checks due to an inconsistency I found in Books Online. The inconsistency has since been fixed in BOL (hooray for Connect items!), and the short story is that any consistency check (CHECKDB, CHECKTABLE, CHECKFILEGROUP, and CHECKCATALOG) is single-threaded in Standard Edition. Regardless of the number of cores in your server and the MAXDOP setting for the instance, any CHECK command will be single-threaded.

In Enterprise Edition, checks can be done in parallel, and the max degree of parallelism is determined by the Query Processor. The Query Processor respects the max degree of parallelism setting for the instance, therefore since I have MAXDOP = 4 for my instance, up to four processors can be used by a CHECK command. There is no MAXDOP option that can be included with a CHECK command; it always respects the value set for the instance.  We can prove this using Extended Events and the sp_statement_completed and query_post_execution_showplan events.

**Important note here: it is NOT recommended to capture the query_post_execution_showplan event against a live, production system. This event generates significant performance overhead, and you are warned of this when configuring the session via the GUI. If you repeat any of the demos here, please make sure to execute them against a test environment. I really do not want you to bring down your production environment.**

Here are the statements to create the event session, start it, run CHECKDB, then stop the event session. I am using a file target to capture the output, and the path is C:\temp. You may need to modify this path for your environment.

CREATE EVENT SESSION [CapturePlans] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.plan_handle,sqlserver.sql_text)),
ADD EVENT sqlserver.sp_statement_completed(
ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\temp\CapturePlans.xel'),
ADD TARGET package0.ring_buffer(SET max_memory=(102400))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,
MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=START;
GO

DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=STOP;
GO

After stopping the event session, open the .xel file from C:\temp in Management Studio and add the sql_text column to the display to easily find the CHECKDB statement.  In the screen shot below the query_post_execution_showplan event is highlighted, you can see the sql_text, and I hovered over the showplan_xml to show the first part of the XML version of the plan. Note the red box around QueryPlan DegreeOfParallelism…it’s 4, as expected, because that’s the value set for the instance.  The query plan is also included in the screen shot to show the full plan.

DBCC CHECKDB with parallelism

To view the full XML for the plan, just double-click the showplan_xml value and it will open in a new window. Just for fun, if we change MAXDOP for the instance to 1 (which I’ve seen in some OLTP environments), note that CHECKDB now runs single-threaded, regardless of SQL Server Edition.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=START;
GO

DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO

ALTER EVENT SESSION [CapturePlans]
ON SERVER
STATE=STOP;
GO

Again, if we open the most recent .xel file in C:\temp and add the sql_text column, we can see that CHECKDB runs single-threaded:

DBCC CHECKDB single-threaded

Realize that if max degree of parallelism is set to 1 for the instance, and you’re running Enterprise Edition, your consistency checks could be taking longer than necessary.  The only way to work around this in Enterprise Edition is to change the max degree of parallelism for the instance, run the check, then change it back.  Alternatively, in some 24×7 enterprise environments it may not be desirable to have CHECK commands use a high number of threads, yet lowering max degree of parallelism while the CHECKs run may not be an option either – you may not want to impact the performance of your regular queries.  Instead, you could disable parallel checking for CHECK commands entirely using trace flag 2528, or you could use Resource Governor to limit CPU use.  And one final, interesting note (thanks Paul): CHECKALLOC is always single-threaded, regardless of Edition.
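If you do go the change-it, check-it, change-it-back route, it’s nothing more than the sp_configure pattern used earlier – a minimal sketch, where the value of 4 is just an example:

sp_configure 'max degree of parallelism', 4;
GO
RECONFIGURE WITH OVERRIDE;
GO

DBCC CHECKDB (AdventureWorks2012) WITH NO_INFOMSGS;
GO

sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO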

In the next post we’ll look at parallelism and index maintenance.

SQL Server Baselines Series on SQLServerCentral.com

Today the fourth article in my baselines series was published on SQLServerCentral.com.  Listed below are the links to all of the articles, and I’ve decided to continue the series on my blog here.  I truly believe that every SQL Server environment can benefit from having baselines.  I say that all the time, I know, and people seem to agree with me :)  And yet there are many systems out there that don’t have baseline data.  The biggest roadblock that I see is deciding what to capture, and then writing the scripts to capture it.  The goal of the series I wrote for SQLServerCentral.com was to provide background about why baselines are important, and to provide a set of scripts to get started.  The value of baselines may seem obvious to you, but for new DBAs the benefits may not be as apparent, so taking a few minutes to step through them is useful.  The scripts I provide are straightforward, and there’s a reason for that: it’s not rocket science.  Capturing baseline data is really easy to do, and my goal is to provide scripts so DBAs can quickly and easily start capturing data – and hopefully, once people realize how simple it is, they will develop their own scripts.

But just because I’ve provided a few scripts doesn’t mean I’m done.  I am going to continue to develop and post scripts here that DBAs can use to set up baselines in their own environment. I recognize that there are some excellent applications available from third party vendors that will capture baseline data, and if your company has the budget to purchase such an application, I would highly recommend doing so. My scripts are not a replacement for any of those applications. But for those of you who don’t have the budget, or perhaps need some data in order to make a case for why you need one of those applications, these scripts will get you started.

If you have any feedback about the scripts, or requests for what I write next, feel free to email me! The scripts are a work in progress and I plan to continually work to make them better.

5 Reasons You Must Start Capturing Baseline Data

Back to Basics: Capturing Baselines on Production SQL Servers

Capturing Baselines on SQL Server: Where’s My Space?

Capturing Baselines on SQL Server: Wait Statistics

Understanding When Statistics Will Automatically Update

During the PASS Summit in November I presented a session on Demystifying Database Statistics (if you attended Summit but missed it, you can stream it from here). During the session I went through a demo that showed the automatic update of a statistic. The database had the Auto Update Statistics option enabled, and I had added almost 25,000 rows to a table with 121,000 rows. The number of rows added via the bulk import was just over the limit to trigger an auto-update of statistics for the next query, which I showed. I had many people ask why statistics did not update immediately after the bulk load of the 25,000 rows completed, so I want to step through the demo that I did at Summit to clarify when an automatic update will occur.

The Setup

Start with a copy of the AdventureWorks2012 database, which you can download from CodePlex. Ensure that the Auto Update Statistics option is enabled:

-- Enable Auto Update Statistics if it is currently disabled
IF (SELECT COUNT(*) FROM [sys].[databases] WHERE [name] = 'AdventureWorks2012' AND [is_auto_update_stats_on] = 0) > 0
BEGIN
ALTER DATABASE [AdventureWorks2012] SET AUTO_UPDATE_STATISTICS ON;
END;

We’re going to use a copy of the Sales.SalesOrderDetail table for the demo. After we create the table, we will check to see when statistics last updated. We can use various methods to check statistics date, such as DBCC SHOW_STATISTICS or STATS_DATE, but since the release of SP1 for SQL Server 2012, I have exclusively used sys.dm_db_stats_properties to get this information.

USE [AdventureWorks2012];
GO
SELECT *
INTO [Sales].[TestSalesOrderDetail]
FROM [Sales].[SalesOrderDetail];
GO
CREATE CLUSTERED INDEX [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] ON [Sales].[TestSalesOrderDetail] ([SalesOrderID], [SalesOrderDetailID]);
GO
CREATE UNIQUE NONCLUSTERED INDEX [AK_TestSalesOrderDetail_rowguid] ON [Sales].[TestSalesOrderDetail] ([rowguid]);
GO
CREATE NONCLUSTERED INDEX [IX_TestSalesOrderDetail_ProductID] ON [Sales].[TestSalesOrderDetail] ([ProductID]);
GO
SELECT
OBJECT_NAME([sp].[object_id]) AS "Table",
[sp].[stats_id] AS "Statistic ID",
[s].[name] AS "Statistic",
[sp].[last_updated] AS "Last Updated",
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS "Modifications"
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail');

Here’s the output:

sys.dm_db_stats_properties after initially creating the table

This is my go-to DMV for statistics because in addition to including the date that statistics were last updated and row information, I also get a count of modifications since the last statistic update. I just created the table, so the Last Updated date is current, and I have not made any changes so the modification count is 0.

Invalidating Statistics

For my demo I want to bulk insert enough rows to invalidate the statistics. SQL Server has pre-determined thresholds where it considers statistics to be out-of-date and therefore invalid. The technical article, Statistics Used by the Query Optimizer in Microsoft SQL Server 2008, documents these thresholds. For a regular table, statistics are out of date when:

  • The table size has gone from 0 to >0 rows (test 1).
  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  • The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).

The Sales.SalesOrderDetail table has 121317 rows:

(121317 * 0.20) + 500 = 24763.4, so more than 24763 modifications are needed to invalidate the statistics.

The bulk insert below loads 24775 rows, which should be enough to invalidate statistics. Download the .txt file for the bulk insert here.

BULK INSERT AdventureWorks2012.Sales.TestSalesOrderDetail
FROM 'C:\SQLStuff\Statistics\Data\sod.txt'
WITH
(
DATAFILETYPE = 'native',
TABLOCK
);
(24775 row(s) affected)

After the bulk load completes, re-run the query against sys.dm_db_stats_properties and review the output:

sys.dm_db_stats_properties after the bulk import

The statistics have not updated, but the modification counter has changed, as expected. The statistics are now out of date based on the threshold defined previously, and we would expect that a query or data modification against Sales.TestSalesOrderDetail would trigger an update of statistics. But before we try that, let’s review what causes the automatic update.

The aforementioned article states:

The statistics auto update is triggered by query optimization or by execution of a compiled plan, and it involves only a subset of the columns referred to in the query.

When a query is first compiled, if the optimizer needs a particular statistics object, and that statistics object exists, the statistics object is updated if it is out of date. When a query is executed and its plan is in the cache, the statistics the plan depends on are checked to see if they are out of date. If so, the plan is removed from the cache, and during recompilation of the query, the statistics are updated. The plan also is removed from the cache if any of the statistics it depends on have changed.

To be clear, if a query plan exists in cache and that plan uses specific statistics, when the query executes SQL Server checks to see if any of the statistics used in the plan are out of date. If they are, then the automatic update of those statistics occurs.

If a plan does not exist in cache for a query, then if the optimizer uses a statistics object that is out of date when the plan compiles, SQL Server will automatically update those statistics.

Invoking the Automatic Update

We have not run any query against Sales.TestSalesOrderDetail except our bulk insert. At the time that the query compiled for the bulk insert, no statistics for Sales.TestSalesOrderDetail were out of date; therefore no statistics required an automatic update.

Now let’s issue an update against Sales.TestSalesOrderDetail that will change the ProductID for a specific SalesOrderID, and then query sys.dm_db_stats_properties:

UPDATE Sales.TestSalesOrderDetail SET ProductID = 717 WHERE SalesOrderID = 75123;
GO
(3 row(s) affected)

Output:

sys.dm_db_stats_properties after the update

We can see that the PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID statistic object automatically updated. We could simply assume that the optimizer used this object in the plan. However, in SQL Server 2012 we can look at the plan XML and confirm this.

In his post, Statistics used in a cached query plan, Fabiano Neves Amorim describes a method to capture statistics information from the plan. Please note the following:

  • This information is only available in SQL Server 2012 and higher.
  • This query requires use of an undocumented trace flag. I do not recommend using this trace flag in a Production environment.
  • This query interrogates the plan cache. The plan cache may be very large for your system, depending on the amount of memory on the server. Before querying the plan, I recommend setting the transaction isolation level to READ UNCOMMITTED, and also recommend using OPTION (MAXDOP 1) to limit CPU utilization. The query may take longer to execute, but it reduces the impact on other queries executing concurrently.

If we query the plan cache using Fabiano’s query, we get the following information:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
DBCC TRACEON (8666);
GO
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT qt.text AS SQLCommand,
qp.query_plan,
StatsUsed.XMLCol.value('@FieldValue','NVarChar(500)') AS StatsName
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
CROSS APPLY query_plan.nodes('//p:Field[@FieldName="wszStatName"]') StatsUsed(XMLCol)
WHERE qt.text LIKE '%UPDATE%'
AND qt.text LIKE '%ProductID%';
GO
DBCC TRACEOFF(8666);
GO

Output:

statistic object used in the query plan

The output confirms that the optimizer used the PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID statistic object in the query plan, and because it was out of date, SQL Server automatically updated it.

If you are running earlier versions of SQL Server, you can see what statistics the optimizer loads using the undocumented trace flags 9292 and 9204. Paul White has an excellent post, How to Find the Statistics Used to Compile an Execution Plan, which I recommend reading if you’re interested.
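Here’s a rough sketch of how those trace flags can be used (again, they are undocumented, so stick to a test system; Paul’s post has the full details).  With trace flag 3604 sending output to the client, 9292 reports the statistics objects the optimizer considers interesting, and 9204 reports the ones it actually loads during compilation:

DBCC TRACEON (3604, 9292, 9204);
GO

SELECT [ProductID], [OrderQty]
FROM [Sales].[TestSalesOrderDetail]
WHERE [SalesOrderID] = 75123
OPTION (RECOMPILE);  -- force a compile so the messages are generated
GO

DBCC TRACEOFF (3604, 9292, 9204);
GO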

One thing I want to point out: the IX_TestSalesOrderDetail_ProductID statistic, which has ProductID as its key, did not update automatically when the UPDATE query executed. This is expected. Even though the statement modified ProductID for three rows (the modification counter for the statistic increased from 24775 to 24778), the optimizer did not use that statistic in the plan. If the plan does not use a statistic, the statistic will not automatically update, even if the query modifies columns in said statistic key.
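To see that statistic update on its own, all we need is a query whose plan does use it.  A quick sketch – filter on ProductID and then check the DMV again; the IX_TestSalesOrderDetail_ProductID statistic should update automatically when this query compiles, because it is out of date:

SELECT COUNT(*)
FROM [Sales].[TestSalesOrderDetail]
WHERE [ProductID] = 717;
GO

SELECT
[s].[name] AS "Statistic",
[sp].[last_updated] AS "Last Updated",
[sp].[modification_counter] AS "Modifications"
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'Sales.TestSalesOrderDetail');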

Summary

As I stated initially, I wanted to write this post to clear up any confusion surrounding automatic updates for statistics. Even if a statistic becomes outdated as the result of a modification, it will not automatically update after the modification completes. The statistic will automatically update the next time a query plan uses it.

To reiterate (yes, this might be overkill), SQL Server will automatically update a statistic when:

  • A query compiles for the first time, and a statistic used in the plan is out of date
  • A query has an existing query plan, but a statistic in the plan is out of date

For those that attended my session at Summit, I hope this helps address any questions you might have still had. If not, please leave a comment and I will get back to you!

Trending Database Growth From Backups

When I start working with a client, one question I always ask is whether they are collecting baselines of their SQL Server environment (a shocker, I know). If they are not, I explain why it’s a good idea to start capturing them. And even though I think it’s an easy argument to make, I find I make a better case when I have data to back it up. But how do you make the argument for baseline data, when you don’t have any real data to show?

There is data in SQL Server that you can mine; you just have to know where to find it. If I look at a client system and notice that maintenance tasks keep taking longer and longer, then I might assume it’s due to database growth. Now, if it’s just database integrity checks that are taking longer and longer, that might be a sign that something is wrong. However, that’s out of scope for this post, so let’s stick with the assumption that the database is growing larger over time because data is rarely deleted, only added. Depending on the client’s current storage and the duration of the tasks, I may have some concerns about how much disk space they’re going to need down the road. I really want to trend database growth, among other things, over time. And one way I can approximate growth is by using information from full backups.

When you back up a database, every page that is allocated in the database is copied to the backup.  This means you could have a 100GB database with a backup of only 50GB, because only 50GB’s worth of pages are allocated.  If my database files are pre-sized, as they hopefully are, then looking at backup size will not tell me anything about the current size of the database.  However, it will tell me about the growth of it – which is really what I’m after.
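If you want to see the difference between file size and allocated space for yourself, a quick sketch against the current database shows it (values are in MB; FILEPROPERTY reports the pages actually used within each data file):

SELECT
[name] AS "Logical File Name",
[size]/128 AS "File Size MB",
FILEPROPERTY([name],'SpaceUsed')/128 AS "Space Used MB"
FROM [sys].[database_files]
WHERE [type_desc] = N'ROWS';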

Backup information is stored in msdb, and while it should be removed on a regular basis via a scheduled maintenance task, it is not unusual for at least three to six months of data to exist, if not more. Everything I need for this example I can capture from dbo.backupset, which has one row for every successful backup operation. Here’s my query*:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],DATEPART(month,[backup_start_date]);

In this query I’m filtering on a specific database, and I’m only looking at full backups (type = ‘D’). Log backups would be interesting to examine as well, but that’s for another post. I’m also aggregating all the full backups for one month. Whether you’re running full backups daily or weekly, I would recommend aggregating the data by month. Trying to look at the changes day-by-day or even week-by-week is too detailed. We want to look at the big picture, and a monthly summary gives us that. Here’s the output for my AdventureWorks database:

output

Notice that the backup size increases over time, but it’s not linear. If I graph it in Excel, I can really see the trend:

(Excel chart of average full backup size by month)

Further analysis is natural from this point on – what’s the percent increase each month?  Each quarter?  Which month had the largest increase?  When is the database going to fill up the storage we have allocated currently?  In my case, I just want to be able to show that we can get this kind of information, plus a lot more, from SQL Server if we just capture it.  And this data supports my point very well.  If you want to dig deeper into database growth analysis, I say run with it. :)
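If you do, here’s one way to start: month-over-month percent change.  This is a sketch only – it assumes SQL Server 2012 or higher because it uses LAG, it adds the year to the grouping so that months from different years don’t collapse together, and NULLIF guards against dividing by a zero prior value:

WITH MonthlyBackupSize AS
(
SELECT
DATEPART(year,[backup_start_date]) AS [Year],
DATEPART(month,[backup_start_date]) AS [Month],
AVG([backup_size]/1024/1024) AS [BackupSizeMB]
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY DATEPART(year,[backup_start_date]), DATEPART(month,[backup_start_date])
)
SELECT
[Year],
[Month],
[BackupSizeMB],
([BackupSizeMB] - LAG([BackupSizeMB]) OVER (ORDER BY [Year],[Month]))
/ NULLIF(LAG([BackupSizeMB]) OVER (ORDER BY [Year],[Month]),0) * 100 AS "Percent Change From Prior Month"
FROM MonthlyBackupSize
ORDER BY [Year],[Month];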

Hopefully you now see how easy it is to use data from SQL Server to make your life easier: the information the above query provides can help you understand database growth and start basic capacity planning.  I also hope this information helps to convince you (or your manager) that collecting baseline data can be extremely beneficial, and now’s the time to start.  If you need more background, or some queries to get you started, please check out my Baselines series on SQL Server Central.  Good luck!

EDIT: *For those of you running SQL Server 2005 and below, you will need to exclude compression information:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],DATEPART(month,[backup_start_date]);

DBCC CHECKDB Parallel Checks and SQL Server Edition

It’s been a few weeks since the PASS Summit but I’m still getting the occasional email from people who attended one of my sessions. I consider this a good thing – I believe that if someone follows up with me a few weeks after they attended a session, then something I said stuck with them and it was worth their time to follow up. Some people have had questions about something I said during a session, and others are following up on discussions we had during the week.

I had one question about minimizing the resources used by DBCC CHECKDB, and one of the suggestions I provided was to reduce MAXDOP when CHECKDB was running, to limit the number of processors used by CHECKDB. However, I want to point out that CHECKDB only runs in parallel on Enterprise Edition. You may be aware of this already, as Paul mentions it in one of his posts, and also in the SQL Server 2008 Internals book. But, as I was perusing the DBCC CHECKDB entry online, I discovered that it contains incorrect information. From BOL (emphasis mine):

“Checking Objects in Parallel

By default, DBCC CHECKDB performs parallel checking of objects. The degree of parallelism is automatically determined by the query processor. The maximum degree of parallelism is configured just like parallel queries. To restrict the maximum number of processors available for DBCC checking, use sp_configure. For more information, see Configure the max degree of parallelism Server Configuration Option. Parallel checking can be disabled by using trace flag 2528. For more information, see Trace Flags (Transact-SQL).”

The first sentence is incorrect. DBCC CHECKDB does not perform parallel checking of objects in Standard Edition. At all. CHECKDB is always single-threaded in Standard Edition and I confirmed this with the SQL Server development team.

For Enterprise Edition, CHECKDB respects the maximum degree of parallelism setting configured for the instance. If you have MAXDOP set to 0 for the instance, but want to restrict CHECKDB to only use 2 processors, you need to either change MAXDOP for the instance to 2, or you need to use Resource Governor (also an Enterprise-only feature).

I did enter a Connect item for this, and hopefully the documentation will be fixed soon. I hope this clears up any confusion that may exist.

What Checks Update dbccLastKnownGood?

One of the methods I’ve used to check when a client last ran DBCC CHECKDB on their database is the dbccLastKnownGood value on the boot page (page 9 in the PRIMARY file).  When working with a client a couple of weeks ago, the value for dbccLastKnownGood was from 2006, but the customer stated they were running integrity checks.  When I investigated their method for checks, I discovered that due to the database size they were running DBCC CHECKALLOC and DBCC CHECKCATALOG weekly, and checking different user tables with DBCC CHECKTABLE at night.  Breaking out the checks between those commands does not update dbccLastKnownGood.  Let’s have a look…

I have a copy of the AdventureWorks2012 database installed, and I can look at the boot page using DBCC PAGE or DBCC DBINFO.  Both commands are undocumented, but as Paul notes in the links provided, they are safe to run.  I’ll use DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('AdventureWorks2012');
GO

The text below is a snippet of the output to show only what’s relevant:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

If we run DBCC CHECKALLOC, DBCC CHECKCATALOG, and DBCC CHECKTABLE for all system and user tables, you can see that dbccLastKnownGood does not change (please don’t holler because I’m using a cursor…it does what I need for this example!):

DBCC CHECKALLOC ('AdventureWorks2012') WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO

DBCC CHECKCATALOG ('AdventureWorks2012');
GO

USE [AdventureWorks2012];
GO

DECLARE @schemaname varchar(500);
DECLARE @tablename varchar(500);
DECLARE @fulltable varchar(1000);
DECLARE @string varchar(5000);

DECLARE TableList CURSOR FOR
SELECT [ss].[name],[so].[name]
FROM [sys].[objects] AS so
JOIN [sys].[schemas] AS ss ON [so].[schema_id]=[ss].[schema_id]
WHERE [so].[type] in ('S','U')
ORDER BY [so].[name]; 

BEGIN
OPEN TableList

FETCH NEXT FROM TableList
INTO @schemaname, @tablename;

WHILE @@FETCH_STATUS = 0
BEGIN
SET @fulltable = @schemaname + '.' + @tablename;
SET @string = N'DBCC CHECKTABLE (''' + @fulltable + ''') WITH ALL_ERRORMSGS, NO_INFOMSGS;'
EXEC (@string)
FETCH NEXT FROM TableList
INTO @schemaname, @tablename;
END 

CLOSE TableList;
DEALLOCATE TableList;
END

DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

No change in dbccLastKnownGood.  What happens if we run DBCC CHECKFILEGROUP?  I created a second filegroup in the AdventureWorks2012 database and created one table in the filegroup, then ran the check on that filegroup:

ALTER DATABASE [AdventureWorks2012]
ADD FILEGROUP [TestFG]; 

ALTER DATABASE [AdventureWorks2012]
ADD FILE ( NAME = N'TestFile', FILENAME = N'D:\Databases\SQL2012\AdventureWorks2012\TestFile.ndf' ,
SIZE = 4096KB , FILEGROWTH = 1024KB )
TO FILEGROUP [TestFG];

CREATE TABLE [dbo].[test] (
[col1] INT,
[col2] INT
)
ON [TestFG]; 

INSERT INTO [dbo].[test] (
col1,
col2
)
VALUES
(1,2),
(3,4); 

DBCC CHECKFILEGROUP (2) WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO
DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 2012-11-15 12:25:34.090
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

Look at that, it updated.  And finally, we know that CHECKDB will update the value, but what happens if we include the WITH PHYSICAL_ONLY option, where the checks are not quite as involved? (See Paul’s post on check options for VLDBs for additional notes on WITH PHYSICAL_ONLY.)

DBCC CHECKDB ('AdventureWorks2012') WITH PHYSICAL_ONLY;
GO
DBCC DBINFO ('AdventureWorks2012');
GO

Output:

dbi_dbmOldestXactLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_CloneCpuCount = 0
dbi_CloneMemorySize = 0
dbi_updSysCatalog = 1900-01-01 00:00:00.000
dbi_LogBackupChainOrigin = 0:0:0 (0x00000000:00000000:0000)
dbi_dbccLastKnownGood = 2012-11-15 12:45:21.173
dbi_roleSequence = 0
dbi_dbmHardenedLsn = 0:0:0 (0x00000000:00000000:0000)
dbi_localState = 0
dbi_safety = 0
dbi_modDate = 2012-11-15 09:42:47.813
dbi_verRDB = 184552376
dbi_lazyCommitOption = 0
dbi_svcBrokerGUID = 10048a46-646c-4c44-8037-8ea6c83f4e2c
dbi_svcBrokerOptions = 0x00000001

The dbccLastKnownGood value does update, as it does with CHECKDB without the PHYSICAL_ONLY.  Therefore, if you’re relying on dbccLastKnownGood to tell you when CHECKDB was last run, realize it may not give you the entire story.
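As an aside, if you’d rather not eyeball the full DBINFO output, you can capture it into a table and pull out just the value you care about.  This is a sketch using the same undocumented command with the TABLERESULTS option, so the usual caveats apply:

CREATE TABLE #dbinfo (
[ParentObject] nvarchar(255),
[Object] nvarchar(255),
[Field] nvarchar(255),
[Value] nvarchar(255)
);

INSERT INTO #dbinfo
EXEC ('DBCC DBINFO (''AdventureWorks2012'') WITH TABLERESULTS;');

SELECT [Value] AS "dbccLastKnownGood"
FROM #dbinfo
WHERE [Field] = N'dbi_dbccLastKnownGood';

DROP TABLE #dbinfo;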

Note: post edited 11/16/2012 to replace screen shots of DBCC DBINFO output with text, as the images did not render well in all browsers.

T-SQL Tuesday #36: What Does Community Mean?

 

I find this month’s T-SQL Tuesday topic quite appropriate, as it follows the 2012 PASS Summit, a time when I get to see so many members of the SQL Server Community.  Chris Yates (@YatesSQL) is hosting this month and asks, “What Does Community Mean to You?”  My short answer? #sqlfamily, #sqlhelp and opportunity.

When I think of the SQL Community, it is the people that come to mind – so many people that I have met and become friends with, and so much of it is due to Twitter.  When I think of the SQL Community, I think of Twitter.  I do.  It may seem crazy, and I know that there is more to the community than conversations on Twitter – there is the Summit, there are User Groups, there are SQLSaturdays, there are blogs – the community encompasses so much.  But how do I communicate with everyone?  So frequently, so quickly?  How have I met so many people from all over the US?  All over the world?  Twitter.

Our community is incredibly unique.  None of my friends or family have this kind of cohort – they don’t have the friendships that span the globe.  They don’t regularly converse with colleagues all around the world, working together to answer questions and solve problems.  I’ve lost so many hours of sleep due to chats with Rob Farley (I curse that 17.5 hour time difference!) and Jonathan Kehayias (I’m not convinced that he sleeps) but it’s all worth it.

I have found some really close friends in the Community.  Friends that I would spend time with even if they didn’t work with SQL Server.  Friends that I know I can call or email at any time for anything.  Just over a year ago I blogged about #sqlfamily, and what I wrote then still holds true today.

The SQL Community is also about helping others.  The generosity of individuals continually astounds me.  So many members of the community volunteer for PASS, helping to make the organization so successful.  Many people help others solve technical problems via the #sqlhelp hashtag on Twitter.  And when it’s too much for 140 characters, blog posts are written, questions posted to forums or emails exchanged.  I’ve seen it happen to others, it’s happened to me.  The people of this community share information and provide help, and then celebrate success.

And finally, the SQL Community can provide significant opportunities for people.  The opportunity to meet a favorite blogger at Summit, the opportunity to talk to someone at Microsoft to understand how it really works, the opportunity to meet someone who will become that friend you call when you have life changing news.  And yes, opportunity can be a job opening.  How many jobs have been found through the community?  Too many to count.  I found my team through the SQL Community.  If I were not involved, I do not believe I would have the job I have today.  I love what I do.  I work with five amazing individuals.  People that I respect personally and professionally.  My job, just like the Community, is exhausting, rewarding, and a whole lot of fun.

If you’re not involved, I recommend you think about it.  You never know where it will take you.  And for those of you in this community – I thank you for making it what it is today, and I cannot wait to see where we go in the future.

PASS Summit 2012: Day 2

Happy second day of Summit!  Holy buckets, yesterday was a busy day.  But it was great.  After the keynote I headed off to present my DBCC session which went well.  Thank you to everyone that attended and to all of those who had questions and provided feedback.  I am presenting again today, same time (10:15 AM PST) and same room (618-620), but this time it’s Demystifying Database Statistics.  Statistics is another of my favorite topics, I expect the time will fly by just as it did yesterday.

I just finished my PASS TV interview which was short but fun and I am at the blogger’s table again this morning, quite curious to see what the keynote brings.  I know we will hear from Douglas McDowell, Thomas LaRock and Quentin Clark.  Douglas and Thomas are Vice Presidents on the PASS Board of Directors and Quentin is Corporate Vice President, SQL Program Management at Microsoft.

And we’re off…

8:22 AM Douglas takes the stage first and mentions that he ran into many people yesterday who could not decide what session to attend for a time slot, because there were multiple that were of interest.  Douglas’ suggestion?  Buy the DVDs.  It’s a good recommendation, although remember that if you’ve attended the Summit, you’ll also have access to the sessions online.  If you’ll want to watch the sessions when you’re not on the internet, you’ll need to buy the DVDs.

Douglas is talking about finance, as that’s his role, and points out that this year they’ve hired an additional Community Evangelist.  Karla Landrum was the first Community Evangelist, but she has been so busy that another person was needed.  This past quarter Niko Neugebauer was hired, and I know he is thrilled about joining the team.  Karla and Niko provide amazing support to the SQL Server community – if you see one of them this week, please say thank you!

Douglas has spent some time talking about the revenue and expenses for PASS. The Summit is the largest revenue generator for PASS – not surprising with over 3000 attendees here this week.

8:35 AM Tom LaRock takes the stage and he gets to announce the PASSion Award Winner, who is Jen Stirrup.  Congratulations Jen!!

Next year’s Summit will be in Charlotte, NC, from October 15-18. Early registration is already available at a very discounted rate.

Tonight is the Community Appreciation Party at Seattle’s Experience Music Project and it starts at 7 PM (ends at 10 PM).  I have never been to EMP and look forward to checking it out!

8:55 AM Quentin Clark has taken the stage and now we’re getting a demo of some new PDW functionality.  I know that big data is a hot topic and of interest right now…but I’m still an engine girl at heart.  Julie Strauss from Microsoft is on stage for a demo, and she’s talking about movie data (I want that data).  There’s a side conversation with Quentin – there’s obviously an ongoing joke about how it’s hard for her to pronounce some words (she’s a Viking).  And then she says, “We don’t talk we just do.”  Love it.

Ok, I need to head off to get ready for my session.  Did I mention that I went to Bob Ward’s session on the SQLOS yesterday?  It was three hours of fantastic technical knowledge :)