New SQLskills Pluralsight courses

We’ve just released a new course on Pluralsight, and there were a couple of other courses released earlier this year that you may have missed.

The new course is Joe Sack’s SQL Server: Common Query Tuning Problems and Solutions – Part 1. It’s almost three hours long and the abstract is:

There are a wide variety of common performance problems that you will encounter when query tuning. Part 1 of this two-part course begins by showing how to validate query execution assumptions around statistics and cardinality in order to ensure you’re troubleshooting the true root cause of a query performance issue. You’ll then learn about common areas where bad assumptions can occur that negatively impact overall query performance, and about common problems and solutions related to being imprecise when querying a database. The course is very demo-centric, with lots of practical tips and tricks you’ll be able to apply in your environment. This course is perfect for developers, DBAs, and anyone responsible for query performance tuning on SQL Server, from SQL Server 2005 onward.

The other recent courses are:

  • Glenn’s SQL Server 2014 DMV Diagnostic Queries: Part 1 (1 hour)
    • It is very common for SQL Server instances to be configured incorrectly in one or more ways, and this often has a detrimental effect on workload performance. This course will describe and demonstrate more than 20 DMV queries you can use to easily examine SQL Server instance configuration information and identify problems. This course is perfect for anyone who is responsible for a SQL Server instance, with most topics also applicable to SQL Server 2012 and earlier versions.
  • Glenn’s Scaling SQL Server 2012 – Part 1 (4 hours)
    • When considering how to improve the scalability of a SQL Server workload, many people jump to the conclusion that scaling up or scaling out are the only answers. This course will show you that there are a huge number of things you can do to improve performance and scalability before you should even think of scale-up or scale-out. The bulk of the course describes architecture and design issues in both the application and the database that can affect scalability, along with demonstrations of some of the issues and detailed scripts to help you find the issues in the first place. The course ends with a section on common indexing problems that can drastically limit a workload’s scalability, along with how to identify and fix them. Save your company a bunch of money by postponing the scaling decision through fixing your existing scalability bottlenecks, not ignoring them! This course is perfect for anyone who is responsible for improving the performance and scalability of a SQL Server workload, with most topics applicable to SQL Server 2005 onward.
  • Erin’s Supporting SQL Server ISV Applications (2 hours)
    • This course follows on from the Developing and Deploying SQL Server ISV Applications course and describes how to effectively support your application and provide the best experience for your customers and your company, whether you work for a large or small ISV and are creating a complex or simple application. The course discusses how to give customers recommendations, how to best support customer implementation of your application, and how to maintain the application database. It then explains how to deal with customer performance issues and overly-persistent customers, plus the importance of capturing usage information from customers and how to make use of it. The course wraps up with a discussion of how to build good relationships and a community with your customers, and how to create meaningful and useful documentation. This course is perfect for anyone involved in supporting applications that use SQL Server for data storage. The information in the course applies to all versions from SQL Server 2005 onward.

Check them out!

We’ve got a lot more courses in production and planned for the remainder of 2014 around DMVs, stored procedure performance, Service Broker, Change Data Capture, fragmentation, and more. Watch this space (and the Insider Newsletter) for details.

Thanks

SQL Server 2005 survey from the SQL Server Team

SQL Server 2005 will end Extended Support on 4/12/2016 (see the Support Lifecycle page for details) and the SQL Server Team at Microsoft is conducting a survey with customers to gain some insight that will help with development of the next version of SQL Server. I volunteered to host a public version of the survey on my blog, take responses from you, and forward them on to the relevant SQL Server Team members.

If you’d like to participate, please cut-and-paste the survey below and email me with your answers. I’ll pass them on, stripping out your contact details if you specify that.

I’ve made the survey into plain text to ease c&p issues. Please don’t put your answers in a comment, just send them in email. I’ll delete your email once I’ve processed it.

[Edit 9/18/14: Please don't send any more answers - the SQL Server team has enough results. Thanks!]

Thanks!

Do you want to be anonymous?

Questions from the SQL Server Team:
1) Is your company still using SQL 2005 for applications in production?
   a) If so, how important are the apps (mission-critical, LOB, etc.)?
   b) If so, which SKU (Express, Standard, Enterprise)?

2) In the following list of features, are you or your company still relying on them?
   a) SQL Mail
   b) CREATE/ALTER TABLE or CREATE INDEX syntax without parentheses around table options
   c) sp_configure options 'user instances enabled' and 'user instance timeout'
   d) SQL Server Database Management Objects (SQL-DMO)
   e) sp_dboption
   f) DATABASEPROPERTY
   g) UPDATE table1, table2, ... SET syntax
   h) Database compatibility level 80

3) Are there any features in SQL Server 2005 that have been announced as deprecated that you still rely on?
   a) The deprecated features list is: http://technet.microsoft.com/en-us/library/ms143729(v=sql.90).aspx

4) If you're looking to upgrade from SQL Server 2005:
   a) Which SQL Server version are you looking at upgrading to?
   b) What are the biggest upgrade blockers you see?

Many thanks for any information you can provide!

Australia classes in Sydney and Canberra in December

I’ve just released our final classes of the year for registration!

We will be coming to Sydney and Canberra in Australia and teaching our new IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1:

  • From the curriculum: As well as optimization techniques, this course will also help with design and architecture so you can prevent performance and scalability problems from happening. The core of this class is comprehensive coverage of indexing and statistics strategies: a SQL Server workload will not perform well unless these are designed, implemented, and tuned correctly. You will also learn why and how to optimize transaction log operations, tempdb and data file configuration, transactions and isolation levels, and locking and blocking.

Our classes in Australia will make use of the excellent Cliftons facilities in both cities, and the price includes breakfast, lunch, and coffee/tea/snacks through the day.

The dates of the classes are:

  • Sydney, NSW, December 8-12
  • Canberra, ACT, December 15-19

The class cost is US$3,995 plus GST, with an early-bird price of US$3,495 plus GST (cut-off dates vary by class).

For both classes, you can register for US$3,195 plus GST through July 31st only – SUPER-early-bird price!

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

July pricing special for new October classes

To celebrate our newly revamped IE1 and IE2 classes becoming IEPTO1 and IEPTO2, through the month of July you can register for either October class in Chicago at the super-early-bird price of $2,995. This price is usually reserved just for our alumni. At the end of July, the price will revert to the normal early-bird price of $3,295.

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

Hope to see you there!

PS The classes in Australia in December will be published on July 1st next week!

Revamped IE1 and IE2 classes open for registration in October

I’ve just released our final US classes of the year for registration!

We’ve revamped our IE1 and IE2 classes so they’re now both focused on performance and together they form a comprehensive, 10-day performance tuning and optimization course.

  • IE1 is now IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1 (see here for the revamped curriculum)
    • From the curriculum: As well as optimization techniques, this course will also help with design and architecture so you can prevent performance and scalability problems from happening. The core of this class is comprehensive coverage of indexing and statistics strategies: a SQL Server workload will not perform well unless these are designed, implemented, and tuned correctly. You will also learn why and how to optimize transaction log operations, tempdb and data file configuration, transactions and isolation levels, and locking and blocking.
  • IE2 is now IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2 (see here for the revamped curriculum)
    • From the curriculum: The core of this class is understanding resource usage and we will cover in-depth all the areas of concern for a SQL Server workload: I/O, CPU usage, memory usage, query plans, statement execution, parameter sniffing and procedural code, deadlocking, and the plan cache. You will learn how to use specific tools and techniques for analyzing SQL Server: creating and using performance baselines, benchmarking tools, wait and latch statistics, Extended Events, DMVs, and PerfMon. These techniques will be highly adaptable to whatever situation you encounter and you will understand not just how to capture performance data but also how to interpret it, so you can derive answers to your own performance problems rather than relying on someone giving you the answer.

As before, these classes both stand alone perfectly well, but we strongly recommend taking IEPTO1 before IEPTO2, as the IEPTO1 material is assumed knowledge in IEPTO2.

If you’ve taken IE1 previously, you don’t need to go back and take IEPTO1. If you’ve already taken IE1, IEPTO2 is the next course for you. If you’ve already taken IE2 but not IE1, we recommend you take IEPTO1.

Our classes in October will be in Chicago, at our usual hotel in Oakbrook Terrace:

  • IE0: Immersion Event for Junior/Accidental DBAs
    • October 6-8
  • IEPTO1: Immersion Event on P.T.O. – Part 1
    • October 6-10 – special super-early-bird pricing through July!
  • IEPTO2: Immersion Event on P.T.O. – Part 2
    • October 13-17 – special super-early-bird pricing through July!

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

Are mixed pages removed by an index rebuild?

This is a question that came up this morning during our IE1 class that I thought would make an interesting blog post as there are some twists to the answer.

The first 8 pages that are allocated to an allocation unit are mixed pages from mixed extents, unless trace flag 1118 is enabled.
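
As an aside, if you want to prevent mixed-extent allocations, enabling the trace flag is a one-liner (a minimal sketch; it applies instance-wide, and adding -T1118 as a startup parameter makes it persist across restarts):

-- Disable mixed-extent allocations instance-wide so all new allocations
-- come from dedicated (uniform) extents
DBCC TRACEON (1118, -1);
GO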

See the following blog posts for more info:

Assuming that mixed pages are not disabled with trace flag 1118, does an index rebuild remove all mixed pages or not?

Let’s investigate. First I’ll create a clustered index with 1,000 data pages:

CREATE TABLE [MixedTest] ([c1] BIGINT IDENTITY, [c2] CHAR (8000) DEFAULT 'a');
CREATE CLUSTERED INDEX [MixedTest_CL] ON [MixedTest] ([c1]);
SET NOCOUNT ON;
GO
INSERT INTO [MixedTest] DEFAULT VALUES;
GO 1000

Then I’ll make sure that we have mixed pages by examining the first IAM page in the clustered index’s IAM chain. You can get the sp_AllocationMetadata proc here.

EXEC [sp_AllocationMetadata] N'MixedTest';
GO
Object Name   Index ID   Alloc Unit ID       Alloc Unit Type   First Page   Root Page   First IAM Page
------------  ---------  ------------------  ----------------  -----------  ----------  ---------------
MixedTest     1          72057594046185472   IN_ROW_DATA       (1:987)      (1:1732)    (1:988)
DBCC TRACEON (3604);
DBCC PAGE (N'master', 1, 988, 3);
GO

(I’m just including the relevant portion of the DBCC PAGE output here…)

<snip>
IAM: Single Page Allocations @0x00000000227EA08E

Slot 0 = (1:987)                    Slot 1 = (1:989)                    Slot 2 = (1:990)
Slot 3 = (1:991)                    Slot 4 = (1:1816)                   Slot 5 = (1:1817)
Slot 6 = (1:1818)                   Slot 7 = (1:1819)
<snip>

Now I’ll do an offline index rebuild of the clustered index, and look again at the IAM page contents (assume I’m running the proc and DBCC PAGE after the rebuild):

ALTER INDEX [MixedTest_CL] ON [MixedTest] REBUILD;
GO
<snip>
IAM: Single Page Allocations @0x0000000023B0A08E

Slot 0 = (1:1820)                   Slot 1 = (1:446)                    Slot 2 = (1:1032)
Slot 3 = (0:0)                      Slot 4 = (1:1035)                   Slot 5 = (1:1034)
Slot 6 = (1:1037)                   Slot 7 = (1:1036)
<snip>

So the answer is no, an index rebuild does not remove mixed page allocations. Only having trace flag 1118 enabled (so mixed extents aren’t used for new allocations in the first place) does that.

But this is interesting – there are only 7 mixed pages in the single-page slot array above. What happened? The answer is that the offline index rebuild ran in parallel, with each thread building a partial index, and then these partial indexes are stitched together. The ‘stitching together’ operation causes some of the non-leaf index pages to be deallocated as their contents are merged, which explains the deallocated page that was originally tracked by entry 3 in the slot array.

Let’s try an offline index rebuild that forces a serial plan.

ALTER INDEX [MixedTest_CL] ON [MixedTest] REBUILD WITH (MAXDOP = 1);
GO
<snip>
IAM: Single Page Allocations @0x0000000023B0A08E

Slot 0 = (1:1822)                   Slot 1 = (1:1823)                   Slot 2 = (1:291)
Slot 3 = (1:292)                    Slot 4 = (0:0)                      Slot 5 = (0:0)
Slot 6 = (0:0)                      Slot 7 = (0:0)
<snip>

In this case there is only one index being built (i.e. no parallel mini indexes), so there are no pages being deallocated in the new index, as there is no stitching operation. But why aren’t there 8 mixed pages? This is because during the build phase of the new index, the leaf-level pages are taken from bulk-allocated dedicated extents, regardless of the recovery model in use. The mixed pages are non-leaf index pages (which you can prove to yourself using DBCC PAGE).
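
For instance, a quick sketch of that check, using the first slot from my output above (substitute a page ID and database name from your own run):

DBCC TRACEON (3604);
DBCC PAGE (N'master', 1, 1822, 0);
GO
-- In the page header, m_level = 0 means a leaf-level page;
-- anything higher is a non-leaf index page.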

For parallel and single-threaded online index operations, the same two patterns occur as for offline index rebuilds, even though the algorithm is slightly different.

Enjoy!

Online index rebuild corruption bug in SQL Server 2012 SP1

This is a quick blog post to let you know about a bug in SQL Server 2012 SP1 that can cause data loss when performing index maintenance.

The data loss issue can happen in some circumstances when you do a parallel online rebuild of a clustered index while there are concurrent data modifications happening on the table AND you also hit a deadlock and another error. Nasty when it occurs, but that should hopefully be a rare combination.

The workaround is to limit the online rebuild operation to be single threaded using the WITH (MAXDOP = 1) option.
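
For example (the table and index names here are just placeholders), an online rebuild using the workaround would look something like:

ALTER INDEX [MyClusteredIndex] ON [dbo].[MyTable]
REBUILD WITH (ONLINE = ON, MAXDOP = 1);
GO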

There is a hotfix available – see KB 2969896 for more details.

Depending on which build of SQL Server 2012 or 2014 you are on, the best option for you will vary. See my friend Aaron Bertrand’s post for comprehensive details.

Target and actual SQL Server uptime survey results

Exactly five years ago I published survey results showing target uptime SLAs and actual uptime measurements. I re-ran the survey a few weeks ago to see what’s changed, if anything, in the space of five years, and here are the results.

24×7 Systems

[Chart: 24×7 systems – target uptime survey results]

[Chart: 24×7 systems – measured (actual) uptime survey results]

Other responses:

  • 1 x 99.95%

Non 24×7 Systems

[Chart: non-24×7 systems – target uptime survey results]

Other responses:

  • 7 x “No target or target unknown”
  • 1 x “0830 – 1730 M-Sat”

[Chart: non-24×7 systems – measured (actual) uptime survey results]

Other values:

  • 1 x “n/a”

Summary

Well, the good thing is that this survey had almost twice the number of respondents as the 2009 survey, but that could just be that a lot more people read my blog now than five years ago.

My takeaway from the data is that nothing has really changed over the last five years. Given the really low response rate to the survey (I usually get 200-300 or more responses for a typical survey), my inference is that the majority of you out there don’t have well-defined uptime targets (or recovery time objective service level agreements, RTO SLAs, or whatever you want to call them) and so didn’t respond to the survey. The same thing happens when surveying something like backup testing frequency – where you *know* you’re supposed to do it, but don’t do it enough, so you feel guilty and don’t respond to the survey.

For those of you that responded, or didn’t respond but do have targets, well done! For those of you that don’t have targets, I don’t blame you, I blame the environment you’re in. From talking to a bunch of you, I know that most DBAs who *want* to do something about HA/DR are prevented from doing so by their management not placing enough importance on the subject. This is also shown by the demand for our various in-person training classes: IE2 on Performance Tuning is usually over-subscribed even though it runs 3-4 times per year, but IE3 on HA/DR has only sold out once even though we generally run it only once per year.

Performance is the number one thing on the collective minds of most I.T. management, not HA/DR planning, and that’s just wrong. Business continuity is so crucial, especially in this day and age of close competition where being down can cause fickle customers to move to a different store/service provider.

If you’re reading this and you know you don’t have well-defined uptime targets then I strongly encourage you to raise the issue with your management, as it’s likely that your entire HA/DR strategy is lacking too. For more information, you can read the results post from the survey five years ago (Importance of defining and measuring SLAs).

Don’t wait until disaster strikes to make HA/DR a priority.

Most common wait stats over 24 hours and changes since 2010

Back in February I kicked off a survey asking you to run code that created a 24-hour snapshot of the most prevalent wait statistics. It’s taken me a few months to provide detailed feedback to everyone who responded and to correlate all the information. Thanks to everyone who responded!

I did this survey because I wanted to see how the results had changed since my initial wait statistics survey back in 2010.

The results are interesting!

2010 Survey Results

Results from 1823 servers, top wait type since server last restarted (or waits cleared). The blog post for this survey (Wait statistics, or please tell me where it hurts) has a ton of information about what these common wait types mean, and I’m not going to repeat all that in this blog post.

[Chart: 2010 survey – most common top wait types]

2014 Survey Results

Results from 1708 servers, top wait type over 24 hours.

[Chart: 2014 survey – most common top wait types over 24 hours]

The distribution of the top waits has changed significantly over the last four years, even when taking into account that in the 2010 survey I didn’t filter out BROKER_RECEIVE_WAITFOR.

  • CXPACKET is still the top wait type, which is unsurprising
  • OLEDB has increased to being the top wait type roughly 17% of the time compared to roughly 4% in 2010
  • WRITELOG has increased to being the top wait 10% of the time compared with 6% in 2010
  • ASYNC_NETWORK_IO has decreased to being the top wait 8% of the time compared with 15% in 2010
  • PAGEIOLATCH_XX has decreased to being the top wait 7% of the time compared with 18% in 2010

These percentages remain the same even when I ignore the BROKER_RECEIVE_WAITFOR waits in the 2010 results.

Now I’m going to speculate as to what could have caused the change in results. I have no evidence that supports most of what I’m saying below, just gut feel and supposition – you might disagree. Also, even though the people reading my blog and responding to my surveys are likely paying more attention to performance and performance tuning than the general population of people managing SQL Server instances, I think these results are representative of what’s happening on SQL Server instances across the world.

I think that OLEDB waits have increased in general due to more and more people using 3rd-party performance monitoring tools that make extensive, repeated use of DMVs. Most DMVs are implemented as OLE-DB rowsets and will cause many tiny OLEDB waits (1-2 milliseconds on average, or smaller). This hypothesis is actually borne out by the data I received and confirmation from many people who received my detailed analyses of results they sent me. If you see hundreds of millions or billions of tiny OLEDB waits, this is likely the cause.
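
If you want to check whether that pattern shows up on your instance, a simple sketch against sys.dm_os_wait_stats shows the OLEDB wait count and average wait time since the stats were last cleared (or the server last restarted):

SELECT
    [wait_type],
    [waiting_tasks_count],
    [wait_time_ms],
    [wait_time_ms] * 1.0 / NULLIF ([waiting_tasks_count], 0) AS [avg_wait_ms]
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'OLEDB';
GO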

I think WRITELOG waits being the top wait have increased partly because other bottlenecks have become less prevalent, and so the next highest bottleneck is the transaction log, and partly because more workloads are hitting logging bottlenecks inside SQL Server that are alleviated starting in SQL Server 2012 (blog post coming next week!). I also think that WRITELOG waits have been prevented from becoming even more prevalent because of the increased use of solid-state disks for transaction log storage mitigating the increased logging from higher workloads.

Now it could be that the drop in PAGEIOLATCH_XX and ASYNC_NETWORK_IO waits being the top wait is just an effect caused by the increase in OLEDB and WRITELOG waits. It could also be because of environmental changes…

PAGEIOLATCH_XX waits being the top wait might have decreased because of:

  • Increased memory on servers meaning that buffer pools are larger and more of the workload fits in memory, so fewer read I/Os are necessary.
  • Increased usage of solid-state disks meaning that individual I/Os are faster, so when I/Os do occur, the PAGEIOLATCH_XX wait time is smaller and so the aggregate wait time is smaller and it is no longer the top wait.
  • More attention being paid to indexing strategies and buffer pool usage.

ASYNC_NETWORK_IO waits being the top wait might have decreased because of fewer poorly written applications, or fixes to applications that previously were poorly written. This supposition is the most tenuous of the four and I really have no evidence for this at all. I suspect it’s more likely the change is an effect of the changes in prevalence of the other wait types discussed above.

Summary

I think it’s interesting how the distribution of top waits has changed over the last four years and I hope my speculation above rings true with many of you. I’d love to hear your thoughts on all of this in the post comments.

It’s not necessarily bad to have any particular wait type as the most prevalent one in your environment, as waits always happen, so there has to be *something* that’s the top wait on your system. What’s useful though is to trend your wait statistics over time and notice how code/workload/server/schema changes are reflected in the distribution of wait statistics.
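
As a trivial sketch of what trending could look like (the snapshot table name is made up for illustration, and this isn’t the script from the survey), you could capture sys.dm_os_wait_stats on a schedule and compare successive snapshots:

-- Hypothetical snapshot table; run the INSERT from an Agent job on a schedule
-- and diff successive captures to see how the wait distribution changes
CREATE TABLE [dbo].[WaitStatsSnapshot] (
    [capture_time]        DATETIME2     NOT NULL DEFAULT SYSDATETIME (),
    [wait_type]           NVARCHAR (60) NOT NULL,
    [waiting_tasks_count] BIGINT        NOT NULL,
    [wait_time_ms]        BIGINT        NOT NULL,
    [signal_wait_time_ms] BIGINT        NOT NULL
);
GO

INSERT INTO [dbo].[WaitStatsSnapshot]
    ([wait_type], [waiting_tasks_count], [wait_time_ms], [signal_wait_time_ms])
SELECT [wait_type], [waiting_tasks_count], [wait_time_ms], [signal_wait_time_ms]
FROM sys.dm_os_wait_stats;
GO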

There is lots of information about wait statistics in my Wait Statistics blog category and there’s a new whitepaper (SQL Server Performance Tuning Using Wait Statistics: A Beginner’s Guide) on wait statistics written by Jonathan and Erin in conjunction with Red Gate, which you can download from our website here.

Enjoy!

Survey: target uptime – planned and actual

It’s been five years(!) since the last time I asked about your target uptimes for your critical SQL Server instances and I think we’d all be interested to see how things have changed.

Edit 6/2/14: The survey is closed now – see here for the results.

So I present four surveys to you. For your most critical SQL Server instance:

  • If it’s a 24×7 system, what’s the target uptime?
  • If it’s a 24×7 system, what’s your measured uptime over the last year?
  • If it’s not a 24×7 system, what’s the target uptime?
  • If it’s not a 24×7 system, what’s your measured uptime over the last year?

You’ll notice that the surveys are termed in percentages. Here’s what the percentages mean for a 24×7 system (there’s a quick calculation sketch after the list if you want to check the arithmetic):

  • 99.999% = 5.26 minutes of downtime per year
  • 99.99% = 52.56 minutes of downtime per year
  • 99.9% = 8.76 hours of downtime per year
  • 99.5% = 1.825 days of downtime per year
  • 99% = 3.65 days of downtime per year
  • 98.5% = 5.475 days of downtime per year
  • 98% = 7.3 days of downtime per year
  • 95% = 18.25 days of downtime per year
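
If you want to sanity-check those numbers, the arithmetic is just the downtime fraction multiplied by the minutes in a year (a quick sketch, assuming a 365-day year):

SELECT
    [uptime_pct],
    (1 - [uptime_pct] / 100.0) * 365 * 24 * 60 AS [downtime_minutes_per_year]
FROM (VALUES (99.999), (99.99), (99.9), (99.5), (99.0), (98.5), (98.0), (95.0)) AS [v] ([uptime_pct]);
GO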

If your target uptime allows for planned maintenance downtime, then that doesn’t count as unplanned downtime, as long as your system was only down for the length of time allowed. But don’t cheat yourself and retroactively classify unplanned downtime as planned, so it doesn’t affect your actual, measured uptime.

For instance, if you have a 99.9% uptime goal for a 24×7 system, with a quarterly 4-hour maintenance window, then I would select 99.9% in the 24×7 target survey. For that same system, if the downtime was limited to the prescribed 4-hour window each quarter, and there was no other downtime *at all*, I would select 99.999% on the 24×7 measured uptime survey.

Basic advice is to use common sense in how you answer. If you say you have a 24×7 system but you have a 12-hour maintenance window each week, I wouldn’t classify that as a 24×7 system.

24×7 Systems

Survey 1: 24×7 system target uptime


Survey 2: 24×7 system measured uptime
Please be honest. Remember if you choose 99.999% that means you’re saying your system was up for all but 5 minutes in the last year.


Non-24×7 Systems

Survey 3: Non-24×7 system target uptime
Use ‘Other’ to answer if your answer is ‘No target or target unknown’.


Survey 4: Non-24×7 system measured uptime
Please be honest.



I’ll editorialize the results in a week or two.

Thanks!