New class: Immersion Event on Azure SQL Database and Azure VMs

We have a really cool new class debuting this October in Chicago: Immersion Event on Azure SQL Database and Azure VMs.

It’s a 2-day class, taught by Tim Radney. Azure is getting more and more popular, and we’re seeing many clients using it.

The modules are as follows:

  • Azure Virtual Machines
  • Migrating to Azure Virtual Machines
  • Azure SQL Database
  • Migrating to Azure SQL Database
  • Additional Azure Features

You can read a more detailed curriculum here, and all the class registration and logistical details are here.

We hope to see you there!

SQLskills SQL101: Switching recovery models

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One of the things that can catch people out is the effect of switching out of the full recovery model temporarily. In this post I’ll briefly describe the three recovery models and then the problems you can have switching from full to simple, and from full to bulk-logged.

Recovery models

There are three recovery models:

  • Full recovery model (the default and the most commonly used)
    • All modifications in the database are fully logged. This doesn’t mean that every modification has a separate log record, as some operations are logged with fewer log records while still logging the entire effect of the operation (for instance, TRUNCATE TABLE operations – see here for a deep explanation).
    • The transaction log will not clear (i.e. portions are available for reuse) until a transaction log backup is performed (see here for a deep explanation).
    • All recovery options are available when a database is in the full recovery model (and has been since the last backup).
  • Bulk-logged recovery model
    • Some modifications (like an index rebuild or a bulk load, but NOT regular insert/update/deletes) can be minimally logged, which reduces the amount of log records generated so the transaction log does not have to grow really large during the operation. Note that this doesn’t change the size of subsequent log backups. For full instructions on how to get minimal logging for your operation, see the Data Loading Performance Guide whitepaper, which lists all the various conditions that have to be met.
    • The transaction log will not clear until a transaction log backup is performed (exactly the same as the full recovery model).
    • Using bulk-logged, you trade off some recovery options (point-in-time restore and tail-of-the-log backups) for the performance gains associated with minimally logged operations.
  • Simple recovery model
    • Some modifications can be minimally logged (exactly the same as the bulk-logged recovery model).
    • The log will not clear until a checkpoint occurs (usually automatically).
    • Transaction log backups are not possible, so this is the most limited in terms of recovery options.

Most people use the full recovery model, to allow log backups and permit all possible restore operations. The main thing to remember is that if your database uses the full or bulk-logged recovery model, you must perform periodic transaction log backups or the transaction log will grow forever.
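
For example (the database name and backup path here are just placeholders), you can check which recovery model a database is using, and, if it’s full or bulk-logged, make sure something is regularly taking log backups like this:

SELECT [name], [recovery_model_desc]
FROM sys.databases
WHERE [name] = N'MyDB';
GO

BACKUP LOG [MyDB] TO DISK = N'D:\SQLBackups\MyDB_Log.trn';
GO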

Some circumstances call for the simple recovery model: when you don’t need the ability to do point-in-time restores or zero-to-minimal data loss restores using log backups. An example would be a scratch database that’s repopulated once per day, where any changes can be lost or easily regenerated.

Switching to Simple

Often I hear of people who switch to the simple recovery model to try to avoid transaction log growth during a bulk load or index rebuild, when what they really mean to do is to use the bulk-logged recovery model. There are also persistent myths out there that some regular operations *require* being in the simple recovery model – this is simply (ha ha) not true.

Switching to the simple recovery model breaks the log backup chain, requiring a full or differential backup before any further log backups can be performed.

Furthermore, it limits your ability to recover during a disaster because you’ve now only got one full backup from which you can restore: the one you performed most recently. Think about it: your restore options become:

  • The full backup taken after the switch to simple, plus the latest differential backup after that full (if you’re using differential backups), plus any log backups taken since the switch back to full; or
  • The most recent full backup taken before the switch to simple, plus the latest differential backup taken after the switch back from simple, plus any log backups taken after that differential.

If that most-recent full backup (before or after the switch to simple) is damaged, you cannot restore – period. You can’t fall back on using the next older full backup, as that only allows the restore sequence up to, but not past, the switch to simple. Well, I guess you could do that, but then you lose all work since the switch to simple.

Switching to the simple recovery model is not something you automate or do repeatedly. About the only time when you would temporarily switch to simple is if your transaction log had run out of space and there is no way to allow it to clear (i.e. you cannot perform a log backup or add another log file) except by switching to simple and forcing a checkpoint operation. In that case you’re taking a drastic step to allow operations to continue, and being fully cognizant of the limited restore options available to you right then.

Unless you have this emergency situation, or you decide to use the simple recovery model permanently, you should not switch to simple ever.
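
If you do hit that emergency, the sequence looks something like this sketch (the database and backup path names are just placeholders) – note the backup at the end to re-establish the log backup chain:

USE [MyDB];
GO
ALTER DATABASE [MyDB] SET RECOVERY SIMPLE;
GO
CHECKPOINT; -- force the log to clear so it can be reused/shrunk
GO
ALTER DATABASE [MyDB] SET RECOVERY FULL;
GO
-- The log backup chain is now broken: take a full or differential backup
-- immediately so log backups (and better restore options) are possible again.
BACKUP DATABASE [MyDB] TO DISK = N'D:\SQLBackups\MyDB_Diff.bak' WITH DIFFERENTIAL;
GO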

Switching to Bulk-logged

Switching to bulk-logged during a load or index maintenance process is acceptable to avoid transaction log growth. In fact, switching back-and-forth between full and bulk-logged repeatedly doesn’t affect the log backup chain in any way. And doing so also doesn’t have any effect on log shipping or replication, but you can’t switch out of full when using database mirroring or an availability group as they mandate the full recovery model.

However, using bulk-logged can cause problems for disaster recovery, so even though its behavior may be desirable, you may need to avoid using it so you don’t risk compromising your disaster recovery options.

Problem 1: a log backup that contains a minimally-logged operation cannot be used during a point-in-time restore. This means the time you specify in the WITH STOPAT clause of the restore statement cannot be a time covered by such a log backup. You can use that log backup as part of a restore sequence, and stop at any point in time after it (as long as that point in time is not covered by another log backup containing a minimally-logged operation, of course), but just not during it.

Problem 2: if you need to perform a tail-of-the-log backup to capture all the log generated since the most recent scheduled log backup, but the data files are inaccessible or damaged and the log to be backed up contains a minimally-logged operation, that backup will fail prior to SQL Server 2008 R2; from SQL Server 2008 R2 onward it will succeed, but the database will be corrupt when that backup is restored.

So if you’re going to use bulk-logged to save on log space during large operations, you need to make sure that a) there’s no possibility you’re going to want to restore between the last log backup and the next one, and b) there are no changes made to the database that you cannot recreate in case a disaster occurs and you can’t take a valid tail-of-the-log backup.

Switching recovery models between full and bulk-logged may not be as safe as you might think.

Summary

For every database that you’re responsible for, make sure that you understand the ramifications of changing the recovery model, as doing so could cause you problems with disaster recovery.

Do you get our bi-weekly newsletter?

You may not know this, but for the last six years I’ve been producing a bi-weekly newsletter for everyone signed up for our mailing list (which we only use for the newsletter). It’s a bit of a labor of love, and a way for me to post opinions and other information that doesn’t really fit on my blog.

I’ve just expanded the sections in the newsletter so I want to make sure you all know about it. It’s 100% free of any 3rd-party adverts and marketing and your email address will only ever be used for sending the newsletters – nothing else. And it’s 100% free to join, no special requirements.

Every newsletter has the following sections:

  • SQLskills News: A quick round-up of new Pluralsight courses, classes, conferences, user groups, and SQLSaturdays where we’ll be presenting.
  • Book Review: I read a huge amount and like to post a 1-2 paragraph review of something interesting I’ve read since the last newsletter. I’ve been told by many people that this is one of the sections they look forward to most!
  • The Curious Case of…: Describing and explaining a problem we’ve recently found on a client system (or come across in an email question or while teaching a class) that others are likely to come across in their environments.
  • Paul’s Ponderings: Where I write about a topic of interest to the SQL Server community.
  • Video Demo: A never-before-seen demo video of something to do with SQL Server from one of the team. Occasionally I’ll pick a demo video from a Pluralsight course too.
  • #TBT (Turn Back Time): Highlighting some older resources around a particular topic, plus a selection of recent SQLskills blog posts.

Here’s an example newsletter from March 2017 so you can see what it’s like.

If you’re interested, just enter your email address below and hit Subscribe and you’ll be added to our newsletter list. No in-depth registration, no fuss, no time! Once you sign up you’ll be able to read all the previous newsletters too.

Enjoy!

SQLskills SQL101: Temporary table misuse

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

After I left Microsoft in 2007, one of the first clients I worked with (who we’re still working with today) threw an interesting problem at me: “We can’t run stored proc X any more because it causes tempdb to fill the drive and then it fails.” Game on. I built some monitoring infrastructure into the proc using the DMV sys.dm_db_task_space_usage to figure out how much tempdb space was being used at various points and find the problem area.

It turned out to be problem *areas*, and in fact the proc was loaded with temporary table (I’ll just use the common contraction ‘temp table’ from now on) misuse, illustrating all three of the common temp table problems I’m going to describe below. Once I fixed that proc (reducing the tempdb usage from more than 60GB down to under 1GB, and the run time from many minutes to a few tens of seconds), I implemented some automated monitoring built around the sys.dm_db_task_space_usage DMV to identify procedures and ad hoc queries that were misusing temp tables. We’ve since used this monitoring at many other clients to identify temp table misuse.
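
The monitoring itself is more involved than I can show here, but at its core it’s looking at tempdb page allocations per task with something as simple as this:

SELECT
    [session_id],
    [user_objects_alloc_page_count],
    [internal_objects_alloc_page_count]
FROM sys.dm_db_task_space_usage
WHERE [session_id] = @@SPID;
GO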

In this post I’d like to describe the three main ways that temp table are misused:

  • Over-population of temp tables
  • Incorrect indexing on temp tables
  • Using a temp table where none is required

Don’t get me wrong though – temp tables are great – when they’re used efficiently.

Over-Population of a Temp Table

This problem involves creating a temp table using something like a SELECT … INTO #temptable construct and pulling far more data into the temp table than is necessary.

The most common thing we see is pulling lots of user table columns into the temp table, where some of the columns are not used ever again in subsequent code. This is a HUGE waste of I/O and CPU resources (extracting the columns from the user table in the first place – and imagine the extra CPU involved if the source data is compressed!) and a big waste of tempdb space (storing the columns in the temp table). I’ve seen code pulling large varchar columns into a temp table that aren’t used, and with multi-million row datasets…

The other facet of over-population of temp tables is pulling in too many rows. For instance, if your code is interested in what happened over the last 12 months, you don’t need to pull in all the data from the last ten years. Not only will it be bloating the temp table, it will also drastically slow down the query operations. This was one of the biggest problems in the client scenario I described above.

The key to better performance is making sure your selection/projection is as focused as possible. To limit your selection, use an effective WHERE clause. To limit your projection, list only the necessary columns in your select list.
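
For example, something like this (the table and column names are obviously just placeholders) pulls in only the last 12 months and only the columns the later code actually uses:

SELECT [OrderID], [CustomerID], [OrderDate], [TotalDue]
INTO #RecentOrders
FROM [dbo].[Orders]
WHERE [OrderDate] >= DATEADD (MONTH, -12, GETDATE ());
GO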

Incorrect Indexing on a Temp Table

This problem involves either creating indexes before populating the table (so that no statistics are generated) or creating a bunch of inappropriate indexes that are not used.

The most common example we see is creating a single-column nonclustered index for each of the temp table columns. Those are usually just taking up space for no use whatsoever. Temp tables *DO* need indexes (preferably after load) but as with any form of query tuning – only the RIGHT indexes. Consider creating permanent tables that mimic what’s going on in your temporary objects and then using the Database Tuning Advisor (DTA) to see if it has recommendations. While DTA’s not perfect, it’s often WAY better than guessing. Kimberly has a great post in our Accidental DBA series that discusses indexing strategies – start there.

Also, don’t create any nonclustered indexes until the temp table has been populated, otherwise they won’t have any statistics, which will slow down query performance, possibly drastically.

Oh yes, and, don’t create a clustered index for the temp table before populating it unless you know that the data being entered is already sorted to exactly match the cluster key you’ve chosen. If not, inserts into the temp table are going to cause index fragmentation which will really slow down the time it takes to populate the temp table. If you know the data is sorted and you create the clustered index first, there’s still no guarantee that the Storage Engine will feed the data into the temp table in the right order, so be careful. And if you go that route, remember that you’ll need to update the statistics of the clustered index after the temp table creation.

You need to be careful here because in some versions of SQL Server, changing the schema of a temp table in a stored proc can cause recompilation issues. Do some testing and pick the sequence of events that makes the most sense for performance in your situation.
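
Continuing the hypothetical #RecentOrders example from above, that usually means populating first and then creating just the index (or indexes) the subsequent code actually needs:

CREATE NONCLUSTERED INDEX [IX_RecentOrders_CustomerID]
ON #RecentOrders ([CustomerID])
INCLUDE ([TotalDue]);
GO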

Using a Temp Table Where None is Required

The SQL Server Query Optimizer is a fabulous beast and is very good at figuring out the most efficient way to execute most queries. If you choose to take some of the query operation and pre-calculate it into a temp table, sometimes you’re causing more harm than good. Any time you populate a temp table you’re forcing SQL Server to materialize the complete set of results of whatever query you ran to populate the temp table. This can really limit SQL Server’s ability to produce a pipeline of data flowing efficiently through a query plan, make use of parallelism, and collapse data flows when possible.

While it’s true that you might be able to do better than the optimizer sometimes, don’t expect that it’s the case all the time. Don’t just go straight to using temp tables, give the optimizer a chance – and, make sure to retest your code/expectations around Service Packs and hot fixes as these may have eliminated the need for temp tables as well.

A good way to test whether a temp table is actually a hindrance to performance is to take the temp-table-population code, embed it as a derived table in the main query, and see if query performance improves.
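
For example, if the hypothetical #RecentOrders table from earlier was only being used to feed a join, you could test something like this instead:

SELECT [c].[CustomerName], SUM ([r].[TotalDue]) AS [TotalDue]
FROM [dbo].[Customers] AS [c]
INNER JOIN
(
    -- The temp-table-population query, embedded as a derived table
    SELECT [CustomerID], [TotalDue]
    FROM [dbo].[Orders]
    WHERE [OrderDate] >= DATEADD (MONTH, -12, GETDATE ())
) AS [r]
    ON [r].[CustomerID] = [c].[CustomerID]
GROUP BY [c].[CustomerName];
GO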

It’s quite often the case that temp tables became an architectural standard in an environment because they proved useful long ago, and now everyone uses them without ever checking whether they’re *really* good for all cases.

One other thing you can consider is replacing temp tables with In-Memory OLTP memory-optimized tables, in all Editions of SQL Server 2016 SP1 and later, and in Enterprise Edition of SQL Server 2014. That’s beyond the scope of this post, but you can read about it in this Books Online page on MSDN.

Summary

Always try to follow these guidelines when using a temp table:

  • Determine if a temp table is the most efficient way to achieve the goal of the code you’re writing
  • Limit the number of columns being pulled into the temp table
  • Limit the number of rows being pulled into the temp table
  • Create appropriate indexes for the temp table

Take a look at your current temp table usage. You may be surprised to find a lot of tempdb space and CPU resources being consumed by inappropriate temp table usage, population, and indexing.

Hope you found this helpful!

SQLintersection Spring 2017

As we head towards our 9th SQLintersection in May, I’m excited to say that it’s once again our most diverse, complete, and information-packed show yet!

One of the pieces of feedback we hear over and over is that attendees love SQLintersection because it’s a smaller, laid-back show, where you get to actually spend time talking with the presenters 1-1. I have to say that’s one of the reasons why we love the show so much; *we* get to spend time talking to attendees, rather than being mobbed by hundreds of people after a session ends. And we only pick presenters who we know personally, and who we know to be humble, approachable, and eager to help someone out.

We have 2 pre-con days at the show and with our post-con day, there are 7 full-day workshops from which to choose. We have 36 technology-focused (NOT marketing) sessions from which to choose, plus four relevant and timely keynotes:

  • Bob Ward on SQL Server 2016: It Just Runs Faster
  • Buck Woody on The Future of the RDBMS Professional
  • Shep Sheppard on Data Science: What Is It and Do I Need It?
  • Aaron Bertrand on SentryOne: Tools for Productivity and Performance

And the ever-lively closing Q&A that we record as a RunAs Radio podcast.

You’ll learn proven problem-solving techniques and technologies you can implement immediately. Our focus is around performance monitoring, troubleshooting, designing for scale and performance, cloud, as well as new features in SQL Server 2012, 2014, and 2016. It’s time to determine your 2008 migration strategy – should you upgrade to 2016 directly? This is the place to figure that out!

If you’re interested in how we got here – check out some of Kimberly’s past posts:

  1. SQLintersection: a new year, a new conference
  2. SQLintersection’s Fall Conference – It’s all about ROI!
  3. Fall SQLintersection is coming up soon and we can’t wait!
  4. SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in Vegas

And Kimberly recently recorded a Microsoft Channel 9 video where she discusses the upcoming show – see here.

SQLafterDark

With minimal to no marketing filler, we’ve largely kept our conference focus on ROI and technical content (performance / troubleshooting / tales-from-the-trenches with best practices on how to fix them), but we’ve also added even more social events so that you really get time to intersect with the conference attendees and speakers. The SQL-specific, pub-quiz-style evening event SQLafterDark was wildly popular at our past shows, and it’s returning for Spring!


SQLintersection: Great Speakers!

Once again, I think a great show starts with great speakers and current, useful content. All of these speakers are industry experts who have worked in data / SQL for years (some can even boast decades), but all are still focused on consulting and working in the trenches. And they’re good presenters! Not only will you hear useful content, but you’ll do so in a way that’s digestible and applicable. Every speaker is either an MCM (Master), a SQL Server MVP, or a past/present Microsoft employee (or a combination of all three!). But regardless of their official credentials, ALL are focused on providing the most ROI possible in their session(s) and/or workshops, and ALL have spoken at SQLintersection multiple times.

Check out this phenomenal list of speakers:

  • Aaron Bertrand – MVP, SentryOne
  • Ben Miller – MCM, MVP, Consultant
  • Bob Ward – Fount-of-Knowledge, Microsoft
  • Brent Ozar – MCM, Brent Ozar Unlimited
  • Buck Woody – MCC – Microsoft Circus Clown (long-running joke… :-), Microsoft
  • David Pless – MCM, Microsoft
  • Erin Stellato – MVP, SQLskills
  • Joe Sack – MCM, Microsoft
  • Jonathan Kehayias – MCM, MCM Instructor, MVP
  • Kimberly L. Tripp – MCM Instructor, MVP, past Microsoft, SQLskills
  • Paul S. Randal – MCM Instructor, MVP, past Microsoft, SQLskills
  • Shep Sheppard – past Microsoft, Consultant
  • Tim Chapman – MCM, Microsoft
  • Tim Radney – MVP, SQLskills

You can read everyone’s full bio on our speaker page here.

SQLintersection: When is it all happening?

The conference officially runs from Monday, May 22 through Wednesday, May 24 with pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Saturday, May 20 through Thursday, May 25.

  • Saturday, May 20 – pre-con day. There are two workshops running:
    • PowerShell for the DBA from 0-60 in a Day with Ben Miller
    • SQL Server 2014 and 2016 New Features and Capabilities with David Pless and Tim Chapman
  • Sunday, May 21 – pre-con day. There are two workshops running:
    • Performance Troubleshooting using Waits and Latches with Paul S. Randal
    • Azure for the SQL Server DBA with Tim Radney
  • Monday, May 22 through Wednesday, May 24 is the main conference. Conference sessions will run all day in multiple tracks:
    • Check out our sessions online here
    • Be sure to check out our cross-conference events and sessions
    • Get your pop-culture trivia and techie-SQL-trivia hat on and join us for SQLafterDark on Tuesday evening, May 23
  • Thursday, May 25 is our final day with three post-conference workshops running:
    • Finding and Fixing Performance Problems in SQL Server with Erin Stellato and Jonathan Kehayias
    • Advanced SQL Server High Availability with Brent Ozar
    • Cortana Intelligence Suite – Microsoft R for Architects with Buck Woody

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also competent and consistently highly-rated presenters – this is the show for you. You will understand the RIGHT features to use to troubleshoot and solve your performance and availability problems now!

Check us out: www.SQLintersection.com.

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

50 online SQL Server training courses and a free trial

With the publication of our most recent Pluralsight course last month, we now have a whopping 50 online training courses available through Pluralsight, totaling more than 150 hours of content. If you’re unable to come to one of our in-person Immersion Events in the US this year, these courses are a great way to learn from us. And for only $29.99/month, with access to over 5,000 courses in total, there’s no more cost-effective way to gain new skills for yourself and your company.

You can even get a free trial of 200 minutes of viewing over 10 days by going here.

Our top-5 most popular courses so far this year are:

  1. Communications: How to Talk, Write, Present, and Get Ahead! (Paul)
  2. SQL Server: Installing and Configuring SQL Server 2016 (Glenn)
  3. SQL Server: Optimizing Ad Hoc Statement Performance (Kimberly)
  4. SQL Server: Transact-SQL Basic Data Retrieval (Joe)
  5. SQL Server: Performance Troubleshooting Using Wait Statistics (Paul)

And we have courses in the works already for 2017 on:

  • Query Store (already published)
  • Azure SQL Database
  • Understanding and Using Backups
  • Indexing for Performance
  • Query Plan Analysis for Developers
  • Building Multi-Instance Asynchronous Applications
  • Building Scalable Asynchronous Applications
  • Upgrading to SQL Server 2016
  • And more!

Here’s the full list of our courses, grouped by area:

DBA/Systems Admin: Installation, Configuration, and Hardware

DBA: General

DBA: High Availability and Disaster Recovery

Developer/Architect: Writing T-SQL

Developer/Architect: General

All Roles: Performance Monitoring

All Roles: General Performance Tuning

All Roles: Query Plan Analysis and Tuning

All Roles: Career Growth

Why PFS pages cannot be repaired

Last week there was a short discussion on Twitter about why PFS pages (damaged header, not individual PFS bytes) can’t be repaired (prompted by a #sqlhelp question about why they can’t be single-page restored, like other per-database allocation bitmaps). Just for the record, they can’t be fixed by automatic page repair in a mirror or AG either.

PFS pages occur every 8088 pages in every data file, and each one stores a byte of information about each page it covers – itself and the following 8087 pages. The most important piece of information it stores is whether a page is allocated (in use) or not. You can read more about PFS pages and the other per-database allocation bitmaps in this blog post.
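
If you’re curious, you can look at a PFS page yourself with something like this (using whatever database name you like) – page (1:1) is the first PFS page in the first data file:

DBCC TRACEON (3604); -- send DBCC PAGE output to the console
GO
DBCC PAGE (N'MyDB', 1, 1, 3); -- database, file ID, page ID, print option
GO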

So why can’t they be rebuilt by DBCC CHECKDB, when all the other per-database allocation bitmaps can?

The answer is that the is-this-page-allocated-or-not information is not duplicated anywhere else in the database, and it’s impossible to reconstruct it in all cases.

You might think that DBCC CHECKDB could work out which pages are allocated by inferring that state if a page is linked to in some way from another page that’s known to be allocated, and it could do that, except for the case of a heap with no nonclustered indexes.

If a heap has no nonclustered indexes, there is no other structure in the database that links to any of the data pages in the heap. Therefore, without the information in a PFS page, there’s no way to tell which of the pages contained in extents allocated to the heap are actually allocated or not. This is because SQL Server does not touch a page when it is deallocated, so there’s nothing on a page that indicates whether a page is currently allocated or has been deallocated.

So what if DBCC CHECKDB can tell that there are no such cases in the database?

The answer is that the algorithm to rebuild a PFS page given the links from other pages to pages covered by that PFS range is extremely complicated and would involve searching through the entire database, reading and processing all pages a second time *after* they’ve been repaired, looking for linkages to pages in the broken range. While it sounds technically possible, when scoping out writing such an algorithm back in 2001-2002, I quickly ran into run-time and complexity challenges that made the work entirely infeasible.

It’s also not possible to just mark all the pages allocated – because then allocation-order scans would come across potentially unformatted pages and fail. It would also break backups that use WITH CHECKSUM. Such an algorithm could be made to work (in the absence of heaps with no nonclustered indexes), but has the same problems as the algorithm above.

So – PFS pages can’t be repaired, and unless the database structure changes to mirror that allocation information in some way, I don’t see that changing at any point in the future.

Hope you found this interesting!

SQLskills SQL101: Dealing with SQL Server corruption

As Kimberly mentioned last week, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

For my first SQL101 post, I’d like to touch on a subject that has been core to my work since I graduated in 1994: dealing with corruption. You may not know that before joining the SQL Server engineering team at Microsoft in early 1999, I worked for the file system group at DEC (Digital Equipment), where among other things I was responsible for the VMS equivalent of the Windows chkdsk (called ANAL/DISK). It was this expertise with corruption and repairing it that led me to work on DBCC, rewriting much of the DBCC CHECKDB check and repair code for SQL Server 2005.

All through my professional career I’ve seen people make mistakes when they encounter corruption, so here I’d like to offer some quick guidelines for how to approach SQL Server corruption.

Don’t panic

When corruption appears, it can be scary. Suddenly your main database has all these errors and you don’t know what to do. The absolute best thing you can do is to keep calm and make rational decisions about how to proceed. If you knee-jerk, jump to conclusions, or let someone pressure you into making a snap decision, the odds are you will make a mistake and make the situation worse.

Make use of the run book

Check to see if your team or department has a disaster recovery handbook (often called a run book). This should give you useful information like:

  • How to access the backups
  • How to access Windows and SQL Server installation media and product keys
  • Who to call in various other departments for assistance with infrastructure
  • Who to call for help in your department
  • Who to notify of the problem (think CIO, CTO, I.T. Director)
  • How to proceed with various scenarios (e.g. restoring the main production database, or performing a bare-metal install of a new server)

Your run book might say to immediately fail over to a synchronous Availability Group replica, or some other redundant copy of the database, no matter what the problem is and then figure out the problem on the main production database afterwards. If that’s the case, that’s what you do.

And if you’re reading this and thinking ‘Hmm – we don’t have one of those…’, then that’s a big problem that should be addressed, as well as making sure that even the most junior DBA can follow the various procedures in it.

Consult my comprehensive flow chart

A few years ago I wrote a large flow chart for SQL Server Magazine, and it’s available in PDF poster form here (archived on a friend’s blog).

This can also form the basis of a run book if you don’t have one.

Understand the extent of the corruption

It is a very good idea to run DBCC CHECKDB on the database (if you haven’t already) to determine the extent of the corruption.

Depending on where the corruption is, you may be able to restore in a lot less time than restoring the entire database. For instance, if only a single page is damaged, you might be able to do a single-page restore. If only a single filegroup is damaged, you might be able to do a single filegroup restore.
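
For example, a single-page restore sequence looks something like this sketch (the names here are just placeholders):

RESTORE DATABASE [MyDB]
    PAGE = '1:12345'
    FROM DISK = N'D:\SQLBackups\MyDB_Full.bak'
WITH NORECOVERY;
GO
-- Then restore any subsequent log backups WITH NORECOVERY, take a final
-- (tail-of-the-log) log backup, and restore that last log backup WITH RECOVERY
-- to bring the restored page back online.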

Depending on what the corruption is, you may not even have to restore. For instance, if the corruption is confined to nonclustered indexes (all the corruption messages list index IDs higher than 1), you can rebuild the corrupt indexes manually with code like the following:

BEGIN TRANSACTION;
GO

ALTER INDEX CorruptIndexName ON TableName DISABLE;
GO
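-- Note: ONLINE = ON requires an edition that supports online index operations
-- (e.g. Enterprise Edition); on other editions, rebuild with ONLINE = OFF and
-- accept the blocking involved.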
ALTER INDEX CorruptIndexName ON TableName REBUILD WITH (ONLINE = ON);
GO

COMMIT TRANSACTION;
GO

That means you don’t have to restore or use repair, both of which incur downtime.

Consider the ramifications of the actions you’re planning

If you’ve never dealt with corruption before and you’re not an experienced DBA, there are actions that might be tempting that could cause you bigger headaches than just having corruption.

Some examples:

  • If you have a corrupt database, don’t try to detach it from the instance as you likely won’t be able to attach it again because of the corruption. This especially applies if the database is marked as SUSPECT. If you ever have this scenario, you can reattach the database using the steps in my post Disaster recovery 101: hack-attach a damaged database.
  • If your SQL Server instance is damaged, and the database is corrupt, don’t try to attach it to a newer version of SQL Server, as the upgrade might fail and leave the database in a state where it can’t be attached to either the old or new versions of SQL Server.
  • If crash recovery is running, don’t ever be tempted to shut down SQL Server and delete the log file. That is guaranteed to cause at least data inconsistencies and at worst corruption. Crash recovery can sometimes take a long time, depending on the length of open transactions at the time of the crash that must be rolled back.

If you’re planning or have been told to do something, make sure you understand what the ramifications of that thing are.

Don’t just jump to repair

The repair option is called REPAIR_ALLOW_DATA_LOSS because you’re telling DBCC CHECKDB that it can lose data to perform repairs. The repairs (with a few exceptions) are written as ‘delete what’s broken and fix up all the links’. That’s because that’s usually the only way to write a repair algorithm for a certain corruption that fixes it in 100% of cases without making things worse. After running repair, you will likely have lost some data, and DBCC CHECKDB can’t tell you what it was. You really don’t want to run repair if you can avoid it.

Also, there are some cases of corruption that absolutely cannot be repaired (like corrupt table metadata) so then you *have* to have backups or a redundant copy to use.

There is a last resort that we made a documented feature back in SQL Server 2005 – EMERGENCY-mode repair – for when the transaction log is damaged. That will try to get as much data out of the transaction log as possible and then run a regular repair. Although that may get the database back online, you’ll likely have data loss and data inconsistencies. It really is a last resort, and it’s not infallible either.

You really want to have backups to use or a redundant copy to fail over to instead.

But if you *have* to use repair, try to do it on a copy of the corrupt database. And then go fix your backup strategy so you aren’t forced to use repair again in future.

Be very careful with 3rd-party tools

There are some 3rd-party tools that will try to do repairs or extract data out. I’ve seen them work sometimes and I’ve seen them spectacularly fail and totally trash a database at other times. If you’re going to try one of these out, do it on a copy of the corrupt database in case something goes wrong.

Ask for help (but be careful)

If you don’t know what to do and you’re concerned that you’ll make things worse or make a wrong decision, try asking for help. For free, you could try using the #sqlhelp hashtag on Twitter, or you could try posting to a forum like http://dba.stackexchange.com/ or one of the forums at https://www.sqlservercentral.com/Forums/. Sometimes I’ll have time to respond to a quick email giving some direction, and sometimes I’ll recommend that you get some consulting help to work on data recovery.

You can also call Microsoft Customer Support for assistance, but you’ll always need to pay for that unless the source of the corruption turns out to be a SQL Server bug.

Wherever you get the help from though, be careful that the advice seems sound and you can verify the suggestion with well-known and reputable sources.

Do root cause analysis

After you’ve recovered from the corruption, try to figure out why it happened in the first place as the odds are that it will happen again. The overwhelmingly vast majority of corruptions are caused by the I/O subsystem (including all the software under SQL Server), with a very small percentage being caused by memory chip problems, and a smaller percentage being caused by SQL Server bugs. Look in the SQL Server error log, Windows event logs, ask the Storage Admin if anything happened, and so on.

Practice and research

It’s a *really* good idea to practice recovering from corruption before you have to do it for real. You’ll be more comfortable with the procedures involved and you’ll be more confident. I have some corrupt databases that you can download and practice with here.

There’s also a lot of instructional information on my blog under the following categories:

And there are two Pluralsight online training courses I’ve recorded which will give you an enormous boost in practical knowledge:

Summary

Ok – so it turned out to not be quite as quick as I thought! However, this is all 101-level information that will help you work through a corruption problem or exercise. I’ll be blogging a lot more of these 101-level posts this year. If there’s anything in particular you’d like to see us cover at that level, please leave a comment.

Enjoy!

Who is overriding MAXDOP 1 on the instance?

In Monday’s newsletter I discussed an email question I’d been sent and promised to blog about it.

The question was “how can latch waits for ACCESS_METHODS_DATASET_PARENT occur when the instance MAXDOP is set to 1?” This shouldn’t happen, as that latch is only used for coordination between threads performing a parallel table/index scan. However, just because the instance MAXDOP is set to 1, that doesn’t mean no parallelism can occur.

Anyone can override the instance MAXDOP setting using a query hint.

There are two ways to prevent this:

  • Set the instance cost threshold for parallelism setting to be extremely high, so no serial plans can be deemed expensive enough to run in parallel
  • Make everyone run within a Resource Governor (Enterprise Edition only) workload group that has the MAX_DOP set to 1

Or you can figure out where the query hint is being used and remove it.
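
As a sketch of the Resource Governor option from the list above (the group and function names are just placeholders, and a real classifier would route only the relevant logins or applications, not everyone):

USE [master];
GO
CREATE WORKLOAD GROUP [MaxDopOne] WITH (MAX_DOP = 1);
GO
CREATE FUNCTION [dbo].[RGClassifier] ()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Put every session in the MaxDopOne group (for illustration only)
    RETURN N'MaxDopOne';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = [dbo].[RGClassifier]);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO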

In this post I’m going to show you a simple Extended Events session that will capture information about what’s causing CXPACKET waits (you can’t have ACCESS_METHODS_DATASET_PARENT latch waits without CXPACKET waits happening too) and then refine it to use a less expensive event.

First off I’ll set my MAXDOP to 1:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO

EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE WITH OVERRIDE;
GO

SELECT [name], [value_in_use] FROM sys.configurations WHERE [name] LIKE '%degree%';
GO
Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install.
Configuration option 'max degree of parallelism' changed from 0 to 1. Run the RECONFIGURE statement to install.
name                       value_in_use
-------------------------- -------------
max degree of parallelism  1

Then I’ll check for CXPACKET waits (using my waits query) after running the following query, that scans a 6.7 million row table (you can get the SalesDB database from here):

SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000;
GO

No CXPACKET waits.

But if I add a MAXDOP query hint and then check for waits:

SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000 OPTION (MAXDOP 8);
GO
WaitType            Wait_S  Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S Help/Info URL
------------------- ------- ---------- -------- --------- ---------- --------- -------- -------- ---------------------------------------------
LATCH_EX            0.20    0.16       0.05     93        79.61      0.0022    0.0017   0.0005   https://www.sqlskills.com/help/waits/LATCH_EX
CXPACKET            0.05    0.05       0.00     16        20.00      0.0032    0.0032   0.0000   https://www.sqlskills.com/help/waits/CXPACKET

And the instance MAXDOP was successfully overridden.

Now I’ll set up a simple Extended Events session to track down the offending code (based on the query from here). It’s very important that you query the sys.dm_xe_map_values DMV to find the correct number to use in the query for the CXPACKET wait, as these numbers often change from release to release, and even in Service Packs. For instance, CXPACKET was 191 in SQL Server 2014 RTM, but is 190 in the 2014 build I’m using.
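
Here’s a quick way to do that lookup on whichever build you’re running:

SELECT [map_key], [map_value]
FROM sys.dm_xe_map_values
WHERE [name] = N'wait_types'
    AND [map_value] = N'CXPACKET';
GO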

Be very careful about running this in production, as the event will fire for *every* wait that occurs and so will likely affect your workload throughput, even though it’ll short-circuit if the wait isn’t CXPACKET. I’ll show you a better event to use lower down.

IF EXISTS (
    SELECT * FROM sys.server_event_sessions
        WHERE [name] = N'InvestigateWaits')
    DROP EVENT SESSION [InvestigateWaits] ON SERVER
GO

CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
(
    ACTION (
        sqlserver.client_hostname,
        sqlserver.nt_username,
        sqlserver.sql_text)
    WHERE [wait_type] = 190 -- CXPACKET only
    AND [opcode] = 1 -- Just the end wait events
)
ADD TARGET [package0].[ring_buffer]
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS)
GO

ALTER EVENT SESSION [InvestigateWaits] ON SERVER STATE = START;
GO

And then when I run the select statement again I can look in the ring buffer and see the events. I put in a DISTINCT to minimize the number of lines of output. The code is:

SELECT 
    DISTINCT ([data1].[value] ('(./@timestamp)[1]', 'datetime')) AS [Time],
    [data1].[value] ('(./action[@name="client_hostname"]/value)[1]', 'VARCHAR(MAX)') AS [Host],
    [data1].[value] ('(./action[@name="nt_username"]/value)[1]', 'VARCHAR(MAX)') AS [User],
    [data1].[value] ('(./action[@name="sql_text"]/value)[1]','VARCHAR(MAX)') AS [Statement]
FROM (
    SELECT CONVERT (XML, [target_data]) AS data
    FROM sys.dm_xe_session_targets [xst]
    INNER JOIN sys.dm_xe_sessions [xs]
        ON [xst].[event_session_address] = [xs].[address]
    WHERE [xs].[name] = N'InvestigateWaits') AS t
CROSS APPLY data.nodes('//event') n (data1);
GO
Time                    Host       User            Statement
----------------------- ---------- --------------- ---------------------------------------------------------------------------------
2017-02-16 00:20:16.937 APPLECROSS APPLECROSS\Paul SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000 OPTION (MAXDOP 8);
2017-02-16 00:20:16.987 APPLECROSS APPLECROSS\Paul SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000 OPTION (MAXDOP 8);
2017-02-16 00:20:16.937 APPLECROSS                 SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000 OPTION (MAXDOP 8);

Cool – so that works, but the session will likely affect workload performance, as I mentioned above. A better event to use is degree_of_parallelism, which was introduced in SQL Server 2012 and only fires once per batch execution, rather than once for every wait that occurs.

The updated event session is:

CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlserver].[degree_of_parallelism]
(
    ACTION (
        sqlserver.client_hostname,
        sqlserver.nt_username,
        sqlserver.sql_text)
    WHERE [dop] > 0 -- parallel plans
)
ADD TARGET [package0].[ring_buffer]
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS)
GO

And the code to parse the XML, and sample output from my query is:

SELECT 
    [data1].[value] ('(./@timestamp)[1]', 'datetime') AS [Time],
    [data1].[value] ('(./data[@name="dop"]/value)[1]', 'INT') AS [DOP],
    [data1].[value] ('(./action[@name="client_hostname"]/value)[1]', 'VARCHAR(MAX)') AS [Host],
    [data1].[value] ('(./action[@name="nt_username"]/value)[1]', 'VARCHAR(MAX)') AS [User],
    [data1].[value] ('(./action[@name="sql_text"]/value)[1]','VARCHAR(MAX)') AS [Statement]
FROM (
    SELECT CONVERT (XML, [target_data]) AS data
    FROM sys.dm_xe_session_targets [xst]
    INNER JOIN sys.dm_xe_sessions [xs]
        ON [xst].[event_session_address] = [xs].[address]
    WHERE [xs].[name] = N'InvestigateWaits') AS t
CROSS APPLY data.nodes('//event') n (data1);
GO
Time                    DOP Host       User            Statement
----------------------- --- ---------- --------------- ---------------------------------------------------------------------------------
2017-02-16 00:36:37.347 8   APPLECROSS APPLECROSS\Paul SELECT * FROM [SalesDB].[dbo].[Sales] WHERE [Quantity] > 10000 OPTION (MAXDOP 8);

Pretty cool, eh?

You can mess around with more complex code that does things like grab the tsql_stack action and parse it out, and capture more information to identify the culprit, like the application name – however complex you want to get – but now you know the base event session to capture when the query hint is being used.

Enjoy!

Presenting at SQLSaturday Dublin in June

 SQLSaturday #620 - Dublin 2017

Although we won’t be presenting any Immersion Events in Europe in 2017, Kimberly and I will both be presenting workshops and sessions at SQLSaturday Dublin in June – our first SQLSaturday ever!


My workshop details:

  • Performance Troubleshooting Using Waits and Latches
  • One of the first things you should check when investigating performance issues are wait and latch statistics, as these can often point you in the right direction for further analysis. Unfortunately many people misinterpret what SQL Server is telling them and jump to conclusions about how to solve the problem – what is often called ‘knee-jerk performance tuning’.
    In this full-day workshop, you’ll learn how to investigate and interpret wait and latch statistics – practical techniques you can take home and start using immediately. You’ll also learn what a myriad of wait and latch types actually mean and how you can investigate lightly-documented types to gain insight into what causes them. Don’t waste time when it comes to performance troubleshooting; wait statistics are the fastest route to understanding your problems and this workshop will help you get there faster.
  • Thursday, June 15th
  • Eventbrite registration link

Kimberly’s workshop details:

  • Queries Gone Wrong: Statistics, Cardinality, Solutions
  • Have you ever wondered why SQL Server did what it did when processing your query? Have you wondered if it could have done better? Query estimates/statistics are often the key to understanding; this session will explain the what, why, and how about estimates!
    Most of the time SQL Server returns data quickly – except when it doesn’t. Ultimately, what you see in the plan just doesn’t seem to make sense. Why? Where did it come up with this plan? From one side, Transact-SQL is a declarative language that details what data you need, but without information about how SQL Server should get it. Join order, predicate analysis – this is what SQL Server has to decide based on your query’s input. But what should be processed first? Which table should we use to “drive” the join? Usually, it’s the table with the smallest number of rows that match your query’s predicates. But how does it know which has the smallest set before it processes the data? Statistics!
    There are numerous reasons why query performance can suffer and in this full-day workshop, Kimberly will cover a number of critical areas and for each – show you the behaviors, execution plans, troubleshooting techniques, and most importantly, possible solutions. This full-day workshop is about solving your query performance problems. Each problem has a different way of approaching it and you’ll walk away with a plethora of strategies to troubleshoot and tackle even gnarly query problems. Stop with the “sledgehammer” approaches (updating statistics, rebuilding indexes, recompiling plans, clearing cache, restarting SQL Server) and solve the problem. In this full-day workshop, you’ll learn much more finessed ways to solve query plan quality problems.
    Topics covered include understanding / maintaining statistics, handing VLTs / skewed data, distribution problems, troubleshooting common and advanced scenarios, and how to best utilize the cardinality estimation models (and trace flags) available in SQL Server versions 2008-2016.
  • Friday, June 16th
  • Eventbrite registration link

My Saturday session details:

  • Advanced Data Recovery Techniques
  • Disasters happen – plain and simple. When disaster strikes a database you’re responsible for, and backups and repair fail, how can you salvage data, and possibly your company and your job? This is where advanced data recovery techniques come in. Using undocumented tools and deep knowledge of database structures, you can manually patch up the database enough to extract critical data. This demo-heavy session will show you never-seen-before methods I’ve used extensively in the last year to salvage data for real-life clients after catastrophic corruption. You won’t believe what it’s possible to do!

Kimberly’s Saturday session details:

  • Plan Cache Pollution: Dealing with Ad Hoc Madness
  • How you execute your data requests can have a profound effect on performance and plan reuse. Did you ever wonder where that estimate (in the showplan) comes from? Is it correct or, is it way off? Why? You’ve probably heard that many of your problems are related to statistics. Potentially, you’ve even rebuilt statistics only to find that it fixes the problem. However, what you might be seeing is a false positive. And, it doesn’t always work. Come to this session to find out how you should be executing your statements for the best performance, caching, and reuse! We’ll look at ad hoc statements, dynamically constructed statements, and sp_executesql (e.g. forced statement caching) and I’ll debunk the mysteries around estimates so that you can solve performance problems the RIGHT way! If you want to demystify SQL Server’s decisions used for statement execution, query plans, and plan caching – this is the place to be!

The link to the main SQLSaturday Dublin web page is here.

There are a ton of excellent speakers attending from around the world, so it’s going to be a really great weekend – we hope to see you there!