SQLintersection Fall 2017

As we head towards our 10th SQLintersection in October, I’m excited to say that it’s once again our most diverse, complete, and information-packed show yet!

One of the pieces of feedback we hear over and over is that attendees love SQLintersection because it’s a smaller, laid-back show, where you actually get to spend time talking with the presenters one-on-one. I have to say that’s one of the reasons why we love the show so much; *we* get to spend time talking to attendees, rather than being mobbed by hundreds of people after a session ends. And we only pick presenters who we know personally, and who we know to be humble, approachable, and eager to help someone out.

We have two pre-con days at the show, and with our post-con day there are 9 full-day workshops from which to choose. There are also 40 technology-focused (NOT marketing) sessions, plus two SQL Server keynotes, multiple industry-wide keynotes by Microsoft executives, and the ever-lively closing Q&A that we record as a RunAs Radio podcast.

You’ll learn proven problem-solving techniques and technologies you can implement immediately. Our focus is on performance monitoring, troubleshooting, designing for scale and performance, and the cloud, as well as new features in SQL Server 2014, 2016, and 2017. It’s time to determine your 2008 / 2008 R2 migration strategy – should you upgrade to 2016/2017 directly? This is the place to figure that out!

If you’re interested in how we got here – check out some of Kimberly’s past posts:

  1. SQLintersection: a new year, a new conference
  2. SQLintersection’s Fall Conference – It’s all about ROI!
  3. Fall SQLintersection is coming up soon and we can’t wait!
  4. SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in Vegas

And Kimberly recorded a Microsoft Channel 9 video where she discusses the Spring show – see here.

SQLafterDark

With minimal to no marketing filler, we’ve largely kept our conference focus on ROI and technical content (performance / troubleshooting / tales from the trenches, with best practices on how to fix them), but we’ve also added even more social events so that you really get time to intersect with the conference attendees and speakers. The SQL-specific, pub-quiz-style evening event SQLafterDark was wildly popular at our past shows and it’s returning this Fall!

 

SQLintersection: Great Speakers!

Once again, I think a great show starts with great speakers and current, useful content. All of these speakers are industry experts who have worked in data / SQL for years (some can even boast decades), but all are still focused on consulting and working in the trenches. And they’re good presenters! Not only will you hear useful content, but you’ll do so in a way that’s digestible and applicable. Every speaker is either an MCM (Master), a SQL Server MVP, or a past/present Microsoft employee (or a combination of all three!). But, regardless of their official credentials, ALL are focused on providing the most ROI possible in their sessions and/or workshops, and ALL have spoken at SQLintersection multiple times.

Check out this phenomenal list of speakers:

  • Aaron Bertrand – MVP, SentryOne
  • David Pless – MCM, Microsoft
  • Jes Borland – past-MVP, Microsoft
  • Jonathan Kehayias – MCM, MCM Instructor, MVP
  • Justin Randall – MVP, SentryOne
  • Kimberly L. Tripp – MCM Instructor, MVP, past Microsoft, SQLskills
  • Paul S. Randal – MCM Instructor, MVP, past Microsoft, SQLskills
  • Shep Sheppard – past Microsoft, Consultant
  • Stacia Varga – MVP, Consultant
  • Tim Chapman – MCM, Microsoft
  • Tim Radney – MVP, SQLskills

You can read everyone’s full bio on our speaker page here.

SQLintersection: When is it all happening?

The conference officially runs from Tuesday, October 31 through Thursday, November 2 with pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Sunday, October 29 through Friday, November 3.

  • Sunday, October 29 – pre-con day. There are three workshops running:
    • Data Due Diligence – Developing a Strategy for BI, Analytics, and Beyond with Stacia Varga
    • Performance Troubleshooting Using Waits and Latches with Paul S. Randal
    • SQL Server 2014 and 2016 New Features and Capabilities with David Pless and Tim Chapman
  • Monday, October 30 – pre-con day. There are three workshops running:
    • Building a Modern Database Architecture with Azure with Jes Borland
    • Data Science: Introduction to Statistical Learning and Graphics with R and SQL Server with Shep Sheppard
    • Extended Events: WTF OR FTW! with Jonathan Kehayias
  • Tuesday, October 31 through Thursday, November 2 is the main conference. Conference sessions will run all day in multiple tracks:
    • Check out our sessions online here
    • Be sure to check out our cross-conference events and sessions
    • Get your pop-culture trivia and techie-SQL-trivia hat on and join us for SQLafterDark on Wednesday evening, November 1
  • Friday, November 3 is our final day with three post-conference workshops running:
    • Common SQL Server Mistakes and How to Correct Them with Tim Radney
    • SQL Server 2016 / 2017 and Power BI Reporting Solutions with David Pless
    • Very Large Tables: Optimizing Performance and Availability through Partitioning with Kimberly L. Tripp

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are competent, consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

Check us out: www.SQLintersection.com.

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

New Pluralsight course: Indexing for Performance

It’s been a while coming, but it’s worth the wait! Kimberly’s latest Pluralsight course has been published – SQL Server: Indexing for Performance – and it’s a monster at 7 hours long, full of extensive demos.

The modules are:

  • Introduction
  • Row-based vs. Column-based Indexes
  • Row-based Index Concepts
  • Choosing the Clustering Key
  • Clustered Index Internals
  • Nonclustered Index Internals
  • Data Access Patterns
  • Understanding the Tipping Point
  • Covering Queries
  • Using INCLUDE to Cover Queries
  • Understanding Filtered Indexes
  • Index Consolidation

Check it out here.

We now have more than 165 hours of SQLskills online training available (see all our 53 courses here), all for as little as $29/month through Pluralsight (including more than 5,000 other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.

Enjoy!

10 years ago today…

…the second stage of my life ended, when I left Microsoft on August 31st, 2007.

The first stage of my life was education, ending in June 1994, with a Bachelor of Engineering (Hons.) degree in Computer Science and Electrical Engineering from the University of Edinburgh.

The second stage of my life was being an employee.

The first company I worked for was Digital Equipment Corporation (a.k.a. DEC, a.k.a. Digital), working on the OpenVMS/VMS file system (including the VMS chkdsk equivalent, disk/tape initialization, and general mag tape functionality) from August 1994 to January 1999. Working for DEC taught me how to be a real software engineer, and I’m very thankful that my first boss and mentor was Duncan McLaren – a damn fine engineer, a sincere geek, and a good friend. I consider myself extremely lucky to have worked at DEC with Duncan to start my career, and I really loved working there.

And that led me to the second and last company I worked for: Microsoft, from February 1999 to August 2007, including the jump from Scotland to Redmond, WA. See here for an explanation of how that happened. It was at Microsoft that I grew into a principal software engineer and learned how to be an engineering manager, both of software engineers and of program managers, and to lead product-wide projects. But the most important things I learned at Microsoft were how to lead and mentor people, and the value of a technical community.

Of all the people I worked with and got to know at Microsoft, the person who stands out the most is my great friend Bob Ward, who I first met in early 1999. We worked very closely over the years, and labored tirelessly together on the supportability (and PSS training) of the entire SQL Server box for the SQL Server 2005 release. You won’t find a more passionate advocate for solving customer problems, a more capable SQL Server debugger, or a more honest and sincere human being.

I absolutely *adored* working at Microsoft, but in the end I couldn’t do what I wanted to do in 2007: work with Kimberly and spend my time teaching and consulting. So I made the decision to leave in June 2007 and gave the SQL Server group leaders 10 weeks’ notice so I could transition everything properly.

Now I’m in the third stage of my life – owning and running a company: SQLskills, with my wonderful wife Kimberly, who had been running the company by herself from when she started it in 1995 until I joined in 2007. Kimberly taught me how to be a great presenter, how to run a successful small business, and how to be more empathetic in the ‘real world’ outside Microsoft.

This has been, and continues to be, a blast and I wouldn’t trade my life now for anything else. I’ve lost count of the number of classes I’ve taught, conferences I’ve presented at, and blog posts I’ve done here (ok, that I can put a number to: 947, including this one), plus I naturalized as a U.S. citizen in 2012. I must say that part of what makes this job so enjoyable is the superb team we have: Jon (since 3/15/11…), Glenn (5/1/12…), Erin (8/1/12…), Tim (1/19/15…), and our long-suffering assistant Libby (since forever…).

However, the thing I enjoy the most about my job now is helping people – through my blog, email, Twitter, my waits library, our classes and conferences, and more – and I don’t see that changing any time soon!

What will the next ten years bring, I wonder? Stay tuned to find out :-)

 

SQLskills SQL101: Readable secondary performance problems

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Yesterday I blogged about log shipping performance issues and mentioned a performance problem that can be caused by using availability group readable secondaries, and then realized I hadn’t blogged about the problem, only described it in our Insider newsletter. So here’s a post about it!

Availability groups (AGs) are pretty cool, and one of the most useful features of them is the ability to read directly from one of the secondary replicas. Before, with database mirroring, the only way to access the mirror database was through the creation of a database snapshot, which only gave a single, static view of the data. Readable secondaries are constantly updated from the primary so are far more versatile as a reporting or non-production querying platform.

But did you know that using this feature can cause performance problems on your primary replica?

As with most things in life, you don’t get anything for free. Readable secondaries are really useful, but there is a performance trade off you need to be aware of. All queries that are executed against a readable secondary are automatically run using read-committed snapshot isolation. This means they do not require share locks and so will not block any database changes being replayed from the primary replica (i.e. the constant redo of log records on the secondary replica that have been sent from the primary replica).

To do this requires the use of the versioning system, where (simplistically) pre-change versions of records are copied into the version store in tempdb and queries work out which version of the record is the correct one for them to process, based on the query’s starting time. All records that change get a 14-byte tag added on the end of the record that allows a query to see if this is the correct record, and if not to follow a pointer to the previous version of the record in the version store. This has been the mechanism since snapshot isolation and read-committed snapshot isolation were introduced in SQL Server 2005.

Now consider this: all AG replicas are exact copies of the primary replica. So how can versioning work on the readable secondary, adding 14-byte tags to some records? That must break the ‘exact copy’ rule, right?

Well, yes, it would… if the primary replica didn’t also change.

When a readable secondary is configured in an AG environment, all changing records on the primary replica start getting empty 14-byte versioning tags added to them. This is so that the 14-bytes of extra space on the record is noted in the transaction log and replayed on the secondary replicas, allowing the readable secondary to make use of the empty 14-byte space to store the versioning tag it needs.

This doesn’t break the ‘exact copy’ rule because the 14-bytes isn’t used for anything to do with recovery, there just has to be 14-bytes there.

So versioning tags start getting added to changing records on the primary (to be clear, it doesn’t turn on versioning on the primary) so table and index records start to get 14-bytes longer. And what happens when records get longer on pages where there isn’t enough space? Page splits in your indexes (and forwarded records in heaps – but I’ll concentrate on indexes here) leading to low page densities (wasted disk space and buffer pool memory), logical fragmentation (poor scan performance), and a bunch of extra, expensive log record generation from the page splits themselves.

To counteract this, you’ll need to implement (and/or possibly lower existing) fill factors on your indexes and even potentially start doing index maintenance on indexes that may not have required it previously. Quite an insidious problem that can be hard to figure out unless you know what’s going on under the covers!
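
As a minimal sketch of putting that into practice (the database, table, and index names are hypothetical, and 90 is just an example fill factor), you could check page density and fragmentation, and then rebuild with a lower fill factor:

  -- Check page density and fragmentation for the indexes on a table
  SELECT index_id, avg_page_space_used_in_percent, avg_fragmentation_in_percent
  FROM sys.dm_db_index_physical_stats (DB_ID (N'MyDB'), OBJECT_ID (N'dbo.MyTable'), NULL, NULL, 'SAMPLED');

  -- Rebuild with a lower fill factor to leave room for the 14-byte versioning tags
  ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD WITH (FILLFACTOR = 90);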

See the following resources for more info:

This MSDN page has more general information and this whitepaper from Microsoft explains in more depth the various performance impacts from using readable secondaries: AlwaysOn Solution Guide: Offloading Read-Only Workloads to Secondary Replicas.

If you’re implementing readable secondaries in your AG configuration, make sure that you also investigate and implement index fill factors in the database so that the versioning tags that are added under the covers don’t start causing page splits and fragmentation.

SQLskills SQL101: Log shipping performance problems

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One question I’m asked regularly is this: “When our log shipping secondary is applying log backups, sometimes it takes a lot longer than usual. Any idea why this might be the case?”

Log shipping has been around forever, and it’s still a hugely applicable and useful feature for very simply maintaining one or more secondary copies of a database. You can also use a secondary copy for reporting, where the restore of the log backup uses the WITH STANDBY option, leaving the secondary database in an accessible, but read-only state (when the logs aren’t being applied).

This works as follows:

  1. Make sure all users are disconnected from the secondary database
  2. Write all the log records from the log backup into the secondary database’s log file
  3. Perform the REDO part of recovery (ensuring that all operations from committed transactions are present in the secondary database)
  4. Perform the UNDO part of recovery (ensuring that all operations from uncommitted transactions are not present in the secondary database)

Step 4 writes all the log records generated by the UNDO operations into a special file called the undo file. This means that the secondary database is in read-only mode and is transactionally-consistent so that users can access it. The reason the log records are written into the undo file is so that the transaction log of the secondary database is not altered in any way, allowing subsequent log backups to be restored. If this weren’t the case, the UNDO log records would advance the secondary database’s LSN (Log Sequence Number), meaning that subsequent log backup restore operations would fail.
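
For reference, a minimal sketch of a STANDBY restore on the secondary looks like this (the database name, backup path, and undo file path are all hypothetical):

  RESTORE LOG MyDB
  FROM DISK = N'D:\LogShipping\MyDB_20171031.trn'
  WITH STANDBY = N'D:\LogShipping\MyDB_undo.dat';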

When the restore process begins on the secondary database, if an undo file exists, there is another step that is performed before steps 2-4 above. This additional step needs to take all the log records in the undo file and undo the effects of them – essentially putting the secondary database back into the state as of the end of step 3 from the previous restore. This database state is the same as if the previous log backup had been restored using WITH NORECOVERY instead of WITH STANDBY.

The occasional long-running restore problem happens when a log backup is restored that contains a long-running transaction that does not commit before the end of the log backup. This means that it must be completely undone as part of restoring the log backup (step 4), resulting in a very large undo file. This in itself can make restoring a log backup take a lot longer than usual. When the next log backup is restored, the additional step that undoes all the log records in the undo file has a very large undo file to process and takes much, much longer than usual. And if the log backup being restored also contains an uncommitted, long-running transaction then it’s the perfect storm, as step 4 will also take a long time. These steps are all made even longer still if the log file has too many VLFs (called VLF fragmentation).
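
If you suspect too many VLFs are making things worse, you can check the VLF count on the secondary – a minimal sketch, assuming a database called MyDB (sys.dm_db_log_info is available in newer versions; DBCC LOGINFO works on older ones):

  -- Newer versions
  SELECT COUNT (*) AS vlf_count
  FROM sys.dm_db_log_info (DB_ID (N'MyDB'));

  -- Older versions
  DBCC LOGINFO (N'MyDB');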

The situation where I’ve seen this most often is when the primary database is undergoing index maintenance and a log backup finishes near the end of a very long-running index rebuild operation of a large clustered index. The initial restore of that log backup on the secondary database takes much longer than usual to complete because of step 4 in the restore process. The next log backup on the primary also completes just before an index rebuild completes. When it is restored on the secondary, the whole of the large undo file has to be undone again, then the log restore occurs, and then another large undo file is generated to undo the second uncommitted index rebuild.

This is a possibility you have to be aware of if the secondary database must be available 24×7 for reporting, with only minimal downtime when each log backup is restored. In that case I would carefully augment the index maintenance operations on the primary with log backups to ensure that only complete, committed index rebuilds are present in the log backups being restored on the secondary database. Similar precautions should be taken if you have other, occasional, long-running operations.

An alternative would be to move from log shipping to database mirroring or availability groups, where the log records are continually being sent from the principal to the mirror database (or primary to secondary replica databases, in availability group terms) and there are no extra steps involving undoing log operations multiple times. With database mirroring, the drawback of this is that reporting would have to use database snapshots, so there’s a complexity trade-off involved. With availability groups, the reporting would have to use a readable secondary, which can lead to index fragmentation on the primary replica, but that can be compensated for with index fill factors (see here for more details).

So there you have it. Another example where understanding how SQL Server performs common operations can make it much easier to diagnose performance problems.

So why do you want to come to our training? And the winners are…

Wow! We had 22 entries for our competition (writing a post is a high bar so I was expecting 20-30) to win a free seat in either our IEPTO1 or IEPTO2 training classes in Chicago in October. That was a lot of reading to get through to pick the winners, but we made it. A bunch of the entrants even started SQL Server blogs so they could participate!

First off – we’d like to thank you all for the kind words about us.

Here are some quotes:

  • “For anyone who’s seen their PluralSight or MCM videos, or been lucky enough to see them at conferences, you know the kind of quality you can expect.”
  • “When it comes to SQL Server, they don’t just work with it, they helped develop it!  That alone keeps me in a little state of awe.”
  • “I see these classes as the gold standard in SQL Server training, and plan to attend someday by hook or by crook.”
  • “If you can attend these classes, you will be a giant leap ahead in the path that is SQL Server enlightenment.”
  • “The wealth of information available on SQLskills website is astounding…”
  • “I want to attend training because I want to be better at my job today, tomorrow and next year.”
  • “I want my brain to be filled past the brim with SQL Server internals and performance tuning knowledge.”
  • “No other blogger goes into SQL’s internal mechanics like he does, and word on the street is that his class is phenomenal.”
  • “I’d be able to better support our development team, know WHY something broke, and the best way to fix it, and continue being the vital asset that the company hired me to be.”
  • “I attended the IE2 course a few years ago and it was easily the best training I’ve ever been to for anything. The attention to detail was a big part of that. No questions went unanswered and the PowerPoint slides were packed with the full content rather than high-level tag lines. So when you got home and opened the course materials again you knew what they referred to.”
  • “When you are attending these courses you are not being taught by some schmuck who is reading through a book. You are being taught and guided by people who live and breathe the subject.”
  • “…where the mysterious is made mundane and the pieces of the puzzles, even the pieces that were lost, are found and put together.”
  • “Also, out of all the places I have learned from, SQLSkills training is by far the best.  It is the best training that I have found anywhere.”
  • “I have used the skills Jonathan and Erin conveyed to troubleshoot a plethora of SQL Server administrative problems experienced both internally at my current company and externally on many of our customer’s production sites.”
  • “SQL has a lot of Red buttons,  I want to know which ones to push,  but more importantly I want to know what they do so I can know why to push them.”

There were so many good entries that we literally couldn’t decide, and we struggled *really* hard. We narrowed it down to the top 5 entries and then tried to figure out some way to pick the winners. Then we said “screw it, we’ll give all five a seat”. So there are five winners!

The grand prize winners for IEPTO1 are…
ALAN HORSMAN, BERT WAGNER, and LISA HESS – who each win US$3,795 of training!

The grand prize winners for IEPTO2 are…
BRIAN AVERITT and DAVID STOKES – who each win US$3,795 of training!

Woohoo! Well done to all of you! I’ve sent the winners an email with instructions on what to do next (if you’re a winner and haven’t received an email, please check your junk folder). All prizes are non-transferable and have no cash value.

Commiserations to those who didn’t win, but again, thank you for trying.

We’re looking forward to seeing you in Chicago in six weeks! (Btw, we’ve extended the early-bird pricing for all Chicago classes until the end of this month…)

Paul and Kimberly

Here are all the entries, divided by class requested, in the order in which they were posted – if you need some way of justifying the training to your company, just point them here! Winning posts have an asterisk.

IEPTO1

  1. * Alan Horsman: SQL Server Performance and Optimization Training
  2. Justin Onstot: The Path to Enlightenment – SQLSkills style!
  3. Josh Smith: Why I’m Hoping I Can Attend Training with the SQL Skills Team
  4. * Bert Wagner: SQLskills is Giving Away Free Training!
  5. David Clary: When the Student is Ready, The Teacher Will Appear
  6. Sina Hassanpour: SQLSkills-Challenge
  7. Ryan Boyer: Head over to SQLSkills.com! They’re giving away training!
  8. Chris Albert: The Gift of Training
  9. Matt Bowler: Immersion Submission!!
  10. * Lisa Hess: Missing Puzzle Pieces
  11. Vijai Kingston: Learn from the expert – “For FREE?”
  12. Mike Laugle: SQLSkills Competition
  13. Zach Wagner: SQL Skills Competition
  14. Kevin Crisci: PLEASE PICK ME

IEPTO2

  1. Mike McKay: Training course rant & SQLSkills
  2. * Brian Averitt: Not All Training Is Created Equal
  3. Francis Hanlon: Improving your SQLskills
  4. Aleksey Fomchenko: The best SQL Server training ever
  5. Vedran Ikonic: Will SQLskills IEPTO2 Class make me a better DBA? IT DEPENDS!
  6. Karthick Jayavelu: Tune Yourself to Optimize SQL Performance
  7. * David Stokes: What an opportunity!
  8. Mohammad Darab: IEPTO2: Performance Tuning and Optimization – Part 2 (Competition)

Competition: win a free seat in our classes in October!

We’re really excited about the classes we’ve got coming up in Chicago and to say ‘thank you’ to the SQL community for all your support of SQLskills.com over the years, we’re running a competition. It’s been a while since we ran a competition so we’re giving away TWO prizes this time!

THE COMPETITION IS NOW CLOSED. Winner details are here.

The Prize

There are two prizes:

  • free seat in our 5-day IEPTO1: Performance Tuning and Optimization – Part 1 class in Chicago, October 2-6, 2017 (link to class details)
  • free seat in our 5-day IEPTO2: Performance Tuning and Optimization – Part 2 class in Chicago, October 9-13, 2017 (link to class details)

If you win the competition, you pay US$0 instead of the US$3,295 early-bird price or US$3,795 full price. You’ll be responsible for all travel expenses and your hotel bill. The class includes breakfast, lunch, all-day tea/coffee/soda and tons of juicy content :-)

The Challenge

All you have to do is pick which class you’d like to attend and write a blog post before end-of-day Sunday, August 13th, 2017 explaining:

  • Why you want to come to that class taught by us
  • Why you’d make the best use of the knowledge you’ll get from being in the class
  • What your favorite performance tuning challenge is

SQLskills will select what we consider to be the best entry for each prize.

The Rules

  • You can only enter the competition once (i.e. one blog post choosing one class)
  • You must write a blog post. Answers in the comments section of this post will not be accepted. If you don’t have a blog, it’s easy to create one – e.g. wordpress.com
  • Your blog post must go live before 11.59pm Pacific Time on Sunday, August 13th, 2017
  • Your blog post must specify which class you’d like to attend and contain a link to the class details (see The Prize section above)
  • You must post a comment on this blog post that links to your blog post before 11.59pm Pacific Time on Sunday, August 13th, 2017 (comments are moderated, so don’t panic if you don’t see it appear immediately)
  • If you’re on Twitter, you must tweet your blog post using the #sqlskills #sqlserver hash tags
  • If you win, you will be able to register for the class you chose in your blog post at the special price of US$0
  • If you win, only you can attend the class at the special price of US$0 – your prize is not transferable and is only valid for the October 2017 class
  • If you win, you will be responsible for all travel and hotel costs for attending the class
  • If you enter the competition we will include your name and link to your post in a blog post announcing the lucky winners

If you don’t agree with the rules, don’t enter – simple.

The Summary

We’ll announce the lucky winners on Monday, August 14th, 2017 in a blog post on my blog and on Tuesday in the next Insider newsletter (delayed a day).

Good luck and we hope to see you there!

SQLskills SQL101: REBUILD vs. REORGANIZE

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Often when discussing index fragmentation I find that people aren’t aware of all the differences between rebuilding an index (using ALTER INDEX … REBUILD) and reorganizing an index (using ALTER INDEX … REORGANIZE), so that’s the topic for this post. Everything I say below applies to row-based indexes only (i.e. not columnstore indexes), and equally to a clustered index or a nonclustered index.
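
For reference, the two operations look like this – the table and index names are hypothetical:

  ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD;

  ALTER INDEX IX_MyIndex ON dbo.MyTable REORGANIZE;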

Space Required

Rebuilding an index requires building a new index before dropping the old index, regardless of the amount of fragmentation present in the old index. This means you need to have enough free space to accommodate the new index.

Reorganizing an index first squishes the index rows together to try to deallocate some index pages, and then shuffles the remaining pages in place to make their physical (allocation) order the same as the logical (key) order. This only requires a single spare 8-KB page, as temporary storage for pages being moved around. So an index reorganize is extremely space efficient, and is one of the reasons I wrote the original DBCC INDEXDEFRAG for SQL Server 2000 (the predecessor of ALTER INDEX … REORGANIZE).

If you have space constraints, and can’t make use of single-partition rebuild, reorganizing is the way to go.

Algorithm Speed

An index rebuild will always build a new index, even if there’s no fragmentation. The length of time the rebuild takes is related to the size of the index, not the amount of fragmentation in it.

Reorganizing an index only deals with the fragmentation that exists, and the more fragmentation there is, the longer a reorganize will take.

This means that for a lightly fragmented index (e.g. less than 30% fragmentation), it’s generally faster to reorganize the index, but for a more heavily fragmented index, it’s generally faster to just rebuild the index. This is why you may have seen thresholds like: 0% to 5-10% fragmentation, do nothing; 5-10% to 30%, reorganize; 30% or more, rebuild. I created this guidance while I was at Microsoft – see here.
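
As a sketch of how those thresholds might be applied with sys.dm_db_index_physical_stats (the database name is hypothetical, and the cut-offs are guidance, not hard rules):

  SELECT OBJECT_NAME (ips.object_id) AS table_name, i.name AS index_name,
      ips.avg_fragmentation_in_percent,
      CASE
          WHEN ips.avg_fragmentation_in_percent >= 30 THEN 'rebuild'
          WHEN ips.avg_fragmentation_in_percent >= 5 THEN 'reorganize'
          ELSE 'do nothing'
      END AS suggested_action
  FROM sys.dm_db_index_physical_stats (DB_ID (N'MyDB'), NULL, NULL, NULL, 'LIMITED') AS ips
  JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
  WHERE ips.index_id > 0;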

Transaction Log Generated

In the FULL recovery mode, an index rebuild is fully logged, so the transaction log will have to accommodate the full size of the index in a single transaction. This also means the entire generated transaction log may need to be mirrored, sent to your AG replicas, scanned by replication, backed up, and so on.

In the SIMPLE and BULK_LOGGED recovery modes, the amount of transaction log generated by an index rebuild will be minimal – just the allocations of pages and extents. However, the next log backup performed (either in BULK_LOGGED or after switching to FULL) will also contain all the extents changed by the rebuild, and so the log backup will be roughly the same size as if the rebuild was done in the FULL recovery mode. The benefits are in time and the fact that the transaction log itself does not have to accommodate the full size of the index during the rebuild in a single transaction.

In all recovery modes, reorganizing an index is fully logged, but is performed as a series of small transactions so should not cause the transaction log to grow inordinately. And of course, transaction log is only generated for the operations performed, which may be less for a reorganize as it only deals with fragmentation that exists.

Locks Required

An offline index rebuild of any index holds a schema-modification (i.e. super-exclusive) table lock – no updates or reads of the entire table.

An online index rebuild of any index acquires a short-term shared table lock at the start of the operation, holds an intent-shared table lock throughout the operation (which will only block exclusive and schema-modification table locks), and then acquires a short-term schema-modification table lock at the end of the operation. ‘Online’ is a bit of a misnomer. From SQL Server 2014, you can use the WAIT_AT_LOW_PRIORITY option to delay the potential for blocking – see this blog post.
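
A minimal sketch of an online rebuild using WAIT_AT_LOW_PRIORITY (SQL Server 2014 onwards) – the names and durations are hypothetical:

  ALTER INDEX IX_MyIndex ON dbo.MyTable
  REBUILD WITH (
      ONLINE = ON (
          WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES, ABORT_AFTER_WAIT = SELF)
      )
  );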

An index reorganize holds an intent-exclusive table lock throughout the operation, which will only block shared, exclusive, and schema-modification table locks. One of the major reasons I wrote DBCC INDEXDEFRAG for SQL Server 2000 was as an online alternative to DBCC DBREINDEX.

Interruptible or Not

An index rebuild operation cannot be interrupted without it rolling back everything it’s done so far – it’s atomic – all or nothing. In SQL Server 2017, however, there is a resumable online index rebuild feature.
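
A minimal sketch of the SQL Server 2017 resumable syntax – the names and duration are hypothetical:

  ALTER INDEX IX_MyIndex ON dbo.MyTable
  REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

  -- The rebuild can then be paused and resumed
  ALTER INDEX IX_MyIndex ON dbo.MyTable PAUSE;
  ALTER INDEX IX_MyIndex ON dbo.MyTable RESUME;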

An index reorganize can be interrupted and the worst that will happen is that a single page move operation is rolled back.

Progress Reporting or Not

Index rebuilds do not have proper progress reporting. You can hack it for online index operations by looking at the bigintdata1 column in the Progress Report: Online Index Profiler event, which happens to show how many rows of the old index have been scanned. You can also hack it for index operations by looking at the number of page reads the SPID has done in sys.dm_exec_requests.

Index reorganize operations populate the percent_complete column of sys.dm_exec_requests so you can easily gauge how much work remains. In fact DBCC INDEXDEFRAG also used to do progress reporting, but less elegantly, by printing a progress message to your connection every 30 seconds.
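
For example, to check on a reorganize running in another session (the session ID here is hypothetical):

  SELECT session_id, command, percent_complete, estimated_completion_time
  FROM sys.dm_exec_requests
  WHERE session_id = 57;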

Statistics

An index rebuild will always rebuild the index column statistics with the equivalent of a full scan (or sampled, for an index partition or if the index is partitioned).

An index reorganize does not see a total view of the index and so cannot update statistics, meaning that manual index statistics maintenance is required.
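
So if you rely on reorganize, follow it up with a manual statistics update – a minimal sketch, with hypothetical names:

  UPDATE STATISTICS dbo.MyTable IX_MyIndex WITH FULLSCAN;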

Summary

As you can see, there are quite a few major differences between rebuilding and reorganizing, but there’s no right answer as to which one you should use – that’s your choice.

If you have an index maintenance routine that always rebuilds and never considers reorganizing, you should reconsider. It’s usually better to reorganize a lightly fragmented index and rebuild a more heavily fragmented index – to save time and resources. You’ll find that most index maintenance products and freely-available scripts allow you to make that choice.

And as always, rather than writing your own index maintenance solution, I recommend Ola Hallengren’s free code (yes, other people have done similar, but I think Ola’s is by far the best and most widely used).

New course: Upgrading and Migrating to SQL Server 2016

We’ve just published a new Pluralsight course!

This is Glenn’s latest course – SQL Server: Upgrading and Migrating to SQL Server 2016 – and is just over two hours long. It’s a must for anyone working on a production upgrade/migration.

The modules are:

  • Introduction
  • Upgrade Planning
  • Upgrade Testing
  • Migration Planning
  • Migration Testing
  • Production Migration

Check it out here.

We now have 150 hours of SQLskills online training available (see all our 52 courses here), all for as little as $29/month through Pluralsight (including more than 5,000 other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.

Enjoy!

SQLskills SQL101: Why DBCC CHECKDB can miss memory corruption

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

An interesting situation was discussed online recently which prompted me to write this post. A fellow MVP was seeing periodic corruption messages in his error log, but DBCC CHECKDB on all databases didn’t find any corruptions. A subsequent restart of the instance caused the problem to go away.

My diagnosis? Memory corruption. Something had corrupted a page in memory – maybe it was bad memory chips or a memory scribbler (something that writes into SQL Server’s buffer pool, like a poorly-written extended stored procedure), or maybe a SQL Server bug. Whatever it was, restarting the instance wiped the buffer pool clean, removing the corrupt page.

So why didn’t DBCC CHECKDB encounter the corrupt page?

The answer has to do with DBCC CHECKDB’s use of database snapshots (which applies to all the other DBCC CHECK* commands too). It creates a database snapshot and then runs the consistency-checking algorithms on the database snapshot. The database snapshot is a transactionally-consistent, unchanging view of the database, which is what DBCC CHECKDB requires.

More info on DBCC CHECKDB’s use of snapshots, and potential problems can be found at:

A database snapshot is a separate database as far as the buffer pool is concerned, with its own database ID. A page in the buffer pool is owned by exactly one database ID, and cannot be shared by any other databases. So when DBCC CHECKDB reads a page in the context of the database snapshot, that page must be read from the source database on disk; it cannot use the page from the source database if it’s already in memory, as that page has the wrong database ID.

This means that DBCC CHECKDB reads the entire source database from disk when it uses a database snapshot. This is not a bad thing.

This also means that if there’s a page in the source database that’s corrupt in memory but not corrupt on disk, DBCC CHECKDB will not encounter it if it uses a database snapshot (the default).

If you suspect that a database has some corruption in memory, the only way to have DBCC CHECKDB use the in-memory pages is to use the WITH TABLOCK option, which skips using a database snapshot and instead uses locks to quiesce changes in the database.
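
For example (the database name is hypothetical):

  DBCC CHECKDB (N'MyDB') WITH TABLOCK, NO_INFOMSGS;

Bear in mind that WITH TABLOCK reduces concurrency while the check runs and skips some checks (for example, the catalog checks), so it’s not a straight swap for a regular snapshot-based CHECKDB.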

Hope this helps clear up any confusion!