SQLskills SQL101: How can corruptions disappear?

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Every so often I get an email question from a confused DBA: why is it that sometimes corruptions seem to disappear?

The situation is commonly as follows:

  • There is a regular SQL Agent job that runs DBCC CHECKDB
  • One morning the DBA finds that the job failed, reporting corruptions in one of the databases
  • The DBA runs DBCC CHECKDB on that database again, but this time there are no reported corruptions

This can lead the DBA to mistrust DBCC CHECKDB. Remember the SQL Server 2000 days, when DBCC CHECKDB occasionally reported corruptions that weren't really there? Those days are long gone: if DBCC CHECKDB reports corruption, then at the time that it ran there was definitely corruption.

Think about what DBCC CHECKDB is doing: it reads and processes all the allocated pages in the database – all the pages that are part of tables and indexes at the time that DBCC CHECKDB runs. It doesn’t check all the pages in the data files; only those that are currently being used. The pages that are not currently allocated to an object cannot be checked as there’s no “page history” maintained. There’s really no way for DBCC CHECKDB to tell if they have ever been used before or not and since they’re not currently allocated there’s no valid page structure on them and no past to verify.
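For reference, a consistency check like the one in the Agent job above can be run as follows (the database name is hypothetical; the options shown are common but not required):

```sql
-- Run a full consistency check, suppressing informational messages and
-- reporting every error found (not just the first 200 per object).
DBCC CHECKDB (N'MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```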

And if your database is still being accessed then the set of allocated pages can change after DBCC CHECKDB runs. A simple example of this occurring is:

  • Nonclustered index X of table Y has some corrupt pages in it, which DBCC CHECKDB (run by a SQL Agent job) reports
  • Another SQL Agent job runs and performs index maintenance where it rebuilds index X (the rebuild operation always builds a new index and then drops the old index)
  • The DBA runs DBCC CHECKDB manually and there are no corruptions reported in the new index structure

Nonclustered index corruption is the best kind of corruption to have. The rebuild operation rewrote the index to a new set of pages and deallocated the pages that had corruption. When DBCC CHECKDB is run manually, those new pages are not corrupt and the old pages are not checked, as they are no longer in use.
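As a sketch of the scenario above (the table and index names are hypothetical), the rebuild that makes the corruption 'disappear' is just ordinary index maintenance:

```sql
-- Rebuilding the nonclustered index allocates and populates a brand new set of
-- pages, then deallocates the old ones - including the corrupt pages, which a
-- subsequent DBCC CHECKDB will no longer read.
ALTER INDEX IX_X ON dbo.TableY REBUILD;
```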

These kinds of ‘disappearing’ corruptions are a problem because it’s almost impossible to investigate them further, yet they could indicate a problem with your I/O subsystem. If you find that they’re occurring repeatedly, consider temporarily disabling the process that causes the corrupt pages to be deallocated (for example, the index maintenance job) so you can investigate the corruption.

Another cause of disappearing corruptions can be transient I/O subsystem problems, where page reads sometimes fail outright and then succeed after that. Take a look at these blog posts on read-retry and Agent alerts for more information.

And yet one more cause could be that the database is mirrored or is part of an availability group and the page was fixed by automatic page repair before the second DBCC CHECKDB occurred. You can look in the msdb.dbo.suspect_pages table (more details here) for an entry for the broken page(s) with event_type of 4.
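If you suspect automatic page repair has kicked in, a query along these lines shows what has been recorded (the event_type filter follows the description above):

```sql
-- Pages logged in the suspect_pages table; event_type = 4 indicates the page
-- was subsequently restored/repaired.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
WHERE event_type = 4;
```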

Bottom line: From SQL Server 2005 onward, if DBCC CHECKDB reports corruption, then at the time that it ran there definitely was corruption. Make sure you don’t just ignore the problem as next time the corruption occurs, you may not be so ‘lucky’ that it just seemed to disappear.

SQLskills SQL101: Should you kill that long-running transaction?

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One of the problems you may have encountered is a transaction log file growing because of a long-running query.

What do you do?

The temptation is to kill the query, but is that the correct approach? Well, as always, the answer starts with ‘it depends’.

A long-running transaction, no matter how much work it’s done (and hence how many log records it has generated), will prevent the log from clearing, as all the log back to the LOP_BEGIN_XACT log record of the long-running transaction is required in case that transaction rolls back. And the log will not be able to clear until (at least) that long-running transaction has committed or finished rolling back.
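To see which transaction is holding the log hostage, DBCC OPENTRAN reports the oldest active transaction in a database (database name hypothetical):

```sql
-- Reports the oldest active transaction, including its start time, log
-- position, and the session (SPID) that owns it.
DBCC OPENTRAN (N'MyDatabase');
```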

You can tell how many log records a transaction has generated using my script here, along with the total space taken up in the log by the transaction.
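The full script isn't reproduced here, but a simplified sketch of the kind of query involved, using the transaction DMVs, looks like this:

```sql
-- Per-transaction log usage in the current database: how many log records each
-- active transaction has generated, and how much log space it has used/reserved.
SELECT
    st.session_id,
    dt.database_transaction_begin_time,
    dt.database_transaction_log_record_count   AS log_record_count,
    dt.database_transaction_log_bytes_used     AS log_bytes_used,
    dt.database_transaction_log_bytes_reserved AS log_bytes_reserved
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions  AS st
    ON st.transaction_id = dt.transaction_id
WHERE dt.database_id = DB_ID();
```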

If the long-running query has generated hardly any log records, then killing it will mean that it rolls back quickly, and hopefully the next log backup (in the full and bulk-logged recovery models) or checkpoint (in the simple recovery model) will allow the log to clear and stop its growth.
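You can confirm why the log can't clear by checking log_reuse_wait_desc, which will show ACTIVE_TRANSACTION while the long-running transaction is open (database name hypothetical):

```sql
-- What is currently preventing log clearing for this database?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDatabase';
```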

However, if the long-running query has generated a *lot* of log records, then it’s going to take a long time to roll back (as rolling back each log record means generating the ‘anti-operation’ for that log record, making the change, and generating *another* log record describing the change). That rollback itself won’t cause the log to grow any more, as a transaction always reserves free space in the log to allow it to roll back without requiring log growth (my script above also tells you that amount of space). However, as it will take a long time to roll back, other activity in the log from other transactions will likely cause the log to grow more until it’s finally able to clear.

So it may actually be better to allow a long-running transaction that’s generated a lot of log records to continue running until it completes. If the time to completion is going to be a lot less than the time to roll back, this could mean less overall extra log growth until you’re able to finally clear the log, and then potentially resize it back to normal, and continue running.

The trick is knowing what the query is doing and/or being able to figure out how close to completion it is. You could look at the logical_reads column in the DMV sys.dm_exec_requests and correlate that with the number of pages in the index or table being scanned, or look at the number of log records generated in the script output and correlate that with the number of records you’d expect an UPDATE statement to affect.
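For example (the session ID here is hypothetical), comparing logical_reads against the page count of the object being scanned gives a rough progress estimate:

```sql
-- Rough progress check: how many page reads has the request done so far,
-- and how much CPU and elapsed time has it accumulated?
SELECT r.session_id, r.command, r.status,
       r.logical_reads, r.cpu_time, r.total_elapsed_time
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 52;
```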

Bottom line: don’t always knee-jerk and decide to cancel a problematic, long-running query without thinking about what the effect of that will be.

Spring 2018 classes in Chicago and Bellevue open for registration

I’ve just released our first set of 2018 classes for registration!

We’ll be adding new classes in Chicago over the coming weeks, including classes on security, data mining and AI, and BI, and Kimberly’s long-awaited class on data warehousing/partitioning/very-large tables. We’re also planning some classes in Europe, in London and/or Dublin (details before the end of 2017).

All classes have discounts for registering before the end of 2017! (details on the individual class web pages…)

Our classes in April will be in Chicago, IL:

  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • April 23-27
  • IE0: Immersion Event for Junior/Accidental DBAs
    • April 23-25
  • IEUpgrade: Immersion Event on Upgrading and New Features
    • April 23-25
  • IECAG: Immersion Event on Clustering and Availability Groups
    • April 26-27
  • IEAzure: Immersion Event on Azure SQL Database and Azure VMs
    • April 26-27
  • IEPTO2: Immersion Event on Performance Tuning and Optimization – Part 2
    • April 30 – May 4
  • IEPS: Immersion Event on PowerShell for SQL Server DBAs
    • April 30 – May 2
  • Coming soon for the week of April 30: classes on security and BI
  • IESSIS1: Immersion Event on Learning SQL Server Integration Services
    • May 7-11
  • Coming soon for the week of May 7: classes on data mining and AI.

Our class in June will be in Bellevue, WA:

  • IEPTO1: Immersion Event on Performance Tuning and Optimization – Part 1
    • June 18-22

You can get all the logistical, registration, and curriculum details by drilling down from our main schedule page.

We hope to see you there!

New Pluralsight course: Analyzing Query Performance for Developers

Erin’s latest Pluralsight course has been published – SQL Server: Analyzing Query Performance for Developers. It’s four hours long, and from the course description:

Developers are sometimes charged with fixing performance issues, and they have no idea where to start. They’ve never seen a query plan, or when they look at one they’re not sure what it *really* means. The developers know how long it takes for a query to execute, but they don’t know how to see how much I/O, CPU, or memory it uses. And sometimes they just create an index and that seems to fix the problem (even though they have no idea why!). In this course, SQL Server: Analyzing Query Performance for Developers, you’ll learn about all the data that SQL Server generates when a query executes – it’s more than just the query plan – and you’ll learn where it exists and how to find it. First, you’ll discover what the data represents and how to read a query plan. Next, you’ll explore some of the query plan operators you’ll see most often. Finally, you’ll learn what information is included in the plan, which may not be immediately obvious, but can be extremely valuable when troubleshooting. When you’re finished with this course, you’ll have the skills and knowledge to start examining and understanding query plans and making query changes for better performance!

The modules are:

  • Introduction
  • Finding Information About Queries
  • Understanding Query Performance Metrics
  • Reading Query Plans
  • Operators in a Query Plan
  • Important Information in a plan

Check it out here.

We now have more than 165 hours of SQLskills online training available (see all our 54 courses here), all for as little as $29/month through Pluralsight (including more than 5,000 other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.


SQLskills SQL101: Running out of ints and bigints

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

We’re teaching a class this week, and one topic that always comes up is using an int identity as a clustering key and the possibility of running out of integers. Depending on your insert volume, this might be quite likely, as an int can only store 2^32 (^ = ‘to the power’) or about 4 billion values, between -2^31 and 2^31-1.

Imagine that you have a theoretical system that can create a thousand data rows per second. Using an int identity value increasing by 1 and starting at 1, you’ll run out of values when the value hits 2^31-1 and tries to insert the next value. Let’s simplify the math by just saying that 2^31 is the limit. With a thousand values per second, that would mean 2^31 / 1,000 = 2.15 million seconds or just under 25 days. While many of you don’t sustain 1,000 rows per second, this is still a very problematic limitation.
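You can sanity-check that arithmetic directly:

```sql
-- Seconds and days to exhaust the positive int range at 1,000 inserts/second.
SELECT
    POWER(CAST(2 AS BIGINT), 31) / 1000           AS seconds_to_exhaust, -- ~2.15 million
    POWER(CAST(2 AS BIGINT), 31) / 1000 / 86400.0 AS days_to_exhaust;    -- ~24.9
```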

One solution is to use a bigint identity as the key. This can store 2^64 or about 18.4 quintillion (18.4 billion billion) values, between -2^63 and 2^63-1.

Every so often someone asks whether it’s possible to run out of bigint values. My answer is no. Well, technically yes, there is a limit, but in practical terms the answer is no.

Now imagine that you have a theoretical system that can create a million data rows per second, with a bigint identity value increasing by 1 and starting at 1. You’ll run out of values when the value hits 2^63-1 and tries to insert the next value. With a million values per second, that would mean 2^63 / 10^6 = 9.2 trillion seconds or approximately 292.5 thousand years. And by then it’s someone else’s problem… :-) And that’s only for half the possible range of bigint values.
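Again, the arithmetic is easy to verify (float is used here because 2^64 and 2^63 themselves overflow a bigint):

```sql
-- Years to exhaust the positive bigint range at 1,000,000 inserts/second.
SELECT POWER(CAST(2 AS FLOAT), 63) / 1000000 / 31536000 AS years_to_exhaust; -- ~292,471
```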

Now what about the storage for those values? Doing a quick test of a heap with a single bigint identity column shows me that I can get 453 rows per 8KB data file page (don’t forget the record overhead, slot array overhead, and that the heap pages won’t be filled completely because of the way the free space caching and searching works). A terabyte of data would store roughly 61 billion rows.

So with 1 million rows per second, you’ll be generating 1 million x 3,600 (seconds in an hour) x 24 (hours in a day) = 86.4 billion rows per day, so you’ll need about 1.4 terabytes of new storage per day. If you’re using the bigint identity as a cluster key, each row needs new space, so you’ll need almost exactly 0.5 petabytes of new storage every year.
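Checking those storage numbers (453 rows per 8KB page):

```sql
-- Rows per day at a million rows/second, and the resulting new storage per day.
SELECT
    1000000.0 * 86400                                 AS rows_per_day, -- 86.4 billion
    1000000.0 * 86400 / 453 * 8192 / POWER(1024.0, 4) AS tb_per_day;   -- ~1.42
```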

At that rate, actually running out of bigint values AND storing them would require roughly 150 thousand petabytes of storage. This is clearly impractical – especially when you consider that storing *just* a bigint is pretty pointless – you’d be storing a bigint and some other data too, probably at least doubling the storage necessary.

Why is this interesting? We’ve had a number of clients over the years that didn’t consider their data volume and designed a schema using int keys instead of bigint keys. When the inevitable happened and they ran out of int values, the process of changing to a bigint key was quite painful – as there’s no really easy, space- and log-efficient way to do it once you have the 2 billion rows, especially if constraints are involved, and application changes need to be made to allow 8-byte values instead of 4-byte values in result sets.
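The straightforward change is a single ALTER (table and column names hypothetical), but it's a size-of-data, fully logged operation, and any primary key, foreign key constraints, or indexes on the column must be dropped and recreated around it:

```sql
-- Size-of-data operation: every row is updated to widen the 4-byte int to an
-- 8-byte bigint, generating a correspondingly large amount of transaction log.
ALTER TABLE dbo.MyTable ALTER COLUMN MyKey BIGINT NOT NULL;
```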

A common stop-gap solution people use when they run out of int values is to reset the identity seed to -2^31 and keep the increment at 1. As a short-term solution this does work, especially if the int key is a surrogate key and doesn’t have a business meaning, but it’s not ideal as a long-term solution as you’ll only run out again once the int value climbs back up to the original seed. Ultimately, you’ll need to make the int to bigint change.
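The reseed itself is a one-liner (table name hypothetical):

```sql
-- Restart the identity at the smallest int value; the next row inserted gets
-- -2,147,483,648, the one after that -2,147,483,647, and so on.
DBCC CHECKIDENT (N'dbo.MyTable', RESEED, -2147483648);
```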

Summary: make sure that when you’re designing a new schema, you think through the maximum values required and pick appropriate data types then and there. Changing data types can be very painful once the system has been in production for a while and there’s a lot of data in the schema.

PS If you honestly believe you’ll run out of bigint values, you can use a decimal or numeric value, both of which can hold -10^38+1 to 10^38-1. Those are really big numbers: 10^38 is about 2^126, or 100 undecillion, or roughly 2^62 times more values than a bigint can hold. Using our million-row-per-second server, inserting 10^38 values would take 10^38 / 10^6 = 10^32 seconds, or roughly 3 trillion trillion years. Now if you’re concerned about *that*, the sun will have become a red giant and incinerated the Earth in about 5 billion years…

SQLintersection Fall 2017

As we head towards our 10th SQLintersection in October, I’m excited to say that it’s once again our most diverse, complete, and information-packed show yet!

One of the pieces of feedback we hear over and over is that attendees love SQLintersection because it’s a smaller, laid-back show where you actually get to spend time talking with the presenters one-on-one. I have to say that’s one of the reasons why we love the show so much; *we* get to spend time talking with attendees, rather than being mobbed by hundreds of people after a session ends. And we only pick presenters who we know personally, and who we know to be humble, approachable, and eager to help someone out.

We have 2 pre-con days at the show and with our post-con day, there are 9 full-day workshops from which to choose. We have 40 technology-focused (NOT marketing) sessions from which to choose, plus two SQL Server keynotes, multiple industry-wide keynotes by Microsoft executives, and the ever-lively closing Q&A that we record as a RunAs Radio podcast.

You’ll learn proven problem-solving techniques and technologies you can implement immediately. Our focus is around performance monitoring, troubleshooting, designing for scale and performance, cloud, as well as new features in SQL Server 2014, 2016, and 2017. It’s time to determine your 2008 / 2008 R2 migration strategy – should you upgrade to 2016/2017 directly? This is the place to figure that out!

If you’re interested in how we got here – check out some of Kimberly’s past posts:

  1. SQLintersection: a new year, a new conference
  2. SQLintersection’s Fall Conference – It’s all about ROI!
  3. Fall SQLintersection is coming up soon and we can’t wait!
  4. SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in Vegas

And Kimberly recorded a Microsoft Channel 9 video where she discusses the Spring show – see here.


With minimal to no marketing filler, we’ve largely kept our conference focus on ROI and technical content (performance / troubleshooting / tales-from-the-trenches with best practices on how to fix them), but we’ve also added even more social events so that you really get time to intersect with the conference attendees and speakers. The SQL-specific, pub-quiz-style evening event SQLafterDark was wildly popular at our past shows, and it’s returning this Fall!


SQLintersection: Great Speakers!

Once again, I think a great show starts with great speakers and current, useful content. All of these speakers are industry experts who have worked in data / SQL for years (some can even boast decades), but all are still focused on consulting and working in the trenches. And they’re good presenters! Not only will you hear useful content, but you’ll do so in a way that’s digestible and applicable. Every speaker is either an MCM (Master), a SQL Server MVP, or a past/present Microsoft employee (or a combination of all three!). But regardless of their official credentials, ALL are focused on providing the most ROI possible in their session(s) and/or workshops, and ALL have spoken at SQLintersection multiple times.

Check out this phenomenal list of speakers:

  • Aaron Bertrand – MVP, SentryOne
  • David Pless – MCM, Microsoft
  • Jes Borland – past MVP, Microsoft
  • Jonathan Kehayias – MCM, MCM Instructor, MVP
  • Justin Randall – MVP, SentryOne
  • Kimberly L. Tripp – MCM Instructor, MVP, past Microsoft, SQLskills
  • Paul S. Randal – MCM Instructor, MVP, past Microsoft, SQLskills
  • Shep Sheppard – past Microsoft, Consultant
  • Stacia Varga – MVP, Consultant
  • Tim Chapman – MCM, Microsoft
  • Tim Radney – MVP, SQLskills

You can read everyone’s full bio on our speaker page here.

SQLintersection: When is it all happening?

The conference officially runs from Tuesday, October 31 through Thursday, November 2 with pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Sunday, October 29 through Friday, November 3.

  • Sunday, October 29 – pre-con day. There are three workshops running:
    • Data Due Diligence – Developing a Strategy for BI, Analytics, and Beyond with Stacia Varga
    • Performance Troubleshooting Using Waits and Latches with Paul S. Randal
    • SQL Server 2014 and 2016 New Features and Capabilities with David Pless and Tim Chapman
  • Monday, October 30 – pre-con day. There are three workshops running:
    • Building a Modern Database Architecture with Azure with Jes Borland
    • Data Science: Introduction to Statistical Learning and Graphics with R and SQL Server with Shep Sheppard
    • Extended Events: WTF OR FTW! with Jonathan Kehayias
  • Tuesday, October 31 through Thursday, November 2 is the main conference. Conference sessions will run all day in multiple tracks:
    • Check out our sessions online here
    • Be sure to check out our cross-conference events and sessions
    • Get your pop-culture trivia and techie-SQL-trivia hat on and join us for SQLafterDark on Wednesday evening, November 1
  • Friday, November 3 is our final day with three post-conference workshops running:
    • Common SQL Server Mistakes and How to Correct Them with Tim Radney
    • SQL Server 2016 / 2017 and Power BI Reporting Solutions with David Pless
    • Very Large Tables: Optimizing Performance and Availability through Partitioning with Kimberly L. Tripp

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also competent and consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to use to troubleshoot and solve your performance and availability problems now!

Check us out: www.SQLintersection.com.

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

New Pluralsight course: Indexing for Performance

It’s been a while coming but it’s worth the wait! Kimberly’s latest Pluralsight course has been published – SQL Server: Indexing for Performance – and it’s a monster 7 hours long and full of extensive demos.

The modules are:

  • Introduction
  • Row-based vs. Column-based Indexes
  • Row-based Index Concepts
  • Choosing the Clustering Key
  • Clustered Index Internals
  • Nonclustered Index Internals
  • Data Access Patterns
  • Understanding the Tipping Point
  • Covering Queries
  • Using INCLUDE to Cover Queries
  • Understanding Filtered Indexes
  • Index Consolidation

Check it out here.

We now have more than 165 hours of SQLskills online training available (see all our 53 courses here), all for as little as $29/month through Pluralsight (including more than 5,000 other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.


10 years ago today…

…the second stage of my life ended, when I left Microsoft on August 31st, 2007.

The first stage of my life was education, ending in June 1994 with a Bachelor of Engineering (Hons.) degree in Computer Science and Electrical Engineering from the University of Edinburgh.

The second stage of my life was being an employee.

The first company I worked for was Digital Equipment Corporation (a.k.a. DEC, a.k.a. Digital), working on the OpenVMS/VMS file system (including the VMS chkdsk equivalent, disk/tape initialization, and general mag tape functionality) from August 1994 to January 1999. Working for DEC taught me how to be a real software engineer, and I’m very thankful that my first boss and mentor was Duncan McLaren – a damn fine engineer, a sincere geek, and good friend. I consider myself extremely lucky to have worked at DEC with Duncan to start my career and I really loved working there.

And that led me to the second and last company I worked for: Microsoft, from February 1999 to August 2007, including the jump from Scotland to Redmond, WA. See here for an explanation of how that happened. It was at Microsoft that I grew into a principal software engineer and learned how to be an engineering manager, both of software engineers and of program managers, and to lead product-wide projects. But the most important things I learned at Microsoft were how to lead and mentor people, and the value of a technical community.

Of all the people I worked with and got to know at Microsoft, the person who stands out the most is my great friend Bob Ward, who I first met in early 1999. We worked very closely over the years, and labored tirelessly together on the supportability (and PSS training) of the entire SQL Server box for the SQL Server 2005 release. You won’t find a more passionate advocate for solving customer problems, a more capable SQL Server debugger, or a more honest and sincere human being.

I absolutely *adored* working at Microsoft, but in the end I couldn’t do what I wanted to do in 2007: work with Kimberly and spend my time teaching and consulting. So I made the decision to leave in June 2007 and gave the SQL Server group leaders 10 weeks’ notice so I could transition everything properly.

Now I’m in the third stage of my life – owning and running a company: SQLskills, with my wonderful wife Kimberly, who’d been running the company by herself since she started it in 1995 until I joined in 2007. Kimberly taught me how to be a great presenter, how to run a successful small business, and how to be more empathetic in the ‘real world’ outside Microsoft.

This has been, and continues to be, a blast and I wouldn’t trade my life now for anything else. I’ve lost count of the number of classes I’ve taught, conferences I’ve presented at, and blog posts I’ve done here (ok, that I can put a number to: 947, including this one), plus I naturalized as a U.S. citizen in 2012. I must say that part of what makes this job so enjoyable is the superb team we have: Jon (since 3/15/11…), Glenn (5/1/12…), Erin (8/1/12…), Tim (1/19/15…), and our long-suffering assistant Libby (since forever…).

However, the thing I enjoy the most about my job now is helping people – through my blog, email, Twitter, my waits library, our classes and conferences, and more – and I don’t see that changing any time soon!

What will the next ten years bring, I wonder? Stay tuned to find out :-)


SQLskills SQL101: Readable secondary performance problems

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

Yesterday I blogged about log shipping performance issues and mentioned a performance problem that can be caused by using availability group readable secondaries, and then realized I hadn’t blogged about the problem, only described it in our Insider newsletter. So here’s a post about it!

Availability groups (AGs) are pretty cool, and one of the most useful features of them is the ability to read directly from one of the secondary replicas. Before, with database mirroring, the only way to access the mirror database was through the creation of a database snapshot, which only gave a single, static view of the data. Readable secondaries are constantly updated from the primary so are far more versatile as a reporting or non-production querying platform.

But I bet you didn’t know that using this feature can cause performance problems on your primary replica?

As with most things in life, you don’t get anything for free. Readable secondaries are really useful, but there is a performance trade-off you need to be aware of. All queries that are executed against a readable secondary are automatically run using snapshot isolation. This means they do not require share locks and so will not block any database changes being replayed from the primary replica (i.e. the constant redo on the secondary replica of log records that have been sent from the primary replica).

To do this requires the use of the versioning system, where (simplistically) pre-change versions of records are copied into the version store in tempdb and queries work out which version of the record is the correct one for them to process, based on the query’s starting time. All records that change get a 14-byte tag added on the end of the record that allows a query to see if this is the correct record, and if not to follow a pointer to the previous version of the record in the version store. This has been the mechanism since snapshot isolation and read-committed snapshot isolation were introduced in SQL Server 2005.

Now consider this: all AG replicas are exact copies of the primary replica. So how can versioning work on the readable secondary, adding 14-byte tags to some records? That must break the ‘exact copy’ rule, right?

Well, yes, it would… if the primary replica didn’t also change.

When a readable secondary is configured in an AG environment, all changing records on the primary replica start getting empty 14-byte versioning tags added to them. This is so that the 14 bytes of extra space on the record are noted in the transaction log and replayed on the secondary replicas, allowing the readable secondary to use that empty 14-byte space to store the versioning tag it needs.

This doesn’t break the ‘exact copy’ rule because the 14 bytes aren’t used for anything to do with recovery; there just has to be 14 bytes there.

So versioning tags start getting added to changing records on the primary (to be clear, this doesn’t turn on versioning on the primary), and table and index records start to get 14 bytes longer. And what happens when records get longer on pages where there isn’t enough space? Page splits in your indexes (and forwarded records in heaps – but I’ll concentrate on indexes here), leading to low page densities (wasted disk space and buffer pool memory), logical fragmentation (poor scan performance), and a bunch of extra, expensive log record generation from the page splits themselves.
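You can watch this happening with sys.dm_db_index_physical_stats (object name hypothetical): falling page density and rising logical fragmentation after a readable secondary is configured are the tell-tale signs:

```sql
-- Page density and logical fragmentation for every index on the table.
SELECT index_id, index_level,
       avg_page_space_used_in_percent,  -- page density (low = wasted space)
       avg_fragmentation_in_percent     -- logical fragmentation (high = poor scans)
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, N'DETAILED');
```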

To counteract this, you’ll need to implement (and/or possibly lower existing) fill factors on your indexes and even potentially start doing index maintenance on indexes that may not have required it previously. Quite an insidious problem that can be hard to figure out unless you know what’s going on under the covers!
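Setting a fill factor is done per index (the names and the 90% value are illustrative; the right value depends on your rate of change):

```sql
-- Rebuild with 10% free space per page, leaving room for the empty 14-byte
-- versioning tags to be added without causing page splits.
ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD WITH (FILLFACTOR = 90);
```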

For more information, this MSDN page has more general detail, and this whitepaper from Microsoft explains in more depth the various performance impacts of using readable secondaries: AlwaysOn Solution Guide: Offloading Read-Only Workloads to Secondary Replicas.

If you’re implementing readable secondaries in your AG configuration, make sure that you also investigate and implement index fill factors in the database so that the versioning tags that are added under the covers don’t start causing page splits and fragmentation.

SQLskills SQL101: Log shipping performance problems

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One question I’m asked regularly is this:  When our log shipping secondary is applying log backups, sometimes it takes a lot longer than usual. Any idea why this might be the case?

Log shipping has been around forever, and it’s still a hugely applicable and useful feature for very simply maintaining one or more secondary copies of a database. You can also use a secondary copy for reporting, where the restore of the log backup uses the WITH STANDBY option, leaving the secondary database in an accessible, but read-only state (when the logs aren’t being applied).

This works as follows:

  1. Make sure all users are disconnected from the secondary database
  2. Write all the log records from the log backup into the secondary database’s log file
  3. Perform the REDO part of recovery (ensuring that all operations from committed transactions are present in the secondary database)
  4. Perform the UNDO part of recovery (ensuring that all operations from uncommitted transactions are not present in the secondary database)

Step 4 writes all the log records generated by the UNDO operations into a special file called the undo file. This means that the secondary database is in read-only mode and is transactionally consistent so that users can access it. The reason the log records are written into the undo file is so that the transaction log of the secondary database is not altered in any way, allowing subsequent log backups to be restored. If this weren’t the case, the UNDO log records would advance the secondary database’s LSN (Log Sequence Number), meaning that subsequent log backup restore operations would fail.

When the restore process begins on the secondary database, if an undo file exists, there is another step that is performed before steps 2-4 above. This additional step needs to take all the log records in the undo file and undo the effects of them – essentially putting the secondary database back into the state as of the end of step 3 from the previous restore. This database state is the same as if the previous log backup had been restored using WITH NORECOVERY instead of WITH STANDBY.

The occasional long-running restore problem happens when a log backup is restored that contains a long-running transaction that does not commit before the end of the log backup. This means that it must be completely undone as part of restoring the log backup (step 4), resulting in a very large undo file. This in itself can make restoring a log backup take a lot longer than usual. When the next log backup is restored, the additional step that undoes all the log records in the undo file has a very large undo file to process and takes much, much longer than usual. And if the log backup being restored also has an uncommitted, long-running transaction then it’s the perfect storm, as step 4 will also take a long time. These steps are all made even longer still if the log file has too many VLFs (called VLF fragmentation).

The situation where I’ve seen this most often is when the primary database is undergoing index maintenance and a log backup finishes near the end of a very long-running index rebuild operation of a large clustered index. The initial restore of that log backup on the secondary database takes much longer than usual to complete because of step 4 in the restore process. The next log backup on the primary also completes just before an index rebuild completes. When it is restored on the secondary, the whole of the large undo file has to be undone again, then the log restore occurs, and then another large undo file is generated to undo the second uncommitted index rebuild.

This is a possibility you have to be aware of if the secondary database must be available 24×7 for reporting, with only minimal downtime when each log backup is restored. In that case I would carefully augment the index maintenance operations on the primary with log backups to ensure that only complete, committed index rebuilds are present in the log backups being restored on the secondary database. Similar precautions should be taken if you have other, occasional, long-running operations.
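One possible way to sequence this, sketched below under the assumption that the standard log shipping backup job is in place (the job, database, table, and index names are all hypothetical):

```sql
-- Rebuild the index first; only once it has committed, kick off the log
-- shipping backup job so the entire rebuild is contained in one log backup
-- that restores without a large undo file on the secondary
ALTER INDEX [CIX_BigTable] ON [dbo].[BigTable] REBUILD;

EXEC msdb.dbo.sp_start_job @job_name = N'LSBackup_SalesDB';
```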

An alternative would be to move from log shipping to database mirroring or availability groups, where the log records are continually being sent from the principal to the mirror database (or primary to secondary replica databases, in availability group terms) and there are no extra steps involving undoing log operations multiple times. With database mirroring, the drawback of this is that reporting would have to use database snapshots, so there’s a complexity trade-off involved. With availability groups, the reporting would have to use a readable secondary, which can lead to index fragmentation on the primary replica, but that can be compensated for with index fill factors (see here for more details).
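With database mirroring, reporting access goes through a database snapshot of the mirror, which might look like this (the snapshot name, logical file name, and path are hypothetical – the logical name must match that of the source database’s data file):

```sql
-- Create a point-in-time, read-only snapshot of the mirror database for
-- reporting; the snapshot must be dropped and recreated to refresh the data
CREATE DATABASE [SalesDB_Reporting_0830]
ON (NAME = N'SalesDB_Data', FILENAME = N'C:\Snapshots\SalesDB_0830.ss')
AS SNAPSHOT OF [SalesDB];
```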

So there you have it. Another example where understanding how SQL Server performs common operations can make it much easier to diagnose performance problems.