New Pluralsight course: Understanding and Performing Backups

My latest Pluralsight course has been published – SQL Server: Understanding and Performing Backups. It’s 2.5 hours long, and from the course description:

It’s really impossible to escape the need for backups: even if you have some sort of synchronous replica of your databases, disasters can still happen that require restoring from backups. This means it’s critical that you understand what backups are and how to create them. In this course, SQL Server: Understanding and Performing Backups, you’ll learn what SQL Server backups are, why they’re required, and how they work. First, you’ll explore how to formulate a backup strategy based on business requirements and the restores you may need to perform. Next, you’ll delve into commonly used backup options. Finally, you’ll discover how to use backup compression and backup encryption, and how to maintain backup integrity. When you’re finished with this course, you’ll have the skills and knowledge to confidently perform backups in your environment and create a backup strategy.

The modules are:

  • Introduction
  • Full and Differential Data Backups
  • Transaction Log Backups
  • Log Backup Options and Considerations
  • Backup Strategies
  • General Backup Options
  • Backup Compression, Encryption, and Integrity

Check it out here.

We now have more than 165 hours of SQLskills online training available (see all our 56 courses here), all for as little as $29/month through Pluralsight (including more than 6,000 other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.

Enjoy!

SQLskills SQL101: Log shipping performance problems

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One question I’m asked regularly is this: When our log shipping secondary is applying log backups, sometimes it takes a lot longer than usual. Any idea why this might be the case?

Log shipping has been around forever, and it’s still a hugely applicable and useful feature for very simply maintaining one or more secondary copies of a database. You can also use a secondary copy for reporting, where the restore of the log backup uses the WITH STANDBY option, leaving the secondary database in an accessible, but read-only state (when the logs aren’t being applied).

This works as follows:

  1. Make sure all users are disconnected from the secondary database
  2. Write all the log records from the log backup into the secondary database’s log file
  3. Perform the REDO part of recovery (ensuring that all operations from committed transactions are present in the secondary database)
  4. Perform the UNDO part of recovery (ensuring that all operations from uncommitted transactions are not present in the secondary database)

Step 4 writes all the log records generated by the UNDO operations into a special file called the undo file. This means that the secondary database is in read-only mode and is transactionally-consistent so that users can access it. The reason the log records are written into the undo file is so that the transaction log of the secondary database is not altered in any way, allowing subsequent log backups to be restored. If this weren’t the case, the UNDO log records would advance the secondary database’s LSN (Log Sequence Number), meaning that subsequent log backup restore operations would fail.
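
As a simple illustration, here’s roughly what the restore job runs on the secondary when the database is kept readable (all database names and file paths here are made up). The STANDBY clause is what names the undo file described above:

  -- Restore a log backup on the secondary, leaving it read-only accessible.
  -- The STANDBY clause names the undo file that holds the log records
  -- generated by the UNDO part of recovery (step 4 above).
  RESTORE LOG [SalesDB]
  FROM DISK = N'D:\LogShipping\SalesDB_log_1.trn'
  WITH STANDBY = N'D:\LogShipping\SalesDB_undo.dat';

  -- If the secondary doesn't need to be readable, WITH NORECOVERY is used
  -- instead and no undo file is involved:
  -- RESTORE LOG [SalesDB]
  -- FROM DISK = N'D:\LogShipping\SalesDB_log_2.trn'
  -- WITH NORECOVERY;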

When the restore process begins on the secondary database, if an undo file exists, there is another step that is performed before steps 2-4 above. This additional step needs to take all the log records in the undo file and undo the effects of them – essentially putting the secondary database back into the state as of the end of step 3 from the previous restore. This database state is the same as if the previous log backup had been restored using WITH NORECOVERY instead of WITH STANDBY.

The occasional long-running restore problem happens when a log backup is restored that contains a long-running transaction that does not commit before the end of the log backup. This means it must be completely undone as part of restoring the log backup (step 4), resulting in a very large undo file. This in itself can make restoring a log backup take a lot longer than usual. When the next log backup is restored, the additional step that undoes all the log records in the undo file has a very large undo file to process and takes much, much longer than usual. And if the log backup being restored also contains an uncommitted, long-running transaction, then it’s the perfect storm, as step 4 will also take a long time. All of these steps take even longer if the log file has too many VLFs (known as VLF fragmentation).
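
If you suspect VLF fragmentation is part of the problem, it’s easy to check how many VLFs the log has. As a sketch: sys.dm_db_log_info is available in SQL Server 2016 SP2 and later, and on older versions DBCC LOGINFO returns one row per VLF:

  -- Count the VLFs in the current database's log (SQL Server 2016 SP2 and later).
  SELECT COUNT(*) AS VLFCount
  FROM sys.dm_db_log_info(DB_ID());

  -- On older versions, DBCC LOGINFO returns one row per VLF:
  -- DBCC LOGINFO;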

The situation where I’ve seen this most often is when the primary database is undergoing index maintenance and a log backup finishes near the end of a very long-running rebuild of a large clustered index. The initial restore of that log backup on the secondary database takes much longer than usual to complete because of step 4 in the restore process. The next log backup on the primary also finishes just before another index rebuild completes. When it is restored on the secondary, the whole of the large undo file has to be undone again, then the log restore occurs, and then another large undo file is generated to undo the second uncommitted index rebuild.

This is a possibility you have to be aware of if the secondary database must be available 24×7 for reporting, with only minimal downtime when each log backup is restored. In that case I would carefully coordinate the log backups with the index maintenance operations on the primary, to ensure that only complete, committed index rebuilds are present in the log backups being restored on the secondary database. Similar precautions should be taken if you have other, occasional, long-running operations.
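
As a rough sketch of what I mean (the index, table, database, and path names here are hypothetical): take a log backup immediately before and immediately after each large rebuild, and make sure the regular log backup job doesn’t fire in the middle of it, so no log backup ends with the rebuild still uncommitted:

  -- Log backup taken just before the long-running rebuild starts.
  BACKUP LOG [SalesDB] TO DISK = N'D:\LogShipping\SalesDB_log_before_rebuild.trn';
  GO
  -- The long-running index rebuild.
  ALTER INDEX [CIX_BigTable] ON [dbo].[BigTable] REBUILD;
  GO
  -- Log backup taken as soon as the rebuild has committed, so the entire,
  -- committed rebuild is contained in this one log backup.
  BACKUP LOG [SalesDB] TO DISK = N'D:\LogShipping\SalesDB_log_after_rebuild.trn';
  GO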

An alternative would be to move from log shipping to database mirroring or availability groups, where the log records are continually being sent from the principal to the mirror database (or primary to secondary replica databases, in availability group terms) and there are no extra steps involving undoing log operations multiple times. With database mirroring, the drawback of this is that reporting would have to use database snapshots, so there’s a complexity trade-off involved. With availability groups, the reporting would have to use a readable secondary, which can lead to index fragmentation on the primary replica, but that can be compensated for with index fill factors (see here for more details).
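
For completeness, here’s what the database mirroring reporting approach looks like: a database snapshot on the mirror gives a static, read-only view, and has to be dropped and recreated to see more recent data (the names and paths here are made up):

  -- Create a point-in-time, read-only snapshot of the mirror database for reporting.
  CREATE DATABASE [SalesDB_Reporting]
  ON (NAME = N'SalesDB_Data', FILENAME = N'D:\Snapshots\SalesDB_Reporting.ss')
  AS SNAPSHOT OF [SalesDB];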

So there you have it. Another example where understanding how SQL Server performs common operations can make it much easier to diagnose performance problems.

SQLskills SQL101: Why is restore slower than backup?

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One question I get asked every so often is why it can sometimes take longer to restore a database from a full backup than it took to perform the backup in the first place. The answer is that in cases like that, there’s more work to do during the restore process.

A full backup has the following main phases:

  1. Perform a checkpoint.
  2. Read all in-use data from the data files (technically, reading all allocated extents, regardless of whether all 8 pages in the extent are in use).
  3. Read all transaction log from the start of the oldest uncommitted transaction as of the initial checkpoint up to the time that phase 2 finished. This is necessary so the database can be recovered to a consistent point in time during the restore process (see this post for more details).
  4. (Optionally test all page checksums, perform backup compression, and perform backup encryption.)
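
As a point of reference, here’s a minimal full backup sketch (hypothetical database name and path) that includes the optional work in phase 4: testing page checksums as the pages are read, and compressing the backup:

  -- Full backup that tests page checksums and compresses the backup stream.
  -- STATS = 10 just reports progress every 10 percent.
  BACKUP DATABASE [SalesDB]
  TO DISK = N'E:\Backups\SalesDB_Full.bak'
  WITH CHECKSUM, COMPRESSION, STATS = 10;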

A full restore has the following main phases:

  1. Create the data files (and zero initialize them if instant file initialization is not enabled).
  2. Copy data from the backup to the data files.
  3. Create the log file and zero initialize it. The log file must always be zero initialized when created (see this post for more details).
  4. Copy transaction log from the backup to the log file.
  5. Run crash recovery on the database.
  6. (Optionally test all page checksums during phase 2, perform decompression if the backup is compressed, and perform decryption if the backup is encrypted.)
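
And a matching restore sketch (again, the names and paths are made up): WITH MOVE places the files created in phase 1, and CHECKSUM re-tests the page checksums stored in the backup as the data is copied in phase 2:

  -- Restore the full backup, placing the data and log files explicitly and
  -- re-testing page checksums from the backup as it is read.
  RESTORE DATABASE [SalesDB]
  FROM DISK = N'E:\Backups\SalesDB_Full.bak'
  WITH MOVE N'SalesDB_Data' TO N'D:\Data\SalesDB.mdf',
       MOVE N'SalesDB_Log' TO N'L:\Logs\SalesDB_log.ldf',
       CHECKSUM, STATS = 10;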

Phase 3 above can often be the longest phase in the restore process, and is proportional to the size of the transaction log. This is done as a separate phase rather than being done in parallel with phases 1 and 2, and for a deep investigation of this, see Bob Ward’s recent blog post.

Phase 5 above might be the longest phase in the restore process if there were any long-running, uncommitted transactions when the backup was performed. This will be even more so if there are a very large number of virtual log files (thousands) in the transaction log, as that hugely slows down the mechanism that rolls back uncommitted transactions.

Here’s a list of things you can do to make restoring a full backup go faster:

  • Ensure that instant file initialization is enabled on the SQL Server instance performing the restore operation, to avoid spending time zero-initializing any data files that must be created. This can save hours of downtime for very large data files.
  • Consider backup compression, which can speed up both backup and restore operations, and save disk space and storage costs.
  • Consider using multiple backup files, each on a separate volume. SQL Server will recognize this situation and use parallel write threads (one per volume) to write to the files during the backup, and to read from them during the restore – speeding things up. If you have multiple database data files, a similar I/O parallelism will occur – providing even more of a speed boost. (A striped backup is sketched after this list.)
  • Try to avoid having long-running transactions that will take time to roll back.
  • Manage your transaction log to avoid having an excessive number of virtual log files, so if there are transactions to roll back, the roll back will go as fast as possible. See this blog post for more details.
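
Putting a couple of these together, here’s a sketch (hypothetical names, paths, and drive letters): a compressed full backup striped across two files on separate volumes, plus a quick check of whether instant file initialization is enabled (the instant_file_initialization_enabled column of sys.dm_server_services is only present in newer builds of SQL Server):

  -- Stripe a compressed full backup across two files on separate volumes so
  -- SQL Server can write (and later read) them in parallel.
  BACKUP DATABASE [SalesDB]
  TO DISK = N'E:\Backups\SalesDB_Full_1.bak',
     DISK = N'F:\Backups\SalesDB_Full_2.bak'
  WITH COMPRESSION, CHECKSUM, STATS = 10;

  -- Check whether instant file initialization is enabled for the engine service.
  SELECT servicename, instant_file_initialization_enabled
  FROM sys.dm_server_services;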

Hope this helps!