SQLskills SQL101: How can a SELECT cause a database to change?

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

This is an interesting misconception that I was asked about last week (paraphrasing): surely a SELECT operation can’t cause a database to change, because it’s just reading data, not altering it in any way, right?

Well, no, that’s not the case. There are actually quite a few side effects of queries that only read data and never make any data changes (not counting a SELECT … INTO, of course). Here are four that spring to mind…

Statistics Creation

If the database property Auto Create Statistics is set to True, when a query is being compiled and the query optimizer determines that a statistic could be created that would aid the optimization process, it will create that statistic before optimization continues, thus changing the database. Your SELECT statement could cause this to happen.
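
As an illustration, here’s a minimal sketch (the database name is a placeholder, and the second query is run in the context of that database) of checking whether the option is enabled and seeing which statistics the optimizer has auto-created:

-- Is auto-create statistics enabled for this database? (1 = yes)
SELECT name, is_auto_create_stats_on
FROM sys.databases
WHERE name = N'YourDatabase';

-- Statistics the query optimizer created automatically
SELECT OBJECT_NAME(object_id) AS table_name, name AS stats_name
FROM sys.stats
WHERE auto_created = 1;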

Statistics Update

If the database property Auto Update Statistics is set to True, when a query is being compiled and a necessary statistic is determined to be out-of-date, it will be automatically updated before optimization continues, thus changing the database. Your SELECT statement could cause this to happen. Additionally, if the Auto Update Statistics Asynchronously property is enabled, the statistic will be automatically updated, but after the optimization process (so the compiling query doesn’t have to wait).
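
If the synchronous update is delaying compilations, you can switch to the asynchronous behavior, and you can also check how stale a statistic is. A minimal sketch, using placeholder database and table names:

-- Auto Update Statistics must also be ON for the async option to take effect
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- When was each statistic on a table last updated, and how many rows
-- have been modified since then?
SELECT s.name AS stats_name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.YourTable');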

Ghost Cleanup

Ghost cleanup is the funky process for removing deleted records. For all indexes, and for heaps when some form of snapshot isolation is involved, deleting a record just marks it as deleted. After the deleting transaction commits, the deleted record is later removed by a background process called the ghost cleanup task. The interesting thing, though, is that a deleted record is not immediately entered in the task’s list of things to do. It’s usually not until the *next* use of the data file page that the Storage Engine sees that there’s a deleted record and enters it in the task’s to-do list. So your SELECT statement could be that ‘next’ use of a data file page with a recently deleted record on it, causing the record to be picked up and removed by the ghost cleanup task.
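
If you’re curious whether a table has ghost records waiting to be cleaned up, sys.dm_db_index_physical_stats will show them in DETAILED mode – a minimal sketch, with placeholder database and table names:

-- Ghost records are deleted records that haven't been physically removed yet;
-- the ghost columns are only populated in DETAILED mode
SELECT index_id, ghost_record_count, version_ghost_record_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(N'YourDatabase'),
    OBJECT_ID(N'dbo.YourTable'),
    NULL, NULL, N'DETAILED');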

Query Store

From SQL Server 2016 onward, if you have the Query Store enabled, by default every query execution will cause some metrics to be captured and stored in the Query Store’s system tables (which you can query through the various Query Store catalog views). You can stop it from capturing information for every query by setting QUERY_CAPTURE_MODE to AUTO, which causes it to skip “insignificant” queries, although you can’t specify what “insignificant” means. Anyway, your SELECT statement could cause this to happen.
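
Here’s a minimal sketch, with a placeholder database name, of enabling the Query Store with the AUTO capture mode and then looking at the execution metrics it has gathered:

-- Enable Query Store, skipping 'insignificant' queries
ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON (QUERY_CAPTURE_MODE = AUTO);

-- Aggregated execution metrics for the queries it has captured
SELECT q.query_id, qt.query_sql_text, rs.count_executions, rs.avg_duration
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;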

Summary

So as you can see, just because you’re not deliberately making a change in a database, that doesn’t mean that you won’t trigger something else in SQL Server to make a change. And then there’s the Auto Shrink option for a database, which of course should *never* be enabled! (see here for explanation…)
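
If you want to check for that one too, a minimal sketch (the database name in the ALTER is a placeholder):

-- Any databases with auto-shrink enabled?
SELECT name FROM sys.databases WHERE is_auto_shrink_on = 1;

-- Turn it off
ALTER DATABASE [YourDatabase] SET AUTO_SHRINK OFF;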

New class: Immersion Event on Clustering and Availability Groups

We have a third exciting new class debuting this October in Chicago: Immersion Event on Clustering and Availability Groups.

It’s a 2-day class, taught by Jonathan Kehayias, our resident expert on all things AlwaysOn. We’ve seen a huge surge in clients using FCIs and AGs, so this class will be really useful to many organizations, and is a partial replacement for our previous 5-day IEHADR class.

The modules are as follows:

  • Windows Server Failover Clustering Essentials
  • AlwaysOn Failover Clustered Instances
  • AlwaysOn Availability Groups
  • Implementation Case Studies

You can read a more detailed curriculum here and all the class registration and logistical details are here.

We hope to see you there!

SQLskills SQL101: Why is restore slower than backup?

As Kimberly blogged about recently, SQLskills is embarking on a new initiative to blog about basic topics, which we’re calling SQL101. We’ll all be blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One question I get asked every so often is why it can sometimes take longer to restore a database from a full backup than it took to perform the backup in the first place. The answer is that in cases like that, there’s more work to do during the restore process.

A full backup has the following main phases:

  1. Perform a checkpoint.
  2. Read all in-use data from the data files (technically, reading all allocated extents, regardless of whether all 8 pages in the extent are in use).
  3. Read all transaction log from the start of the oldest uncommitted transaction as of the initial checkpoint up to the time that phase 2 finished. This is necessary so the database can be recovered to a consistent point in time during the restore process (see this post for more details).
  4. (Optionally test all page checksums, optionally perform backup compression, and optionally perform backup encryption – see the example command after this list.)
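
For illustration, here’s a hedged example of a backup command that exercises the optional checksum and compression features from phase 4 (the database name and file path are placeholders):

-- Full backup that tests page checksums and compresses the backup
BACKUP DATABASE [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_Full.bak'
WITH CHECKSUM, COMPRESSION, INIT, STATS = 10;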

A full restore has the following main phases:

  1. Create the data files (and zero initialize them if instant file initialization is not enabled).
  2. Copy data from the backup to the data files.
  3. Create the log file and zero initialize it. The log file must always be zero initialized when created (see this post for more details).
  4. Copy transaction log from the backup to the log file.
  5. Run crash recovery on the database.
  6. (Optionally test all page checksums during phase 2, perform decompression if the backup is compressed, and perform decryption if the backup is encrypted – see the example after this list.)
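
And a matching restore example – again just a sketch with placeholder names. WITH STATS reports progress through the copy phases before recovery runs, and CHECKSUM re-tests the page checksums stored in the backup:

-- Restore, re-testing page checksums and reporting progress every 10%
-- (REPLACE overwrites an existing database - only use it when that's intended)
RESTORE DATABASE [YourDatabase]
FROM DISK = N'D:\Backups\YourDatabase_Full.bak'
WITH CHECKSUM, REPLACE, STATS = 10;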

Phase 3 above can often be the longest phase in the restore process, and is proportional to the size of the transaction log. This is done as a separate phase rather than being done in parallel with phases 1 and 2, and for a deep investigation of this, see Bob Ward’s recent blog post.
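
The time for phase 3 is driven by the size of the log file that has to be created, which you can check on the source instance before restoring – a minimal sketch with a placeholder database name:

-- How big is the log file that will have to be created and zero-initialized?
SELECT name, size * 8 / 1024 AS size_mb
FROM sys.master_files
WHERE database_id = DB_ID(N'YourDatabase')
  AND type_desc = N'LOG';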

Phase 5 above might be the longest phase in the restore process if there were any long-running, uncommitted transactions when the backup was performed. This will be even more so if there are a very large number of virtual log files (thousands) in the transaction log, as that hugely slows down the mechanism that rolls back uncommitted transactions.
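
You can count the VLFs ahead of time. On newer versions (roughly SQL Server 2016 SP2/2017 onward) there’s the sys.dm_db_log_info function, and on older versions DBCC LOGINFO returns the same one-row-per-VLF output. A minimal sketch with a placeholder database name:

-- Newer versions: one row per VLF
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID(N'YourDatabase'));

-- Older versions: DBCC LOGINFO also returns one row per VLF
-- DBCC LOGINFO (N'YourDatabase');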

Here’s a list of things you can do to make restoring a full backup go faster:

  • Ensure that instant file initialization is enabled on the SQL Server instance performing the restore operation, to avoid spending time zero-initializing any data files that must be created. This can save hours of downtime for very large data files (see the sketch after this list for one way to check).
  • Consider backup compression, which can speed up both backup and restore operations, and save disk space and storage costs.
  • Consider using multiple backup files, each on a separate volume. SQL Server will recognize this situation and use parallel write threads (one per volume) to write to the files during the backup, and to read from them during the restore – speeding things up. If you have multiple database data files, a similar I/O parallelism will occur – providing even more of a speed boost.
  • Try to avoid having long-running transactions that will take time to roll back.
  • Manage your transaction log to avoid having an excessive number of virtual log files, so if there are transactions to roll back, the roll back will go as fast as possible. See this blog post for more details.
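
Putting a couple of those suggestions together, here’s a hedged sketch: checking whether instant file initialization is enabled for the instance (the instant_file_initialization_enabled column only exists in newer builds of sys.dm_server_services), and a compressed backup striped across two files on separate volumes (all database names and paths are placeholders):

-- Is instant file initialization enabled for the database engine service?
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server (%';

-- Compressed backup striped across two files on separate volumes
BACKUP DATABASE [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_1.bak',
   DISK = N'E:\Backups\YourDatabase_2.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;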

Hope this helps!