Files and filegroups survey results

Last month I kicked off a survey asking you to run some code to send me data on the size, number of files, and number of filegroups in your databases. I got back more than 17000 results, and I’m presenting a couple of ways of interpreting the data here. Interestingly, I only had a handful of results for databases above 100GB and for databases with more than 100 files, so to keep the graphs readable, I’ve chosen to exclude those.

Single Filegroup

First off, for databases with only a primary filegroup, how many files were there in that filegroup?

  • 24 files: 3 databases
  • 20 files: 1 database
  • 16 files: 3 databases
  • 15 files: 1 database
  • 12 files: 1 database
  • 10 files: 1 database
  • 8  files: 10 databases
  • 7 files: 1 database
  • 6 files: 50 databases
  • 5 files: 6 databases
  • 4 files: 56 databases
  • 3 files: 27 databases
  • 2 files: 67 databases
  • 1 file: 16121 databases

Unsurprisingly, single file plus single filegroup is the most common physical layout. We see this over and over, regardless of the size of the database.

As your databases start to increase in size, you need to consider splitting them up (note I’m not using the overloaded term ‘partitioning’) to allow you to do:

  • Targeted restores in the event of wholesale data loss, helping to reduce downtime. If you have multiple filegroups, you can potentially do a partial restore of only the data required to get the OLTP portion of your workload up and running, restoring the rest of the filegroups later.
  • Targeted maintenance to reduce the time and resources necessary to manage fragmentation. If your indexes are split of multiple filegroups (using partitioning) you can rebuild or reorganize just the index portion that has fragmentation.
  • Targeted performance management. If your workload uses/affects multiple portions of  your database, it may be beneficial to place those different portions of the database on different sections of your I/O subsystem.

None of these things can be done with a single file plus single filegroup database.

Data Files vs. Filegroups

FilesFilegroups Files and filegroups survey results

This is really interesting. The majority of databases that had more than one filegroup had the number of data files equal to the number of filegroups. I was prepared for this, with quite a few respondents pointing this out in their data and explaining that the database is from a third-party vendor application.

As a general rule of thumb, I recommend each filegroup having 2-4 data files, as this will give overall better I/O performance than a single data file. I’ve seen this over and over on client systems and you can see quite a few data points on the graph above reflecting that too. I also have some empirical evidence from various performance tests I’ve done (narrow scenarios, but definite proof-points):

Note that I didn’t include tempdb in this survey. Tempdb is a whole different kettle of fish (excellent British phrase!), where multiple data files can be required to alleviate in-memory contention for allocation bitmaps (classic PAGELATCH_UP/EX contention). For tempdb guidelines see: Tempdb configuration survey results and advice.

Data Files vs. Database Size

FilesDBSize Files and filegroups survey results

This is also really interesting. Without knowing what these databases are being used for, it seems that many of these databases have too many files for the size of the database (my gut feel, based on experience). My guess is that the file/filegroup layout was chosen based on rules that don’t equate to the normal reasons for having multiple file/filegroups, as I described above. What’s particularly surprising is the number of database less than 1-GB in size that have many, many data files.

Summary

So how many data files and filegroups should you have? There’s no right answer, and the sweet spot is going to vary for all of you.

Some general rules of thumb:

  • As your databases get larger, it becomes more likely that you’re going to need multiple files and filegroups
  • Multiple filegroups give you enhanced possibilities for targeted disaster recovery, easier manageability, and I/O subsystem placement
  • Each filegroup should have 2-4 files at least, with tempdb being a special case

Whatever you do, don’t forget to test to figure out the optimal configuration for your workload.

The Accidental DBA (Day 11 of 30): Backups: Backup Storage and Retention

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

There are two very common mistakes that DBAs (accidental or otherwise) make around backups:

  1. Storing the only copy of the backups on the same I/O subsystem as the databases
  2. Only keeping the most recent full backup, and differential/log backups since then

Backup Storage

The worst place in the world to store your backups is on the same portion of the I/O subsystem as the databases themselves.

If that I/O subsystem has a problem that causes corruption in the databases, the odds are that the same problem will affect the backups. Similarly, if someone makes a mistake and accidentally re-formats the I/O subsystem, not only will it destroy the databases but also the backups too.

How can you recover if your backups have also been corrupted or destroyed? You can’t. Or at least you can’t restore from your backups. I’ve heard of many cases where companies have lost a lot of money, business, respect, and jobs because of backups being co-located with databases.

Backups need to at least be stored on a different I/O subsystem from the databases, on a different server. That way a localized failure will not damage the backups as well.

Even that’s not good enough though. What if the entire data center is damaged or shut down (think about catastrophic power loss without any backup power source)? If the only backups are stored onsite, recovery will not be possible. For this reason, it’s highly advisable to keep copies of  your backups at an offsite location that’s easily accessible with a reasonable amount of time.

Note that I said ‘copoes of your backups at an offsite location’. You shouldn’t have your backups stored only offsite, because most of the time a disaster is localized and the entire data center is not affected. You just need to cope with the case when it is. So you should have onsite and offsite copies of your backups, to allow for fast access to cope with localized problems as well as slower access when a more comprehensive disaster occurs.

A common backup storage strategy involves first backing up locally to disk storage, to a network share or to local storage and then copied to a network share, and then copying the backups to an archive system onsite plus creating copies to ship offsite.

Whatever your strategy becomes, just don’t have only a single, local copy of your backups, stored with the databases; that’s the worst possible way to do it.

Backup Retention

One of the issues you may have is that you only have limited onsite and offsite storage space for your backups – so how long do you keep your backups for?

Imagine a timeline of backups: F1 L1 L2 D1 L3 L4 D2 L5 L6 D3 L7 L8 F2 L9 L10 D4 L11 L12 D5 L13 L14 D6 L15 L16

You may wonder why you need to keep any backup prior to the most recent full backup, F2. Think about this – what if the F2 full backup is corrupt?

If you haven’t kept any backups prior to F2, without the F2 backup to start your restore sequence, all the backups you have after F2 are completely useless.

If you can’t use the most recent full backup, the next best thing is the full backup you took before that, F1. And to use that full backup, you’ll also need the most recent differential backup that was based on that full backup, D3, plus log backups L7 and L8, to get you up to the point in time around F2, and then log backups L9 to L16 (because differential backups D4-D6 are based off F1, not F2, and so cannot be used).

As you can see, having a corrupt full backup also means having to restore a lot more backups, which will likely blow your downtime requirement, but that should be a rare case and an exception should be able to be made. You just need to make sure you have the older backups.

So, again, how long do you keep your backups for?

If I had a backup strategy that was weekly full backups, daily differential backups, and log backups every 30 minutes, I would try to keep at least a month’s worth of backups at all times onsite, with three months of backups stored offsite. Do you think that sounds excessive? Unless you’re testing a multitude of restore sequences every week using all the older backups, which is rarely possible, you need to have ‘defense in depth’ – multiple possible alternate restore strategies if multiple older backups prove to be damaged or invalid.

Summary

This was the last post in our Accidental DBA series about SQL Server backups. Over the last six posts I’ve explained:

  • What the downtime (RTO) and data loss (RPO) requirements are, and why they’re critical to know for your databases
  • The three recovery models that can be used, and how they affect the log and log backups
  • The three main types of backups (full, differential, and log) and what they’re used for
  • How to design a restore strategy that takes into account the RPO and RTO requirements, which then lets you decide what backups you need to perform
  • Commonly used BACKUP options and why they’re useful
  • The importance of validating backups to make sure that when you need them, they’re usable
  • The importance of storing multiple copies of backups, both onsite and offsite, so a disaster doesn’t result in complete data loss
  • Finally, the importance of retaining older backups so your disaster recovery strategy can cope with some backups being invalid

I hope this has given you a good grounding in the importance of having a good, well-tested backup strategy. Protecting the data from loss and damage is really one of the most important things that a DBA does.

The Accidental DBA (Day 10 of 30): Backups: Backup Testing for Validation

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Kimberly and I have a saying: “You don’t have a backup until you’ve restored it!”. While this is not technically true, the sentiment is that you should be validating your backups, so that you have a good sense of confidence that when a disaster strikes and you go to use your backups, your disaster recovery will be successful.

What do I mean by ‘validating your backups’? I mean that at prescribed intervals you should emulate a disaster recovery response and perform a full restore sequence on a test server, involving your full, differential, and log backups. I usually advise doing this at least once a week, which also happens to be the most common interval between performing full backups. There’s really no frequency that’s too frequent for backup validation, but there are definitely those that are not frequent enough.

There are all kinds of excuses that people have for not regularly validating their backups, including:

  • I don’t have time
  • I don’t have the necessary disk space
  • That’s not my job
  • I don’t think it’s important

All of these excuses are invalid in my opinion. As a DBA, accidental or otherwise, it is part of y our job to protect the integrity of the database for which you’re responsible. Simply performing backups is not enough. You’re also responsible for making sure that those backups are valid, and not just at the time they were taken, but as time passes as well because an I/O subsystem can just as easily corrupt a backup on disk as it can corrupt data and log files on disk. It’s extremely important that, when a disaster strikes, you’re able to successfully use your backups – otherwise what’s the point of having them? You can’t ever get a guarantee (as I/O subsystem corruption can happen at any time) but you can get increased peace of mind.

I’ve lost count of the number of times I’ve heard of people with database corruption that also have corrupt backups, or the backups have been failing for months, or their restore sequence is broken because of a missing log backup, or some other variation of backups not working when they’re needed. You don’t want that to happen to you, and it’s really so easy to test your backups even just occasionally. If you think you don’t have enough disk space, you could even just restore to some external USB drives, anything will work as long as it lets you do the restore.

Now the question is, what RESTORE options do you want to use? At the very least, you want the restore system to verify the contents of the backup. Hopefully you’re using the CHECKSUM option for your backups (that I discussed on day 9) which validates page checksums as the pages are read from the data files and also creates a checksum of the entire backup file. If so, you at the very least you should do something like:

RESTORE VERIFYONLY
FROM DISK = N'F:\SQLskills\20130609_SalesDB_Full.bck'
WITH CHECKSUM;
GO

That will re-validate all the page checksums of the pages in the database in the backup, and also recalculate the checksum of the entire backup and validate it against the checksum stored in the backup header. This doesn’t take any disk space at all, and will let you know if your backup is corrupt.

What’s even better is to actually perform the restore operations, using the CHECKSUM option at all times, and then running consistency checks (through DBCC CHECKDB that we’ll discuss later in the month) to make sure that all the backups you would use are valid, and the database itself is free of corruptions.

If you do find you’ve got corrupt backups, you know you’re going to have to take new backups or potentially go backup to older backups that you’ve kept around for just this eventuality. Which brings up another issue: backup retention. How long do you keep backups around for, and where do you keep them? I’ll talk about that tomorrow.

The Accidental DBA (Day 9 of 30): Backups: Essential BACKUP Options

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

The BACKUP command has a lot of different options so in today’s post I want to introduce a few of the most useful options to you.

COMPRESSION

Backup compression was added in SQL Server 2008 in Enterprise Edition, and then in Standard Edition as well from SQL Server 2008 R2 onward. Not only can backup compression make your backups smaller, it can also speed up your backup operations AND speed up your restore operations. In that way, backup compression can be thought of as an availability feature, as anything that reduces downtime makes the data more available. The trade-off is that compressing and decompressing backups uses more CPU (you don’t get anything for free!) but most SQL Server systems these days are I/O-bound rather than CPU-bound so this is usually not a problem.

You can read more about my thoughts on backup compression in this post, including whether to enable it by default for the entire SQL Server instance. If you’re going to use it, make sure that the backups you’re using it for actually get a decent compression ratio (e.g. 20% or above) otherwise the extra CPU being used isn’t worth it.

COPY_ONLY

Once you’ve figured out what backups you need to satisfy your downtime and data-loss requirements (see the post from day 8), you’ll either have manually created SQL Agent jobs to perform regular backups, used one of the free database maintenance solutions available online (like this one, which I like to recommend), or created a database maintenance plan using SQL Server Management Studio.

If at any time you want to take a one-off, out-of-band backup, you need to be very careful how you do it. If  you perform a full backup, that full backup becomes the basis for subsequent differential backups. If you perform a log backup, that becomes part of the log backup chain. In both cases, these backups may be required during a disaster recovery operation.

To avoid having one-off backups affecting your differential backups or your log backup chain, you can use the COPY_ONLY option. For a full backup, this option makes the backup NOT become the basis for subsequent differential backups, and for a log backup, this option make the backup NOT become part of the log backup chain.

On any database where automated, regular backups are being performed, always use the COPY_ONLY option unless you have a very good reason not to. Better to be safe than sorry.

DESCRIPTION and File Names

It’s a really good practice to ensure your backup files a descriptively named – e.g. 20130609_1330_SalesDB_Log.bak – and that you set the DESCRIPTION option appropriately. Using the DESCRIPTION field is especially important if you perform a manual backup, so you have some idea of why the backup was taken and what it includes.

Note that my example file name ended with .bak. You can call your files anything you want, but .bak or .bck are commonly for data backups, and .trn is commonly used for log backups.

CHECKSUM

This option does two things:

  1. As the data file pages are being read from the data files, their page checksums are validated. If an invalid checksum is found, by default the backup will fail and report a message indicating which page is corrupt.
  2. Calculate a checksum over the entire backup and store it in the backup header.

As an aside, page checksums are a checksum written to each data file page just before it is written to disk and checked when a page is read from disk. It’s a way for SQL Server to determine that the I/O subsystem has corrupted a page.

It’s extremely important that your backups are valid and do not contain corruption, so this option should definitely be used on all backups you perform, manual or automated. By ensuring that the backup contains valid pages when the backup is performed, the backup can be verified later with more extensive checking. In tomorrow’s post I’ll discuss backup validation in more depth.

STATS

This option makes the BACKUP command print progress information as it executes. The default is to print every 10% through the operation and you can change that by using STATS = X, where X is the percentage interval between progress prints. This is especially useful when you’re performing manual backups of large databases and you want some indication that progress is being made. Incidentally, this option also applies to the RESTORE command too.

Summary

This post has just scratched the surface – you can get more information on all the BACKUP options from SQL Server Books Online here. Make sure that you know what your backup commands are doing, and especially make sure that each backup is going to a new file, which is named descriptively, and that existing files are not being overwritten using the INIT option.

Additionally, and we’re not covering it in this series but we do cover it in the Immersion Event, you should investigate the RESTORE command (see Books Online here)  and make sure you’ve tried basic restore operations – that is the reason you’re taking backups after all!

The Accidental DBA (Day 8 of 30): Backups: Planning a Recovery Strategy

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Now that I’ve explained the downtime (RTO) and data loss (RPO) requirements plus the various recovery models and backup types, I want to explain how to design a recovery strategy.

When you’re designing a recovery strategy, the big temptation is to start by designing a backup strategy.

Don’t do it!

You need to design a restore strategy, not a backup strategy.

It’s all to do with framing your thinking around being able to restore while meeting your downtime and data-loss requirements.

  • Your downtime requirement tells you how quickly you must be able to restore, while meeting your data loss requirement.
  • Your data loss requirement tells you the point in time to which you must be able to restore, while meeting your downtime requirement.

If you think about it from the backup perspective first, you might design a backup strategy that does NOT allow you to meet your downtime and data-loss requirements.

A Real-Life Example

Here’s a story I like to tell when I’m teaching the consistency checking and corruption recovery section of our IE3 Immersion Event on High Availability and Disaster Recovery. Back in April 2004, when I still worked on the SQL Server team at Microsoft, I was helping out in a Product Support case involving a bank that had suffered corruption as part of a crash. They had a single full backup from January 2004 and half-hourly log backups between January and April, over 5000 backups in total. All the backups restored perfectly, but their restore ended up taking three days. All bank operations were down while the restore was taking place, and this extended outage eventually led to the bank going out of business because of a lack of client confidence.

What were they thinking, you may ask? How could they have designed that backup strategy?

Well that was their problem. They did not have much SQL Server experience and they thought they had a great backup strategy:

  • They had a full backup
  • They were running the full recovery model
  • They were taking regular log backups

Their mistake was that they had not taken their downtime (8 hours maximum) and data loss (zero) requirements into account. They met their data loss requirement but not their downtime requirement, and they had never tested doing a restore so had no idea that their requirements wouldn’t be met. They should have designed a restore strategy.

Designing a Restore Strategy

It’s imperative that you work out how quickly you have to be able to restore, and to what point, and then use that information to figure out what backups you need to do so you can perform the correct restore sequence when disaster strikes.

As far as data loss is concerned, the amount of data loss that can be tolerated defines how frequently you need to perform log backups. If 15 minutes is the acceptable amount of data loss, then you must perform log backups at least every 15 minutes. If zero is the acceptable amount of data loss, relying on log backups is dangerous, as a disaster may destroy the data and log files, making a tail-of-the-log . In that case, you will need some kind of synchronous technology like database mirroring or AlwaysOn availability groups – which we’ll discuss on day 22 of this series.

Data loss is the easier requirement to satisfy, as it’s really just thinking about log backup frequency. The downtime requirement is a lot harder, in my opinion.

As far as downtime is concerned, the amount of downtime that can be tolerated defines how quickly the entire restore sequence has to complete, from the initial restore of the most recent full backup, through to restoring the final log backup that gets the database to a point in time within the data loss requirement.

This means that you’re going to have to test the restore sequence to make sure that in the worst case it’s possible to restore within the downtime requirement.

Example Scenario

For example, with a 30-minute data-loss requirement and a 2 hour downtime requirement, you might design a restore strategy that is:

  • Perform a tail-of-the-log backup
  • Restore from the most recent full backup
  • Restore all log backups since the most recent full backup

And then set the full backup frequency at every day at midnight and the log backup frequency at every half hour. The full backup takes 20 minutes to complete and each log backup takes 5 minutes to complete.

The worst case scenario is that a crash occurs at 23:59:59. This means the restore sequence is (with some estimated completion times):

  • Perform the tail-of-the-log backup to backup the log generated since the log backup at 23:30. This takes 5 minutes.
  • Restore the full backup from midnight. This takes 20 minutes.
  • Restore the 48 log backups from today, starting with the one at 00:00 and ending with the one at 23:30. Each one takes 5 minutes, for total time of 240 minutes.
  • Restore the tail-of-the-log backup. This takes 5 minutes.

In total, the restore sequence takes 270 minutes, or 4 hours 30 minutes. This is clearly longer than the downtime requirement.

The solution is to add differential backups during each day, say every 4 hours starting at midnight, with each one taking 10 minutes to complete.

The restore sequence then becomes:

  • Perform the tail-of-the-log backup to backup the log generated since the log backup at 23:30. This takes 5 minutes.
  • Restore the full backup from midnight. This takes 20 minutes.
  • Restore the most recent differential backup from 20:00. This takes 10 minutes.
  • Restore the 8 log backups from today, starting with the one at 20:00 and ending with the one at 23:30. Each one takes 5 minutes, for total time of 40 minutes.
  • Restore the tail-of-the-log backup. This takes 5 minutes.

Now the total restore sequence takes 80 minutes, or 1 hour 20 minutes. This is within the downtime requirement.

Summary

As you can see from my (admittedly contrived) example, putting together a recovery strategy involves:

  1. Defining downtime and data-loss requirements for the database
  2. Working out what restores are necessary to meet #1
  3. Working out what backups are necessary to meet #2
  4. Testing the restore sequence to ensure it meets #1
  5. Potentially going back to #3 if the restore sequence does not meet #1 (or push back on the downtime requirement or provision a high-availability solution that allows a fast failover to a redundant copy of the database)

If the bank from 2004 had even done #4 then they’d have discovered that their backup strategy was inadequate and added in more frequent full and differential backups.

Don’t make the mistake they did – you always want to meet your downtime and data-loss requirements when disaster strikes.

The Accidental DBA (Day 7 of 30): Backups: Recovery Models and Backup Types

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

In today’s post I want to introduce you to the three recovery models a database can use, and to the various kinds of backups that are possible with SQL Server.

Before that, I need to explain a little bit about the transaction log that exists in each database. Whenever a transaction makes a change to a database, a record of the change is made in the transaction log, using log records to describe the change. Each transaction that’s making changes can either commit, making those changes durable in the database, or it can roll back, which removes the effects of the changes from the database.

If the system crashes, databases undergo crash recovery, which makes sure that:

  • For all transactions that were committed at the time of the crash, their effects are present in the database. This may mean that some log records are redone – re-applying the described changes to the database.
  • For all transactions that were not committed at the time of the crash, their effects are not present in the database. This may mean that some log records are undone – applying changes that reverse the described changes in the database (essentially doing the anti-operations, like deleting a table row that was inserted by an uncommitted transaction).

Crash recovery makes sure that the database is transactionally-consistent. The same set of operations occur when a database is restored as part of disaster recovery, using the transaction log records present in your various backups.

The transaction log needs to be managed to ensure it doesn’t grow out of control, by periodically allowing the contents of the log to be cleared, so the space can be reused for more log records. “Clearing” the log is sometimes referred to as “truncating” the log and they both mean the same thing: some portion of the log will be marked as available for reuse.

Back in 2009 I wrote an long, introductory article for TechNet Magazine that discusses all of this in more detail – see Understanding Logging and Recovery in SQL Server - and if you want really comprehensive information on the transaction log, see my 7.5 hour, online Pluralsight course: SQL Server: Logging Recovery, and the Transaction Log.

The way this log clearing is triggered depends on the recovery model that is used for the database.

Recovery models

There are three recovery models:

  • Full recovery model (the default and the most commonly used)
    • All modifications in the database a fully logged
    • The log will not clear until a transaction log backup is performed
  • Bulk-logged recovery model
    • Some modifications (like an index rebuild or a bulk load) can be minimally logged, which reduces the amount of log records generated
    • The log will not clear until a transaction log backup is performed, just like the full recovery model
  • Simple recovery model
    • Some modifications can be minimally logged, just like the bulk-logged recovery model
    • The log will not clear until a checkpoint occurs (usually automatically), and transaction log backups are not possible

The main thing to remember is that if your database uses the full or bulk-logged recovery model, you must perform periodic transaction log backups or the transaction log will grow forever.

Backup types

There are three commonly used backup types in SQL Server:

  • Full database backup
  • Transaction log backup
  • Differential database backup

There are also smaller-granularity data backups that are much less commonly used:

  • Full filegroup backup
  • Full file backup
  • Differential filegroup backup
  • Differential file backup

I’m going to focus on the three common backup types, as these are the ones you’ll be using as an Accidental DBA.

Full database backup

A full database backup (usually just referred to as a ‘full backup’, skipping the word ‘database’) backs up the entire database, and a full backup must be performed before you can perform other types of backups. When you’re performing a restore operation, the first backup you restore from must be a full backup, as that is the basis for other backups to be restored.

A full backup backs up all the data from the data files plus some transaction log records from the transaction log. This is to ensure that when restoring the full backup, the transaction log records allow the restored database to be brought to a transactionally-consistent point. Contrary to popular myth, full backups to not allow the transaction log to clear, and in fact do not affect the transaction log in any way whatsoever.

A full backup allows you to restore the database to a single point in time – the end of the backup – or to continue restoring more differential or log backups to bring the restored database more up-to-date.

Full backups are usually performed once per day or once per week.

Transaction log backup

A transaction log backup (usually just referred to as a ‘log backup’, skipping the word ‘transaction’) backs up all the transaction log records generated since the most recent log backup occurred. When a log backup completes, log clearing will be performed, which allows the log to be reused, helping to prevent log growth. A log backup cannot be performed until a full backup has been performed, but once a full backup has been performed, log backups and full backups do not interact or have any effect on each other at all.

A log backup allows to you continue a restore operation started by restoring a full backup, with each log backup that you restore bringing the restored database more up-to-date.

Log backups are usually performed once or multiple times per hour.

There is a special kind of log backup called a ‘tail-of-the-log’ or ‘tail-log’ backup that is performed when recovering from a disaster. The idea is that the very first thing you do after a disaster occurs is to take a final log backup, capturing all the transaction log that was generated since the most recent scheduled log backup before the disaster. During a restore sequence, this log backup will be the last one to be restored, bringing the database as close as possible to the point of the disaster.

Differential database backup

A differential database backup (usually just referred to as a ‘differential backup’ or ‘diff backup’, skipping the word ‘database’) backs up all the data that has changed since the most recent full backup. As more and more data changes in the database since the most recent full backup, successive differential backups will get larger and larger. Differential backups are cumulative, not incremental (which is a common misconception, as other database systems do have incremental data backups – SQL Server does not).

Differential backups are used to speed up restores by essentially replacing a number of log backups, as a differential backup is essentially the net effect of all the log backups since the most recent full backup.

Differential backups are not so commonly used, but when they are, they are usually performed once or multiple times per day.

Summary

Most of you will be using the full recovery model, which means you need to perform log backups regularly. But how regularly? A proper backup strategy takes into account the RTO and RPO requirements to determine what backup types to use, and how often to perform them. And that’s the topic for tomorrow’s post!

The Accidental DBA (Day 6 of 30): Backups: Understanding RTO and RPO

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

For the next six posts in the series I’ll be helping you understand one of the most important areas of responsibility for people looking after SQL Server: ensuring there are sufficient backups to allow recovery if a disaster occurs.

What do I mean by ‘disaster’? Anything that causes damage to your database files – generally a problem with the I/O subsystem (your disks/SAN/storage). If the database files are damaged, you need to have some way of being able to recover the damaged data, otherwise it could mean the company losing time, business, clients, money, and possibly endangering your job.

You might think that a disaster won’t happen to your company, but I/O subsystem problems happen very commonly. Even if you have some kind of redundant copy of the database using one of SQL Server’s high-availability technologies (which we’ll cover later in this blog post series), you still need to have backups in case the high-availability technology fails. It’s like the common mantra about I.T. security –  “defense in depth” – the more layers of defense against disaster your have, the better.

To start with I want to cover a few concepts that will help frame your thinking about what backups to perform. Today I’ll discuss the industry terms RTO and RPO.

RTO stands for Recovery Time Objective. In simple terms, you can think of RTO as a measure of how much downtime is acceptable, or how quickly must the data be made accessible again. RTO is often talked about in terms of the number of nines of desirable up time or accessibility for the data/database/system. For example, 5-nines means 99.999% up-time, and if that’s being measured 24 hours a day, 365 days a year, then that translates into just over 5 minutes of allowable downtime per year. That’s really hard to achieve. It’s easier to meet 4-nines (52.5 minutes per year of downtime) or 3-nines (8.75 hours per year of downtime).

When discussing the desired up time, you have to decide whether it’s being measured 24×365, or 9am-5pm during weekdays, or some other time period. You also need to decide whether the measured downtime includes scheduled downtime for maintenance/patching or not, as it’s much easier to meet a high number of nines if scheduled maintenance is permitted.

You can read a little bit more about downtime SLAs in a blog post survey I conducted back in 2009.

RPO stands for Recovery Point Objective. Again, in simple terms, you can think of RPO as a measure of how much data or work it’s acceptable to lose. It’s relatively easy to achieve very minimal or even zero data/work loss using backups, but depending on the amount of damage the database suffered when the disaster hit, recovering might take a lot of time. For instance, if an entire database is destroyed, depending on the architecture of the database and the backups that exist, it may take a significant amount of time to recover the database up to the point of the disaster. Most RPOs are defined as the the amount of time for which work may be lost. For example, an RPO might be that the database should be recoverable to a point within thirty minutes of the disaster occurring, which means that up to 30 minutes of work may be lost.

Both of these terms are called SLAs – Service Level Agreements. For each database for which you’re responsible, you need to find out what the RTO and RPO requirements are, as you’re responsible for meeting them. This means pressing the business managers/owners for business-driven decisions on what acceptable RTO and RPO values are – these are not numbers that can be decided without considering the business implications of downtime and data loss.

The business management will likely want to have as little downtime and data loss as possible when a disaster occurs and it’s part of your job to figure out what’s possible using backups (and maybe high-availability technologies as well, if the budget permits) and make sure that the business management agrees on what’s practical to achieve with the resources you have available.

Tomorrow I’ll talk about recovery models and backup types, and then on day 8 I’ll discuss how to create a backup strategy that helps to minimize downtime and data loss.

Survey: what SQL Server build number are you running?

Time for another survey (I’ll editorialize the files/filegroups data from the last survey next week).

This survey’s very simple, run the following code on as many of your instances as you can, and post a comment with the results or send me plaintext/spreadsheet output in an email. Please do not add any more info to the results (like server name) as it’s not relevant for this survey and adds a bunch of time to the results processing.

SELECT @@VERSION AS [Version];
GO
Version
------------------------------------------------------------------------------------------
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)
	Jun 17 2011 00:54:03
	Copyright (c) Microsoft Corporation
	Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

I’ll editorialize the results in a couple of weeks.

Thanks!

New Immersion Event for Accidental/Involuntary/Junior DBAs

As I announced in our Insider newsletter last week, we’ve designed some new Immersion Events to expand the curriculum we offer.

We have a new Immersion Event for the Accidental DBA! This is a three-day class taught by Jonathan Kehayias and Erin Stellato.

This course will help you learn how to keep the lights on so that SQL Server meets the business requirements for availability and performance, and is especially suited to:

  • A server administrator that has to maintain SQL Server as a part of the infrastructure
  • An administrator for a line-of-business application such as SharePoint, Microsoft CRM, Great Plains, Dynamics, or Biztalk
  • An administrator for one or many non-Microsoft systems that rely on SQL Server
  • A junior DBA just getting started with SQL Server

The first public class will be offered in Bellevue, WA on September 30 – October 2, 2013 – which is co-located with a new Advanced Transact-SQL class that Bob will be teaching that week.

The class costs US$2,245, with an early-bird price of US$1,945 available until August 15th (all our classes have the same per-day cost, regardless of length).

The agenda for the course is:

  • Module 1: Hardware Considerations
  • Module 2: Installing and Configuring SQL Server
  • Module 3: Security
  • Module 4: Backup and Restore
  • Module 5: Disaster Recovery vs. High Availability
  • Module 6: Database Maintenance
  • Module 7: Monitoring

You can get class information, a detailed agenda, and registration information HERE.

It’s pretty cool to see how excited everyone on the team is about these new classes – we’re looking forward to giving you all the best value for your training budget!

See you there!

New Immersion Event on Advanced Transact-SQL

As I announced in our Insider newsletter last week, we’ve designed some new Immersion Events to expand the curriculum we offer.

We have a new Immersion Event on Advanced Transact-SQL! This is a four-day class taught by Bob Beauchemin, our resident developer expert.

The first public class will be offered in Bellevue, WA on September 30 – October 3, 2013 – which is co-located with a new class that Erin and Jon will be teaching that week (announcement tomorrow).

The class costs US$2,995, with an early-bird price of US$2,595 available until August 15th (all our classes have the same per-day cost, regardless of length).

The agenda for the course is:

  • Module 1: Data Types, Functions and Operators, and Scaling
  • Module 2: Writing Effective Queries
  • Module 3: Summarization and Analysis
  • Module 4: Optimizing Procedural Code – Stored Procedures
  • Module 5: Optimizing User-Defined Functions and Triggers
  • Module 6: Best practices with Action Statements
  • Module 7: Solving Difficult Problems in Relational Databases

You can get class information, a detailed agenda, and registration information HERE.

This is very exciting – we hope to see you there!