This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!
There are two very common mistakes that DBAs (accidental or otherwise) make around backups:
- Storing the only copy of the backups on the same I/O subsystem as the databases
- Only keeping the most recent full backup, and differential/log backups since then
Backup Storage
The worst place in the world to store your backups is on the same portion of the I/O subsystem as the databases themselves.
If that I/O subsystem has a problem that causes corruption in the databases, the odds are that the same problem will affect the backups. Similarly, if someone makes a mistake and accidentally re-formats the I/O subsystem, it will destroy not only the databases but the backups too.
How can you recover if your backups have also been corrupted or destroyed? You can’t. Or at least you can’t restore from your backups. I’ve heard of many cases where companies have lost a lot of money, business, respect, and jobs because of backups being co-located with databases.
Backups need to at least be stored on a different I/O subsystem from the databases, on a different server. That way a localized failure will not damage the backups as well.
Even that’s not good enough though. What if the entire data center is damaged or shut down (think about catastrophic power loss without any backup power source)? If the only backups are stored onsite, recovery will not be possible. For this reason, it’s highly advisable to keep copies of your backups at an offsite location that’s easily accessible within a reasonable amount of time.
Note that I said ‘copies of your backups at an offsite location’. You shouldn’t have your backups stored only offsite, because most of the time a disaster is localized and the entire data center is not affected. You just need to cope with the case when it is. So you should have onsite and offsite copies of your backups, to allow for fast access to cope with localized problems as well as slower access when a more comprehensive disaster occurs.
A common backup storage strategy involves first backing up to disk (either directly to a network share, or to local storage followed by a copy to a network share), then copying the backups to an onsite archive system, and finally creating further copies to ship offsite.
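As a rough illustration of the copy step in that strategy, here is a minimal Python sketch that copies one backup file to several destination directories and checks that each copy matches the source byte for byte. The function names and the example paths are hypothetical, not part of any SQL Server tooling, and the hash comparison only shows that the copy is faithful. It says nothing about whether the backup itself can be restored, which still has to be validated separately.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_backup(backup_file, destinations):
    """Copy one backup file into every destination directory and verify each copy.

    Raises IOError if a copy does not match the source file.
    """
    backup_file = Path(backup_file)
    source_hash = sha256_of(backup_file)
    copies = []
    for directory in destinations:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / backup_file.name
        shutil.copy2(backup_file, target)  # copies contents and file timestamps
        if sha256_of(target) != source_hash:
            raise IOError(f"Copy at {target} does not match the source")
        copies.append(target)
    return copies


# Hypothetical usage: a local backup copied to a network share and to a
# staging folder for the offsite shipment (both paths are made up).
# copy_backup(r"D:\Backups\Sales_full.bak",
#             [r"\\backupserver\sql\Sales", r"E:\OffsiteStaging\Sales"])
```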
Whatever your strategy becomes, just don’t have only a single, local copy of your backups, stored with the databases; that’s the worst possible way to do it.
Backup Retention
One of the issues you may have is that you only have limited onsite and offsite storage space for your backups – so how long do you keep your backups for?
Imagine a timeline of backups: F1 L1 L2 D1 L3 L4 D2 L5 L6 D3 L7 L8 F2 L9 L10 D4 L11 L12 D5 L13 L14 D6 L15 L16
You may wonder why you need to keep any backup prior to the most recent full backup, F2. Think about this – what if the F2 full backup is corrupt?
If you haven’t kept any backups prior to F2, without the F2 backup to start your restore sequence, all the backups you have after F2 are completely useless.
If you can’t use the most recent full backup, the next best thing is the full backup you took before that, F1. And to use that full backup, you’ll also need the most recent differential backup that was based on that full backup, D3, plus log backups L7 and L8, to get you up to the point in time around F2, and then log backups L9 to L16 (because differential backups D4-D6 are based off F2, not F1, and so cannot be used).
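To make that reasoning concrete, here is a small Python sketch that works out the restore sequence for the timeline above. It is a simplified model rather than anything SQL Server provides: backups are just labels, a differential is assumed to be based on the full backup immediately before it, and every log backup is assumed to be intact with an unbroken log chain. The names `TIMELINE` and `restore_sequence` are made up for illustration.

```python
TIMELINE = ("F1 L1 L2 D1 L3 L4 D2 L5 L6 D3 L7 L8 "
            "F2 L9 L10 D4 L11 L12 D5 L13 L14 D6 L15 L16").split()


def restore_sequence(timeline, corrupt=frozenset()):
    """Return the backups to restore, in order, to reach the end of the timeline.

    Labels starting with F, D, and L are full, differential, and log backups.
    Assumes all log backups are usable and the log chain is unbroken.
    """
    # Record which full backup each differential is based on.
    base_of = {}
    current_full = None
    for position, label in enumerate(timeline):
        if label.startswith("F"):
            current_full = position
        elif label.startswith("D"):
            base_of[position] = current_full

    # Start from the most recent full backup that is not corrupt.
    usable_fulls = [i for i, label in enumerate(timeline)
                    if label.startswith("F") and label not in corrupt]
    if not usable_fulls:
        raise ValueError("no usable full backup to start the restore")
    start = usable_fulls[-1]
    sequence = [timeline[start]]

    # Add the most recent usable differential based on that full, if any.
    usable_diffs = [i for i, base in base_of.items()
                    if base == start and timeline[i] not in corrupt]
    if usable_diffs:
        start = max(usable_diffs)
        sequence.append(timeline[start])

    # Then every log backup taken after that point, in order.
    sequence.extend(label for label in timeline[start + 1:]
                    if label.startswith("L"))
    return sequence


print(" ".join(restore_sequence(TIMELINE)))
# prints: F2 D6 L15 L16
print(" ".join(restore_sequence(TIMELINE, corrupt={"F2"})))
# prints: F1 D3 L7 L8 L9 L10 L11 L12 L13 L14 L15 L16
```

With F2 intact the restore needs four backups. With F2 corrupt it needs twelve, and none of them can be D4, D5, or D6.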
As you can see, having a corrupt full backup also means having to restore a lot more backups, which will likely blow your downtime requirement. That should be a rare case, though, and one where an exception can be made. You just need to make sure you have the older backups.
So, again, how long do you keep your backups for?
If I had a backup strategy that was weekly full backups, daily differential backups, and log backups every 30 minutes, I would try to keep at least a month’s worth of backups at all times onsite, with three months of backups stored offsite. Do you think that sounds excessive? Unless you’re testing a multitude of restore sequences every week using all the older backups, which is rarely possible, you need to have ‘defense in depth’ – multiple possible alternate restore strategies if multiple older backups prove to be damaged or invalid.
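One detail worth getting right when pruning old backups is where the retained set begins. Deleting everything older than a cutoff date can leave differential and log backups whose base full backup has been removed. Below is a minimal Python sketch of one way to avoid that, assuming a hypothetical catalog of `(kind, taken_at)` entries sorted oldest first. It only proposes deleting backups older than the most recent full backup taken at or before the cutoff, so what is kept always starts with a full backup and covers at least the retention window.

```python
from datetime import datetime, timedelta


def backups_to_delete(backups, now, keep_days):
    """Return the backups that can be dropped while keeping a usable chain.

    backups: list of (kind, taken_at) tuples sorted oldest first, where
    kind is 'full', 'diff', or 'log' (a hypothetical catalog format).
    """
    cutoff = now - timedelta(days=keep_days)

    # Find the most recent full backup taken at or before the cutoff.
    # Everything from that full backup onward must be kept, otherwise the
    # oldest retained differentials and logs would have nothing to build on.
    anchor = None
    for index, (kind, taken_at) in enumerate(backups):
        if kind == "full" and taken_at <= cutoff:
            anchor = index

    if anchor is None:
        return []  # no full backup is old enough, so nothing is safe to drop
    return backups[:anchor]


# Eight weeks of weekly full backups with daily differentials in between.
start = datetime(2024, 1, 1)
catalog = [("full" if day % 7 == 0 else "diff", start + timedelta(days=day))
           for day in range(56)]
doomed = backups_to_delete(catalog, now=start + timedelta(days=56), keep_days=30)
print(len(doomed), catalog[len(doomed)])
# prints: 21 ('full', datetime.datetime(2024, 1, 22, 0, 0))
```

Here a 30-day window ends up keeping 35 days of backups, because the retained set is rounded back to the full backup from day 21.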
Summary
This was the last post in our Accidental DBA series about SQL Server backups. Over the last six posts I’ve explained:
- What the downtime (RTO) and data loss (RPO) requirements are, and why they’re critical to know for your databases
- The three recovery models that can be used, and how they affect the log and log backups
- The three main types of backups (full, differential, and log) and what they’re used for
- How to design a restore strategy that takes into account the RPO and RTO requirements, which then lets you decide what backups you need to perform
- Commonly used BACKUP options and why they’re useful
- The importance of validating backups to make sure that when you need them, they’re usable
- The importance of storing multiple copies of backups, both onsite and offsite, so a disaster doesn’t result in complete data loss
- Finally, the importance of retaining older backups so your disaster recovery strategy can cope with some backups being invalid
I hope this has given you a good grounding in the importance of having a good, well-tested backup strategy. Protecting the data from loss and damage is really one of the most important things that a DBA does.
13 thoughts on “The Accidental DBA (Day 11 of 30): Backups: Backup Storage and Retention”
Hi Paul,
It has been great reading this series of blogs about “The Accidental DBA”. I’m sure it’s helping a lot of people around the world.
I was reading this blog and I think there’s some confusion with this sentence “(because differential backups D4-D6 are based off F1, not F2, and so cannot be used)” shouldn’t it be something like this “(because differential backups D4-D6 are based off F2, not F1, and so cannot be used)”?
I may be wrong but I would like to clarify so that I don’t get the wrong idea.
Congratulations to all of you that made this series.
Vitor is correct. I guess this sentence needs correction
“because differential backups D4-D6 are based off F1, not F2, and so cannot be used”
Since a corrupted full backup is like an ad hoc full backup that breaks the log chain, and since differential backups re-initiate the log chain, why can’t we use F1+D3+L7+L8+D5+L13+…?
No – full backups have no effect on the log backup chain, except initiating it. D5 cannot be used in a restore sequence that begins with F1, as the differential base GUID of D5 specifies F2. Differential backups only restart the log backup chain after a switch to-and-from the simple recovery model.
Hi Paul,
I think your Data Retention example provides a good reason why you might want to choose the full recovery model over the simple recovery model.
If your database is in simple recovery and you take weekly full backups and hourly differential backups, then in most scenarios you can potentially lose up to 1 hour of data. However, if we use your example where the F2 full backup is corrupt, then you can only use F1 and the last differential backup taken before F2. You won’t be able to recover anything from F2 onward. This is potentially almost 1 week of data loss. Am I understanding this correctly?
Absolutely spot on.
Thank you for this very helpful series!
If the SQL Server instance runs on a virtual server on a cloud platform such as Azure, is it sufficient to copy the backups to another cloud platform such as AWS? Or is it advisable to copy to a known physical device too?
As long as you have copies that are physically separate, that’s good enough. But having multiple copies gives extra peace of mind and redundancy too.
At the company I work for, backups need to be kept for 2 years.
I don’t understand the reason for this, but it is a policy regulation.
Could you give more detail on best practice for a backup strategy that combines backups on the local SQL Server with backups to tape?
Thanks
Your backup strategy and retention is dictated by what restores you need to be able to do, and how quickly – there’s no one-size-fits-all.
Good article Paul!
Just noticed a typo at “Note that I said ‘copoes of your backups” –> copoes instead of copies
Oops – fixed – thanks!
You need to be more paranoid.
Backups need to be stored offline in case of a malicious attack. This has been observed in the wild: both criminals and hackers have been known to delete online backups.
My thinking is backup to disk. Clone backup to tape. Write protect the tape and store tapes offsite.