Conference Questions Pot-Pourri #10: Shrinking the database before taking a backup

Here’s a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative. From what you told us though, I think I need to change my nightly backup procedure. I like to get my databases back to as small a size as possible before backing them up, so I run the following commands to do this before taking the full database backup: BACKUP LOG [mydbname] WITH NO_LOG and then DBCC SHRINKDATABASE (‘mydbname’). Could you help me with a better way of doing this? We’re on SQL Server 2005.

And here’s the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we’re talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups – otherwise you’re likely causing yourself more trouble than it’s worth.

By doing BACKUP LOG WITH NO_LOG you’re effectively throwing away log records and removing the possibility of doing any kind of point-in-time or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG – use, abuse, and undocumented trace flags to stop it). If you’re running in the FULL recovery model, and you don’t care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don’t ever use WITH NO_LOG.
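
If you do go down that route, a minimal sketch (the database name and backup path below are just placeholders) looks like this:

-- Assumes [mydbname] is a placeholder for your database name.
-- If point-in-time recovery isn't needed, switch to SIMPLE rather than
-- throwing log records away with BACKUP LOG ... WITH NO_LOG:
ALTER DATABASE [mydbname] SET RECOVERY SIMPLE;

-- If you do need to stay in FULL, keep the log in check with regular
-- log backups instead (path is hypothetical):
-- BACKUP LOG [mydbname] TO DISK = N'D:\Backups\mydbname_log.trn';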

The amount of transaction log that a full backup requires cannot be reduced by truncating the log. The full backup will back up whatever log it requires to enable the restored database to be a transactionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the exact same operation as a database auto-shrink) will cause massive index fragmentation, and will also cause file-system fragmentation of the data files, as they will likely need to grow again after you’ve squeezed all the space out of them. See Auto-shrink – turn it OFF! for more details on the effects.
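
To see the damage for yourself after a shrink, a quick check like this (run in the context of the database; the name is a placeholder) shows the resulting fragmentation levels:

-- Assumes [mydbname] is a placeholder; requires SQL Server 2005 or later.
USE [mydbname];
GO
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;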

If you’re really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool so you’re not affecting the actual database. Remember that SQL Server 2008+ has native backup compression too – see my blog post here for more details.
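
On 2008 and later, for example, a compressed full backup is just one extra option on the BACKUP command (the database name and path here are placeholders):

-- Requires SQL Server 2008 or later; compresses the backup file without
-- touching the database itself.
BACKUP DATABASE [mydbname]
TO DISK = N'D:\Backups\mydbname_full.bak'
WITH COMPRESSION, CHECKSUM, INIT;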

Hope this helps.

9 thoughts on “Conference Questions Pot-Pourri #10: Shrinking the database before taking a backup”

  1. Recently on SSC there was a similar question which I answered like you did. One of the things I have learned from blogs like yours and other resources: don’t shrink! Of course that depends; a TX Log shrink after a one-time unusually large edit is, IMO, acceptable.

    Keep up the writing; I for one enjoy reading your posts, even the ones that are over my head.

  2. Hi

    I agree with what’s been said, but have an interesting case where half my data is in SQL 2000 and the other half in Pervasive. Each day we load the Pervasive data into SQL so that we have a full set of data, done via DTS. This creates very large log files; if left, they grow 80% bigger than the data file. What should we do? Why doesn’t the database clean up the log file automatically?

    Regards
    Jack

  3. Clay – looks interesting – I don’t have time to download the code and read through it but I’m guessing you’re using the VDI interface to get the data in/out of SQL. Good to see a freeware alternative to the pay-for tools available on 2005. Thanks

  4. Hi Jack (not Jack Corbett),

    If your transaction log file *has* to grow (sounds like you’ve got a single very large transaction to insert the new data, rather than splitting it into smaller batches) then SQL Server will not shrink it down again once the transaction log records in it are no longer needed. Managing the size of your log files is a big topic – I’ve just written a feature-length article on it for the Feb 2009 TechNet Magazine – I’ll blog when the article is available.
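
    In the meantime, a rough way to see how big each log is and how much of it is actually in use (the logical file name in the commented-out shrink is hypothetical):

    -- How big is each transaction log, and how much of it is in use?
    DBCC SQLPERF (LOGSPACE);

    -- For a genuine one-off after an unusually large load, a targeted shrink of
    -- just the log file is far less damaging than DBCC SHRINKDATABASE:
    -- DBCC SHRINKFILE (N'mydbname_log', 1024); -- hypothetical name, target size in MB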

    Thanks

  5. Hi Paul,
    We have a db around 3 TB in size which takes around 30 hours to back up using a tool called Backup Exec. We have identified and deleted around 300 GB of data and now have more available space in the data files. However, the backup time has not reduced at all. Any ideas on why deleting large amounts of data has not had any impact on the time taken to back up the db? The db is set to simple recovery mode for backups.
    Thanks
    Chris

    1. I don’t know what Backup Exec is doing. With native backups, deleting 1/10 of the data would translate into roughly 1/10 faster backup, all other things being equal (including amount of log required to be backed up too).

  6. Hi
    Having started to take on more responsibility at work with SQL environments, and not being a DBA myself, I wanted to ask if what I have configured is optimal.

    2-node SQL 2008 cluster. 4 SQL instances. Each instance on its own LUN. About 30 databases on each instance. Each instance houses the db and its log file. System databases for each instance are on separate LUNs.

    All databases are in the FULL recovery model.

    We use NetBackup for backing up SQL. 2 policies configured for each instance. Full backups of the databases are done every evening, and log file backups are set to run every 2 hours between 0800-1800.

    Noticing some db log files are incredibly big. Have occasionally had to set a DB to simple and then run a shrink on the db to get the log files down.

    If shrinking is bad, and I have to leave the recovery model as is, are our backups not the right way of doing this, or what other way should I configure our backups?

    We also have RecoverPoint replicating each LUN to a colo site.

    thanks
    Steve

    1. You need to figure out why the log is growing larger than you want. If it needs to be a certain size, you need to just leave it there, as repeated log shrink/grow/shrink/grow sucks down performance when the log has to grow. I’m also concerned about your instance setup – with two instances per node, they’re going to be fighting for resources, and when a failover occurs you’ll have 4 instances fighting for resources on a single node. With your log file backups set for every two hours, you’re looking at two hours of data loss per database in the event of a disaster – is the business ok with that? And with not doing log backups between 6pm and 8am, everything that occurs between those times will have to be held in the log – contributing to log growth. You also shouldn’t switch to simple as this breaks the log backup chain – there’s no reason to switch to simple to make the log shrink if that’s necessary. Lots of areas for improvement here.
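
       As a rough starting point for the log growth question, something like this shows what is preventing log truncation in each database (SQL Server 2005 and later):

       -- What is stopping the log from being cleared in each database?
       SELECT name, recovery_model_desc, log_reuse_wait_desc
       FROM sys.databases
       ORDER BY name;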
