Trending Database Growth From Backups

When I start working with a client, one question I always ask is whether they are collecting baselines of their SQL Server environment (a shocker, I know). If they are not, I explain why it’s a good idea to start capturing them. And even though I think it’s an easy argument to make, I find I make a better case when I have data to back it up. But how do you make the argument for baseline data, when you don’t have any real data to show?

There is data in SQL Server that you can mine; you just have to know where to find it. If I look at a client system and notice that maintenance tasks keep taking longer and longer, then I might assume it’s due to database growth. Now, if it’s just database integrity checks that are taking longer and longer, that might be a sign that something is wrong. However, that’s out of scope for this post, so let’s stick with the assumption that the database is growing larger over time because data is rarely deleted, only added. Depending on the client’s current storage and the duration of the tasks, I may have some concerns about how much disk space they’re going to need down the road. I really want to trend database growth, among other things, over time. And one way I can approximate growth is by using information from full backups.

When you back up a database, every page that is allocated in the database is copied to the backup. This means you could have a 100GB database with a backup of only 50GB, because only 50GB worth of pages are allocated. If my database files are pre-sized, as they hopefully are, then looking at backup size will not tell me anything about the current size of the database. However, it will tell me about its growth – which is really what I’m after.
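To see how allocated space compares to file size on your own system, here’s a quick check – a minimal sketch, run in the context of whichever database you’re interested in:

USE [AdventureWorks]; -- your database here
GO
SELECT
[name] AS "File",
[size]/128 AS "File Size MB", -- size is stored in 8KB pages
FILEPROPERTY([name], 'SpaceUsed')/128 AS "Space Used MB" -- allocated pages only
FROM sys.database_files;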

Backup information is stored in msdb, and while it should be removed on a regular basis via a scheduled maintenance task, it is not unusual for at least three to six months of data to exist, if not more. Everything I need for this example I can capture from dbo.backupset, which has one row for every successful backup operation. Here’s my query*:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D' -- full backups only
GROUP BY [database_name],DATEPART(mm,[backup_start_date]);

In this query I’m filtering on a specific database, and I’m only looking at full backups (type = ‘D’). Log backups would be interesting to examine as well, but that’s for another post. I’m also aggregating all the full backups for one month. Whether you’re running full backups daily or weekly, I would recommend aggregating the data by month. Trying to look at the changes day-by-day or even week-by-week is too detailed. We want to look at the big picture, and a monthly summary gives us that. Here’s the output for my AdventureWorks database:

[Output: average full backup size, compressed backup size, and compression ratio by month for AdventureWorks]

Notice that the backup size increases over time, but it’s not linear. If I graph it in Excel, I can really see the trend:

[Chart: AdventureWorks full backup size (MB) by month, trending upward]

Further analysis is natural from this point on – what’s the percent increase each month? Each quarter? Which month had the largest increase? When is the database going to fill up the storage we have allocated currently? In my case, I just want to be able to show that we can get this kind of information, plus a lot more, from SQL Server if we just capture it. And this data supports my point very well. If you want to dig deeper into database growth analysis, I say run with it. 🙂
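As a rough sketch of that kind of analysis, here’s one way to compute month-over-month percent growth with LAG (available in SQL Server 2012 and later):

WITH MonthlyBackups AS
(
SELECT
DATEPART(year,[backup_start_date]) AS [Year],
DATEPART(month,[backup_start_date]) AS [Month],
AVG([backup_size]/1024/1024) AS [Backup Size MB]
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date])
)
SELECT
[Year],
[Month],
[Backup Size MB],
-- percent change versus the prior month's average backup size
([Backup Size MB] - LAG([Backup Size MB]) OVER (ORDER BY [Year],[Month])) * 100.0
/ NULLIF(LAG([Backup Size MB]) OVER (ORDER BY [Year],[Month]),0) AS [Percent Growth]
FROM MonthlyBackups
ORDER BY [Year],[Month];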

Hopefully you now see how easy it is to use data from SQL Server to make your life easier: the information the above query provides can help you understand database growth and start basic capacity planning. I also hope this information helps to convince you (or your manager) that collecting baseline data can be extremely beneficial, and now’s the time to start. If you need more background, or some queries to get you started, please check out my Baselines series on SQL Server Central. Good luck!

EDIT: *For those of you running SQL Server 2005 and below, you will need to exclude compression information:

SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],DATEPART(mm,[backup_start_date]);

23 thoughts on “Trending Database Growth From Backups”

  1. Thanks, this is helpful. As a DBA, I often need to request disk space from network operations, and this can help me track the growth and estimate the future size.

    Also, I would like to track data drive and log drive growth. I guess there is no history table we can query for that; we have to save the data to a table ourselves and then compare over time, correct?

    Thanks
    Anne

  2. Correct, you need to log drive size and space used with a job, over time. In my next article on SQL Server Central I have scripts for that (I don’t have a publish date for it yet though!).
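    In the meantime, here’s a minimal sketch of that kind of collection using sys.dm_os_volume_stats (available in SQL Server 2008 R2 SP1 and later) – the table name is just an example:

    -- example history table; schedule the INSERT as a recurring job
    CREATE TABLE dbo.DriveSpaceHistory
    (
    [CaptureDate] datetime NOT NULL DEFAULT GETDATE(),
    [VolumeMountPoint] nvarchar(256) NOT NULL,
    [TotalMB] bigint NOT NULL,
    [AvailableMB] bigint NOT NULL
    );

    INSERT INTO dbo.DriveSpaceHistory ([VolumeMountPoint], [TotalMB], [AvailableMB])
    SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes/1024/1024,
    vs.available_bytes/1024/1024
    FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;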

  3. Hi Erin,
    My company hosts a database server that serves as the backend to thousands of card readers. Over the last few months, changes have been introduced that have increased the workload exponentially. I have been collecting PerfMon and wait stats baselines, but at irregular intervals; we have monitoring tools, but they are generic Windows Server tools, not targeted at SQL like SCOM, Spotlight and so on.

    I’m not sure if it is really that representative (as SELECT-type query activity is obviously missing), but I wrote a query similar to yours above to get the size of transaction log backups and put the results in an Excel pivot table. The transaction logs grow proportionally to the level of update activity (that is my supposition – I think it’s correct), and I used this to illustrate the increasing transactional load on the server.

    Along the x axis (horizontal time series) I built a table variable of 1-hour time intervals, then cross applied queries to the msdb backup set size info. I can try and find the query if you are interested. It’s horribly slow, but the result set pivots nicely.

    Regards

  4. Hi Erin,
    This isn’t a great query but the results pivot nicely in Excel and can be quite interesting in some cases.

    DECLARE @Start datetime, @End datetime
    DECLARE @Intervals Table
    (
    iDatabase sysname,
    iBegin datetime,
    iEnd datetime
    )

    SET @Start = '1 Dec,2012 00:00'
    SET @End = '31 Dec,2012 23:59'

    -- x axis time intervals
    WHILE @Start < @End
    BEGIN
    INSERT INTO @Intervals (iDatabase, iBegin, iEnd)
    SELECT
    name,
    @Start,
    DATEADD(hour,1,@Start)
    FROM sys.databases
    WHERE database_id > 4

    SELECT @Start = DATEADD(hour,1,@Start)
    END

    SELECT
    y.iBegin AS FromDateTime,
    y.iEnd AS ToDateTime,
    y.idatabase AS DatabaseName,
    SUM(ISNULL(y.BackupSizeMB,0)) AS TranLogBackupSizeMB
    FROM
    (
    SELECT
    iBegin,
    iEnd,
    i.iDatabase,
    x.BackupSizeMB
    FROM @Intervals i
    LEFT OUTER JOIN
    (
    SELECT
    bs.database_name,
    bs.backup_start_date,
    (compressed_backup_size/1024)/1024 AS BackupSizeMB
    FROM msdb.dbo.backupset bs
    INNER JOIN sys.databases d
    ON bs.database_name = d.name
    JOIN msdb.dbo.backupmediafamily mf
    ON mf.media_set_id = bs.media_set_id
    WHERE type = 'L' -- log backups only
    ) x
    ON x.database_name = i.iDatabase
    WHERE ISNULL(x.backup_start_date,i.iBegin) BETWEEN i.iBegin AND i.iEnd

    ) y
    GROUP BY
    y.iBegin,
    y.iEnd,
    y.idatabase

    You need continuous time intervals along the x axis (perhaps not hourly in all cases) to make the result (total transaction log backup size for the time interval, in the above) meaningful. The result can be skewed, though, by changes to transaction log backup job schedules, irregular reindexing jobs, etc.
    This simple ‘time series table variable for x axis intervals’ technique works well with Excel pivot table trend analysis type queries like yours though.
    Best wishes
    Paul

  5. Hi Erin,
    thanks so much for this, it’s exactly what I needed after a request from a business area asking for data so they can proactively trend and prepare funds for any storage requirements in the next year – I wish all the departments we had were this keen!!

    Anyway, I need 12 months of data to go on, and your query groups by month no matter what the year – e.g. I now need Oct 2012 to Oct 2013, but I’m struggling to separate the year and month together; at the moment only the month is separated out, using DATEPART. Any suggestions to point me in the right direction on how to do this, please?

    Best Wishes

    Ian

    1. Hi Ian-

      Thanks for reading! You can group by month and year if you add DATEPART(year,[backup_start_date]) to your SELECT and to your GROUP BY. That should give you the info you need – let me know if it doesn’t.
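      Something like this (a quick, untested sketch):

      SELECT
      [database_name] AS "Database",
      DATEPART(year,[backup_start_date]) AS "Year",
      DATEPART(month,[backup_start_date]) AS "Month",
      AVG([backup_size]/1024/1024) AS "Backup Size MB"
      FROM msdb.dbo.backupset
      WHERE [database_name] = N'AdventureWorks'
      AND [type] = 'D'
      GROUP BY [database_name],DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date]);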

      Thanks,

      Erin

  6. Hi Erin, yes that has done the trick – thanks so much, this blog post has given me some great information that I have passed on to the business :o)

  7. Great query. I added just a little, so it will sort by month and year, changed MB to GB and lopped off decimal places. CAST was to shorten the column width as I was exporting to a text file and mailing to myself. Also just looking at the current year, but could easily be commented out for all history. Someone else might find it useful.

    SELECT
    CAST([database_name] as varchar(20)) AS "Database",
    convert(varchar(7),[backup_start_date],120) AS "Month",
    STR(AVG([backup_size]/1024/1024/1024),5,2) AS "Backup Size GB",
    STR(AVG([compressed_backup_size]/1024/1024/1024),5,2) AS "Compressed Backup Size GB",
    STR(AVG([backup_size]/[compressed_backup_size]),5,2) AS "Compression Ratio"
    FROM msdb.dbo.backupset
    WHERE [database_name] = N'Prod01'
    AND [type] = 'D'
    AND backup_finish_date > '2015-01-01'
    GROUP BY [database_name],convert(varchar(7),[backup_start_date],120)
    ORDER BY [database_name],convert(varchar(7),[backup_start_date],120);

  8. Nice post, Erin. Did you also try to integrate tlog growth with database growth to get a combined results chart?

    1. Hi-

      I don’t – not sure what you’re looking for here or how you’re thinking this would be done with full backup info? I also expect that the tlog is static in size. If it’s not for some reason, then I would monitor/trend that differently.

      Erin

  9. Does this script work if backups are taken with a backup agent such as Backup Exec or NetAgent? Is the data in msdb still accurate?

  10. SELECT
    [database_name] AS "Database",
    DATEPART(month,[backup_start_date]) AS "Month",
    AVG([backup_size]/1024/1024) AS "Backup Size MB",
    AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
    AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
    FROM msdb.dbo.backupset
    WHERE [database_name] = N'AdventureWorks'
    AND [type] = 'D'
    GROUP BY [database_name],DATEPART(mm,[backup_start_date]);

    Just want to let you know, the script does not give results if I use [DBName] – it should be DBName, without square brackets.

    A very minute thing, but I faced it 🙂

  11. One database of 1+ TB in size (log file not counted) has 5 data files, of which 2 are active (one for data and another for indexes, each on a different drive). The index file (let’s say it is 20GB initially) almost tripled to 60GB in 6 months, while the active data file (initially at 100GB) doubled in the same period. If capacity planning uses the whole database size as a starting point, the final prediction will be skewed toward insufficient capacity in another 6 months. But if we use individual file sizes as a starting point, we will get more accurate predictions.
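    For per-file trending like this, msdb also records each file’s size with every backup in dbo.backupfile – a rough sketch (the database name is just an example):

    SELECT
    bs.[database_name] AS "Database",
    bf.[logical_name] AS "File",
    DATEPART(year,bs.[backup_start_date]) AS "Year",
    DATEPART(month,bs.[backup_start_date]) AS "Month",
    AVG(bf.[file_size]/1024/1024) AS "File Size MB"
    FROM msdb.dbo.backupset bs
    JOIN msdb.dbo.backupfile bf ON bf.[backup_set_id] = bs.[backup_set_id]
    WHERE bs.[database_name] = N'AdventureWorks'
    AND bs.[type] = 'D'
    GROUP BY bs.[database_name],bf.[logical_name],DATEPART(year,bs.[backup_start_date]),DATEPART(month,bs.[backup_start_date]);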
    And Happy New Year!

  12. SELECT
    CAST([database_name] as varchar(20)) AS 'Database',
    convert(varchar(7),[backup_start_date],120) AS 'Month',
    STR(AVG([backup_size]/1024/1024/1024),5,2) AS 'Backup Size GB',
    STR(AVG([compressed_backup_size]/1024/1024/1024),5,2) AS 'Compressed Backup Size GB',
    STR(AVG([backup_size]/[compressed_backup_size]),5,2) AS 'Compression Ratio'
    FROM msdb.dbo.backupset
    WHERE [database_name] IN (SELECT name FROM sys.databases WHERE database_id > 4) -- N'Prod01'
    AND [type] = 'D'
    AND backup_finish_date > '2015-01-01'
    GROUP BY [database_name],convert(varchar(7),[backup_start_date],120)
    ORDER BY [database_name],convert(varchar(7),[backup_start_date],120);

    1. Hi Mirza-

      I assume this is a question – if you need to trend database growth per day, then change the date field (Month) in the SELECT to use something like DATEPART to get Month, Day, and Year information and display the growth information by day.
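      A rough sketch of that change – grouping by day with CONVERT to date (one simple alternative to DATEPART):

      SELECT
      [database_name] AS "Database",
      CONVERT(date,[backup_start_date]) AS "Day",
      AVG([backup_size]/1024/1024) AS "Backup Size MB"
      FROM msdb.dbo.backupset
      WHERE [type] = 'D'
      GROUP BY [database_name],CONVERT(date,[backup_start_date])
      ORDER BY [database_name],CONVERT(date,[backup_start_date]);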

      Erin
