When I start working with a client, one question I always ask is whether they are collecting baselines of their SQL Server environment (a shocker, I know). If they are not, I explain why it’s a good idea to start capturing them. And even though I think it’s an easy argument to make, I find I make a better case when I have data to back it up. But how do you make the argument for baseline data, when you don’t have any real data to show?
There is data in SQL Server that you can mine; you just have to know where to find it. If I look at a client system and notice that maintenance tasks keep taking longer and longer, then I might assume it’s due to database growth. Now, if it’s just database integrity checks that are taking longer and longer, that might be a sign that something is wrong. However, that’s out of scope for this post, so let’s stick with the assumption that the database is growing larger over time because data is rarely deleted, only added. Depending on the client’s current storage and the duration of the tasks, I may have some concerns about how much disk space they’re going to need down the road. I really want to trend database growth, among other things, over time. And one way I can approximate growth is by using information from full backups.
When you back up a database, every page that is allocated in the database is copied to the backup. This means you could have a 100GB database with a backup of only 50GB, because only 50GB’s worth of pages are allocated. If my database files are pre-sized, as they hopefully are, then looking at backup size will not tell me anything about the current size of the database. However, it will tell me about the growth of it – which is really what I’m after.
Backup information is stored in msdb, and while it should be removed on a regular basis via a scheduled maintenance task, it is not unusual for at least three to six months of data to exist, if not more. Everything I need for this example I can capture from dbo.backupset, which has one row for every successful backup operation. Here’s my query*:
SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name], DATEPART(month,[backup_start_date]);
In this query I’m filtering on a specific database, and I’m only looking at full backups (type = ‘D’). Log backups would be interesting to examine as well, but that’s for another post. I’m also aggregating all the full backups for one month. Whether you’re running full backups daily or weekly, I would recommend aggregating the data by month. Trying to look at the changes day-by-day or even week-by-week is too detailed. We want to look at the big picture, and a monthly summary gives us that. Here’s the output for my AdventureWorks database:
Notice that the backup size increases over time, but it’s not linear. If I graph it in Excel, I can really see the trend:
Further analysis is natural from this point on – what’s the percent increase each month? Each quarter? Which month had the largest increase? When is the database going to fill up the storage we currently have allocated? In my case, I just want to be able to show that we can get this kind of information, plus a lot more, from SQL Server if we just capture it. And this data supports my point very well. If you want to dig deeper into database growth analysis, I say run with it!
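If you do want to run with it, here is one way the month-over-month percent increase could be computed – a sketch only, assuming SQL Server 2012 or later (for LAG); the CTE and column names here are my own, not from the post:

```sql
-- Monthly average full backup size, plus percent growth vs. the prior month.
WITH MonthlySize AS
(
    SELECT
        DATEPART(year,[backup_start_date]) AS BackupYear,
        DATEPART(month,[backup_start_date]) AS BackupMonth,
        AVG([backup_size]/1024/1024) AS BackupSizeMB
    FROM msdb.dbo.backupset
    WHERE [database_name] = N'AdventureWorks'
    AND [type] = 'D'
    GROUP BY DATEPART(year,[backup_start_date]), DATEPART(month,[backup_start_date])
)
SELECT
    BackupYear,
    BackupMonth,
    BackupSizeMB,
    -- NULL for the first month, since there is no prior value to compare against
    100.0 * (BackupSizeMB - LAG(BackupSizeMB) OVER (ORDER BY BackupYear, BackupMonth))
        / NULLIF(LAG(BackupSizeMB) OVER (ORDER BY BackupYear, BackupMonth), 0) AS PercentGrowth
FROM MonthlySize
ORDER BY BackupYear, BackupMonth;
```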
Hopefully you now see how easy it is to use data from SQL Server to make your life easier: the information the above query provides can help you understand database growth and start basic capacity planning. I also hope this information helps to convince you (or your manager) that collecting baseline data can be extremely beneficial, and now’s the time to start. If you need more background, or some queries to get you started, please check out my Baselines series on SQL Server Central. Good luck!
EDIT: *For those of you running SQL Server 2005 and below, you will need to exclude compression information:
SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name], DATEPART(month,[backup_start_date]);
23 thoughts on “Trending Database Growth From Backups”
Thanks, this is helpful. As a DBA, I often need to request disk space from network operations. And this can help me to track the growth and estimate the future size.
Also, I would like to track the growth of the database data drive and log drive. I guess there is no history table that we can query for that – we have to save the data to a table ourselves and then compare against it over time, correct?
Thanks
Anne
Correct, you need to log drive size and space used with a job, over time. In my next article on SQL Server Central I have scripts for that (I don’t have a publish date for it yet though!).
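For anyone looking for a starting point while waiting on that article, one option (assuming SQL Server 2008 R2 SP1 or later, where sys.dm_os_volume_stats is available) is to capture volume totals on a schedule and persist them to a table – a minimal sketch:

```sql
-- Current total and free space for every volume hosting a database file.
-- Log this output to a table on a scheduled job to trend drive growth over time.
SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes/1024/1024/1024 AS [Total GB],
    vs.available_bytes/1024/1024/1024 AS [Available GB],
    GETDATE() AS [Capture Date]
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;
```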
Hi Erin,
My company hosts a database server that serves as the backend to thousands of card readers. Over the last few months, changes have been introduced that have increased the workload exponentially. I have been collecting PerfMon and wait-stats baselines, but at irregular intervals; we have monitoring tools, but they are generic Windows-server focused, not targeted at SQL Server like SCOM, Spotlight, and so on.
I’m not sure how representative it really is (SELECT-type query activity is obviously missing), but I wrote a query similar to yours above to get the size of transaction log backups and put the results in an Excel pivot table. My supposition is that the transaction logs grow proportionally to the level of update activity (I think that is correct), and I used this to illustrate the increasing transactional load on the server.
Along the x axis (horizontal time series) I built a table variable of one-hour time intervals, then cross applied queries to the msdb backup set size info. I can try and find the query if you are interested. It’s horribly slow, but the result set pivots nicely.
Regards
This one? http://www.sqlservercentral.com/articles/baselines/94656/
Hi Erin,
This isn’t a great query but the results pivot nicely in Excel and can be quite interesting in some cases.
DECLARE @Start datetime, @End datetime;

DECLARE @Intervals TABLE
(
iDatabase sysname,
iBegin datetime,
iEnd datetime
);

SET @Start = '1 Dec,2012 00:00';
SET @End = '31 Dec,2012 23:59';

-- x axis time intervals
WHILE @Start < @End
BEGIN
INSERT INTO @Intervals (iDatabase, iBegin, iEnd)
SELECT
name,
@Start,
DATEADD(hour,1,@Start)
FROM sys.databases
WHERE database_id > 4;

SELECT @Start = DATEADD(hour,1,@Start);
END

SELECT
y.iBegin AS FromDateTime,
y.iEnd AS ToDateTime,
y.iDatabase AS DatabaseName,
SUM(ISNULL(y.BackupSizeMB,0)) AS TranLogBackupSizeMB
FROM
(
SELECT
i.iBegin,
i.iEnd,
i.iDatabase,
x.BackupSizeMB
FROM @Intervals i
LEFT OUTER JOIN
(
SELECT
bs.database_name,
bs.backup_start_date,
(bs.compressed_backup_size/1024)/1024 AS BackupSizeMB
FROM msdb.dbo.backupset bs
INNER JOIN sys.databases d
ON bs.database_name = d.name
INNER JOIN msdb.dbo.backupmediafamily mf
ON mf.media_set_id = bs.media_set_id
WHERE bs.type = 'L'
) x
ON x.database_name = i.iDatabase
WHERE ISNULL(x.backup_start_date, i.iBegin) BETWEEN i.iBegin AND i.iEnd
) y
GROUP BY
y.iBegin,
y.iEnd,
y.iDatabase;
You need continuous time intervals along the x axis (perhaps not hourly in all cases) to make the result (in the above, total transaction log backup size for the time interval) meaningful. The result can be skewed, though, by changes to transaction log backup job schedules, irregular reindexing jobs, etc.
This simple ‘time series table variable for x axis intervals’ technique works well with Excel pivot table trend analysis type queries like yours though.
Best wishes
Paul
Belated thanks for the query Paul!!
In my case, the backup_size and compressed_backup_size are the same.
Hi Ahmad-
If you check BOL (http://technet.microsoft.com/en-us/library/ms186299.aspx), [compressed_backup_size] is:
Total Byte count of the backup stored on disk
If you’re not using compression, then this will be the same as [backup_size].
Erin
Hi Erin,
thanks so much for this – it is exactly what I needed after a request from a business area that wants data so they can proactively trend and prepare funds for any storage requirements in the next year – I wish all the departments we had were this keen!!
Anyway, I need 12 months of data to go on, and your query groups by month no matter what the year – e.g. I now need Oct 2012 to Oct 2013, but I am struggling to separate the year and month together, whereas only the month is separated at the moment using DATEPART. Any suggestions on pointing me in the right direction, please?
Best Wishes
Ian
Hi Ian-
Thanks for reading! You can group by month and year if you add DATEPART(year,[backup_start_date]) to your SELECT and to your GROUP BY. That should give you the info you need – let me know if it doesn’t.
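For reference, the full query with that change might look like this – a sketch based on the query from the post; adjust the database name for your environment:

```sql
-- Average full backup size per month, separated by year as well as month.
SELECT
    [database_name] AS "Database",
    DATEPART(year,[backup_start_date]) AS "Year",
    DATEPART(month,[backup_start_date]) AS "Month",
    AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name],
    DATEPART(year,[backup_start_date]),
    DATEPART(month,[backup_start_date])
ORDER BY [database_name],
    DATEPART(year,[backup_start_date]),
    DATEPART(month,[backup_start_date]);
```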
Thanks,
Erin
Hi Erin, yes that has done the trick – thanks so much, this blog post has provided me with some great information i have provided to the business :o)
Ian – Glad to hear it, thanks for circling back and letting me know!
Great query. I added just a little, so it will sort by month and year, changed MB to GB and lopped off decimal places. CAST was to shorten the column width as I was exporting to a text file and mailing to myself. Also just looking at the current year, but could easily be commented out for all history. Someone else might find it useful.
SELECT
CAST([database_name] AS varchar(20)) AS "Database",
CONVERT(varchar(7),[backup_start_date],120) AS "Month",
STR(AVG([backup_size]/1024/1024/1024),5,2) AS "Backup Size GB",
STR(AVG([compressed_backup_size]/1024/1024/1024),5,2) AS "Compressed Backup Size GB",
STR(AVG([backup_size]/[compressed_backup_size]),5,2) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'Prod01'
AND [type] = 'D'
AND backup_finish_date > '2015-01-01'
GROUP BY [database_name], CONVERT(varchar(7),[backup_start_date],120)
ORDER BY [database_name], CONVERT(varchar(7),[backup_start_date],120);
Nice post, Erin. Did you also try to integrate tlog growth with database growth to get a combined results chart?
Hi-
I don’t – not sure what you’re looking for here or how you’re thinking this would be done with full backup info? I also expect that the tlog is static in size. If it’s not for some reason, then I would monitor/trend that differently.
Erin
Does this script work if backups are taken with a backup agent such as as Backup Exec or NetAgent? Is the data in MSDB still accurate?
Chris-
I’ve never tested with Backup Exec or NetAgent, so I’m not sure how it gets logged in msdb (but I expect there would be an entry in there).
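One way to see what was logged is to look at the media family for each backup set: backups taken through VDI-based third-party agents typically record a GUID, rather than a file path, in physical_device_name. This is a sketch against the documented msdb tables; verify the output against your agent before relying on it:

```sql
-- Show where each full backup was written; a GUID in physical_device_name
-- (with device_type = 7, virtual device) usually indicates a third-party agent.
SELECT
    bs.database_name,
    bs.backup_start_date,
    bs.backup_size/1024/1024 AS backup_size_mb,
    bmf.device_type,
    bmf.physical_device_name
FROM msdb.dbo.backupset bs
INNER JOIN msdb.dbo.backupmediafamily bmf
    ON bs.media_set_id = bmf.media_set_id
WHERE bs.type = 'D'
ORDER BY bs.backup_start_date DESC;
```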
SELECT
[database_name] AS "Database",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name], DATEPART(month,[backup_start_date]);
Just want to let you know, the script does not return results if I use [DBName] as-is – it needs to be the actual database name, not the placeholder in square brackets.
A very minor thing, but I ran into it 🙂
One database of 1+ TB in size (log file not counted) with 5 data files, of which 2 are active (one for data and another for indexes, each on a different drive). The index file (let’s say it is 20GB initially) almost tripled to 60GB in 6 months, while the active data file (initially at 100GB) doubled in the same period. If capacity planning uses the whole database size as a starting point, the final prediction will be skewed toward insufficient capacity in another 6 months. But if we use individual file sizes as a starting point, we will get more accurate predictions.
SELECT
CAST([database_name] AS varchar(20)) AS 'Database',
CONVERT(varchar(7),[backup_start_date],120) AS 'Month',
STR(AVG([backup_size]/1024/1024/1024),5,2) AS 'Backup Size GB',
STR(AVG([compressed_backup_size]/1024/1024/1024),5,2) AS 'Compressed Backup Size GB',
STR(AVG([backup_size]/[compressed_backup_size]),5,2) AS 'Compression Ratio'
FROM msdb.dbo.backupset
WHERE [database_name] IN (SELECT name FROM sys.databases WHERE database_id > 4) --N'Prod01'
AND [type] = 'D'
AND backup_finish_date > '2015-01-01'
GROUP BY [database_name], CONVERT(varchar(7),[backup_start_date],120)
ORDER BY [database_name], CONVERT(varchar(7),[backup_start_date],120);
thank you
I need to check database growth on a daily basis – is there any query for that?
Hi Mirza-
I assume this is a question – if you need to trend database growth per day, then change the date field (Month) in the SELECT to use something like DATEPART to get Month, Day, and Year information and display the growth information by day.
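Concretely, the per-day variant of the post’s query could look like this – a sketch; the CONVERT to the date type assumes SQL Server 2008 or later:

```sql
-- Average full backup size per calendar day instead of per month.
SELECT
    [database_name] AS "Database",
    CONVERT(date,[backup_start_date]) AS "Backup Date",
    AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D'
GROUP BY [database_name], CONVERT(date,[backup_start_date])
ORDER BY [database_name], CONVERT(date,[backup_start_date]);
```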
Erin