When I start working with a client, one question I always ask is whether they are collecting baselines of their SQL Server environment (a shocker, I know). If they are not, I explain why it’s a good idea to start capturing them. And even though I think it’s an easy argument to make, I find I make a better case when I have data to back it up. But how do you make the argument for baseline data, when you don’t have any real data to show?

There is data in SQL Server that you can mine; you just have to know where to find it. If I look at a client system and notice that maintenance tasks keep taking longer and longer, then I might assume it’s due to database growth. Now, if it’s just database integrity checks that are taking longer and longer, that might be a sign that something is wrong. However, that’s out of scope for this post, so let’s stick with the assumption that the database is growing larger over time because data is rarely deleted, only added. Depending on the client’s current storage and the duration of the tasks, I may have some concerns about how much disk space they’re going to need down the road. I really want to trend database growth, among other things, over time. And one way I can approximate growth is by using information from full backups.

When you back up a database, every page that is allocated in the database is copied to the backup. This means you could have a 100GB database with a backup of only 50GB, because only 50GB’s worth of pages are allocated. If my database files are pre-sized, as they hopefully are, then looking at backup size will not tell me anything about the current size of the database files. However, it will tell me about the growth of the data inside them – which is really what I’m after.
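To see the gap between file size and allocated space on a live system, a minimal sketch like the one below may help. It assumes SQL Server 2005 or later and that it runs in the context of the database you care about (AdventureWorks here, to match the example used in this post); the column aliases are my own choice.

```sql
USE AdventureWorks; -- assumption: the database being examined

-- [size] and FILEPROPERTY(..., 'SpaceUsed') are both reported in 8KB pages,
-- so dividing by 128 converts them to MB
SELECT
[name] AS "Logical File",
[type_desc] AS "File Type",
[size]/128.0 AS "File Size MB",
FILEPROPERTY([name],'SpaceUsed')/128.0 AS "Space Used MB"
FROM sys.database_files;
```

The "Space Used MB" value for the data files (type ROWS) is roughly what a full backup has to copy, which is why a pre-sized database can produce a backup much smaller than its files.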

Backup information is stored in msdb, and while it should be removed on a regular basis via a scheduled maintenance task, it is not unusual for at least three to six months of data to exist, if not more. Everything I need for this example I can capture from dbo.backupset, which has one row for every successful backup operation. Here’s my query*:

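-- Average full-backup size per month for one database
SELECT
[database_name] AS "Database",
DATEPART(year,[backup_start_date]) AS "Year",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB",
AVG([compressed_backup_size]/1024/1024) AS "Compressed Backup Size MB",
AVG([backup_size]/[compressed_backup_size]) AS "Compression Ratio"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D' -- full backups only
-- Group by year as well, so the same month from different years is not merged
GROUP BY [database_name],DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date])
ORDER BY DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date]);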

In this query I’m filtering on a specific database, and I’m only looking at full backups (type = ‘D’). Log backups would be interesting to examine as well, but that’s for another post. I’m also aggregating all the full backups for one month. Whether you’re running full backups daily or weekly, I would recommend aggregating the data by month. Trying to look at the changes day-by-day or even week-by-week is too detailed. We want to look at the big picture, and a monthly summary gives us that. Here’s the output for my AdventureWorks database:

[Image: query output for the AdventureWorks database]

Notice that the backup size increases over time, but it’s not linear. If I graph it in Excel, I can really see the trend:

[Image: Excel chart of backup size by month]

Further analysis is natural from this point on – what’s the percent increase each month? Each quarter? Which month had the largest increase? When is the database going to fill up the storage we have allocated currently? In my case, I just want to be able to show that we can get this kind of information, plus a lot more, from SQL Server if we just capture it. And this data supports my point very well. If you want to dig deeper into database growth analysis, I say run with it. :)
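If you do want to dig in, one possible starting point for the month-over-month question is sketched below. It is only an illustration, not part of my original analysis: it assumes SQL Server 2012 or later (for the LAG function), reuses the AdventureWorks filter from the query above, and the CTE name MonthlyBackups is made up for this example.

```sql
-- Assumes SQL Server 2012 or later (LAG is not available in earlier versions)
WITH MonthlyBackups AS
(
SELECT
DATEPART(year,[backup_start_date]) AS [Year],
DATEPART(month,[backup_start_date]) AS [Month],
AVG([backup_size]/1024/1024) AS [BackupSizeMB]
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D' -- full backups only
GROUP BY DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date])
)
SELECT
[Year],
[Month],
[BackupSizeMB] AS "Backup Size MB",
-- Change from the previous month that has a full backup; NULL for the first row
[BackupSizeMB] - LAG([BackupSizeMB]) OVER (ORDER BY [Year],[Month]) AS "Growth MB",
100.0 * ([BackupSizeMB] - LAG([BackupSizeMB]) OVER (ORDER BY [Year],[Month]))
/ NULLIF(LAG([BackupSizeMB]) OVER (ORDER BY [Year],[Month]),0) AS "Growth Percent"
FROM MonthlyBackups
ORDER BY [Year],[Month];
```

Sorting the result by "Growth MB" instead would surface the month with the largest increase; I left the ordering chronological because that is easier to paste into Excel.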

Hopefully you now see how easy it is to use data from SQL Server to make your life easier: the information the above query provides can help you understand database growth and start basic capacity planning. I also hope this information helps to convince you (or your manager) that collecting baseline data can be extremely beneficial, and now’s the time to start. If you need more background, or some queries to get you started, please check out my Baselines series on SQL Server Central. Good luck!

EDIT: *For those of you running SQL Server 2005 or earlier, you will need to exclude the compression information, because the compressed_backup_size column was only added in SQL Server 2008:

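-- Same monthly average, without the compression columns
SELECT
[database_name] AS "Database",
DATEPART(year,[backup_start_date]) AS "Year",
DATEPART(month,[backup_start_date]) AS "Month",
AVG([backup_size]/1024/1024) AS "Backup Size MB"
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
AND [type] = 'D' -- full backups only
-- Group by year as well, so the same month from different years is not merged
GROUP BY [database_name],DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date])
ORDER BY DATEPART(year,[backup_start_date]),DATEPART(month,[backup_start_date]);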