The Accidental DBA (Day 18 of 30): Baselines

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

Baselines are a part of our normal, daily life.  It usually takes 25 minutes to get to work?  Baseline.  You need 7 hours of sleep each night to feel human and be productive?  Baseline.  Your weight is…  Ok, we won’t go there, but you get my point.  Your database server is no different; it has baselines as well.  As a DBA, it’s critical that you know what they are and how to use them.

The why…

“But wait,” you say, “why do I need baselines for my server?  It’s always working so there’s no commute, it hopefully never sleeps, and its weight never changes (so unfair).”  You need them; trust me.  A baseline of your database server:

  • Helps you find what’s changed before it becomes a problem
  • Allows you to proactively tune your databases
  • Allows you to use historical information when troubleshooting a problem
  • Provides data to use for trending of the environment and data
  • Captures data – actual numbers – to provide to management, and both server and storage administrators, for resource and capacity planning

There are many viable reasons to capture baselines.  The challenge is the time it takes to figure out where to store the information, what to capture, and when to capture it.  You also need to create methods to report on that data and really use it.

Where to store your baseline data

You’re a DBA or developer, and you know T-SQL, so the most obvious place to store your baseline information is in a database.  This is your chance to not only exercise your database design skills, but also put your DBA skills to work on your own database.  Beyond design, you need space for the database, you need to schedule regular backups, and you also want to verify integrity and perform index and statistics maintenance regularly.  Most of the posts that have appeared in this Accidental DBA series apply to this database just as they do to your production databases.

To get you started, here’s a CREATE DATABASE script that you can use to create a database to hold your baseline data (adjust file locations as necessary, and file size and growth settings as you see fit):

USE [master];
GO

CREATE DATABASE [BaselineData] ON PRIMARY
( NAME = N'BaselineData',
  FILENAME = N'M:\UserDBs\BaselineData.mdf',
  SIZE = 512MB,
  FILEGROWTH = 512MB
) LOG ON
( NAME = N'BaselineData_log',
  FILENAME = N'M:\UserDBs\BaselineData_log.ldf',
  SIZE = 128MB,
  FILEGROWTH = 512MB
);

ALTER DATABASE [BaselineData] SET RECOVERY SIMPLE;
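
As noted above, this database needs backups and integrity checks like any other.  A minimal sketch of a maintenance routine might look like the following – the backup path is a placeholder, so adjust it, and your schedule, for your environment:

-- Sketch: back up the baseline database and verify its integrity on a regular schedule
BACKUP DATABASE [BaselineData]
TO DISK = N'M:\Backups\BaselineData.bak'  -- placeholder path
WITH INIT, CHECKSUM;
GO

DBCC CHECKDB (N'BaselineData') WITH NO_INFOMSGS;
GO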

What to capture

Now that you have a place to store your data, you need to decide what information to collect.  It’s very easy to start capturing baseline data with SQL Server, particularly in version 2005 and higher.  DMVs and catalog views provide a plethora of information to accumulate and mine.  Windows Performance Monitor is a built-in utility used to log metrics related to not just SQL Server but also the resources it uses such as CPU, memory, and disk.  Finally, SQL Trace and Extended Events can capture real-time query activity, which can be saved to a file and reviewed later for analysis or comparison.

It’s easy to get overwhelmed with all the options available, so I recommend starting with one or two data points and then adding on over time.  Data file sizes are a great place to start.  Acquiring more space for a database isn’t always a quick operation; it really depends on how your IT department is organized – and also depends on your company having unused storage available.  As a DBA, you want to avoid the situation where your drives are full, and you also want to make sure your data files aren’t auto-growing.

With the statements below, you can create a simple table that will list each drive and the amount of free space, as well as the snapshot date:

USE [BaselineData];
GO
IF EXISTS ( SELECT  1
     FROM    [sys].[tables]
     WHERE   [name] = N'FreeSpace' )
  DROP TABLE [dbo].[FreeSpace];

CREATE TABLE [dbo].[FreeSpace] (
   [LogicalVolume] NVARCHAR(256),
   [MBAvailable] BIGINT,
   [CaptureDate] SMALLDATETIME
)
ON [PRIMARY];
GO

Then you can set up a SQL Agent job to capture the data at a regular interval with the query below:

INSERT INTO [dbo].[FreeSpace](
   [LogicalVolume],
   [MBAvailable],
   [CaptureDate]
)
SELECT DISTINCT
   ([vs].[logical_volume_name]),
   ([vs].[available_bytes] / 1048576),
   GETDATE()
FROM [sys].[master_files] AS [f]
CROSS APPLY [sys].[dm_os_volume_stats]([f].[database_id],[f].[file_id]) AS [vs];
GO

There is a catch with the above query – it’s only applicable if you’re running SQL Server 2008 R2 SP1 and higher (including SQL Server 2012).  If you’re using a previous version, you can use xp_fixeddrives to capture the data:

INSERT INTO [dbo].[FreeSpace](
   [LogicalVolume],
   [MBAvailable]
)
EXEC xp_fixeddrives;

UPDATE [dbo].[FreeSpace]
SET [CaptureDate] = GETDATE()
WHERE [CaptureDate] IS NULL;
GO
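
If you’ve never created an Agent job in T-SQL, here’s a rough sketch using the msdb job procedures – the job name, schedule, and 6:00 AM run time are just examples, and the step command is the 2008 R2 SP1 and higher query from above:

USE [msdb];
GO

-- Example only: a daily job that captures the free space snapshot (names and schedule are placeholders)
EXEC [dbo].[sp_add_job]
   @job_name = N'Capture Free Space';

EXEC [dbo].[sp_add_jobstep]
   @job_name = N'Capture Free Space',
   @step_name = N'Insert free space snapshot',
   @subsystem = N'TSQL',
   @database_name = N'BaselineData',
   @command = N'INSERT INTO [dbo].[FreeSpace] ([LogicalVolume], [MBAvailable], [CaptureDate])
SELECT DISTINCT [vs].[logical_volume_name], [vs].[available_bytes] / 1048576, GETDATE()
FROM [sys].[master_files] AS [f]
CROSS APPLY [sys].[dm_os_volume_stats]([f].[database_id], [f].[file_id]) AS [vs];';

EXEC [dbo].[sp_add_schedule]
   @schedule_name = N'Daily 6 AM',
   @freq_type = 4,                -- daily
   @freq_interval = 1,            -- every day
   @active_start_time = 060000;   -- 6:00 AM

EXEC [dbo].[sp_attach_schedule]
   @job_name = N'Capture Free Space',
   @schedule_name = N'Daily 6 AM';

EXEC [dbo].[sp_add_jobserver]
   @job_name = N'Capture Free Space';
GO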

Capturing free space is a great start, but if you’ve pre-sized your database files (which is recommended) the free space value probably won’t change for quite a while.  Therefore, it’s a good idea to capture file sizes and the available space within each file as well.  You can find scripts to capture this information in my Capturing Baselines on SQL Server: Where’s My Space? article.
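
The full scripts are in that article, but as a rough idea of the approach, the sketch below uses sys.database_files and FILEPROPERTY to return each file’s size and used space for the current database (both values are reported in 8KB pages, hence the division by 128 to get MB):

-- Sketch: size and space used within each file of the current database
SELECT
   [name] AS [LogicalFileName],
   [size] / 128 AS [FileSizeMB],
   FILEPROPERTY([name], 'SpaceUsed') / 128 AS [SpaceUsedMB],
   GETDATE() AS [CaptureDate]
FROM [sys].[database_files];
GO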

When to capture

Deciding when to collect data depends on the data itself.  For the file and disk information, the data doesn’t change often enough that you need to collect it hourly.  Daily is sufficient – perhaps even weekly if the systems are low volume.  If you’re capturing Performance Monitor data, however, you would collect at shorter intervals, perhaps every 1 minute or every 5 minutes.  For any data collection, you have to find the right balance between capturing often enough to accumulate the interesting data points, and not gathering so much data that it becomes unwieldy and hard to find what’s really of value.

Separate from the interval at which to capture, for some data you also need to consider the timeframes.  Performance Monitor is a great example.  You may decide to collect counters every 5 minutes, but then you have to determine whether you want to sample 24×7, only on weekdays, or only during business hours.   Or perhaps you only want to capture metrics during peak usage times.  When in doubt, start small.  While you can always change your collection interval and timeframe later on, it’s much easier to start small to avoid getting overwhelmed, rather than collect everything and then try to figure out what to remove.

Using baseline data

Once you’ve set up your process for data capture, what’s next?  It’s very easy to sit back and let the data accumulate, but you need to be proactive.  You won’t want to keep data forever, so put a job in place that will delete data after a specified time.  For the free space example above, it might make sense to add a clustered index on the [CaptureDate] column, and then purge data older than three months (or six months – how long you keep the data will depend on how you’re using it).
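
As a sketch, that might look something like the following – the three-month cutoff is just an example, and the index only needs to be created once:

-- One-time: cluster the table on CaptureDate so date-range queries and the purge are efficient
CREATE CLUSTERED INDEX [CI_FreeSpace_CaptureDate]
ON [dbo].[FreeSpace] ([CaptureDate]);
GO

-- Scheduled purge: remove rows older than three months (adjust retention as needed)
DELETE FROM [dbo].[FreeSpace]
WHERE [CaptureDate] < DATEADD(MONTH, -3, GETDATE());
GO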

Finally, you need to use that data in some way.  You can simply report on it – the query below will give you free disk information for a selected volume for the past 30 days:

SELECT
   [LogicalVolume],
   [MBAvailable],
   [CaptureDate]
FROM [dbo].[FreeSpace]
WHERE [LogicalVolume] = 'C'
   AND [CaptureDate] > GETDATE() - 30
ORDER BY [CaptureDate];
GO

This type of query is great for trending and analysis, but to take full advantage of the data, add a second step to your daily Agent job that queries the current day’s values and sends you an email when there is less than 10GB of free space, notifying you that disk space is low.
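
A minimal sketch of that alerting step is below, assuming Database Mail is already configured – the profile name, recipient, and 10GB threshold are placeholders to adjust for your environment:

-- Sketch: notify if today's snapshot shows less than 10GB free on any volume
-- Assumes Database Mail is configured; profile name and recipient are placeholders
IF EXISTS ( SELECT 1
            FROM [dbo].[FreeSpace]
            WHERE [CaptureDate] >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)  -- midnight today
              AND [MBAvailable] < 10240 )  -- 10GB expressed in MB
BEGIN
  EXEC [msdb].[dbo].[sp_send_dbmail]
     @profile_name = N'DBA Mail Profile',     -- placeholder
     @recipients = N'dba@yourcompany.com',    -- placeholder
     @subject = N'Low disk space warning',
     @body = N'At least one volume has less than 10GB free.  Check dbo.FreeSpace for details.';
END;
GO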

Getting started

At this point you should have a basic understanding of baselines, and you have a few queries to get you started.  If you want to learn more you can peruse my Baselines series on SQLServerCentral.com, and for an in-depth review, you can head over to Pluralsight to view my SQL Server: Benchmarking and Baselines course.  Once you’ve set up your baselines, you will be ready to explore quick methods to review or process the data.  There are many free tools that a DBA can use to not only see what happens in real-time, but also review captured data for analysis and trending.  In tomorrow’s post, we’ll look at a few of those utilities in more detail.
