Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?
No such code exists as far as I know – until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland, so I started playing around, and this morning I completed the code.
The code below creates a function and a stored procedure. The basic idea behind the code is as follows:
For each online data file in the database
    For each GAM interval in the file
        Crack the DIFF map page using DBCC PAGE
        Interpret the DIFF bitmap to aggregate the changed extents
        Add the sum to the total changed extents for the database
    End
End
Report results
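To make the traversal concrete, here is a small Python sketch (an illustration only, not part of the T-SQL script) of how the DIFF map pages are located: the first DIFF page of each data file is page 6, and another appears at the same offset in every subsequent GAM interval of 511,232 pages.

```python
def diff_page_ids(file_size_pages):
    """Return the page IDs of the DIFF map pages in a data file.

    The first DIFF page lives at page 6 of the file; one more appears
    at the same offset in each subsequent GAM interval (511,232 pages).
    """
    GAM_INTERVAL = 511232  # pages covered by one GAM extent
    extent_id = 0
    pages = []
    while extent_id < file_size_pages:
        pages.append(extent_id + 6)  # DIFF page is 6 pages into the interval
        extent_id += GAM_INTERVAL
    return pages

# A 1,000,000-page file spans two GAM intervals, hence two DIFF pages:
print(diff_page_ids(1000000))  # [6, 511238]
```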
There's a function that I create in msdb called SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and is created as a system object in master. I tried making it a table-valued function, but you can't do things like INSERT ... EXEC in a function, and that's required for processing the DBCC PAGE output. So create your own wrapper function or whatever to use it. The interface/output is:
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9
It's been tested with multi-file databases up to around 125TB. It will not work on SQL Server 2000.
Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files – 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.
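That post-full-backup baseline is simple arithmetic; a quick Python illustration of the formula above (illustration only, not part of the script):

```python
def baseline_changed_extents(online_data_files):
    """Minimum 'Changed Extents' right after a full backup.

    One file-header extent per data file is always marked changed, plus
    three extents in the primary file holding critical system table roots,
    i.e. 4 + (number of online data files - 1).
    """
    return 4 + (online_data_files - 1)

print(baseline_changed_extents(1))  # 4 -> single-file database
print(baseline_changed_extents(5))  # 8 -> one primary plus four secondary files
```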
Anyway – here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!
/*============================================================================
  File:     SQLskillsDIFForFULL.sql

  Summary:  This script creates a system-wide SP sp_SQLskillsDIFForFULL that
            works out what percentage of a database has changed since the
            previous full database backup.

  Date:     April 2008 (revised August 2017)

  SQL Server Versions: All versions after SQL Server 2000
------------------------------------------------------------------------------
  Copyright (C) 2008-2017 Paul S. Randal, SQLskills.com
  All rights reserved.

  For more scripts and sample code, check out http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes. You may
  republish altered code as long as you give due credit.

  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO
  THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = N'SQLskillsConvertToExtents')
    DROP FUNCTION [SQLskillsConvertToExtents];
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string (2 and 1 extents,
-- respectively, for the examples above).
--
CREATE FUNCTION [SQLskillsConvertToExtents] (
    @extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
    DECLARE @extentTotal  INT;
    DECLARE @colon        INT;
    DECLARE @firstExtent  INT;
    DECLARE @secondExtent INT;

    SET @extentTotal = 0;
    SET @colon = CHARINDEX (':', @extents);

    -- Check for the single extent case
    --
    IF (CHARINDEX (':', @extents, @colon + 1) = 0)
        SET @extentTotal = 1;
    ELSE
    -- We're in the multi-extent case
    --
    BEGIN
        SET @firstExtent = CONVERT (INT,
            SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
        SET @colon = CHARINDEX (':', @extents, @colon + 1);
        SET @secondExtent = CONVERT (INT,
            SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
        SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
    END

    RETURN @extentTotal;
END;
GO

USE [master];
GO

IF OBJECT_ID (N'sp_SQLskillsDIFForFULL') IS NOT NULL
    DROP PROCEDURE [sp_SQLskillsDIFForFULL];
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be around 20.
--
CREATE PROCEDURE [sp_SQLskillsDIFForFULL] (
    @dbName SYSNAME)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create the temp table
    --
    IF EXISTS (SELECT * FROM [msdb].[sys].[objects]
        WHERE NAME = N'SQLskillsDBCCPage')
        DROP TABLE [msdb].[dbo].[SQLskillsDBCCPage];

    CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
        [ParentObject] VARCHAR (100),
        [Object]       VARCHAR (100),
        [Field]        VARCHAR (100),
        [VALUE]        VARCHAR (100));

    DECLARE @fileID         INT;
    DECLARE @fileSizePages  INT;
    DECLARE @extentID       INT;
    DECLARE @pageID         INT;
    DECLARE @DIFFtotal      BIGINT;
    DECLARE @sizeTotal      BIGINT;
    DECLARE @total          BIGINT;
    DECLARE @dbccPageString VARCHAR (200);

    SELECT @DIFFtotal = 0;
    SELECT @sizeTotal = 0;

    -- Setup a cursor for all online data files in the database
    --
    DECLARE [files] CURSOR FOR
        SELECT [file_id], [size] FROM master.sys.master_files
        WHERE [type_desc] = N'ROWS'
            AND [state_desc] = N'ONLINE'
            AND [database_id] = DB_ID (@dbName);
    OPEN files;

    FETCH NEXT FROM [files] INTO @fileID, @fileSizePages;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT @extentID = 0;

        -- The size returned from master.sys.master_files is in
        -- pages - we need to convert to extents
        --
        SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

        WHILE (@extentID < @fileSizePages)
        BEGIN
            SELECT @pageID = @extentID + 6;

            -- Build the dynamic SQL
            --
            SELECT @dbccPageString = 'DBCC PAGE ([' + @dbName + '], '
                + CAST (@fileID AS VARCHAR) + ', '
                + CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

            -- Empty out the temp table and insert into it again
            --
            TRUNCATE TABLE [msdb].[dbo].[SQLskillsDBCCPage];
            INSERT INTO [msdb].[dbo].[SQLskillsDBCCPage] EXEC (@dbccPageString);

            -- Aggregate all the changed extents using the function
            --
            SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
            FROM [msdb].[dbo].[SQLskillsDBCCPage]
            WHERE [VALUE] = '    CHANGED'
                AND [ParentObject] LIKE 'DIFF_MAP%';

            SET @DIFFtotal = @DIFFtotal + ISNULL (@total, 0);

            -- Move to the next GAM extent
            SET @extentID = @extentID + 511232;
        END

        FETCH NEXT FROM [files] INTO @fileID, @fileSizePages;
    END;

    -- Clean up
    --
    DROP TABLE [msdb].[dbo].[SQLskillsDBCCPage];
    CLOSE [files];
    DEALLOCATE [files];

    -- Output the results
    --
    SELECT
        @sizeTotal AS [Total Extents],
        @DIFFtotal AS [Changed Extents],
        ROUND (
            (CONVERT (FLOAT, @DIFFtotal) /
             CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject [sp_SQLskillsDIFForFULL];
GO

-- Test to make sure everything was setup correctly
--
EXEC [sp_SQLskillsDIFForFULL] N'msdb';
GO
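For readers who want to check the extent-cracking arithmetic outside SQL Server, here is the logic of SQLskillsConvertToExtents re-expressed in Python (an illustrative port, not something the script itself uses):

```python
import re

def extents_in_run(field):
    """Count extents in a DBCC PAGE bitmap run like '(1:8) - (1:16)' or '(1:8) -'.

    A lone (file:page) reference is one extent; a range spans
    (last_page - first_page) / 8 + 1 extents, since an extent is 8 pages.
    """
    pages = [int(p) for p in re.findall(r"\(\d+:(\d+)\)", field)]
    if len(pages) == 1:
        return 1
    return (pages[1] - pages[0]) // 8 + 1

print(extents_in_run("(1:8) -"))         # 1
print(extents_in_run("(1:8) - (1:16)"))  # 2
```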
40 thoughts on “New script: How much of the database has changed since the last full backup?”
Do you have a 2000 version?
Should be easy to convert this to 2000 – just change the names of the system tables and it should work fine.
Am I right that I can also use this to estimate the size of a SAN snapshot, where all changes at the block level are captured?
Hey, thanks for this useful SQL code.
Thanks for your script…
Two small bugs:
1) a non-existent dbname will get a zero @sizeTotal, leading to a divide-by-zero error;
2) a dbname with a dash sign '-' will produce a bad 'DBCC PAGE' statement, and EXEC (@dbccPageString) gets a syntax error.
1) you can just add sys.databases to the cursor, because the "name" in master_files is the data_space's name and not the database's (and they might be different):

-- Setup a cursor for all online data files in the database
DECLARE files CURSOR FOR
SELECT mf.[file_id], mf.[size]
FROM master.sys.master_files mf
JOIN sys.databases dbs ON mf.database_id = dbs.database_id
WHERE mf.[type_desc] = 'ROWS'
AND mf.[state_desc] = 'ONLINE'
2) to avoid problems with dash signs '-', spaces, etc., you can just add quote characters ('''') before and after the @dbName, here:

-- Build the dynamic SQL
SELECT @dbccPageString = 'DBCC PAGE ('
+ '''' + @dbName + '''' + ', '
+ CAST (@fileID AS VARCHAR) + ', '
+ CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';
just change the @dbname to quotename(@dbname)
Another gem, even if it is 9 years old. One question: I do full daily backups at 11pm. But after that, a few SAN-triggered backups get run, to a virtual device (these facilitate snaps/clones, I believe); the most recent at 9:30am today.
So, to extrapolate changes for 1 day– should I be counting time since the 11pm “real” backup or the 9:30am “virtual” backup?
Thanks for your help!
Unless the virtual backups are interacting with SQL Server in some way (using the VDI backup interface), you should be counting from the 11pm backup.
I ran it on SQL 2014 SP2 CU5 with 7TB and 125TB databases and it ran fine, giving results for both but with some error messages on the 125TB database. The error was:

Msg 8115, Level 16, State 2, Procedure sp_SQLskillsDIFForFULL, Line 281
Arithmetic overflow error converting expression to data type int.
The 7TB database was strictly regular data, no LOB data. The 125TB database had mostly LOB data, varbinary(max), images serialized. There were multiple filegroups and files composing each database spread over multiple controllers and disk arrays. The 125TB result was calculated on the secondary replica of an Availability Group.
Thanks for letting me know. Just out of interest, did you try changing the ints to bigints? That should stop the problem. I wasn’t able to test with such a large database at the time :-) 125TB is pretty big, even by today’s standards.
I did change to bigint and the errors went away. Worked fine and ran quicker than the first time.
Thanks for the script!
Perfect – I changed the script accordingly – thanks!
The script doesn’t handle database names with hyphens in them. *cough*SharePoint*cough*.
So recommend putting square brackets around the @dbName on lines 155-156 like so:
SELECT @dbccPageString = 'DBCC PAGE (['
+ @dbName + '], '
Thanks for this script though! Love it.
Thanks – done!
I had to tweak it in a few places…fixed variable name casing for our case-sensitive servers, and because I’m paranoid I put the function in tempdb and changed the table objects to temp tables. But this is awesome! I was thinking in context of estimating how large a daily database snapshot (of a mirror or AG secondary) would grow, and looking at the diff map stuff might give an estimate of how much has changed. Was not at all surprised to find you had already coded it!
Do you think that’s a reasonable way to estimate database snapshot size? For instance, a 1TB database that gets up to, say, 10% changed each day, would you expect the snapshot to grow to 100GB daily before it is dropped and recreated?
Mahalo nui loa!
Thanks. Yes, it’s about as good as you can get from looking at the bitmaps. Remember that it’s per page pushed into the snapshot, so it’s possible the script may over-estimate, but my gut-feel says it’ll all even out and should be a reasonable estimate.
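Turning the script's output into a size estimate is just extent arithmetic; here's a quick Python illustration (not part of the script) that treats the changed-extent count as an upper bound, since an extent is 64KB and snapshot growth is per 8KB page pushed:

```python
def snapshot_upper_bound_bytes(changed_extents):
    """Rough upper bound on snapshot growth: each changed 64KB extent
    could have all 8 of its 8KB pages pushed into the snapshot."""
    return changed_extents * 64 * 1024

# e.g. the sample msdb output in the post: 56 changed extents
mb = snapshot_upper_bound_bytes(56) / (1024 * 1024)
print(f"{mb:.1f} MB")  # 3.5 MB
```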
This is a brilliant script. Thanks for making this available to the community.
I made a single change to the code that I think others might find useful. I found that the original code runs a little slow on a 65TB database I support. I changed the line DELETE FROM [msdb].[dbo].[SQLskillsDBCCPage]; to TRUNCATE TABLE [msdb].[dbo].[SQLskillsDBCCPage];
My execution time went from 31 minutes to 3 minutes with that change.
Cool – I made that change in the post too – thanks!
Hi Paul, thanks for sharing this with us. I took this information further and created a script that checks all databases at once without using cursors, hope this is useful for everyone. http://www.mssqltips.com/sqlservertip/5688/script-to-show-how-much-a-sql-server-database-has-changed-since-last-full-backup/
I ran the script and got the following result:
Total Extents Changed Extents Percentage Changed
201778 716 0.35
The database size is 13 GB, but the differential backup size is greater than 3 GB!
The recovery model is simple.
Database changes are very few.
What can be the cause of this?
Transaction log? Did you run the script before or after the diff backup? Do you have the output of the diff backup?
The transaction log size is 4 GB and the log space used is 54.96636%.
I ran the script before and after the diff backup (database changes are very few and the script returns almost the same result).
Yes, I have.
Log_reuse_wait_desc is NOTHING.
Could it be related to the checkpoint?
No. I need to see the output from the backup command itself.
Executed as user: SQL\SQLUser2016. Processed 47088 pages for database ‘NewSystem’, file ‘NewSystem’ on file 1. [SQLSTATE 01000] (Message 4035)
Processed 8 pages for database ‘NewSystem’, file ‘FBlobData’ on file 1. [SQLSTATE 01000] (Message 4035) Processed 0 pages for database ‘NewSystem’,
file ‘FileStrm’ on file 1. [SQLSTATE 01000] (Message 4035) Processed 4 pages for database ‘NewSystem’, file ‘FInmom’ on file 1. [SQLSTATE 01000] (Message 4035)
Processed 354927 pages for database ‘NewSystem’, file ‘NewSystem_log’ on file 1. [SQLSTATE 01000] (Message 4035)
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 402027 pages in 23.277 seconds (134.932 MB/sec). [SQLSTATE 01000] (Message 3014). The step succeeded.
Paul, I ran the following command and then took a differential backup. The size of the diff backup is 248 MB!
As I thought – it’s the transaction log that’s the bulk of the backup. Data backups have to back up all log from the oldest of: checkpoint, start of oldest active transaction, start of oldest unreplicated transaction if replication enabled.
Did you ever finish your script to determine how much data will be included in the next log backup, and could that code be adapted to this differential calculation?
I’m running into this same issue where replication is hung up preventing the log from being cleared, and it’s causing my differentials to be larger than expected. Our backup solution needs to create multi-file backups to limit the size of the files due to vendor limitations, and the differential calculation is underestimating the backup size because it’s not including the log space.
I’ve searched a bit online and haven’t found a solution yet.
I couldn’t find a good way to do it, so I didn’t end up writing the script.
Thank you for your answer!
what is the solution?
Can I Add the checkpoint command before the differential backup job?
No – all data backups always do a checkpoint as the first operation. You need to commit the long-running transaction or take care of replication.
Yes, I know!
The system does not use replication!
DBCC OPENTRAN does not show anything!
Log_reuse_wait_desc is NOTHING and does not show an active transaction!
Without getting on your system to look around, it's hard to say, and not something we can troubleshoot through blog comments I'm afraid. And no need to put an exclamation mark on your replies.
Thank you paul!
You are the best teacher.
Here is a reworking of the script, removing the need for CURSORs and the creation of objects in system databases, and allowing for a list of particular databases.
This script is based entirely on a script written by Paul Randal
Copyright (C) 2008-2017 Paul S. Randal, SQLskills.com
All rights reserved.
I rewrote it for several reasons:
1 – I have an aversion to creating objects in the system databases
2 – I don’t like CURSORs
3 – I was bored during COVID-19 lockdown
SET NOCOUNT ON;

DECLARE @databases varchar(8000); -- optional comma-separated list of database names
--SET @databases = '';

-- create the 2 temp tables
IF OBJECT_ID('tempdb..#results','U') IS NOT NULL
    DROP TABLE #results;
CREATE TABLE #results (
    [Database Name] sysname NOT NULL,
    [Total Extents] int NOT NULL,
    [Changed Extents] int NOT NULL);

IF OBJECT_ID('tempdb..#page','U') IS NOT NULL
    DROP TABLE #page;
CREATE TABLE #page (
    [ParentObject] varchar(100),
    [Object] varchar(100),
    [Field] varchar(100),
    [VALUE] varchar(100));

DECLARE @dbname sysname;
DECLARE @file_id int, @file_pages int, @sql varchar(8000), @extent_id int, @total_extents int, @total_diffs int, @total int;

WHILE (1 = 1)
BEGIN
    -- get next database
    SELECT TOP(1) @dbname = [name]
    FROM sys.databases
    WHERE [state] = 0 --ONLINE
    AND [name] > ISNULL(@dbname, N'')
    AND [name] <> N'tempdb'
    AND (@databases IS NULL OR CHARINDEX(','+[name]+',', ','+@databases+',') > 0)
    ORDER BY [name];

    IF @@ROWCOUNT = 0
        BREAK;

    SET @total_extents = 0;
    SET @total_diffs = 0;
    SET @file_id = 0;

    WHILE (1 = 1)
    BEGIN
        -- get next data file for database
        SELECT TOP(1) @file_id = [file_id], @file_pages = [size]
        FROM sys.master_files
        WHERE [type] = 0 --ROWS
        AND [state] = 0 --ONLINE
        AND [database_id] = DB_ID(@dbname)
        AND [file_id] > @file_id
        ORDER BY [file_id];

        IF @@ROWCOUNT = 0
            BREAK;

        SET @total_extents = @total_extents + @file_pages/8; -- convert pages to extents
        SET @extent_id = 0;

        -- wind through the GAMs
        WHILE @extent_id < @file_pages
        BEGIN
            -- create query to get DIFF page info
            SET @sql = 'DBCC PAGE ([' + @dbname + '], ' + CAST (@file_id AS varchar(9)) + ', '
                + CAST (@extent_id+6 AS varchar(9)) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';
            TRUNCATE TABLE #page;
            INSERT INTO #page EXEC (@sql);

            -- parse the DIFFs and get the total
            SELECT @total = SUM(diffs)
            FROM #page
            CROSS APPLY ( SELECT CHARINDEX(':', [Field]) AS colon1 ) x
            CROSS APPLY ( SELECT CHARINDEX(':', [Field], colon1 + 1) AS colon2 ) y
            CROSS APPLY (
                SELECT CASE WHEN colon2 = 0 THEN 1
                    ELSE ( CAST(SUBSTRING([Field], colon2 + 1, CHARINDEX(')', [Field], colon2) - colon2 - 1) AS int)
                         - CAST(SUBSTRING([Field], colon1 + 1, CHARINDEX(')', [Field], colon1) - colon1 - 1) AS int) ) / 8 + 1
                    END AS diffs ) z
            WHERE [VALUE] = '    CHANGED'
            AND [ParentObject] LIKE 'DIFF_MAP%';

            SET @total_diffs = @total_diffs + ISNULL(@total, 0);

            -- jump to next GAM
            SET @extent_id = @extent_id + 511232;
        END
    END

    -- save the results for this database
    INSERT INTO #results ([Database Name], [Total Extents], [Changed Extents])
    VALUES (@dbname, @total_extents, @total_diffs);
END

-- show the results
SELECT [Database Name],
    CAST(([Changed Extents]*100.0)/[Total Extents] AS numeric(7,2)) AS [Percentage Changed]
FROM #results
ORDER BY [Database Name];

-- cleanup time
IF OBJECT_ID('tempdb..#results','U') IS NOT NULL
    DROP TABLE #results;
IF OBJECT_ID('tempdb..#page','U') IS NOT NULL
    DROP TABLE #page;
This looks great, but I think the script is malformed because of pasting. You wouldn't happen to have a link to the clean file, would you, Robert Steen?
I am trying to integrate this into Ola Hallengren’s scripts to use the output to decide if to do a full backup or just do a diff.
Wouldn’t it be nice if we could save doing a full backup if a database had not changed at all since the last full?
Thanks in advance!
Hey Michael, were you able to do any work on integrating this into Ola’s MaintenanceSolution script?
Should be using modified_extent_page_count in sys.dm_db_file_space_usage if you are on SQL 2016 SP2 or later, right?
Absolutely you could write a script to do that if you wanted, or just continue using my script, as it works on all versions of SQL Server, albeit a bit slower.
I tried this:

select (sum(modified_extent_page_count) * 100.0) / sum(allocated_extent_page_count) AS [% modified extents]
from sys.dm_db_file_space_usage;

But it doesn't give me the same results (22% instead of your script returning 35%), being a count at the page level instead of extents. Perhaps it's accurate enough for my purposes.
With the DMVs introduced in SQL Server 2016, you can now find the number of modified pages since the last full backup via this simple query:

SELECT SUM(allocated_extent_page_count) AS [allocated pages], SUM(modified_extent_page_count) AS [modified pages]
FROM sys.dm_db_file_space_usage;

Divide the values by 8 to go from page to extent level.
The results I’m getting are similar to those in Paul’s query, and they’re returned a lot quicker.
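To show the page-to-extent conversion the commenters describe, here is a small Python illustration with hypothetical page counts (chosen so the result matches the sample msdb output in the post, where an extent is 8 pages):

```python
def dmv_summary(modified_pages, allocated_pages):
    """Convert sys.dm_db_file_space_usage-style page counts to extents
    and a changed percentage (an extent is 8 pages)."""
    return {
        "modified_extents": modified_pages // 8,
        "allocated_extents": allocated_pages // 8,
        "pct_changed": round(100.0 * modified_pages / allocated_pages, 2),
    }

# Hypothetical counts: 448 modified pages of 816 allocated pages
print(dmv_summary(modified_pages=448, allocated_pages=816))
# {'modified_extents': 56, 'allocated_extents': 102, 'pct_changed': 54.9}
```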