An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)

In yesterday’s blog post A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 1), we looked at what happens when we Backup a database in SQL Server. Today, we are going to use the information we captured to perform some analysis of the Backup information in an attempt to find ways to decrease the time it takes to backup a database. When I began reviewing the data from the Backup in yesterdays post, I realized that I had made a mistake in the process and left Trace Flag 3213 off, which left some information that we’ll need out of the results, the Backup Buffer Configuration information. For this post I turned Trace Flag 3213 on, and reran the Backup from yesterday, so the results may differ slightly but the same Backup command was used for the tests.

The Backup Buffer Configuration information tells us how many Buffers were allocated, and what the Max Transfer Size being used is. In the screenshot above, this information is outlined in the red box. The default Buffer Count is determined by SQL Server when the BUFFERCOUNT option is not specified in the BACKUP DATABASE command. The calculation used is:

(NumberofBackupDevices * GetSuggestedIoDepth) + NumberofBackupDevices + (2*DatabaseDeviceCount)

This is covered in detail on Amit Banerjee’s blog post, Incorrect BufferCount data transfer option can lead to OOM condition. For the Backup in yesterday’s post, and the one that will be the baseline for today’s post, the BUFFERCOUNT option was not specified, and the Backup, as shown above, used 7 Buffers and the default MaxTransferSize of 1MB for the backup. If we look at the aggregated Event and wait_type information contained in the Event Session for the Backup we can begin to see what types of Backup bottlenecks we might have in the system.

SELECT ISNULL(wait_type, event_name) AS Operation, SUM(duration) AS duration, SUM(signal_duration) AS signal_duration, COUNT(*) AS occurences FROM #TestResults WHERE (duration IS NOT NULL OR signal_duration IS NOT NULL) AND ISNULL(wait_type, event_name) IN ('BACKUPBUFFER', 'BACKUPIO') GROUP BY ISNULL(wait_type, event_name)

Looking at this information, we have a large number of BACKUPBUFFER waits occurring during the backup of the database and this may be a potential tuning opportunity to improve the performance of our database Backups. To test this, we can change our Backup Script to include a 4GB MAXTRANSFERSIZE and a BUFFERCOUNT of 16. I also chose to change the filenames in the Extended Events Session to simplify working with the Target Data for each test individually. It is possible to specify the exact filename and metadatafile names when you read from the Target, but that requires more work than is needed in my opinion. For the sake of brevity I am not going to repeat all of the Extended Events information in this blog post, but instead show the outcome of running various configurations of BUFFERCOUNT against a test server.

Test Number	Backup File Count	Buffer Count	Max Transfer Size	Backup Time (s)	BACKUPBUFFER (wait ms)	BACKUPIO (wait ms)	BACKUPBUFFER (wait count)	BACKUPIO (wait count)
1	1	7	1024	122.5	159471	62587	81616	22815
2	1	16	4096	105.2	90963	69091	14513	7982
3	1	32	4096	99.5	75236	88634	12298	8679
4	1	128	4096	95.9	70173	63435	8292	4679
5	1	256	4096	95.9	50988	48942	1538	1135
6	2	128	4096	96	152323	63800	12416	4925
7	2	256	4096	96.4	109565	46953	3067	1195

The same Event Session was used to gather the above metrics in seven repetitive tests. The test server is a Dell R710 dual quad core system with 24GB RAM and HT enabled. It has eight internal 15K RPM 146GB SAS drives that are configured into 2 RAID 1 drive pairs and a single 4 disk RAID 5 array. One of the RAID 1 drive pairs was dedicated to the OS, and SQL Binaries. The other was used for writing the backups, and the database data/log/tempdb files were placed on the RAID 5 array. This isn’t the ideal configuration for a setup, but its what I had available at the moment to work with, and is similar to some of the configurations I’ve seen in the real world as well. The Backups were segregated to a dedicated disk that was RAID 1 for this test to avoid RAID 5 write penalties, and to maximize the backup write throughput by isolating it from any other operations.

The above results can be interpreted a number of different ways. As the BUFFERCOUNT increases the backup time decreases, and so does the amount of time spent waiting on Backup Buffers. However, there is a tradeoff that is being made in memory consumption; specifically memory outside of the buffer pool from Virtual Address Space. On 32 bit servers this can lead to Out of Memory exceptions, the topic of Amit’s blog post referenced previously in this blog post. Test Number 5, with 256 buffers and a 4MB transfer size will use 1GB of memory as shown in the Backup Buffer Configuration Information.

On the test server used for this testing, the bottleneck is the disks that the backup file is being written to and further improvements in performance will require additional IO to accomplish. The test database has 30GB of data in it, and with backup compression, the backup size is 7.8GB in size on disk. For a full backup to take just over a minute and a half for this database is not that bad, but it is going to local disks and there is no safeguard for the data in the event of a catastrophic loss of the physical server entirely unless the data gets copied to another location in the network after a local backup occurs.

One thought on “An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)”

JeremyH says:

September 9, 2014 at 2:54 pm

A little late for this, but I believe you had a typo in the following sentence: “To test this, we can change our Backup Script to include a 4GB MAXTRANSFERSIZE and a BUFFERCOUNT of 16.” Did you mean 4MB MAXTRANSFERSIZE?

Great series by the way. I wish I would have started down the X events path sooner.

Reply

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.

An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)

One thought on “An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)”

Leave a Reply Cancel reply

Other articles

Protecting Transparent Data Encryption Databases

Do you need Trace Flag 460?

Event exceeds the number of allowable bound actions

SQL Server 2022: Measuring Extended Events Performance Impacts

What’s New in SQL Server 2022: GENERATE_SERIES()

Logging Database Scoped Configuration Changes

Imagine feeling confident enough to handle whatever your database throws at you.

An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)

Check It Out!

SQL Server Jumpstart+ Bundle

One thought on “An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)”

Leave a Reply Cancel reply

Other articles

Protecting Transparent Data Encryption Databases

Do you need Trace Flag 460?

Event exceeds the number of allowable bound actions

SQL Server 2022: Measuring Extended Events Performance Impacts

What’s New in SQL Server 2022: GENERATE_SERIES()

Logging Database Scoped Configuration Changes

Imagine feeling confident enough to handle whatever your database throws at you.