I recently had an interesting issue with a new SQL Server 2014 Enterprise Edition instance that was going to be hosting about 200 user databases that each use the SQL Server Filetable feature. The reported problem was that this new instance was taking two-three times longer to restore each full database backup than the restores were taking on a different instance (that had similar hardware and storage resources).
I had several initial ideas about what the problem might be, including:
- Windows instant file initialization was not enabled
This would force Windows to have to “zero-initialize” the database data file after it was created during the restore, which slows down the restore
It was already enabled on this instance, so that was not the problem
- The transaction log files were particularly large for these user databases
- Windows instant file initialization does not work on log files, only on data files
- This issue would be present on any database instance, not just one
- The VLF counts were particularly high in the log files for these databases
- This causes the recovery portion for a full database restore (or crash recovery) to take much longer
- None of the VLF counts were above 300, plus this would behave the same on any database instance
After my initial ideas were found to be incorrect, I tried restoring a full backup of one of the user databases to the new database server, making sure to use the Stats = 1 option, so the restore progress statistics would update every 1%. This gives you more immediate and fine-grained feedback about the progress of the restore, which is always a good idea.
While the restore was running, I carefully watched the disk activity in Windows Resource Monitor. You can sort the columns in Windows Resource Monitor to see the highest Reads and Writes (in bytes/second) while the database restore is running. This will show the reads from the database backup file, then the writes to the database data file(s) and the database log file. It also showed the writes to the filetable files, and then something that I did not expect…
The full database restore went very fast until it got to 92%. Then I saw a huge number of parallel reads by the MsMpEng.exe process (which is the Microsoft Antimalware service), which slowed the restore progress to an absolute crawl, also pegging one of the CPU cores on the instance (so it looks like that service is single-threaded). This seemed to be a major factor in the slowness of the database restore as MsMpEng.exe process was scanning the 8,684 Filetable documents that were in this particular user database.
The solution to this issue was simply changing the exclusion rules for the Microsoft Antimalware Service to not scan the files in the filetable directories. This one change decreased the database restore time by 56%.
Here is the restore command and the results of the test.
-- Test restore of TestUserDatabase -- This database uses Filetables RESTORE DATABASE [TestUserDatabase] FROM DISK = N'T:\BackupRestoreTest\TestUserDatabase_FULL.bak' WITH FILE = 1, MOVE N'TestUserDatabase' TO N'D:\SQLDATA\TestUserDatabase.mdf', MOVE N'TestUserDatabase_log' TO N'L:\Logs\TestUserDatabase_log.ldf', MOVE N'DOCUMENTS' TO N'D:\SQLDATA\TestUserDatabase_DOCUMENTS', NOUNLOAD, STATS = 1; GO -- Actual elapsed time 6:21 (Before directory exclusion change) --Processed 2358856 pages for database 'TestUserDatabase', file 'TestUserDatabase' on file 1. --Processed 4 pages for database 'TestUserDatabase', file 'TestUserDatabase_log' on file 1. --Processed 208905 pages for database 'TestUserDatabase', file 'DOCUMENTS' on file 1. --RESTORE DATABASE successfully processed 2567764 pages in 378.708 seconds (52.971 MB/sec). -- Actual elapsed time 2:52 (After directory exclusion change) --Processed 2358856 pages for database 'TestUserDatabase', file 'TestUserDatabase' on file 1. --Processed 4 pages for database 'TestUserDatabase', file 'TestUserDatabase_log' on file 1. --Processed 208905 pages for database 'TestUserDatabase', file 'DOCUMENTS' on file 1. --RESTORE DATABASE successfully processed 2567764 pages in 167.607 seconds (119.688 MB/sec).
Figure 1: T-SQL for Database Restore Test
One thing you might notice is that the filetable directory is on the same LUN as the PRIMARY data file for this database, which is not really the best location. Ideally this directory would be on a separate LUN from any other SQL Server database files, but this may not always be possible due to economic resource constraints.
Here is some more information about filestream performance:
Best Practices on FILESTREAM implementations
7 thoughts on “Using Windows Resource Monitor to Find Unusual Performance Bottlenecks”
Does using the STATS option with a low number (like 1) increase the overhead of the RESTORE command?
It probably adds a very tiny amount of overhead, but I really doubt that it is going to be enough to easily measure. That can easily be tested though, so I will run a few tests.
Interesting point. Thanks for sharing.
We have similar issue on backup. I’m not sure why backup of one database is taking long time on one server but the same database backup on other servers running fine. There will be no differences between these servers.
I would check the more obvious possible differences first. You can use CrystalDiskMark 4.0 to run a quick test of the sequential read and write performance of the LUNs where the database data and log files are located, and where the backup file(s) are located.