(New for 2020: we’ve published a range of SQL Server interview candidate screening assessments with our partner Kandio, so you can avoid hiring an ‘expert’ who ends up causing problems. Check them out here.)
(Check out our Pluralsight online training course: SQL Server: Understanding and Using DBCC Commands.)
One of my biggest hot-buttons is around shrinking data files. Although I used to own the shrink code while I was at Microsoft, I didn’t write it (so don’t blame me! :-) and it was never deemed a big enough customer pain-point to fix. But I really don’t like the data-file shrink process.
Now, don’t confuse shrinking the transaction log with shrinking data files. Shrinking the log may be necessary if your log has grown out of control, or as part of a process to remove excessive VLF fragmentation (see Kimberly’s excellent posts on this here and here). However, shrinking the log should be a rare operation and should not be part of any regular maintenance you perform.
In either case, I’m not talking about using the TRUNCATEONLY option – all that does is lop off any unused space at the end of the files – that’s perfectly fine. I’m talking about actually running the shrink algorithm.
Shrinking of data files should be performed even more rarely, if at all. Here’s why: data file shrink can cause *massive* index fragmentation (of the out-of-order pages kind, not the wasted-space kind) and it is very expensive (in terms of I/O, locking, transaction log generation). Let me demonstrate with a simple script you can run. The script below will create a data file, create a 10MB ‘filler’ table at the start of the data file, create a 10MB ‘production’ clustered index, and then analyze the fragmentation of the new clustered index.
USE [master]; GO IF DATABASEPROPERTYEX (N'DBMaint2008', N'Version') IS NOT NULL DROP DATABASE [DBMaint2008]; GO CREATE DATABASE DBMaint2008; GO USE [DBMaint2008]; GO SET NOCOUNT ON; GO -- Create the 10MB filler table at the 'front' of the data file CREATE TABLE [FillerTable] ( [c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'filler'); GO -- Fill up the filler table INSERT INTO [FillerTable] DEFAULT VALUES; GO 1280 -- Create the production table, which will be 'after' the filler table in the data file CREATE TABLE [ProdTable] ( [c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'production'); CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1]); GO INSERT INTO [ProdTable] DEFAULT VALUES; GO 1280 -- Check the fragmentation of the production table SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats ( DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED'); GO
avg_fragmentation_in_percent ----------------------------- 0.390625
The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%.
Now I’ll drop the ‘filler’ table, run a shrink to reclaim the space, and re-analyze the fragmentation of the clustered index:
-- Drop the filler table, creating 10MB of free space at the 'front' of the data file DROP TABLE [FillerTable]; GO -- Shrink the database DBCC SHRINKDATABASE ([DBMaint2008]); GO -- Check the index fragmentation again SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats ( DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED'); GO
DbId FileId CurrentSize MinimumSize UsedPages EstimatedPages ----- ------- ------------ ------------ ---------- --------------- 6 1 1456 152 1448 1440 6 2 63 63 56 56 DBCC execution completed. If DBCC printed error messages, contact your system administrator. avg_fragmentation_in_percent ----------------------------- 99.296875
Wow! After the shrink, the logical fragmentation (out-of-order pages) is almost 100%. The shrink operation *completely* fragmented the index. Now, will that have any effect on your workload performance? If the index is larger than a few thousand pages, is typically not all in the buffer pool, and is scanned a lot, then possibly. Otherwise, probably not.
Why does this happen? A data file shrink operation works on a single file at a time, and uses the GAM bitmaps (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps) to find the highest page allocated in the file. It then moves it as far towards the front of the file as it can, and so on, and so on. In the case above, it completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.
The same code is used for DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink – they’re equally as bad. As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU, and generates *loads* of transaction log – as everything it does is fully logged.
Data file shrink should never be part of regular maintenance, and you should NEVER, NEVER have auto-shrink enabled. I tried to have it removed from the product for SQL 2005 and SQL 2008 when I was in a position to do so – the only reason it’s still there is for backwards compatibility. Don’t fall into the trap of having a maintenance plan that rebuilds all indexes and then tries to reclaim the space required to rebuild the indexes by running a shrink – that’s a zero-sum game where all you do is generate a log of transaction log for no actual gain.
So what if you *do* need to run a shrink? For instance, if you’ve deleted a large proportion of a very large database and the database isn’t likely to grow, or you need to empty a file before removing it?
The method I like to recommend is as follows:
- Create a new filegroup
- Move all affected tables and indexes into the new filegroup using the CREATE INDEX … WITH (DROP_EXISTING = ON) ON syntax, to move the tables and remove fragmentation from them at the same time
- Drop the old filegroup that you were going to shrink anyway (or shrink it way down if its the primary filegroup)
Basically you need to provision some more space before you can shrink the old files, but it’s a much cleaner mechanism. Beware that if a table has off-row LOB data, rebuilding will NOT move the LOB data to the new filegroup and there’s no good way to do that.
If you absolutely have no choice and have to run a data file shrink operation, be aware that you’re going to cause index fragmentation and you might need to take steps to remove it afterwards if it’s going to cause performance problems. The only way to remove index fragmentation without causing data file growth again is to use DBCC INDEXDEFRAG or ALTER INDEX … REORGANIZE. These commands only require a single 8KB page of extra space, instead of needing to build a whole new index in the case of an index rebuild operation (which will likely cause the file to grow).
Bottom line – try to avoid running data-file shrink at all costs! I’m often misquoted as saying ‘never run data-file shrink’ – that’s not true. I say ‘never run a regular shrink operation of any kind’, and if you absolutely must run a data-file shrink, be aware of the problems it can cause.