(The Curious Case of… used to be part of our bi-weekly newsletter, but we decided to make it a regular blog post instead so it can sometimes appear more frequently. It covers something interesting one of us encountered while working with a client, doing some testing, or answering a random question from the community.)
Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are initially just row compressed (stored with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. This involves these steps, in the order listed:
1. Create a compression information (CI) record that is stored at the top of the page.
2. Column-prefix compression. The longest common prefix is found for each column, and the longest column value containing that prefix is moved to the CI record. Each value in the column is replaced with a number representing how many bytes of the common prefix to use, plus any other bytes after the prefix bytes.
3. Dictionary compression. Common values across all values in all columns are pulled out into another portion of the CI record, and each occurrence is replaced with a number representing which element of the dictionary to use.
4. If page compression saves 20% or more of the space on the page, the page becomes page compressed; otherwise it remains simply row compressed.
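To experiment with this yourself, page compression is enabled per table, index, or partition through a rebuild. A minimal sketch, using hypothetical dbo.MyTable and MyIndex names:

```sql
-- Hypothetical names: dbo.MyTable and MyIndex.
-- Rebuild the table (its heap or clustered index) with page compression.
ALTER TABLE dbo.MyTable REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Rebuild a single nonclustered index with page compression.
ALTER INDEX MyIndex ON dbo.MyTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```

After the rebuild, the per-page decision described above continues to happen as pages fill up during normal inserts and updates.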
You can see some graphics that show this process in the Microsoft doc page here.
This means a page compressed table or index could have a mixture of row compressed and page compressed pages – perfectly normal.
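You can check which compression setting each partition of each index currently has from the sys.partitions catalog view (note this shows the configured setting, not the actual row/page mix within the partition). A sketch, again assuming a hypothetical dbo.MyTable:

```sql
-- Compression setting per index and partition, for a hypothetical table.
SELECT
    i.name AS [Index],
    p.partition_number,
    p.data_compression_desc
FROM sys.indexes AS i
INNER JOIN sys.partitions AS p
    ON p.object_id = i.object_id
    AND p.index_id = i.index_id
WHERE i.object_id = OBJECT_ID (N'dbo.MyTable');
```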
What Jon didn’t realize was that there’s a way to track step 4, to see whether all the CPU being used to attempt page compression is worthwhile. Some people on Twitter also mentioned that they didn’t know that, and I’m not surprised as neither the docs nor the excellent whitepaper Data Compression: Strategy, Capacity Planning and Best Practices mention it.
The DMV sys.dm_db_index_operational_stats has two columns you can use to see whether the data in a table or index is suitable for page compression or not: page_compression_attempt_count and page_compression_success_count.
Here’s a query you can use to figure out the success rate of page compression for indexes, down to the partition level. Note that if you don’t have any partitioning, you’ll just see partition number 1 for each index.
SELECT DISTINCT
    OBJECT_NAME (i.object_id) AS [Table],
    i.name AS [Index],
    p.partition_number AS [Partition],
    page_compression_attempt_count,
    page_compression_success_count,
    page_compression_success_count * 1.0 /
        page_compression_attempt_count AS [SuccessRate]
FROM sys.indexes AS i
INNER JOIN sys.partitions AS p
    ON p.object_id = i.object_id
    AND p.index_id = i.index_id
CROSS APPLY sys.dm_db_index_operational_stats (
    DB_ID (), i.object_id, i.index_id, p.partition_number) AS ios
WHERE p.data_compression = 2
    AND page_compression_attempt_count > 0
ORDER BY [SuccessRate];
What threshold you pick for when to disable page compression for a table, index, or partition is up to you, but at least now you know how to calculate the metric.
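If the success rate for a particular index or partition turns out to be too low to justify the CPU, reverting to row compression is another rebuild. A sketch with the same hypothetical names:

```sql
-- Revert a single partition of a hypothetical index to row compression.
ALTER INDEX MyIndex ON dbo.MyTable
    REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = ROW);

-- Or revert the whole index.
ALTER INDEX MyIndex ON dbo.MyTable
    REBUILD WITH (DATA_COMPRESSION = ROW);
```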