The Curious Case of… tracking page compression success rates

(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)

Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are first just row compressed (storing values with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. This involves these steps, in the order listed:

  1. Creating a compression information (CI) record that is stored at the top of the page.
  2. Common prefix compression per column. The longest common prefix is found for each column and the longest column value that contains that prefix is moved to the CI record, and each value in the column is replaced with a number representing how many bytes from the common prefix to use, plus any other bytes after the prefix bytes.
  3. Dictionary compression. Look across all values in all columns and pull out common values into another portion of the CI record. In the column value’s place is a number representing which element of the dictionary to use.
  4. If the space saved by performing page compression saves 20% or more of the space on the page, the page becomes page compressed, otherwise it remains simply row compressed.

You can see some graphics that show this process in the Microsoft doc page here.

This means a page compressed table or index could have a mixture of row compressed and page compressed pages – perfectly normal.

What Jon didn’t realize was that there’s a way to track step 4, to see whether all the CPU being used to attempt page compression is worthwhile. Some people on Twitter also mentioned that they didn’t know that, and I’m not surprised as neither the docs nor the excellent whitepaper Data Compression: Strategy, Capacity Planning and Best Practices mention it.

The DMV sys.dm_db_index_operational_stats has two columns that  you can use those too see whether the data in a table or index is suitable for page compression or not: page_compression_attempt_count and page_compression_success_count.

Here’s a query you can use to figure out the success rate of page compression for indexes, down to the partition level. Note that if you don’t have any partitioning, you’ll just see partition ID 1 for each index.

SELECT
    DISTINCT object_name (i.object_id) AS [Table],
    i.name AS [Index],
    p.partition_number AS [Partition],
    page_compression_attempt_count,
    page_compression_success_count,
    page_compression_success_count * 1.0 / page_compression_attempt_count
        AS [SuccessRate]
FROM
    sys.indexes AS i
INNER JOIN
    sys.partitions AS p
ON
    p.object_id = i.object_id
CROSS APPLY
    sys.dm_db_index_operational_stats (
        db_id(), i.object_id, i.index_id, p.partition_number) AS ios
WHERE
    p.data_compression = 2
    AND page_compression_attempt_count > 0
ORDER BY
    [SuccessRate];

What threshold you pick for when to disable page compression for a table, index, or partition is up to you, but at least now you know how to calculate the metric.

5 thoughts on “The Curious Case of… tracking page compression success rates

  1. Thanks Paul. I use index_operational_stats all the time and knew of these, but did not form a query to more easily pull this information out. Thanks for that! Another tool in my belt. I love these posts.

  2. Thanks so much, Paul!
    This was actually the missing piece that I needed to accompany my “Ultimate Compression Savings Estimation Script“!

    I made my own expansion to Paul’s script, which adds the following:

    * Outputs a remediation command to REBUILD each relevant index with ROW compression.
    * Parameters for thresholds and index rebuild options
    * The script outputs additional details such as whether the index/table in question is partitioned or not, range-scan percentage, and leaf-level updates percentage.
    * It runs the check for ALL accessible and writeable databases.

    https://eitanblumin.com/2022/01/17/detect-low-compression-rates-in-all-databases/

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.