How expensive are page splits in terms of transaction log?

Page splits are always thought of as expensive, but just how bad are they? In this post I want to create an example showing how much extra transaction log is generated when a page in an index has to split, using the sys.dm_tran_database_transactions DMV to measure it. You can find the list of its columns, with a small amount of explanation of each, in Books Online here – I was reminded of its existence by someone on Twitter (sorry, I don't remember who it was and I couldn't find it in search).

In the example, I'm going to create a table with approximately 1000-byte long rows:

CREATE DATABASE PageSplitTest;
GO
USE PageSplitTest;
GO

-- Rows of ~1000 bytes, so only a handful fit on each 8KB data page
CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

-- Fill the page, deliberately leaving a 'gap' at c1 = 5
INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO

I've engineered the case where the clustered index data page has space for just one more row, and I've left a 'gap' at c1=5. Let's add that final row (c1=8) as part of an explicit transaction and see how much transaction log is generated:

BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228

That's about what I'd expect for that row – the log record for the insert contains the entire ~1000-byte row, plus logging overhead. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?

-- commit previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724

Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.
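If you want to verify that the page really did split, one way (a minimal sketch, separate from the log measurement itself, and assuming the setup above) is to check the page counts in the clustered index:

-- Sanity-check sketch (not part of the log measurement): before the insert
-- of c1=5 the index is a single leaf page; after the split it should show
-- two leaf pages at level 0 plus a new root page at level 1.
SELECT [index_level], [page_count]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'PageSplitTest'),  -- database ID
    OBJECT_ID (N'BigRows'),    -- object ID of the test table
    1,                         -- index ID 1 = the clustered index
    NULL,                      -- all partitions
    'DETAILED');               -- walk all levels of the index
GO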

The ratio gets worse as the row size gets smaller. For a table with approximately 100-byte rows (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere, then insert the 68th to cause the split – see the sketch below), the two numbers are 328 and 5924 – the split caused 18 times more log to be generated! For approximately 10-byte rows, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5, which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8, and then two of 5. The resulting page had only 6 rows – it split after the key value 5 – the Storage Engine doesn't always do a 50/50 page split. And that's without even causing nasty cascading page splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.
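Here's a minimal sketch of the 100-byte case described above (the table name, the loop, and the 'gap' value of 30 are my placeholders; the mechanics are the same as the 1000-byte example):

-- Sketch of the ~100-byte-row experiment (table name, loop, and gap value
-- are placeholders). 67 rows fill the page; inserting the skipped value as
-- the 68th row forces the split.
CREATE TABLE BigRows100 (c1 INT, c2 CHAR (100));
CREATE CLUSTERED INDEX BigRows100_CL ON BigRows100 (c1);
GO

-- Insert 67 rows, leaving a 'gap' at c1 = 30
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 1;
WHILE @i <= 68
BEGIN
    IF @i <> 30
        INSERT INTO BigRows100 VALUES (@i, 'a');
    SET @i = @i + 1;
END;
GO

-- Fill the gap as the 68th row and measure the log generated by the split
BEGIN TRAN
INSERT INTO BigRows100 VALUES (30, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

COMMIT TRAN
GO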

Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored….

15 thoughts on “How expensive are page splits in terms of transaction log?”

  1. I’m relatively new to SQL Server and still trying to understand why a clustered index would be used here at all. I get that when data is going to be accessed in sequence, a clustered index will keep it all physically ordered on disk, and that caching of the data by the disk controller will likely result in your next needed record already being in memory (although I have questions about how true that is if multiple people are accessing the database at the same time, causing the disk to spin). I don’t get what this buys you in the majority of situations where data is being retrieved out of sequence by other criteria (i.e. WHERE SalesID = ‘123’). From what you have shown here, it seems more likely that having a clustered PK index on an identity column, instead of a non-clustered one, is going to add overhead. Can you explain how exactly clustering a PK identity column index really benefits the database?

  2. Hi Paul,

    Very nice article on page splits. Does this still happen when you have a database snapshot set up, when it's using a sparse file to capture all the data?

    How do we check that piece of information?

    Thank you.
    Vinay.

  3. Hi Paul,
    Nice post!
    Does this mean that the FillFactor (FF) for fragmenting indexes should be lowered, e.g. to 95%? In that case the size of the indexes will be bigger. If I know how much the indexes get fragmented between two rebuilds, what is the formula to determine how much to lower the FF?

    Regards, Igor.

    1. Correct.

      There is no formula that I know of. It’s a balancing act between lowering the fillfactor and doing more frequent index maintenance to put the leaf-level pages back to the fillfactor again. If I write an index maintenance routine, I’ll automatically lower the fillfactor if a particular index gets rebuilt each time the maintenance routine runs, say, 5 times in a row.
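
      A minimal sketch of that kind of adjustment (the index and table names, and the 90% figure, are just placeholders, not a recommendation):

      ALTER INDEX BigRows_CL ON BigRows
      REBUILD WITH (FILLFACTOR = 90);  -- leave 10% free space in each leaf page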

  4. Will the same type of tlog growth happen during index maintenance, when a rebuild merges (frees up) pages?

  5. Hi Paul:
    This information was helpful. I have a couple of questions. What can we do to reduce page splits? And is there a page-split threshold at which we should start looking into it, like we have for page life expectancy, where < 300 could point to a memory bottleneck?

    Thank you

  6. Hi Paul,

    Am I right to guess that skewed page splits help reduce splits for multi-tenant schemas like this?

    CREATE TABLE dbo.Foo
    (
    TenantID int, -- this has, say, ~50 distinct values
    ID int IDENTITY
    );
    CREATE CLUSTERED INDEX CX_Foo ON dbo.Foo (TenantID, ID);

    Do the INSERTs cause only (or mostly) “good” page splits in such an index?

    Thanks!

  7. Hi Paul,

    Thanks for the great post about page splits.
    I am wondering how “bad splits” are even possible when we have sequential index leading keys like identity, getdate or sequential GUIDs?
