Can GUID cluster keys cause non-clustered index fragmentation?


At the user group meeting on Monday I spent some time explaining how GUIDs can cause fragmentation in clustered indexes AND in non-clustered indexes, even if the GUID isn’t specifically included in the non-clustered index key. GUIDs are essentially random values (pseudo-random in ranges if generated using NEWSEQUENTIALID) that are also unique. Their uniqueness is what makes them attractive to many developers as a key value, without understanding the havoc they can cause in production in terms of fragmentation and poor query performance.


A GUID key causes fragmentation because of its randomness. The insertion point of a new record in an index is dictated by the value of the index key, so if the key value is random, so is the insertion point. This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused. In this case *two* kinds of fragmentation have been caused – logical fragmentation (where the next logical page as determined by the index order is not the next physical page in the data file) and physical (or internal) fragmentation (where space is being wasted on index pages). These can both affect query performance (topic for a later post), as well as the expense of having to do the page split in the first place.


It’s fairly well known that GUIDs can cause fragmentation in the index where the GUID is the key (e.g. a clustered index), but not about the knock-on effects in non-clustered indexes. Here’s an example – I’ll create two clustered indexes with GUID keys (one generated from NEWID and one from NEWSEQUENTIALID), plus a non-clustered index on each. Let’s see what happens when we insert 100000 rows:



— Create a table with a GUID key
CREATE TABLE BadKeyTable (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT ‘a’);
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
CREATE NONCLUSTERED INDEX BadKeyTable_NCL ON BadKeyTable (c2);
GO


— Create another one, but using NEWSEQUENTIALID instead
CREATE TABLE BadKeyTable2 (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT ‘a’);
CREATE CLUSTERED INDEX BadKeyTable2_CL ON BadKeyTable2 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable2_NCL ON BadKeyTable2 (c2);
GO


DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO BadKeyTable DEFAULT VALUES;
   
INSERT INTO BadKeyTable2 DEFAULT VALUES;
   
SELECT @a = @a + 1;
END;
GO


— And now check for fragmentation
SELECT
   
OBJECT_NAME (ips.[object_id]) AS ‘Object Name’,
   
si.name AS ‘Index Name’,
   ROUND (ips.avg_fragmentation_in_percent, 2) AS ‘Fragmentation’,
   
ips.page_count AS ‘Pages’,
   
ROUND (ips.avg_page_space_used_in_percent, 2) AS ‘Page Density’
FROM sys.dm_db_index_physical_stats (DB_ID (‘DBMaint2008’), NULL, NULL, NULL, ‘DETAILED’) ips
CROSS APPLY sys.indexes si
WHERE
   
si.object_id = ips.object_id
   
AND si.index_id = ips.index_id
   AND ips.index_level = 0;
GO


Object Name    Index Name        Fragmentation  Pages  Page Density
————-  —————-  ————-  —–  ————
BadKeyTable    BadKeyTable_CL    99.13          8092   66.08
BadKeyTable    BadKeyTable_NCL   30.97          78     64.1
BadKeyTable2   BadKeyTable2_CL   0.83           5556   96.26
BadKeyTable2   BadKeyTable2_NCL  1.88           372    99.61


The BadKeyTable_CL clustered index with the GUID key generated from NEWID is almost perfectly fragmented, with 34% space being wasted on each page. Conversely, the BadKeyTable2_CL clustered index with the GUID key generated from NEWSEQUENTIALID is hardly fragmented and only 4% of free space is wasted (and this is just because of the row size chosen). These numbers are entirely expected given the nature of the cluster keys.


Now look at the non-clustered indexes. BadKeyTable_NCL is 31% fragmented with 36% space wasted on each page! BadKeyTable2_NCL is harldy fragmented with no free space wasted on each page. So what’s going on? The non-clustered index key in both cases is a datetime column, which has a minimum granularity of 3 milliseconds. The code above runs in a tight loop inserting records and so can insert more than one record per 3ms time interval – and I allow this because I didn’t create make the non-clustered indexes unique. For all the records inserted in one 3ms time interval, there *has* to be something that makes the non-clustered index key unique internally (as even though an index can be defined as non-unique, the Storage Engine requires that each record really is unique in an index and will add whatever it needs to so that happens).


In this case, the clustered index key (which must be present in the non-clustered index anyway) is used to differentiate between all the non-clustered index records with the same datetime value. For BadKeyTable_NCL, the cluster key is a random GUID, so the non-clustered index record insertion points ALSO become random within each 3ms time interval – leading to the fragmentation above. The BadKeyTable2_NCL non-clustered index has the same time interval issue, but it’s cluster key is a sequential GUID, so the non-clustered index doesn’t get fragmented. If the non-clustered index key was a time datatype with a larger minimum granularity (like smalldatetime, or the new date), the fragmentation of BadKeyTable_NCL would be even worse – try it for yourself and you’ll see.


So the answer to the question in the blog post title is really – it depends! Under the right conditions, a GUID cluster key can also seriously fragment a non-clustered index as well.

8 thoughts on “Can GUID cluster keys cause non-clustered index fragmentation?

  1. Paul,

    Doesn’t replication add GUIDs to tables, if they do not already have one?
    And if so, wouldn’t this have the same performance impact you have outlined above?

  2. Merge replication adds a uniqueidentifier column with a unique index if one doesn’t exist. So if you’re going to use merge replication, you should add your own GUID column with a default of NEWSEQUENTIALID.

    Thanks

  3. This is the first time I hear about NEWSEQUENTIALID and I can see their advantage in some cases compared to NEWID.

    I’m wondering whether there is a pendant in the C# programming language. For what I know there only is System.Guid.NewGuid() which seems to be similar to NEWID.

    However I might want to know about Guids send to the DB before I submit them. It would be great to be able to generate NEWSEQUENTIALID in C# code. Did anyone ever do it? I’m sure it’s pretty easy and I just don’t see it!

  4. Even though the Guid is random (i am referring to making use of newid()). Will having an identity column as the clustered index and the guid column as the nonclustered index (and guid being primary key for the purposes of referential integrity) speed up inserts? And does insert/update/delete look same in the following 2 cases 1) clustered index with primary key on identity column and non clustered guid column
    2) clustered index on identity column and non clustered guid colum as primary key
    ?

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.