New course: Index Fragmentation Internals, Analysis, and Solutions

My latest Pluralsight course has been published!

It’s called SQL Server: Index Fragmentation Internals, Analysis, and Solutions and is just over 2.5 hours long.

The modules are:

  • Introduction
  • Index Structure and Index Uses
  • Types of Index Fragmentation
  • Causes of Index Fragmentation
  • Detecting Index Fragmentation
  • Avoiding Index Fragmentation
  • Removing Index Fragmentation

Check it out here.

We now have more than 130 hours of SQLskills online training available, all for as little as $29/month through Pluralsight (including more than four thousand other developer and IT training courses). That’s unbeatable value that you can’t afford to ignore.

How would indexes on AG readable secondaries work?

Last weekend there was a suggestion on the MVP distribution list about having temporary nonclustered indexes on AG readable secondaries, in the same way that you can have temporary statistics on them. I replied that in my opinion it would be extremely difficult to do that, and said I’d blog about why. Here’s my explanation. Note that this isn’t an exhaustive list of all the problems, just the major ones that I see.

Where to store them?

There are several options that spring to mind for how to store these temporary indexes, with varying degrees of difficulty:

  1. Automatically create a temporary filegroup on the readable secondary
    1. This option has the problem that the readable secondary is read-only, and adding a filegroup would mean adding entries in a bunch of system tables (including sys.sysfiles1, sys.sysdbfiles, sys.sysbrickfiles, sys.sysprufiles). Even if this problem could be surmounted, there’s still the problem of…
    2. The readable secondary is read-only, so where to store all the information about the indexes themselves? There are a large number of system tables that have information about an index (including sys.sysallocunits, sys.sysrowsets, sys.sysrscols, sys.sysidxstats, sys.sysiscols). Even if this problem could be surmounted, there’s still the problem of maintaining the index (see below).
  2. Create an empty filegroup on the primary replica so there’s an empty filegroup to use on the readable secondary
    1. This only solves 1.a above.
  3. Store them in tempdb, the same as the temporary statistics
    1. This solves 1.a and 1.b, but then has the added difficulty of…
    2. Keeping track of the fact that there are indexes in tempdb that are really for another database, which already happens for temporary statistics, so that’s doable but there’s the much more complex problem of…
    3. Making the Storage Engine (specifically the Access Methods) get data from the tempdb index instead of an index in the real database. That’s not a code change in the Storage Engine (because the Access Methods just creates and uses a data set over whatever the Query Processor asks it to), but the query plan will have to know that the index it’s referencing which purports to be on a table in the real database is actually in tempdb, so it asks the Storage Engine to read from the correct place. I can imagine this being quite a challenge for things like a key lookup based on a nonclustered index seek/scan.
    4. This is the only potentially practical solution in my mind for where to create the temporary indexes.

That’s the easier part taken care of.

How to maintain them?

Assuming they are stored as in #3 above, then there’s the really hard problem of how to maintain the indexes, to keep them up-to-date with the underlying table (i.e. when an INSERT, UPDATE, or DELETE happens, make sure that all nonclustered indexes are updated accordingly). Creating the index would be relatively trivial using an online index build-like method, so I won’t go into that.

Back in SQL Server 2000, it would be much easier (although there weren’t any AGs back then :-) because the Storage Engine was responsible for real-time maintenance of nonclustered indexes. Since SQL Server 2005, however, this has been the purview of the Query Processor, so it drives maintaining indexes and the Storage Engine just does what it’s asked (insert a record here, update this column, etc.). There is functionality available to the Storage Engine to ask the Query Processor to manipulate nonclustered indexes without an operation occurring on the table – DBCC CHECKDB uses it for fixing missing or extra records in nonclustered indexes during repair operations.

Here’s the major problem with temporary indexes: the only data coming from the primary replica are physical log records. How to translate those physical log records into nonclustered index maintenance operations?

The only log records that are interesting are those related to the table itself (i.e. changes to the heap or clustered index). But log records are replayed by the recovery portion of the Storage Engine, not the Access Methods, so in the recovery portion, there is no context at all about the table, it’s columns, indexes, and so on. To make this work, the following would be required:

  1. The recovery portion would have to know that the log records LOP_INSERT_ROWS, LOP_DELETE_ROWS, LOP_MODIFY_ROW, LOP_MODIFY_COLUMNS, plus anything else like truncating the table, changing from a heap to a clustered index and vice versa, for certain allocation units (i.e. those for any table with a temporary nonclustered index) need to be passed to a new piece of code to do the temporary index maintenance.
  2. The new piece of code would have to know about the new indexes and do the index maintenance. In other words, Storage Engine nonclustered index maintenance like in SQL Server 2000 would have to be built again. There’s a complication with table record modifications, as the LOP_MODIFY_ROW and LOP_MODIFY_COLUMNS log records don’t say which columns are changing – just the offset and length of the change (they’re physical log records remember). The log record could potentially be changed to have a bitmap of column IDs being changed, that’s only present when there’s a temporary index on a readable secondary. More conditional code.
  3. Another option is for the Query Processor on the primary replica to know that there’s a temporary index and generate special logical operation log records to aid in the maintenance of the index on the readable secondary (in the same way that transactional replication works). I think this would be easier than having to interpret regular log records and figure out what to do.
  4. All this new code to be executed would potentially slow down the replay of log records, with the slow down increasing with each temporary index that’s added.


To summarize, there are possible solutions to how to store the temporary indexes and how to maintain them, but it’s a lot of tricky work and involves a big change to the recovery code, which is always fraught with danger as it’s such a critical portion of the Storage Engine. And I haven’t touched on partitioning, rebuild/reorganize, and other things you can do with indexes.

IMHO, the easiest solution is to not use temporary indexes and just create the indexes on the primary replica that you want to be there on the readable secondary. This doesn’t require any code changes and is available today. Of course there’s the potential downside of extra space being taken up, extra logging, and extra time to send that log to the secondaries, but I think this is the way to go.

Hope you found this interesting!

Are mixed pages removed by an index rebuild?

This is a question that came up this morning during our IE1 class that I thought would make an interesting blog post as there are some twists to the answer.

The first 8 pages that are allocated to an allocation unit are mixed pages from mixed extents, unless trace flag 1118 is enabled.

See the following blog posts for more info:

Assuming that mixed pages are not disabled with trace flag 1118, does an index rebuild remove all mixed pages or not?

Let’s investigate. First I’ll create a clustered index with 1,000 data pages:

CREATE TABLE [MixedTest] ([c1] BIGINT IDENTITY, [c2] CHAR (8000) DEFAULT 'a');
CREATE CLUSTERED INDEX [MixedTest_CL] ON [MixedTest] ([c1]);
GO 1000

And then make sure that we have mixed pages be examining the first IAM page in the clustered index’s IAM chain. You can get the sp_AllocationMetadata proc here.

EXEC [sp_AllocationMetadata] N'MixedTest';
Object Name   Index ID   Alloc Unit ID       Alloc Unit Type   First Page   Root Page   First IAM Page
------------  ---------  ------------------  ----------------  -----------  ----------  ---------------
MixedTest     1          72057594046185472   IN_ROW_DATA       (1:987)      (1:1732)    (1:988)
DBCC PAGE (N'master', 1, 988, 3);

(I’m just including the relevant portion of the DBCC PAGE output here…)

IAM: Single Page Allocations @0x00000000227EA08E

Slot 0 = (1:987)                    Slot 1 = (1:989)                    Slot 2 = (1:990)
Slot 3 = (1:991)                    Slot 4 = (1:1816)                   Slot 5 = (1:1817)
Slot 6 = (1:1818)                   Slot 7 = (1:1819)

Now I’ll do an offline index rebuild of the clustered index, and look again at the IAM page contents (assume I’m running the proc and DBCC PAGE after the rebuild):

ALTER INDEX [MixedTest_CL] ON [MixedTest] REBUILD;
IAM: Single Page Allocations @0x0000000023B0A08E

Slot 0 = (1:1820)                   Slot 1 = (1:446)                    Slot 2 = (1:1032)
Slot 3 = (0:0)                      Slot 4 = (1:1035)                   Slot 5 = (1:1034)
Slot 6 = (1:1037)                   Slot 7 = (1:1036)

So the answer is no, an index rebuild does not remove mixed page allocations. Only trace flag 1118 does that.

But this is interesting – there are only 7 mixed pages in the singe-page slot array above. What happened? The answer is that the offline index rebuild ran in parallel, with each thread building a partial index, and then these are stitched together. The ‘stitching together’ operation will cause some of the non-leaf index pages to be deallocated as their contents are merged together. This explains the deallocated page that was originally tracked by entry 3 in the slot array.

Let’s try an offline index rebuild that forces a serial plan.

IAM: Single Page Allocations @0x0000000023B0A08E

Slot 0 = (1:1822)                   Slot 1 = (1:1823)                   Slot 2 = (1:291)
Slot 3 = (1:292)                    Slot 4 = (0:0)                      Slot 5 = (0:0)
Slot 6 = (0:0)                      Slot 7 = (0:0)

In this case there is only one index (i.e. no parallel mini indexes) being built so there are no pages being deallocated in the new index as there is no stitching operation. But why aren’t there 8 mixed pages? This is because during the build phase of the new index, the leaf-level pages are taken from bulk-allocated dedicated extents, regardless of the recovery model in use. The mixed pages are non-leaf index pages (which you can prove to yourself using DBCC PAGE).

For parallel and single-threaded online index operations, the same two patterns occur as for offline index rebuilds, even though the algorithm is slightly different.