Wednesday, August 27, 2008

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

Wednesday, August 27, 2008 9:11:27 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, August 20, 2008

At the user group meeting on Monday I spent some time explaining how GUIDs can cause fragmentation in clustered indexes AND in non-clustered indexes, even if the GUID isn't specifically included in the non-clustered index key. GUIDs are essentially random values (pseudo-random in ranges if generated using NEWSEQUENTIALID) that are also unique. Their uniqueness is what makes them attractive to many developers as a key value, without understanding the havoc they can cause in production in terms of fragmentation and poor query performance.

A GUID key causes fragmentation because of its randomness. The insertion point of a new record in an index is dictated by the value of the index key, so if the key value is random, so is the insertion point. This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused. In this case *two* kinds of fragmentation have been caused - logical fragmentation (where the next logical page as determined by the index order is not the next physical page in the data file) and physical (or internal) fragmentation (where space is being wasted on index pages). These can both affect query performance (topic for a later post), as well as the expense of having to do the page split in the first place.

It's fairly well known that GUIDs can cause fragmentation in the index where the GUID is the key (e.g. a clustered index), but not about the knock-on effects in non-clustered indexes. Here's an example - I'll create two clustered indexes with GUID keys (one generated from NEWID and one from NEWSEQUENTIALID), plus a non-clustered index on each. Let's see what happens when we insert 100000 rows:

-- Create a table with a GUID key
CREATE TABLE BadKeyTable (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
CREATE NONCLUSTERED INDEX BadKeyTable_NCL ON BadKeyTable (c2);
GO

-- Create another one, but using NEWSEQUENTIALID instead
CREATE TABLE BadKeyTable2 (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable2_CL ON BadKeyTable2 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable2_NCL ON BadKeyTable2 (c2);
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO BadKeyTable DEFAULT VALUES;
   
INSERT INTO BadKeyTable2 DEFAULT VALUES;
   
SELECT @a = @a + 1;
END;
GO

-- And now check for fragmentation
SELECT
   
OBJECT_NAME (ips.[object_id]) AS 'Object Name',
   
si.name AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   
ips.page_count AS 'Pages',
   
ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (DB_ID ('DBMaint2008'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE
   
si.object_id = ips.object_id
   
AND si.index_id = ips.index_id
   AND ips.index_level = 0;
GO

Object Name    Index Name        Fragmentation  Pages  Page Density
-------------  ----------------  -------------  -----  ------------
BadKeyTable    BadKeyTable_CL    99.13          8092   66.08
BadKeyTable    BadKeyTable_NCL   30.97          78     64.1
BadKeyTable2   BadKeyTable2_CL   0.83           5556   96.26
BadKeyTable2   BadKeyTable2_NCL  1.88           372    99.61

The BadKeyTable_CL clustered index with the GUID key generated from NEWID is almost perfectly fragmented, with 34% space being wasted on each page. Conversely, the BadKeyTable2_CL clustered index with the GUID key generated from NEWSEQUENTIALID is hardly fragmented and only 4% of free space is wasted (and this is just because of the row size chosen). These numbers are entirely expected given the nature of the cluster keys.

Now look at the non-clustered indexes. BadKeyTable_NCL is 31% fragmented with 36% space wasted on each page! BadKeyTable2_NCL is harldy fragmented with no free space wasted on each page. So what's going on? The non-clustered index key in both cases is a datetime column, which has a minimum granularity of 3 milliseconds. The code above runs in a tight loop inserting records and so can insert more than one record per 3ms time interval - and I allow this because I didn't create make the non-clustered indexes unique. For all the records inserted in one 3ms time interval, there *has* to be something that makes the non-clustered index key unique internally (as even though an index can be defined as non-unique, the Storage Engine requires that each record really is unique in an index and will add whatever it needs to so that happens).

In this case, the clustered index key (which must be present in the non-clustered index anyway) is used to differentiate between all the non-clustered index records with the same datetime value. For BadKeyTable_NCL, the cluster key is a random GUID, so the non-clustered index record insertion points ALSO become random within each 3ms time interval - leading to the fragmentation above. The BadKeyTable2_NCL non-clustered index has the same time interval issue, but it's cluster key is a sequential GUID, so the non-clustered index doesn't get fragmented. If the non-clustered index key was a time datatype with a larger minimum granularity (like smalldatetime, or the new date), the fragmentation of BadKeyTable_NCL would be even worse - try it for yourself and you'll see.

So the answer to the question in the blog post title is really - it depends! Under the right conditions, a GUID cluster key can also seriously fragment a non-clustered index as well.

Wednesday, August 20, 2008 11:11:02 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]  | 
Monday, August 11, 2008

Fresh off a week of teaching classes on the Microsoft campus, we've finalized some user group dates. Here's what we have coming up:

  • Monday August 18th: user group meeting in Redmond
  • Monday September 1st to 3rd: public class in England
    • In conjunction with our UK partners, SQL Know How at Hatfield, England
    • Topic: Best Practices in Performance and Availability in SQL Server 2005/2008 
  • Thursday September 4th: user group meeting in Ireland
  • Friday September 5th: SQL Server launch event in Ireland
    • In conjunction with Microsoft Ireland at the Dublin Microsoft office
    • Topic: SQL Server 2008 Overview for DBAs
  • Monday September 8th to 9th: public class in Scotland
    • In conjunction with our UK partners, SQL Know How at Edinburgh, Scotland
    • Topic: Indexing for Performance in SQL Server 2000/2005/2008

It's going to be a busy few weeks - hope to see you at one of these events!

Monday, August 11, 2008 10:36:44 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Thursday, July 10, 2008

Well, almost... Kimberly and I have agreed to co-author the upcoming SQL Server 2008 Internals book with our good friend Kalen Delaney. This is the 2008 evolution of Kalen's Inside SQL Server series of books and will be published by MS Press around February 2009. The other authors are (also our good friends) Adam Machanic and Conor Cunningham.

Kimberly's going to write the chapter on Index Internals and I'm going to write the chapter on DBCC Internals. This is very exciting as these are our respective favorite subjects, as you probably already know - these won't be short chapters :-)

It's going to be a busy rest of the year - phew!

PS Next year we have plans to write a book ourselves - watch this space...

Thursday, July 10, 2008 2:37:31 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Sunday, June 08, 2008

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these are true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery mode decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery mode while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery mode while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery mode, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In FULL recovery mode, take log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer.  Online index operations need to take two very short-term table locks. An S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the  table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll-back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get it's lock, it will timeout.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, checkout this whitepaper on Online Index Operations in SQL Server 2005.

Sunday, June 08, 2008 9:12:56 AM (Pacific Standard Time, UTC-08:00)  #    Comments [6]  | 
Wednesday, May 14, 2008

Kimberly and I were presenting at our local (Redmond) .Net Developers Association on Monday and the following question came up while Kimberly was talking about missing and extra indexes (paraphrasing):

What's the best non-clustered index to use for the query with a predicate WHERE lastname = 'Randal' AND firstname = 'Paul' AND middleinitial = 'S'?

Kimberly said that the order of the keys (e.g. lastname, firstname, middleinitial; or middleinitial, lastname, firstname; etc) doesn't matter for this case. I thought about it for a second and then argued, saying that the most selective column should come first. We agreed to discuss with the group at the end, but I thought about it some more and realized (and admitted to the group) that she's right - I should know better than to question Kimberly's knowledge of indexing... :-)

She's right because for a pure equality query using AND for multiple predicates, the Storage Engine will seek straight to the first exactly matching record in the index (and then scan for more matches if it's a non-unique index). It doesn't matter what order the index keys are defined because the Storage Engine is looking for an exact match.

When I started arguing, I was thinking about a phone book, which is ordered by lastname, firstname, middleinitial. You may think that a phone book is ordered that way because lastname is the most selective. Wrong. It's because the lastname is what most people know - it just happens to be the most selective of the three choices. Most SQL geeks should be able to find Kimberly in a phone book by looking for Tripp, Kimberly. But what if it was ordered by middleinital? I'd have no problem finding Kimberly, but how many of you would remember that her middleinitial is L? Probably a few as we both use our middle initials in our public names. What about if it was ordered by middleNAME? Again, no problem for me but who how many other people know her middle name is Lynn?

Then I started thinking about other queries and how they would play into the index choice to answer to the question above. If I also wanted to support a query with the predicate WHERE lastname = 'Randal', then having the left-most index key be anything other than lastname won't work so well. If the key order was firstname, middleinitial, lastname then all the distinct lastname values would be spread through the index rather than being together. The index might still be used to satisfy the query if it's the lowest cost index to use. However, having lastname be the leading key probably wouldn't work very well for a query with a predicate of WHERE firstname = 'Paul' - that argues for having firstname be the left-most index key.

Which should I choose? I probably I can't have both in the same index, so maybe I'd have TWO non-clustered indexes, to support both queries. The answer depends on how often the various queries are used and the trade-off between how much of a performance gain the non-clustered index would provide against the performance drop of having to maintain it during DML operations.

I hear time and again about people adding a non-clustered index for every column in the table, thinking that this will help - and my thinking is that this is wrong because these indexes can only satisfy a query where the only predicate is the column being indexed. I ran this argument past Kimberly and she added that these indexes could also be used if the column is chosen as the most selective in a multi-predicate query, and no other index has a lower cost than that one (a slim chance usually). Even what I though of as a simple case has caveats!

So what's the point of this post? Well, I wanted to show how indexing for one very simple query is pretty straightforward, but as soon as the number of different queries grows, and the query predicates get more complicated, indexing becomes more complex. You really have to know your workload and your data to know which columns are used, in what combinations, and how often - and then it helps to know how indexes are costed and used so that you can make intelligent choices about which indexes to define.

This thought-exercise has really shown me that I didn't know how much I don't know about indexes - I know precisely how they work at the Storage Engine level but not too much about how they're used by the Query Processor. I have new-found respect for Kimberly's indexing expertise. Luckily she's teaching a class at Microsoft called Indexing For Performance next week - I think I'll attend :-)

Wednesday, May 14, 2008 2:13:44 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]  | 
Saturday, January 26, 2008

It seems like all I've been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.

Rebuilding an index will update statistics with the equivalent of a full scan - doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX ... REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while its doing the rebuild.

Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX ... REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.

The problem I've been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you're left with sampled statistics. You've wasted resources doing the sampled scan AND you've lost the 'free' full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don't lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what's the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don't get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

Hope this helps.

Saturday, January 26, 2008 4:01:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [8]  | 
Thursday, January 17, 2008

One of the cool features of SQL Server 2008 for developers is spatial data support. There have been some great posts recently about using spatial (see Simon Sabin's blog here), which is all developer stuff - but what I'm interested in are the implications of spatial support for DBAs, and they are focused on spatial indexes

Note: all images in this post are taken from November CTP Books Online

There are two kinds of spatial data that 2008 supports - planar (i.e. points, lines, polygons on a single 2-D plane) and geodetic (i.e. points, lines, polygons on a geodetic ellipsoid - for example, the Earth). These are presented in SQL as geometry and geography data respectively. A common operation that's performed on spatial data is comparing two spatial values to see if they intersect at all. Now, this is a complicated calculation, which gets more computationally expensive as the complexity of the spatial values increase. Given a problem of 'which spatial values in this table does this spatial value X intersect with', it would be great to have some way of quickly pruning out spatial values in the table that cannot possibly intersect with X, and so avoid doing the expensive calculation for them. Enter spatial indexes.

Here's the basic idea behind a spatial index:

  • A plane is broken up into a grid of cells.
  • Each spatial value is evaluated to see which cells in the grid it intersects with
  • The list of cells is stored along with the primary key of the table row that the spatial value is part of
  • Comparing two spatial values for intersection is a matter of comparing the list of grid cells - if there are no matching cells, the spatial values do not intersect, and there's no need to do the expensive intersection calculation

In practice its a bit more complicated. For planar data (i.e. the geometry data type), you need to define a bounding box (i.e. 4 corner points that define a rectangle of space in which you're interested on the 2-D plane). That bounding box on the plane will be broken down into a grid of cells. The top-level grid can be up to 16x16, giving 256 cells. The next level of granularity breaks each of those top-level grid cells into a further grid, again up to 16x16. So now there could be (16x16) x (16x16) cells in the grid - or 65536 cells. This obviously allows a more exact description of a spatial value in the list of cells. And so on and so on. There are actually 4 levels of grid that the bounding box is broken up into - and each can be 16x16, for a possible total of 168 or 4 billion cells. The picture below illustrates this with a grid size of 4x4 at each level.

The bounding box and the size of the grid at each level are specified when the spatial index is created, as well as the maximum number of matching grid cells to store in the spatial index per spatial value - to a max of 8192. Once the bounding box has been decomposed into the various levels of grid, each value in the spatial column is evaluated against the grid. The value is first decomposed against the first level grid. If the number of cells it matches is less than the max per spatial value, the decomposition then moves to the second level grid. This decomposition continues until the maximum number of matching grid cells is reached. If the max is reached while processing a deeper level for a cell, (e.g. in the middle of processing the 2nd level grid of 4x4 for cell #13 in the 1st level), the deeper level matches are thrown away and only the coarser granularity matching cell is stored (e.g. continuing that example, the 2nd level grid matches are discarded and only cell #13 in the 1st level will be stored). The picture below helps to illustrate this.

So, each geometric spatial value is approximated in the spatial index by a list of matching cells in the defined bounding box. As there is a limit to the number of matching cells that can be stored in the approximation, it is an optimistic representation. This means that if two values are compared using the approximations, there will be no false negatives, as the approximations map a larger space than the actual spatial values. There can, however, be false positives. A false (or real) positive means the spatial values need to then be compared using the complex, computationally expensive intersection algorithm using the actual spatial values. So again, the spatial index serves as a way of pruning out the need to run the expensive algorithm.

The algorithm is very similar for geodetic data (i.e. the geography data type), however there's no bounding box. Instead, the entire geodetic ellipsoid is projected onto a 2-D plane and then the grid decomposition algorithm is applied to that plane in exactly the same way as for planar data. The picture below describes how the projection is done.

You may have already realized that the effectiveness of the spatial index in pruning is directly proportional to how exactly the approximations in the index actually describe the spatial values. In other words, the higher the number of grid cells at each level, and the higher the number of grid cell matches that are stored per spatial value in the index, the better the index is at pruning. More exact approximations require storing more matching grid cells at deeper granularities - i.e. taking MORE SPACE. Creating a more exact spatial index takes more space.

With all that in mind, the interesting thing for DBAs here is that there's a trade-off between CPU use to do the real intersection algorithm and spatial index size to use in pruning calls to the algorithm. It's too early to know what best practices there are - but I'll blog them as I here about them.

Thursday, January 17, 2008 11:59:06 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Wednesday, December 12, 2007

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

Wednesday, December 12, 2007 10:07:05 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Monday, November 12, 2007

I’m in the middle of a flight from Washington D.C. to Zurich on the way to Barcelona for TechEd IT Forum and I can’t sleep – Kimberly’s out like a light so what else is there to do except write another blog post? :-) OK - actually posting this from Barcelona on Tuesday before our first of 12 sessions here!

In the Database Maintenance workshop we did at SQL Connections last week I promised to blog a script I used to show how data file shrink operations cause massive fragmentation of indexes. The reason is that data file shrink starts at the end of the data file and moves a single page at a time to a free space below the shrink threshold. In the process of doing so, it perfectly reverses the physical order of the pages comprising the leaf level of an index – thus perfectly fragmenting it!!

Let’s try out my simple script that demonstrates this. First thing I’m going to do is create a new database and create a 10MB ‘filler’ table, which I’m going to then drop later to create a space that shrink can use.

USE MASTER;

GO

 

IF DATABASEPROPERTYEX ('shrinktest', 'Version') > 0

      DROP DATABASE shrinktest;

 

CREATE DATABASE shrinktest;

GO

USE shrinktest;

GO

 

SET NOCOUNT ON;

GO

 

-- Create and fill the filler table

CREATE TABLE filler (c1 INT IDENTITY, c2 VARCHAR(8000))

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO filler VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO

Next I’ll create the ‘production’ table that we’d really like to keep in optimal shape for performance.

-- Create and fill the production table

CREATE TABLE production (c1 INT IDENTITY, c2 VARCHAR (8000));

CREATE CLUSTERED INDEX prod_cl ON production (c1);

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO production VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO 

Now I’ll use the sys.dm_db_index_physical_stats DMV to check the fragmentation of the production table’s clustered index – it should be almost zero:

-- check the fragmentation of the production table

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

0.390930414386239            6

This is what I expected. Now I’m going to drop the filler table, run a shrink operation and then check the fragmentation again:

-- drop the filler table and shrink the database

DROP TABLE filler;

GO

 

-- shrink the database

DBCC SHRINKDATABASE (shrinktest);

GO

 

-- check the index fragmentation again

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

99.7654417513683             1277

Wow! The index went from almost 0% fragmented to almost 100% fragmented – the shrink operation totally reversed the physical ordering of the leaf level of the clustered index – nasty.

One of the common maintenance operations I see at customer sites is to run a database shrink at some interval, and I always advise against it – now you can see why. Running a regular shrink operation can cause horrible fragmentation problems. The worst problems I see are those customers with maintenance plans that rebuild all indexes and then run a shrink to remove the extra space necessary for the index rebuilds – completely undoing the effects of the index rebuild!

One other common thing I see is to have auto-shrink set on for one or databases. This is bad for several reasons:

  • Shrink causes index fragmentation, as I’ve just demonstrated above.
  • You can't control when it kicks in. Although it doesn't have any effect like long-term blocking, it does take up a lot of resources, both IO and CPU. It also moves a lot of data through the buffer pool and so can cause hot pages to be pushed out to disk, slowing things down further. If the server is already pushing the limits of the IO subsystem, running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
  • You're likely to get into a death-spiral of auto-grow then auto-shrink then auto-grow then auto-shrink... (in my experience, if someone is using auto-shrink, they're most likely using and relying on auto-grow too). An active database usually requires free space for normal operations - so if you take that free space away then the database just has to grow again. This is bad for several reasons:
    • Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down performance
    • It wastes a huge amount of resources, basically running the shrink algorithm for no reason
    • Auto-grow itself can be bad, especially if you're using SQL Server 2000 (or don't have Instant File Initialization turned on - see this post from Kimberly's blog) where all allocations to the file being grown are blocked while the new portion of the file is being zero-initialized.

Bottom-line: auto-shrink should *NEVER* be turned on…

Monday, November 12, 2007 8:28:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]  | 
Tuesday, November 06, 2007

A couple more questions from the last couple of classes.

Q1) Why doesn't performing an index rebuild alter the fragmentation?

A1) Here are the possibilities - all of which I've seen happen:

  • There isn't an index - either DBCC DBREINDEX or ALTER INDEX ... REBUILD are being run on a table that only has a heap, and so the (extent) fragmentation of the heap isn't changing because there's no way to rebuild a heap (except by the heavily NOT recommended method of creating and then dropping a clustered index).
  • The index is too small. An index with only a handful of pages may not show any changes in fragmentation because all the pages are single, mixed pages (see my previous post on extent types for more info) and so rebuilding the index does nothing.
  • The workload and schema are such that by the time the rebuild has finished and the fragmentation calculation has been done again, the index is already getting fragmented again.
  • The Extent Scan Fragmentation result from DBCC SHOWCONTIG is being used to gauge fragmentation for an index stored in a filegroup with multiple files. The Extent Scan Fragmentation in DBCC SHOWCONTIG does not cope with multiple files (as is documented in Books Online) and so the value may even go UP in some cases!

Q2) What operations take advantage of minimal-logging when the recovery mode is BULK_LOGGED?

A2) The list is very small - 4 four classes of operations:

  • Index builds, rebuilds, or drop of a clustered index (NOT index defrags with DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE - this is a common misconception).
  • Bulk load operations (i.e. BCP, INSERT ... SELECT * FROM OPENROWSET (BULK...), and BULK INSERT).
  • Insert or appends of LOB data (either using WRITETEXT/UPDATETEXT for TEXT/NTEXT/IMAGE data types, or UPDATE with a .WRITE clause).
  • SELECT INTO operations on permanent tables.

For these operations, only the allocations are logged in the transaction log. Any extents that are allocated and changed through a minimally-logged operation are marked in the ML bitmaps (one for every 4GB of each file) and then the next transaction log backup will also read all those extents and include them in the backup.

Tuesday, November 06, 2007 3:17:02 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]  | 
Wednesday, October 31, 2007

This is a quick answer to a question I was sent today by someone who'd read Kimberly's partitioning whitepaper - Partitioned Tables and Indexes in SQL Server 2005 - and is implementing a "sliding-window" scenario. (This is a mechanism to allow fast insertion and deletion of significant portions of data into/from a partitioned production table. Insertion is done by taking a table and making it a new partition of the production table - called switching-in. Deletion is done by removing a partition from the production table and making it into a stand-alone table - called switching-out.)

The question is - what indexes are required on the staging table to prevent the ALTER TABLE ... SWITCH PARTITION statement from failing with a message like that below:

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'PartitionTest.dbo.StagingTable' for the index 'NC_Birthday' in target table 'PartitionTest.dbo.ProductionTable'.

The answer is that the staging table has to have the exact same indexes - clustered and non-clustered - as the production table. I asked Kimberly if it has to have the same constraints too - the answer is yes, plus the staging table has to have a trusted constraint on it such that SQL Server can tell (without checking all the data in the staging table) that all the data satisfies the partitioning function for the partition that you're switching-in (i.e. the partition that the staging table will become in the production table). If it doesn't, the switching-in will fail with the following error:

Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'PartitionTest.dbo.StagingTable' allow values that are not allowed by range defined by partition 4 on target table 'PartitionTest.dbo.ProductionTable'.

One thing that confuses people is that SQL Server does not create the target table for you when doing a switch-out of a partition. The target table has to exist and have the exact same schema as the production table. Also, it has to be completely empty - otherwise you'll get an error like:

Msg 4905, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. The target table 'PartitionTest.dbo.StagingTable' must be empty.

The must-be-empty requirement also holds for switching-in operations - the partition that will be created has to be empty otherwise a similar 4904 error results.

Hope this helps!

Wednesday, October 31, 2007 6:05:57 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Friday, October 19, 2007

Two of the cool features in SQL Server 2005 are CROSS APPLY and DMVs (Dynamic Management Views). Now, far be it for me to get my hands dirty explaining developer stuff like CROSS APPLY :-) but I was having a discussion with Colin Leversuch-Roberts in the UK about the composability limitations of the sys.dm_db_index_physical_stats DMV. (Btw - you should check out Colin's blog post series on Analysing Indexes - lots of useful stuff).

So CROSS APPLY lets you do join-like functionality with table-valued functions that take parameters - which you can't do using JOIN. This works for most of the DMVs, but some of them are written to an older internal implementation that doesn't support CROSS APPLY, and sys.dm_db_index_physical_stats is one of them. If you try it you'll get an error like:

Msg 413, Level 16, State 1, Line 26

Correlated parameters or sub-queries are not supported by the inline function "master.sys.dm_db_index_physical_stats".

Fortunately there's a way around this - wrap the DMV in an artificial TVF of your own, and then CROSS APPLY to that. Here's an example:

 

CREATE FUNCTION my_index_physical_stats (

@database_id INT,

@object_id INT,

@index_id INT,

@partition_number INT,

@mode INT)

RETURNS @result TABLE (

database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,

index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,

index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,

avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,

record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,

min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,

forwarded_record_count BIGINT NULL)

BEGIN

INSERT INTO @result SELECT * FROM

sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode)

RETURN

END;

GO

Friday, October 19, 2007 5:35:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]  | 
Thursday, October 04, 2007

(This is an updated repost from earlier this year on my old blog)

Whenever I’m discussing index maintenance, and specifically fragmentation, I always make a point of saying ‘Make sure the index is being used before doing anything about fragmentation’. If an index isn’t being used very much, but has very low page density (lots of free space in the index pages), then it will be occupying a lot more disk space than it could do and it may be worth compacting (with a rebuild or a defrag) to get that disk space back. However, usually there’s not much point spending resources to remove any kind of fragmentation when an index isn’t being used. This is especially true of those people who rebuild all indexes every night or every week.