The SQLskills SQL Immersion Week is broken down into two primary parts:

Part I: Server-side Performance Tuning delivered by Kimberly L. Tripp - which runs from Monday through Thursday, October 11 through the 15
Part II: Client-side Data Access Best Practices delivered by William R. Vaughn on Friday, October 15.

Monday through Friday the formal lecture will run from 9am until 5pm with breakfast, lunch and snacks included and at 5pm each day we will go into “self-paced laptop lab time” from 5pm until 7pm. During this time we will have more specific project Q&A, self-paced labs (from the course demo databases, course scripts, etc.) and BYOP (Bring Your Own Problem) where you can work on/resolve your own issues in an informal group chat, etc. This extended course time will occur Monday through Thursday with lecture ending at 5pm on Friday. In addition to the intense technical focus, we will offer an opening dessert reception on Sunday evening and on Thursday evening we're going to sample some of Chicago's own with a dinner out!

You'll receive printed materials, a course CD which includes all slides in pdf form as well as all demo scripts and the course demo database, a certificate of completion and a few other special course gifts! All of these things are included in your price.

The course is limited to only 36 people, we'll have wireless internet access in the classroom and cable based access in our hotel rooms and almost all meals are included with the exception of M-W night dinners. Once you arrive at the hotel you'll incur very few additional expenses and the week will be focused on nothing but making your servers scale!!! The hotel has a pool and exercise room and there are tons of great restaurants within walking distance.

For more details, check out SQL Immersion Events on SQLskills.com.

We hope to see you there!

Categories:
Events

The week began with my playing shuttle bus, hotel rest stop and tour guide - and loving it... Friends from around the globe were landing in Seattle and so I was off to the airport (Monday) to greet them and get them over to the hotel. I live in Redmond and I own an SUV - a perfect combination for becoming Kimberly tour bus. I picked up Clemens Vasters, Hans Verbeeck and Andreas Eide and since the weather forecast said it was going to be a crummy week I insisted we head to Maggies Bluff for lunch. Maggies Bluff is the lowkey downstairs bar/restaurant which shares a kitchen with Palisades. Outdoor seating, plentiful parking, spectacular views of the city and a killer openface crab sandwich made it a must start to their stay in Seattle. And - to make it even more important, the weather was unexpectedly perfect... The forecast was for 60 and rain (of course) and it was 80 and sunny (go figure). But - this is about right for Seattle weather prediction. After lunch I dropped the guys off at the hotel and worked for a bit before I was off to the airport again - to pick up Goksin Bakir and Christian Weyer. Arriving just about dinner time we headed back to the hotel (without Christian's luggage as his luggage didn't make the connection ;) to hook up with a few folks for dinner. Off to the Bellevue Tap House for dinner and a relatively early night because tour guide Kimberly was soon to arrive.

However, tour guide Kimberly was foiled by the weather. Unfortunately, they were correct for the rest of the week and that messed up my plans for Tuesday... Originally, my plans were these:

Pick-up at the Hyatt at 8am sharp!

Quick breakfast stop at Gilbert's Bagels and grab bagels for the road...

Make the Edmonds/Kingston ferry at 9:20am

Drive out to Hurricane Ridge - should be there by 11:30-12. It's a beautiful stop with amazing views!

Go back through Sequim to the Three Crabs restaurant for lunch.

Head to Port Townsend to make the 2:15 the 3:00 or 3:45pm ferry to Keystone (on Whidbey Island) (this is where we can also decide to just head back via Edmonds - if we're running late?), stop at a Whidbey Winery if time permits and head up to Deception Pass

Drive back down I-5 from there to make the evening reception by 6:30ish

NOTE: If anyone ends up following this itinerary it will probably be a later return than 6:30 - I think this would be a great itinerary if you had a really full day and a full tank of gas... certainly let me know if the timings work! If you can extend the day a bit then you can probably stop at a winery after the ferry to Whidbey Island and then have dinner up in Anacortes, etc. It's a long day but a really great one (I would think) if the weather is good and you're not super rushed to be back early.

Anyway, with the Olypmics (the mountains!) socked in by clouds and the driving to be dulled by rain we decided to go a different route... We went for a pick up at 9, Gilbert's for a more relaxed breakfast and then we were off to the Museum of Flight (as suggested by Clemens, good idea CV). In my car for breakfast was Christian Weyer, Benjamin Mitchell, Clemens Vasters, Andreas Eide, Richard Campbell (aka Toy Boy) and the still blogless Goksin Bakir. We lost Benjamin and Andreas to Microsoft for the day but the rest of us were off to see Airforce One, Concorde, the Red Barn, and the new WWII exhibit. Here are a few interesting links for Concorde info:

Concorde on loan to Seattle Museum of Flight
Concorde Trivia from the BBC
British Airways Concorde Page

It was raining a bit so as we waited to get on Concorde, we stood underneath it. The photographer gets no glory but here's the a nice shot of the motley crew MotleyCrewUnderConcorde.JPG (261.05 KB). And here, well I'm glad Christian has a day job as he could never be a flight attendant ChristianFlightAttendant.JPG (101.28 KB)!

As we walked from Concorde back to the Museum of Flight I thought of the Boeing Plant tour in Everett Washington. I've always wanted to do this... so I called information to get the number for Boeing in Everett. After a few minutes of searching, the operator tells me there's nothing listed for BoWing in Everett. After spelling BoEing TWICE - we got the correct number.... hmmmmmm (that's all I'm going to say). Anyway, it was a go! This is the home to the 747, 767, and the 777. We drove up to Everett to make the 3pm tour (FYI - you cannot make reservations for same day tours so you might want to pre-order your tickets as they often sell out). I think we were all amazed at how large (and sadly, how empty) the facilities are. You can hear the statistics and sense that it's HUGE but it was another thing to be there. The doors were the size of 6 football fields.... I have many hopes that Boeing's 7E7 brings them back to filling every assembly spot!

After the tour we stopped for quick snack at the Keg and then headed back for registration and the first opening evening party at the Rock Bottom Brewery in Bellevue. With the intense three days that followed it was a nice relaxing way to start the event.

Thursday evening after the event ended a few of us ended up in the Hyatt's Hotel Bar. This photo (ThursNightCheers.JPG (93.35 KB)) is thanks to Richard Turner - who has challenged me to a game of chicken (this is from an email and well, I'm not going to explain). And, I'm looking forward to it. ;)

Since I'm leaving the technology stuff for later I'm going to skip to the closing event... Friday night boating on Lake Washington/Lake Union with DJ host Eric Schmidt EricSchmidtDJ.JPG (151.26 KB), formerly of SQL and now a Microsoft Technical Evangelist who hosted the Connected System's Track. Dancing got a bit out of control (mostly with Forte, Sanjay, and Goksin - Forte blogged about it here) but at least the boat didn't sink. Earlier, rumors spread that we were stopping at Bill's house for dinner (which is on Lake Washington) but, oh darn, we just passed by.

So, the event ended and it was an intense week of technology and socializing... Oh, and if you're wondering what the ECC stands for... it was the tla that was given to the event - as it was a bit intense (to say the least). Just to give you an idea of the hours - Wednesday ran from 8:30am until 9:30pm, Thursday ran from 8:00am until 9:00pm and Friday ran from 8:00 until 4:45 with bus pickups at 5:15 for the evening entertainment. We had 1 hr for lunch and 1hr for dinner and only a couple of breaks throughout the day. People who traveled to the event stayed in the hotel, ate in the hotel, and barely even saw the outside for 3 days..... hence the tla. You can probably figure it out here!

Thanks for a great week everyone!

Categories:
Events | Opinions

People always ask me how I keep up to date with SQL Server and how I could possibly know so many intricate details about SQL Server… Well, I test/play/figure it out until it’s obvious (painfully sometimes and sometimes it’s not obvious). So, this blog entry is based on a specific question that made me want to see if I could come up with a test and positive proof in patterns and performance – when working with large tables and range deletes. I generally recommend a clustered index be created on a unique, narrow and static column – and best if it’s an identity column – yet there is an exception to this when the table has VERY specific administrative/operational/archival patterns a.k.a. the “sliding window” scenario. The question starts by asking about my point that “deletes aren’t as big of a deal” in fragmentation. While my point was solely referring to the internal v. external fragmentation issue that led us to take some more time evaluating delete performance. In talking about delete performance, we arrive at some best practices in index creation as well as horizontal partitioning! (fyi – the scripts used to create/build this scenario are fairly cool in and of themselves – not a bad idea to spend some time reviewing that!)

 

Q: You mention that deletes are not as “big of a deal” since they leave gaps. I had a situation where I had a table with around 50 million rows, about a million rows per day. After about a month and a half, I delete some old ones. I can delete a day's worth of data (1 million rows) in about 2 minutes with a delete statement, if there are no indexes. If I have a clustered index based on the date, it took about 22 minutes. Inserts were instantaneous, about two pages worth of rows every second, and even updates were quicker. Is there anything obvious that would cause this?

OK, so I need to clarify first. My comment that deletes aren't as “big of an issue” is specifically regarding fragmentation (not really performance of the actual delete)... A delete (when it’s a range delete – as opposed to a lot of singleton deletes) typically de-allocates the entire page freeing it back to the table (remember, it’s still “reserved” by this table) or if the entire extent is freed then it goes back to the DB. However, remember, this is only when the range is contiguous or grouped, per se. So, looking solely at the result and impact to the base table – we would say that range deletes aren’t that big of a deal because they free entire pages and even entire extents – effectively, the table doesn’t have gaps. OK, so having said that, you make a good point: what about performance?

 

First, I’m somewhat intrigued by the fact that you say a delete was faster with no indexes rather than with a clustered especially if the clustered was date BUT I have a feeling that there might have been nonclustered indexes as well? (In the upcoming test matrix, you may see some patterns in timing that corroborate this idea).  Without knowing whether you had non-clustered indexes or not, let me at least speculate and test (and hopefully prove) a few points? First, I would ask what the exact definition of the CL index was – was it just date? Remember, I really feel as though there are three types of tables – tables with the wrong clustered index, tables with NO clustered index (i.e. HEAP) and tables with the right clustered index. However, the “right” clustered index CAN vary based on pattern usage of the table. In a table that has range deletes based on dates – as well as an identity column – I often create a clustered index on the composite combination of these two columns. If the CL Key is on the date itself then this should not be that much slower (that the heap – in fact, I would have expected it to be faster even though there will be some overhead in maintaining the index). I really think there’s something else there – probably nonclustered indexes, maybe a blocking scenario? So, to help – I think I can prove a few points (in terms of perf) with a script/test to see if I can give you some rough numbers (I’m really, really intrigued to see if I can’t prove my point as well J). So, I decided to try an interesting combination – and really test the numbers. My test (albeit not entirely scientific) should show a pattern! First, I’m going to take the 1.6 million row charge table (in my “pumped up version of the credit database) and I’m going to copy it 10 times into slightly different table structures. Details and the creation for each table follow:

 

ChargeHeap – A table with NO Clustered index and NO non-clustered indexes.

SELECT *

INTO ChargeHeap

FROM Credit.dbo.Charge

 

ChargeCLPKForDelete – Is really NOT that bad of a choice… I typically recommend this choice for everyday tables where there are lots of inserts/updates/deletes. However, with the specific “sliding window” scenario (where data comes in at the end AND is periodically archived off the other end) then I might make a different choice. But – still a good table to create and compare against. This will certainly improve inserts and minimize fragmentation BUT it might not be ideal for range deletes and archiving. This table has a clustered index on the Primary Key; this column is an ever-increasing identity value on charge_no.

 

SELECT *

INTO ChargeCLPKForDelete

FROM Credit.dbo.Charge

go

 

ALTER TABLE ChargeTesting.dbo.ChargeCLPKForDelete

ADD CONSTRAINT ChargeCLPKForDeletePK

          PRIMARY KEY CLUSTERED (Charge_NO)

go

 

Again, because this table will follow a “sliding window” scenario then I would change the clustering key to include the date and then the ID. I don’t typically chose date (as a single column primary key) because dates alone are rarely, if ever, unique (for more details on datetime data check out the articles I’ve written on Datetime data for SQL Magazine here). Because you already have an identity column in the table (which uniquely identfies every row), there’s no need to add a “uniqifier.”  The process of “uniqification” happens internally if the clustereing key is defined on a non-unique column (see last month’s webcast if you’re not familiar with this “uniqification” process). In the case where you already have a row identifier, it’s best to use it in the clustering key with date – especially as it has useful data in it. So – this leads us to the next TWO tables…

 

ChargeCLDateForDelete – Is a clustered index on the date ALONE.

 

SELECT *

INTO ChargeCLDateForDelete

FROM Credit.dbo.Charge

go

 

CREATE CLUSTERED INDEX ChargeCLDateForDelete

ON ChargeTesting.dbo.ChargeCLDateForDelete (Charge_DT)

go

 

ChargeCLDateForDeleteWithCompPK – Is a composite clustered primary key index on the date, charge_no together.

 

SELECT *

INTO ChargeCLDateForDeleteWithCompPK

FROM Credit.dbo.Charge

go

 

ALTER TABLE ChargeTesting.dbo.ChargeCLDateForDeleteWithCompPK

ADD CONSTRAINT ChargeCLDateForDeleteWithCompPK_PK

          PRIMARY KEY CLUSTERED (Charge_DT, Charge_NO)

go

 

The last base table will have a poor choice for the clustering key (for these deletes) as they will not have “ranges” with which to work.

 

ChargeCLReallyBadForDelete – This clustered index will be on something NOT used at all in this delete. The deletes will result in lots of internal fragmentation. In other words, the table will be larger and “internally” fragmented. The table will essentially look like “swiss cheese” meaning that the table will have lot of little gaps within the pages themselves.

 

SELECT *

INTO ChargeCLReallyBadForDelete

FROM Credit.dbo.Charge

go

 

CREATE CLUSTERED INDEX ChargeCLReallyBadForDelete

ON ChargeTesting.dbo.ChargeCLReallyBadForDelete (Charge_amt)

go

 

And what about nonclustered indexes? Finally, we will add five final tables – each of the above – with two non-clustered indexes added to see what their impact has on overall performance.

 

ChargeHeapWNCIndexes – Is a heap table with two non-clustered indexes.

 

SELECT *

INTO ChargeHeapWNCIndexes

FROM Credit.dbo.Charge

go

 

CREATE NONCLUSTERED INDEX ChargeHeapWNCIndexesNC2

ON ChargeHeapWNCIndexes (Statement_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeHeapWNCIndexesNC3

ON ChargeHeapWNCIndexes (Member_no)

go

 

ChargeCLPKForDeleteWNCIndexes – Is clustered on the Primary Key and has two additional non-clustered indexes.

 

SELECT *

INTO ChargeCLBadForDeleteWNCIndexes

FROM Credit.dbo.Charge

go

 

ALTER TABLE ChargeTesting.dbo.ChargeCLBadForDeleteWNCIndexes

ADD CONSTRAINT ChargeCLBadForDeleteWNCIndexesPK

          PRIMARY KEY CLUSTERED (Charge_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLBadForDeleteWNCIndexesNC2

ON ChargeCLBadForDeleteWNCIndexes (Statement_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLBadForDeleteWNCIndexesNC3

ON ChargeCLBadForDeleteWNCIndexes (Member_no)

go

 

ChargeCLDateForDeleteWNCIndexes – Is clustered on the Date alone and has two additional non-clustered indexes.

 

SELECT *

INTO ChargeCLDateForDeleteWNCIndexes

FROM Credit.dbo.Charge

go

 

CREATE CLUSTERED INDEX ChargeCLDateForDeleteWNCIndexes

ON ChargeTesting.dbo.ChargeCLDateForDeleteWNCIndexes (Charge_DT)

go

 

CREATE NONCLUSTERED INDEX ChargeCLDateForDeleteWNCIndexesNC2

ON ChargeCLDateForDeleteWNCIndexes (Statement_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLDateForDeleteWNCIndexesNC3

ON ChargeCLDateForDeleteWNCIndexes (Member_no)

go

 

ChargeCLDateForDeleteWithCompPKWNCIndexes – Is the composite primary key on the date and identity. This table also has the same two non-clustered indexes.

 

SELECT *

INTO ChargeCLDateForDeleteWithCompPKWNCIndexes

FROM Credit.dbo.Charge

go

 

ALTER TABLE ChargeTesting.dbo.ChargeCLDateForDeleteWithCompPKWNCIndexes

ADD CONSTRAINT ChargeCLDateForDeleteWithCompPKWNCIndexes_PK

          PRIMARY KEY CLUSTERED (Charge_DT, Charge_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLDateForDeleteWithCompPKWNCIndexesNC2

ON ChargeCLDateForDeleteWithCompPKWNCIndexes (Statement_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLDateForDeleteWithCompPKWNCIndexesNC3

ON ChargeCLDateForDeleteWithCompPKWNCIndexes (Member_no)

go

 

ChargeCLReallyBadForDeleteWNCIndexes – Is the poor choice (for this delete) of a clustered index on charge_amt. Again, this table will end up with a lot of internal fragmentation after these deletes (not to mention that inserts will probably cause a lot of fragmentation as well). Over time, this table would probably be one of the worst performers – for almost all DML.

 

SELECT *

INTO dbo.ChargeCLReallyBadForDeleteWNCIndexes

FROM Credit.dbo.Charge

go

 

CREATE CLUSTERED INDEX ChargeCLReallyBadForDeleteWNCIndexesCL

ON ChargeTesting.dbo.ChargeCLReallyBadForDeleteWNCIndexes (Charge_amt)

go

 

CREATE NONCLUSTERED INDEX ChargeCLReallyBadForDeleteWNCIndexesNC2

ON dbo.ChargeCLReallyBadForDeleteWNCIndexes (Statement_NO)

go

 

CREATE NONCLUSTERED INDEX ChargeCLReallyBadForDeleteWNCIndexesNC3

ON dbo.ChargeCLReallyBadForDeleteWNCIndexes (Member_no)

go

 

Well, now that you know the players we can setup the test. The idea of the test is to take a large chunk of early data “off” the table. Presumably to archive but I will not address the whole archiving process at this time. Once archived the data must be removed (for many – just removed – as they can always rely on backups if they need the earlier data… just make sure you keep those backups!)

 

In the tests we will delete a varying percentage of the table based on a specific date for the charge rows. I have modified and increased the size of credit in many ways before getting to these scripts so it might be hard to reproduce this BUT there are some great samples to use from these.

 

First, I increased the number of rows in credit.dbo.charge by just “doubling” the rows:

 

INSERT charge

SELECT c.[member_no] ,

            c.[provider_no] ,

            c.[category_no] ,

            c.[charge_dt] ,

            c.[charge_amt] ,

            c.[statement_no] ,

            c.[charge_code]

FROM charge AS c    -- to go to 200K rows

go

INSERT charge

SELECT c.[member_no] ,

            c.[provider_no] ,

            c.[category_no] ,

            c.[charge_dt] ,

            c.[charge_amt] ,

            c.[statement_no] ,

            c.[charge_code]

FROM charge AS c    -- to go to 400K rows

go

 

 

However, once I get the rows there I’ve realized that I have a bunch of data that doesn’t really follow an increasing ID and increasing date –which isn’t realistic at all… And, of course, I realized this on a version of my database which already had the rows increased and which I always use for all sorts of samples/examples SO – I’d rather fix the current data rather than re-write the INSERT/SELECTs. But (and as it would always happen, I found other problems – like that I had a gap in the identity values). Anyway, I decided that the best way to get dates to be ever increasing along with the identity is to FIRST cleanup the identity. I’m going to do that first by dropping and re-adding the identity col. Re-creating the identity column to reseed it and then updating the charge_dt column based on a starting time and then adding the charge_no to it. OK, so that’s not what I did BUT that would work to give you a relatively increasing value – there would be duplicates though as SQL Server rounds ms values to the nearest timetick. AND – I’m not going to get into this here at all but I did write a series of articles on datetime data for SQL Server Magazine, see here.

 

OK, so crank up your data, have ever-increasing identities to correspond to ever-increasing data and then create duplicate test tables based on the settings and patterns above. You can then create the testing patterns. The tests are going to be based on deleting a percentage of your data (you will need to determine your date ranges and estimate the correct date which is 5-10% of your table). 5-10 percent is a reasonable percentage to delete based on an “archiving” principle/idea (in fact, this might be large in many situations). Since the date values are the same for all tables this will make testing easier. Also, there a few simple testing practices here – use a separate database in which you track test run numbers, do multiple test runs and take averages (consider removing the highest and lowest values to get rid of any strange anomalies as well). SO, here’s the (TestingDeletePerformance SetupScript1 ChargeTestingResults.sql (2.49 KB)) that I used to track/store the test results. Note: you may need to change directory/path/instance names. Once the results database (ChargeTestingResults) has been created then you can create your testing database (based on your modified version of Credit.dbo.charge). The setup script to create this database (ChargeTesting), the testing procedure (DeleteTestRun) and perform a backup (to restore for reproducible test runs) is here (TestingDeletePerformance SetupScript2 ChargeTesting.sql (9.41 KB)).

 

Now, on to the testing…. You should run a series of test runs of slightly varying numbers of rows being deleted and then analyze your findings. I have a feeling that we’ll all get slightly different results on the extreme ends of a few tests BUT if you average the timings over multiple test runs I think you’ll find the same thing I’ve found!

 

So, you’ll need to set your @DateToDeleteTo and make sure your path is correct. based on your dates) but here’s the general idea… so that you input the timings into your ChargeTestingResults database:

 

DECLARE @DateToDeleteTo              datetime,

                   @NumberOfTestRuns     tinyint,

                   @CurrentTestRun           tinyint

 

-----------------------------------------

-----------------------------------------

SET @NumberOfTestRuns = 1

SET @DateToDeleteTo = 'yyyymmdd'

-----------------------------------------

-----------------------------------------

 

SET @CurrentTestRun = 0

 

WHILE @CurrentTestRun < @NumberOfTestRuns

BEGIN

          USE ChargeTesting

          EXEC DeleteTestRun @DateToDeleteTo

          USE Master

          RESTORE DATABASE ChargeTesting

                   FROM DISK = 'fix path'

          SET @CurrentTestRun = @CurrentTestRun + 1

END

go

 

OK, so now we’re ready to roll! I’ve run this 23 times (yes, thanks to my machine being idle for a few of my sleeping hours – it’s either a great time for SETI or automated tests…)

 

I’ve run the following query:

SELECT TableName, AVG(DeleteTime) AS [Average Time]

, MIN(DeleteTime) AS [Min Time]
, MAX(DeleteTime) AS [Max Time]

FROM ChargeTestingResults.dbo.DeleteTestResults

WHERE TestRunID NOT IN

                             (SELECT TestRunID

                             FROM ChargeTestingResults.dbo.DeleteTestResults AS DTR

                             WHERE DeleteTime = (SELECT min(DeleteTime)

                                                                             FROM ChargeTestingResults.dbo.DeleteTestResults AS DTRC -- correlated

                                                                             WHERE DTR.TableName = DTRC.TableName GROUP BY TableName)

                             OR

                                        DeleteTime = (SELECT max(DeleteTime)

                                                                             FROM ChargeTestingResults.dbo.DeleteTestResults AS DTRC2 -- correlated

                                                                             WHERE DTR.TableName = DTRC2.TableName GROUP BY TableName))

GROUP BY TableName

ORDER BY [Average Time]

 

This query removes the highest and lowest values from the test runs and then averages the final results into this data:

 

TableName

Average Time

Min Time

Max Time

ChargeCLDateForDelete

1566

1143

2903

ChargeCLDateForDeleteWithCompPK

2359

1253

8080

ChargeHeap

10392

9603

11476

ChargeCLPKForDelete

11377

9453

18476

ChargeCLReallyBadForDelete

17491

12640

35440

ChargeCLDateForDeleteWithCompPKWNCIndexes

19594

8020

29414

ChargeCLDateForDeleteWNCIndexes

22467

9243

63520

ChargeCLPKForDeleteWNCIndexes

30132

17343

59366

ChargeCLReallyBadForDeleteWNCIndexes

44208

25946

62000

ChargeHeapWNCIndexes

49407

19716

78383

 

Now, I’m sure you’ve completely forgotten what brought us to this result set but it all started when we were concerned about large scale range deletes… I really created this test set for multiple reasons:

1)     To give you ideas for tracking test runs (separate database, testing tables, dates/times and interesting data)

2)     Show you a few practices for reproducible tests (i.e. backup after setup and then restore before each test run)

3)     Show a variety of examples of dynamic string execution

4)     Show you an interesting query which removes the best case and worst case test scenario and then averages the results (fyi – that query is easier to write in SQL Server 2005 with common table expressions!).

5)     Prove my original point… (of course) J

6)     Lead you to wanting to learn more about the BEST point of all… consider partitioning.

 

So, let’s talk about #5 for a second – the original point was that I *think* the reason the deletes were slower with the clustered index is that you also had non-clustered indexes. Hard to say now but that would be my guess. The MORE interesting point is that the Date alone in the CL key is faster than the composite clustering key. Now, I’m not about to start doing insert testing or space testing but having a clustering key which is NOT unique has other problems. Given the VERY small difference in times for these deletes (yes, it’s almost 50% here but the numbers are VERY small comparatively speaking regardless – ok, we can certainly debate this BUT are you really ONLY focusing on performance for deletes or do you need to look at the big picture? When you do I would guess that INSERTs will be more important overall). But – what you really get to when you look over all of this you see the ultimate problem – NON CLUSTERED INDEXES…. Yes, if you have non-clustered indexes they will kill your delete performance. So – the point here is going to be – how can you get GREAT archiving performance when deleting ranges AS WELL AS great performance for INSERTs AS WELL AS a more manageable environment……… partitioning.

 

So – there was another reason for this whole entry… it’s partially to help try and convince a few of you (especially those of you that have tables in double-digit or triple-digit millions of rows) that large tables are harder to manage and large archival operations can be EXTREMELY costly!

Categories:
Events | Indexes | Resources | Tips

Q: How do I interpret DBCC SHOWCONTIG WITH ALL_INDEXES on index id = 255 on a table with multiple text columns. I'm looking at a copy of last night's production after running Maintenance Plan Optimizations. Scan density on table and all indexes is 99 - 100%. Logical Fragmentation is 0% for everything except text column index, which is over 99% (Extent Fragmentation is 83%). Do these numbers have any negative impact on performance? Are they worrisome?

OK, so the first thing to respond to is the “Logical Fragmentation is 0% for everything except text column indexes” is a bug. This is a recently filed bug. It’s based on using the ALL_INDEXES option with DBCC SHOWCONTIG so if you can walk the indexes individually instead then you can probably avoid this (and you could always tweak my version of sp_RebuildIndexes to do this).

 

Now, related to text indexes (and this is as per the Index Defrag Whitepaper), Logical Scan Fragmentation is not relevant on text indexes. However, Extent Scan Fragmentation is and yes, this is something that could be worrisome IF you return the text/image data to your clients in large requests (meaning LOBs measured in MBs+ or requests where you’re returning a lot of LOBs). If you think you’re doing large scans (or if the text data is very large and generally returned to the client) then you might want to compact this part of the database and even consider moving the text/image data into its own filegroup (but – remember that singleton selects won’t really benefit from thi