SQLintersection: a new year, a new conference

UPDATE (April 17, 2013): We just finished our FIRST SQLintersection conference with wild success. We’re currently working on our next event – scheduled for October 27-30, 2013. More posts coming soon but it looks like we have the right idea from two comments I received just today:

On Twitter (11:50am PT today) from Michelle Ufford (@sqlfool): Colleague who attended #SQLintersection said it was best conference he’s ever attended cc@KimberlyLTripp @PaulRandal @Kendra_Little @BrentO

In Email (today) from Geoff Lister: I had to write and tell you that last week’s SQL intersection was fantastic. I learned so much, and have a whole load of new questions for Google and things to research. It was a real eye opener to aspects of SQL I didn’t even know to ask about before or did not realize how important they are. Much of what you gave me I was able to implement during the conference to my production applications and have seen real performance gains and have some happy DBAs! In particular, you gave me a much greater appreciation of why the cache is so important and how to understand the execution plans a bit better and identify parameter sniffing issues.

************** original post follows **************

I’ve been speaking and presenting at conferences for years (16+ years to be exact) and while I’ve had a great time at all of these conferences (some more than others :), I’ve always felt like there was something missing. Nothing seemed to help bring cohesion across the sessions. Nothing really helped the attendees and speakers interact better. How do attendees really get their problems solved? Well, now I get to make those decisions and changes! Why? Because we’ve designed a NEW conference that helps intersect the right information with the right people. Our new show brings together real-world experts that present problem solving techniques and technologies that you can implement today; we’re calling it SQLintersection (#SQLintersection).

SQLintersection: It’s all about ROI

First and foremost, people want better performance. If your servers perform well you can process more data – you can get more done. But, what you need to do to get better performance varies. Sometimes it’s hardware – which might be an easier change, and sometimes it’s design/architecture – which might be significantly more complex. Sometimes it’s little tweaks – adding some indexes, removing some redundant indexes, updating statistics, adding more statistics, changing a procedure and/or the way that some of your procedures are cached. Sometimes it’s all about IO, sometimes it’s the execution plans and the queries themselves. But, the biggest challenge is knowing where to look, knowing what changes to make and how to make them, and then finally, implementing them with the least amount of downtime and data loss possible.

That’s what we’ve done. We’ve put together a conference that’s primarily focused on performance, scalability, and troubleshooting but we haven’t forgotten reliability/automation.

SQLintersection: Track Hosts add interaction and information!

To bring cohesion to our event, each of our tracks will have a host (an MC, if you will) who will present a session or two as well as stay in their track room to introduce each session within that day’s theme (each track has a theme for each day). The host will be available to answer questions, help you interact with the right speakers, and generally give you insight that you can’t get any other way. Right now we have 3 track hosts: Brent Ozar, Aaron Bertrand, and I will each host a track, and we’ll be available in our track rooms all day (between sessions and for much of lunch as well) to really help you get your problems solved. And, we’ll end each track with an open Q&A panel with speakers from the track. You’ll hear great sessions and you’ll have multiple opportunities to interact with expert speakers and other attendees, and get your problems solved! And, in addition to three full conference days, there are five full-day workshops (2 days prior to the conference and 1 day after) from which to choose, and over 30 technical sessions, mostly in the 200-400 level range.

SQLintersection: What about the speakers?

I’m so excited about this lineup. All of these speakers are top-rated SQL experts who have been in this field for years but are still focused on consulting. Every speaker is a SQL Server MVP (with the exception of the vendor/Microsoft speakers – but, I don’t think anyone’s going to question Conor or Bob’s SQL knowledge :)) and some are Microsoft Certified Masters in SQL Server. But, no matter what – ALL are focused on providing the most ROI that’s possible in their session(s) and/or their workshops. Check out this list of speakers:

  • Aaron Bertrand, Sr. Consultant, SQL Sentry, Inc. [blog | twitter]
  • Andrew J. Kelly, Mentor, SolidQ [blog | twitter]
  • Bob Ward, Principal Architect Escalation Engineer, Microsoft [blog | twitter]
  • Brent Ozar, Brent Ozar Unlimited [blog | twitter]
  • Conor Cunningham, Principal Architect, SQL Server, Microsoft [blog]
  • Grant Fritchey, Product Evangelist, Red Gate Software [blog | twitter]
  • Jeremiah Peschka, Brent Ozar Unlimited [blog | twitter]
  • Joseph Sack, Principal Consultant, SQLskills.com [blog | twitter]
  • Kendra Little, Managing Director, Brent Ozar Unlimited [blog | twitter]
  • Kevin Kline, Director of Engineering Services, SQL Sentry, Inc. [blog | twitter]
  • Kimberly L. Tripp, President/Founder, SQLskills.com [blog | twitter]
  • Mat Young, Senior Director of Products, Fusion-io [blog | twitter]
  • Paul S. Randal, CEO / Owner, SQLskills.com [blog | twitter]
  • Paul White, SQL Kiwi Limited [blog | twitter]
  • Steve Jones, Editor, SQLServerCentral.com [blog | twitter]
  • Sumeet Bansal, Principal Solutions Architect, Fusion-io [blog | twitter]

SQLintersection: When is it all happening?

The show officially runs from April 8th through the 11th but there are both pre-conference and post-conference workshops. For the full conference, you’ll want to be there from Sunday, April 7th through Friday, April 12th.

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

We hope to see you there!

Cheers,
kt

What caused that plan to go horribly wrong – should you update statistics?

I’ve been seeing this over the past few years, imagine this scenario:

You have a stored procedure that runs well most of the time but sometimes it’s WAYYYYY off. It’s almost as though the performance of it went from great to horrible in a split second (like falling off of a cliff). You don’t know why but someone says – it’s got to be the statistics. In fact, if you have the luxury of time (which most folks don’t have), you execute it yourself and you check the plan – WOW, the estimated number of rows is WAY off from the actual rows. OK, it’s confirmed (you think); it’s statistics.

But, maybe it’s not…

See, a stored procedure, a parameterized statement executed with sp_executesql, and a prepared statement submitted by a client ALL reuse cached plans. These plans were defined using something called parameter sniffing. Parameter sniffing is not a problem in itself – but, it can become a problem for later executions of that same statement/procedure. If the plan for one of these statements was created for a parameter that only returns 1 row, then the plan might be simple and straightforward – use a nonclustered index and then do a bookmark lookup (that’s about as simple as it gets). But, if that same statement/procedure/prepared statement runs again later with a parameter that returns thousands of rows, then the plan created by sniffing that earlier parameter might not be a good one. And, that earlier execution might have been a rare one. OR, it could be even more strange. These plans are not stored on disk; they are not permanent objects. They are created any time there is not already a plan in the cache. And, there are a variety of reasons why these plans can fall out of cache. If it just so happens that an atypical set of parameters is the first one used after the plan has fallen out of cache (better described as “has been invalidated”), then a very poor plan could end up in cache and cause subsequent executions with typical parameters to be way off. Again, if you look at the actual plan you’ll probably see that the estimate is WAY off from the actual. But, it’s NOT likely to be a statistics problem.
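To make that concrete, here’s a minimal sketch. The table, index, procedure, and data skew are all hypothetical (they’re not from any of my demo scripts) – they just illustrate how the first sniffed parameter determines the cached plan:

-- Hypothetical table with a skewed RegionNo column
CREATE TABLE dbo.Member
(
    MemberNo    int     identity    PRIMARY KEY,
    FirstName   varchar(30),
    LastName    varchar(30),
    RegionNo    int
);
CREATE INDEX MemberRegionNo ON dbo.Member (RegionNo);
GO

CREATE PROCEDURE dbo.GetMembersByRegion
    @RegionNo int
AS
    SELECT MemberNo, FirstName, LastName
    FROM dbo.Member
    WHERE RegionNo = @RegionNo;
GO

-- If the first execution after the plan leaves cache uses an atypical value...
EXEC dbo.GetMembersByRegion @RegionNo = 99; -- returns 1 row; plan = nonclustered seek + lookup
-- ...subsequent "typical" executions reuse that same cached plan, even when it's a poor fit
EXEC dbo.GetMembersByRegion @RegionNo = 2;  -- returns thousands of rows, same plan
GO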

But, let’s say that you think it is a statistics problem. What do you do?

You UPDATE STATISTICS tablename or you UPDATE STATISTICS tablename indexname (for an index that you specifically suspect to be out of date)

And, then you execute the procedure again and yep, it runs correctly this time. So, you think, yes, it must have been the statistics!

However, what you may have seen is a side-effect of having updated statistics. When you update statistics, SQL Server usually* does plan invalidation. Therefore, the plan that was in cache was invalidated. When you executed again, you got a new plan. This new plan used parameter sniffing to see the parameters you used and then it came up with a more appropriate plan. So, it probably wasn’t the statistics – it was the plan all along.

So, what can you do?

First, do not use UPDATE STATISTICS as your first response. If you have a procedure that’s causing you grief you should consider recompiling it to see if you can get a better plan. How? You want to use sp_recompile procedurename. This will cause any plan in cache to be invalidated. This is a quick and simple operation. And, it will tell you whether you have a plan problem (and not a statistics problem). If you get a good plan then what you know is that your stored procedure might need some “tweaking” to its code. I’ve outlined a few things that you can use to help you here: Stored procedures, recompilation and .NetRocks. If that doesn’t work, then you MIGHT need to update statistics. What you should really do first though is make sure that the compiled values of the parameters ARE the same as the runtime values. If you use “show actual plan” you can see this by selecting the plan’s root (SELECT) operator and checking the properties window (F4).

(Screenshot: the plan’s Properties window showing the Parameter Compiled Value alongside the Parameter Runtime Value.)

This will confirm that the execution did (or did not) use those values to compile the plan. If they were the correct values then you might have a statistics problem. But, it’s often blamed and it’s not actually the problem. It’s the plan.
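Putting that together as a sketch (reusing the hypothetical procedure from above; the DMV query is just one way to pull the cached plan so you can read the compiled parameter values out of the plan XML):

-- Invalidate just this procedure's cached plan (quick; doesn't touch statistics)
EXEC sp_recompile N'dbo.GetMembersByRegion';
GO

-- Re-run with the problem parameter and capture the actual plan
EXEC dbo.GetMembersByRegion @RegionNo = 2;
GO

-- Or pull the cached plan; the ParameterCompiledValue attributes in the plan XML
-- correspond to what the Properties window shows
SELECT st.[text], qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) AS qp
WHERE st.[text] LIKE N'%GetMembersByRegion%';
GO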

OK, there’s a bit more to this…

*Do plans ALWAYS get invalidated when you update statistics? No…

Erin Stellato (blog | twitter) first blogged about this here: Statistics and Recompilation.

And, also here: Statistics and Recompilation, Part II.

Here’s a quick summary though because it looks like things have changed again in SQL Server 2012…

  • In SQL Server 2005, 2008 and 2008R2 – updating statistics only caused plan invalidation when the database option auto update statistics is on.
  • In SQL Server 2012 – updating statistics does not cause plan invalidation regardless of the database option.

So, what’s the problem? Ironically, I kind of like this. I think that statistics have been blamed all too often for statement/plan problems when it’s not the statistics, it’s the plan. So, I like that there will be fewer false positives. But, at the same time, if I update statistics off-hours, I DEFINITELY want SQL Server to invalidate plans and re-sniff my parameters (especially if the data HAS changed) and possibly get new plans from my updated stats.

In the end, I did chat with some folks on the SQL team and yes, it looks like a bug. I filed a connect item on it here: https://connect.microsoft.com/SQLServer/feedback/details/769338/update-statistics-does-not-cause-plan-invalidation#.

UPDATE – 12:55 (yes, only 2 hours after I wrote this).

It’s NOT a bug, it’s BY DESIGN. And, it actually makes sense.

If the plan should NOT be invalidated (directly due to statistics because the data has NOT changed) then it won’t. But…
If the plan should be invalidated (statistics have been updated AND the data has changed) then it will be.

The key point is “data changed.” An UPDATE STATISTICS ALONE will not cause plan invalidation (which is STILL different behavior from 2005/2008/2008R2) but it’s the best of both worlds IMO. Only if at least ONE row has been modified will the UPDATE STATISTICS cause plan invalidation.

UPDATE 2: The key point is that there might still be some false positives and I’d still rather that people try sp_recompile first but it’s good that UPDATE STATISTICS will cause plan invalidation. But, it’s still a tad different than prior versions… interesting for sure.

A simple workaround is to use sp_recompile tablename at the end of your maintenance script but be aware that running sp_recompile against a TABLE requires a schema modification lock (SCH-M). As a result, this can cause blocking. If you don’t have any long running reports (or long running transactions) at that time though, it should be quick and simple.
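For example, at the end of a nightly maintenance job (the table name here is hypothetical):

-- After updating statistics, explicitly mark plans that reference the table for recompilation
UPDATE STATISTICS dbo.Member;
GO
-- NOTE: sp_recompile against a TABLE briefly takes a schema modification (SCH-M) lock,
-- so it can be blocked by (or block) long-running transactions against that table
EXEC sp_recompile N'dbo.Member';
GO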

And, stay tuned on this one. In a later CU you should be able to remove the sp_recompile AND you won’t need to worry about the database option either (yeah!).

Thanks for reading,

kt

Stored procedures, recompilation and .NetRocks

Last week I visited the .Net User Group in NY where .NetRocks was recording as part of their Visual Studio Road Trip…

What a great time and a great group! Always fun visiting NY but even more fun when I present to a group that really gets into the topic. I guess I had something to do with it having chosen procedures and recompilation to kick off my part of the discussion… But, still, a fun group for sure! And, why did I choose stored procedures and recompilation?

Every developer that works with SQL Server has to access their data in SOME way… how? Adhoc SQL, prepared statements (like sp_executesql) or stored procedures. To be honest, none are perfect. You shouldn’t always use adhoc. You shouldn’t always use prepared statements… And, dare I say – you shouldn’t always use stored procedures? In fact, I kicked off the evening by stating that SQL Server is a general-purpose RDBMS. You can do anything. But, does that mean that each feature is perfect for every use, all the time? Basically, what I said is that you should never say always and never say never. ;-)

Having said that, I do strongly believe that you can be the most successful using stored procedures. But, that’s predicated on the fact that you understand how they work. It’s predicated on the fact that you understand that recompiling a plan is NOT always a bad thing. Why? Because SQL Server “sniffs” the parameters passed and chooses the execution plan based on those parameters. It’s that plan (defined by those parameters) that gets saved (in cache) and reused for subsequent executions. If that plan is not good for ALL executions then you start to have parameter sniffing problems.

The end result – reusing a plan is not always good and recompilation is not always bad.

So, what can you do?

To be honest, this is a HUGE discussion and there are LOTS of tangents. In IE2 (our Immersion Event on Performance Tuning), I spent an entire day on the plan cache and optimizing for procedural code. But, for my pre-session (prior to recording .NetRocks), I chose to discuss ONE set of options that can be VERY helpful to reduce parameter sniffing problems. This discussion was around statement-level recompilation and SQL Server offers 3 special things that you can add to a statement to define how its plan should be handled. There are still other things that could change the behavior but simply put, I’ll go through 5 different behaviors here:

  • Default behavior
  • OPTION (RECOMPILE)
  • OPTION (OPTIMIZE FOR (@param = value))
  • OPTION (OPTIMIZE FOR UNKNOWN)
  • Using variables to obfuscate parameters

And, I have a script that will help you to go through these different scenarios. Most importantly, do not end up using ONE of these ALL the time. Remember, ALWAYS is NOT the right way to deal with performance problems.

Having said that, I know all of you have A LOT to deal with. So, where do you start? How do you begin?

First, and foremost, do your analysis on the top 10 stored procedures that essentially meet these criteria:

  1. The performance of the stored procedure wildly varies (from only a second to minutes – or at least from fast to not fast at all). And, maybe it’s more like this: MOST of the time the procedure runs well but occasionally the performance really tanks. As a result, you UPDATE STATISTICS and that seems to solve the problem. Hmmm… in actuality, it might not have been the statistics that were the problem. A side-effect (most of the time) of updating statistics is that the plans associated with them are invalidated. On the next execution a new plan will be generated (after sniffing the parameters). And, if the next execution uses a more typical parameter then a more typical plan will be generated. This might be why MOST of the time it seems to be fine. Next time, instead of updating stats, consider doing sp_recompile procname. This will invalidate the proc’s plan. If this works, then you know that you need to look more closely at how that plan gets generated and whether or not it’s even good to save that plan.
  2. The stored procedure returns wildly varying results sets (sometimes it returns only a few rows, other times it returns thousands [or tens of thousands] of rows)
  3. The stored procedure is used frequently and the stored procedure has at least one parameter (in many cases the worst performing procs are those that have many parameters)

Once you know that you have a problem investigate what should be recompiled. In general, you want to recompile the smallest amount possible to solve the problem. But, how do you know what should be recompiled? Testing!

Usually, I’ll test a procedure running multiple executions, each with different parameter values that generate wildly different result sets, and I’ll execute each of these using WITH RECOMPILE. Specifically, it will look like this:

EXEC procedure @param1 = value, @param2 = value, @paramn = value WITH RECOMPILE

When a bunch of these are executed, I’ll review their graphical plans. What I’m looking for is the most expensive statement and whether or not it has the SAME plan across the different parameters used. To be honest, you don’t even care what the plan is but you do care if it varies. The OPTIMAL plan for a query that returns 1 row might be very different from the optimal plan for one that returns 10,000 rows. And, if the OPTIMAL plans vary then maybe it’s not a good idea to save the plan (which is the default behavior). And, this is what leads to parameter sniffing problems (PSP).
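Here’s a sketch of that test pass, assuming a hypothetical dbo.GetMembersByRegion procedure over a skewed RegionNo column (not from my demo script):

-- WITH RECOMPILE compiles a fresh, throwaway plan for each execution; nothing is cached
EXEC dbo.GetMembersByRegion @RegionNo = 99 WITH RECOMPILE; -- returns only a handful of rows
EXEC dbo.GetMembersByRegion @RegionNo = 2  WITH RECOMPILE; -- returns tens of thousands of rows
GO
-- Compare the actual plans for the most expensive statement: if the optimal plans differ,
-- caching one sniffed plan for all executions is what leads to PSP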

Instead of letting the default behavior just happen, you have a few options.

Using OPTION (RECOMPILE) [available in SQL Server 2005 and higher]

This can be a great way of telling SQL Server NOT to save a plan for a particular statement. However, this causes EVERY execution to go through recompilation (which has a cost of its own, mostly in CPU but this can also translate into time). So, you don’t really want to do this for everything. Do NOT let this become a crutch (or a “go to” option) that gets used anytime there’s a problem. Use this sparingly. But, this can be a FANTASTIC way to deal with PSP.
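As a minimal sketch (the table and procedure here are hypothetical, not from my demo script):

CREATE PROCEDURE dbo.GetMembersByRegion_Recompile
    @RegionNo int
AS
    -- This statement is recompiled (and the parameter re-sniffed) on EVERY execution;
    -- its plan is never reused
    SELECT MemberNo, FirstName, LastName
    FROM dbo.Member
    WHERE RegionNo = @RegionNo
    OPTION (RECOMPILE);
GO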

Using OPTION (OPTIMIZE FOR (@param = value)) [available in SQL Server 2005 and higher]

Because the cost of recompilation can become a problem, you might want to choose an alternative. In this case, you can tell SQL Server NOT to sniff the parameter(s) passed in and instead, use parameters that you supply (and define INSIDE the stored procedure). This can reduce the cost of recompilation but be careful, you have to make sure you choose parameters that are really good for ALL executions (or, at least the executions that are either the most likely or the most time critical). This is incredibly powerful but could become problematic down the road as data changes. But, I like this option MORE than I like hard-coded plans because this option WILL change as the data/statistics change. So, this might be a great option to consider.
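A sketch of the same hypothetical procedure; the value 2 simply stands in for whatever you’ve determined to be a good “typical” parameter:

CREATE PROCEDURE dbo.GetMembersByRegion_OptimizeFor
    @RegionNo int
AS
    -- Always optimize as though @RegionNo = 2 had been passed, regardless of the actual value;
    -- the plan is still built from current statistics, so it evolves as the data changes
    SELECT MemberNo, FirstName, LastName
    FROM dbo.Member
    WHERE RegionNo = @RegionNo
    OPTION (OPTIMIZE FOR (@RegionNo = 2));
GO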

OPTION (OPTIMIZE FOR UNKNOWN) [available in SQL Server 2008 and higher]

I remember when this option first came out. I remember thinking – what do they mean by “unknown”? Do they “guess”? That just didn’t seem right… Nope, it’s not a guess. But, it’s not going to be overly obvious to most because it requires a deeper understanding of statistics in SQL Server. See, parameter sniffing really translates into this: SQL Server uses the histogram to estimate the number of rows, and that estimate in turn is used to pick the plan. What OPTIMIZE FOR UNKNOWN does is skip the histogram. SQL Server does NOT look at the parameters passed in and instead relies on something else called the density vector. Simply put, the density vector is the AVERAGE. So, instead of looking up how many rows it thinks your specific parameters will return, it looks at the average number of rows returned for that column. Then, it creates a plan with those numbers. The idea is that this gives you an average plan rather than a plan tied to specific parameters that might be anomalies. This can work phenomenally well when your data is either evenly distributed OR the average really does work well for the normal parameters used. If you have heavily skewed data then this might not give ideal results.
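And the same hypothetical procedure, this time optimized from the density vector (the average) rather than the histogram:

CREATE PROCEDURE dbo.GetMembersByRegion_Unknown
    @RegionNo int
AS
    -- The optimizer ignores the sniffed value and estimates from the density vector (the average)
    SELECT MemberNo, FirstName, LastName
    FROM dbo.Member
    WHERE RegionNo = @RegionNo
    OPTION (OPTIMIZE FOR UNKNOWN);
GO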

Using variables to obfuscate parameters [available in any version really ... ]

This is not an official way to deal with the problem but some folks have found that using this “works” and/or solves their problems. What is this doing? Well… it’s actually doing EXACTLY the same thing as OPTIMIZE FOR UNKNOWN. During compilation the value of a variable (as opposed to a parameter) is unknown. If they don’t know the value, what can they use? They really don’t have much of a choice except to use the average.
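A sketch of the variable version (hypothetical names again); the two-step assignment works on any version:

CREATE PROCEDURE dbo.GetMembersByRegion_Variable
    @RegionNo int
AS
    -- The variable's value is unknown at compile time, so the estimate comes from the
    -- density vector; effectively the same behavior as OPTIMIZE FOR UNKNOWN
    DECLARE @Region int;
    SELECT @Region = @RegionNo;

    SELECT MemberNo, FirstName, LastName
    FROM dbo.Member
    WHERE RegionNo = @Region;
GO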

OK… so, now – how do you see all of this in action?

(1) Check out my 30 minute presentation from the user group. Here’s the presentation itself: dnrRoadTripp_StoredProcs.pdf (518.36 kb)

(2) Then, consider checking out the video (this isn’t super high quality as it was just from one of the attendees that was at the presentation but it turned out pretty well actually): http://neuronspark.com/optimizing-procedural-code/

(3) Next play with the demo script:

(3a) First, download the sample credit database for 2000 and higher here: http://www.sqlskills.com/PastConferences.asp. Use the 2000 version for 2000, 2005 or 2008. Use the 2008 version for 2008, 2008R2 and 2012.

(3b) Then, use this script to walk through these options:
RecompilationParameterSniffing&Unknown.sql (4.58 kb)

And… that should be it for this one!

If you want to hear the official .NetRocks show that was recorded AFTER this lecture/discussion, check out .NetRocks.com. And, I’ll post a link here once it’s been published (which should be later this week). And, if you want to hear more of my fun times with DNR, check out some of our past shows. I blogged a list of our past shows here: Getting ready for DotNetRocks tonight (in NYC).

Finally, if any of you are in Vegas next week – I’m delivering an ENTIRE day (Monday, October 29) on Optimizing Procedural Code as a preconference workshop at SQLConnections.

Thanks for reading!
kt

Getting ready for DotNetRocks tonight (in NYC)

I've had quite the past with Richard Campbell and Carl Franklin and we're about to do it again… a dotNetRocks that is! They're traveling the country in an RV and stopping at all sorts of places to talk about development best practices and VS 2012. If you have the time – you should definitely check them out in person!

And, since I always have so much fun with these two I thought I'd look back on a few of the past shows we've done together. Here's a list of them if you have a bit of time to burn (or need something on your iPod for your next run ;-).

My shows with DNR

Paul's shows with dotNetRocks

My shows with RunAsRadio:

I'm looking forward to another evening of fun with these guys. I never know what to expect!

Cheers,
kt

Presentation Skills – How to Create a Connection

Erin Stellato (blog | twitter) asked on her SQLskills blog for comments/recommendations on presenting (be sure to check out the comments/links). She wrote a fantastic post on her favorite recommendations for new speakers here. And, I thought I'd add a couple of quick recommendations as well.

First and foremost, if you want to create a connection with your audience, choose something for which you have a connection. Choose something that you've struggled with and that you're passionate about. And, my main recommendation for how to really connect with the audience is to show some empathy. There's a reason that these folks are in your session. They want to know something. And, if you struggled then they have too. Often, I like to write my presentation in the same way I learned something. I'll start with what I may have thought (like what an index is generally defined to do – "help queries") but then I'll go into what I learned it *really* does. And, ideally, *how* it does it. These are not always the easiest of things to teach (e.g. internals) but when you can open up the hood and show how something works you often take the mystery out of it and that – in and of itself – makes the lightbulb go on for many folks. Highlighting YOUR struggles also makes you human. We often look unapproachable on-stage, as if we can achieve magic. But, we all put our pants on the same way and subsequently, if you can relate to them (and they with you) then you'll all enjoy yourselves A LOT more. There's nothing that bothers me more than being talked down to by a presenter (and I've seen it done time and time again) OR being talked to as if I don't know anything.

So, to summarize:

  • Find a subject for which you are passionate
  • Go back in time to when you first learned that subject and remind yourself of where/why you struggled
  • Highlight these struggles in your presentation (what did you do right, what did you do wrong, how did you ultimately get past this)
  • Talk to your audience as PEERS (they are!)

And, have fun!

Thanks for reading,
kt

PS – Part of my delay in posting is that I've been going through some dental nightmares… I grind my teeth and subsequently cracked one of my back molars (even though I wear my nightguard religiously). This led to an inlay (temp for 2 weeks and then the permanent last Wed) but in the interim got infected (or, maybe it was before the process started?) which led to an emergency root canal (NOT a happy Friday) and then a second emergency visit to the dentist for stronger antibiotics and more shots of Novocain than I can remember EVER having in my life. I had 6 on Friday (of two types because I kept "feeling" the root canal) and then 2 more on Saturday just to keep the throbbing at bay.

Anyway, it all reminded me of a polar bear that I photographed a few months ago in the Arctic (this was North of Svalbard [Norway] up in the pack ice). So, I thought I'd leave you with a picture of what I've been doing for the past couple of days:

(Photo: a polar bear photographed in the pack ice north of Svalbard, Norway.)

Upcoming events for 2012

Hey there readers – I know, I'm horrible at blogging. It's been a very busy year actually. First of all, I have to work with Paul… and Jonathan… and [update] Glenn…  that doesn't help. ;-) ;-) ;-)

Seriously though, it's been hard to "do it all" – we've had a bunch of Immersion Events (which I organize – and which takes a large amount of time… wow!), and we've been planning our 2013 schedule for IEs, plus I also act as the CFO (handling all of the financials for the company of now 7 total employees – this has been exciting to say the least [we have the most amazing team!!!]). And, I'm also working on our courses for Pluralsight (I should have my first one out there soon). Then, add in keeping up with SQL Server and everything that's going on technically (which is a job in and of itself, eh?). Oh, and have a life. Which, I've been trying to do a bit more of. I've been trying to work out and get a bit healthier and it's slowly paying off (I'm down 20 lbs but I have a scary 50 more to go…). So, something had to go and well, blogging is an easy thing for me to cut. Why? I'm very self-conscious of my writing and as a result, I'm SUPER slow at it (I can't even tell you how many times I re-read/re-write a post). So, blogging is actually a bit painful (and possibly even scary) for me. To make some of you completely laugh – I find blogging WAY worse than standing up and giving a lecture… even one at a large event. I find it easier to present than to write. No, seriously. I do.

But, I digress. I guess I just wanted to let you know why it's been so long since I last wrote a *real* blog post. And, this one probably isn't going to qualify as a real one either as it's just about upcoming events. But, a couple of them are FREE (one in Omaha and one in NY [with other events all over the country]) and all of them will be fun. So, if you're in any of these locations or are still planning some of your training budget for the remainder of the year – I hope to see you at one of these events!

Tuesday evening October 9, Omaha, NE – Omaha SQL/BI User Group
@PaulRandal, @SQLPoolBoy, @JosephSack and I will all be presenting some of our favorite features in a fun evening UG hosted by FCSA. Here's the link:
http://www.omahamtg.com/Events.aspx?ID=174

Thursday evening, October 18, New York City, NY – DotNetRocks Road Trip
I will be chatting with my DNR (dotNetRocks) friends at the Microsoft offices in NYC (one of my favorite places!). But, there are TONS of these Road Trip (or is it Tripp?) stops all over the country. Check out the #dnrRoadTrip here: http://www.dotnetrocks.com/roadtrip.aspx and sign up for our FREE NY event here: https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032529023&Culture=en-US&community=0.

Sunday, October 28 through Friday, Nov 2, Las Vegas, NV – SQLConnections
Sunday, October 28 – Precon workshop with @SQLPoolBoy and I: SPR201: Leveraging SQL Server 2012 Features and Tools (9:00 AM – 4:00 PM)
Monday, October 29 – Precon workshop: SPR202: Optimizing Procedural Code: How to Create High Performance Procedures (9:00 AM – 4:00 PM) 
Tuesday, October 30 – Microsoft Day
Wednesday-Thursday – Conference sessions…from @PaulRandal, @SQLPoolBoy, @BrentOzar, @SQLAgentMan, @GunneyK and a bunch of other great folks. It will be a great time for sure. And, what happens in Vegas…
Friday, November 2 – Postconference workshops: SPS302: How to Make SQL Queries Faster (9:00 AM – 4:00 PM) by @BrentOzar and SPS201: MDX, DAX, and DMX: Understanding the Languages of BI (9:00 AM – 4:00 PM) by @StaciaMisner

And, that's it for our (@PaulRandal and I) events for the rest of the year. No, we won't be at PASS this year (but @ErinStellato and @GlennAlanBerry from SQLskills will be!!). To be honest, when they sent out the speaker requests, we were pretty burned out. And, with Connections only the week prior, we just couldn't do another event this fall. But, that's certainly another SQL event for many of you. And, a fun one for sure.

So, thanks for reading. I do plan on resuming my technical posts. I do have quite a backlog… even a couple of where in the world posts (with lots of photos) are on that list!

Again, thanks for reading!!
kt

SQLskills training goes online worldwide (and free in September!)

We can make SQL Server scale really well, but we can’t scale ourselves.

Although our Immersion Events are extremely popular, and in high demand, we can’t spend all year traveling and teaching, and we can’t come to everyone’s favorite location, so we have to limit the number that we can do. Our 2013 public training schedule has been solidified, all the contracts are signed, and we’ll be releasing the four locations (Tampa, Chicago, London, Bellevue) for registrations later this week.

But in addition to in-person training, how can we create a scalable way to get really good SQL Server training to all the people across the world who want it?

The only answer is to put our training online.

We’re not talking about recording our Immersion Events – those will remain an in-person, intense, learning experience for those who want them. We’re not moving away from our Immersion Events. Instead we’re allowing those who cannot afford the time, travel, or cost of an Immersion Event to benefit from our knowledge. And for those who do attend our Immersion Events, for that unique experience, the online training will complement and enhance the classroom learning. Indeed, anyone who attends our Immersion Events from 2013 onwards will also receive access to our online training, creating a perfect, hybrid learning solution. We’ll also be arranging something similar for past attendees too.

We’re talking about recording our entire corpus of SQL Server, virtualization, hardware, I/O subsystem, HA, DR, design, troubleshooting, performance, and more knowledge, literally many hundreds of hours, over the next 3-4 years and making it all available for anyone around the world to watch as easily as possible. Anyone will be able to learn from us from now on, in the manner and pace that they choose.

And we’re doing it through a new strategic partnership with our good friends at Pluralsight, led by Aaron Skonnard. As we see it, Pluralsight is the leader in providing the highest-quality online technical training that’s accessible across the world, using a very simple interface and with all kinds of pricing models to fit everyone from individuals to large corporations, from as little as US$29/month. We will be developing a core SQL Server curriculum for Pluralsight users and rapidly populating it with 3-4+ hour courses on everything to do with SQL Server. This will add to the more than 200 courses on other development and IT topics that Pluralsight offers, expanding at a rate of 20 new courses per month!

To celebrate the start of this great partnership, Pluralsight is offering you all complete access to three of our SQL Server courses, for free for 30 days, starting this week – with absolutely no obligation.

The three courses you can watch for free, each of which are 4+ hours long, are:

  • SQL Server: Performance Tuning Using Wait Statistics (by Paul)
  • SQL Server: Transact-SQL Basic Data Retrieval (by Joe)
  • SQL Server: Collecting and Analyzing Trace Data (by Jonathan)

To sign up for this, all you have to do is follow SQLskills and Pluralsight on Twitter and you get sent a registration code, then sit back and learn. The special-offer sign-up is available until September 11th, and then you have 30 days from when you sign up to view the three courses.

Sign up at: http://pluralsight.com/training/TwitterOffer/sqlskills

We have six more courses coming online in September, one from each of us, with at least another six more coming before the end of the year. The September courses are:

  • SQL Server: Understanding Logging, Recovery, and the Transaction Log (by Paul)
  • SQL Server: Database Design Mistakes and Solutions (by me)
  • SQL Server: Extended Events Basics (by Jonathan)
  • SQL Server: Transact-SQL Basic Data Modification (by Joe)
  • Server Hardware Fundamentals (by Glenn)
  • SQL Server: Benchmarking and Baselining (by Erin)

We are extremely excited to be partnering with Pluralsight to make our collective knowledge accessible to the world, and we hope you’re excited to learn from us!

Thanks as always,

Paul and Kimberly

Understanding LOB data (2008/2008R2 & 2012)

I know it’s been a while since I last blogged here on SQLskills. It’s been a busy few months with Immersion Events, conferences and even a bit of personal travel in there. And, we’ve also been blogging very regularly (weekly) at SQLServerPro here: Kimberly & Paul: SQL Server Questions Answered. And, this is what’s brought me to write this post. A few days ago, I blogged What about moving LOB data? and in part I discussed some of the issues moving SQL Server LOB data using OFFLINE operations. In part II (What about moving LOB data while keeping the table online?), I’m continuing the discussion with some additional points for ONLINE moves.

However, I also wanted to make sure that people fully understood how SQL Server LOB data works with regard to the physical location within the database… So, I created a script to walk you through it. This post will show you the specifics…

First, we’ll create a TestLOB database. Please note that this script drops TestLOB if it exists. If you already have a database with this name, it will be removed. Additionally, you will probably need to change the directory structure to something more appropriate on your system. These paths are the default data/log locations for SQL Server 2012 running as a default instance (MSSQLServer).

SET NOCOUNT ON
GO

USE [master]
GO

IF DATABASEPROPERTYEX('TestLOB', 'status') IS NOT NULL
BEGIN
    ALTER DATABASE [TestLOB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [TestLOB];
END
GO

CREATE DATABASE [TestLOB]
CONTAINMENT = NONE
ON  PRIMARY
(   NAME = N'TestLOBPrimary'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\TestLOBPrimary.mdf'
, SIZE = 5120KB , FILEGROWTH = 1024KB ),
FILEGROUP [FG1]
(   NAME = N'FG1File1'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG1File1.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB ),
(   NAME = N'FG1File2'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG1File2.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB ),
FILEGROUP [FG2]
(   NAME = N'FG2File1'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG2File1.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB ),
(   NAME = N'FG2File2'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG2File2.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB ),
FILEGROUP [FG3]
(   NAME = N'FG3File1'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG3File1.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB ),
(   NAME = N'FG3File2'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\FG3File2.ndf'
, SIZE = 61440KB , FILEGROWTH = 20480KB )
LOG ON
(   NAME = N'TestLOBLog'
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\TestLOBLog.ldf'
, SIZE = 102400KB , FILEGROWTH = 10240KB);
GO

USE TestLOB;
GO

Next, I’ll make the FG1 filegroup the default filegroup. Any object that is not explicitly placed (using the ON clause) will be created in FG1.

ALTER DATABASE TestLOB
MODIFY FILEGROUP FG1 DEFAULT;
GO

We’ll start by creating the TestLobTable table within the FG1 filegroup:

CREATE TABLE TestLobTable
(
c1  int identity,
c2  char(8000)      default 'this is a test',
c3  varchar(max)    NULL
); -- will be created on FG1
GO

We can verify the filegroup location for this object by using sp_help (review the results of the 5th grid returned).
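For example:

-- The index/data location result set shows that the table lives on FG1
EXECUTE sp_help 'TestLobTable';
GO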

Next, we’ll add 1000 rows to the table. Each row is basically 8kb because of the size of the c2 column. For these inserts, we will insert data for c1 and c2 but the value for c3 is NULL.

INSERT dbo.TestLobTable DEFAULT VALUES;
GO  1000 -- create an ~8MB table (1,000 rows x ~8KB each)

To review the size (as well as the location) of this data, we’ll run a quick query against sys.dm_db_file_space_usage (which works for ALL databases in SQL Server 2012; prior to 2012, this DMV only returned space usage information for tempdb [DBID = 2]).

Since there are no other objects in the database this very generic view of the location will be just fine:

SELECT *
FROM sys.dm_db_file_space_usage;
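If you’d like something a bit friendlier than SELECT *, here’s a variant (assuming SQL Server 2012, which added the filegroup_id and allocated_extent_page_count columns to this DMV):

SELECT fg.name AS [Filegroup]
    , f.name AS [File]
    , fsu.allocated_extent_page_count * 8 / 1024. AS [Allocated MB]
FROM sys.dm_db_file_space_usage AS fsu
    INNER JOIN sys.database_files AS f
        ON fsu.[file_id] = f.[file_id]
    INNER JOIN sys.filegroups AS fg
        ON fsu.filegroup_id = fg.data_space_id;
GO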

Or, if you’re on a version prior to SQL Server 2012 – you can look at the data space id set for the table (from sys.tables). Here’s the query you need:

SELECT t.name AS [Table Name]
, lob_data_space_id AS [Data Space ID]
, ds.name AS [Data Space Name]
FROM sys.tables AS t
INNER JOIN sys.data_spaces AS ds
ON t.lob_data_space_id = ds.data_space_id;
GO

Even if we create a clustered index on the table, the data doesn’t change filegroup. However, SQL Server does reorder the data within FG1 to match the key order defined by the clustering key. So, this operation can take quite a bit of time and log space (depending on the recovery model).

CREATE UNIQUE CLUSTERED INDEX TestLobTableCL
ON TestLobTable (c1);
GO

Again, we can verify that the object still lives on the FG1 filegroup by using sp_help (5th grid).

Now, things will get a bit more interesting…

In SQL Server 2012, you can rebuild a table ONLINE even if it has LOB data. So, you might think that the following will move the entire table… let’s see:

CREATE UNIQUE CLUSTERED INDEX TestLobTableCL
ON TestLobTable (c1)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON FG2;
GO

Now, check the results of sp_help. sp_help says it lives on FG2. However, we currently don’t have any data in the LOB portion of this table. What if we add some??

UPDATE TestLobTable
SET c3 = REPLICATE (convert(varchar(max), 'ABC'), 8000) -- creates a ~24KB LOB value per row
WHERE c1 % 17 = 0; -- only 58 rows
GO

Now, re-check the results from both sp_help and the LOB query. sp_help still says FG2 – which is accurate for the in_row data, because sp_help ONLY returns information about the in_row portion of the table. Even if you had specified a location for LOB data using TEXTIMAGE_ON when creating the table… sp_help would still ONLY reference the location for the in_row structure.
The only way to know where SQL Server LOB data is going is to use the query above. And, nothing you do can move the LOB data using just a standard rebuild.

So, let’s try to rebuild the object again…

CREATE UNIQUE CLUSTERED INDEX TestLobTableCL
ON TestLobTable (c1)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON FG3;
GO

And, recheck with our queries. Again, our in_row data has moved but not our LOB data.

My SQLServerPro blog post mentioned that you do have one option using partitioning but an important caveat is that this requires Enterprise Edition. So, it’s not ideal. If you are running on Enterprise Edition (or Enterprise Eval/Developer Edition), the scripts continue and show how to go through the move process using a partition function/partition scheme.
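For reference, the Enterprise-only approach looks roughly like this (a sketch, not the exact script from that post): rebuild the table onto a partition scheme whose filegroups are the destination, and the LOB data moves along with its partition.

-- Rough sketch: the boundary value doesn't matter because ALL partitions map to FG3
CREATE PARTITION FUNCTION pfMoveLob (int)
AS RANGE RIGHT FOR VALUES (0);
GO

CREATE PARTITION SCHEME psMoveLob
AS PARTITION pfMoveLob ALL TO ([FG3]);
GO

-- Rebuilding onto the partition scheme relocates the in_row AND the LOB data
-- (ONLINE = ON for a table with LOB columns requires SQL Server 2012)
CREATE UNIQUE CLUSTERED INDEX TestLobTableCL
ON TestLobTable (c1)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON psMoveLob (c1);
GO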

And, in this week’s SQLServerPro blog post I’m going to give you other options – that can be done ONLINE without Enterprise Edition (thanks to a cool trick by Brad Hoff that I just learned too)!

However, the end result is that moving around and working with LOB data is more challenging. To be honest, if you plan to have a lot of LOB data, you will want to be very strategic with how you design that data/table (even considering better isolation for it) so that you have more options available later if you want to move it to another hard drive.

Having said all of that, it has gotten A LOT better now that SQL Server 2012 supports online operations for indexes where a LOB column is included. So, that’s a great start. But, moving LOB data is non-trivial and really needs to be thought out properly especially if you have a lot of data to move/migrate.

Thanks for reading!
kt

Removing duplicate indexes

SQL Server has a horrible habit of letting bad habits move forward (for the sake of backward compatibility [e.g. autoshrink]). And, I do understand *some* of the reasons for allowing some of these “features/options” to move forward. However, there are a few that frustrate me and I’ve talked about this one quite often.

SQL Server lets you create completely redundant and totally duplicate indexes.

Yes, you can even create the same index 999 times! Try it:

SET NOCOUNT ON
GO

CREATE TABLE TestTable
(
    col1 int  identity
);
GO

DECLARE @IndexID smallint,
        @ExecStr nvarchar(500);

SELECT @IndexID = 1;

WHILE @IndexID <= 999
BEGIN
    SELECT @ExecStr = 'CREATE INDEX [Test' + right('00' + convert(varchar(5), @IndexID), 3) + '] ON TestTable (col1)';

    EXEC(@ExecStr);

    SELECT @IndexID = @IndexID + 1;
END
GO

SELECT count(*)
FROM sys.indexes
WHERE object_id = object_id('TestTable');
GO

--DROP TABLE TestTable
GO

For somewhat obvious reasons having 999 duplicate indexes does seem completely useless… so, why does SQL Server allow it at all? And, what’s the backward compatibility argument? Well, I’ve written about that before on our SQL Server Magazine Q&A blog (Kimberly & Paul – Questions Answered): Why SQL Server Lets You Create Redundant Indexes.

Having said that – for those of you who are only checking YOUR code, don’t use index hints, and/or don’t care about backward compat – you can use my proc to help you find duplicate indexes (and drop them)! It’s all based on my “sp_helpindex rewrites” stored procedures but I had to do some “tweaking” to get the procedures to understand that the order of the columns in the INCLUDE list is irrelevant. So, you can’t use the sps as written. They give you the exact (and correct) structure – and, that’s exactly how SQL Server stores them. For example, the following two indexes have a different stored structure. However, they are completely redundant and you do NOT need both.

CREATE INDEX Test1 ON dbo.Member (LastName)
INCLUDE (FirstName, MiddleInitial);

CREATE INDEX Test2 ON Member (LastName)
INCLUDE (MiddleInitial, FirstName);
GO

Here’s what you need:

  1. First, my modified sp_ that exposes all of the columns in all levels of the index. This one does NOT replace the one needed by sp_SQLskills_SQL2008_helpindex. You need both. This version is used to reorder the INCLUDEd columns so that the “finddupes” proc can remove indexes that are different solely because of the order of the columns in the INCLUDE clause (which is irrelevant for navigation/index usage).

    Install this one first:
    20110715_sp_SQLskills_ExposeColsInIndexLevels_INCLUDE_UNORDERED.sql (6.95 kb)

  2. Second, my modified sp_helpindex that is used SOLELY by finddupes. Basically, it’s just modified to use the UNORDERED version of “exposing the columns” in the nonclustered indexes. Again, this does NOT replace the other sp – you need both.

    Install this one second:
    20110715_sp_SQLskills_SQL2008_finddupes_helpindex.sql (10.95 kb)

  3. Finally, here’s the “finddupes” procedure

    Install this one last:
    20110720_sp_SQLskills_SQL2008_finddupes.sql (5.30 kb)

  4. NOTES: At some point in the not-too-distant future I’ll modify this for the following:

    1) One set of code that will work for 2005/2008
    2) I’ll combine the two versions of the “expose columns in index levels” so that there’s only one.
    3) I’ll make this work for indexed views

Finally, how can you use this:

To find the duplicates for just one table – you can use one-part or two-part naming:

USE [AdventureWorks2008];
GO

EXECUTE sp_SQLskills_SQL2008_finddupes @ObjName='Production.Document';
GO

OR… to find all duplicates in the entire database:

USE [AdventureWorks2008];
GO

EXECUTE sp_SQLskills_SQL2008_finddupes;
GO

OR… if you’re really motivated, you can run it for all databases – but, before you do – see Aaron Bertrand’s comment to this post for his replacement for sp_msforeachdb and use that instead!!

EXECUTE sp_msforeachdb 'USE ?; exec sp_SQLskills_SQL2008_finddupes';
GO

Have fun and let me know how many duplicates you find!
I found one in AdventureWorks2008 and a few in Northwind. I suspect you’ll find a few! This isn’t something that will find tons of indexes (to drop) but if there are even a couple of dupes (especially on large tables) you’ll have savings in database modification statements, logging, caching, maintenance, storage (and therefore backups, etc.) – and this is why dupes are HORRIBLE! I do wish that SQL Server had an option to prevent them from being created! Maybe someday?!

Enjoy,

kt

How can you tell if an index is REALLY a duplicate?

There’s a lot of confusion around duplicate indexes and I think a lot of it has to do with how sp_helpindex (or even SSMS) display what’s in an index. What’s actually in the index – and how it’s structured – are not always what they seem. This was the original motivation behind my rewrite of sp_helpindex but even since then, I’ve still seen a lot of confusion. In today’s blog post I’m going to first explain EXACTLY which indexes are the same and which aren’t – as well as the faults in the tools.

So, starting with index structures… (it all starts with internals :) )

The clustered index IS the data. The KEY to the clustered index (what I often refer to as the clustering key) defines the way that the data is ordered (not necessarily truly, physically ordered on disk – but ordered LOGICALLY). And, no, I’m not going to rehash every bit of internals here… just a few reminders.

A nonclustered index is duplicated data (similar to an index in the back of a book). This duplicated data can be used to help reference the actual data (exactly like an index in the back of a book) OR can be used to actually respond to requests (for example, if you’re solely looking for a count of people whose last name begins with a particular letter, then an index that has LastName in it could be used to count them – without actually “looking up” the actual data). So, there are some very powerful uses for indexes. But, alas, this isn’t a post on using indexes or indexing strategies – this is ALL about internals (and understanding the structure of an index). So, I’m going to cut to the chase!

A nonclustered index always has:

  • Key (this is what defines the order of the index)
  • A leaf-level entry (this is the actual data stored in the index + the lookup value* + any included columns) – however, ALL of these columns are only stored once (and they’ve ALWAYS only been stored once here so, even if you reference a column that’s part of the lookup value, SQL Server will NOT duplicate it again).

*So, what is this lookup value?

The lookup value is what SQL Server uses to reference the actual data row. If a table has a clustered index then the lookup value is the clustering key (and EVERY column defined by it). If a table does not have a clustered index (it’s then a HEAP) then SQL Server uses something called a RID. A RID is an 8-byte structure consisting of 2:4:2 bytes which break down into 2 for the FileID, 4 for the PageID and 2 for the slot number. And while RIDs (and their history) are very interesting – how they specifically work and their structure isn’t entirely relevant here. If/when they’re in an index, I’ll list it as RID.

Let’s now put all of this together with an example (or two) starting with a relatively easy one.

USE [JunkDB];
GO

CREATE TABLE Test
(
    TestID  int identity,
    [Name]  char(16)
);
GO

CREATE UNIQUE CLUSTERED INDEX TestCL ON Test (TestID);
GO

CREATE INDEX TestName ON Test([Name]);
GO

sp_helpindex Test
GO

The output shows:

index_name      index_description                               index_keys
TestCL          clustered, unique located on PRIMARY            TestID
TestName        nonclustered located on PRIMARY                 Name

And, while that looks right, it’s horribly misleading. The index that’s on TestName also includes TestID in the index as well. And, not just in the leaf level but in the tree (for ordering purposes). So, that should really show Name, TestID. But, it’s a bit more confusing if you throw in this:

CREATE UNIQUE INDEX TestNameUnique ON Test([Name]);
GO

EXECUTE sp_helpindex Test;
GO

The output shows:

index_name      index_description                               index_keys
TestCL          clustered, unique located on PRIMARY            TestID
TestName        nonclustered located on PRIMARY                 Name
TestNameUnique  nonclustered, unique located on PRIMARY         Name

And, at this point, it doesn’t look like there’s any difference at all between the second and third indexes (well, except that the third index requires that the values be unique – shown in the description). But, as for the “index_keys” they look the same. However, they’re actually not the same (in all parts of the tree). So, this is why I tend to differentiate between the “leaf” and the non-leaf levels of an index (when I describe them). And, it only gets more complicated when you throw in included columns (2005+).

So, how do you tell the difference? Unfortunately, there are NO tools within SQL Server (or even any 3rd party tools that I know of) that display this through the UI, etc. But, you could start by using my replacement for sp_helpindex. I’ve updated and rewritten it a few times but the latest one will always be in the sp_helpindex rewrites category here: http://www.sqlskills.com/blogs/kimberly/category/sp_helpindex-rewrites.aspx. I personally think that’s the easiest (and is the reason why I originally wrote it!). Using it you can see how the output is more detailed.

The output shows (specifically showing ONLY the last 2 columns):

columns_in_tree              columns_in_leaf
[TestID]                     All columns "included" – the leaf level IS the data row.
[Name], [TestID]             [Name], [TestID]
[Name]                       [Name], [TestID]

Now, we’re getting somewhere. We can definitely see that these two indexes ARE different. A nonunique nonclustered index needs to have the lookup value pushed up into the tree (for navigation). A unique nonclustered index does not.

Next, let’s go with a [just slightly more challenging] example:

USE [JunkDB];
GO

CREATE TABLE Member
(
    MemberNo    int     identity,
    FirstName   varchar(30)     NOT NULL,
    LastName    varchar(30)     NOT NULL,
    RegionNo    int
);
GO

CREATE UNIQUE CLUSTERED INDEX MemberCL ON Member(MemberNo);
GO
    
CREATE INDEX MemberIndex1 
ON Member(FirstName, RegionNo, MemberNo)
INCLUDE (LastName);
GO

CREATE INDEX MemberIndex2 
ON Member(FirstName, RegionNo)
INCLUDE (LastName);
GO

CREATE INDEX MemberIndex3 
ON Member(FirstName, RegionNo)
INCLUDE (MemberNo, LastName);
GO

CREATE UNIQUE INDEX MemberIndex4 
ON Member(Firstname, RegionNo)
INCLUDE (MemberNo, lastname);
GO

First, let’s review with sp_helpindex:

EXECUTE sp_helpindex Member;
GO

The output shows:

index_name      index_description                           index_keys
MemberCL        clustered, unique located on PRIMARY        MemberNo
MemberIndex1    nonclustered located on PRIMARY             FirstName, RegionNo, MemberNo
MemberIndex2    nonclustered located on PRIMARY             FirstName, RegionNo
MemberIndex3    nonclustered located on PRIMARY             FirstName, RegionNo
MemberIndex4    nonclustered, unique located on PRIMARY     FirstName, RegionNo

Looking solely at sp_helpindex it LOOKS like the first nonclustered index is different from the others and that the 2nd, 3rd and 4th nonclustered indexes are the same. However, that’s actually NOT true. Next, let’s use my version of sp_helpindex.

EXECUTE sp_SQLskills_SQL2008_helpindex Member;
GO

The output (again, just showing the last two columns of output):

columns_in_tree                           columns_in_leaf
[MemberNo]                                All columns "included" – the leaf level IS the data row.
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo]                   [FirstName], [RegionNo], [MemberNo], [LastName]

From this, you can see that all 4 indexes have the same leaf level but index 4 has a slightly different tree structure. Ultimately, it’s indexes 1, 2 and 3 that are the same and index 4 is actually [slightly] different. How they’re different (outside of the fact that the 4th index guarantees uniqueness) is a bit beyond the scope of this post. But, yes, there are some [relatively minor in this case] differences. And, since I’m ONLY looking for indexes that are the same then only 1, 2 and 3 fit that requirement.

And, things get MUCH more complicated if you have a multicolumn clustering key and/or more complicated INCLUDEs.

Having said that – how do you find duplicate indexes?

Well… I had started this by putting together a simple way for you to check for dupes with my version of sp_helpindex but then I figured out an issue with included columns. I’m showing the structures as they’re defined (and stored). But, in terms of usage – the order of the columns in the INCLUDE does not matter. As a result, two indexes with different orders for their included columns will show as two different indexes (technically they are and they are stored differently). However, there are absolutely NO differences (in terms of usage) so I’ll need to write code to adjust for that (to truly find duplicates).

For now, here’s some quick code to get you closer. I had written some of this when we discussed it in our London Immersion Event. However, I’ve tweaked it even further here after thinking about some interesting exceptions. This code will find absolute duplicates (where the order of the structure is completely identical). To use it you’ll need to enter the schema name and table name (marked MODIFY below).

IF OBJECT_ID('tempdb..#FindDupes') IS NOT NULL
    DROP TABLE #FindDupes;
GO

CREATE TABLE #FindDupes
(
 index_id           int,
 is_disabled        bit,
 index_name         sysname,
 index_description  varchar(210),
 index_keys         nvarchar(2126),
 included_columns   nvarchar(max),
 filter_definition  nvarchar(max),
 columns_in_tree    nvarchar(2126),
 columns_in_leaf    nvarchar(max)
);
GO

DECLARE @SchemaName sysname,
        @TableName  sysname,
        @ExecStr    nvarchar(max);

SELECT @SchemaName = N'schemaname',     -- MODIFY
       @TableName = N'tablename';       -- MODIFY
        
SELECT @ExecStr = 'EXECUTE sp_SQLskills_SQL2008_helpindex ''' 
                    + QUOTENAME(@SchemaName) 
                    + N'.' 
                    + QUOTENAME(@TableName)
                    + N'''';

INSERT #FindDupes
EXEC (@ExecStr);

SELECT t1.index_id, COUNT(*) AS 'Duplicate Indexes w/Lower Index_ID', 
        N'DROP INDEX ' 
            + QUOTENAME(@SchemaName, N']') 
            + N'.' 
            + QUOTENAME(@TableName, N']') 
            + N'.' 
            + t1.index_name AS 'Drop Index Statement'
FROM #FindDupes AS t1
    INNER JOIN #FindDupes AS t2
        ON t1.columns_in_tree = t2.columns_in_tree
            AND t1.columns_in_leaf = t2.columns_in_leaf 
            AND ISNULL(t1.filter_definition, 1) = ISNULL(t2.filter_definition, 1) 
            AND PATINDEX('%unique%', t1.index_description) = PATINDEX('%unique%', t2.index_description)
            AND t1.index_id > t2.index_id
GROUP BY t1.index_id, N'DROP INDEX ' + QUOTENAME(@SchemaName, N']') 
                    + N'.' 
                    + QUOTENAME(@TableName, N']') 
                    + N'.' + t1.index_name;
GO

I’m planning to build a sproc around this next, as well as get rid of cases where the included columns are just in different orders. It will be another sp_ for master and I do hope to wrap it up shortly! That code will REALLY find ALL unnecessary duplicates and help you to drop them! Stay tuned!!

Cheers and thanks for reading!!

kt