How much does that key cost? (plus sp_helpindex9)

OK, two in two days? What's wrong with me (is what Paul said) after he said – who are you and where is my wife?

But, this one is a short one, specifically a follow-on to my prior post about "Disk space is cheap…"

I did a bit of simple math on the internal overhead and costs of inefficient clustering keys and I wanted to share that as well. If a table is clustered, then each and every nonclustered index must include ALL columns of the clustering key in [at least] the leaf level of the nonclustered index. When a nonclustered index is non-unique, it's even worse: all columns of the clustering key must go all the way up the tree (into the non-leaf levels) as well. I re-wrote sp_helpindex to give you better insight into what's actually happening in your nonclustered indexes here: A new and improved sp_helpindex (jokingly sp_helpindex8). I've since re-written it again to include a column that shows whether or not an index is disabled, but it still needs the base procedure [sp_SQLskills_ExposeColsInIndexLevels] from the other post – make sure you download that first before trying to use this sp_helpindex9: sp_SQLskills_SQL2008_helpindex (sp_helpindex9).sql (11.06 kb).

Having said that – what's the overhead in terms of some of your bigger tables?

If you have a 10 million row table with 8 nonclustered indexes then *just* the internal overhead is going to cost you the following with keys of the varying sizes:

 

Description      Width of CL key          Rows   NC Indexes          MB
int                            4    10,000,000            8      305.18
datetime                       8    10,000,000            8      610.35
datetime, int                 12    10,000,000            8      915.53
guid                          16    10,000,000            8    1,220.70
composite                     32    10,000,000            8    2,441.41
composite                     64    10,000,000            8    4,882.81

And, what if you have a bigger table – say a 100 million row table with 12 nonclustered indexes… the internal overhead is as follows:

 

Description      Width of CL key           Rows   NC Indexes           MB
int                            4    100,000,000           12     4,577.64
datetime                       8    100,000,000           12     9,155.27
datetime, int                 12    100,000,000           12    13,732.91
guid                          16    100,000,000           12    18,310.55
composite                     32    100,000,000           12    36,621.09
composite                     64    100,000,000           12    73,242.19

So, if you're talking about disk space, IOs, backups, etc. and you have a very inefficient clustering key (say, a 5-column "natural key" that's 64 bytes), then you have 71.53GB of OVERHEAD where it could have been only 4.47GB. That's a tad over 67GB of wasted disk space and memory (really – do you have 67GB+ of memory to waste??), and what about the long-term costs of backing all of this up?? And, I haven't even started to talk about the inefficiencies with the FKs, or within the nonunique nonclustered index trees as well.
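By the way, the math behind both tables is nothing more than key width × rows × number of nonclustered indexes. Here's that arithmetic as a trivial T-SQL sketch – the values are the worst case from the table above, not measurements from a real system:

DECLARE @KeyWidthBytes bigint = 64,          -- width of the clustering key
        @Rows          bigint = 100000000,   -- rows in the table
        @NCIndexes     bigint = 12;          -- number of nonclustered indexes

-- every nonclustered index repeats the clustering key in [at least] its leaf level
SELECT (@KeyWidthBytes * @Rows * @NCIndexes) / 1048576.0    AS OverheadMB,  -- 73,242.19
       (@KeyWidthBytes * @Rows * @NCIndexes) / 1073741824.0 AS OverheadGB;  -- 71.53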

It's a COMPLETE NIGHTMARE!

Please… everyone repeat after me…………….. DESIGN MATTERS!!! :-)

Thanks for reading!
kt

Disk space is cheap…

THAT'S NOT THE POINT!!!

The simple point is that bad (or sloppy/lazy) design cannot be tuned. If you think that data type choice, nullability and keys don't really matter, you won't scale. It is possible that you may completely fail because of this. Have you ever heard (or possibly said?): let's just get this done – we'll worry about performance later? If you haven't heard it, I'm surprised! I hear this all the time…

Yesterday I gave a lecture at SQLPASS about GUIDs. It wasn't the most well attended (under 200 people) but I suspect that's because of two things: first, our good friend Bob Ward was speaking at the same time (and there were actually a bunch of really good sessions!) AND the simple fact that GUIDs aren't sexy (I agree!). Also, I think that a few folks may have presumed that what I was going to talk about (maybe even solely talk about?) was fragmentation. And, while fragmentation is the most outwardly visible problem with GUIDs – it's by NO MEANS the only problem. And, so I thought I'd blog a few things to think about/remember when trying to design/architect a SQL Server system. Clearly there's got to be a balance between the amount of time you're going to spend on design vs. just "getting it done" but that doesn't mean that NOTHING MATTERS or that you can just do anything with a table and "SQL Server will just 'handle' it." OK, I take that back – SQL Server won't have a choice other than to "just handle it" but some things it just CANNOT handle well. Performance and scalability will suffer and, again, your application may fail.

One of the resounding principles of my session (and most of my design talks in general ;-) is that design matters. In fact, in my summary, I said that 3 things really matter in terms of good SQL Server database design/architecture:

  1. Know your data - this helps you make the right decisions in terms of data types/nullability and churn helps with long term maintenance goals (and initial maintenance plans) 
  2. Know your workload - this helps you make the right decisions about locking/isolation, optimizing procedural code and indexing strategies (and these are the KEY to a FAST and scalable system)
  3. Know how SQL Server works  - this is the one that's often overlooked. And, without information such as "the primary key is enforced by a clustered index and the clustering key is added to ALL nonclustered indexes" then you may inadvertently create a database that grows faster and larger than anticipated where performance slows to a crawl and even management/maintenance becomes a [really HUGE] challenge.

So, while I could go on for ages here I just want to expand on that last point: Know how SQL Server works. Specifically, I want to tie together the PK -> CL Key -> NC Indexes along with the "disk space is cheap" mantra that I also hear ALL THE TIME.

OK – so let's break this down a bit… No matter what your clustered index is – it should be narrow. I do not choose my clustered index for range queries and my choice for the clustering key is NEVER accidental.

Why – because it has a HUGE impact on overall performance. To prove the point (and this was the concept around which my session focused), I created 3 different versions of the SAME "Sales" database. I wanted to show ALL of the impacts of a poorly chosen key – both as CL and really just as a size issue. It's only 12 more bytes than an int, right? What harm can it cause… just wait!

So – to start, I loaded all three databases with roughly 6.7 million rows… and I made sure everything was clean and contiguous so that I'd have the same starting point for all of the tables. I actually strategically started things in one filegroup and then moved things over to another filegroup with 2 files so that I could get some benefits from having multiple files as well (see Paul's excellent post on why a RW filegroup should generally have 2-4 files here: Benchmarking: do multiple data files make a difference?). So, at the initial start I have three databases (there's a rough DDL sketch of the key patterns after the lists):

SalesDBInts (initial size with Sales at 6.7 million rows = 334MB):

  • Customers – has an ever-increasing identity (int) PK (4 bytes)
  • Employees – has an ever-increasing identity (int) PK (4 bytes)
  • Products – has an ever-increasing identity (int) PK  (4 bytes)
  • Sales – has an ever-increasing identity (int) PK and FKs to Customers, Employees and Products (row size = 27 bytes)

SalesDBGUIDs (initial size with Sales at 6.7 million rows = 1000MB):

  • Customers – has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Employees – has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Products – has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Sales – has a randomly generated (using the NEWID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)

SalesDBSeqGUIDs (initial size with Sales at 6.7 million rows = 961MB):

  • Customers – has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Employees – has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Products – has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Sales – has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)
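Roughly, in DDL, the three clustering-key patterns look like this – these are my own abbreviated table definitions for illustration, NOT the actual demo scripts (note that NEWSEQUENTIALID() is only valid as a column default):

CREATE TABLE dbo.SalesInt
(
    SalesID  int IDENTITY(1,1) NOT NULL
        CONSTRAINT SalesIntPK PRIMARY KEY CLUSTERED,             -- 4 bytes, ever-increasing
    SaleDate datetime NOT NULL
);

CREATE TABLE dbo.SalesGUID
(
    SalesID  uniqueidentifier NOT NULL
        CONSTRAINT SalesGUIDDflt DEFAULT NEWID()                 -- 16 bytes, random insert point
        CONSTRAINT SalesGUIDPK PRIMARY KEY CLUSTERED,
    SaleDate datetime NOT NULL
);

CREATE TABLE dbo.SalesSeqGUID
(
    SalesID  uniqueidentifier NOT NULL
        CONSTRAINT SalesSeqGUIDDflt DEFAULT NEWSEQUENTIALID()    -- 16 bytes, but ever-increasing
        CONSTRAINT SalesSeqGUIDPK PRIMARY KEY CLUSTERED,
    SaleDate datetime NOT NULL
);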

OK, so here's where the session really starts… I run 10K inserts into the Sales table in each database and then I check and see what happens:

  • 10K rows in SalesDBInts takes 00:17 seconds
  • 10K rows in SalesDBGUIDs takes 05:07 minutes
  • 10K rows in SalesDBSeqGUIDs takes 01:13 minutes

This is already SCARY and should go down in the "Are you kidding me?" category, but I also have to add that the hardware and setup for these first few tests are just highlighting a whole myriad of problems. First, I was running with a somewhat crummy setup – a dual-core laptop with only 3GB of memory and this database on an external USB drive. Certainly not enterprise storage, but not an enterprise-sized database either. For the size of the db we should have been able to do better… wait, we did – with the int-based database things went really well. Only the other two really stunk, and the sequential-GUID database definitely fared better than the random one (of course – fragmentation, right?). And, yes, that's a part of it… but there's more. And, I thought… no, this can't be right. Let me try again… run 2:

  • 10K more rows in SalesDBInts takes 00:24 seconds
  • 10K more rows in SalesDBGUIDs takes 07:21 minutes
  • 10K more rows in SalesDBSeqGUIDs takes 01:10 minutes

Well, that seems pretty consistent but wow – the random GUID db is really NOT faring very well… let's try it again:

  • 10K more rows in SalesDBInts takes 00:26 seconds
  • 10K more rows in SalesDBGUIDs takes 10:10 minutes
  • 10K more rows in SalesDBSeqGUIDs takes 01:12 minutes

OK, so you have GOT to be wondering why things are going so horribly wrong. The fragmentation is leading to more page IOs, and those pages also have to be put in cache, so we need a larger and larger cache to handle our GUID database… none of this is good, and it means you need bigger machines and/or something else to help you out. With the ever-increasing patterns created by the other databases, we're requiring fewer pages to be read and fewer pages to be cached – those databases are performing fairly consistently…

OK – so what can we do… let's try FIRST dealing with the fragmentation. To keep it simple, I went to the Books Online for sys.dm_db_index_physical_stats – example D. Using sys.dm_db_index_physical_stats in a script to rebuild or reorganize indexes. This is pretty good, but since these databases had never seen a REBUILD (and definitely not a FILLFACTOR setting), I had to tweak the script slightly to include a generic 90% fill factor. Here's the line that I modified:

SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD WITH (FILLFACTOR = 90)'
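For context, here's a condensed sketch of that Books Online loop with the modified line in place. I've stripped example D down considerably (no partition handling, no REORGANIZE branch for low fragmentation), so treat this as illustrative rather than a drop-in maintenance script:

DECLARE @objectid int, @indexid int, @frag float;
DECLARE @schemaname sysname, @objectname sysname, @indexname sysname;
DECLARE @command nvarchar(4000);

-- find indexes in the current database with > 10% fragmentation
SELECT object_id AS objectid, index_id AS indexid,
       avg_fragmentation_in_percent AS frag
INTO #work_to_do
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;

DECLARE partitions CURSOR FOR SELECT objectid, indexid, frag FROM #work_to_do;
OPEN partitions;
FETCH NEXT FROM partitions INTO @objectid, @indexid, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @objectname = QUOTENAME(o.name), @schemaname = QUOTENAME(s.name)
    FROM sys.objects AS o
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    WHERE o.object_id = @objectid;

    SELECT @indexname = QUOTENAME(name)
    FROM sys.indexes
    WHERE object_id = @objectid AND index_id = @indexid;

    -- the tweaked line: always REBUILD, with an explicit 90 fill factor
    SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD WITH (FILLFACTOR = 90)';
    EXEC (@command);
    PRINT N'Executed: ' + @command;

    FETCH NEXT FROM partitions INTO @objectid, @indexid, @frag;
END;
CLOSE partitions;
DEALLOCATE partitions;
DROP TABLE #work_to_do;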

I ran this in ALL three databases but there wasn't much to do in any of them except for the GUID-based database:

SalesDBInts (5 seconds)
Executed: ALTER INDEX [ProductsPK] ON [dbo].[Products] REBUILD WITH (FILLFACTOR = 90)

SalesDBGUIDs (7:51 minutes)
Executed: ALTER INDEX [SalesPK] ON [dbo].[Sales] REBUILD WITH (FILLFACTOR = 90)
Executed: ALTER INDEX [IX_SalesToProductsFK] ON [dbo].[Sales] REBUILD WITH (FILLFACTOR = 90)
Executed: ALTER INDEX [CustomersPK] ON [dbo].[Customers] REBUILD WITH (FILLFACTOR = 90)
Executed: ALTER INDEX [ProductsPK] ON [dbo].[Products] REBUILD WITH (FILLFACTOR = 90)

SalesDBSeqGUIDs (9 seconds)
Executed: ALTER INDEX [ProductsPK] ON [dbo].[Products] REBUILD WITH (FILLFACTOR = 90)
Executed: ALTER INDEX [CustomersPK] ON [dbo].[Customers] REBUILD WITH (FILLFACTOR = 90)

Then, I ran my inserts again…

  • 10K more rows in SalesDBInts takes 00:25 seconds
  • 10K more rows in SalesDBGUIDs takes 05:05 minutes
  • 10K more rows in SalesDBSeqGUIDs takes 01:07 minutes

OK, that's better… and it will be more consistent because of the reduction in splits, which also helps to keep the table smaller and therefore requires less disk space as well as less cache. But, there's still absolutely NO comparison between the sequential and the random, eh? Maybe we need more disk space, faster drives and/or more memory… let's try killing our problem with iron (yes, an oldie but a goodie!). And, yes, this is going to help…

The first hardware change that I made was that I moved these to an internal SSD drive… and, ran my test again. Let's get rid of all the random IO problems. That's got to help, eh?

  • 10K more rows in SalesDBInts takes 00:04 seconds
  • 10K more rows in SalesDBGUIDs takes 01:15 minutes
  • 10K more rows in SalesDBSeqGUIDs takes 01:02 minutes

WOW – THAT's AWESOME… killing it with iron brings the random GUIDs VERY close to the speed of the sequential GUIDs because we're eliminating the random IO penalty. This makes our backups faster, etc., but it still doesn't reduce the memory required, because every random insert still has to bring an existing page into cache. And, if you have a very large table with a lot of historical data that wouldn't otherwise have needed to be brought into cache, this is a BIG problem – especially for much larger tables.

I had quite a bit more in my demos, but this brings us to a really good point… what are our options and what should we consider? First and foremost, how much control do you have? Did you design/architect your system and, if so, how much work are you willing to put into it from here? Or, is this an application over which you have no control? Let's take the obvious…

If this is an application over which you have no control then you really have only 2 options:

  1. MAINTENANCE (with a good FILLFACTOR)
  2. Consider killing it with iron where the most obvious gains are going to be disk IOs (ie. SSDs for the data portion) and memory…

If this is a system over which you do have control… then I'd suggest changing the CL key at a minimum. Then, I'd make sure you have good maintenance set up for your nonclustered indexes because those will most certainly be fragmented. Then, I'd slowly consider changing over your FKs to use the CL key (an identity, ideally) and then maybe – eventually – you can remove those GUIDs altogether. But, this is NOT an easy thing to do…

If your CL key is a PK then here are your steps (there's a rough T-SQL sketch after the list):

  1. Take the database offline (sorry, I’m just the messenger!!)
  2. Disable the FKs
  3. Disable the nonclustered indexes
  4. Drop the clustered PK (alter table)
  5. Optionally, add an identity column?
  6. Create the new clustered index
  7. Create the PK as nonclustered
  8. Enable the nonclustered indexes (alter index…rebuild)
  9. Enable the FKs with CHECK (this is very important)
  10. Bring the database online
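In rough T-SQL – using hypothetical names from a GUID-clustered Sales table, and skipping steps 1 and 10 (the offline/online part) – the middle steps look something like this:

-- 2. Disable the FKs on the table
ALTER TABLE dbo.Sales NOCHECK CONSTRAINT ALL;

-- 3. Disable the nonclustered indexes
ALTER INDEX IX_SalesToProductsFK ON dbo.Sales DISABLE;

-- 4. Drop the clustered PK
ALTER TABLE dbo.Sales DROP CONSTRAINT SalesPK;

-- 5. Optionally, add an identity column
ALTER TABLE dbo.Sales ADD SalesID int IDENTITY(1,1) NOT NULL;

-- 6. Create the new clustered index
CREATE UNIQUE CLUSTERED INDEX SalesCL ON dbo.Sales (SalesID);

-- 7. Re-create the PK as nonclustered (on the original GUID column)
ALTER TABLE dbo.Sales ADD CONSTRAINT SalesPK PRIMARY KEY NONCLUSTERED (SalesGUID);

-- 8. Enable the nonclustered indexes (ALTER INDEX … REBUILD re-enables them)
ALTER INDEX IX_SalesToProductsFK ON dbo.Sales REBUILD;

-- 9. Enable the FKs WITH CHECK (this is very important – it keeps them trusted)
ALTER TABLE dbo.Sales WITH CHECK CHECK CONSTRAINT ALL;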

And, there are certainly derivatives of this, but long story short: it's going to be a painful process. And, I know some of you have work to do… so I'll end this post here, as well as give you a few links to help you continue learning about these issues. The more you know about SQL Server, the better your database design – and the longer your database will be healthy and scalable.

I've spoken about this in many posts and many of our SQLMag Q&A posts:

Can I actually end this post with – enjoy! (remember, I'm just the messenger!! :)
kt

Exciting news for the Microsoft Certified Masters (MCM) Program!

Paul's blogged quite a few of the details here: Big changes to the MCM program and how SQLskills can help you and there's not too much to add except that we'll have some locations and dates to announce soon. And, we're tweaking our current Immersion Event training (even the one in San Diego, CA in December) ever so slightly to correspond with this program as well as to align well with content that's recorded.

So, if you've been thinking of becoming certified but couldn't take 3 weeks off or afford the 18K - these changes are for you!

Cheers and best wishes on certification!
kt

TSQL Tuesday – Why DBA skills are important


Paul's hosting the T-SQL Tuesday theme this month and the theme revolves around DBA skills. Specifically, "why are DBA skills necessary?" His specific post is here: Invitation to participate in T-SQL Tuesday #12 – Why are DBA skills necessary?.

This is a topic near and dear to me and one that I've found myself debating with others many times. If I were to cut to the chase and try to sum it up in one simple statement, I'd say that you need to know and understand any tool in order to make it work effectively. Just as with any other tool… if I were to go and buy a chainsaw (having never touched one before in my life – and this is probably a very good thing) and then bring it home and try to work it without reading the instructions, very bad things would probably happen. Maybe I could get it working, but if I didn't – and I tried to mess with it – things might end very badly for me.

Don’t get me wrong, you probably :) won’t lose any limbs while working with SQL Server, but inefficiencies can be created in many areas if/when database developers and database architects don’t have some DBA skills (and vice versa, IMO). And, I can think of numerous scenarios in which this has happened, not only with in-house applications but also with ISV applications. With an in-house application, only your internal customers have problems. This is bad enough, but imagine having customers that have paid for your software only to find out that there are shortcomings that could have been avoided through better design.

For this post, I certainly won’t name names, but I will list a couple of gotchas that, if avoided, could have minimized (potentially even eliminated) application downtime and improved performance. And, these are the skills that a DBA would have provided…

Online Operations

Yes, this is an Enterprise Edition feature, but many enterprises have paid for it only to find out that they can't leverage it on some of their more critical tables… why? Because the tables were created with LOB columns defined as part of the base table. While this doesn't seem like it would be a big problem (and many developers tend to keep all attributes of a single entity within a single table), SQL Server has a limitation for online operations: they cannot be performed if the index contains a LOB column. While nonclustered indexes should never include LOB columns, the clustered index naturally includes ALL columns of the table. As a result, *ANY* table that has even one LOB column cannot support an online clustered index create or rebuild. The table has to be taken completely offline (no inserts, updates, deletes OR selects – NO OPERATIONS WHATSOEVER) while it goes through this sometimes frequently-necessary maintenance. So, this isn't just a one-time problem – it can last for the lifetime of the design/application.
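A quick way to see the restriction for yourself – here's a minimal, hypothetical repro (the table and names are mine; this applies to SQL Server 2005/2008, where online index operations can't touch LOB columns):

CREATE TABLE dbo.Docs
(
    DocID    int NOT NULL CONSTRAINT DocsPK PRIMARY KEY CLUSTERED,
    DocTitle varchar(100) NOT NULL,
    DocBody  nvarchar(max) NULL  -- the LOB column
);
GO

-- The clustered index implicitly includes ALL columns – DocBody too – so this
-- rebuild fails with an error saying an online operation cannot be performed
-- because the index contains a LOB data type:
ALTER INDEX DocsPK ON dbo.Docs REBUILD WITH (ONLINE = ON);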

Indexes

I delivered a RunAs Radio interview on indexes a while back and a comment that I’ll never forget was the following: “I have to say that I'm not a database guy. Not even a little tiny bit. But I found the show very interesting so I opened up the enterprise manager and started pokin around. (Just lookin, no touchin) Using the Query Analyzer I discovered that our main database doesn't have a single index. Apparently the devs decided that they wanted the option of putting the tables on any kind of database server so they did not include a single index or key. I was told "all that is done in 'The System'." Good grief.”

This “involuntary DBA” was already starting to pick up some DBA skills even though it wasn’t really his job or goal. And he was already learning that there are often misconceptions about how databases work. With regard to indexes, this can be very problematic. Indexes are at the center of good database performance and health. However, it’s not just the existence of indexes but knowing the right indexes. This isn’t something that’s easy to know within only a few minutes, but there are tools that can help.
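Incidentally, a first sanity check along those lines is cheap. Here's a small sketch using the standard catalog views that lists user tables with no indexes whatsoever (heaps with no nonclustered indexes):

SELECT s.name AS [schema], o.name AS [table]
FROM sys.objects AS o
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE o.type = 'U'                  -- user tables only
  AND NOT EXISTS (SELECT *
                  FROM sys.indexes AS i
                  WHERE i.object_id = o.object_id
                    AND i.index_id > 0);  -- index_id 0 is the heap itself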

To get you started, here are a few links:

To be honest, even experienced DBAs often struggle with indexes. However, there are key choices that must be made early, and indexes that really need to be created, in order for a database to be healthy. These are definitely things that a good DBA will know.

And, I guess the list could go on for some time. How about plan cache problems and problems with adhoc access to SQL Server? Or, statement recompilation choices? There are plenty of myths and misconceptions – in fact, check out Paul's blog series that was turned into a 60-page whitepaper here: http://www.sqlskills.com/blogs/PAUL/post/Myths-and-misconceptions-60-page-PDF-of-blog-posts.aspx. The good part about experienced DBAs is that they have often seen these problems and can work with developers/architects to create designs that might not be as prone to them. But, it's definitely a good debate – whose job is it anyway? And, I posted exactly this question a while back here: Whose job is it anyway? and I also did a RunAs Radio interview on this as well: RunAsRadio Interview about Developers and DBAs – whose job is it?.

I think this is a debate that will go on for quite some time AND I do think that developers need some DBA skills, just as DBAs need some development skills. In some companies these job roles overlap to a point, but other companies feel that these skills are unnecessary. In the latter, I can already sense problems in performance and scalability and, even worse, potential failure.

Someone needs these skills if you want to succeed. SQL Server does NOT do everything for you. NO RDBMS DOES!!! Remember, SQL Server is a general purpose relational database management system. You can do anything with it (which is why I love it but also why so many of my answers start with… well… It depends…).

You must understand the tool and then you can/will use it properly!

Thanks for reading!
kt

Project Phoenix – Getting back on track!

For any of you who might have been negatively impacted by the recent recession and are looking to get back into development, Arnie Rowland (a fellow SQL Server MVP) has created Project Phoenix. Just as a phoenix has to periodically regenerate itself, you might need to hone your skills and get some tools to help you along. To do this, they're offering an amazing package of software and training – with your help in helping a non-profit in return. You really need to check this out and pass it on. You can get all of the details here: http://sqlblog.com/blogs/arnie_rowland/archive/2010/07/30/like-a-phoenix-rising-from-the-ashes.aspx.

Pass it on!

Cheers,
kt

Lots of learning options – which is best and what’s coming up soon!

For someone that's been in the working world for many years (I wish I really were only 29 ;-), I know the struggle that everyone goes through in keeping up with technology. I admit, it's hard to stay current and it takes time and effort to do so. I attend online webcasts, I read whitepapers, I attend conferences, I attend classes. Also, I constantly work with the product both at my customers but also in testing and development environments. I spend much of my day (and life :) using and working with SQL Server. I'm lucky in the sense that I'm completely focused on this product for my work. Many of you have to work with multiple products and therefore you rely on best practices to help you through a lot of what you do. Learning takes time and money and is hard to directly quantify. And, if you end up in a session that's not what you thought – you either have to leave (and may not have any way to replace this lost time) or you have to sit and waste your time in a session that just won't give you the return you expected. Instead, maybe you'll catch up on email… somewhat useful but not really what you had wanted.

So, what I've found is that I really need to strive to get better ROI from my learning options. And, I need to learn the right way to do something… This is absolutely critical in IT. If we lose data or have downtime, that translates into lost customers and ultimately lost revenue. But, it's hard to directly quantify training and this in turn makes it difficult to request (from management) especially when it's so easy to see the costs. But, where are the returns? The returns are in more effective practices which improve performance and reliability of your servers. They're in learning automation options. They're in learning troubleshooting best practices. They're in learning design techniques that will make your database more scalable – from the start – rather than having to fight fires when your database no longer performs.

To get the best ROI, I need to get information from reputable sources and, ideally, sources that are also able to convey these technical concepts in a digestible way. And, I also need to be aware of all of my options – and what's best for each type of learning and each type of budget. I wrote this post where I discussed the pros/cons of different types of events.

And, to help you, I thought I'd give you a list of the events and options coming up this fall.

 

October 25-29, 2010: BI Immersion Event in Bellevue, WA – USA

Immersion Events are intense and focused training events taught by the best and most experienced people in the business. This BI Immersion Event is presented by Stacia Misner and offers best practices from configuration to implementation to performance for the end-to-end BI stack (SSIS, SSRS, SSAS). It's meant for all levels but offers a level-setting module to make sure that everyone's on the same page. Whether you're already working with BI in SQL Server or relatively new to BI (but have been working with SQL Server for a while), this class will offer a lot of tips and tricks to make you more productive. For more details, check out the BI Immersion Event here: http://www.sqlskills.com/BIImmersion102010.asp.

November 1-5, 2010: SQL Connections in Las Vegas, NV – USA

This conference offers not only best practices and tips/tricks about SQL Server but also boasts parallel tracks with the DevConnections conference suite. If you're working with Visual Studio, SharePoint, ASP.NET, or Silverlight you can attend sessions at the other conferences to help widen your knowledge. And, for this year's event SQL Connections is also co-located with Windows Connections and Exchange Connections. Ultimately, this means BETTER VALUE as you get 7 Conferences for the Price of One! Attendance to any one Connections show lets you attend any of the over 250 sessions at our co-located conferences for no additional charge. And, you'll find industry experts in every Connections conference. When Paul and I work to create the SQL Connections conference we look for industry experts that really shine in their areas of expertise. Additionally, these experts are able to convey this information effectively and concisely. There are also pre- and post-conference workshops that can be added to your conference experience to help add value and make the most of your time. For SQL Connections, we are offering these workshops:

November 1 (pre-conference workshop) – SPR201: Database Best Practices for the Involuntary DBA (9:00AM – 4:00PM), presented by Kimberly L. Tripp and Paul S. Randal, SQLskills.com
November 1 (pre-conference workshop) – SPR302: Day of Scripting: Plumbing The Depths of SQL Server / PowerShell Integration (9:00AM – 4:00PM), presented by Bob Beauchemin, SQLskills.com
November 5 (post-conference workshop) – SPS301: Indexing Strategies and Analysis (9:00AM – 4:00PM), presented by Kimberly L. Tripp, SQLskills.com

For more information, see DevConnections online here and SQLConnections online here. And, did I mention that it's in Las Vegas?!

November 8-12, 2010: PASS Summit in Seattle, WA – USA

This conference is completely focused on SQL Server and all aspects of it. There are numerous community sessions and speakers from around the world, giving sessions on best practices and lessons learned. It's in Seattle, which allows folks from the SQL team to attend, so this conference boasts a few special options such as networking with folks from the SQL team and the SQLCAT (Customer Advisory Team). This conference also has pre- and post-conference workshops that can be added to your conference experience to help add value and make the most of your time. For PASS, there are numerous workshops. Here, I'll list only those delivered by SQLskills team members:

November 8 (pre-conference workshop) – Database Best Practices for the Involuntary DBA (DBA-282-P), presented by Kimberly L. Tripp and Paul S. Randal, SQLskills.com
November 12 (post-conference workshop) – Indexing Strategies that work: Covering Concepts, Concerns, Costs (AD-325-P), presented by Kimberly L. Tripp, SQLskills.com

December 9-13, 2010: SQL Immersion Event in San Diego, CA – USA

Immersion Events are intense and focused training events taught by the best and most experienced people in the business. And, this SQL Immersion Event was added on the heels of two very popular immersion events offered earlier in the year (Boston, MA in March and Bellevue, WA in August). Both classes were very well received and had a waitlist, so we decided to add this third event for the year. We explain everything we teach to whatever depth you want to go and we bring our real-world experiences to the classroom to provide proven examples of best practices and tips/tricks. For more details, check out the SQL Immersion Event here: http://www.sqlskills.com/SQLImmersion122010.asp.

We have lots planned and we hope to see you at one or more of these upcoming events!

Cheers,
Kimberly

Women in Tech – a great post by Stacia Misner

This "comment" started out as one but I've decided to make it a post as I hope that a few people that may follow me but not Stacia – will check out her post on WIT.

The post to read is here: http://blog.datainspirations.com/2010/07/29/maybe-its-just-me-a-perspective-from-one-woman-in-it

First – Great post.

Second – Fantastic comments from folks in the SQL community (once again reminding me of why I like it so much)!

I do love the SQL community for so many reasons and this is DEFINITELY one of them. To be honest, I've never really felt singled out as a "WOMAN" in tech. I've always felt like just another person struggling to do the tech and to do it well. I've found that you must do things well to get ahead but I (luckily) haven't been in a situation where nothing but my gender stopped me from getting ahead or getting something done. To take that even further since I'm sometimes the only woman in the room where I'm teaching (big smiles) the thing I love the most is the willingness and interest for everyone to listen, contribute and grow – regardless of the fact that their instructor is a woman. I suspect that maybe I don't meet the people that don't want to hear tech from a woman as they go to someone else's sessions instead of mine but the good news is that I've often had very well attended sessions (so it truly can't be the majority of men that can't listen to a woman OR work with a woman OR deal with women in tech). And, this week my class was the norm – in terms of men/women ratios (maybe even a bit higher than normal). We had 31 students + 2 instructors – 6 students were women (Paul Randal and I were the instructors). That's only 20% and still fairly low (overall – but again, a bit higher than some) but I didn't feel like there were any issues with men/women in the class – everyone was happy to be there and sharing information (regardless of gender, ethnicity, etc.).

Having said all of that – I do think that there have been stereotypes over the years. I remember (*VERY* clearly) something that was said to me when I was in high school (I wish I didn't have to say that it was over 20 years ago but it was roughly 25 years ago when this occurred). My favorite math teacher (I was also his TA) introduced me to a mathematician who was a woman… She was definitely an inspiration and talked to me about different fields and directions in which to go – in Math. It was a nice dinner. I remember thinking that I might want to continue my studies in Math (which I did, as a Math/CompSci major in college). But, what I remember clearly is her telling me that she didn't wear makeup or paint her nails because she wouldn't be respected in the field. I've never forgotten that but I've also never lived by it. And, unless it's me (and my bad memory ;-), I really don't think it has hurt me. Maybe it's SQL. Maybe I've been sheltered – but I guess I have to say that I'm happy it's been this way as well.

Thanks for (another!) great post Stacia and some really great comments.

Once again – I'm reminded of how much I love our community!

Cheers,
kt

PS – Check out a past post about Women in Technology and some of the resources that are out there: http://www.sqlskills.com/blogs/kimberly/post/Resources-A-panel-on-HA-and-a-second-on-Women-In-Tech.aspx. Also, if you're interested in helping your daughters get more involved in technology check out:

RunAs Radio interview posted

Last week we sat down with our good friends Richard Campbell and Greg Hughes of RunAs Radio and recorded an interview about how to learn effectively. It was prompted by Paul's recent post as part of TSQL Tuesday: T-SQL Tuesday #008: Top ten mistakes to make when attending a class. Additionally, I've written some related posts on types of training here.

Here's the specific link to the show: Kim Tripp and Paul Randal Talk about Learning Effectively!

Enjoy!
kt

Just added – NEW SQL Immersion Event in Bellevue, WA – in August

It's official! We're doing another 5-day SQL Immersion Event (our ever-popular 5-day Deep Dive into Internals, Performance and Maintenance) this year in the US!

We decided to have a class near Seattle because Washington is *fabulous* in August!

The class will cover our best content in these areas:

  • On-disk structures: how the data is stored
  • Index internals: how the data is organized
  • Logging and recovery: how the data is protected
  • Choosing the RIGHT Data Type
  • Table & Index Partitioning Strategies
  • Data Access
  • Indexing Strategies
  • Data and log file provisioning and management
  • Tempdb
  • Index and statistics maintenance
  • Using backup and restore (plus internals)
  • Consistency checking and disaster recovery

The event will be held in the Marriott Courtyard Bellevue Downtown and will be fully catered – there's a special room rate of $159/night and even a hotel room discount for the first 10 people to register and stay at the event hotel.

The full cost is US $3,100.00 with an early-bird special of US $2,500.00 for registrations before midnight PST June 18th.

Check out the Immersion Event page for full details, registration, and feedback from attendees of our event in Boston earlier this year (or jump straight to registration).

We hope to see you there!
kt

(OLD): A new and improved sp_helpindex (jokingly sp_helpindex8)

NOTE: I’ve released other versions of sp_helpindex since this post. Check out the category: sp_helpindex rewrites for all of the options!

OK, I first blogged about re-writing sp_helpindex here as sp_helpindex2 (April 2008). Shortly thereafter (Aug 2008), a reader found a bug – and when I went digging, so did I (a couple of others, actually). I did a few tweaks and the version that’s stuck for the past couple of years is in this blog post: Updates (fixes) to sp_helpindex2.

However, as much as I’ve used that – I’ve always been frustrated by its output. And, so, over the months, I wrote sp_helpindex3 (this solely added the index_id as the first column and then ordered the output; I’ve always hated the weird way in which sp_helpindex/2 ordered the output). Then, I started writing sp_helpindex4, which listed the clustering key columns with each index so that I could more easily identify columns on large tables without a lot of scrolling. Ultimately, I didn’t like this because the clustering key (or some of the columns in a multi-column clustering key) might already be in the nonclustered index, so I still had to figure things out manually. And, so I finally sat down and created what’s jokingly being called sp_helpindex8.

Geeky note: Er, you might wonder with how many geeky SQL people I’ve joked about sp_helpindex8 but, in fact, in my last few events (both onsite with a couple of customers as well as when teaching the Indexing Strategies and Performance Tuning portion of the Microsoft Certified Masters – SQL Server course (two weeks ago)), I’ve actually joked – and promised – that I’d have this blogged before this particular 3-week MCM rotation ends. And, well, that’s this week. So, I’ve actually joked with quite a few folks about this and I’m close to the end of my time for blogging it. So, here I am – sitting in a hotel room in Houston, after having presented at the Houston Area User Group (7-9pm) (yes, with Paul) and after having been onsite all day with a customer – blogging this darn thing. I just have to get it out there. ;-)

Having said that – I’m pretty happy with this version but it was pretty painful to write. And, before I get into what it does – and why I wanted this version – let me explain what I didn’t do and why. There are a lot of other things that could be added. And, quite a few folks have said – why don’t you do this or why don’t you do that… Specific suggestions have been made such as adding index usage stats (from the current state shown by the DMV), index operational stats (to show if an index is suffering from latch IO waits, for example) or index physical stats (to show if an index is fragmented), etc. And while many of these suggestions would make for lots of interesting uses of this output, I’ve stayed away from them. Why? Some are expensive to run – such as physical stats – even a limited scan can be painful on a really large table. Or, the results might not be truly trustworthy – such as with index_operational_stats. Why? Because the information from that DMV might only cover a short period of time. The information held by these DMVs is cleared whenever SQL Server is restarted (or a database is taken offline, restored, etc.). So, when were those usage or operational stats last cleared? Moreover, is this single “look” at the DMV information really the best on which to base our decisions?

In the end, I think that physical stats, operational stats and even usage stats should be used wisely and carefully (definitely used – just not by this proc). In fact, we often recommend that the best way to use them is by studying them over time and analyzing the trends, not just the point-in-time values. And, there are LOTS of posts that can help you with understanding this information; here are just a couple to get you started:

So, having said that – I’ve kept my sp_helpindex8 more pure ;-). What I want to see is EXACTLY what’s stored in that index. In fact, this has always been my main goal. The first version (sp_helpindex2) added INCLUDEd columns (2005+) as well as filter definitions (2008) to the output. So, what was missing? The columns that SQL Server silently adds when a nonclustered index is nonunique and/or when the clustering key is nonunique. The reason this is so important to me is that there are many tools recommending indexes (DTA, the missing index DMVs, application tools, etc.) and while many of these tools are solely there to “help” us get better performance and “help” us create more effective indexing strategies – they sometimes recommend the same index over and over again (see Paul’s recent blog post: Missing index DMVs bug that could cost your sanity… about the bug in the missing index DMVs) OR they might recommend indexes that are very similar to ones we already have.

Additionally, the output from sp_helpindex/2 only shows the columns that are EXPLICITLY defined in the key and in the leaf (using INCLUDE) but it doesn’t show what SQL Server adds for you. Most folks know that SQL Server adds the clustering key to nonclustered indexes, and when you have a simple single-column clustering key it’s not difficult to know that this IS “there” in your index even though it doesn’t show up. But, even with that, I think there are still a few things that are lesser known. For example, if your nonclustered index is nonunique – did you know that the clustering key is not only added to the leaf level of the index (like INCLUDE does) but that it’s also added all the way up the tree (as part of the key, and therefore the ordering)? And, if you have one index that lists the clustering key explicitly and another that doesn’t – are you actually going to be able to easily tell that these are in fact the SAME? Ultimately, I want you to be able to better recognize what’s REALLY in that index. What you might end up seeing is that two indexes are more similar than you first thought and, as a result, maybe you can drop one of them. Or, you might be able to combine them to reduce your overall indexes (I call this index consolidation). And, all of this becomes significantly more challenging when you have more complex clustering keys (and, yes, they do exist!). The other thing I did was delimit the column and index names properly using brackets. This is also nice because you might have identifiers that make the output of sp_helpindex/2 hard to read. This should help. However, I do have to admit that you probably have other problems if your index/column names have commas, spaces or brackets in them. But, this should still help/work.
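To make that concrete, here's a small hypothetical example (my own table/index names). Because IX1 is nonunique, SQL Server silently promotes the clustering key into its index key at every level – so IX1 and IX2 end up physically identical even though their definitions look different:

CREATE TABLE dbo.Orders
(
    OrderID    int NOT NULL CONSTRAINT OrdersPK PRIMARY KEY CLUSTERED,  -- the clustering key
    CustomerID int NOT NULL
);
GO

CREATE NONCLUSTERED INDEX IX1 ON dbo.Orders (CustomerID);           -- OrderID added for you (key + leaf)
CREATE NONCLUSTERED INDEX IX2 ON dbo.Orders (CustomerID, OrderID);  -- OrderID listed explicitly

-- sp_helpindex shows two different definitions; sp_helpindex8 shows the same
-- physical structure for both – a good candidate for dropping one of them.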

And, that’s why I wrote “sp_helpindex8” – it’s a MUCH better version of sp_helpindex, but it requires a few other components. And, it was a lot more challenging to write. Basically, this information is more internal to SQL Server and not directly exposed anywhere, so we had to do a bit of digging. Paul started the process (thanks!) but I ended up spending about 5 hours getting it exactly the way I wanted it. And, I also ended up breaking it up into two parts: one sp that gets the column definitions and another that builds the final output. And, I’ve actually named it with a longer name (sorry!). But, I rarely type in the name anyway. What I typically do is go to SSMS, Tools | Options, then under Environment | Keyboard, I set the keyboard shortcut for Ctrl+F1 to my sp_SQLskills_SQL2008_helpindex.


Now, when I want to see the true definitions of indexes on a table – I highlight the table name and hit Ctrl+F1. But, if you want to rename this to sp_helpindex8 – I won’t be offended. ;-)

So, how do you create this?

  1. You need to create sp_SQLskills_ExposeColsInIndexLevels (sp_SQLskills_ExposeColsInIndexLevels.sql (6.73 kb)). This is what gives us the tree/leaf definitions. And, this works for both SQL Server 2005 and SQL Server 2008.
  2. You need to create the version-specific sp_helpindex8. These are named: sp_SQLskills_SQL2005_helpindex (sp_SQLskills_SQL2005_helpindex.sql (10.50 kb)) and sp_SQLskills_SQL2008_helpindex (sp_SQLskills_SQL2008_helpindex.sql (10.73 kb)). Again, I know they’re long names but using the keyboard shortcuts can help. And, if you do rename these, make sure you do a replace ALL. There are multiple places where there are comments and/or statements that need to execute to get everything to work (for example, the sp_MS_marksystemobject execution – see the usage sketch below).
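Once they're created, here's a hedged sketch of the wiring-up. The downloaded scripts already handle the system-object marking for you – this just shows what that step and a typical call look like (the table name is just an example):

USE master;
GO
-- mark the proc as a system object so it runs in the context of the current database
EXEC sp_MS_marksystemobject N'sp_SQLskills_SQL2008_helpindex';
GO

-- then, from any user database:
EXEC sp_SQLskills_SQL2008_helpindex N'dbo.Sales';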

NOTE: The very first version of the 2005 script had a bug in it (thanks Calvin!). Funnily, it was one I already knew about and thought I had caught (it was fixed in the 2008 version). Anyway, as of 3:15pm CT on May 19, it's fixed in the scripts here. There could be others but so far – it's looking good!

Enjoy!
kt