SSMU Web Seminar: Indexing for Performance
Part 1: Index Overview and Internals

Kimberly L. Tripp

 

Q & A

 

Q: Should the volume of data (rows) influence the decision on choice of a heap?

This is a great question – and one I often miss… I often spend a lot of time talking about large tables and minimizing fragmentation, etc. yet people often wonder what about the little tables? While the size of the table does play a large role in the index choices you must remember that it’s important to index EVERY table. Even the small tables benefit from indexes – especially in the area of locking. With indexes SQL Server can lookup and lock at a smaller granularity! Also, don’t forget that indexes are the only way to ensure uniqueness. So – would I choose a heap for smaller tables…probably not! In fact, I generally only recommend heaps for bulk loading.

 

Q: Can you talk about rule of thumb regarding clustering on non key columns, or AK?

Here I tend to think about uniqueness and whether or not it’s static… If the column(s) are unique (which a key would be!) and the data is static then that’s a good start. The negative is that it may not be narrow? Well, if it’s narrow as well then you’ve got some of the primary factors covered! The only last item would be – is the key ever increasing? If not then this key may cause unnecessary fragmentation (i.e. clustering on Social Security Number). So – for larger tables, I’d probably avoid this. For smaller tables and/or systems where availability is not as critical – management issues (rebuilding/defraging) won’t be as difficult (smaller tables rebuild/defrag faster).

 

Q: How about using a select top n instead of a select *?  Does this improve performance?

When talking about “minimizing data” returned to the client – almost every option is a good one! Using TOP can be especially important when users have the habit of running a select with an order by and then canceling the query after they get the data they’re looking for… This causes the server to process the entire set, start delivering the data and then the client says…Stop – that’s all I need. The server has already done the work and had optimized to get to the last row quickly – you didn’t want ALL rows you only wanted the first 10 (or whatever TOP set). If you want the TOP n then asking for it causes SQL Server to optimize for that set! So, YES – this should help improve performance as well!! Also, if you’re really lazy (like me!) then check out my script (ColumnList.sql) to help you simply pull back the list of columns from a table – to cut and paste and then remove the columns you’re not interested in. This makes it easier to write your code when you only want 11 out of the 47 columns your table has! J

 

Q: Are you suggesting an extra identity column in place of a composite pk?

The choice of the primary key is more a logical database design choice and that’s really up to you (although if it is wide then all of your foreign key tables might be unnecessarily wide)… My biggest concern is the clustering key! Although they are often though as one and the same they really don’t have to be. Having said that – when the table is large and management issues (rebuilding/defraging) are difficult then I will consider adding an arbitrary [identity] column solely for clustering!

 

Q: What about ITW? Any way to use it with cross-db queries?

Ah – this question was a bit different in my view! I gave a few recommendations on how to use ITW – and I’ll give them here for completeness however, it is a bummer! It doesn’t work on cross database queries and this is a tough one. However, over the next couple of sessions I’ll give you a few ways to pull apart queries and break them down so that you can manually index them – with ease! If you want to use ITW though here are my general recommendations:

Choose your Clustering Key

Add you constraints: Primary Key (maybe NC?) and Unique Key constraints

Manually index your foreign key columns

Capture a workload (get a realistic set of queries, make sure that their frequencies are realistic as well…if everyone uses stored procedures then you might want to extract some of the queries, hard-code the parameters and then run a few of these as “sample” queries)

Run ITW

Since ITW won’t get everything…then you’ll still need to manually index a few things – stay tuned to the next few web seminars and I’ll help you figure out some of these strategies!

Resources:

Whitepaper: Index Tuning Wizard for Microsoft SQL Server 2000 http://msdn.microsoft.com/library/en-us/dnsql2k/html/itwforsql.asp?frame=true

 

Here’s the general link for all technical resources on msdn: http://msdn.microsoft.com/library/default.asp?url=/nhp/Default.asp?contentid=28000409

 

Webcast: SQL Server 2000 Profiler: What’s new and how to effectively use Profiler!

http://support.microsoft.com/default.aspx?scid=/servicedesks/webcasts/wc111400/wcblurb111400.asp

 

Here’s the general link for all webcasts from Microsoft:

http://support.microsoft.com/default.aspx?PR=pwebcst&FR=0&SD=MSDN&LN=EN-US&CT=SD&SE=NONA

 

Q: What about allowing SQL Server to name index v. doing it yourself, systematically – recommendation?

The only type of indexes that SQL Server doesn’t require a name for are constraint based indexes. Primary Key and Unique Key constraints can be specified with either of the following syntax:

-- Named Constraints

ALTER TABLE Employee

     ADD CONSTRAINT EmployeePK

          PRIMARY KEY CLUSTERED (EmployeeID)

 

ALTER TABLE Employee

     ADD CONSTRAINT EmployeeSSNUK

          UNIQUE NONCLUSTERED (SSN)

 

-- Unnamed constraints

ALTER TABLE Employee

     ADD PRIMARY KEY CLUSTERED (EmployeeID)

 

ALTER TABLE Employee

     ADD UNIQUE NONCLUSTERED (SSN)

go

 

sp_helpconstraint Employee

 

In a quick example I executed the two bolded statements from above (i.e. the named primary key constraint and the unnamed unique key constraint). Here are the two names:

EmployeePK

UQ__Employee__114A936A 

The reason why these are so “ugly” is because constraint names must be unique – database wide! So – to do this SQL Server adds the random hex number to the end.

 

Q: Primary key for partitioned tables?

This is a tough one. Quite a few people choose to create the PK as a uniqueidentifier when their tables are partitioned… However the GUID doesn’t necessarily make for a good clustered index – it’s wide[r] and it’s not monotonically increasing. More than anything is depending on your partitioning scheme but you might consider a surrogate key added solely for clustering and then have the PK as non-clustered.

 

Q: Would you recommend building the clustered index after a bulk load?

Yes, definitely! Try some tests… I’ve found substantial gains in loading data into a heap first and then creating indexes! The only negative is that the build of the clustered requires space to build… So definitely do some tests but I think you’ll be very surprised at the gains with creating the indexes AFTER the bulk load!

 

Q: Recommendation on auto statistics and the background indexes it creates?

In SQL Server 7.0 – not always but MOST of the time. In SQL Server 2000 – YES! The way the SQL Server 2000 manages the updating of statistics has been significantly improved!

Check out the whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2000  http://msdn.microsoft.com/library/en-us/dnsql2k/html/statquery.asp?frame=true

 

Q: Why does it take as long to drop a clustered index if you only have to drop the b-tree?

In earlier releases (pre-7.0), dropping a clustered index was instantaneous because you’re right – all they needed to do was drop the b-tree…simple! However, in SQL Server 7.0+ non-clustered indexes depend on whether or not the table is clustered… So when you drop a clustered index all of the non-clustered indexes must be rebuilt.

 

Q: Do we need an Index on a temporary table that we will only scan once?

Tough one! Hard to say as it depends on the access (is there an order by or group by)…but if you’re truly only doing a table scan on the temp table – after you build it then a heap might be just fine. I’d probably end up doing a bit of testing with this one!

 

Q: What about clustering on values that aren't strictly monotonically increasing but have a general tendency to increase (for example, a date of an event's occurrence)?

Yes, it’s OK to create the clustered index on a column which is ever increasing – that is not an identity column BUT you should make sure that it is unique. Unfortunately using the datetime datatype might not be helpful here because of how SQL Server stores datatime data – it’s very UNlikely to be unique. You will get the ever increasing benefits but you add some overhead with “uniqification” of the duplicate datetime values.

Check out the short sidebar on datetime precision for more details! http://www.sqlmag.com/Articles/Index.cfm?ArticleID=25174&

 

Q: What about the advantages of using a wider clustering key to cause more NC indexes to be covered?

It’s a good point but are those columns (that you’re covering) requested in every query? You might be making your nonclustered indexes unnecessarily wide when only a few queries really need all of those columns… not to mention – it might not be static, unique or ever-increasing – so you lose those benefits!

 

Q: How about indexing related to constraints (I've been a bit confused about entries under Object Browser's Constraint v. Indexes nodes)

Yes, I would agree that it’s not very obvious what’s going on here… A constraint is more a logical concept from relational theory – an index is a physical structure. However, SQL Server enforces some of the relational requirements of a constraint (i.e. uniqueness) by creating indexes. This seems confusing but it can be boiled down to this – if you have Primary Key and Unique Key constraints then you will have indexes with exactly the same names created (to enforce the uniqueness requirement). If you manually create indexes – then these will ONLY be listed under the index tab!

 

Thanks for the great questions everyone!

 

See you next week,

Kimberly

 

Kimberly L. Tripp

President, SYSolutions, Inc. www.SQLSkills.com

Principal Mentor, Solid Quality Learning www.SolidQualityLearning.com