SSMU Web Seminar: Indexing
for Performance
Part 1: Index Overview and Internals
Q & A
Q: Should the volume of data (rows) influence the
decision on choice of a heap?
This is a great
question – and one I often miss… I often spend a lot of time talking about
large tables and minimizing fragmentation, etc. yet people often wonder what
about the little tables? While the size of the table does play a large role in
the index choices you must remember that it’s important to index EVERY table. Even
the small tables benefit from indexes – especially in the area of locking. With
indexes SQL Server can lookup and lock at a smaller granularity! Also, don’t
forget that indexes are the only way to ensure uniqueness. So – would I choose
a heap for smaller tables…probably not! In fact, I generally only recommend
heaps for bulk loading.
Q: Can you talk about rule of thumb regarding clustering
on non key columns, or AK?
Here I tend to think
about uniqueness and whether or not it’s static… If the column(s) are unique
(which a key would be!) and the data is static then that’s a good start. The
negative is that it may not be narrow? Well, if it’s narrow as well then you’ve
got some of the primary factors covered! The only last item would be – is the
key ever increasing? If not then this key may cause unnecessary fragmentation
(i.e. clustering on Social Security Number). So – for larger tables, I’d
probably avoid this. For smaller tables and/or systems where availability is
not as critical – management issues (rebuilding/defraging)
won’t be as difficult (smaller tables rebuild/defrag faster).
Q: How about using a select top n instead of a select
*? Does this improve performance?
When talking about
“minimizing data” returned to the client – almost every option is a good one!
Using TOP can be especially important when users have the habit of running a
select with an order by and then canceling the query after they get the data
they’re looking for… This causes the server to process the entire set, start
delivering the data and then the client says…Stop – that’s all I need. The
server has already done the work and had optimized to get to the last row
quickly – you didn’t want ALL rows you only wanted the first 10 (or whatever
TOP set). If you want the TOP n then asking for it causes SQL Server to
optimize for that set! So, YES – this should help improve performance as well!!
Also, if you’re really lazy (like me!) then check out my script (ColumnList.sql) to help you simply pull back the list of
columns from a table – to cut and paste and then remove the columns you’re not
interested in. This makes it easier to write your code when you only want 11
out of the 47 columns your table has! J
Q: Are you suggesting an extra identity column in
place of a composite pk?
The choice of the primary
key is more a logical database design choice and that’s really up to you (although
if it is wide then all of your foreign key tables might be unnecessarily wide)…
My biggest concern is the clustering key! Although they are often though as one
and the same they really don’t have to be. Having said that – when the table is
large and management issues (rebuilding/defraging)
are difficult then I will consider adding an arbitrary [identity] column solely
for clustering!
Q: What about ITW? Any way to use it
with cross-db queries?
Ah – this question was
a bit different in my view! I gave a few recommendations on how to use ITW –
and I’ll give them here for completeness however, it is a bummer! It doesn’t
work on cross database queries and this is a tough one. However, over the next
couple of sessions I’ll give you a few ways to pull apart queries and break
them down so that you can manually index them – with ease! If you want to use
ITW though here are my general recommendations:
Choose your Clustering Key
Add you constraints: Primary Key (maybe NC?) and Unique Key constraints
Manually index your foreign key columns
Capture a workload (get a realistic set of queries, make sure that their frequencies are realistic as well…if everyone uses stored procedures then you might want to extract some of the queries, hard-code the parameters and then run a few of these as “sample” queries)
Run ITW
Since ITW won’t get everything…then you’ll still need to manually index a few things – stay tuned to the next few web seminars and I’ll help you figure out some of these strategies!
Resources:
Whitepaper: Index Tuning Wizard for Microsoft SQL Server 2000 http://msdn.microsoft.com/library/en-us/dnsql2k/html/itwforsql.asp?frame=true
Here’s the general link for all technical resources on msdn: http://msdn.microsoft.com/library/default.asp?url=/nhp/Default.asp?contentid=28000409
Webcast: SQL Server 2000 Profiler: What’s new and how to effectively use
Profiler!
http://support.microsoft.com/default.aspx?scid=/servicedesks/webcasts/wc111400/wcblurb111400.asp
Here’s the general link for all webcasts from Microsoft:
http://support.microsoft.com/default.aspx?PR=pwebcst&FR=0&SD=MSDN&LN=EN-US&CT=SD&SE=NONA
Q: What about allowing SQL Server to name index v.
doing it yourself, systematically – recommendation?
The only type of
indexes that SQL Server doesn’t require a name for are constraint based
indexes. Primary Key and Unique Key constraints can be specified with either of
the following syntax:
--
Named Constraints
ALTER TABLE Employee
ADD CONSTRAINT EmployeePK
PRIMARY KEY CLUSTERED (EmployeeID)
ALTER
TABLE Employee
ADD CONSTRAINT EmployeeSSNUK
UNIQUE NONCLUSTERED (SSN)
--
Unnamed constraints
ALTER
TABLE Employee
ADD PRIMARY KEY CLUSTERED (EmployeeID)
ALTER TABLE Employee
ADD UNIQUE NONCLUSTERED (SSN)
go
sp_helpconstraint Employee
In a quick example I
executed the two bolded statements from above (i.e. the named primary key
constraint and the unnamed unique key constraint). Here are the two names:
EmployeePK
UQ__Employee__114A936A
The reason why these
are so “ugly” is because constraint names must be unique – database wide! So –
to do this SQL Server adds the random hex number to the end.
Q: Primary key for partitioned tables?
This is a tough one.
Quite a few people choose to create the PK as a uniqueidentifier
when their tables are partitioned… However the GUID doesn’t necessarily make
for a good clustered index – it’s wide[r] and it’s not monotonically
increasing. More than anything is depending on your partitioning scheme but you
might consider a surrogate key added solely for clustering and then have the PK
as non-clustered.
Q: Would you recommend building the clustered index
after a bulk load?
Yes, definitely! Try
some tests… I’ve found substantial gains in loading data into a heap first and
then creating indexes! The only negative is that the build of the clustered
requires space to build… So definitely do some tests but I think you’ll be very
surprised at the gains with creating the indexes AFTER the bulk load!
Q: Recommendation on auto statistics and the
background indexes it creates?
In
SQL Server 7.0 – not always but MOST of the time. In SQL Server 2000 – YES! The way the SQL
Server 2000 manages the updating of statistics has been significantly improved!
Check out the whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 http://msdn.microsoft.com/library/en-us/dnsql2k/html/statquery.asp?frame=true
Q: Why does it take as long to drop a clustered index
if you only have to drop the b-tree?
In earlier releases
(pre-7.0), dropping a clustered index was instantaneous because you’re right –
all they needed to do was drop the b-tree…simple! However, in SQL Server 7.0+ non-clustered
indexes depend on whether or not the table is clustered… So when you drop a
clustered index all of the non-clustered indexes must be rebuilt.
Q: Do we need an Index on a temporary table that we
will only scan once?
Tough one! Hard to say
as it depends on the access (is there an order by or group by)…but if you’re
truly only doing a table scan on the temp table – after you build it then a
heap might be just fine. I’d probably end up doing a bit of testing with this one!
Q: What about clustering on values that aren't strictly monotonically increasing but have a general tendency
to increase (for example, a date of an event's occurrence)?
Yes, it’s OK to create the clustered index on a column which is ever increasing – that is not an identity column BUT you should make sure that it is unique. Unfortunately using the datetime datatype might not be helpful here because of how SQL Server stores datatime data – it’s very UNlikely to be unique. You will get the ever increasing benefits but you add some overhead with “uniqification” of the duplicate datetime values.
Check out the short sidebar on datetime precision for more details! http://www.sqlmag.com/Articles/Index.cfm?ArticleID=25174&
Q: What about the advantages of using a wider
clustering key to cause more NC indexes to be covered?
It’s a good point but
are those columns (that you’re covering) requested in every
query? You might be making your nonclustered indexes unnecessarily wide when
only a few queries really need all of those columns… not to mention – it might
not be static, unique or ever-increasing – so you lose those benefits!
Q: How about indexing related to constraints (I've
been a bit confused about entries under Object Browser's Constraint v. Indexes
nodes)
Yes, I would agree
that it’s not very obvious what’s going on here… A constraint is more a logical
concept from relational theory – an index is a physical structure. However, SQL
Server enforces some of the relational requirements of a constraint (i.e.
uniqueness) by creating indexes. This seems confusing but it can be boiled down
to this – if you have Primary Key and Unique Key constraints then you will have
indexes with exactly the same names created (to enforce the uniqueness
requirement). If you manually create indexes – then these will ONLY be listed
under the index tab!
Thanks for the great
questions everyone!
See you next week,
Kimberly
President, SYSolutions, Inc. www.SQLSkills.com
Principal Mentor, Solid Quality Learning www.SolidQualityLearning.com