SSMU Web Seminar: Performance Tuning Series
Part II: Optimizing Query Performance
Q & A
Q: Is there a reason that the query optimizer doesn't manipulate queries to isolate the column name for you?
AND
Q: So, just to confirm, regarding SARGs like LEFT(MyField, 5): will that always scan? Is there no way to exploit indices?
Both of these questions are similar enough that I will take one answer for both… This is a historical issue: EVERY release of SQL Server (and even a few competing products) has behaved this way. The primary reason is that NOT all queries can be easily rewritten with a guarantee of producing the same results. Yes, the rewrite is fairly obvious for certain types of operations – especially mathematical ones – but the server takes the "safe" route at the expense of performance. At least that's how I've always understood it; SQL Server has always done this…
If you have ANY computation on the column, then SQL Server will absolutely force a scan. I took the specific case listed, LEFT(LastName, 5) = 'Tripp', and compared it to LastName LIKE 'Tripp%', and the performance difference was staggering! Here is the result:
SELECT * FROM Member
WHERE LEFT(LastName, 5) = 'Tripp'
-- Causes a TABLE SCAN
-- Performs 143 I/Os (small table, luckily :))
-- Showplan estimate is 92.71% of the overall cost between these two queries
SELECT * FROM Member
WHERE LastName LIKE 'Tripp%'
-- Uses an Index
-- Performs 9 I/Os
-- Showplan estimate is 7.29% of the overall cost between these two queries
Simple rule – try rewriting SARGs, and get users to understand some of these benefits!
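To make the rewriting idea concrete for the mathematical case mentioned above, here is a small sketch using a hypothetical Orders table with an Amount column (neither appears in the seminar). Both predicates return the same rows, but only the second isolates the column so an index on Amount can be used:
SELECT * FROM Orders WHERE Amount * 1.1 > 100
-- Computation on the column forces a scan, even if Amount is indexed
SELECT * FROM Orders WHERE Amount > 100 / 1.1
-- Equivalent predicate with the column isolated can use an index seek
-- (a safe rewrite here because the multiplier, 1.1, is positive)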
Q: Does it matter if you isolate the column on the left or the right?
No.
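For example, both of these equivalent forms are seekable – what matters is that there is no computation on the column, not which side of the comparison it sits on:
SELECT * FROM Member WHERE LastName >= 'Tripp'
SELECT * FROM Member WHERE 'Tripp' <= LastName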
Q: If "*" is slower, would selecting a
constant (like "1") work even better? To clarify, in an EXISTS subquery, would selecting a constant value (like "1")
work better than selecting "*"?
I've had this question before! So, I'll post my previous question and answer. These are from the Indexing Session's Q&A:
What about using "*" in this case: IF EXISTS (SELECT * FROM Users WHERE UserID = X)?
Another very common question, and it also relates to some "folklore" that seems to be going around… An existence test does NOT return columns. Because of this, the * (or a constant, for that matter) is irrelevant. In the ANSI standard, the definition of an existence test is that it is introduced with * – this is the preferred syntax and does NOT cause any performance problems. There is no benefit to using a constant.
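As a quick sketch (the Users table comes from the question; a literal value stands in for X), both of these produce the same plan, because the existence test never returns columns:
IF EXISTS (SELECT * FROM Users WHERE UserID = 1)
    PRINT 'found'
IF EXISTS (SELECT 1 FROM Users WHERE UserID = 1)
    PRINT 'found'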
What about using * in COUNT(*)?
Ah – this used to be a big one… People would often put in a column, and SQL Server would be forced to read that data – even when the column did not allow NULL values. However, SQL Server 2000 knows that COUNT(*), or COUNT(column) where the column does not allow NULLs, is simply a row count. If you need a row count, it knows where to find one most cheaply – in a leaf scan of the smallest nonclustered index. So – yes, * is just fine here and, in fact, it is preferred.
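One caveat worth showing (MiddleInitial is a hypothetical nullable column on the Member table): COUNT(*) and COUNT(column) are only interchangeable when the column disallows NULLs, because COUNT(column) skips NULL values:
SELECT COUNT(*) FROM Member
-- Counts all rows
SELECT COUNT(MiddleInitial) FROM Member
-- Counts only rows where MiddleInitial IS NOT NULL;
-- a different answer if the column allows NULLs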
OK – you guys made some good points… * is not ALWAYS evil. In these cases it is just fine. I would also add that it is appropriate when you are selecting from something that already limits the column list (e.g., a view, stored procedure, or table-valued function). I'm sure we could probably come up with a few more… Just remember that less is more when it comes to optimization!
Q: So, in general, you weigh cost (execution plan-wise) more than I/O?
This is a great question! Yes, I usually do – and so does SQL Server… SQL Server uses a "cost-based" optimizer. The primary goal is for the "overall" cost of a query's plan to be as low as possible, which means it has to account for I/O, memory, etc. What may appear to have fewer I/Os may be more costly in terms of temporary resources, etc. The showplan tells you the "overall" estimated cost – this is a better view of the whole picture – but more than anything you need to look at all of the numbers. When I/Os are less than a few thousand apart, the difference is probably not significant, as a few thousand logical I/Os can be performed in milliseconds.
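To look at both numbers for a query you're tuning, a minimal sketch (reusing the Member query from above; note that SET SHOWPLAN_ALL must be the only statement in its batch, and suppresses execution while ON):
-- Actual I/Os: run the query and report logical/physical reads per table
SET STATISTICS IO ON
SELECT * FROM Member WHERE LastName LIKE 'Tripp%'
SET STATISTICS IO OFF
GO
-- Estimated overall cost: return the plan WITHOUT executing the query
SET SHOWPLAN_ALL ON
GO
SELECT * FROM Member WHERE LastName LIKE 'Tripp%'
GO
SET SHOWPLAN_ALL OFF
GO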
Q: What columns should I add when looking for scans in Profiler? Can I filter for specific tables?
Well – you'd think you could just add "ObjectName" as a data column and all would be well… but it's not. Some "events" don't produce values for every data column. Within Books Online there are topics for the Profiler event types, and within each one they list the data columns for which values are provided. Here are a couple of BOL links: mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\adminsql.chm::/ad_mon_perf_8h2x.htm. Specifically, the section is titled "Scans Event Category".
To see a list of all of the categories, with links to their data column lists, etc., check out the topic titled "Monitoring with SQL Profiler Event Categories": mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80
Here are the KEY things I would do to get better information about SCANs:
- Make sure you use FILTERS – one that you REALLY need if you use Profiler to look at scans is "Exclude system IDs" (the checkbox on the Filters tab). This will SIGNIFICANTLY reduce the excessive information produced.
- As for data columns – you must add ObjectID, NOT ObjectName. Unfortunately, using filters here is a bit harder, BUT you can use either a system table OR a system function to determine your tables' IDs. The system table is sysobjects and the function is called object_id. Here are two sample queries:
SELECT Name, ID FROM sysobjects WHERE type = 'U'
-- or, to get just the ID for a specific table
SELECT object_id('member')
- Finally, make sure you set a filter for the specific database you want to profile.
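To gather the filter values in one spot, a small sketch (the member table comes from the samples above; DB_ID is the standard system function for the current database's ID, which matches Profiler's DatabaseID data column):
SELECT object_id('member') AS TableID
-- Value to use in the ObjectID filter
SELECT DB_ID() AS DatabaseID
-- Value to use in the database filter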
Q: We have 5,000 to 8,000 I/Os per second at peak. We're unsure how much of that is from INSERTs. These inserts are into a heap.
Not sure where you're going with this one, other than that you're trying to decide whether you would be better off clustering instead of using a heap. In this case I'd need more info to really decide – mainly because you might be exceeding the 400 inserts/sec rule here. BUT – typically the performance of inserts can be better with a clustered index than with a heap (with the exception of bulk loading into an empty heap – it is better to load into an empty table and create the indexes after the load rather than load into an already-indexed table). So – in your specific case I would try to set up a reproducible test environment and compare insert performance against both designs.
But I'll stress that looking at inserts is only ONE piece of the puzzle – this table has other issues. For example, scans are likely to be VERY expensive if there are updates and, therefore, forwarding pointers. You might want to find out how many pages the table has and then perform a query that does a scan – to see how many forwarding pointers you have.
To determine pages:
EXEC sp_spaceused tname, true
Take the data column (reported in KB) and divide the number by 8 to determine pages.
Then execute:
SET STATISTICS IO ON
SELECT COUNT(*) FROM tname WITH (INDEX (0))
If the number of pages scanned by the SELECT is higher than the number of pages calculated from the sp_spaceused data value divided by 8, then you have forwarding pointers – and possibly lots of them. Each one adds unnecessary overhead in a scan. There are many pros and cons here – again, this is just a couple of them, but a clustered table wouldn't have these.
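As a worked example with made-up numbers: if sp_spaceused reports data = 8,000 KB, that is 8,000 / 8 = 1,000 pages; if the COUNT(*) scan then reports 1,400 logical reads, roughly 400 of those reads are extra hops through forwarding pointers.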
Finally, if you want to determine the estimated number of I/Os needed for each insert – use STATISTICS IO in Query Analyzer and run the same insert process while capturing I/Os.
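A minimal sketch of that measurement (the column name and value are hypothetical):
SET STATISTICS IO ON
INSERT tname (SomeColumn) VALUES ('test row')
-- The logical reads reported for this statement approximate the per-insert
-- I/O cost; repeat with a representative batch for a better picture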
Q: When I see a Clustered Index Scan, I have NO IDEA how to coerce it into a Seek instead.
The easiest way to start is to highlight the query and run it through the Index Tuning Wizard. This will work very well on most queries. When ITW doesn't give any advice and you still want to try moving forward – implementing indexes manually – then you'll need to evaluate quite a bit about a query to get the "right" index(es) and the right balance of indexing pros against maintenance and overhead cons. There's a lot of info needed to fully answer this, but it's a good recommendation to consider the online web seminar series on Indexing: Internals, Performance and Maintenance – as this is what I talked about for 4 sessions! Here are the links:
Indexing for Performance, Part I: Index Overview and Internals http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=18
Indexing for Performance, Part II: SARGs and Joins http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=19
Indexing for Performance, Part III: Aggregates and Indexed Views http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=20
Indexing for Performance, Part IV: Index Maintenance http://www.sqlmag.com/SSMU/Seminars/Index.cfm?Event_ID=21
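As a quick illustration of the manual route (a hedged sketch – the index name is made up, and it assumes the Member table from the earlier examples): a nonclustered index on the SARG column is the typical way to turn a clustered index scan into a seek:
-- Before the index exists, this filter produces a clustered index scan
SELECT * FROM Member WHERE LastName = 'Tripp'
-- A nonclustered index on the filtered column typically converts it to a seek
-- (when the predicate is selective enough to beat the bookmark-lookup cost)
CREATE NONCLUSTERED INDEX ix_Member_LastName ON Member (LastName)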
Q: I'm getting a lot of institutional resistance to views. Can views negatively impact performance?
This is becoming a frequently asked question! I'm hearing a lot of comments like this these days, and I have a feeling it has to do with rumors that are FLYING all over the place saying that SQL Server creates temp tables for views when they are used within other queries… People also say that views add unnecessary overhead. Ugh! There are cases where views used improperly might cause unnecessary work to be performed, but generally a view does not add any additional cost versus the same query executing directly. Views are not usually the problem – it's the usage of the views that can be problematic. In fact, I was hearing this so much that I ended up writing a FREE TSQLSolutions article specifically detailing this problem. All you need to do is register to read these… I think this will help convince you that views are not inherently evil. :) It's Part II that speaks more to the performance issues.
Part 1 - The View from Here, December 2002, InstantDoc #27229
http://www.sqlmag.com/Articles/Index.cfm?ArticleID=27229
Part 2 - Viewer Advisory, February 2003, InstantDoc #37660
http://www.sqlmag.com/Articles/Index.cfm?ArticleID=37660
And one other thing I'll add to this: OFTEN, views can IMPROVE performance – by focusing users on JUST the data they need, by providing a well-written join to handle complex schemas (rather than users struggling and/or executing poorly written code against the server), and by simplifying permissions. I'm a fan of views – as long as they're used properly!
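A tiny sketch of that point (the view and column names are hypothetical): the optimizer expands a view inline, so the narrow view below costs the same as the equivalent SELECT while keeping users focused on just the columns they need:
CREATE VIEW MemberContact
AS
SELECT MemberID, LastName, Phone FROM Member
GO
-- Costs the same as running the underlying SELECT directly
SELECT * FROM MemberContact WHERE LastName LIKE 'Tripp%'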
Q: Where is the best place to learn about the individual elements of the showplan, for instance, Hash?
AND
Q: Where can we learn EXACTLY what all the various icons mean in the Graphical Showplan? They're explained, sort of, in BOL – it's helpful, but the info is shallow. I want MORE. Indeed, I want it ALL!
Another set of questions I can lump into one answer, BUT the bad news is that I'm not aware of any single reference that covers all of these. There are quite a few KB articles that do a great job of focusing you on specific elements and explaining them really well, and I'd certainly start with those and scan through them (start with KB
Also, books like
Q: If I have a table with a few indices, but sysindexes reveals that there are, say, a half-dozen columns with auto-statistically generated indices – does that mean I could benefit from more indices?
Q: How can we get last week's resource list?
From last week's Q&A and/or last week's PowerPoint slides. You should have received the information via email.
Resources:
Whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 http://msdn.microsoft.com/library/en-us/dnsql2k/html/statquery.asp?frame=true
Search SQL Server Magazine’s website http://www.sqlmag.com
Wow – lots of questions this week! Thanks for the great questions, everyone!!
See you in Part III next week,
Kimberly
President, SYSolutions, Inc. www.SQLSkills.com
Principal Mentor, Solid Quality Learning www.SolidQualityLearning.com