A couple of week ago I kicked off a survey about common causes of performance problems - see here for the survey.

Firstly I asked what was the root cause for the most recent performance problems you looked at - here are the results:

 

Secondly I asked what you think the overall most common cause of performance problems is - here are the results:

 

The results are not surprising, but it's good to have empirical confirmation from the community - T-SQL code and then poor indexing strategy are the top two in both surveys, accounting for roughly 50% of all performance issues.

No matter how powerful the hardware and how much memory or IOPS capacity you have, if you write crappy code that means SQL Server has to use really non-optimal query plans then performance is going to be poor. It's really quite a low proportion of the time that hardware (including the I/O subsystem) is the root-cause of a performance problem - usually it's just a symptom of a deeper problem.

One of the most common causes of poorly-performing T-SQL code and indexing strategy is that developers don't write and test the code in an environment that simulates a true production environment and workload. What works for 2 connections with 100 rows in a table isn't necessarily going to work for 1000 concurrent connections and millions of rows in a table. Another cause is making code changes directly in production without testing their impact.

Nothing much else to say here as the theme has been hit on by many people in the past, but now we have some numbers.

Thanks to all those who responded!

About a month ago I kicked off a survey asking what you look for when first analyzing a plan for a poorly performing query. You can see the original survey here.

Here are the survey results:

 

The "Other" values are as follows:

  • 13 x "Most expensive as percentage of total cost of batch"
  • 7 x "It depends on what I am trying to fix/improve"
  • 4 x "Fat arrows"
  • 3 x "Operator cost"

First of all - what would my answer be? I tend to look for scans and the most expensive operators, plus big fat arrows that stand out.

There's a clear winner here in opinion amongst my readers - by far the most common first consideration when analyzing a query plan is to look for scans - I'm not surprised by that.

In this editorial I'm not going to write about how to go about analyzing query plans - there's so much to cover that I'd end up writing a book-length blog post. Instead I'd like to point you at a bunch of resources that will help you with query tuning, including a few books.

I use SQL Sentry's fabulous free Plan Explorer tool to do this as I can switch back/forth between showing costs by CPU, by IO or combined. I can also see cumulative costs for a branch of the query plan, rather than having to look through a bunch of operators for expensive ones - this is invaluable to me day-to-day when working on client systems. It has a host of other features that SSMS does not have and I know lots of people who use it - why wouldn't you?

If you've never look at a query plan before, I strongly recommend Grant Fritchey's free e-book Dissecting SQL Server Execution Plans. I'd also recommend Grant's regular book SQL Server 2008 Performance Tuning Distilled.

As far as the choices I gave in the survey, each of them can be a major problem but aren't necessarily a problem at all. It's like the misconception that if you have wait stats, then you must have a performance problem. See my long blog post on wait stats for more info.

Here is some specific info that will help you understand the ramifications of each problem in a plan that I listed in the survey:

  • Different row counts or executions between the estimated and actual plans usually indicates that either statistics are out of date leading to a bad plan, or maybe a plan was cached for a stored proc based on atypical parameters. Checkout the following links:
  • Sorts can sometimes be caused by unneeded ORDER BY statement or by missing nonclustered indexes. A good, quick overview of sorts is in Showplan Operator of the Week - SORT (Fabiano Amorim)
  • Joins are very often misunderstood and there are a whole host of reasons why one join may be chosen over another. Best thing to do here is point you at Craig Freedman's excellent series that I link to here.
  • There are all kinds of reasons why scans appear in query plans, not all bad at all. One of the bad reasons is that there are insufficient nonclustered indexes which mean the table has to be scanned to retrieve the data - see the books and search on Google/Bing for an enormous amount written about this. Another reason is T-SQL code written so that an index *cannot* be used because an expression does not isolate the table column correctly, forcing a scan. I blogged about this here.
  • Yet another reason why scans occurs is code written/schema designed so that an operation called an implicit conversion occurs, where the table column must be converted to a different data type before a comparison can take place - forcing a scan, as each value has to be converted to the comparison type. Jonathan blogged about this here.
  • Key/RID lookups are where the query plan uses a nonclustered index to find a value, but the nonclustered index is not covering, so the other result set columns must be retrieved from the table row. When a Key Lookup occurs, the retrieval is from a clustered index and when a RID Lookup occurs the retrieval is from a heap. Both of these are undesirable because of the extra processing required. The fix for this is simply to ensure the correct indexes exist to support the queries, and that the queries are pulling in the correct columns - many times I've seen client code that pulls in columns that aren't necessary.
  • The most expensive operators in a plan are usually a good place to look to see where gains can be made by changing code, statistics, or indexing. I blogged a short post on using SET STATISTICS to watch IO and CPU costs here.
  • Parallelism is again often misunderstood and not necessarily a bad thing. For a large report query in a data warehouse, parallelism is good. For frequent queries in a busy OLTP system, parallelism can be a problem. The best presentation I've seen by far on parallelism is by Craig Freedman - check it out here.

And that's what I have for you on query plans - lots of information for you to go exploring and learning. Analyzing a query plan is a skill that all DBAs and Developers should have IMHO and all the information is out there for you to try it out on your systems.

Enjoy!

In my survey for this week I'm interested in what you look for first when analyzing a query plan.

I'll report on the results around mid-February.

Thanks!

PS Post comments are disabled to avoid skewing the results.

I was doing some research this morning for some query tuning on a client system and I came across a couple of blog series with some excellent posts for those looking to learn more about query plans (and as a handy reference).

The first series is by Craig Freedman on the Query Processor team at Microsoft and discusses the various types of joins:

The second series is an ongoing series by Brazilian consultant Fabiano Amorim discussing all sorts (ha - no pun intended) of query plan operators - you can get the growing list of articles here.

Enjoy!

I've been doing a lot of performance tuning work over the last couple of months and this weekend found something that's very pervasive out in the wild. Kimberly was helping me optimize a gnarly query plan and spotted something in the code I hadn't noticed that was causing an index scan instead of an index seek. Even though the query plan was using a covering nonclustered index, the way the code was written was causing the index to be scanned.

I'll set up a test to show you what I mean.

First, create an example sales tracking table and populate it (took 3m45s on my laptop):

-- Create example table
CREATE TABLE BigTableLotsOfColumns (
    SalesID  BIGINT IDENTITY,
    SalesDate DATETIME,
    Descr  VARCHAR (100),
    CustomerID INT,
    ProductID INT,
    ModifyDate DATETIME DEFAULT NULL,
    PayDate  DATETIME DEFAULT NULL,
    Quantity INT,
    Price  DECIMAL (6,2),
    Discount INT DEFAULT NULL,
    WarehouseID INT,
    PickerID INT,
    ShipperID INT);
GO 

-- Populate table
SET NOCOUNT ON;

DECLARE @Date DATETIME;
DECLARE @Loop INT;

SET @Loop = 1;
SET @Date = '01/01/2010';

-- Insert 1500 sales per day
WHILE @Loop < 547600
BEGIN
    INSERT INTO BigTableLotsOfColumns (
        SalesDate, Descr, CustomerID, ProductID, Quantity, Price, WarehouseID, PickerID, ShipperID)
    VALUES (
        @Date,
        'A nice order from someone',
        CONVERT (INT, (RAND () * 1000)),
        CONVERT (INT, (RAND () * 1000)),
        CONVERT (INT, (RAND () * 10)),
        ROUND (RAND () * 1000, 2),
        1,
        CONVERT (INT, (RAND () * 4)),
        CONVERT (INT, (RAND () * 4)));
  
    IF @Loop % 1500 = 0
        SET @Date = @Date + 1;
  
    SET @Loop = @Loop + 1;
END
GO

-- And a clustered index
CREATE CLUSTERED INDEX IX_BigTableLotsOfColumns_Clustered ON BigTableLotsOfColumns (SalesID);
GO

Now I've got a simple stored procedure to find the total sales amount for any particular date.

CREATE PROCEDURE TotalSalesForDate (
 @TargetDate DATETIME)
AS 
    SELECT SUM (Quantity * Price)
    FROM BigTableLotsOfColumns
    WHERE DATEDIFF (DAY, SalesDate, @TargetDate) = 0
    OPTION (MAXDOP 1);
GO

The idea is that the stored procedure will pick up all sales for a particular date, no matter what time of that day the sale occured. I've got OPTION (MAXDOP 1) to simulate a system that's been tuned for OLTP queries (and to make the query plans fit better in my blog post :-)

If I run the stored procedure for, say, December 26th 2010, the query plan it uses is as follows:

EXEC TotalSalesForDate '12/26/2010';
GO

 

And it uses the following CPU and IO (you can get these by using SET STATISTICS IO ON and SET STATISTICS TIME ON, but be careful - the can generate lots of output for big procs):

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 7229, physical reads 0, ...

SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 186 ms.

I ran the proc twice - once to get it to compile, and the second time to factor out the compile time.

Pretty obvious that I can speed this up by creating a covering nonclustered index, and in fact you can see the query processor telling me that with the missing index suggestion. I create the covering nonclustered index:

CREATE NONCLUSTERED INDEX IX_BigTableLotsOfColumns_Date ON BigTableLotsOfColumns (SalesDate)
INCLUDE (Quantity, Price);

Now when I run the proc, I get a better query plan:

 

And the following CPU and IO:

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 2113, physical reads 0, ...

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 115 ms.

Much better, as I'd expect. But look at how the nonclustered index is being used: it's being scanned when instead there should be a seek. It's still doing more than 2000 logical reads because it's scanning the whole index.

The reason for this is that the DATEDIFF in the proc is forcing every SalesDate column value to be evaluated, hence the scan. If I know that the proc only ever is called with a date without a time portion, I can do some rearranging of the logic.

The updated proc is below. Notice that there's no reason to scan any more as the WHERE clause doesn't require every column value to passed through the DATEDIFF:

CREATE PROCEDURE TotalSalesForDate (
    @TargetDate DATETIME)
AS 
    SELECT SUM (Quantity * Price)
    FROM BigTableLotsOfColumns
    WHERE SalesDate >= @TargetDate AND SalesDate < DATEADD (DAY, 1, @TargetDate)
    OPTION (MAXDOP 1);
GO

And when I run it I get the best plan:

 

And the following CPU and IO:

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, ...

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 64 ms.

Wow! 200x less logical IOs and not even enough CPU time to register as 1ms.

Summary: although you may have nonclustered indexes that are being used, make sure that they're being used for seeks as you expect. For small queries that are executed hundreds of times a second, the difference between seeking and scanning can be huge in overall performance.

 

I'm starting a new blog category to talk about some of weird and confusing stuff I see while query tuning.

First up is the case of the unexpected Key Lookup (Clustered) in a query that looks like it should be covered. This is a follow on from the post Missing index DMVs bug that could cost your sanity...

Today I was tracking down some performance issues on a client site and came across something in a query plan that confused me for a bit. All code and screenshots below are from my simplified repro on my test machine at home.

The code is using a cursor to drive a process, and the SELECT statement driving the cursor is covered by a non-clustered index.

However, the query plan for the cursor-driving statement is as below:

 

You can see the SELECT statement above, and you can see that the query plan is using the nonclustered index on c2 and c3 to satisfy the SELECT. So where is the Key Lookup coming from? If I hover over the Key Lookup operator, I see the details of the operator in a tooltip:

 

According to the details, the output list of the Key Lookup is Chk1002. What's Chk1002? It's not a column in my table, and it's not a check constraint on the table either - I don't have any. And why is it looking up the cluster key *anyway*?

This is where I turned to Kimberly to help figure out what's going on, who said it's to do with the cursor.

I then proceeded to work thigns out and got it a little twisted up. Check out the comments discussion with Brad who explained things, and it's in Books Online too.

Because the cursor is a dynamic optimistic cursor (by default as I didn't specify anything else) SQL Server persists a checksum of all rows picked up by the cursor-driving query in a worktable in tempdb (which, if this is a heavily executed query, can cause perf issues with tempdb). When I say FETCH NEXT, it goes back to the table to get the next value, but if I want to update a column in the row I'm working on, it recalculates the checksum to make sure that nothing has changed in that row outside the confines of my cursor. If so, my update fails - if not, it allows it.

There are plenty of ways to improve the situation (e.g. removing the cursor altogether and using a nice set-based operation... or changing the cursro type) but that's not the point of this post. I just wanted you to be aware of this, and how the query operator properties ins't explicit in working out what's going on.

Enjoy!

Here's yet another reason to be very careful when using the missing index DMVs...

There's a bug in the missing index DMVs that could end up causing you to knock your head against a brick wall and question your sanity. I know I did.

The bug is this: the missing index code may recommend a nonclustered index that already exists. Over and over again. It might also recommend an index that won't actually help a query.

Yes, I'm surprised by this too - as the missing index code is in the query optimizer too. However, it will continue to recommend you create the already-existing index - which is terribly annoying.

This is a little-known bug (Connect item #416197) which is fixed in SQL11 but won't be fixed in earlier versions.

I experienced this on SQL Server 2008 SP1 this weekend and I wanted to blog about it so you don't spend ages trying to work out what's going on.

Here's a repro for you:

CREATE TABLE t1 (
    c1 INT IDENTITY,
    c2 AS c1 * 2,
    c3 AS c1 + c1,
    c4 CHAR (3000) DEFAULT 'a');
GO
CREATE UNIQUE CLUSTERED INDEX t1_clus ON t1 (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO t1 DEFAULT VALUES;
GO 100000

This creates a table with a a bunch of rows, with each row pretty large so that the cost of scanning the table is expensive.

Now say I want to run a query:

SELECT COUNT (*) FROM t1
    WHERE c2 BETWEEN 10 AND 1000
    AND c3 > 1000;
GO
 

If I display the estimated execution plan...

 

...it will tell me there's a missing index I should create:

 

So I go ahead and create the index and everything's cool:

CREATE NONCLUSTERED INDEX [_missing_c2_c3] ON [dbo].[t1] ([c2],[c3]);
GO

Now what if I want to do something more complicted? How about a cursor over the table? (Don't start on about not using cursors - they're everywhere in application code we see - this is just an easy example to engineer.)

DECLARE testcursor CURSOR FOR
    SELECT c1 FROM t1
    WHERE
        c2 BETWEEN 10 AND 1000
        AND c3 > 1000;

DECLARE @var BIGINT;

OPEN testcursor;

FETCH NEXT FROM testcursor INTO @var;

WHILE (@@fetch_status <> -1)
BEGIN
    -- empty body
    FETCH NEXT FROM testcursor INTO @var;
END

CLOSE testcursor;
DEALLOCATE testcursor;

If I display the estimated execution plan again, it shows:

 

Hmm. That index is actually exactly the same as the one we created earlier (even though it's asking for c1 to be INCLUDEd, it already is in the existing nonclustered index as c1 is the cluster key and is included automatically). However, just to prove I'm not doing anything dodgy, I'll create the index it wants:

CREATE NONCLUSTERED INDEX [_missing_c2_c3_inc_c1] ON [dbo].[t1] ([c2],[c3]) INCLUDE ([c1]);
GO

And nothing changes. You cannot get the missing index code to stop recommending the index. The index isn't being used for the *Key Lookup* in the query plan above  - but the missing index code thinks the index would be useful and suggests it. Not only would that index not actually help the Key Lookup, it already exists!

If you use a query that aggregates the missing index DMV output (such as Bart Duncan's excellent script) and you have some very common queries on your system that are hitting this bug, you will find that the missing index DMV aggregation will be broken too.

Be careful out there!

Theme design by Nukeation based on Jelle Druyts