I've been doing a lot of performance tuning work over the last couple of months and this weekend found something that's very pervasive out in the wild. Kimberly was helping me optimize a gnarly query plan and spotted something in the code I hadn't noticed that was causing an index scan instead of an index seek. Even though the query plan was using a covering nonclustered index, the way the code was written was causing the index to be scanned.

I'll set up a test to show you what I mean.

First, create an example sales tracking table and populate it (took 3m45s on my laptop):

– Create example table
CREATE TABLE BigTableLotsOfColumns (
    SalesID  BIGINT IDENTITY,
    SalesDate DATETIME,
    Descr  VARCHAR (100),
    CustomerID INT,
    ProductID INT,
    ModifyDate DATETIME DEFAULT NULL,
    PayDate  DATETIME DEFAULT NULL,
    Quantity INT,
    Price  DECIMAL (6,2),
    Discount INT DEFAULT NULL,
    WarehouseID INT,
    PickerID INT,
    ShipperID INT);
GO 

– Populate table
SET NOCOUNT ON;

DECLARE @Date DATETIME;
DECLARE @Loop INT;

SET @Loop = 1;
SET @Date = '01/01/2010';

– Insert 1500 sales per day
WHILE @Loop < 547600
BEGIN
    INSERT INTO BigTableLotsOfColumns (
        SalesDate, Descr, CustomerID, ProductID, Quantity, Price, WarehouseID, PickerID, ShipperID)
    VALUES (
        @Date,
        'A nice order from someone',
        CONVERT (INT, (RAND () * 1000)),
        CONVERT (INT, (RAND () * 1000)),
        CONVERT (INT, (RAND () * 10)),
        ROUND (RAND () * 1000, 2),
        1,
        CONVERT (INT, (RAND () * 4)),
        CONVERT (INT, (RAND () * 4)));
  
    IF @Loop % 1500 = 0
        SET @Date = @Date + 1;
  
    SET @Loop = @Loop + 1;
END
GO

– And a clustered index
CREATE CLUSTERED INDEX IX_BigTableLotsOfColumns_Clustered ON BigTableLotsOfColumns (SalesID);
GO

Now I've got a simple stored procedure to find the total sales amount for any particular date.

CREATE PROCEDURE TotalSalesForDate (
 @TargetDate DATETIME)
AS 
    SELECT SUM (Quantity * Price)
    FROM BigTableLotsOfColumns
    WHERE DATEDIFF (DAY, SalesDate, @TargetDate) = 0
    OPTION (MAXDOP 1);
GO

The idea is that the stored procedure will pick up all sales for a particular date, no matter what time of that day the sale occured. I've got OPTION (MAXDOP 1) to simulate a system that's been tuned for OLTP queries (and to make the query plans fit better in my blog post :-)

If I run the stored procedure for, say, December 26th 2010, the query plan it uses is as follows:

EXEC TotalSalesForDate '12/26/2010';
GO

 plan1 Adventures in query tuning: non seekable WHERE clause expressions

And it uses the following CPU and IO (you can get these by using SET STATISTICS IO ON and SET STATISTICS TIME ON, but be careful – the can generate lots of output for big procs):

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 7229, physical reads 0, …

SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 186 ms.

I ran the proc twice – once to get it to compile, and the second time to factor out the compile time.

Pretty obvious that I can speed this up by creating a covering nonclustered index, and in fact you can see the query processor telling me that with the missing index suggestion. I create the covering nonclustered index:

CREATE NONCLUSTERED INDEX IX_BigTableLotsOfColumns_Date ON BigTableLotsOfColumns (SalesDate)
INCLUDE (Quantity, Price);

Now when I run the proc, I get a better query plan:

 plan2 Adventures in query tuning: non seekable WHERE clause expressions

And the following CPU and IO:

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 2113, physical reads 0, …

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 115 ms.

Much better, as I'd expect. But look at how the nonclustered index is being used: it's being scanned when instead there should be a seek. It's still doing more than 2000 logical reads because it's scanning the whole index.

The reason for this is that the DATEDIFF in the proc is forcing every SalesDate column value to be evaluated, hence the scan. If I know that the proc only ever is called with a date without a time portion, I can do some rearranging of the logic.

The updated proc is below. Notice that there's no reason to scan any more as the WHERE clause doesn't require every column value to passed through the DATEDIFF:

CREATE PROCEDURE TotalSalesForDate (
    @TargetDate DATETIME)
AS 
    SELECT SUM (Quantity * Price)
    FROM BigTableLotsOfColumns
    WHERE SalesDate >= @TargetDate AND SalesDate < DATEADD (DAY, 1, @TargetDate)
    OPTION (MAXDOP 1);
GO

And when I run it I get the best plan:

 plan3 Adventures in query tuning: non seekable WHERE clause expressions

And the following CPU and IO:

Table 'BigTableLotsOfColumns'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, …

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 64 ms.

Wow! 200x less logical IOs and not even enough CPU time to register as 1ms.

Summary: although you may have nonclustered indexes that are being used, make sure that they're being used for seeks as you expect. For small queries that are executed hundreds of times a second, the difference between seeking and scanning can be huge in overall performance.