Most of my performance tuning is done in small groups – a developer is frustrated because his query isn't fast enough, so the developer and the DBA get together to show me the code in question. If they hand me a stored procedure with a SELECT statement to tune, the first thing I do is read through the comments. That usually only takes two seconds because so few people write comments, and then we spend the next few minutes trying to reverse-engineer what data they're after. I talk through it aloud:

"It looks like we're trying to retrieve the widget sales by salesperson for the last 90 days?"

I want to express it in plain English because sometimes the query isn't actually getting the right data. For example, in one of my recent sessions, the project manager immediately turned to the developer and said, "No, that's not right – it needs to be by calendar month for the last 3 months." Turning the query into human-understandable English lets you bridge that gap between the coders and the businesspeople. At that point, I can avoid optimizing the query, because they're going to go rewrite it anyway. If the query is indeed right, I follow up with another question:

"Are these the right tables to get widget sales grouped by salesperson and by month?"

That makes everyone in the room stop and think. Maybe there's already a reporting structure for this. Maybe there's a narrower set of tables I can use to achieve that goal. If these are indeed the right tables, then I start by examining every table involved in the query to see if I can rip it out altogether. I don't change the output fields – I just rip out tables. For example, take this query:

SELECT c.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Customers c
INNER JOIN dbo.Sales s ON c.CustomerID = s.CustomerID
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
LEFT OUTER JOIN dbo.SalesPeople sp ON s.SalesPersonID = sp.SalesPersonID
GROUP BY c.CustomerID

The query isn't doing anything with dbo.SalesPeople, and because it's a left outer join, it isn't required in the query. Removing it won't change the results. On the other hand, I can't simply rip out dbo.SalesRegions even though nothing's being selected or grouped by any of its fields. It's part of an INNER JOIN, so that means this query will only return results from dbo.Sales that have a legit SalesRegionID. (Bonus question for the comments: what properties of Sales.SalesRegionID would tell me that I could remove the join to SalesRegions without breaking the query?) Next, I can remove the dbo.Customers table too. If I'm grouping by CustomerID, I don't need to touch the dbo.Customers table at all. Since CustomerID is part of the join and it's on both tables, I can use Sales.CustomerID instead. (That's safe as long as every Sales row has a matching CustomerID – otherwise the inner join to Customers was quietly filtering rows, and removing it would change the results.) So here's the rewritten query:

SELECT s.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Sales s
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
GROUP BY s.CustomerID

If the query had included fields that were only available on the Customers table, like c.CustomerName, then I wouldn't have been able to use this technique. You'd be surprised at how often people copy/paste queries and then modify them to serve a different purpose, but they forget to strip all the extraneous tables out of the joins. (I'm certainly surprised.) This technique reminds me of the Sex Panther Cologne in the movie Anchorman: The Legend of Ron Burgundy, "They've done studies, you know. 60% of the time, it works every time."
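As a hint for that bonus question, here's a sketch of the setup that would let me (or the optimizer) skip the SalesRegions join entirely. The table definitions below are hypothetical, but the principle is real: if Sales.SalesRegionID is NOT NULL and covered by a trusted foreign key, every Sales row is guaranteed to match exactly one SalesRegions row, so an inner join that doesn't output any SalesRegions columns can't change the results.

```sql
-- Hypothetical definitions: a NOT NULL column plus a trusted foreign key
-- guarantees every Sales row matches exactly one SalesRegions row.
CREATE TABLE dbo.SalesRegions (
    SalesRegionID INT NOT NULL PRIMARY KEY,
    RegionName    NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Sales (
    SaleID        INT NOT NULL PRIMARY KEY,
    CustomerID    INT NOT NULL,
    SalesRegionID INT NOT NULL
        CONSTRAINT FK_Sales_SalesRegions
        FOREIGN KEY REFERENCES dbo.SalesRegions (SalesRegionID),
    SalesAmount   MONEY NOT NULL
);

-- With those constraints in place (and the foreign key trusted – check
-- is_not_trusted in sys.foreign_keys), the join to SalesRegions can be
-- removed without breaking the query.
```

Note that the foreign key has to be trusted: if it was created WITH NOCHECK or re-enabled without validating existing rows, SQL Server can't rely on it.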

I check the query cost after removing extra tables, and if it's not significantly lower, then I could start really tuning the query – but nope. Not gonna do it. Wouldn't be prudent. I ask:

"Are there similar tables that already aggregate the data differently? Can we deliver that data to the users instead?"

I want the business users, project managers, and pointy-headed bosses to stop for a minute and think about what the report really needs. In one recent tuning session, the developer said, "Well, now that you mention it, I'm just showing five other users who bookmarked this same report. It doesn't matter which five they are." Bingo! That tells me I've got the opportunity to rewrite their query in a way that takes advantage of existing indexes. Before, they might have been selecting the top 5 users ordered by login date, but I could order them by some other field where I've already created an index.
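Here's a before-and-after sketch of that rewrite. The table, columns, and index are all hypothetical – the point is that when the business doesn't care which five rows come back, the ORDER BY becomes negotiable.

```sql
-- Before (hypothetical schema): TOP 5 sorted on a column with no
-- supporting index, which forces a scan plus a sort.
SELECT TOP 5 b.UserID
FROM dbo.ReportBookmarks b
WHERE b.ReportID = @ReportID
ORDER BY b.LastLoginDate DESC;

-- After: any five users will do, so order by a column that's already
-- indexed – here assuming an index keyed on (ReportID, UserID) – and
-- SQL Server can read just the first five rows in index order.
SELECT TOP 5 b.UserID
FROM dbo.ReportBookmarks b
WHERE b.ReportID = @ReportID
ORDER BY b.UserID;
```

The output fields are identical; only the sort changed, and only because the business said the sort didn't matter.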

If the results of the query can't be changed, I'm still not out of options. My next question is:

"How old can this data be? Can it be from 5 minutes ago? From last night?"

The fastest query is the one you never run. The second fastest is the one you were able to run ahead of time. The ability to calculate (or pre-cook) the data after-hours gives us tremendous flexibility. Heck, sometimes my users even get excited at this concept, because they've always wanted to see a static set of results as of this morning's numbers so that they can consistently refer to them throughout the day. I can then introduce the option of aggregated reporting tables calculated overnight via SSIS, T-SQL jobs, or a monkey with a keyboard.
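For example, a nightly job step to pre-cook the earlier sales query might look like this. The summary table name and the "refresh by truncate-and-reload" approach are just one sketch of the idea – in practice this would run overnight via SQL Server Agent or SSIS.

```sql
-- Hypothetical overnight refresh of a reporting table.
-- During the day, users query dbo.SalesSummary instead of
-- aggregating dbo.Sales live.
BEGIN TRANSACTION;

TRUNCATE TABLE dbo.SalesSummary;

INSERT INTO dbo.SalesSummary (CustomerID, TotalSalesAmount, AsOfDate)
SELECT s.CustomerID,
       SUM(s.SalesAmount),
       CAST(GETDATE() AS date)  -- stamp the "as of" date users can refer to
FROM dbo.Sales s
GROUP BY s.CustomerID;

COMMIT TRANSACTION;
```

The daytime report then becomes a trivial single-table SELECT, and the static AsOfDate gives the users exactly that consistent "this morning's numbers" view they wanted.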

At the end of this discussion, I may still need to tune the query, but asking these four simple questions only takes a couple of minutes and may open up a world of architecture tuning options. The faster you want to scale your application, the tougher you have to be about asking these questions. In some environments, I have to put my foot down and say, "I'm sorry, but you can't run this query in production no matter how fast it is. We have to explore the other options from these four questions."