Stop Tuning Your Queries.

Most of my performance tuning is done in small groups – a developer is frustrated because their query isn't fast enough, so the developer and the DBA get together to show me the code in question. If they hand me a stored procedure with a SELECT statement to tune, the first thing I do is read through the comments. That usually only takes two seconds because so few people write comments, so then we spend the next few minutes trying to reverse engineer what data they're after. I talk through it aloud:

"It looks like we're trying to retrieve the widget sales by salesperson for the last 90 days?"

I want to express it in plain English because sometimes the query isn't actually getting the right data. For example, in one of my recent sessions, the project manager immediately turned to the developer and said, "No, that's not right – it needs to be by calendar month for the last 3 months." Turning the query into human-understandable English lets you bridge that gap between the coders and the businesspeople. At that point, I can avoid optimizing the query, because they're going to go rewrite it anyway. If the query is indeed right, I follow up with another question:

"Are these the right tables to get widget sales grouped by salesperson and by month?"

That makes everyone in the room stop and think. Maybe there's already a reporting structure for this. Maybe there's a narrower set of tables I can use to achieve that goal. If these are indeed the right tables, then I start by examining every table involved in the query to see if I can rip it out altogether. I don't change the output fields – I just rip out tables. For example, take this query:

SELECT c.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Customers c
INNER JOIN dbo.Sales s ON c.CustomerID = s.CustomerID
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
LEFT OUTER JOIN dbo.SalesPeople sp ON s.SalesPersonID = sp.SalesPersonID
GROUP BY c.CustomerID

The query isn't doing anything with dbo.SalesPeople, and because it's a left outer join, it isn't required in the query. Removing it won't change the results. On the other hand, I can't simply rip out dbo.SalesRegions even though nothing's being selected or grouped by any of its fields. It's part of an INNER JOIN, so that means this query will only return results from dbo.Sales that have a legit SalesRegionID. (Bonus question for the comments: what properties of Sales.SalesRegionID would tell me that I could remove the join to SalesRegions without breaking the query?) Next, I can remove the dbo.Customers table too. If I'm grouping by CustomerID, I don't need to touch the dbo.Customers table at all. Since CustomerID is part of the join and it's on both tables, I can use Sales.CustomerID instead. So here's the rewritten query:

SELECT s.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Sales s
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
GROUP BY s.CustomerID

If the query had included fields that were only available on the Customers table, like c.CustomerName, then I wouldn't have been able to use this technique. You'd be surprised at how often people copy/paste queries and then modify them to serve a different purpose, but they forget to strip all the extraneous tables out of the joins. (I'm certainly surprised.) This technique reminds me of the Sex Panther Cologne in the movie Anchorman: The Legend of Ron Burgundy, "They've done studies, you know. 60% of the time, it works every time."

I check the query costs after removing extra tables, and if it's not significantly lower, then I could start really tuning the query – but nope. Not gonna do it. Wouldn't be prudent. I ask:

"Are there similar tables that already aggregate the data differently? Can we deliver that data to the users instead?"

I want the business users, project managers, and pointy-headed bosses to stop for a minute and think about what the report really needs. In one recent tuning session, the developer said, "Well, now that you mention it, I'm just showing five other users who bookmarked this same report. It doesn't matter which five they are." Bingo! That tells me I've got the opportunity to rewrite their query in a way that takes advantage of existing indexes. Before, they might have been sorting for the top 5 users ordered by login date, but I could sort them by some other field where I've created an index.
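
As a sketch of that kind of rewrite – the table, column, and index names here are hypothetical, not from any real system – the difference might look like this:

SELECT TOP 5 u.UserID, u.DisplayName
FROM dbo.Users u
ORDER BY u.LastLoginDate DESC; -- no index on LastLoginDate: scan the table, then sort

SELECT TOP 5 u.UserID, u.DisplayName
FROM dbo.Users u
ORDER BY u.DisplayName; -- assumed index on DisplayName: read five rows off the index, no sort

Since any five users will do, the second version lets SQL Server stop after reading the first five rows of an existing index instead of sorting the whole table.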

If the results of the query can't be changed, I'm still not out of options. My next question is:

"How old can this data be? Can it be from 5 minutes ago? From last night?"

The fastest query is the one you never run. The second fastest is the one you were able to run ahead of time. The ability to calculate (or pre-cook) the data after-hours gives us tremendous flexibility. Heck, sometimes my users even get excited at this concept, because they've always wanted to see a static set of results as of this morning's numbers so that they can consistently refer to them throughout the day. I can then introduce the option of aggregated reporting tables calculated overnight via SSIS, T-SQL jobs, or a monkey with a keyboard.
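
For example, an overnight pre-cook job might look something like this sketch – the dbo.SalesSummary table and its column names are my own invention for illustration:

-- Hypothetical nightly job: rebuild a small reporting table after hours
-- so daytime reports read a few summary rows instead of scanning dbo.Sales.
TRUNCATE TABLE dbo.SalesSummary;

INSERT INTO dbo.SalesSummary (SalesPersonID, SalesMonth, TotalSalesAmount)
SELECT s.SalesPersonID,
    DATEFROMPARTS(YEAR(s.SalesDate), MONTH(s.SalesDate), 1),
    SUM(s.SalesAmount)
FROM dbo.Sales s
WHERE s.SalesDate >= DATEADD(MONTH, -3, GETDATE())
GROUP BY s.SalesPersonID,
    DATEFROMPARTS(YEAR(s.SalesDate), MONTH(s.SalesDate), 1);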

At the end of this discussion, I may still need to tune the query, but asking these four simple questions only takes a couple of minutes and may open up a world of architecture tuning options. The faster you want to scale your application, the tougher you have to be about asking these questions. In some environments, I have to put my foot down and say, "I'm sorry, but you can't run this query in production no matter how fast it is. We have to explore the other options from these four questions."

15 thoughts on “Stop Tuning Your Queries.”

  1. I would only add that although it certainly is a red flag, you cannot always remove left joins to "unused" tables with no effect. This is because any join can potentially result in a one-to-many multiplication of rows depending upon your design. In many (most?) cases this behavior may be incorrect, so it may need to be discussed.

  2. Not to be totally pedantic (just a little bit), but removing the SalesPeople table in that first query will not only not have any effect on the output, it will also not affect the query plan in any way (assuming that the database has proper keys set up). So that’s not going to help from a performance point of view; just with code quality.

    As for the rest of your post, good points. I regularly see situations where queries I’m asked to tune aren’t even getting the right data. I sometimes wonder how it is that our civilization thrives given that most of our financial and commerce systems are built on top of relational databases, and all but the most trivial queries tend to have bugs. Scary…

  3. Brent – What is a *trusted* foreign key?

    If the table Sales has a trusted foreign key relation to the SalesRegion table, you can safely remove the INNER JOIN because SQL Server can guarantee that no "bad" SalesRegionIDs are in the Sales table.
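
    One way to check that in SQL Server (the constraint name below is just an example):

    SELECT name, is_not_trusted, is_disabled
    FROM sys.foreign_keys
    WHERE parent_object_id = OBJECT_ID('dbo.Sales');

    -- is_not_trusted = 0 means the optimizer can rely on the constraint.
    -- To re-validate an untrusted one (the double CHECK is intentional):
    ALTER TABLE dbo.Sales WITH CHECK CHECK CONSTRAINT FK_Sales_SalesRegions;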

  4. Brent – Like I said, it’s almost always bad queries – either they get bad results or they get good results for the wrong reason. But say you have a many-to-one parent<-child:

    SELECT Invoice.*
    FROM Invoice
    LEFT JOIN InvoiceDetail
    ON InvoiceDetail.InvoiceID = Invoice.InvoiceID

    Obviously, you get different (fewer) rows in the result when you remove the left join for any Invoices with more than one InvoiceDetail row.

    Of course, no one in their right mind would think this query should work that way, and in most cases they shouldn’t. But sometimes when you are looking at some crufty bit of code that someone shouldn’t have been allowed to commit to a database in the first place, you can’t tell by looking at it what the intentions were. When these things are nested inside something and someone has wrapped it with a DISTINCT or worse and they’ve got subqueries and who knows what other craziness going on, you still have to be careful that the refactoring you do really is invariant.

    You see this a lot with contact-type designs where a customer can have multiple contacts and they want one (just any old one will work, of course) on the report and the system underspecifies how to determine the one to pick. And then they use MIN or MAX (over both the contact name and phone number, so you don’t even get the right phone number to go with the name…).

  5. RE: "Bonus question for the comments: what properties of Sales.SalesRegionID would tell me that I could remove the join to SalesRegions without breaking the query?"

    If there is a non-null foreign key relationship between Sales.SalesRegionID and SalesRegions.SalesRegionID then you could remove the table, since none of its columns are used in the query and the database is already enforcing the requirement of a Sale having a valid SalesRegionID via referential integrity, so it is redundant to do so in the query as well.

  6. Brent:

    SELECT x.*
    FROM (select 1) AS x (y) LEFT OUTER JOIN (select 1 union all select 1) AS w (y) ON w.y = x.y

    … which is why I said in my previous comments that behavior depends upon the correct keys existing in the database.

  7. If you do put an appropriate constraint on the regions (non-NULL, foreign key constraint to the primary key in the region table), the optimizer will filter that out anyway – because it’s smart enough to know there will always be one and only one matching row, and the inner join is not actually performed. I think Celko had an article a while back about constraints being useful for more than data integrity, and Grant Fritchey (http://scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/) also blogged about it last week.

  8. Cade – when you say "some crufty bit of code" and use a left outer join with multiple inaccurate results returned, that’s exactly what I’m filtering out with my very first question – asking what exactly they’re trying to return. You’re completely right in that these kinds of queries pop up a lot, but I catch ’em right away with that first question, and that’s the end of tuning that query. Nobody in their right mind says, "Why yes, it’s okay that this code returns multiple results with exactly the same data, and we’d like more inaccurate data faster, please."

    Adam – about the left outer joins to subqueries – agreed, that’s why I said I remove *tables* with left outer joins. I didn’t say I removed subqueries (and I don’t).

    Everybody who got the non-null trusted foreign key – bingo! And yes, I still remove those even though Grant (and others) have pointed out that it doesn’t slow down query speed, because there’s no guarantee the trusted non-null foreign key constraint will always be in the database. If some bozo drops the foreign key constraints to "speed up the database", suddenly my queries will run slower if they’re relying on the engine to detect those constraints to remove joined tables.

  9. Brent, it doesn’t matter if it’s a table or a "subquery" (the query I posted includes no subqueries — those are derived tables). The same problem exists in either case.

  10. Common sense, but you highlight some great mistakes that people make when they copy code and don’t spend time actually writing the correct query for the correct question.

    Great post.

  11. Guys, the point of the article is not the example query – it’s the overall process of query tuning. As technology people, we’re all inclined to go straight to the nitty-gritty details and start picking them apart. Fly at 10,000 feet for a while.

    Great article.

    Robert
