Prepared statements: be careful when given “THE” solution for all of your development practices

I’m not sure about how to title this OR how to avoid the term “it depends” but I also don’t feel that “it depends” is a bad phrase. Some folks think of it as a cop-out when they don’t have a better answer but the problem is that it’s often the RIGHT answer. There are rarely simple solutions that work in ALL situations. And, as a result – it depends – is really correct. What’s most important is knowing what it depends ON.

So, where am I going with this one?

I recently received a question that started by referencing a post titled: NHibernate, the database query cache and parameter sizes. My issue is that there is truth in that post (and also in the comments) but there are also a few statements that just aren’t quite right. In fact, there’s one that I thought was right but I had to really dig into it to see what was really happening – and what’s happening is NOT what I expected. And, this is really the crux of the problem. Sometimes the behavior (or, solution) is completely counter-intuitive. But, above all, there’s one fatal mistake in this post (and many others) – stating what THE solution is. Some statements claim that one approach is [ALWAYS] better than the other and unfortunately, that’s just not the case… (comments were closed so I couldn’t comment on it directly)

What I really want to dissect a bit is this part of the post:

There are arguments for both sides, but most DBAs would really like to see only a single query plan for all parameter sizes. I didn’t have the answer off the top of my head, but I was pretty sure that NHibernate did the right thing from the point of view of the database.

As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?

Not really, here is the fix, just put the following in your configuration:

<property name='prepare_sql'>true</property>

And, this particular comment:

The side-effect of calling Prepare() is that a so-called “prepared statement” is created in the DB. This is essentially a statement that is precompiled in the DB and gives back a handle so future calls on THE SAME CONNECTION can be made just referencing this handle and not sending the entire sql again. If all calls to the DB were made on the same connection its true that the execution plans would be reused, but as we know – connections are drawn from the connection pool as needed and handed back in a random manner. The prepared statements on the connection are un-prepared everytime a new session is created and the connection is soft-initialized.

First and foremost, the post and the comments aren’t bad. These people are working to make their environments better and to help others sort out their bizarre behaviors / problems. They are truly trying to help. I don’t suspect for even a moment that they intentionally want to lead anyone astray. However, some of this is just wrong. So, let me begin.

There ARE arguments for both sides.

Yes, this is TERRIBLY important. The reason there are arguments for both sides is that EACH side is correct SOME of the time. Neither is correct ALL of the time and, of course, something has to be chosen as the default. Let me try to give a basic example of this… I’ll use a traffic analogy. It’s 2am, you’ve been coding for quite a few hours, and you want to get some great diner food – in Seattle. But, you live in Redmond (across the bridge). The shortest distance between you and the restaurant will get you there in 30 minutes but requires you to cross a toll-bridge. So, the trade-offs are money for time. But, you decide to go solely “for performance” and you choose that plan… take the 520 over to Seattle.

Now, let’s say that plan is “in cache” and MUST be re-used the next time you want to go to the diner. However, the next time you want to go – it’s 4:30pm, on a Friday, and there’s a Husky game… the plan to use 520 is a bad idea. Not because it’s not the shortest distance but because of other problems (congestion, “blocking,” etc…). And, don’t get me wrong – SQL Server does NOT change plans for “blocking” or “congestion” (so, the analogy breaks down a bit here). But, you can still see the point.

A chosen plan for one set of parameters may not be best with different parameters.

And, this is why there isn’t a single right answer for EVERY statement. So, if you’re looking for THE right choice (should the statement’s plan be placed in the cache or not) then this is the tough part – what’s really best depends on what I call – the stability of the plan generated against a wide variety of different parameters.

With the right kind of testing, you can execute a statement with a variety of different parameter combinations and verify whether or not each of their optimal plans has the same pattern / shape. If they do – then, I’d say that statement’s optimal plan is stable. However, (and this is often MORE likely), if the optimal plan varies, then that statement’s optimal plan is unstable (across different parameter combinations).

The end result is really a set of two rules:

  1. When a statement’s optimal plan is stable and does NOT vary across executions – a plan should be saved.
  2. When a statement’s optimal plan is unstable and varies across executions – a plan should not be saved.

“Most DBAs would really like to see only a single query plan for all parameter sizes.”

Well, only when the plan is stable. If the plan is unstable then another problem occurs – this is often referred to as “parameter sniffing problems” or “parameter sensitivity problems” (a.k.a. PSP). PSP can actually end up being a worse problem. If a bad plan sits in cache and is re-used for all executions (executing an inefficient plan) then you can end up with statements doing a tremendous amount of I/O, spilling to disk because of bad memory grants, or doing the wrong type of join, etc. when they shouldn’t. I’ve seen PSP essentially take a server down (causing so much I/O that the server is inaccessible and unable to process any additional requests). Some clients even “react” to this by restarting SQL Server… problem solved. Well, cache cleared and the bad plan will be thrown out. Hopefully, the next execution will be what we want and a reasonable plan will get into cache. At least temporarily.
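If you suspect PSP, here’s a rough way to spot likely victims (just a sketch against sys.dm_exec_query_stats – the 10x threshold is completely arbitrary): look for cached statements whose per-execution I/O swings wildly from one execution to the next.

-- A rough sketch: statements whose logical reads vary wildly between
-- executions of the SAME cached plan are good candidates for a PSP review.
SELECT TOP (20)
       qs.execution_count,
       qs.min_logical_reads,
       qs.max_logical_reads,
       st.[text]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) AS st
WHERE qs.max_logical_reads > 10 * (qs.min_logical_reads + 1)
ORDER BY qs.max_logical_reads DESC;
GO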

And, this is how it goes. Poor performance occurs and often a variety of things are blamed (cache or I/O or statistics or indexes) and some quick fix (updating statistics is another common one) SEEMS to solve the problem. But, ultimately, it’s probably a side-effect of what you’ve done that’s solved the problem (NOTE: Updating statistics often “invalidates” plans so the side-effect of having updated statistics might be what solved your problem and NOT the updating of statistics itself). I talked about this in a related post: What caused that plan to go horribly wrong – should you update statistics?
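If you want to separate those two effects, here’s a minimal sketch (using the [dbo].[member] table from the example later in this post): invalidate the plans on their own and see whether that alone “fixes” things.

-- A minimal sketch: if forcing recompilation alone "fixes" the problem, the
-- real fix was the plan invalidation, not the statistics update itself.
EXEC sp_recompile N'dbo.member';   -- marks plans that reference the table for recompilation
-- versus
UPDATE STATISTICS [dbo].[member];  -- refreshes statistics (and, as a side-effect, often invalidates plans)
GO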

So, to bring it back to the main point… HOW you execute and whether or not YOU choose to force the statement to be cached are VERY important decisions and, unfortunately, they have to be made on a statement-by-statement basis. I’ll try to bring it all together in just a moment.

“I was pretty sure that NHibernate did the right thing from the point of view of the database. As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?”

First off, I’d have to agree WITH the default. The default is often referred to as “simple” parameterization. This is where SQL Server looks at the statement and decides – no, I shouldn’t cache a single plan because the plan might not be “safe” across all executions. And, as a result, it must be compiled for EVERY execution. I agree with the default because executing a bad plan can be WAY more expensive than recompiling to get a good plan and executing that.

But, there is a bad side to this too. If the statement is STABLE then recompiling every time wastes time in compiling (CPU) and, it can also lead to plan cache bloat. (I’ve talked more about this here: Plan cache and optimizing for adhoc workloads)
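If you want to see whether plan cache bloat is happening on your server, here’s a quick sketch against sys.dm_exec_cached_plans (single-use plans are the usual symptom):

-- A quick sketch: how much of the plan cache is taken up by plans that
-- have only ever been used once?
SELECT cp.objtype,
       COUNT(*) AS single_use_plans,
       SUM(CAST(cp.size_in_bytes AS bigint)) / 1024 / 1024 AS cache_MB
FROM sys.dm_exec_cached_plans AS cp
WHERE cp.usecounts = 1
GROUP BY cp.objtype
ORDER BY cache_MB DESC;
GO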

So… I think that NHibernate is doing the right thing by default.

But, having said that, that default is NOT great for everything.

“Just put the following in your configuration: <property name=‘prepare_sql’>true</property>”

MAYBE – but, only when the plan is STABLE.  (yes, I know… I still haven’t fully described how you can tell if a plan is stable)

So, what does that do? With a bit of digging (and, from the email that I received), it appears as though this essentially uses / creates a “prepared” statement. And, this also ties into the comment that I pointed out above. And, while the comment sounds correct (and, I even agreed with it at first), that’s not actually how it works (yes, I was surprised too).

Prepared statements using sp_prepare

It turns out that telling NHibernate to use prepare_sql = true effectively does something like the following:

DECLARE @handle int;

-- Compile the parameterized statement and get a handle back
EXEC sp_prepare @handle OUTPUT,
     N'@P1 VARCHAR(15)',
     N'SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] LIKE @P1';

-- Execute the prepared statement, supplying the parameter value
EXEC sp_execute @handle, N'Anderson';

-- Release the handle
EXEC sp_unprepare @handle;
GO

At first glance it seemed like SQL Server would have to generate a different plan for every execution (except those that used the same handle) but I found that as long as the statement was the same – the plan was the same. ACROSS CONNECTIONS. This surprised me. Even with the sp_unprepare. Even with the explicit @handle. YES, they re-used the plan. So, the effect is what was desired by the prepare_sql property being set to true. But, this is NOT always ideal (again, remember, the execution of a bad plan can often be more expensive than the cost of recompilation). Then again, recompiling every time should be avoided where the plans are stable.
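If you want to verify this yourself, here’s a simple check (a sketch): run the sp_prepare / sp_execute batch above from a couple of different connections and then look at the usecounts on the prepared plan.

-- A simple check: after executing the batch from a few different
-- connections, the prepared plan shows up ONCE with a growing usecounts
-- value (rather than once per connection).
SELECT cp.usecounts, cp.objtype, cp.cacheobjtype, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) AS st
WHERE cp.objtype = N'Prepared'
      AND st.[text] LIKE N'%member%';
GO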

So, then I started wondering about what sp_unprepare does and I found another post discussing this a bit more: Watch out those prepared SQL statements, which is interesting in and of itself but just exposes yet another issue with where / how the handles are managed. So, if you’re calling these yourself then you definitely want to sp_unprepare. But, that still doesn’t get rid of the PSP problem. And, again, these PSP problems can be across connections.

Prepare your statements or not – that is the question?!

It really all boils down to this – effective testing. Most developers are fantastic at code-coverage testing. However, most don’t know about what I call “plan stability” testing. The good news is that if you are doing good code-coverage testing then it should make plan-stability testing easy!

Code-coverage testing should thoroughly test all of your code branches and all of your possible parameter combinations (or, at least, a healthy set of the more common combinations). In addition to making sure that these code combinations “do the right thing,” what I want you to do is see if their optimal plan varies across these executions. There are a few ways to do this:

For statements: If you can isolate the statement and hard-code different combinations of parameters – this might be the easiest way to do a test. What I would do with the statement above is something like this:

SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Tripp';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Anderson';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Smith';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Abbas';
GO

Then, execute all of these in SSMS with the option “Include Actual Execution Plan” (from the Query drop-down). Now, before you think you need to understand everything about plans and what they mean… here’s the good news. For “stability-testing” you do NOT need to care what the plan does. What you need to look at is the plan shape (OK, sometimes it’s not quite this easy and I would look a little deeper [like at the types of operators and sometimes deeper still] but I will venture that the large majority of huge performance problems are caused by vastly different plan shapes and so the majority of problems can be found JUST by looking at the shape of the plan and not having to dive in much deeper).

If the plan shape varies then the optimal plan for this statement varies; do not FORCE / PREPARE it into cache. 

If the plan shape does not vary then the optimal plan for this statement is consistent / stable; this statement should be forced / prepared in the cache. (IMPORTANT NOTE: if you don’t do a very good job testing a wide-enough range of combinations you might make a bad decision here. It’s very important to have realistic data, realistic parameter combinations, and a number of them.)
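And, if you’d rather not eyeball a pile of graphical plans, here’s another rough check (a sketch against sys.dm_exec_query_stats): identical plan shapes share the same query_plan_hash, so seeing more than one hash for the same statement pattern is a red flag.

-- A rough check: after running the four hard-coded statements above,
-- group on query_plan_hash. One hash = stable shape; several = unstable.
SELECT qs.query_plan_hash,
       COUNT(*) AS statement_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) AS st
WHERE st.[text] LIKE N'%[[]dbo].[[]member]%'
GROUP BY qs.query_plan_hash;
GO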

For stored procedures: For stored procedures there’s a similar / easy way to see if the plan varies from execution to execution by using EXEC procedure parameters WITH RECOMPILE. While this does have some limitations, it can also be helpful in knowing whether a stored procedure might be prone to similar PSP problems. I did write a detailed post on that here: Building High Performance Stored Procedures.
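For example (the procedure name below is just a hypothetical stand-in wrapping the statement from above), something like this shows whether the optimal plan changes with the value:

-- A minimal sketch (hypothetical procedure wrapping the example statement):
-- WITH RECOMPILE compiles a fresh plan for each execution (and does not
-- cache it) so you can compare the plan shapes side by side.
EXEC [dbo].[MemberGetByLastName] @lastname = 'Tripp' WITH RECOMPILE;
EXEC [dbo].[MemberGetByLastName] @lastname = 'Anderson' WITH RECOMPILE;
EXEC [dbo].[MemberGetByLastName] @lastname = 'Smith' WITH RECOMPILE;
GO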

The End Result

Blog posts like the one I referenced are based in fact. And, I believe that what they’ve done did help for THEIR problem. However, trying to oversimplify this down to a single coding practice (for ALL environments) is often NOT IDEAL. Unfortunately, it depends IS the right general answer here.

If you’re looking for a lot more content around this, check out the links I’ve included in this post – there are numerous free resources including an online video I did as part of PASStv (linked to in the Building High Performance Stored Procedures post). And, if you still want more content, demos, and details, check out my two Pluralsight courses specifically targeting these problems. Start with this one: SQL Server: Optimizing Ad Hoc Statement Performance and then watch this one: SQL Server: Optimizing Stored Procedure Performance.

Have fun!
k

9 thoughts on “Prepared statements: be careful when given “THE” solution for all of your development practices”

    1. Ha… if there were only a few more hours in the day. ;-) But, I will try to do more of this. My problem is that I’m not an expert in every technology that connects to SQL – just SQL itself. So, I can definitely tell you what’s happening when I have enough information, etc. The good news is that the same DMV queries will work and you can still analyze what’s going on behind the scenes; this really helps me get to the bottom of things!

      Hope that helps a bit though!
      k

      1. Most rewarding education in my life was trying to debug an issue with a major upgrade to our website. Queries were running super slow. The developer had assured us there were no issues with the technology they were implementing for the first time. That technology? Entity Framework. We finally narrowed it down to the fact that the query from EF was using nvarchar (the default behavior), and our tables were varchar, thus making our indices useless. We told them, the site was rolled back (there were other issues involved besides the DB), and three months later it was much faster. They’ve since moved away from Entity Framework.

        1. Yeah, I hear you… and, while I completely get that there ARE benefits in rapid application design, there are also problems. No solution is perfect ALL the time (and, that certainly includes SQL Server’s defaults as well). I guess this is really what I want people to understand. It’s just like any tool – get to know how it works and you’ll use it better. In some cases, it might even mean a different tool. But, the more you know – the better your decisions will be. Understanding that defaults only work some of the time is certainly the first hurdle! Knowing that there are often multiple possibilities, the second!

          In the end, and [sadly], performance is often the trade-off for rapid application design. A bit more time spent designing and understanding the environment and you’ll probably have an application that lives longer because it’s more usable and scalable!

          Cheers,
          k

  1. Hi Kimberly, I’m glad you posted this. I recently was looking into how SSIS uses prepared statements and wasn’t finding much documentation about how sp_prepare works. One concern I had was that compilation appears to occur when sp_prepare is called, and at the time it’s called there doesn’t appear to be any information available about the run-time values for the parameters. To your knowledge, is that a correct description of how sp_prepare works?

    This would then raise a question of how cardinality is estimated with no run-time value for the parameter available. Does the optimizer have to rely on some heuristic like average density?

    Thanks if you have time to answer — some of the behavior I’m seeing with prepared statements is quite confusing.

    James

    1. Hey there James – Actually, yes, I noticed that too but I didn’t really dive into that in great depth when I first wrote this. Having said all of that – I just ran through a couple of tests and it appears to use the density_vector (average) each time. I didn’t see it “sniff” at all. So, you could look at this as both a good thing AND a bad thing. The good is that you’ll get an average plan (so, no atypical plans will be created for values that are anomalies [or, heavily skewed]). But, the bad side to this is that you’re never getting a plan for the specific values used. And, if the data isn’t largely evenly distributed AND the queries largely use the evenly distributed values, then the plan might not be good for anyone.

      So… that adds another level of frustration to this (IMO). It’s certainly a GREAT POINT!!

      Thanks,
      k

  2. “Most DBAs would really like to see only a single query plan for all parameter sizes.” – Yes, as you mentioned for STABLE plans, which is specifically what I was getting at. That was really the crux of the problem – dealing with these situations where the majority of plans ARE for the most part stable, but some of the queries have 5 – 10 variable length parameters, and when NH sends those over with the data types based on the length of the parameter value, you end up getting hundreds of plans for the same queries which are essentially identical.

    Thinking about it, and circling back to your post from 2010 on “Plan cache, adhoc workloads and clearing the single-use plan cache bloat”, I’m thinking my better course of action here is to just regularly clear the adhoc plans from cache, rather than to force parameterization and take on the risk of caching some suboptimal plans just in order to reduce compilations and the caching of these duplicate plans. Do you think that makes sense?

    Thanks Kimberly,

    Jeremy

    1. Yeah, I’d have to agree. I’m a bigger fan of clearing cache regularly and not forcing plans unless I’m absolutely certain that they’re stable. Sometimes the cost of recompilation is much lower than the cost of executing a bad plan.

      (and, sorry for the delay – I thought I had replied but I did have a couple of days where I was bouncing back / forth and maybe I didn’t actually post… sorry!)

      Cheers,
      k
