I’m not sure about how to title this OR how to avoid the term “it depends” but I also don’t feel that “it depends” is a bad phrase. Some folks think of it as a cop-out when they don’t have a better answer but the problem is that it’s often the RIGHT answer. There are rarely simple solutions that work in ALL situations. And, as a result – it depends – is really correct. What’s most important is knowing what it depends ON.
So, where am I going with this one?
I recently received a question that started by referencing a post titled: NHibernate, the database query cache and parameter sizes. My issue is that there is truth in that post (and also in the comments) but there are also a few statements that just aren’t quite right. In fact, there’s one that I thought was right but I had to really dig into it to see what was really happening and what’s happening is NOT what I expected. And, this is really the crux of the problem. Sometimes the behavior (or, solution) is completely counter-intuitive. But, above all, there’s one fatal mistake in this post (and many others) – where they state what THE solution is. Some statements are claiming that they are [ALWAYS] better than the other and unfortunately, that’s just not the case… (comments were closed so I couldn’t comment on it directly)
What I really want to dissect a bit is this part of the post:
There are arguments for both sides, but most DBAs would really like to see only a single query plan for all parameter sizes. I didn’t have the answer off the top of my head, but I was pretty sure that NHibernate did the right thing from the point of view of the database.
As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?
Not really, here is the fix, just put the following in your configuration:
And, this particular comment:
The side-effect of calling Prepare() is that a so-called “prepared statement” is created in the DB. This is essentially a statement that is precompiled in the DB and gives back a handle so future calls on THE SAME CONNECTION can be made just referencing this handle and not sending the entire sql again. If all calls to the DB were made on the same connection its true that the execution plans would be reused, but as we know – connections are drawn from the connection pool as needed and handed back in a random manner. The prepared statements on the connection are un-prepared everytime a new session is created and the connection is soft-initialized.
First and foremost, the post and the comments aren’t bad. These people are working to make their environments better and to help others to sort out their bizarre behaviors / problems. They are truly trying to help. I don’t suspect even a tiny bit of their intentionally wanting to lead anyone astray. However, some of this is just wrong. So, let me begin.
There ARE arguments for both sides.
Yes, this is TERRIBLY important. The reason there are arguments for both sides is that EACH side is correct SOME of the time. Neither is correct ALL of the time and, of course, something has to be chosen as the default. Let me try to give a basic example of this… I’ll use a traffic analogy. It’s 2am, you’ve been coding for quite a few hours, and you want to get some great diner food – in Seattle. But, you live in Redmond (across the bridge). The shortest distance between you and the restaurant will get you there in 30 minutes but requires you to cross a toll-bridge. So, the trade-offs are money for time. But, you decide to go solely “for performance” and you choose that plan… take the 520 over to Seattle.
Now, let’s say that plan is “in cache” and MUST be re-used the next time you want to go to the diner. However, the next time you want to go – it’s 4:30pm, on a Friday, and there’s a Husky game… the plan to use 520 is a bad idea. Not because it’s not the shortest distance but because of other problems (congestion, “blocking,” etc…). And, don’t get me wrong – SQL Server does NOT change plans for “blocking” or “congestion” (so, the analogy breaks down a bit here). But, you can still see the point.
A chosen plan for one set of parameters may not be best with different parameters.
And, this is why there isn’t a single right answer for EVERY statement. So, if you’re looking for THE right choice (should the statement’s plan be placed in the cache or not) then this is the tough part – what’s really best depends on what I call – the stability of the plan generated against a wide variety of different parameters.
With the right kind of testing, you can execute a statement with a variety of different parameter combinations and very whether or not each of their optimal plans has the same pattern / shape. If they do – then, I’d say that statement’s optimal plan is stable. However, (and this is often MORE likely), if the optimal plan varies, then that statement’s optimal plan is unstable (across different parameter combinations).
The end result is really a set of two rules:
- When a statement’s optimal plan is stable and does NOT vary across executions – a plan should be saved.
- When a statement’s optimal plan is unstable and varies across executions – a plan should not be saved.
“Most DBAs would really like to see only a single query plan for all parameter sizes.”
Well, only when the plan is stable. If the plan is unstable then another problem occurs – this is often referred to as “parameter sniffing problems” or “parameter sensitivity problems” (a.k.a. PSP). PSP can actually end up being a worse problem. If a bad plan sits in cache and is re-used for all executions (executing an inefficient plan) then you can end up with statements doing a tremendous amount of I/O or spilling to disk with bad memory grants or doing the wrong type of join, etc. when they shouldn’t. I’ve seen PSP essentially take a server down (causing so much I/O that the server is essentially inaccessible and unable to process any additional requests. Some clients even “react” to this by restarting SQL Server… problem solved. Well, cache cleared and the bad plan will be thrown out. Hopefully, the next execution will be what we want and a reasonable plan will get into cache. At least temporarily.
And, this is how it goes. Poor performance occurs and often a variety of things are blamed (cache or I/O or statistics or indexes) and some quick fix (updating statistics is another common one) SEEMS to solve the problem. But, ultimately, it’s probably a side-effect of what you’ve done that’s solved the problem (NOTE: Updating statistics often “invalidates” plans so the side-effect of having updated statistics might be what solved your problem and NOT the updating of statistics itself). I talked about this in a related post: What caused that plan to go horribly wrong – should you update statistics?
So, to bring it back to the main point… HOW you execute and whether or not YOU choose to force the statement to be cached are VERY important decisions and unfortunately depend on a statement by statement basis. I’ll try to bring it all together in just a moment.
“I was pretty sure that NHibernate did the right thing from the point of view of the database. As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?“
First off, I’d have to agree WITH the default. The default is often referred to as “simple” parameterization. This is where SQL Server looks at the statement and decides – no, I shouldn’t cache a single plan because the plan might not be “safe” across all executions. And, as a result, it must be compiled for EVERY execution. I agree with the default because many executing a bad plan can be WAY more expensive than recompiling to get a good plan and executing that.
But, there is a bad side to this too. If the statement is STABLE then recompiling every time wastes time in compiling (CPU) and, it can also lead to plan cache bloat. (I’ve talked more about this here: Plan cache and optimizing for adhoc workloads)
So… I think that NHibernate is doing the right thing by default.
But, having said that, that default is NOT great for everything.
“Just put the following in your configuration: <property name=‘prepare_sql’>true</property>”
MAYBE – but, only when the plan is STABLE. (yes, I know… I still haven’t fully described how you can tell if a plan is stable)
So, what does that do? With a bit of digging (and, from the email that I received), it appears as though this essentially uses / created a “prepared” statement. And, this also ties into one the comment that I pointed out above. And, while the comment sounds correct (and, I even agreed with it at first); that’s not actually how it works (yes, I was surprised too).
Prepared statements using sp_prepare
It turns out that telling NHibernate to use prepare_sql = true effectively does something like the following:
DECLARE @handle int
EXEC sp_prepare @handle OUTPUT,
N'SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] LIKE @p1';
EXEC sp_execute @handle, N'Anderson';
EXEC sp_unprepare @handle;
At first glance it seemed like SQL would have to generate a different plan for every execution (except those that used the same handle) but I found that as long as statement was the same – the plan was the same. ACROSS CONNECTIONS. This surprised me. Even with the sp_unprepare. Even with the explicit @handle. YES, they re-used the plan. So, the effect is what was desired by the prepare_sql property being set to true. But, this is NOT always ideal (again, remember, the execution of a bad plan can often be more expensive than the cost of recompilation). But, not, recompiling every time should be avoided where the plans are stable.
So, then I started wondering about what sp_unprepare does and I found another post discussing this a bit more: Watch out those prepared SQL statements which is interesting in and of itself but that just exposes yet-another issue with where / how the handles are managed. So, if you’re calling these yourself then you definitely want to sp_unprepare. But, that still doesn’t get rid of the PSP problem. And, again, these PSP problems can be across connections.
Prepare your statements or not – that is the question?!
It really all boils down to this – effective testing. Most developers are fantastic at code coverage testing. However, most don’t know about what I call “plan stability” testing. The good news is that if you are doing good code-coverage testing then it should make code-stability testing easy!
Code-coverage testing should thoroughly test all of your code branches and all of your possible parameter combinations (or, at least, a healthy set of the more common combinations). In addition to making sure that these code combinations “do the right thing,” what I want you to do is see if their optimal plan varies across these executions. There are a few ways to do this:
For statements: If you can isolate the statement and hard-code different combinations of parameters – this might be the easiest way to do a test. What I would do with the statement above is something like this:
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Tripp';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Anderson';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Smith';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Abbas';
Then, execute all of these in SSMS with the option “Include Actual Execution Plan” (from the Query drop-down). Now, before you think you need to understand everything about plans and what they mean… here’s the good news. For “stability-testing” you do NOT need to care what the plan does. What you need to look at is the plan shape (OK, sometimes it’s not quite this easy and I would look a little deeper [like at the types of operators and sometimes deeper still] but I will venture that the large majority of huge performance problems are caused by vastly different plan shapes and so the majority of problems can be found JUST by looking at the shape of the plan and not having to dive in much deeper).
If the plan shape varies then the optimal plan for this statement varies; do not FORCE / PREPARE it into cache.
If the plan shape does not vary then the optimal plan for this statement is consistent / stable; this statement should be forced / prepared in the cache. (IMPORTANT NOTE: if you don’t do a very good job testing a wide-enough range of combinations you might make a bad decision here. It’s very important to have realistic data, realistic parameter combinations, and a number of them.)
For stored procedures: For stored procedures there’s a similar / easy way to see if the plan varies from execution to execution by using EXEC procedure parameters WITH RECOMPILE. While this does have some limitations, this can also be helpful at knowing if a stored procedure might be prone to similar PSP. I did write a detailed post on that here: Building High Performance Stored Procedures.
The End Result
Blog posts like the one I referenced are based in fact. And, I believe that what they’ve done did help for THEIR problem. However, trying to over simplify (for ALL environments) a single coding practice is often NOT IDEAL. Unfortunately, it depends IS the right general answer here.
If you’re looking for a lot more content around this, check out the links I’ve included in this post – there are numerous free resources including an online video I did as part of PASStv (linked to in the Building High Performance Stored Procedures post). And, if you still want more content, demos, and details, check out my two Pluralsight courses specifically targeting these problems. Start with this one: SQL Server: Optimizing Ad Hoc Statement Performance and then watch this one: SQL Server: Optimizing Stored Procedure Performance.