In Nov 2014, SQLskills made an offer to user group leaders to deliver remote user group sessions in 2015 and we’ve been having great fun delivering them. So far the team has delivered 34 sessions and we have 72 signed up! At my last session (for the San Diego SQL Server User Group), I delivered a session on stored procedures. It was great fun and I had a few questions at the end that made me come up with some new sample code… that’s where this post is coming from!
The question was because I said to make note of something being a parameter and not a variable (during the lecture) and so the question was – what’s the difference and how does that change the behavior?
Parameters are values that are being passed into a stored procedure. Since these are part of the call to execute the procedure; these are “known” during compilation / optimization (but, that’s only IF SQL Server has to compile / optimize, more on that in a minute)
Variables are assigned at runtime and therefore are “unknown” during compilation / optimization.
Literals are known at all times as they are exactly that – a simple / straightforward value
Review this heavily commented stored procedure:
CREATE PROCEDURE ParamVarLit (@p1 varchar(15)) -- defining the parameter name @p1 AND the data type AS -- Declarations / assignments DECLARE @v1 varchar(15); -- defining the variable name @v1 AND the data type SELECT @v1 = @p1; -- assigning the variable to the parameter input value DECLARE @v2 varchar(15); -- defining the variable name @v2 AND the data type SELECT @v2 = 'Tripp'; -- assigning the variable to a literal value -- Note: also acceptible is this format -- DECLARE @v2 varchar(15) = 'Tripp' -- Statement 1 (using a parameter) SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = @p1 -- Statement 2 (using a variable) SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = @v1; -- or @v2, these will work EXACTLY the same way! -- Statement 3 (using a literal) SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Tripp'; GO
In the stored procedure you can see that there’s one input parameter, we’ll assign that when we call the stored procedure.
We can assign parameters by name (line two) or we can assign them by position (note, when you have a lot of parameters this can be frustrating)
EXEC [dbo].[ParamVarLit] @p1 = 'Tripp' -- assignment by name EXEC [dbo].[ParamVarLit] 'Tripp' -- assignment by position
Outside of that, everything else is being defined / assigned within the stored procedure.
Now, we have to see how each of them works from an optimization perspective. To do this, I’m going to use a sample database called Credit. You can download a copy of it from here. Restore the 2008 database if you’re working with 2008, 2008R2, 2012, or 2014. Only restore the 2000 database if you’re working with 2000 or 2005.
Also, if you’re working with SQL Server 2014, you’ll potentially want to change your compatibility mode to allow for the new cardinality estimation model. However, I often recommend staying with the old CE until you’ve done some testing. To read a bit more on that, check out the section titled: Cardinality Estimator Options for SQL Server 2014 in this blog post.
So, to setup for this demo – we need to:
(1) Restore the Credit sample database
(2) Leave the compatibility mode at the level restored (this will use the legacy CE). Consider testing with the new CE but for this example, the behavior (and all estimates) don’t actually change.
(3) Update a row and add an index – use this code:
USE Credit; GO UPDATE [dbo].[member] SET [lastname] = 'Tripp' WHERE [member_no] = 1234; GO CREATE INDEX [MemberLastName] ON [dbo].[member] ([Lastname]); GO
To execute this procedure it’s simple – just use one of the execution methods above and make sure that you turn on “Show Actual Execution Plan” from the Query, drop-down menu.
EXEC [dbo].[ParamVarLit] @p1 = 'Tripp' -- assignment by name GO
This is what you’ll see when you execute:
Notice that the first statement and the third statement both use an index to look up the row but statement two does not. Why? It’s all tied to whether or not the value was “known” at the time of optimization. And, most importantly – this procedure was the executed when there wasn’t already a plan in cache. Because there wasn’t already a plan in cache, SQL Server was able to optimize this procedure for the parameters passed in at THIS execution. This does NOT happen for subsequent executions.
More specifically, when the statement KNOWS that the value is ‘Tripp’ (statement 1 knows because it’s getting optmized for ‘Tripp’ and statement 3 knows because the value is hard-coded for ‘Tripp’) then SQL Server can look to the statistics to determine how much data is going to be processed. In this case, SQL Server estimates that there are very few rows with a last name of ‘Tripp’ (from the statistics, it thinks there’s only 1). As a result, an index would be helpful to find this highly selective result so it chose a plan to use an index.
For statement 2 though, SQL Server doesn’t seem to know to use an index. Why? Because here the variable (@v1) was unknown at the time of compilation / optimization. The variable is not assigned until actual execution but execution only occurs after a plan has been generated. So, the problem with variables is that SQL Server doesn’t know their actual values until after it’s chosen a plan. This can be both good and bad. Remember, SQL Server has to do something… So, in this case, SQL Server uses an average to estimate the rows and come up with a plan. This average comes from the density_vector component of a statistic rather than this histogram. If your data is reasonably evenly distributed then this can be good. And, it also means that the plan won’t change change after it’s kicked out of cache and a different execution occurs with different parameters. Some have learned this trick and have used it with success – but, only because their data is either evenly distributed OR the executions are using values that all resemble the average.
NOTE: This is EXACTLY the same behavior as using the OPTION (OPTIMIZE FOR UNKNOWN) clause on the statement within the stored procedure.
In this case, however, ‘Tripp’ is NOT like the average value and so the plan for statement 2 is not ideal for a variable assigned to ‘Tripp’. The data has a lot of duplicates and the average number of rows for most names is quite high (where an index is no longer userful). However, ‘Tripp’ is really not an average data value here and so the plan might be good for most other values. But, in this case, it’s not good for the value Tripp.
Execute the procedure again but supply a different value for @p1:
EXEC [dbo].[ParamVarLit] @p1 = 'Anderson' -- assignment by name GO
This is what you’ll see when you execute with ‘Anderson’ AFTER having created a plan for ‘Tripp':
Wait – there’s no difference?
Nope, absolutely none! Really, review every aspect of your output /plan [not the actual values] and you’ll see it’s exactly the same! The plan that you see is always the estimated plan and the estimated plan is chosen when the stored procedure is optimized / compiled. Optimization / compilation occurs only when there isn’t already a plan in cache for that procedure.
Why are they the same?
The parameter – ‘Tripp’ was used on the first execution and this is what was “sniffed” and use for optimization. When I say “sniffed” all that means is that the value was KNOWN such that the optimizer could look at the statistics (and specifically the histogram) to estimate how many rows had a last name of ‘Tripp.’ It turns out that the estimate was 1. You can see this by hovering over the Index Seek in the first statement:
If you were to kick this plan out of cache and re-execute with ‘Anderson’ then something interesting would happen:
EXEC [sp_recompile] '[dbo].[ParamVarLit]'; GO EXEC [dbo].[ParamVarLit] @p1 = 'Anderson'; GO
Results in this plan:
There’s really one incredibly important observation here: ONLY the first statement’s plan changed!
The first statement’s plan changed because on this execution SQL Server was able to “sniff” the parameter and optimize / compile a plan specific to it. More specifically, when the statement KNOWS that the value is ‘Anderson’ (again, only statement 1 knows this) then SQL Server can look to the statistics to determine how much data is going to be processed. In this case, SQL Server estimates that there are numerous rows with a last name of ‘Anderson’ (from the statistics, estimates 385 rows). As a result, the data is not selective enough to warrant using an index so in this case, SQL Server uses a table scan. (shown as a clustered index scan solely because the table has a clustered index)
Bringing It All Together
Parameters are evaluated and sniffed ONLY when a plan is being created. This ONLY happens when a plan is NOT already in the cache. And, sniffing is fantastic for THAT specific execution because “sniffing” a parameter lets SQL Server use the histogram component of statistic to determine the estimate. While the histogram is not always perfect, it’s usually a more accurate way of estimating rows. But, this can also lead to parameter sniffing problems where subsequent executions don’t perform well because the plan in cache wasn’t optimized for their values. There are many solutions to this problem, I covered a few of the options here: Building High Performance Stored Procedures.
Variables are ALWAYS deemed “unknown” and they cannot be sniffed. In this case, SQL Server doesn’t have a value to lookup in a statistic’s histogram. Instead, SQL Server uses an average to estimate the rows and come up with a plan. But. as I mentioned – this can be sometimes good and sometimes bad.
The literal is the easiest of them all. SQL Server knows this value and there’s absolutely nothing that will ever change that value. This can be sniffed and it will use the histogram getting the best estimate to use for optimization.
Play around with this sample procedure. Review the plans and the estimates v. actuals. Next week I’ll dive more into the statistics themselves and where the specific estimates are coming from!
Have fun and thanks for reading!