(USE THIS): New SQL Server 2012 rewrite for sp_helpindex

Since I’ve rewritten sp_helpindex a few times, I have a few blog posts in this category. Each time I do an update I’ll make the NEW version titled USE THIS.

To use my version of sp_helpindex, you need TWO scripts. One script is version-specific and the other works on versions 2005, 2008/R2 and 2012. All versions need this generic base procedure to produce the detailed output.

Step 1: Setup sp_SQLskills_ExposeColsInIndexLevels

Create this procedure first: sp_SQLskills_ExposeColsInIndexLevels.sql (7 kb).

This is what gives us the tree/leaf definitions. And, this works for all versions: 2005, 2008/R2, and 2012.

Step 2: Setup the replacement procedure for sp_helpindex. This IS version specific:

On SQL Server 2005, use: sp_SQLskills_SQL2005_helpindex.sql (11 kb) to create sp_SQLskills_SQL2005_helpindex.

On SQL Server 2008, use: sp_SQLskills_SQL2008_helpindex.sql (12 kb) to create sp_SQLskills_SQL2008_helpindex. (NOTE: This does run on SQL Server 2012 but if your table has a columnstore index, it will generate an error.)

On SQL Server 2012, use: sp_SQLskills_SQL2012_helpindex.sql (12 kb) to create sp_SQLskills_SQL2012_helpindex.

Step 3: Setup a hot-key combination

Optionally, set up this procedure to be invoked through a keyboard shortcut using Tools, Options, Environment/Keyboard. I usually make it Ctrl+F1 and I described how to do this here.
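Once everything is created, you call the replacement just like sp_helpindex – pass in the object name. A quick sketch (this assumes the procedure keeps sp_helpindex’s single object-name parameter; dbo.member is just the sample table from the output below):

EXEC sp_SQLskills_SQL2012_helpindex 'dbo.member';
go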

The Output

On SQL Server 2012, the output will look like the following (index_id 5 is a columnstore index):

index_id is_disabled index_name index_description index_keys included_columns filter_definition columns_in_tree columns_in_leaf
1 0 [member_ident] clustered, unique, primary key located on PRIMARY [member_no] NULL NULL [member_no] All columns “included” – the leaf level IS the data row.
2 0 [member_corporation_link] nonclustered located on PRIMARY [corp_no] NULL NULL [corp_no], [member_no] [corp_no], [member_no]
3 0 [member_region_link] nonclustered located on PRIMARY [region_no] NULL NULL [region_no], [member_no] [region_no], [member_no]
4 0 [LastNameInd] nonclustered located on PRIMARY [lastname] NULL NULL [lastname], [member_no] [lastname], [member_no]
5 0 [columnstore_index] nonclustered columnstore located on PRIMARY n/a, see columns_in_leaf for details n/a, columnstore index n/a, columnstore index n/a, columnstore index Columns with column-based index: [member_no], [lastname], [firstname]

I hope this helps you when looking at your indexes!

Enjoy,
kt

The Accidental DBA (Day 28 of 30): Troubleshooting: Blocking

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

SQL Server uses locking for a variety of things – from protecting and isolating resources to “indicators” showing that someone is using a database or accessing a table. Many locks are compatible with other locks; it’s not always about limiting a resource to only one session. But, when locks are incompatible blocking can occur. If the blocker is efficient the blocked user might not even realize that they were momentarily blocked. And, a normal system will always have some blocking; it’s a natural occurrence. However, if there are long-running transactions or transactions that affect large numbers of rows – blocking can really feel like a major problem. But, is it really the locks? Or, instead the blocker?

Root Cause Analysis

In many cases, root cause analysis will reveal some inefficiency in your environment:

  • Imagine a poorly coded update that sets every column to the value in the dialog because the developer didn’t determine WHICH column(s) actually changed. This causes modifications to the data row and to numerous indexes whose columns aren’t even changing.
  • Imagine an update that only needs to modify a subset of rows but has no index to help find them, so it locks far more than it needs to.
  • Imagine a transaction that begins, modifies some data, and then waits for user input. The locks are held at the time the data is modified and modification-related locks are not released until the transaction completes. In this case it’s an indefinite amount of wait time. This can cause HUGE problems – not just blocking but also in logging (and therefore recovery).
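To make that last scenario concrete, here’s a minimal sketch (table, column, and values are just placeholders) of a transaction that modifies a row and then sits there holding its locks:

BEGIN TRANSACTION;

UPDATE dbo.member                -- placeholder table/column names
    SET lastname = N'Tripp'
    WHERE member_no = 42;        -- X lock on this row (and related index rows) is now held

-- ... the application now waits for user input ...
-- any incompatible request against these resources is blocked until:

COMMIT TRANSACTION;              -- (or ROLLBACK)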

My point: locking is not always the cause of the problem but it often gets blamed. Instead, locking and blocking are really symptoms of some inefficiency, and further analysis will help you to better understand where your real problem is. But, how do you analyze it?

Analyzing Blocking

When performance is poor there are many options to check. In general, we always recommend starting with wait statistics. In Erin’s post The Accidental DBA (Day 25 of 30): Wait Statistics Analysis she mentions using sys.dm_os_wait_stats. Regularly using this and collecting a baseline of your server’s general characteristics will help you when your system is slow or to see if something’s changed (or changing) over time. Be sure to read Erin’s post as well as the posts she references. The more you know about your server when it’s healthy, the more equipped you’ll be when there’s a problem.

And, if you have a blocking situation right now then the DMV to use is sys.dm_os_waiting_tasks. This can tell you if someone is blocked and which SPID (server process ID) is the blocker. However, this can quickly become complicated if there are multiple connections (SPIDs) involved. Sometimes, finding who is at the head of the chain is part of the problem. And, since you’ll need to know more about what’s going on, you’ll want to use sys.dm_tran_locks. And, instead of reinventing the wheel, check out Glenn Berry’s A DMV A Day – Day 27 (sys.dm_tran_locks), specifically, for the blocking query that orders by wait_duration_ms DESC. This will give you an idea of who’s at the head of the chain because the lock held the longest will be at the top – showing who they’re being blocked by.  This will lead you to the blocker. But, what are they doing?
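Before answering that, here’s a bare-bones version of that waiting-tasks query – a minimal sketch only, not a replacement for Glenn’s more complete one:

SELECT wt.session_id,
       wt.blocking_session_id,
       wt.wait_duration_ms,
       wt.wait_type,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.blocking_session_id IS NOT NULL
ORDER BY wt.wait_duration_ms DESC;    -- longest-blocked request first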

Once you know the SPID at the head of the chain, you can use a variety of commands to start piecing together what’s happening. But, I’d actually recommend a few other things instead:

  1. The completely OLD SCHOOL method is sp_blocker_pss08. You can get the code from this KB article [KB 271509]. The article says it’s only for SQL Server 2000 and SQL Server 2005 but it still works well – even on SQL Server 2012. And, if your company has an issue with running third-party products, then this might work out well for you. It’s simple, it’s just TSQL, and it gives you a variety of pieces of information if something is blocked right now.
  2. The up-to-date way to determine locking problems is to use SQLDiag. But, there’s a bit of a learning curve with this as it’s command-line based and requires a bit more work than just the execution of an sp. You should definitely get some time with it but if you’re trying to troubleshoot a blocking problem right now, now is not the time to learn SQLDiag.
  3. The easiest (third-party tool) is Adam Machanic’s sp_WhoIsActive and it really does a nice job of producing the information that the old sp_blocker_pss08 produces but in tabular form. And, Adam has blogged quite a bit of information about using this utility.
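For example, with a current version of Adam’s procedure installed, something like this shows the blocking chain and the locks involved (check the parameter names against the version you download):

EXEC sp_WhoIsActive
    @find_block_leaders = 1,    -- identify the head(s) of any blocking chains
    @get_locks = 1;             -- include per-session lock details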

And, if you’re trying to see if patterns exist over time, consider Using the Blocked Process Report in SQL Server 2005/2008. Jonathan did a great write-up of how to set this up and use it to generate a trace of the activity that’s running at the time a process hits 5 seconds of being blocked.

The Resolution

Ultimately, you need to find out who is doing the blocking first – why is their transaction taking so long? If it’s due to inefficiencies in the query – can you rewrite it? If it’s due to inefficiencies in the plan – can you add an index? If it’s modifying a large amount of data – can you break it down into smaller chunks so that each set is locked for a shorter period of time? These are ALWAYS the things to try first.

Consider Row Versioning

If you truly have an optimized system and it’s highly active with both readers and writers who are just constantly getting in each other’s way (and causing blocking), then you might consider using a form of row versioning. This is much more complicated than a quick post can capture but I’ve seen “snapshot isolation” (as it’s often called) explained incorrectly in numerous places. Simply put, you can have your database in one of FOUR states:

  1. Read Committed using Locking: this is the default – with NONE of the row versioning options enabled.
  2. Statement-level Read Consistency (or, read committed using row versioning): this is what you get if you turn on ONLY the database option read_committed_snapshot. This causes readers (in read committed isolation) to use version-based reads, guaranteeing them a definable point in time to which their QUERY (or, statement) reconciles. Each statement reconciles to the point in time when that statement began.
  3. Transaction-level Read Consistency (or, Snapshot Isolation): this is what you get if you turn on ONLY the database option allow_snapshot_isolation. This ALLOWS users to request a version-based read and, in a transaction, will cause ALL reads in the transaction to reconcile to when the transaction began. However, it’s important to note that this option adds the overhead of versioning but readers will use locking unless they request a snapshot isolation session using: SET TRANSACTION ISOLATION LEVEL SNAPSHOT.
  4. The fourth state is when BOTH database options have been set. If you turn on both read_committed_snapshot and allow_snapshot_isolation then all statements reconcile to the point in time when the statement started (in read committed). Or, if you’ve changed your isolation to snapshot, then each statement will reconcile to the point in time when the transaction began.

NOTE: There are numerous places [online] where it’s stated that both of these options are required for versioning; this is incorrect. You can have neither, only one, or both. All four states produce different behaviors.
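In case it helps, here’s how the two database options map to T-SQL (the database name is a placeholder; note that turning READ_COMMITTED_SNAPSHOT on has to wait until yours is the only active connection in the database):

-- State 2: statement-level read consistency for read committed readers
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- State 3: allow transaction-level (snapshot) read consistency on request
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A session then opts in to snapshot isolation with:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- State 4 is simply both options ON; state 1 (the default) is both OFF.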

The beauty of versioning is that readers don’t block writers and writers don’t block readers. Yes, I know it sounds fantastic but be careful, it’s not free. You can read the detailed whitepaper that I wrote about it when it was first released in SQL Server 2005 (updated by Neal Graves): SQL Server 2005 Row Versioning-Based Transaction Isolation. And, I also did a video on it here: http://technet.microsoft.com/en-US/sqlserver/gg545007.aspx. And, if you think that the overhead might be too much for your system, check out the case study from Nasdaq: Real-Time Reporting with Snapshot Isolation.

Ultimately, versioning might be what you’re looking for but don’t jump right to this without thoroughly tuning your environment.

In Summary

  1. Find where you’re waiting most
  2. Do root cause analysis to get to the code or SPID that’s causing you grief
  3. Analyze the code to see if it can be changed to reduce the time that the locks are held
  4. Consider changing isolation

Thanks for reading!
kt

PS – Stay tuned for day 29 when Jonathan talks about when blocking becomes deadly! :)

The Accidental DBA (Day 20 of 30): Are your indexing strategies working? (aka Indexing DMVs)

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

As an accidental DBA you are constantly wearing many hats. You’ve heard that indexes are critical for performance (and that’s absolutely true) but it’s not just any indexes – it’s the RIGHT indexes. The unfortunate thing about indexing is that there’s both a science and an art to it. The science of it is that EVERY single query can be tuned and almost any SINGLE index can have a positive effect on certain/specific scenarios. What I mean by this is that I can show an index that’s ABSOLUTELY fantastic in one scenario but yet it can be horrible for others. In other words, there’s no short cut or one-size-fits-all answer to indexing.

This is where the art comes in – indexing for performance is really about finding the right balance between too many and too few indexes, as well as trying to get more from the indexes that you do keep.

Having said that, it’s way beyond the scope of what we can talk about in a short post. Unfortunately, a full discussion about these things would take a lot of time but I do have a few links that might help if you’re interested in digging in a bit deeper:

  • How do indexes work? (Index Internals video and demo)
  • What’s the clustered index? (Clustered Index Debate video)
  • What are good/general strategies for indexing? (Indexing Strategies video and demo)
  • When should columns be put in the key (and in what order should they be defined) versus when should they be in the INCLUDE list?
  • Would we benefit from a filtered index?

These are difficult discussions. And really, if you’re only meant to “maintain” the system and keep the lights on – these are probably beyond what you can effectively do in your already-packed day.

So, whose job is it to create and define the database’s indexes? It’s probably the application developers. These might be in-house developers (which is good – you’ll have someone with whom to consult when you do your analysis) or they might be vendor developers (which might be both good and bad). The good side is that some vendors are open to discussions on their customer support lines and may help you through the issues that you’re seeing. The bad news is that some vendors are not open to discussions and they might not support you if you make any indexing changes.

So, first and foremost – make sure you thoroughly understand the environment you’re about to analyze and do not change anything without verifying that you can.

Introduction

The good news is that it’s NOT all doom and gloom. There are some very helpful DMVs and resources that you can use to analyze your environment and see where some of your indexing issues are. Again, you might not be able to change them (immediately) but, you will be armed with information to help you discuss what you are seeing.

When I’m analyzing a new-to-me system, I tend to break down my index analysis into three parts:

  1. Are there any indexes just lying around not doing anything useful… time to get rid of the dead weight!
  2. Are there any indexes that are bloated and unhealthy – costing me time and space… time to analyze the health of my existing (and useful) indexes.
  3. Then, and only then, do I consider adding more indexes.

Part I: Getting rid of the dead weight

Fully duplicate indexes

SQL Server lets you create redundant/duplicate indexes. This is annoying but it’s always been the case. It certainly raises the question of why SQL Server lets you do this and I wrote up an answer to this in an article on SQL Server Magazine here: Why SQL Server Lets You Create Redundant Indexes (http://sqlmag.com/blog/why-sql-server-lets-you-create-redundant-indexes). Regardless of why, you still need to remove them. And, even if you don’t remember seeing duplicate indexes, you might be surprised. Without knowing index internals, it might be harder to recognize duplicates than you think. It’s not always as easy as Index1 on col1 and Index2 on col1. Internally, SQL Server adds columns to your index and most commands (like sp_helpindex) do not show these internally added columns. The good news is that I have a version of sp_helpindex that does show you the entire structure. And, tied to that updated version of sp_helpindex, I built a script for finding duplicate indexes at either the table-level or database-wide. Check out these links:

But, you could BREAK the application if they’ve used index hints. So, beware! Generally, it might be best to disable an index for a while before you just drop it.
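To illustrate the earlier point about internally added columns, here’s a tiny hypothetical pair (assuming member_no is the clustering key of dbo.member): these two nonclustered indexes look different but are physically identical, because SQL Server silently appends the clustering key to the first one:

CREATE NONCLUSTERED INDEX ix_member_lastname
    ON dbo.member (lastname);               -- internally: (lastname, member_no)

CREATE NONCLUSTERED INDEX ix_member_lastname_memberno
    ON dbo.member (lastname, member_no);    -- explicitly: (lastname, member_no)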

Unused Indexes

Almost as expensive as a duplicate index is one that never gets used. However, this is a lot more challenging to determine. There is a fantastic DMV (sys.dm_db_index_usage_stats) that gives you information about index usage but it’s not perfect. And, some of the behaviors have changed in some releases (sigh). If you really want to understand your index usage patterns, you’ll have to persist this information over a full business cycle and be sure to persist it prior to index maintenance (see this connect item: Rebuilding an index clears stats from sys.dm_db_index_usage_stats). https://connect.microsoft.com/SQLServer/feedback/details/739566/rebuilding-an-index-clears-stats-from-sys-dm-db-index-usage-stats Note: this is only an issue in SQL Server 2012.

But, again, even the information tracked in this DMV isn’t perfect. One of my biggest frustrations is that user_updates only tracks the number of STATEMENTS, not the number of ROWS modified. For example, if I execute this statement (without a WHERE clause) UPDATE Table SET ColumnX = Value and it affects 10,000 rows, then the user_updates column will be incremented by 1 for BOTH the table and any indexes that include ColumnX. So, the actual modification activity might be higher (possibly MUCH higher) than the value shown for updates.

And, there’s more to it than that. Instead of duplicating this information, I’ll link to a FANTASTIC post by Microsoft PFE Ignacio [Nacho] Alonso: FAQ around sys.dm_db_index_usage_stats. http://blogs.msdn.com/b/ialonso/archive/2012/10/08/faq-around-sys-dm-db-index-usage-stats.aspx.
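As a starting point (and only a starting point, for all of the reasons above), a sketch like this shows how little each nonclustered index has been read since the counters were last cleared:

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1            -- nonclustered indexes only
ORDER BY ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0);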

Finally, both Paul and I have written about this DMV as well as how to persist it. Check out these posts:

Similar or semi-redundant indexes

You might have some indexes that are good candidates for consolidation:

  • Indexes that have the same key (but possibly different INCLUDEd columns)
    • Index1: Key = LastName
    • Index2: Key = LastName, INCLUDE = FirstName
    • In this case you don’t “NEED” Index1. There’s NOTHING that Index1 does that Index2 cannot also do. However, Index2 is wider. So, a query that solely wants the following will have more I/Os to do because of the wider index:
      • SELECT LastName, count(*) FROM table GROUP BY LastName
    • But, the argument is – how critical is that query? How often is that index really used? Remember, you can use sys.dm_db_index_usage_stats to help you determine how often it’s used.
  • Indexes that have left-based subsets of other index keys
    • Index1: Key = LastName, FirstName, MiddleInitial
    • Index2: Key = LastName INCLUDE = SSN
    • Index3: Key = LastName, FirstName INCLUDE = phone
    • In this case each index does provide some specific (and unique) uses. However, you have a lot of redundancy there.
    • What if you created a new Index: LastName, FirstName, MiddleInitial INCLUDE (SSN, phone)
    • Again, this new index is wider than any of the prior 3 but this new index has even more uses and it has less overall overhead (only one index to maintain, only one index on disk, only one index in cache [and, it’s more likely to stay in cache]). But, you still have to determine how critical each of the queries is that was using the narrower indexes, as well as how much more expensive they are with the new index.
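A hedged sketch of what that consolidation might look like (the names are purely illustrative, and you’d test the critical queries before dropping anything):

-- One wider index replaces the three narrower ones described above
CREATE NONCLUSTERED INDEX ix_member_name_consolidated
    ON dbo.member (LastName, FirstName, MiddleInitial)
    INCLUDE (SSN, phone);

-- Only after verifying the workload:
-- DROP INDEX Index1 ON dbo.member;
-- DROP INDEX Index2 ON dbo.member;
-- DROP INDEX Index3 ON dbo.member;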

So, this is where the “art” of indexing comes into play. Index consolidation is a critical step in reducing waste and table bloat but there isn’t a simple answer to every consolidation option. This is another “it depends” case.

Part II: Analyze the health of your existing indexes

This topic has been talked about in many places. And, we’ve even chatted about it in our Accidental DBA series here: The Accidental DBA (Day 14 of 30): Index Maintenance.

In the context of this post, I want to make sure that after I’ve cleaned up the dead weight, my existing and useful indexes are healthy. And, you might want to review your index maintenance strategies to see if they can be made “lighter” and take less time. And, be sure that they don’t miss anything. A couple of key reminders:

  • Make sure your index maintenance routines look at indexes on tables AND views
  • Make sure your index routines use a LIMITED scan if you’re only analyzing avg_fragmentation_in_percent
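For example, a LIMITED scan against the current database looks something like this (the 1,000-page filter is just a common rule of thumb, not a hard rule):

SELECT OBJECT_NAME(ps.object_id) AS table_name,
       ps.index_id,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
WHERE ps.page_count > 1000
ORDER BY ps.avg_fragmentation_in_percent DESC;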

And, here are a few other resources that you might find handy on this topic:

Part III: Adding more indexes

This is a tricky one. There are lots of good/bad practices around adding indexes. One of the worst is that most folks just add indexes without really fully analyzing (and CORRECTLY analyzing) their existing indexes. The reason I say correctly analyzing their existing indexes is that the tools (like sp_helpindex and SSMS) hide some of the information about columns that might have been added to your indexes. So, unless you really know what your indexes look like you won’t be able to correctly add new indexes while consolidating your existing indexes.

The primary tool that I want to discuss here is the “user impact” aspect of the missing index DMV queries that exist out there (and, there are some great examples of using the missing index DMVs). And, while I STRONGLY encourage you to use them as a GUIDE, I do want you to remember that they’re not perfect. Here are my main issues/concerns/gripes:

  • The missing index DMVs (and therefore the “index/green hint” that shows up in showplan) only tune the plan that was executed. If the plan performed a hash join then the index is going to help the hash join. But, it’s unlikely that the join type will change. And, it might be the case that a different index would perform a different join type and the query would be even faster. If you’re about to trust the missing index DMVs’ recommendations (or, the green hint), then consider reverse-engineering the queries that are being tuned by these recommendations (see Jon’s post on how to do this) and then (if possible) run these queries through DTA (the Database Engine Tuning Advisor). DTA has capabilities that the missing index DMVs do not in that DTA can “hypothesize” about alternate strategies. This makes the index recommendations even better!
  • The missing index DMVs only think about the BEST index for EACH query. And, that does make sense (from a QUERY tuning perspective) but, you need to do SYSTEM tuning. You can’t just create individual indexes for each and every query that needs one. You definitely want to consider the indexes that have the highest user impact but you also don’t want to forget about consolidation.
  • The missing index DMVs can show indexes that you already have (see: Missing index DMVs bug that could cost your sanity…).
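For reference, the typical shape of those queries is below. This is a bare-bones sketch; the “impact” expression is one common rough scoring, not an official formula:

SELECT mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost * migs.avg_user_impact * migs.user_seeks AS rough_impact
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY rough_impact DESC;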

Here are a few Missing Index DMV queries/resources:

Summary

These are the primary things that I’m looking for when I want to see how the already implemented indexing strategies are working as well as the order in which I begin to analyze and change indexes. But, beware: you can negatively impact the environment so it’s important that adequate testing is done to make sure that what you’re doing has a net-positive effect.

Finally, I also did a video summarizing these things that I’m describing here – you might want to check this out as well: http://technet.microsoft.com/en-US/sqlserver/gg545020.aspx.

Thanks for reading!
kt

The Accidental DBA (Day 15 of 30): Statistics Maintenance

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

When you execute a query that’s going to process a single row, the plan to access that row might be very simple – use an index to find the row and then look up the data. If you execute a query that’s going to process thousands of rows, the plan to gather that data might be more complicated. The real question here isn’t the plan itself but how did SQL Server know that there was going to be one row or thousands of rows to access? To create a good plan, SQL Server needs to know your data (before it goes to the data) in order to access the data efficiently.  This is why statistics exist.

What are statistics?

Statistics are objects in the database, stored as a BLOB (binary large object). Generally, you don’t create them directly; they are created with indexes or auto-created by SQL Server when the query optimizer (the system that decides the most efficient way to access the data) needs better information about your data than what it has currently. The latter creation scenario is tied to a database option: auto create statistics. This database option is on by default and for Accidental DBAs, I recommend that this stay on. As for manually creating statistics, there are cases where creating statistics can be extremely useful but they tend to be warranted for VLTs (very large tables). For today, I’ll save that discussion as it’s out of scope for a typical Accidental DBA.*
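You can check where these options stand across your databases with a quick query (both columns have been in sys.databases since SQL Server 2005):

SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases;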

Statistics give information about the data distribution of the keys described by that statistic (in key order). Statistics exist for all indexes and [column-level] statistics can exist on their own. For example, let’s review the AdventureWorks2012 database. The Person.Person table has an index called IX_Person_LastName_FirstName_MiddleName on the LastName, FirstName, and MiddleName columns of the table. What do the statistics on this index tell me?

USE AdventureWorks2012;
go
DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName');
go

There are 3 results sets returned from the DBCC SHOW_STATISTICS command.

The header

Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows
IX_Person_LastName_FirstName_MiddleName Oct 31 2012 12:47PM 19972 8935 200 0.6730038 28.32502 YES NULL 19972

The most important information from the header is when the statistics were last Updated (or when they were created if they’ve never been updated). The second most important is the Rows vs. Rows Sampled columns. Neither of these directly indicates a problem but if queries against this table are not performing and the estimates the queries are using for optimization are not correct, it could be the statistics that are incorrect.

The density vector

All density Average Length Columns
0.001362398 11.22798 LastName
5.05E-05 23.09927 LastName, FirstName
5.03E-05 24.32502 LastName, FirstName, MiddleName
5.01E-05 28.32502 LastName, FirstName, MiddleName, BusinessEntityID

The density vector tells us information about the average distribution of our data. If you multiply the All density * Rows (of the table) you can get some insight into the average distribution of the column (or columns) described by Columns above.

Using LastName alone: 0.001362398 * 19972 = 27.209812856. What this tells me is that the Average number of rows returned for queries that supply JUST a LastName is 27.

Using LastName & FirstName: 5.05E-05 * 19972 = 1.008586. What this tells me is that the combination of LastName and FirstName is almost unique. If I supply BOTH a FirstName and a LastName in my query (using equality), then I should get back 1 row.

This is interesting information – especially for the combinations of the columns beyond the first – because this tells us how much more selective a query can be if we add these additional columns in our WHERE clauses. But, it’s not perfect for LastName alone because we all know that each last name is not going to return 27 rows, right? And, this is where the histogram comes in…

The histogram

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
Abbas 0 1 0 1
Adams 7.016288 71.19462 3 2.326017
Alan 9.355051 11.12416 3 3.101357
Alexander 18.7101 111.2416 6 3.103471
Zhu 72.50165 68.96979 1 71.9106
Zugelder 23.38763 1 4 5.817025
Zwilling 0.1117823 3.552485 0 27.20037

The histogram can contain up to 201 rows (see the Steps column in the statistics header). These 201 rows are made up of up to 200 distinct (and actual) values from the table itself AND one row if this leading column allows NULLs. In this case, because our LastName column does not allow NULLs, our histogram has 200 rows (side note: even if your leading column has more than 200 values, it does not guarantee that SQL Server will have 200 steps).

The histogram tells us the most detailed information about our leading column (often referred to as the “high-order element” of the index). It’s surprisingly easy to read:

Abbas 0 1 0 1
Adams 7.016288 71.19462 3 2.326017
Alan 9.355051 11.12416 3 3.101357

For the LastName Abbas there is 1 row equal to this value (EQ_ROWS) and no rows prior to it (no rows in the range).

For the LastName of Adams, there are 71 rows that equal this value (EQ_ROWS) and 7 rows between Abbas and Adams (not including the rows that equal Abbas [1] and Adams [71]) and between these values there are 3 other LastName values. The average number of rows per name between these values is 2.32.

What does this tell me – it tells me that any query requesting rows with a LastName value between Abbas and Adams, will have an estimate of 2.32 rows.

Are statistics accurate?

Well… it depends. There are many factors that affect the accuracy of a statistic. Size of the table, skew of the data, volatility of the table – they all affect the accuracy. At the time of creation, they can be incredibly accurate. But, as data gets modified, they might become less accurate. Because of the nature of how they’re created and what they represent, there’s no way to keep them up to date as individual rows are modified. The only way to update them is when you’re viewing a large amount of the data. When an index is rebuilt, SQL Server updates the index’s statistic with the equivalent of a full scan of the data. Statistics on an index are most accurate after an index rebuild. However, an index reorganize does not update statistics at all because the entire table is not analyzed in one go (only pages with fragmentation are reorganized). So, if you find that your index maintenance scripts are regularly reorganizing indexes then you’ll want to make sure that you also add in statistics maintenance. And, your statistics maintenance should not only include statistics on indexes but any of the other statistics that SQL Server may have created.

Statistics Maintenance

Now that you know statistics provide a valuable role in optimization, it’s also important that this information be accurate. Just as Jonathan mentioned in his post yesterday (The Accidental DBA (Day 14 of 30): Index Maintenance), I also often recommend custom scripts. And, Ola’s scripts even have an option where you only update statistics where data has changed. So, if you run the statistics maintenance after having run index maintenance (and no data has been modified since), then you will only update statistics where there has been data change. This is the most efficient way to update only the statistics that might need to be changed.
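If you’re not using a maintenance solution yet, the manual building blocks look like this (Person.Person and its index are just the AdventureWorks2012 examples from above):

-- When were the statistics on this table last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('Person.Person');

-- Update a single (index) statistic with a full scan:
UPDATE STATISTICS Person.Person IX_Person_LastName_FirstName_MiddleName WITH FULLSCAN;

-- Or update everything on the table:
UPDATE STATISTICS Person.Person WITH FULLSCAN;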

Summary

In order for the query optimizer to do a good job, it has to have accurate and up-to-date statistics. My general recommendation for Accidental DBAs is to leave both the auto create statistics option and the auto update statistics option on (they are both on by default). However, I would also recommend a specific maintenance routine that updates the statistics manually – off hours – so that the default auto updating mechanism isn’t your primary method for updating statistics. For the optimizer to do a good job at optimizing your queries, statistics have to both exist and be accurate. Proper statistics maintenance is a critical task for helping the optimizer do its job. While there are multiple options for automating this task, custom scripts provide the best method of minimizing the work to be done by performing the most efficient updates based only on data change.

This has only been a short introduction into statistics; there’s a lot more to them. If you’re interested in reading more about statistics check out the whitepaper, Statistics Used by the Query Optimizer in Microsoft SQL Server 2008. Then, check out our SQLskills blogs by category: Statistics.

Thanks for reading!
kt

*Sidenote: as an Accidental DBA, if you have individual tables reaching 100GB or more, well, you might want to talk to management to allow you more time, more knowledge, more administration, more tweaking/tuning of these larger tables. Maybe it’s time your environment considered a full-time DBA (maybe you?) with more dedicated time to managing/tuning your servers. It’s hard to wear so many hats in general but if you’re starting to administer databases with large tables and large databases (1TB+), then maybe you’re not an Accidental DBA any more?

SQLintersection’s Fall Conference – It’s all about ROI!

Brent [Ozar] beat me to the punch this morning with his fantastic post on SQLintersection’s Fall conference here: SQL Intersection Registration Open but, I wanted to second that with a few more details. I first blogged about our new [in 2013] conference here: http://www.sqlskills.com/blogs/kimberly/sqlintersection-new-conference/ and when we had only just planned our first event, we just weren’t quite sure of how it would go. Well… with one completed event – we now know! People loved it. We had absolutely fantastic feedback and numerous people asking us when the next show was so they could start planning… we didn’t expect it but many are planning to return in the Fall. They felt the conference was so valuable that they’re attending both the Spring and the Fall event.

What is SQLintersection all about?

  • Sessions are planned in a series with tracks having a specific focus each day. We don’t just do a random call for sessions and then put them into any slot randomly. Our sessions are chosen, planned, and strategically placed so that you can get the most out of that topic!
  • Speakers are chosen based on their areas of expertise AND their track record. We’ve only selected speakers that have top ratings at prior events. These are people from whom you will really learn. We all love what we do and you’ll see that in EVERY session. And, our speakers are also consultants – people out there in the trenches – doing what you’re doing everyday and struggling with the same problems. This is the real inspiration for our sessions – problem solving, troubleshooting and REAL SOLUTIONS!
  • Each track has a room/session host. Paul Randal, Brent Ozar, and I will be hanging out in session rooms throughout the entire conference. Other speakers will bounce in and out of tracks but will hang out for much longer than just their session. We’re not just there to deliver a session and run away – we’re there to help get your problems solved. And, if we can’t solve the problem we’ll point you to the session that can or the speaker that has that area of expertise. It’s all about the return on your investment of time and money.

But, it’s a new event? What experience do we have at running events?

Together with NextGen, and co-located with DEVintersection, we have put together the most valuable developer/SQL Server conference experience that’s out there!  Our conference management and content team has been responsible for the immensely popular SQL Connections conference over the last 10+ years through Fall 2012, and now we have our own show. SQLintersection is where you’ll find the conference experience you’re used to, but now with hand-picked speakers, more real-world topics, and extensive interaction with the people that you’ve come to know well in the community (Paul Randal, Brent Ozar, Aaron Bertrand, Steve Jones, and so many more!).

To be able to create the perfect developer show, we’ve also partnered with Richard Campbell from .Net Rocks and RunAs Radio. Using our extensive expertise in event planning and conference management as well as our real-world experience in consulting, we have created THE place to get your questions answered and interact with experts that really care, while experiencing a fantastic event in a great location!

SQLintersection focuses on performance best practices for SQL Server versions 2012, 2008R2 and 2008 but also highlights new and critical technologies in SQL Server 2012. And, while many best practices still apply to 2005 most of our speaker panel has experience back to SQL Server 2000 and can answer your questions!

What went well at our first SQLintersection?

  • Attendees recognized the ROI that they were getting from the conference. Even after only 1 day people commented to me that they were already fixing problems back in their office. One comment that I received in just the following week after the conference (from a gentleman named Geoff Lister) was: I had to write and tell you that last week’s SQL intersection was fantastic. I learned so much, and have a whole load of new questions for Google and things to research.  It was a real eye opener to aspects of SQL I didn’t even know to ask about before or did not realize how important they are.  Much of what you gave me I was able to implement during the conference to my production applications and have seen real performance gains and have some happy DBAs! In  particular, you gave me a much great appreciation of why the cache is so important and how to understand the execution plans a bit better and identify parameter sniffing issues. 
  • On twitter we received a comment just after the conference ended (from Michelle Ufford) who is @sqlfool: Colleague who attended #SQLintersection said it was best conference he’s ever attended cc@KimberlyLTripp @PaulRandal @Kendra_Little @BrentO
  • Another comment that came in email from Kevin Urquhart: As for feedback itself…  well, it truly was superb.  I enjoyed every minute of it and was genuinely quite sad when SQLintersection ended. The friendliness, sheer desire to impart knowledge, and obvious preparation that had gone into each session was all anyone could hope for really.

When can you register?

Now! We’ve posted the bulk of our content and the bulk of our sessions for the Fall show. So, start your planning now…

Register for $1,894 before June 24th and you get the Show Package: the conference, PLUS a pre-con or post-con of your choice, PLUS your choice of a Surface RT, Xbox, or a $300 gift card.

Register for $2,294 and you can add ANOTHER pre-con or post-con. This is one of the best choices. You can arrive on Saturday, catch a show and then get five days of nonstop learning from the absolute best in the business. Sunday through Thursday you’ll be focused on everything SQL! Then, you can relax on Friday or travel home or stay and enjoy another day in Vegas.

Finally, be sure to register with the discount code SQLskills and you get another $50 off.

Check out SQLintersection online here: www.SQLintersection.com.

See you there!

Be sure to introduce yourself. Be sure to take time to come find us and ask us questions. Don’t be shy – this is why we created this new event… we love this stuff and we look forward to seeing you there!

SQLintersection: a new year, a new conference

UPDATE (April 17, 2013): We just finished our FIRST SQLintersection conference with wild success. We’re currently working on our next event – scheduled for October 27-30, 2013. More posts coming soon but it looks like we have the right idea from two comments I received just today:

On Twitter (11:50am PT today) from Michelle Ufford (@sqlfool): Colleague who attended #SQLintersection said it was best conference he’s ever attended cc@KimberlyLTripp @PaulRandal @Kendra_Little @BrentO

In Email (today) from Geoff Lister: I had to write and tell you that last week’s SQL intersection was fantastic. I learned so much, and have a whole load of new questions for Google and things to research.  It was a real eye opener to aspects of SQL I didn’t even know to ask about before or did not realize how important they are.  Much of what you gave me I was able to implement during the conference to my production applications and have seen real performance gains and have some happy DBAs! In  particular, you gave me a much great appreciation of why the cache is so important and how to understand the execution plans a bit better and identify parameter sniffing issues.  

************** original post follows **************

I’ve been speaking and presenting at conferences for years (16+ years to be exact) and while I’ve had a great time at all of these conferences (some more than others :), I’ve always felt like there was something missing. Nothing seemed to help bring cohesion across the sessions. Nothing really helped the attendees and speakers interact better. How do attendees really get their problems solved? Well, now I get to make those decisions and changes! Why? Because we’ve designed a NEW conference that helps intersect the right information with the right people. Our new show brings together real-world experts that present problem solving techniques and technologies that you can implement today; we’re calling it SQLintersection (#SQLintersection).

SQLintersection: It’s all about ROI

First and foremost, people want better performance. If your servers perform well you can process more data – you can get more done. But, what you need to do to get better performance varies. Sometimes it’s hardware – which might be an easier change, and sometimes it’s design/architecture – which might be significantly more complex. Sometimes it’s little tweaks – adding some indexes, removing some redundant indexes, updating statistics, adding more statistics, changing a procedure and/or the way that some of your procedures are cached, sometimes it’s all about IO, sometimes it’s the execution plans and the queries themselves. But, the biggest challenge is knowing where to look and knowing where and what these changes are, how to make them and then finally, implementing them with the lowest amount of downtime and data loss that’s possible.

That’s what we’ve done. We’ve put together a conference that’s primarily focused around performance, scalability, and troubleshooting but we haven’t forgotten reliability/automation.

SQLintersection: Track Hosts add interaction and information!

To bring cohesion to our event, each of our tracks will have a host (or an MC, per se) that will present a session or two as well as stay in their track room to introduce each session over a theme (each track will have a theme for each day). The host will be available to answer questions, help you interact with the right speakers and just generally give you insight that you can’t get other ways. Right now we have 3 track hosts: Brent Ozar, Aaron Bertrand and I will each host a track and we’ll be available in our track room all day (between sessions and for much of lunch as well) to really help you get your problems solved. And, we’ll end each track with an open Q&A panel with speakers from the track. You’ll hear great sessions and you’ll have multiple opportunities to interact with expert speakers, other attendees, and get your problems solved! And, in addition to three full conference days, there are five full-day workshops (2 days prior to the conference and 1 day after the conference) from which to choose and over 30 technical sessions mostly in the 200-400 level range.

SQLintersection: What about the speakers?

I’m so excited about this lineup. All of these speakers are top-rated SQL experts that have been around in this field for years but are still focused on consulting. Every speaker is a SQL Server MVP (with the exception of the vendor/Microsoft speakers – but, I don’t think anyone’s going to question Conor or Bob’s SQL knowledge :)) and some are Microsoft Certified Masters in SQL Server. But, no matter what – ALL are focused on providing the most ROI that’s possible in their session(s) and/or their workshops. Check out this list of speakers:

  • Aaron Bertrand, Sr. Consultant, SQL Sentry, Inc. [blog | twitter]
  • Andrew J. Kelly, Mentor, SolidQ [blog | twitter]
  • Bob Ward, Principal Architect Escalation Engineer, Microsoft [blog | twitter]
  • Brent Ozar, Brent Ozar Unlimited [blog | twitter]
  • Conor Cunningham, Principal Architect, SQL Server, Microsoft [blog]
  • Grant Fritchey, Product Evangelist, Red Gate Software [blog | twitter]
  • Jeremiah Peschka, Brent Ozar Unlimited [blog | twitter]
  • Joseph Sack, Principal Consultant, SQLskills.com [blog | twitter]
  • Kendra Little, Managing Director, Brent Ozar Unlimited [blog | twitter]
  • Kevin Kline, Director of Engineering Services, SQL Sentry, Inc. [blog | twitter]
  • Kimberly L. Tripp, President/Founder, SQLskills.com [blog | twitter]
  • Mat Young, Senior Director of Products, Fusion-io [blog | twitter]
  • Paul S. Randal, CEO / Owner, SQLskills.com [blog | twitter]
  • Paul White, SQL Kiwi Limited [blog | twitter]
  • Steve Jones, Editor, SQLServerCentral.com [blog | twitter]
  • Sumeet Bansal, Principal Solutions Architect, Fusion-io [blog | twitter]

SQLintersection: When is it all happening?

The show officially runs from April 8th through the 11th but there are both pre-conference and post-conference workshops. For the full conference, you’ll want to be there from Sunday, April 7th through Friday, April 12th.

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

We hope to see you there!

Cheers,
kt

What caused that plan to go horribly wrong – should you update statistics?

I’ve been seeing this over the past few years, imagine this scenario:

You have a stored procedure that runs well most of the time but sometimes it’s WAYYYYY off. It’s almost as though the performance of it went from great to horrible in a split second (like falling off of a cliff). You don’t know why but someone says – it’s got to be the statistics. In fact, if you have the luxury of time (which most folks don’t have), you execute it yourself and you check the plan – WOW, the estimated number of rows is WAY off from the actual rows. OK, it’s confirmed (you think); it’s statistics.

But, maybe it’s not…

See, a stored procedure, a parameterized statement executed with sp_executesql and prepared statements submitted by clients ALL reuse cached plans. These plans were defined using something called parameter sniffing. Parameter sniffing is not a problem itself – but, it can become a problem for later executions of that same statement/procedure. If a plan for one of these statements was created for parameters that only return 1 row then the plan might be simple and straightforward – use a nonclustered index and then do a bookmark lookup (that’s about as simple as it can get). But, if that same sp_executesql statement/procedure/prepared statement runs again later with a parameter that returns thousands of rows, then reusing the plan created by sniffing that earlier parameter might not be good. And, this might be a rare execution. Or, it could be even stranger. These plans are not stored on disk; they are not permanent objects. They are created any time there is not already a plan in the cache. So, there are a variety of reasons why these plans can fall out of cache. And, if it just so happens that an atypical set of parameters is the first one used after the plan has fallen out of cache (better described as “has been invalidated”) then a very poor plan could end up in cache and cause subsequent executions of typical parameters to be way off. Again, if you look at the actual plan you’ll probably see that the estimate is WAY off from the actual. But, it’s NOT likely to be a statistics problem.

But, let’s say that you think it is a statistics problem. What do you do?

You UPDATE STATISTICS tablename or you UPDATE STATISTICS tablename indexname (for an index that you specifically suspect to be out of date)

And, then you execute the procedure again and yep, it runs correctly this time. So, you think, yes, it must have been the statistics!

However, what you may have seen is a side-effect of having updated statistics. When you update statistics, SQL Server usually* does plan invalidation. Therefore, the plan that was in cache was invalidated. When you executed again, you got a new plan. This new plan used parameter sniffing to see the parameters you used and then it came up with a more appropriate plan. So, it probably wasn’t the statistics – it was the plan all along.

So, what can you do?

First, do not use UPDATE STATISTICS as your first response. If you have a procedure that’s causing you grief you should consider recompiling it to see if you can get a better plan. How? You want to use sp_recompile procedurename. This will cause any plan in cache to be invalidated. This is a quick and simple operation. And, it will tell you whether or not you have a recompilation problem (and not a statistics problem). If you get a good plan then what you know is that your stored procedure might need some “tweaking” to its code. I’ve outlined a few things that you can use to help you here: Stored procedures, recompilation and .NetRocks. If that doesn’t work, then you MIGHT need to update statistics. What you should really do first, though, is make sure that the compiled values of the parameters are the same as the values used at execution. If you use “show actual plan” you can see this by selecting the root (SELECT) operator and checking the properties window (F4).


This will confirm that the execution did (or did not) use those values to compile the plan. If they were the correct values then you might have a statistics problem. But, it’s often blamed and it’s not actually the problem. It’s the plan.

OK, there’s a bit more to this…

*Do plans ALWAYS get invalidated when you update statistics? No…

Erin Stellato (blog | twitter) first blogged about this here: Statistics and Recompilation.

And, also here: Statistics and Recompilation, Part II.

Here’s a quick summary though because it looks like things have changed again in SQL Server 2012…

  • In SQL Server 2005, 2008 and 2008R2 – updating statistics only caused plan invalidation when the database option auto update statistics is on.
  • In SQL Server 2012 – updating statistics does not cause plan invalidation regardless of the database option.

So, what’s the problem? Ironically, I kind of like this. I think that statistics has been blamed all too often for statement/plan problems when it’s not the statistics, it’s the plan. So, I like that there will be fewer false positives. But, at the same time, if I update statistics off hours, I DEFINITELY want SQL Server to invalidate plans and re-sniff my parameter (especially if the data HAS changed) and possibly get new plans from my updated stats.

In the end, I did chat with some folks on the SQL team and yes, it looks like a bug. I filed a connect item on it here: https://connect.microsoft.com/SQLServer/feedback/details/769338/update-statistics-does-not-cause-plan-invalidation#.

UPDATE – 12:55 (yes, only 2 hours after I wrote this).

It’s NOT a bug, it’s BY DESIGN. And, it actually makes sense.

If the plan should NOT be invalidated (directly due to statistics because the data has NOT changed) then it won’t be. But…
If the plan should be invalidated (statistics have been updated AND data has changed) then it will be.

The key point is “data changed.” An update of statistics ALONE will not cause plan invalidation (which is STILL different behavior from 2005/2008/2008R2) but it’s the best of both worlds IMO. Only if at least ONE row has been modified will the UPDATE STATISTICS cause plan invalidation.

UPDATE 2: The key point is that there might still be some false positives and I’d still rather people try sp_recompile first but it’s good that UPDATE STATISTICS will cause plan invalidation. But, it’s still a tad different than prior versions… interesting for sure.

A simple workaround is to use sp_recompile tablename at the end of your maintenance script but be aware that running an sp_recompile against a TABLE requires a schema modification lock (SCH-M). As a result, this can cause blocking. If you don’t have any long running reports (or long running transactions) at that time though, it should be quick and simple.
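For example, at the end of the maintenance script (the table name is a placeholder):

EXEC sp_recompile N'dbo.YourTableName';    -- invalidates all plans referencing the table; needs a brief SCH-M lock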

And, stay tuned on this one. In a later CU you should be able to remove the sp_recompile AND you won’t need to worry about the database option either (yeah!).

Thanks for reading,

kt

Stored procedures, recompilation and .NetRocks

Last week I visited the .Net User Group in NY where .NetRocks was recording as part of their Visual Studio Road Trip…

What a great time and a great group! Always fun visiting NY but even more fun when I present to a group that really gets into the topic. I guess I had something to do with it having chosen procedures and recompilation to kick off my part of the discussion… But, still, a fun group for sure! And, why did I choose stored procedures and recompilation?

Every developer that works with SQL Server has to access their data in SOME way… how? Adhoc SQL, prepared statements (like sp_executesql) or stored procedures. To be honest, none are perfect. You shouldn’t always use adhoc. You shouldn’t always use prepared statements… And, dare I say – you shouldn’t always use stored procedures? In fact, I kicked off the evening with the statement that SQL Server is a general purpose RDBMS. You can do anything. But, does that mean that each feature is perfect for every use, all the time? Basically, what I said is that you should never say always and never say never. ;-)

Having said that, I do – strongly - believe that you can be the most successful using stored procedures. But, that’s predicated on the fact that you understand how they work. It’s predicated on the fact that you understand that recompiling a plan is NOT always a bad thing. Why? Because SQL Server “sniffs” the parameters passed and chooses the execution plan based on those parameters. It’s that plan (defined by those parameters) that gets saved (in cache) and reused for subsequent executions. If that plan is not good for ALL executions then you start to have parameter sniffing problems.

The end result – reusing a plan is not always good and recompilation is not always bad.

So, what can you do?

To be honest, this is a HUGE discussion and there are LOTS of tangents. In IE2 (our Immersion Event on Performance Tuning), I spent an entire day on the plan cache and optimizing for procedural code. But, for my pre-session (prior to recording .NetRocks), I chose to discuss ONE set of options that can be VERY helpful to reduce parameter sniffing problems. This discussion was around statement-level recompilation and SQL Server offers 3 special things that you can add to a statement to define how its plan should be handled. There are still other things that could change the behavior but simply put, I’ll go through 5 different behaviors here:

  • Default behavior
  • OPTION (RECOMPILE)
  • OPTION (OPTIMIZE FOR (@param = value))
  • OPTION (OPTIMIZE FOR UNKNOWN)
  • Using variables to obfuscate parameters

And, I have a script that will help you to go through these different scenarios. Most importantly, do not end up using ONE of these ALL the time. Remember, ALWAYS is NOT the right way to deal with performance problems.

Having said that, I know all of you have A LOT to deal with. So, where do you start? How do you begin?

First, and foremost, do your analysis on the top 10 stored procedures that essentially meet these criteria:

  1. The performance of the stored procedure wildly varies (from only a second to minutes – or at least from fast to not fast at all). And, maybe it’s more like this: MOST of the time the procedure runs well but occasionally the performance really tanks. As a result, you UPDATE STATISTICS and that seems to solve the problem. Hmmm… in actuality, it might not have been the statistics that were the problem. A side-effect (most of the time) of updating statistics is that the plans associated with them are invalidated. On next execution a new plan will be generated (after sniffing the parameters). And, if the next execution uses a more typical parameter then a more typical plan will be generated. This might be why MOST of the time it seems to be fine. Next time, instead of updating stats, consider doing sp_recompile procname. This will invalidate the proc’s plan. If this works, then you know that you need to look more closely at how that plan gets generated and whether or not it’s even good to save that plan.
  2. The stored procedure returns wildly varying result sets (sometimes it returns only a few rows, other times it returns thousands [or tens of thousands] of rows)
  3. The stored procedure is used frequently and has at least one parameter (in many cases the worst-performing procs are those that have many parameters)
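
For #1 above, the sp_recompile test is just a one-liner (the procedure name here is only a placeholder):

-- Mark the procedure's cached plan(s) for recompilation; this is much lighter weight
-- than updating statistics on every table the procedure touches.
EXEC sp_recompile N'dbo.GetOrdersForCustomer';

If performance comes back after that alone, the statistics probably weren't the real problem; the plan was.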

Once you know that you have a problem, investigate what should be recompiled. In general, you want to recompile the smallest amount possible to solve the problem. But, how do you know what should be recompiled? Testing!

Usually, I'll test a procedure by running multiple executions, each with different parameter values that generate wildly different result sets, and I'll execute each of them using WITH RECOMPILE. Specifically, it will look like this:

EXEC dbo.procname @param1 = value1, @param2 = value2, @paramN = valueN WITH RECOMPILE;

When a bunch of these are executed, I'll review their graphical plans. What I'm looking for is the most expensive statement and whether or not it has the SAME plan across the different parameters used. To be honest, you don't even care what the plan is, but you do care if it varies. The OPTIMAL plan for a query that returns 1 row might be very different from the OPTIMAL plan for a query that returns 10,000 rows. And, if the OPTIMAL plans vary, then maybe it's not a good idea to save the plan (which is the default behavior). And, this is what leads to parameter sniffing problems (PSP).
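
Here's a hypothetical version of that test, using the same made-up procedure from earlier; each execution is optimized from scratch, so you can compare the plans fairly:

EXEC dbo.GetOrdersForCustomer @CustomerID = 42 WITH RECOMPILE;   -- a parameter that returns very few rows
EXEC dbo.GetOrdersForCustomer @CustomerID = 17 WITH RECOMPILE;   -- a parameter that returns a huge number of rows

If the most expensive statement gets a different plan shape for each, you're looking at a PSP candidate.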

Instead of letting the default behavior just happen, you have a few options.

Using OPTION (RECOMPILE) [available in SQL Server 2005 and higher]

This can be a great way of telling SQL Server NOT to save a plan for a particular statement. However, it causes EVERY execution of that statement to go through recompilation (which has a cost of its own, mostly in CPU, but that can also translate into time). So, you don't really want to do this for everything. Do NOT let this become a crutch (or a "go to" option) that gets used anytime there's a problem. Use it sparingly. But, it can be a FANTASTIC way to deal with PSP.
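
Sticking with the hypothetical procedure from earlier, it would look something like this; only the statement that actually suffers from PSP gets the hint:

ALTER PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID int
AS
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (RECOMPILE);   -- this statement is re-optimized for the sniffed values on EVERY execution; its plan is not reused
GO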

Using OPTION (OPTIMIZE FOR (@param = value)) [available in SQL Server 2005 and higher]

Because the cost of recompilation can become a problem, you might want to choose an alternative. In this case, you can tell SQL Server NOT to sniff the parameter(s) passed in and instead optimize for values that you supply (hard-coded INSIDE the stored procedure's statement). This avoids the per-execution recompilation cost because the plan is still cached and reused, but be careful: you have to make sure you choose values that are really good for ALL executions (or, at least, the executions that are either the most likely or the most time critical). This is incredibly powerful but could become problematic down the road as the data changes. Still, I like this option MORE than I like hard-coded plans, because the plan this option produces WILL change as the data/statistics change. So, this might be a great option to consider.
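
Again with the hypothetical procedure (and 42 standing in for whatever value represents your most common or most critical executions):

ALTER PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID int
AS
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID = 42));   -- always optimize as if 42 had been passed, no matter what actually was
GO

The plan is still cached and reused; it's just built from the value you chose instead of the sniffed one.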

OPTION (OPTIMIZE FOR UNKNOWN) [available in SQL Server 2008 and higher]

I remember when this option first came out. I remember thinking: what do they mean, "unknown"? Do they "guess"? That just didn't seem right… Nope, it's not a guess. But, it's not going to be overly obvious to most people because it requires a deeper understanding of statistics in SQL Server. See, parameter sniffing really translates into this: SQL Server uses the statistics histogram to estimate how many rows the sniffed values will return, and that estimate, in turn, is used to pick the plan. What OPTIMIZE FOR UNKNOWN does is skip the histogram. SQL Server does NOT look at the parameters passed in and instead relies on something else called the density vector. Simply put, the density vector is the AVERAGE. So, instead of looking up how many rows it thinks the specific parameters you've passed in will return, it uses the average number of rows returned for that column. Then, it creates a plan with those numbers. The idea is that this will give you an average plan rather than a plan tied to specific parameter values that might be anomalies. This can work phenomenally well when your data is evenly distributed OR the average really does work well for the normal parameters used. If you have heavily skewed data, then this might not give ideal results.
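
Same hypothetical statement, with the hint swapped:

ALTER PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID int
AS
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR UNKNOWN);   -- don't use the sniffed value; estimate from the density vector (the column average)
GO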

Using variables to obfuscate parameters [available in any version really ... ]

This is not an official way to deal with the problem, but some folks have found that it "works" and/or solves their problems. What is it doing? Well… it's actually doing EXACTLY the same thing as OPTIMIZE FOR UNKNOWN. During compilation, the value of a variable (as opposed to a parameter) is unknown. If the optimizer doesn't know the value, what can it use? It really doesn't have much of a choice except to use the average.
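
And a sketch of the variable version of the same hypothetical procedure:

ALTER PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID int
AS
    DECLARE @LocalCustomerID int;
    SET @LocalCustomerID = @CustomerID;   -- copy the parameter into a local variable

    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @LocalCustomerID;  -- the variable's value is unknown at compile time,
                                          -- so the optimizer falls back to the density vector (the average)
GO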

OK… so, now – how do you see all of this in action?

(1) Check out my 30-minute presentation from the user group. Here's the presentation itself: dnrRoadTripp_StoredProcs.pdf (518.36 kb)

(2) Then, consider checking out the video (it isn't super high quality since it was recorded by one of the attendees at the presentation, but it turned out pretty well actually): http://neuronspark.com/optimizing-procedural-code/

(3) Next play with the demo script:

(3a) First, download the sample credit database for 2000 and higher here: http://www.sqlskills.com/PastConferences.asp. Use the 2000 version for 2000, 2005 or 2008. Use the 2008 version for 2008, 2008R2 and 2012.

(3b) Then, use this script to walk through these options:
RecompilationParameterSniffing&Unknown.sql (4.58 kb)

And… that should be it for this one!

If you want to hear the official .NetRocks show that was recorded AFTER this lecture/discussion, check out .NetRocks.com. And, I'll post a link here once it's been published (which should be later this week). And, if you want to hear more of my fun times with DNR, check out some of our past shows. I blogged a list of our past shows here: Getting ready for DotNetRocks tonight (in NYC).

Finally, if any of you are in Vegas next week – I’m delivering an ENTIRE day (Monday, October 30) on Optimizing Procedural Code as a preconference workshop at SQLConnections.

Thanks for reading!
kt

Getting ready for DotNetRocks tonight (in NYC)

I've had quite a history with Richard Campbell and Carl Franklin, and we're about to do it again… a dotNetRocks, that is! They're traveling the country in an RV and stopping at all sorts of places to talk about development best practices and VS 2012. If you have the time, you should definitely check them out in person!

And, since I always have so much fun with these two I thought I'd look back on a few of the past shows we've done together. Here's a list of them if you have a bit of time to burn (or need something on your iPod for your next run ;-).

My shows with DNR:

Paul's shows with dotNetRocks:

My shows with RunAsRadio:

I'm looking forward to another evening of fun with these guys. I never know what to expect!

Cheers,
kt

Presentation Skills – How to Create a Connection

Erin Stellato (blog | twitter) asked on her SQLskills blog for comments/recommendations on presenting (be sure to check out the comments/links). She wrote a fantastic post on her favorite recommendations for new speakers here. And, I thought I'd add a couple of quick recommendations as well.

First and foremost, if you want to create a connection with your audience, choose something for which you have a connection. Choose something that you've struggled with and that you're passionate about. And, my main recommendation for how to really connect with the audience is to show some empathy. There's a reason that these folks are in your session. They want to know something. And, if you struggled, then they have too. Often, I like to write my presentation in the same way I learned something. I'll start with what I may have thought (like what an index is generally defined to do: "help queries") but then I'll go into what I learned it *really* does. And, ideally, *how* it does it. These are not always the easiest things to teach (e.g. internals), but when you can open up the hood and show how something works, you often take the mystery out of it, and that, in and of itself, makes the lightbulb go on for many folks. Highlighting YOUR struggles also makes you human. We often look unapproachable on stage, as if we can achieve magic. But, we all put our pants on the same way, and if you can relate to your audience (and they to you) then you'll all enjoy yourselves A LOT more. There's nothing I hate more than being talked down to by a presenter (and I've seen it done time and time again) OR being talked to as if I don't know anything.

So, to summarize:

  • Find a subject that you are passionate about
  • Go back in time to when you first learned that subject and remind yourself of where/why you struggled
  • Highlight these struggles in your presentation (what did you do right, what did you do wrong, how did you ultimately get past this)
  • Talk to your audience as PEERS (they are!)

And, have fun!

Thanks for reading,
kt

PS: Part of my delay in posting is that I've been going through some dental nightmares… I grind my teeth and subsequently cracked one of my back molars (even though I wear my nightguard religiously). This led to an inlay (a temp for 2 weeks and then the permanent one last Wed), but in the interim the tooth got infected (or maybe it was infected before the process even started?), which led to an emergency root canal (NOT a happy Friday) and then a second emergency visit to the dentist for stronger antibiotics and more shots of Novocain than I can remember EVER having in my life. I had 6 on Friday (of two types, because I kept "feeling" the root canal) and then 2 more on Saturday just to keep the throbbing at bay.

Anyway, it all reminded me of a polar bear that I photographed a few months ago in the Arctic (this was North of Svalbard [Norway] up in the pack ice). So, I thought I'd leave you with a picture of what I've been doing for the past couple of days:

[Photo: a polar bear in the pack ice, north of Svalbard]