Fall SQLintersection is coming up soon and we can’t wait!

In just a couple of weeks (9-14 November 2014), we’ll be at our twice-yearly conference, SQLintersection. We’re really looking forward to it as our conference is different from many others… our sessions are hand-picked and combined to create daily “tracks” with themes. Often, I’ll even ask a speaker if they could talk about x or y or z so that it fits in well with that day’s theme. And, I only ask experts in those areas to speak. But, not just experts – truly great speakers. We really want you to walk away from a session empowered. And, nothing is more empowering than truly understanding something. Our speakers are accessible, too. If you don’t get a question answered during a session, stay after. Or, come to one of our open Q&A sessions (these are great fun). The point – we’re there to get your questions answered so that you can go back to the office with immediate solutions; immediate ROI for your time away.

SQLintersection: Great Speakers!

I’m so excited about this lineup. All of these speakers are top-rated SQL experts who have been around in this field for years (some can even boast decades) but are still focused on consulting. Every speaker is a SQL Server MVP (with the exception of the Microsoft speakers – but I don’t think anyone’s going to question their SQL knowledge :)) and some are Microsoft Certified Masters in SQL Server. But, no matter what – ALL are focused on providing the most ROI possible in their session(s) and/or their workshops. Check out this list of speakers:

  • Aaron Bertrand, Sr. Consultant, SQL Sentry, Inc. [blog | twitter]
  • Andrew J. Kelly, Mentor, SolidQ [blog | twitter]
  • Bob Ward, Principal Architect Escalation Engineer, Microsoft [blog | twitter]
  • Brent Ozar, Brent Ozar Unlimited [blog | twitter]
  • David Pless, Dedicated Support Engineer (PFE), Microsoft
  • Glenn Berry, Principal Consultant, SQLskills.com [blog | twitter]
  • Jeremiah Peschka, Brent Ozar Unlimited [blog | twitter]
  • Jonathan Kehayias, Principal Consultant, SQLskills.com [blog | twitter]
  • Joseph Sack, Sack Consulting [website | blog]
  • Kendra Little, Managing Director, Brent Ozar Unlimited [blog | twitter]
  • Kevin Farlee, Storage Engine Program Manager, SQL Server, Microsoft
  • Kevin Kline, Director of Engineering Services, SQL Sentry, Inc. [blog | twitter]
  • Kimberly L. Tripp, President/Founder, SQLskills.com [blog | twitter]
  • Paul S. Randal, CEO / Owner, SQLskills.com [blog | twitter]
  • Paul White, SQL Kiwi Limited [blog | twitter]
  • Shep Sheppard, Senior Program Manager, Microsoft AzureCAT [blog | twitter]
  • Tim Chapman, Dedicated Support Engineer (PFE), Microsoft [blog | twitter]

SQLintersection: When is it all happening?

The show officially runs from Monday, November 10 through Thursday, November 13 but there are both pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Sunday, November 9 through Friday, November 14.

  • Sunday, November 9 – pre, pre-con day. There are three workshops running:
    • SQL Server Architecture: Choosing Virtualization, Clustering, AlwaysOn, and More by Jeremiah Peschka, Kendra Little, and Brent Ozar
    • Core SQL Server Fundamentals by Andrew Kelly
    • Mastering SQL Server Execution Plan Analysis by Paul White
  • Monday, November 10 – pre-con day. There are three workshops running. And, the first keynote of the conference is Monday evening.
    • Performance Troubleshooting Using Waits and Latches by Paul S. Randal
    • High Performance, Scalable Asynchronous Processing Using Service Broker by Jonathan Kehayias
    • Very Large Tables: Optimizing Performance and Availability through Partitioning by Kimberly L. Tripp
  • Tuesday, November 11 through Thursday, November 13 is the main conference. Conference sessions will run all day in multiple tracks:
    • Download our full schedule here
    • Be sure to check out some of our cross-conference events such as our Women In Tech luncheon on Tuesday!
  • Friday, November 14 – post-con day. There are two workshops running:
    • Queries Gone Wild 2: Statistics and Cardinality in Versions 2008, 2008R2, 2012, and 2014 by Kimberly L. Tripp and Joe Sack
    • Developer’s Guide to Tuning Somebody Else’s SQL Server by Jeremiah Peschka, Kendra Little, and Brent Ozar

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also competent and consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to troubleshoot and solve your performance and availability problems now!

We hope to see you there!
kt

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

Multi-column statistics and exponential backoff

After writing SQLskills procs to analyze data skew and create filtered statistics, I received a couple of emails asking me to further explain the sentence “This is also true for multi-column, column statistics (where you need to use MORE than just the first column) and hypothetical indexes created with auto pilot.” (NOTE: this refers to the fact that these don’t work in the new CE.)

So, in this post, I want to further explain multi-column statistics and how both the legacy and new CE would calculate selectivity using the Credit sample database. If you don’t have a copy of this, you can download a SQL Server 2008 backup from here. To reproduce everything shown in this example, you must restore this to SQL Server 2014. FYI – at the time of this blog post, I’m running SQL Server 2014 CU3 (version = 12.0.2402.0).

What CE Model are you running?

Review the default cardinality estimation model used across all of your databases:

SELECT  [name] AS 'Database Name'
    , CASE
        WHEN [compatibility_level] = 120 THEN 'New SQL Server 2014 CE Model'
        ELSE 'Legacy CE Model'
      END AS 'Cardinality Estimation Model'
FROM    [sys].[databases];
GO

What does the Legacy CE Model use if there are multi-column statistics?

Check out the indexes and statistics on the charge table:

EXEC [sp_helpstats] '[dbo].[charge]', 'all';
GO

You should see that there are ONLY statistics on existing indexes:

statistics_name          statistics_keys
------------------------ ------------------
charge_category_link     category_no
charge_provider_link     provider_no
charge_statement_link    statement_no
ChargePK                 charge_no

We’re going to run some queries against the charge table and we’re going to query against the category_no and the provider_no columns… while indexes for category_no and provider_no might be helpful, I want to drop those and just see how the estimates work with multi-column, column statistics:

DROP INDEX [charge].[charge_provider_link];
DROP INDEX [charge].[charge_category_link];
GO

Without those indexes, SQL Server does not have any column-level statistics for the category_no or provider_no columns. To help the queries we’re going to run, we’ll create a multi-column, column statistic:

CREATE STATISTICS [TestStat]
ON [dbo].[charge] ([provider_no], [category_no]);
GO

Now, we’ll see what happens when we run a query using the legacy CE model vs. the new CE model. Be sure to turn on “Show Actual Execution Plan” in the Query drop-down menu.

SELECT [ch].*
FROM [dbo].[charge] AS [ch]
WHERE [ch].[provider_no] = 434
    AND [ch].[category_no] = 10
OPTION (QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9204, RECOMPILE);
GO -- TF 9481 = CardinalityEstimationModelVersion 70

Looking at the showplan, you can see the estimate and the actual are fairly close (but by no means perfect). To understand where this comes from, you need to understand the density vector component of a statistic:

DBCC SHOW_STATISTICS ('[dbo].[charge]', 'TestStat')
WITH DENSITY_VECTOR;
GO

The “All density” column can be used to calculate the average number of rows returned for a column – or for a left-based combination of columns, since the density vector adds a row for each combination (first column; first and second columns; and so on), with the “Columns” column showing which combination each row represents. For example, if we had created a statistic on columns such as LastName, FirstName, and MiddleInitial, then the all density could help us understand the average number of rows returned when supplying just a LastName; or when supplying both a LastName AND a FirstName; or, finally, when supplying a LastName, FirstName, and MiddleInitial. What the density vector does not provide is a way of knowing the selectivity of any of the secondary columns on their own; it knows ONLY of the left-based combinations. This is one of the reasons why column-level statistics are helpful; they can provide the densities of the secondary columns on their own.
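As an aside, here’s a minimal sketch of that name example (the [dbo].[Person] table and its columns are purely hypothetical – they are not part of the Credit sample database):

-- Hypothetical multi-column statistic on a name table (illustrative only)
CREATE STATISTICS [NameStat]
ON [dbo].[Person] ([LastName], [FirstName], [MiddleInitial]);
GO

-- The density vector returns one "All density" row per left-based combination:
-- (LastName), (LastName, FirstName), (LastName, FirstName, MiddleInitial)
DBCC SHOW_STATISTICS ('[dbo].[Person]', 'NameStat')
WITH DENSITY_VECTOR;
GO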

For this query, we’re going to use the “All density” of the combination of provider_no and category_no (from the 2nd row of output). The “All density” value is 0.0002914602. If we multiply that by the number of rows in the table (at the time the statistics were created) then we can get the average number of rows returned. To see the number of rows in the table, we need to review the statistics_header component as well as the density_vector. This is easiest by re-running our DBCC command without the WITH clause.

DBCC SHOW_STATISTICS ('[dbo].[charge]', 'TestStat');
GO

Multiplying the “All density” of 0.0002914602 by the table’s 1,600,000 rows gives an average of 466.33632 – and that’s where the estimate of 466.336 comes from.
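You can verify the arithmetic directly:

SELECT 0.0002914602 * 1600000 AS [Legacy CE Estimate];
GO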

What about the New CE Model in SQL Server 2014?

First, we’ll re-run the query but force the new CE model using trace flag 2312:

SELECT [ch].*
FROM [dbo].[charge] AS [ch]
WHERE [ch].[provider_no] = 434
AND [ch].[category_no] = 10
OPTION (QUERYTRACEON 2312, QUERYTRACEON 3604, QUERYTRACEON 9204, RECOMPILE);
GO -- TF 2312 = CardinalityEstimationModelVersion 120

(Showplan: estimated vs. actual rows for the same query under the new CE model)

And, now we see the same plan – but, this time, the estimate is further off from the actual. And, another question might be – where did that number come from?

In SQL Server 2014, regardless of multi-column statistics, the estimate for multiple conjunctive (AND) predicates is calculated using exponential back-off. The idea is that the optimizer takes the most selective predicate first and then multiplies it by successive square roots of the next three most selective predicates. Specifically:

most-selective-predicate * sqrt(next most selective predicate) * sqrt(sqrt(next most selective predicate)) * sqrt(sqrt(sqrt(next most selective predicate)))
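If you want to play with the shape of that calculation, here’s a minimal sketch using four hypothetical selectivities (the numbers are made up purely for illustration):

DECLARE @s1 float = 0.001,   -- most selective predicate
        @s2 float = 0.01,
        @s3 float = 0.05,
        @s4 float = 0.1,     -- least selective predicate
        @tableRows float = 1600000;

SELECT @s1 * SQRT(@s2) * SQRT(SQRT(@s3)) * SQRT(SQRT(SQRT(@s4))) * @tableRows
    AS [Exponential Backoff Estimate];
GO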

In this case, they first need to calculate the selectivity of the two predicates supplied:

To calculate the selectivity for [provider_no] = 434, the new CE uses the histogram from the TestStat multi-column statistic (but it does not use the density vector of the combination). The histogram actually has a step for 434 and it shows 6625.247 rows in the EQ_ROWS column. The selectivity can be calculated as 6625.247 / 1600000 OR 0.00414077937.
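You can see that step for yourself by viewing the histogram component of the same statistic:

DBCC SHOW_STATISTICS ('[dbo].[charge]', 'TestStat')
WITH HISTOGRAM;
GO
-- Find the step where RANGE_HI_KEY = 434; EQ_ROWS shows 6625.247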

To calculate the selectivity for [category_no] = 10, use the histogram from the auto-created statistics on category_no (and, yes, this statistic would have been auto-created for this query if SQL Server hadn’t already created it for the other statement’s execution [which doesn't entirely make sense because they didn't use it – but that's another discussion for another day; and, if I'm being honest, as long as these get maintained properly, I'm ALL FOR AUTO CREATE STATS and I'd stand on the rooftops and scream it if it wouldn't get me arrested... OK, sorry. I digress]).

Once again, we’re doing well with our histograms as there’s an actual step for 10 and it shows 179692.4 rows in the EQ_ROWS column. The selectivity of this can be calculated as 179692.4 / 1600000 OR 0.112307750.

Now – to calculate our estimate… use the MOST selective predicate (0.00414077937) * the SQRT of the next most selective predicate (0.112307750) * the number of rows:

SELECT 0.00414077937 * sqrt(0.112307750) * 1600000;
GO

For:

New CE Model Estimate
----------------------
2220.27585898223

NOTE: How this is calculated may change in a later SP or CU so I can only vouch for 2014 RTM through CU3.

Sure enough, this matches our showplan output.

What would the Legacy CE Model have done without multi-column statistics?

If we had not had our multi-column statistics then the Legacy CE Model would have just expected even distribution of providers across categories. To do this they’d simply multiply the selectivities (NOT backing off at all):

SELECT 0.00414077937 * 0.112307750 * 1600000 AS [Legacy CE Model Estimate];
GO

For:

Legacy CE Model Estimate
---------------------------------------
744.06658286578800000000

Tip: Drop the ‘TestStat’ statistic and run the query again using TF 9481 to see this.
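Here’s a minimal sketch of that tip:

DROP STATISTICS [dbo].[charge].[TestStat];
GO

SELECT [ch].*
FROM [dbo].[charge] AS [ch]
WHERE [ch].[provider_no] = 434
    AND [ch].[category_no] = 10
OPTION (QUERYTRACEON 9481, RECOMPILE); -- legacy CE, now without the multi-column statistic
GO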

So, which CE is best?

This is where the good news and bad news comes in. There’s NO model that can deal with every type of data distribution possible. When data is evenly distributed across the different columns then the old model can produce a more accurate result:

Legacy CE Model estimate: 744.07

Multi-column statistics (just the AVERAGE of provider and category TOGETHER and across ALL values): 466.34

New CE Model estimate: 2220.28

ACTUAL: 656

The point – I could create another example where the New CE Model is the best. Here, simple multiplication of the individual selectivities (what the Legacy CE would do without multi-column statistics) comes closest, but the Legacy CE doesn’t even use it because it relies on the multi-column statistic (and therefore on averages across all categories and providers). So, the irony is that the generalized multi-column statistic actually takes the estimate further away. But, I could also come up with yet another example where the New CE Model produces the best result (and another where the multi-column statistic is best). In the end, it completely depends on THE DISTRIBUTION OF THE DATA.

But, the really good news is that you have a lot of troubleshooting and control options here. My recommendation (you can read more about it in the post I linked to at the start of this post) is that you STAY on the Legacy CE Model and, where estimates are OFF (or where you’re troubleshooting a suspected cardinality estimation problem), TRY the new CE using the QUERYTRACEON option. You can even try TF 4137 (available since SQL Server 2008*) if you know that one value is a direct subset of the other (TF 4137 uses the MINIMUM selectivity of the predicates and does NOT perform a calculation).

Above all – have fun!
kt

* Thanks to Pedro for reminding me to mention that TF 4137 has been around since SQL Server 2008. So, if you have predicates that are subsets of each other (WHERE city = ‘Chicago’ AND state = ‘IL’ AND country = ‘USA’) then you would NOT want EITHER model to estimate (even exponential back-off is going to be wrong here, even though it will be higher than the old model). Here, Chicago is a subset of IL and IL a subset of USA. Using the MINIMUM selectivity of these predicates (which would be the selectivity of Chicago) would be best. However, if you had any other predicates in there – then all bets are off. See – this is SUPER HARD! ;-) ;-)
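Here’s a hedged sketch of what that looks like at the query level (the [dbo].[Customers] table and its columns are hypothetical):

SELECT [c].*
FROM [dbo].[Customers] AS [c]
WHERE [c].[city] = 'Chicago'
    AND [c].[state] = 'IL'
    AND [c].[country] = 'USA'
OPTION (QUERYTRACEON 9481, QUERYTRACEON 4137, RECOMPILE); -- TF 4137 = use the MINIMUM selectivity (Chicago)
GO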

SQLskills procs to analyze data skew and create filtered statistics

At PASS Summit 2013 I delivered a presentation titled: Skewed Data, Poor Cardinality Estimates, and Plans Gone Bad (you can watch this session on PASStv here). Note: please fast forward 4 mins as they didn’t tell me they were recording / publishing my “chatting” before the session – which is weird IMO, but if you want to hear me talk about diving and sharks and octopus then there it is. As for the actual content of the session, it’s all about understanding the limits of SQL Server’s histogram (which has a maximum of 201 steps) and how estimates against skewed data (using step averages) can be highly incorrect – resulting in query plans that aren’t optimal. As part of that session, I demoed some code to help programmatically analyze histograms for skew and also code that will create filtered statistics for you – based on a few parameters. In this post, I have an update to those scripts. But, there’s also a bit of good news and bad news that goes along with filtered statistics. Let me start with the good news…

The Good News

  • Filtered statistics work really well in SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012. (hmm… I bet you can already figure out the bad news)
  • The scripts have been updated to fix a couple of minor issues (one where I checked for version but didn’t allow them to run on SQL Server 2014).
  • sp_SQLskills_DropAllColumnStats has been modified to ONLY drop filtered statistics that start with SQLskills_FS (which is the naming convention I use when I create filtered stats).
  • sp_SQLskills_CreateFilteredStats no longer requires a “max” value for tables. The final filtered statistic created will be unbounded. Also, depending on how many steps you have, there were a few weird situations where you might end up with a statistic that is definitely >= value and < same value. This would NOT hurt any estimates but it also wasn’t very useful. Totally benign but also fixed.

The Bad News

Filtered statistics are not yet accessible by the SQL Server 2014 cardinality estimator (tested up to SQL 2014 CU3). This is also true for multi-column, column statistics (where you need to use MORE than just the first column) and hypothetical indexes created with auto pilot. But, there is good news here as well – you CAN still use SQL Server 2014 and take advantage of filtered statistics by accessing the legacy CE at the query level. In fact, if you run using the legacy CE in 2014 then you can access all of these things (and, again, you can do this on a query by query basis if you’d like). So, let me give you a few options and a recommendation.

Cardinality Estimator Options for SQL Server 2014

Option 1 – low impact (my recommendation)

  • Upgrade existing databases (through backup / restore)
  • Leave the existing compatibility level intact (SQL Server 7.0 CE is the default for compatibility modes of SQL Server 7.0 through SQL Server 2012). Remember, restoring / attaching  does not “upgrade” the compatibility level; it remains the level it was when it was backed up / detached.
  • For the following types of queries, test the query performance using trace flag 2312. If you see a benefit, use the new CE model for that query with OPTION (QUERYTRACEON 2312) – see the sketch after this list.
    •  That aren’t performing well
    • Whose estimates are far off from actual
    • Where you want to use filtered statistics or auto pilot
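Here’s a minimal sketch of that per-query test (the table and column are placeholders for one of your own problem queries):

SELECT [t].*
FROM [dbo].[YourTable] AS [t]
WHERE [t].[SomeColumn] = 42
OPTION (QUERYTRACEON 2312, RECOMPILE); -- try this query with the new CE model
GO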

Option 2 – low impact

  • Change to compatibility level 120
  • Enable trace flag 9481 server-wide
  • Again, use query-level changes for accessing the new CE.

Option 3 – potentially high impact (potentially good; potentially bad)

  • Change to compatibility level 120 (New CE model)
  • When troubleshooting, test trace flag 9481 against queries that have regressed
  • If the legacy CE helps, use OPTION (QUERYTRACEON 9481) for those queries

Playing with the skew analysis scripts and filtered statistics scripts

If you’re still wanting to determine if your very large tables show signs of skew and potentially poor estimates – check out my PASStv presentation from last year. Then, replace the SQLskills project with the scripts in this zip (SQLskillsProcs). If you keep your database compatibility mode at 110 (for SQL Server 2012) then you won’t need to make any changes to the other examples from that session. However, if you change to the SQL Server 2014 compatibility mode (120) then you’ll need to add this to the statements where you want to leverage filtered statistics.

SELECT blah blah
...
OPTION (QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9204)

9481 = Legacy CE model for SQL Server versions 7.0 – 2012

3604 = this will display the output to the results window

9204 = this will display the statistics used by the query (note, I can’t seem to get this to work when using the new CE at all)

Have fun and let me know what you find. Cheers, kt

New York .Net and SQL Server User Groups – We’re back!!

Date & Time: Tuesday, February 25, 2014 – 6:30-9pm

Evening theme/title: SQL Server – Are you talking to me?

Groups/Target Audience: SQL Server and .NET User Groups, anyone that works with SQL Server that wants to talk techie on Tuesday evening

Cost: FREE (but, you must register; link at the end of this post)

Presenters: Kimberly L. Tripp & Jonathan Kehayias, SQLskills.com

Abstract: Our title seems strange but it runs across multiple themes. First we’ll take a look at Service Broker in SQL Server and how it can extend service oriented architectures into the data tier with asynchronous message processing. As for the other theme, database developers, DBAs, architects – don’t always talk to each other. In this evening session, we’ll cover a mix of really interesting topics that help bridge the gap across these disciplines. Reading plans and understanding plans with Plan Explorer (both the free and the Pro versions will be discussed) – this is something everyone should be able to do. And, we’ll raffle off a PE Pro Edition (thanks SQL Sentry) in addition to talking about the FREE Plan Explorer. Then, we’ll dive into statement execution methods and plan cache. Different coding strategies on our servers can have a profound effect on both cache and performance. But, how can you write more effective code that takes less caching and has better plans? Some of this content comes from a newly released Pluralsight course titled: SQL Server – Optimizing Ad Hoc Statement Performance and for those of you that attend – Pluralsight will give you a monthly trial. And, we’ll also raffle off two Annual Plus subscriptions (thanks Pluralsight)!

How can we all just get along? Come to this fast-paced, demo-packed session led by industry-experts Kimberly L. Tripp and Jonathan Kehayias of SQLskills. And, after the presentation, Kimberly, Jonathan, and Paul Randal (also of SQLskills) will hang out for an open Q&A on almost-anything-SQL. We’ll post a list of topics and open up the floor!

Sponsors:

Rough Agenda:

  • 6:30pm – Pizza (thanks Pluralsight & SQL Sentry), networking, settling in
  • 7:00pm
    • Service Broker (~15 mins) – Jonathan Kehayias
    • SQL Sentry’s Plan Explorer (~15 mins) – Jonathan Kehayias
    • Statement Execution (~15 mins) – Kimberly L. Tripp
    • Plan Cache (~15 mins) – Kimberly L. Tripp
  • 8:15pm – Raffles (must be present to win)
    • SQL Sentry Plan Explorer Pro License
    • Pluralsight Annual Plus Subscriptions
    • $100 off (per day) for an Immersion Event of your choice (maximum value $500 for a 5-day event)
    • Everyone – 30-day trial subscription to Pluralsight
  • 8:30pm – Open Q&A (all 3 of us)

Location:

Microsoft
11 Times Square (west 41st street and 8th avenue)
New York, NY 10036

Check-in with security in the lobby and then you’ll proceed to the 9th floor. Our meeting will be held in the Music Box/Winter Garden MPR.

And a final, big thanks to our evening hosts!

Microsoft contact: Jesus Aguilar, http://www.giventocode.com/
SQL Server User Group contact:
Joe Lax, http://nycsqlusergroup.com/
.NET User Group Contact: Andrew Brust, http://www.nycdotnetdev.com/

Want to join us – register here (cost: free):

https://clicktoattend.microsoft.com/en-us/Pages/EventDetails.aspx?EventId=179143

Physical Database Design Considerations and a bit of freediving

Something that always amazes me is that people think SQL Server is easy. And, I’d like to (at least partially) blame Microsoft marketing for that. Over the years they’ve propagated this notion of SQL Server being a simplified database environment even “zero-administration” at one point (anyone remember that marketing nightmare?). And, while there are MANY things that are simplified – it’s by no means SIMPLE. And, simply put, it’s just like EVERY OTHER TOOL you’ve ever used. If you want to use it effectively, you need to learn how to use it properly.


Photo (of me w/some beautiful mantas) taken by Deron Verbeck of WHOA (www.wild-hawaii.com). They’re a group with whom I always go out when I’m in Kona, HI. If you want to see marine wildlife and get out on the water with very knowledgeable folks (on a fantastic boat) – this is the company to pick!

Over the weekend I took a class in freediving (like snorkeling, but my goal is to dive/photograph animals [mostly cetaceans - who really don't like bubbles] “on a single breath”). But, it’s not just about holding your breath. There are breathing techniques for slowing down your heart rate and making yourself more comfortable in the water. There are skills for coming back up out of the water after a long breath hold (it’s called recovery breathing and it’s meant to help reduce/prevent shallow water black-out [Samba] and full black-outs). And, there’s a HUGE emphasis on safety/buddy dives. Even with just a two-day class, I was diving deeper and holding my breath longer than I ever thought possible. I believe in classes. I believe in safety. I believe that you can never stop learning. I believe in IMMERSION (in more ways than one :).

My main point: It’s always better to know how to do something properly EARLY so that you can avoid problems later. Luckily, with SQL Server, it’s [usually] not that critical/dangerous (but, what about hospital systems, transit systems, safety/911 systems, etc…).

My point, the more you know about the RDBMS, the better you’ll be able to design for it. The better your applications will scale. The longer life your applications will have and the longer you’ll be able to use the hardware that you buy.

Get more information on Physical Database Design

These are the points that I’ve been making in a couple of my most recent releases and I wanted to make sure you knew they were available:

For FREE and available for download – my interview on DotNetRocks. I love these guys. They’re great friends of ours (did you know that Richard MC’ed our wedding?). But, it’s so much more than their being friends/friendly. They’re absolute geeks and they can really do killer interviews. Check out show 913: (yes, nine-HUNDRED thirteen): Physical Database Design with Kim Tripp. I’ve also done a few other shows with them. If you want a few more shows, check out this post: Getting ready for DotNetRocks tonight (in NYC).

My class on Pluralsight: SQL Server: Why Physical Database Design Matters. And, you can even get a 10-day/200 hour trial subscription HERE.

You always want to keep learning and there are lots of great options – including conferences. If you’re attending either SQL PASS in Charlotte next week OR SQLintersection in Vegas at the end of the month – please stop by and say hi. We’re looking forward to seeing a lot of you over the next few weeks.

Cheers,
kt

PS – Here are a few additional links if you’re interested:

And, since I LOVE u/w photography – here are a few shots I took while out with WHOA:


White-tipped Oceanic Shark


Bottlenose Dolphin


Short-finned Pilot Whales


Black-tipped Oceanic Shark

Always wear sunscreen

OK, I know… this post is going to remind many of you of Baz Luhrmann’s Everybody’s Free to Wear Sunscreen. And, for me, ’99 was already probably too late (having been a beach lifeguard in the ’80s). But, I’d like to add a vote to that being very good advice.

This post is not going to be a technical one. It’s more of an experience one. And, one that I think everyone can benefit from. It’s true with everything in life – the sooner you find a problem, the easier it is to solve.

Back in 2010, I had a spot on my cheek that didn’t feel like much more than a little sliver of skin sticking up. It seemed to go away but then come back again. I saw a dermatologist who said that this is very common and is called Actinic Keratosis. It’s sometimes referred to as “pre-cancer” but the likelihood of it becoming cancerous is about 1 in 100 (this is just what I remember) and there’s an easy solution: freeze it off using cryotherapy. So, it was frozen off and gone. Well, for a few months. Then, it came back. And, then I had it frozen off again.

But, as life would have it, time went by. And, the spot got a bit bigger. But, somehow in my mind – the thought of it becoming cancerous was so low. I just became complacent. Last year, Paul started harassing me that I should go back and get it looked at. I argued that it was no big deal. It’s nothing. And, it wasn’t really doing anything interesting. Until this year. This year it actually scabbed and bled a couple of times. And, it seemed to be getting a tad larger. So, I started doing some research and read that there were a few signs that warranted having a skin lesion looked at:

  • Irregular in shape
  • Raised on the skin
  • Sore that never seems to completely heal or go away
  • Growing
  • Larger than the eraser of a pencil

And, all of a sudden I was a bit worried. So, I scheduled an appointment with a dermatologist (my prior dermatologist had left the practice). But, with my schedule and their schedule – this appointment got pushed out and pushed out and pushed out. Finally, I went to the dermatologist after we returned from our summer vacation. Where, as usual, we had been diving. And, yes, where I had been in the sun. So, as usual, I wasn’t really tan. I was more of a brownish shade of red. And, off I went to go see a dermatologist.

Now, I don’t know where things went wrong but this is how I remember that appointment (this is the short version):

(pleasantries, etc.)

Me: Actually, it looks about the best it’s looked in 6 months. The combination of salt water and sun seems to have done it some good.

Derm: Started to look at my cheek and ended up looking at the wrong spot… that’s how uninteresting it was…

Me: Well, I started to get nervous a few weeks back when I read the ABCs of skin cancer: http://cancer.about.com/od/skincancermelanoma/p/abcdeskincancer.htm

Derm: Oh, no. That only applies to BROWN spots.

OK, so this was a new dermatologist. And, we didn’t really know each other. And, she seemed pretty convincing that I was there for no reason. She proceeded to tell me that while the Actinic Keratosis I’d been diagnosed with is sometimes called “pre-cancer,” the likelihood of it being cancer was really low (yes, I’d heard that before) AND that it usually takes decades for it to become cancerous.

So, while I wasn’t so impressed with the bedside manners of this dermatologist, they were [at least] making me feel like there was nothing to be worried about. In the back of my mind, I started thinking – maybe I don’t even need to get this biopsied. Maybe I should just get it frozen off again.

But, a bit more time with this dermatologist made me just want to “get it over with.” To be honest, I should have asked a lot more questions. However, I was already feeling like I was wasting their time. So, I just decided to move forward. We did the biopsy. I remember asking what would need to be done if it was something and their response was “not much more than what we’re doing today.” (but, again, it felt more like a brush-off than a real answer)

I left. I went home. I got more and more frustrated for the next few hours as I realized what an incredibly frustrating visit I had had. The doctor didn’t really interact. They didn’t seem to want to answer questions. I thought – at least I can research things and investigate this. I have people to chat with about this, etc. What would the experience be (with that doctor) for someone else who doesn’t use the internet, etc. I ended up writing a 2 page letter to the head of dermatology. Here’s a VERY small highlight:

Today, I had my first and last appointment with Dr. XYZ of Dermatology.

… They need to understand that just because my problem isn’t interesting to them, it’s still something that I want taken seriously. …

The good news is that they took away most of my fears of something nasty; I’m looking forward to hearing my incredibly lame results.

So, I waited. It was a Tuesday when I had the appointment. It was Tuesday night when I wrote the letter. Then, on Wednesday, I was even more frustrated with the biopsy. I was told not to remove the band-aid for 24 hours but when I did, I had a grand-canyon-like crevice on my face. [Note about my overreaction: it was about the size of a dime. But, it was on my face and it was awful looking. All I could think about was - sh*t, that's gonna leave a mark. And, sh*t why didn't they tell me that this biopsy was going to be so much bigger than the original spot and much deeper than I expected. Ugh. Again, I wish I had asked more questions.]

And, now I was even more frustrated. It looked like I was going to have a scar – for nothing. I ended up calling my regular doctor who put all of this to rest. Yes, shave biopsies tend to look really gnarly at first but they tend to heal pretty well. And, even a couple of days later, it was looking much better. OK, no biggie.

But, then I got the results (Friday afternoon at 4:30pm btw – have a nice weekend). It was Squamous Cell Carcinoma. Yes, it is cancer. On. my. face. And, Dr. Personality referred me to a specialist and my interactions with them were done.

So, that was it. I had a consult setup for the following week and I was going to have the procedure done at that time as well. It’s called Mohs Surgery.

OK, there goes my weekend; that was not what I expected.

So, I started reading and reading and reading and reading (er, the internet is both a good and a bad place to go for information). Probably what stood out in my mind the most was this: The average age for SCC is 66. People that get it at a young age are more likely to get other cancers. Great.

The good thing though is that Mohs Surgery didn’t sound too bad and I thought that it was probably the right direction. But, then I started to read about topical solutions (specifically Imiquimod). And, I started to wonder if Mohs was the best next step. And, then I decided I wanted a second opinion. So, I chatted with a few folks and got a referral for another dermatologist. She was out for a couple of days but her nurse talked to the head of their dermatology department and agreed that I should chat with someone else before I have the surgery. So, we cancelled that and I went in for my second opinion.

And, the long story short is that she was wonderful. She felt that Imiquimod wasn’t really the best for what I had. Instead, we should go with Fluorouracil. The Mohs Surgery has a 99% success rate but was also likely to leave a very large scar. My SCC wasn’t very deep but it extended beyond the area of the biopsy. So, we weren’t really sure how large the area actually was. The topical cream has an 85-90% success rate and would be a bit of a pain in the *ss for a while (about 2 months) but would be a lot less likely to scar. And, of course, I can always go back and have the surgery.

So, we decided to go that route. And, that’s really the point of this post. I’m about to start teaching again next week, a couple of conferences are coming up, and my face isn’t all that attractive. In fact, tomorrow’s my birthday and Paul’s decided that the best present for me is a paper bag with two holes that just reads “wife” (yep, isn’t he nice :) ). And, it was going to be obvious that something is up.

But, even more than the lovely visual that is my cheek; I’ve learned a lot from this experience. And, if anything, maybe this will help some of you!

  1. If you don’t like a doctor, get another one. And, tell the clinic/hospital, etc. about your experience. In chatting with a few folks – they said that they had experienced something similar but they never wrote a letter. I realize that a doctor’s time is important. But, so is mine.
  2. Get a second opinion. Talk to people.
  3. Do research. The best post I’ve found about Fluorouracil is this one: http://www.sannerud.com/people/efudex/. My experience isn’t as bad as his but the cream definitely stings. The spot itches. It’s a bigger pain in the *ss than I had expected.
  4. Stay up on your health. The sooner you find a problem, the easier it is to solve. My SCC isn’t horrible. We did catch it very early. It’s very slow growing. But, I still [probably] should have taken care of this last year.
  5. Always wear sunscreen

So, I’ll leave you with a quick shot of what I looked like a few days ago. I’m not sure I’ll do the whole timeline thing as I really just don’t want to remember this nasty thing. But, it’s not contagious and I am fine. It just looks nasty!


Thanks for reading,
Kimberly

(USE THIS): New SQL Server 2012 rewrite for sp_helpindex

Since I’ve rewritten sp_helpindex a few times, I have a few blog posts in this category. Each time I do an update I’ll make the NEW version titled USE THIS.

To use my version of sp_helpindex, you need TWO scripts. One script is version-specific and the other works on versions 2005, 2008/R2 and 2012. All versions need this generic base procedure to produce the detailed output.

Step 1: Setup sp_SQLskills_ExposeColsInIndexLevels

Create this procedure first: sp_SQLskills_ExposeColsInIndexLevels.sql (7 kb).

This is what gives us the tree/leaf definitions. And, this works for all versions: 2005, 2008/R2, and 2012.

Step 2: Setup the replacement procedure for sp_helpindex. This IS version specific:

On SQL Server 2005, use: sp_SQLskills_SQL2005_helpindex.sql (11 kb) to create sp_SQLskills_SQL2005_helpindex.

On SQL Server 2008, use: sp_SQLskills_SQL2008_helpindex.sql (12 kb) to create sp_SQLskills_SQL2008_helpindex. (NOTE: This does run on SQL Server 2012 but if your table has a columnstore index, it will generate an error.)

On SQL Server 2012, use: sp_SQLskills_SQL2012_helpindex.sql (12 kb) to create sp_SQLskills_SQL2012_helpindex.
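Once created, you call it like any other stored procedure. Here’s a minimal sketch (the parameter mirrors sp_helpindex; the [dbo].[member] table name is just an assumption based on the sample output shown below):

EXEC [sp_SQLskills_SQL2012_helpindex] '[dbo].[member]';
GO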

Step 3: Setup a hot-key combination

Optionally, set up this procedure to be invoked through a keyboard shortcut using Tools, Options, Environment/Keyboard. I usually make it Ctrl+F1 and I described how to do this here.

The Output

On SQL Server 2012, the output will look like the following (index_id 5 is a columnstore index):

index_id 1: [member_ident] – clustered, unique, primary key located on PRIMARY (is_disabled: 0)
    index_keys: [member_no]; included_columns: NULL; filter_definition: NULL
    columns_in_tree: [member_no]
    columns_in_leaf: All columns “included” – the leaf level IS the data row

index_id 2: [member_corporation_link] – nonclustered located on PRIMARY (is_disabled: 0)
    index_keys: [corp_no]; included_columns: NULL; filter_definition: NULL
    columns_in_tree: [corp_no], [member_no]
    columns_in_leaf: [corp_no], [member_no]

index_id 3: [member_region_link] – nonclustered located on PRIMARY (is_disabled: 0)
    index_keys: [region_no]; included_columns: NULL; filter_definition: NULL
    columns_in_tree: [region_no], [member_no]
    columns_in_leaf: [region_no], [member_no]

index_id 4: [LastNameInd] – nonclustered located on PRIMARY (is_disabled: 0)
    index_keys: [lastname]; included_columns: NULL; filter_definition: NULL
    columns_in_tree: [lastname], [member_no]
    columns_in_leaf: [lastname], [member_no]

index_id 5: [columnstore_index] – nonclustered columnstore located on PRIMARY (is_disabled: 0)
    index_keys: n/a, see columns_in_leaf for details; included_columns: n/a, columnstore index; filter_definition: n/a, columnstore index
    columns_in_tree: n/a, columnstore index
    columns_in_leaf: Columns with column-based index: [member_no], [lastname], [firstname]

I hope this helps you when looking at your indexes!

Enjoy,
kt

The Accidental DBA (Day 28 of 30): Troubleshooting: Blocking

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

SQL Server uses locking for a variety of things – from protecting and isolating resources to “indicators” showing that someone is using a database or accessing a table. Many locks are compatible with other locks; it’s not always about limiting a resource to only one session. But, when locks are incompatible, blocking can occur. If the blocker is efficient, the blocked user might not even realize that they were momentarily blocked. And, a normal system will always have some blocking; it’s a natural occurrence. However, if there are long-running transactions or transactions that affect large numbers of rows – blocking can really feel like a major problem. But, is it really the locks? Or, instead, the blocker?

Root Cause Analysis

In many cases, root cause analysis will reveal some inefficiency in your environment:

  • Imagine a poorly coded update that sets every column to the value in the dialog because they’re too lazy to determine WHICH column(s) have actually changed. This causes modifications in the data row and across numerous indexes that aren’t even changing.
  • Imagine an update that only needs to modify a subset of rows but has no index to help search
  • Imagine a transaction that begins, modifies some data, and then waits for user input. The locks are held at the time the data is modified and modification-related locks are not released until the transaction completes. In this case it’s an indefinite amount of wait time. This can cause HUGE problems – not just blocking but also in logging (and therefore recovery).

My point: locking is not always the main cause of the problem, but it often gets blamed. Instead, locking and blocking is really a symptom of some inefficiency, and further analysis will help you to better understand where your real problem is. But, how do you analyze it?

Analyzing Blocking

When performance is poor there are many options to check. In general, we always recommend starting with wait statistics. In Erin’s post The Accidental DBA (Day 25 of 30): Wait Statistics Analysis she mentions using sys.dm_os_wait_stats. Regularly using this and collecting a baseline of your server’s general characteristics will help you when your system is slow or to see if something’s changed (or changing) over time. Be sure to read Erin’s post as well as the posts she references. The more you know about your server when it’s healthy, the more equipped you’ll be when there’s a problem.

And, if you have a blocking situation right now then the DMV to use is sys.dm_os_waiting_tasks. This can tell you if someone is blocked and which SPID (server process ID) is the blocker. However, this can quickly become complicated if there are multiple connections (SPIDs) involved. Sometimes, finding who is at the head of the chain is part of the problem. And, since you’ll need to know more about what’s going on, you’ll want to use sys.dm_tran_locks. And, instead of reinventing the wheel, check out Glenn Berry’s A DMV A Day – Day 27 (sys.dm_tran_locks), specifically the blocking query that orders by wait_duration_ms DESC. Because the lock held the longest will be at the top – showing who it’s being blocked by – this will lead you to the head of the chain: the blocker. But, what are they doing?
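As a starting point, here’s a minimal sketch of a query against sys.dm_os_waiting_tasks, ordered the same way (longest wait first):

SELECT [wt].[session_id],
       [wt].[blocking_session_id],
       [wt].[wait_duration_ms],
       [wt].[wait_type],
       [wt].[resource_description]
FROM [sys].[dm_os_waiting_tasks] AS [wt]
WHERE [wt].[blocking_session_id] IS NOT NULL
ORDER BY [wt].[wait_duration_ms] DESC;
GO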

Once you know the SPID at the head of the chain, you can use a variety of commands to start piecing together what’s happening. But, I’d actually recommend a few other things instead:

  1. The completely OLD SCHOOL method is sp_blocker_pss08. You can get the code from this KB article [KB 271509]. The article says it’s only for SQL Server 2000 and SQL Server 2005 but it still works well – even on SQL Server 2012. And, if your company has an issue with running third party products, then this might work out well for you. It’s simple, it’s just TSQL and it gives you a variety of pieces of information if something is blocked right now.
  2. The up-to-date way to determine locking problems is to use SQLDiag. But, there’s a bit of a learning curve with this as it’s command-line based and requires a bit more work than just the execution of an sp. You should definitely get some time with it but if you’re trying to troubleshoot a blocking problem right now, now is not the time to learn SQLDiag.
  3. The easiest (third-party tool) is Adam Machanic’s sp_WhoIsActive and it really does a nice job of producing the information that the old sp_blocker_pss08 produces but in tabular form. And, Adam has blogged quite a bit of information about using this utility.

And, if you’re trying to see if patterns exist over time, consider Using the Blocked Process Report in SQL Server 2005/2008. Jonathan did a great write-up of how to set this and use this to generate a trace of the activity that’s running at the time a process hits 5 seconds of being blocked.

The Resolution

Ultimately, you need to find out who is doing the blocking first – why is their transaction taking so long? If it’s due to inefficiencies in the query – can you rewrite it? If it’s due to inefficiencies in the plan – can you add an index? If it’s modifying a large amount of data – can you break it down into smaller chunks so that each set is locked for a shorter period of time? These are ALWAYS the things to try first.
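For the “smaller chunks” idea, here’s a minimal sketch (the table, column, and batch size are hypothetical – you’d tune them for your own workload):

-- Modify rows in small batches so each transaction holds its locks only briefly
DECLARE @batchSize int = 1000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@batchSize) [dbo].[YourTable]
    SET [Processed] = 1
    WHERE [Processed] = 0;

    IF @@ROWCOUNT < @batchSize
        BREAK;
END;
GO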

Consider Row Versioning

If you truly have an optimized system and it’s highly active with both readers and writers who are just constantly getting in each other’s way (and causing blocking), then you might consider using a form of row versioning. This is much more complicated than a quick post can capture but I’ve seen “snapshot isolation” (as it’s often called) explained incorrectly in numerous places. Simply put, you can have your database in one of FOUR states:

  1. Read Committed using Locking: this is the default – with NONE of the row versioning options enabled.
  2. Statement-level Read Consistency (or, read committed using row versioning): this is what you get if you turn on ONLY the database option read_committed_snapshot. This causes readers (in read committed isolation) to use version-based reads, guaranteeing them a definable point in time to which their QUERY (or, statement) reconciles. Each statement reconciles to the point in time when that statement began.
  3. Transaction-level Read Consistency (or, Snapshot Isolation): this is what you get if you turn on ONLY the database option allow_snapshot_isolation. This ALLOWS users to request a version-based read and, in a transaction, will cause ALL reads in the transaction to reconcile to when the transaction began. However, it’s important to note that this option adds the overhead of versioning, but readers will use locking unless they request a snapshot isolation session using: SET TRANSACTION ISOLATION LEVEL SNAPSHOT.
  4. The fourth state is when BOTH database options have been set. If you turn on both read_committed_snapshot and allow_snapshot_isolation then, in read committed, all statements reconcile to the point in time when the statement started. Or, if you’ve changed your isolation level to snapshot, then each statement will reconcile to the point in time when the transaction began.

NOTE: There are numerous places [online] where it’s stated that both of these options are required for versioning; this is incorrect. You can have neither, only one, or both. All four states produce different behaviors.
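For reference, here’s a minimal sketch of setting those two database options and requesting snapshot isolation in a session (the database name is a placeholder, and changing these options on a busy database needs care and testing):

-- Statement-level read consistency (read committed using row versioning)
ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;
GO

-- Allow transaction-level read consistency (snapshot isolation)
ALTER DATABASE [YourDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- A session that wants transaction-level consistency must ask for it:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
GO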

The beauty of versioning is that readers don’t block writers and writers don’t block readers. Yes, I know it sounds fantastic but be careful, it’s not free. You can read the detailed whitepaper that I wrote about it when it was first released in SQL Server 2005 (updated by Neal Graves): SQL Server 2005 Row Versioning-Based Transaction Isolation. And, I also did a video on it here: http://technet.microsoft.com/en-US/sqlserver/gg545007.aspx. And, if you think that the overhead might be too much for your system, check out the case study from Nasdaq: Real-Time Reporting with Snapshot Isolation.

Ultimately, versioning might be what you’re looking for but don’t jump right to this without thoroughly tuning your environment.

In Summary

  1. Find where you’re waiting most
  2. Do root cause analysis to get to the code or SPID that’s causing you grief
  3. Analyze the code to see if it can be changed to reduce the time that the locks are held
  4. Consider changing isolation

Thanks for reading!
kt

PS – Stay tuned for day 29 when Jonathan talks about when blocking becomes deadly! :)

The Accidental DBA (Day 20 of 30): Are your indexing strategies working? (aka Indexing DMVs)

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

As an accidental DBA you are constantly wearing many hats. You’ve heard that indexes are critical for performance (and that’s absolutely true) but it’s not just any indexes – it’s the RIGHT indexes. The unfortunate thing about indexing is that there’s both a science and an art to it. The science of it is that EVERY single query can be tuned and almost any SINGLE index can have a positive effect on certain/specific scenarios. What I mean by this is that I can show an index that’s ABSOLUTELY fantastic in one scenario but yet it can be horrible for others. In other words, there’s no short cut or one-size-fits-all answer to indexing.

This is where the art comes in – indexing for performance is really about finding the right balance between too many and too few indexes, as well as trying to get more from the indexes that you do keep.

Having said that, it’s way beyond the scope of what we can talk about in a short post. Unfortunately, a full discussion about these things would take a lot of time but I do have a few links that might help if you’re interested in digging in a bit deeper:

  • How do indexes work? (Index Internals video and demo)
  • What’s the clustered index? (Clustered Index Debate video)
  • What are good/general strategies for indexing? (Indexing Strategies video and demo)
  • When should columns be put in the key (and in what order should they be defined) versus when should they be in the INCLUDE list?
  • Would we benefit from a filtered index?

These are difficult discussions. And really, if you’re only meant to “maintain” the system and keep the lights on – these are probably beyond what you can effectively do in your already-packed day.

So, who is handling the job of creating and defining the database’s indexes? It’s probably the application developers. These might be in-house developers (which is good – you’ll have someone with whom to consult when you do your analysis) or they might be vendor developers (which might be both good and bad). The good side is that some vendors are open to discussions on their customer support lines and may help you through the issues that you’re seeing. The bad news is that some vendors are not open to discussions and they might not support you if you make any indexing changes.

So, first and foremost – make sure you thoroughly understand the environment you’re about to analyze and do not change anything without verifying that you can.

Introduction

The good news is that it’s NOT all doom and gloom. There are some very helpful DMVs and resources that you can use to analyze your environment and see where some of your indexing issues are. Again, you might not be able to change them (immediately) but, you will be armed with information to help you discuss what you are seeing.

When I’m analyzing a new-to-me system, I tend to break down my index analysis into three parts:

  1. Are there any indexes just lying around not doing anything useful… time to get rid of the dead weight!
  2. Are there any indexes that are bloated and unhealthy – costing me time and space… time to analyze the health of my existing (and useful) indexes
  3. Then, and only then, do I consider adding more indexes.

Part I: Getting rid of the dead weight

Fully duplicate indexes

SQL Server lets you create redundant/duplicate indexes. This is annoying but it’s always been the case. It certainly begs the question about why SQL Server lets you do this and I wrote up an answer to this in an article on SQL Server Magazine here: Why SQL Server Lets You Create Redundant Indexes (http://sqlmag.com/blog/why-sql-server-lets-you-create-redundant-indexes). Regardless of why, you still need to remove them. And, even if you don’t remember seeing duplicate indexes, you might be surprised. Without knowing index internals, it might be harder to recognize duplicates than you think. It’s not always as easy as Index1 on col1 and Index2 on col1. Internally, SQL Server adds columns to your index and most commands (like sp_helpindex) do not show these internally added columns. The good news is that I have a version of sp_helpindex that does show you the entire structure. And, tied to that updated version of sp_helpindex, I built a script for finding duplicate indexes at either the table-level or database-wide. Check out these links:

But, you could BREAK the application if they’ve used index hints. So, beware! Generally, it might be best to disable an index for a while before you just drop it.

Unused Indexes

Almost as expensive as a duplicate index is one that never gets used. However, this is a lot more challenging to determine. There is a fantastic DMV (sys.dm_db_index_usage_stats) that gives you information about index usage but, it’s not perfect. And, some of the behaviors have changed in some releases (sigh). If you’re really wanting to best understand your index usage patterns you’ll have to persist this information over a business cycle and be sure to persist it prior to index maintenance (see this connect item: Rebuilding an index clears stats from sys.dm_db_index_usage_stats – https://connect.microsoft.com/SQLServer/feedback/details/739566/rebuilding-an-index-clears-stats-from-sys-dm-db-index-usage-stats). Note: this is only an issue in SQL Server 2012.

But, again, even the information tracked in this DMV isn’t perfect. One of my biggest frustrations is that user_updates only tracks the number of STATEMENTS, not the number of ROWS modified. For example, if I execute this statement (without a WHERE clause) UPDATE Table SET ColumnX = Value and it affects 10,000 rows, then the user_updates column will be incremented by 1 for BOTH the table and any indexes that include ColumnX. So, your actual update activity might be even higher (possibly MUCH higher) than the user_updates value suggests.
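Here’s a minimal sketch of a starting-point query against this DMV (not the exact scripts referenced in the posts below; remember to persist the results over a business cycle and before index maintenance):

SELECT OBJECT_NAME([ius].[object_id]) AS [table_name],
       [i].[name] AS [index_name],
       [ius].[user_seeks],
       [ius].[user_scans],
       [ius].[user_lookups],
       [ius].[user_updates]
FROM [sys].[dm_db_index_usage_stats] AS [ius]
JOIN [sys].[indexes] AS [i]
    ON [i].[object_id] = [ius].[object_id]
    AND [i].[index_id] = [ius].[index_id]
WHERE [ius].[database_id] = DB_ID()
ORDER BY ([ius].[user_seeks] + [ius].[user_scans] + [ius].[user_lookups]) ASC,
         [ius].[user_updates] DESC;
GO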

And, there’s more to it than that. Instead of duplicating this information, I’ll link to a FANTASTIC post by Microsoft PFE Ignacio [Nacho] Alonso: FAQ around sys.dm_db_index_usage_stats (http://blogs.msdn.com/b/ialonso/archive/2012/10/08/faq-around-sys-dm-db-index-usage-stats.aspx).

Finally, both Paul and I have written about this DMV as well as how to persist it. Check out these posts:

Similar or semi-redundant indexes

You might have some indexes that are good candidates for consolidation:

  • Indexes that have the same key (but possibly different INCLUDEd columns)
    • Index1: Key = LastName
    • Index2: Key = LastName, INCLUDE = FirstName
    • In this case you don’t “NEED” Index1. There’s NOTHING that Index1 does that Index2 cannot also do. However, Index2 is wider. So, a query that solely wants the following will have more I/Os to do because of the wider index:
      • SELECT LastName, count(*) FROM table GROUP BY LastName
    • But, the argument is – how critical is that query? How often is that index really used? Remember, you can use sys.dm_db_index_usage_stats to help you determine how often it’s used.
  • Indexes that have left-based subsets of other index keys
    • Index1: Key = LastName, FirstName, MiddleInitial
    • Index2: Key = LastName INCLUDE = SSN
    • Index3: Key = LastName, FirstName INCLUDE = phone
    • In this case each index does provide some specific (and unique) uses. However, you have a lot of redundancy there.
    • What if you created a new Index: LastName, FirstName, MiddleInitial INCLUDE (SSN, phone)
    • Again, this new index is wider than any of the prior 3 but it has even more uses and less overall overhead (only one index to maintain, only one index on disk, only one index in cache [and, it’s more likely to stay in cache]). But, you still have to determine how critical each of the queries that were using the narrower indexes is, as well as how much more expensive those queries become with the new index (a sketch of this consolidation follows the list).
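Here’s a minimal sketch of that consolidation. The table name and index names are hypothetical, and ONLINE = ON requires Enterprise Edition, so adjust to your environment:

-- Hypothetical consolidation of Index1/Index2/Index3 into one wider index.
-- Table, column, and index names are placeholders; ONLINE = ON requires Enterprise Edition.
CREATE NONCLUSTERED INDEX IX_Contacts_LastName_FirstName_MiddleInitial
    ON dbo.Contacts (LastName, FirstName, MiddleInitial)
    INCLUDE (SSN, Phone)
    WITH (ONLINE = ON);
go
-- Disable (rather than immediately drop) the narrower indexes while you validate workload impact:
ALTER INDEX Index1 ON dbo.Contacts DISABLE;
ALTER INDEX Index2 ON dbo.Contacts DISABLE;
ALTER INDEX Index3 ON dbo.Contacts DISABLE;
go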

So, this is where the “art” of indexing comes into play. Index consolidation is a critical step in reducing waste and table bloat but there isn’t a simple answer to every consolidation option. This is another “it depends” case.

Part II: Analyze the health of your existing indexes

This topic has been talked about in many places. And, we’ve even chatted about it in our Accidental DBA series here: The Accidental DBA (Day 14 of 30): Index Maintenance.

In the context of this post, I want to make sure that after I’ve cleaned up the dead weight, my existing and useful indexes are healthy. You might also want to review your index maintenance strategies to see whether they can be made “lighter” and take less time – while being sure that they don’t miss anything. A couple of key reminders:

  • Make sure your index maintenance routines look at indexes on tables AND views
  • Make sure your index analysis routines use a LIMITED scan if you’re only analyzing avg_fragmentation_in_percent (see the sketch after this list)
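For reference, here’s a minimal sketch of a LIMITED-mode check against the current database; the 1,000-page floor is just a placeholder to filter out tiny indexes where fragmentation doesn’t matter much:

-- LIMITED mode reads the smallest amount of data needed to calculate
-- avg_fragmentation_in_percent, making it much cheaper than SAMPLED or DETAILED.
SELECT  OBJECT_SCHEMA_NAME(ips.[object_id]) AS [schema],
        OBJECT_NAME(ips.[object_id])        AS [object],
        i.name                              AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.[object_id] = ips.[object_id]
 AND i.index_id = ips.index_id
WHERE ips.page_count >= 1000          -- placeholder threshold: skip tiny indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
go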

And, here are a few other resources that you might find handy on this topic:

Part III: Adding more indexes

This is a tricky one. There are lots of good/bad practices around adding indexes. One of the worst is that most folks just add indexes without fully (and CORRECTLY) analyzing their existing indexes. The reason I say correctly analyzing is that the tools (like sp_helpindex and SSMS) hide some of the columns that might have been internally added to your indexes. So, unless you really know what your indexes look like, you won’t be able to correctly add new indexes while consolidating your existing ones.

The primary tool that I want to discuss here is the “user impact” aspect of the missing index DMV queries that exist out there (and, there are some great examples of using the missing index DMVs). And, while I STRONGLY encourage you to use them as a GUIDE, I do want you to remember that they’re not perfect. Here are my main issues/concerns/gripes:

  • The missing index DMVs (and therefore the “index/green hint” that shows up in showplan) only tune the plan that was executed. If the plan performed a hash join then the index is going to help the hash join; it’s unlikely that the join type will change. And, it might be the case that a different index would enable a different join type and make the query even faster. If you’re about to trust the missing index DMV recommendations (or, the green hint), then consider reverse-engineering the queries that are being tuned by these recommendations (see Jon’s post on how to do this) and then (if possible) run those queries through DTA (the Database Engine Tuning Advisor). DTA has capabilities that the missing index DMVs do not, in that DTA can “hypothesize” about alternate strategies. This makes the index recommendations even better!
  • The missing index DMVs only think about the BEST index for EACH query. And, that does make sense (from a QUERY tuning perspective) but, you need to do SYSTEM tuning. You can’t just create individual indexes for each and every query that needs one. You definitely want to consider the indexes that have the highest user impact but you also don’t want to forget about consolidation.
  • The missing index DMVs can even suggest indexes that you already have (see: Missing index DMVs bug that could cost your sanity…). A generic example of querying them follows this list.
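To be clear about what we’re talking about, a generic version of these queries (not any of the specific resources below) looks something like this – ranked by an estimated-improvement figure built from the user-impact and usage counters:

-- Generic ranking of missing-index suggestions for the instance.
-- Remember: these are per-query suggestions – consolidate before creating anything.
SELECT  mid.[statement]                       AS table_name,
        migs.avg_user_impact,
        migs.user_seeks + migs.user_scans     AS times_needed,
        (migs.user_seeks + migs.user_scans)
            * migs.avg_total_user_cost
            * (migs.avg_user_impact / 100.0)  AS estimated_improvement,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
  ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details AS mid
  ON mid.index_handle = mig.index_handle
ORDER BY estimated_improvement DESC;
go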

Here are a few Missing Index DMV queries/resources:

Summary

These are the primary things I look for when I want to see how the already-implemented indexing strategies are working, as well as the order in which I begin to analyze and change indexes. But, beware: you can negatively impact your environment, so it’s important to do adequate testing to make sure that what you’re doing has a net-positive effect.

Finally, I also did a video summarizing these things that I’m describing here – you might want to check this out as well: http://technet.microsoft.com/en-US/sqlserver/gg545020.aspx.

Thanks for reading!
kt

The Accidental DBA (Day 15 of 30): Statistics Maintenance

This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental/Junior DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we cover in our Immersion Event for The Accidental/Junior DBA, which we present several times each year. If you know someone who would benefit from this class, refer them and earn a $50 Amazon gift card – see class pages for details. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

When you execute a query that’s going to process a single row, the plan to access that row might be very simple – use an index to find the row and then look up the data. If you execute a query that’s going to process thousands of rows, the plan to gather that data might be more complicated. The real question here isn’t the plan itself but how SQL Server knew whether there was going to be one row or thousands of rows to access. To create a good plan, SQL Server needs to know your data (before it goes to the data) in order to access it efficiently. This is why statistics exist.

What are statistics?

Statistics are objects in the database, stored as a BLOB (binary large object). Generally, you don’t create them directly; they are created with indexes or auto-created by SQL Server when the query optimizer (the system that decides the most efficient way to access the data) needs better information about your data than it currently has. The latter scenario is tied to a database option: auto create statistics. This option is on by default and, for Accidental DBAs, I recommend that it stay on. As for manually creating statistics, there are cases where doing so can be extremely useful, but they tend to be warranted only for VLTs (very large tables). I’ll save that discussion for another time as it’s out of scope for a typical Accidental DBA.*
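If you want to see where these options currently stand across your databases, a quick read-only check looks like this:

-- Check the statistics-related database options for every database on the instance.
SELECT  name,
        is_auto_create_stats_on,
        is_auto_update_stats_on,
        is_auto_update_stats_async_on
FROM sys.databases
ORDER BY name;
go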

Statistics give information about the data distribution of the keys described by that statistic (in key order). Statistics exist for all indexes and [column-level] statistics can exist on their own. For example, let’s review the AdventureWorks2012 database. The person.person table has an index called IX_Person_LastName_FirstName_MiddleName on the LastName, FirstName, and MiddleInitial columns of the table. What do the statistics on this index tell me?

USE AdventureWorks2012;
go
DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName');
go

There are 3 result sets returned from the DBCC SHOW_STATISTICS command.

The header

Name: IX_Person_LastName_FirstName_MiddleName
Updated: Oct 31 2012 12:47PM
Rows: 19972
Rows Sampled: 8935
Steps: 200
Density: 0.6730038
Average key length: 28.32502
String Index: YES
Filter Expression: NULL
Unfiltered Rows: 19972

The most important information from the header is when the statistics were last Updated (or when they were created if they’ve never been updated). The second most important is the Rows vs. Rows Sampled columns. Neither of these directly indicates a problem but if queries against this table are not performing and the estimates the queries are using for optimization are not correct, it could be the statistics that are incorrect.

The density vector

All density    Average Length    Columns
0.001362398    11.22798          LastName
5.05E-05       23.09927          LastName, FirstName
5.03E-05       24.32502          LastName, FirstName, MiddleName
5.01E-05       28.32502          LastName, FirstName, MiddleName, BusinessEntityID

The density vector tells us information about the average distribution of our data. If you multiply All density * Rows (of the table), you get some insight into the average distribution of the column (or columns) listed under Columns above.

Using LastName alone: 0.001362398 * 19972 = 27.209812856. What this tells me is that the Average number of rows returned for queries that supply JUST a LastName is 27.

Using LastName & FirstName: 5.05E-05 * 19972 = 1.008586. What this tells me is that the combination of LastName and FirstName is almost unique. If I supply BOTH a FirstName and a LastName in my query (using equality), then I should get back 1 row.

This is interesting information – especially for the combinations of the columns beyond the first – because this tells us how much more selective a query can be if we add these additional columns in our WHERE clauses. But, it’s not perfect for LastName alone because we all know that each last name is not going to return 27 rows, right? And, this is where the histogram comes in…

The histogram

RANGE_HI_KEY    RANGE_ROWS    EQ_ROWS     DISTINCT_RANGE_ROWS    AVG_RANGE_ROWS
Abbas           0             1           0                      1
Adams           7.016288      71.19462    3                      2.326017
Alan            9.355051      11.12416    3                      3.101357
Alexander       18.7101       111.2416    6                      3.103471
…               (steps between Alexander and Zhu omitted)
Zhu             72.50165      68.96979    1                      71.9106
Zugelder        23.38763      1           4                      5.817025
Zwilling        0.1117823     3.552485    0                      27.20037

The histogram can contain up to 201 rows (see the Steps column in the statistics header). These rows are made up of up to 200 distinct (and actual) values from the table itself, AND one additional step if the leading column allows NULLs. In this case, because our LastName column does not allow NULLs, our histogram has 200 rows (side note: even if your leading column has more than 200 distinct values, SQL Server is not guaranteed to use all 200 steps).

The histogram tells us the most detailed information about our leading column (often referred to as the “high-order element” of the index). It’s surprisingly easy to read:

RANGE_HI_KEY    RANGE_ROWS    EQ_ROWS     DISTINCT_RANGE_ROWS    AVG_RANGE_ROWS
Abbas           0             1           0                      1
Adams           7.016288      71.19462    3                      2.326017
Alan            9.355051      11.12416    3                      3.101357

For the LastName Abbas there is 1 row equal to this value (EQ_ROWS) and no rows prior to it (no rows in the range).

For the LastName of Adams, there are 71 rows that equal this value (EQ_ROWS), there are 7 rows with values between Abbas and Adams (RANGE_ROWS – not including the rows equal to Abbas [1] or to Adams [71]), and those 7 rows are spread across 3 other distinct LastName values (DISTINCT_RANGE_ROWS). The average number of rows per name within that range is 2.32 (AVG_RANGE_ROWS).

What does this tell me? It tells me that any query using an equality predicate on a LastName value that sorts between Abbas and Adams (but matches neither) will get an estimate of 2.32 rows.
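If you want just the histogram (handy when scripting checks like this), DBCC SHOW_STATISTICS accepts a WITH clause to return a single result set:

-- Return only the histogram result set for this statistic.
DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName')
    WITH HISTOGRAM;
go
-- WITH STAT_HEADER or WITH DENSITY_VECTOR return the other two result sets on their own.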

Are statistics accurate?

Well… it depends. There are many factors that affect the accuracy of a statistic: the size of the table, the skew of the data, the volatility of the table – they all play a part. At the time of creation, statistics can be incredibly accurate. But, as data gets modified, they can become less accurate. Because of the nature of how they’re created and what they represent, there’s no way to keep them up to date as individual rows are modified; they’re only brought up to date by re-reading a large amount of the data. When an index is rebuilt, SQL Server updates the index’s statistic with the equivalent of a full scan of the data – statistics on an index are most accurate right after an index rebuild. However, an index reorganize does not update statistics at all because the entire table is not analyzed in one go (only pages with fragmentation are reorganized). So, if you find that your index maintenance scripts are regularly reorganizing indexes, then you’ll want to make sure that you also add in statistics maintenance. And, your statistics maintenance should include not only the statistics on indexes but any of the other statistics that SQL Server may have created.
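So, when you do need to update statistics manually, the syntax is straightforward. FULLSCAN gives you rebuild-equivalent accuracy but can be expensive on large tables, so a sampled update (the default) is often the pragmatic choice:

-- Update the statistic behind this index with a full scan:
UPDATE STATISTICS Person.Person IX_Person_LastName_FirstName_MiddleName WITH FULLSCAN;
go
-- Or update all statistics on the table using the default sampling rate:
UPDATE STATISTICS Person.Person;
go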

Statistics Maintenance

Now that you know statistics play a valuable role in optimization, it’s also important that this information be accurate. Just as Jonathan mentioned in his post yesterday (The Accidental DBA (Day 14 of 30): Index Maintenance), I also often recommend custom scripts. And, Ola Hallengren’s scripts even have an option to update only the statistics where data has changed. So, if you run the statistics maintenance after having run index maintenance (and no data has been modified since), then you will only update the statistics that actually need it. This is the most efficient way to update only the statistics that might need to be changed.
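If you want to see which statistics actually have pending changes, a sketch like the following works – note that sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later:

-- Statistics in the current database with row modifications since their last update.
SELECT  OBJECT_SCHEMA_NAME(s.[object_id]) AS [schema],
        OBJECT_NAME(s.[object_id])        AS [table],
        s.name                            AS stats_name,
        sp.last_updated,
        sp.[rows],
        sp.rows_sampled,
        sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
  AND sp.modification_counter > 0
ORDER BY sp.modification_counter DESC;
go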

Summary

In order for the query optimizer to do a good job, it has to have accurate and up-to-date statistics. My general recommendation for Accidental DBAs is to leave both the auto create statistics and auto update statistics options on (they are both on by default). However, I also recommend a specific maintenance routine that updates statistics manually – off hours – so that the default auto-update mechanism isn’t your primary method for keeping statistics current. Proper statistics maintenance is a critical task for helping the optimizer do its job, and while there are multiple options for automating it, custom scripts provide the best way of minimizing the work by performing the most efficient updates based only on data change.

This has only been a short introduction into statistics; there’s a lot more to them. If you’re interested in reading more about statistics check out the whitepaper, Statistics Used by the Query Optimizer in Microsoft SQL Server 2008. Then, check out our SQLskills blogs by category: Statistics.

Thanks for reading!
kt

*Sidenote: as an Accidental DBA, if you have individual tables reaching 100GB or more, well, you might want to talk to management about allowing you more time, more knowledge, and more administration/tuning of these larger tables. Maybe it’s time your environment considered a full-time DBA (maybe you?) with more dedicated time for managing/tuning your servers. It’s hard to wear so many hats in general, but if you’re starting to administer databases with large tables and large databases (1TB+), then maybe you’re not an Accidental DBA any more?