SQLintersection Fall 2017 – 4 weeks to go!

As we head towards our 10th SQLintersection in four weeks, we’re excited to say that it’s once again our most diverse, complete, and information-packed show yet!

One of the pieces of feedback we hear over and over is that attendees love SQLintersection because it’s a smaller, laid-back show, where you get to actually spend time talking with the presenters 1-1. That’s one of the reasons why we love the show so much; *we* get to spend time talking to attendees, rather than being mobbed by hundreds of people after a session ends. And we only pick presenters who we know personally, and who we know to be humble, approachable, and eager to help someone out.

We have 2 pre-con days at the show and with our post-con day, there are 9 full-day workshops from which to choose. We have 40 technology-focused (NOT marketing) sessions from which to choose, plus two SQL Server keynotes, multiple industry-wide keynotes by Microsoft executives, and the ever-lively closing Q&A that we record as a RunAs Radio podcast.

You’ll learn proven problem-solving techniques and technologies you can implement immediately. Our focus is on performance monitoring, troubleshooting, designing for scale and performance, the cloud, and new features in SQL Server 2014, 2016, and 2017. It’s time to determine your 2008 / 2008 R2 migration strategy – should you upgrade to 2016/2017 directly? This is the place to figure that out!

If you’re interested in how we got here – check out some of my past posts:

  1. SQLintersection: a new year, a new conference
  2. SQLintersection’s Fall Conference – It’s all about ROI!
  3. Fall SQLintersection is coming up soon and we can’t wait!
  4. SQLintersection Conference and SQLafterDark Evening Event – what a fantastic week in Vegas

And I recorded a Microsoft Channel 9 video where I discuss the Spring show – see here.

SQLafterDark

With minimal to no marketing filler, we’ve largely kept our conference focus on ROI and technical content (performance / troubleshooting / tales-from-the-trenches with best practices on how to fix them), but we’ve also added even more social events so that you really get time to intersect with the conference attendees and speakers. The SQL-specific, pub-quiz-style evening event SQLafterDark was wildly popular at our past shows, and it’s returning for Fall!


SQLintersection: Great Speakers!

Once again, we think a great show starts with great speakers and current / useful content. All of these speakers are industry experts who have worked in data / SQL for years (some can even boast decades), but all are still focused on consulting and working in the trenches. And, they’re good presenters! Not only will you hear useful content but you’ll do so in a way that’s digestible and applicable. Every speaker is either an MCM (Master), a SQL Server MVP, or a past/present Microsoft employee (or a combination of all three!). But, regardless of their official credentials, ALL are focused on providing the most ROI that’s possible in their session(s) and/or their workshops, and ALL have spoken for SQLintersection multiple times.

Check out this phenomenal list of speakers:

  • Aaron Bertrand – MVP, SentryOne
  • David Pless – MCM, Microsoft
  • Jes Borland – past-MVP, Microsoft
  • Jonathan Kehayias – MCM, MCM Instructor, MVP
  • Justin Randall – MVP, SentryOne
  • Kimberly L. Tripp – MCM Instructor, MVP, past Microsoft, SQLskills
  • Paul S. Randal – MCM Instructor, MVP, past Microsoft, SQLskills
  • Shep Sheppard – past Microsoft, Consultant
  • Stacia Varga – MVP, Consultant
  • Tim Chapman – MCM, Microsoft
  • Tim Radney – MVP, SQLskills

You can read everyone’s full bio on our speaker page here.

SQLintersection: When is it all happening?

The conference officially runs from Tuesday, October 31 through Thursday, November 2 with pre-conference and post-conference workshops that extend the show over a total of up to 6 full days. For the full conference, you’ll want to be there from Sunday, October 29 through Friday, November 3.

  • Sunday, October 29 – pre-con day. There are three workshops running:
    • Data Due Diligence – Developing a Strategy for BI, Analytics, and Beyond with Stacia Varga
    • Performance Troubleshooting Using Waits and Latches with Paul S. Randal
    • SQL Server 2014 and 2016 New Features and Capabilities with David Pless and Tim Chapman
  • Monday, October 30 – pre-con day. There are three workshops running:
    • Building a Modern Database Architecture with Azure with Jes Borland
    • Data Science: Introduction to Statistical Learning and Graphics with R and SQL Server with Shep Sheppard
    • Extended Events: WTF OR FTW! with Jonathan Kehayias
  • Tuesday, October 31 through Thursday, November 2 is the main conference. Conference sessions will run all day in multiple tracks:
    • Check out our sessions online here
    • Be sure to check out our cross-conference events and sessions
    • Get your pop-culture trivia and techie-SQL-trivia hat on and join us for SQLafterDark on Wednesday evening, November 1
  • Friday, November 3 is our final day with three post-conference workshops running:
    • Common SQL Server Mistakes and How to Correct Them with Tim Radney
    • SQL Server 2016 / 2017 and Power BI Reporting Solutions with David Pless
    • Very Large Tables: Optimizing Performance and Availability through Partitioning with Kimberly L. Tripp

SQLintersection: Why is it for you?

If you want practical information delivered by speakers who not only know the technologies but are also competent, consistently highly rated presenters – this is the show for you. You will understand the RIGHT features to use to troubleshoot and solve your performance and availability problems now!

Check us out: www.SQLintersection.com.

We hope to see you there!

PS – Use the discount code ‘SQLskills’ when you register and receive $50 off registration!

SQLskills SQL101: Partitioning

Continuing on our path to understanding the basics and core concepts, there’s a big topic that’s often greatly misunderstood, and that’s partitioning. I know I’m going to struggle to keep this one an introductory post, but I’ve decided there are a few critical questions to ask and a few very important things to consider before choosing a partitioning strategy / design / architecture. I’ll start with those key points!

Partitioning is Not About Performance

I’m often asked how partitioning can be used to improve this or that – and it’s usually query performance people are hoping to improve. I remember one case where someone had a 3 billion row table and one of their queries was taking 15 minutes to run. Their question was: how should they partition to improve query performance? I asked if I could look at the query and review their schema (secretly, that meant I wanted to see their indexes ;-)). Then I asked if they were open to trying something. I gave them an index to create and asked them to run their query again. Without ANY changes to their table’s structure, and with only one added index, their query ran in 9 seconds.
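To make that concrete (the actual table, query, and index from this story weren’t published, so the names below are purely illustrative), the fix was the classic one: a covering nonclustered index that matches the query’s predicates, rather than any partitioning change:

-- Hypothetical sketch only – the table, columns, and index are invented for illustration.
-- A range query like this against a multi-billion-row table:
SELECT [OrderID], [OrderDate], [TotalDue]
FROM [dbo].[Sales]
WHERE [CustomerID] = 42
  AND [OrderDate] >= '20170101';

-- ...is often fixed by a covering nonclustered index, with no partitioning at all:
CREATE NONCLUSTERED INDEX [SalesCustomerDateIndex]
ON [dbo].[Sales] ([CustomerID], [OrderDate])
INCLUDE ([OrderID], [TotalDue]);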

Partitioning, like indexing, isn’t all unicorns and rainbows… (cartoon by tomperwomper)

Now, don’t forget – indexes aren’t all unicorns and rainbows… indexes have a downside:

  • An index requires disk space / memory / space in your backups
  • An index adds overhead to your inserts / updates / deletes
  • An index needs to be maintained

So, you’d still need to determine if this is the right approach. But, the main point – partitioning really isn’t designed to give incredible gains to your queries. It’s meant to be better for data management and maintenance. However, some partitioning designs can lead to query performance benefits too.

Partitioning for Manageability / Maintenance

The primary reason to consider some form of partitioning is to give you options for dealing with large tables; I tend to start thinking about partitioning as a table approaches 100GB AND it’s not going away (but, even if some of it will regularly “go away,” then partitioning is definitely an option to help manage that regular archiving / deleting). I wrote a detailed post about this in response to an MSDN Webcast question I was asked about range-based deletes on a large table. It’s a very long post but it revolves around “how to make range-based deletes faster.” You can review that here. (NOTE: That post was written before we converted our blog to the new format and I’ve not yet gone through and manually converted all of my old posts… so, the formatting isn’t great. But, the content / concepts still apply!)

So the real reasons to consider partitioning are:

  • Your table is getting too large for your current system to manage but not all of that data is accessed regularly (access patterns are really the key). There isn’t a magic size for when this happens; it’s really relative to your hardware (I’ve seen tables of only 40GB or 50GB cause problems on systems that have only 32GB or 64GB of memory). No, you don’t always need to put your entire table in memory, but there are many operations that cause the table to end up there. In general, as your tables get larger and larger, many problems become more prominent as well. Often, a VLDB (Very Large Database) is considered 1TB or more. But, what I think is the more common problem is the size of your critical tables; when do they start causing you grief? What is a VLT (Very Large Table)? For me, as your table starts to really head toward 100GB – that’s when partitioning should be considered.
  • You have varying access patterns to the data:
    • Some data is recent, critical, and very active
    • A lot of data is not quite as recent, mostly only reads but with some modifications (often referred to as “read-mostly”)
    • The bulk of your data is historical and not accessed often but must be kept for a variety of reasons (including some large / infrequent analysis queries)

And that second point is the most common reason to consider partitioning… but, it’s the first point that’s probably more noticeable. :-)

What Feature(s) Should Be Used for Partitioning

As I’ve been discussing partitioning, I’ve been trying to generalize it as a concept rather than a feature (NOT tied directly to either SQL Server feature: partitioned tables [2005+] or partitioned views [7.0+]). Instead, I want you to think of it as a way of breaking down a table into smaller (more manageable) chunks. This is almost always a good thing. But, while it sounds great at first, each of the specific technologies has many considerations to weigh before choosing (especially depending on which version of SQL Server you’re working with).

Partitioned Views (PVs)

These have been available in SQL Server since version 7.0. They were very limited in SQL Server 7.0 in that they were query-only. As of SQL Server 2000, they allow modifications (through the PV and to the base tables) but with a few restrictions. There have been some bug fixes and improvements since their introduction, but their creation and uses are largely the same. And, even with SQL Server 2016 (and all of the improvements for Partitioned Tables), there are still scenarios where PVs make sense – sometimes as the partitioning mechanism by itself and sometimes combined with partitioned tables.
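As a minimal sketch (the table and column names here are invented for illustration), a partitioned view is just a view that unions smaller tables, each with a CHECK constraint on the partitioning column so the optimizer can eliminate the tables that can’t contain the requested rows:

CREATE TABLE [dbo].[Sales2016]
(
    [SalesID]   BIGINT       NOT NULL,
    [SalesDate] DATETIME2(0) NOT NULL
        CONSTRAINT [Sales2016YearCK]
            CHECK ([SalesDate] >= '20160101' AND [SalesDate] < '20170101'),
    CONSTRAINT [Sales2016PK] PRIMARY KEY CLUSTERED ([SalesID], [SalesDate])
);

CREATE TABLE [dbo].[Sales2017]
(
    [SalesID]   BIGINT       NOT NULL,
    [SalesDate] DATETIME2(0) NOT NULL
        CONSTRAINT [Sales2017YearCK]
            CHECK ([SalesDate] >= '20170101' AND [SalesDate] < '20180101'),
    CONSTRAINT [Sales2017PK] PRIMARY KEY CLUSTERED ([SalesID], [SalesDate])
);
GO

CREATE VIEW [dbo].[SalesPV]
AS
SELECT [SalesID], [SalesDate] FROM [dbo].[Sales2016]
UNION ALL
SELECT [SalesID], [SalesDate] FROM [dbo].[Sales2017];
GO

A query against [dbo].[SalesPV] with a predicate on [SalesDate] only touches the table(s) whose CHECK constraint covers the requested range.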

Partitioned Tables (PTs)

Partitioned tables were introduced in SQL Server 2005 and have had numerous improvements from release to release. This list is by no means exhaustive but some of the biggest improvements have been:

  • SQL Server 2008 introduced partition-aligned indexed views so that you could do fast switching in / out of partitioned tables even when the PT had an indexed view. And, SQL Server 2008 introduced partition-level lock escalation (however, some architectures [like what I recommend below] can naturally reduce the need for partition-level lock escalation).
  • SQL Server 2008 SP2 and SQL Server 2008 R2 SP1 added support for up to 15,000 partitions (personally, I’m not a fan of this one).
  • SQL Server 2012 allowed index rebuilds to be performed as online operations even if the table has LOB data. So, if you want to switch from a non-partitioned table to a partitioned table (while keeping the table online / available), you can do this even when the table has LOB columns.
  • SQL Server 2014 offered better partition-level management with online partition-level rebuilds and incremental stats. These features reduce what’s read for statistics updates and place the update threshold at the partition level rather than at the table level. And, they offer the option to rebuild just a partition while keeping the table and the partition online (see the sketch after this list). However, the query optimizer still uses table-level statistics for optimization (Erin wrote an article about this titled: Incremental Statistics are NOT used by the Query Optimizer and Joe Sack wrote an article about partition-level rebuilds titled: Exploring Partition-Level Online Index Operations in SQL Server 2014 CTP1.)
  • SQL Server 2016 further reduced the threshold for triggering statistics updates to a more dynamic threshold (the same as trace flag 2371), but only for databases with a compatibility level of 130 (or higher). See KB 2754171 for more information.
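Tying a few of these improvements together, here’s a minimal sketch (all names invented for illustration) of a date-partitioned table with incremental statistics and a partition-level online rebuild:

CREATE PARTITION FUNCTION [SalesByMonthPF] (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20170101', '20170201', '20170301');

CREATE PARTITION SCHEME [SalesByMonthPS]
AS PARTITION [SalesByMonthPF] ALL TO ([PRIMARY]);

CREATE TABLE [dbo].[SalesPT]
(
    [SalesID]   BIGINT       NOT NULL,
    [SalesDate] DATETIME2(0) NOT NULL,
    CONSTRAINT [SalesPTPK] PRIMARY KEY CLUSTERED ([SalesDate], [SalesID])
) ON [SalesByMonthPS] ([SalesDate]);

-- SQL Server 2014+: incremental (partition-level) statistics
CREATE STATISTICS [SalesPTDateStats] ON [dbo].[SalesPT] ([SalesDate])
WITH INCREMENTAL = ON;

-- SQL Server 2014+: rebuild a single partition online (ONLINE requires Enterprise Edition)
ALTER INDEX [SalesPTPK] ON [dbo].[SalesPT]
REBUILD PARTITION = 2 WITH (ONLINE = ON);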

However, even with these improvements, PTs still have limitations that smaller tables don’t have. And, often, the best way to deal with a very large table is to not have one. Don’t misunderstand – what I’m suggesting is to use PVs with multiple smaller tables (possibly even PTs) unioned together. The end result is a solution with few, if any, real restrictions (by physically breaking your table into smaller ones, you remove a lot of the limitations that exist with PTs). However, you don’t want all that many smaller tables either, as optimization becomes more complicated with larger numbers of tables. The key is to find a balance. Often, my general recommendation is to have a separate table per year of data (imagine “sales” data): for the older years, just leave each as a single, standalone table, and for the current and future years, use PTs to separate the “hot” data from the more stable data. By using standalone tables for the critical data you can do full table-level rebuilds online (in Enterprise Edition) and you can update statistics more frequently (and they’re more accurate on smaller tables). Then, as these standalone tables stabilize, you can switch them into the year-based PTs to further simplify their management moving forward.

Partitioning a VLT into smaller tables (some of which are PTs) can be a HIGHLY effective strategy for optimizing all aspects of table management, availability, and performance. But, it’s definitely a more complex and difficult architecture to get right…

This architecture is complex but highly effective when designed properly. Doing some early prototyping with all of the features you plan to leverage is key to your success.
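For example (a hedged sketch with invented names), switching a stabilized standalone table into the appropriate partition of a year-based partitioned table is a metadata-only operation – provided the standalone table has an identical structure, aligned indexes, and a CHECK constraint on the partitioning column that matches the target partition’s boundaries:

-- Hypothetical names; assumes [dbo].[SalesJan2017] matches partition 2 of [dbo].[Sales2017PT].
ALTER TABLE [dbo].[SalesJan2017]
SWITCH TO [dbo].[Sales2017PT] PARTITION 2;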

Key Points Before Implementing Partitioning

Partitioning is not really directly tied to performance, but indirectly it can be extremely beneficial. So, for a SQL101 post, the most important point is that partitioned views still have benefits; they should not be discounted only because they’re an older feature. Both partitioning strategies provide different benefits; the RIGHT solution requires understanding ALL of their pros and cons. You need to evaluate both PVs and PTs against your availability, manageability, and performance requirements – and, in the most likely case, use them together for the most gains.

Since it’s concepts only, I still feel like it’s a SQL101 post. But, getting this right is a 300-400 level prototyping task after quite a bit of design.

Thanks for reading!
Kimberly

SQLskills SQL101: Indexes on Foreign Keys

As many of you know, we started our SQLskills SQL101 series a couple of weeks ago… it’s been great fun for the whole team to go back through our most common questions / concerns and set the record straight, so to speak. We’ve still got a lot of things to discuss, and indexing is the source of many questions, discussions, and, unfortunately, misunderstandings.

I’m going to tie today’s post to a question I received recently: if I have a table that has 6 foreign key columns / references, should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference?

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

What is a Foreign Key Enforcing?

Imagine you have two tables: Employees and Departments. The Employees table has a column called DepartmentID and it represents the department of which that employee is a member. The department ID must be a valid department. So, to guarantee that the department ID exists and is valid, we create a foreign key to the DepartmentID column of the Departments table. When a row is inserted or updated in the Employees table, SQL Server will check to make sure that the value entered for DepartmentID is valid. This check is very inexpensive because the foreign key MUST reference a column which is unique (which is, in turn, enforced by a unique index).
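In code, that relationship looks like this (a minimal sketch; the column definitions are simplified for illustration):

CREATE TABLE [dbo].[Departments]
(
    [DepartmentID]   INT           NOT NULL,
    [DepartmentName] NVARCHAR(100) NOT NULL,
    CONSTRAINT [DepartmentsPK] PRIMARY KEY CLUSTERED ([DepartmentID])
);

CREATE TABLE [dbo].[Employees]
(
    [EmployeeID]   INT NOT NULL,
    [DepartmentID] INT NOT NULL,
    CONSTRAINT [EmployeesPK] PRIMARY KEY CLUSTERED ([EmployeeID]),
    CONSTRAINT [EmployeesDepartmentFK] FOREIGN KEY ([DepartmentID])
        REFERENCES [dbo].[Departments] ([DepartmentID])
);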

What Must Exist in Order to Create a Foreign Key Reference?

A foreign key must reference column(s) on the referenced table that have a unique index. That unique index can be created with a CREATE INDEX statement OR that index could have been created as part of a constraint (either a UNIQUE or PRIMARY KEY constraint). A foreign key can reference ANY column(s) that have a UNIQUE index; it does not have to have been created by a constraint. And, this can be useful during performance tuning. A UNIQUE index offers options that constraints do not. For example, a UNIQUE index can have included columns and filters. A foreign key CAN reference a UNIQUE index with included columns; however, it cannot reference a UNIQUE index with a filter (I wish it could).

A good example of this might occur during database tuning and specifically during index consolidation (something I do after query tuning and when I’m determining the best indexes for the database / for production). I often review existing indexes as well as any missing index recommendations, etc. Check out my SQLskills SQL101: Indexing Basics post for more information about these concepts.

Imagine I have the following:

Table: Employees, column: NationalID (for example, a Social Security Number)

Column NationalID: this is an alternate key for Employees, as their Primary Key is EmployeeID. Because it’s another column on which you will look up employees, and you want to make sure it’s unique, you decide to enforce it with a UNIQUE constraint. You may even reference it from other tables.

However, later, while doing database tuning, you decide that you need the following index:

CREATE INDEX [QueryTuningIndex]
ON [dbo].[Employees] ([NationalID])
INCLUDE ([LastName], [FirstName])

This index would be similar to, and redundant with, the existing constraint-based index on NationalID. But, you really want this new index to help performance (you’ve tested that this index is helpful to some frequently executed and important queries, so you’ve already decided that the benefits outweigh the costs).

And, this is where the excellent feature to reference a unique index comes in… instead of adding this new one and keeping the existing constraint, change the index to the following:

CREATE UNIQUE INDEX [QueryTuningIndex]
ON [dbo].[Employees] ([NationalID])
INCLUDE ([LastName], [FirstName])

The uniqueness is always enforced ONLY on the key portion of the index. So, this new index – even with included columns – still does this. The only bad news is that SQL Server has already associated the foreign key with the original constraint-based index, so you’ll still need to remove the foreign key to drop the constraint (I wish this weren’t true). But, you’ll still have data integrity handled by the new unique index – as long as you create the new index before you drop the foreign key and the original unique constraint. Having said that, there’s more bad news – because there will be a short timeframe where the foreign key does not exist, you must do this during off-hours when little-to-no activity is occurring. This will reduce the possibility of rows being inserted / updated that do not have a valid reference. You’ll certainly find out when you add the referential constraint again, as the default behavior of adding the foreign key is to verify that all rows have a referenced row. NOTE: there is a way to skip this checking but it is NOT recommended, as your constraint will be marked as untrusted. It’s ALWAYS PREFERRED to create a foreign key WITH CHECK. Here’s a script that will walk you through the entire example – leveraging the default behavior to recheck the data when the constraint is created. Be sure to run it in a test / junk database.
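The linked script isn’t reproduced here, but the sequence looks roughly like this (a hedged sketch; the referencing table and constraint names are invented for illustration):

-- 1) Create [QueryTuningIndex] as the UNIQUE index shown above, so uniqueness is never unenforced.

-- 2) Drop the foreign key(s) that reference the old constraint, then drop the constraint itself.
ALTER TABLE [dbo].[Payroll] DROP CONSTRAINT [PayrollEmployeeNationalIDFK];
ALTER TABLE [dbo].[Employees] DROP CONSTRAINT [EmployeesNationalIDUQ];

-- 3) Re-create the foreign key WITH CHECK (the default) so existing rows are verified and the
--    constraint is trusted; it can now be supported by the unique [QueryTuningIndex].
ALTER TABLE [dbo].[Payroll] WITH CHECK
ADD CONSTRAINT [PayrollEmployeeNationalIDFK]
    FOREIGN KEY ([NationalID]) REFERENCES [dbo].[Employees] ([NationalID]);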

Creating an Index on a Foreign Key Column

Foreign keys can reference any column(s) that have a UNIQUE index (regardless of whether it was created by a constraint).

Now that you know the options for the column being referenced, let’s consider what’s required for the referencing column. The column on which the foreign key is created will not have an index by default. I wrote about this in a prior post: When did SQL Server stop putting indexes on Foreign Key columns? The main point is that SQL Server has NEVER automatically created indexes on foreign key columns. But, many of us recommend that you do! (However, I also wish indexing were just that simple, because this might not be an index you keep forever…)

See, if EVERY foreign key column automatically had an index created for you, then SQL Server might end up requiring it to always be there. This would then remove the option of consolidating this index with others when you were later running through performance tuning techniques. So, while it’s generally a good idea to have an index on a foreign key column, it might not ALWAYS be a good idea to keep that narrow index as you add other (probably slightly wider) indexes.

However, initially, creating this index is what I call: Phase 1 of tuning for joins.

But, there are 3 phases of tuning for joins, and these phases all happen during query tuning, not database tuning. So, as you do deeper tuning, you might end up consolidating this foreign key index with one or more other indexes to reduce the overall number of indexes on your table.

Finally, the most important point (and this answers the original question): the index must be ONE per foreign key (with only the column(s) of that specific foreign key); you will create one index for each foreign key reference.
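As a quick sketch (the table and column names are invented for illustration – imagine an Orders table with several foreign key references), that means one narrow index per foreign key rather than a single composite index over all of them (a composite index would only help seeks on its leading column(s)):

CREATE NONCLUSTERED INDEX [OrdersCustomerFKIndex]
ON [dbo].[Orders] ([CustomerID]);

CREATE NONCLUSTERED INDEX [OrdersShipperFKIndex]
ON [dbo].[Orders] ([ShipperID]);

CREATE NONCLUSTERED INDEX [OrdersEmployeeFKIndex]
ON [dbo].[Orders] ([EmployeeID]);

-- ...and so on: one index per foreign key reference.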

Indexes on Foreign Keys Can Help Improve Performance

There are two ways in which these indexes can improve performance.

First, they can help the foreign key reference maintain integrity on the referenced table. Take the example of Employees and Departments. Not only must SQL Server check that a DepartmentID is valid when inserting / updating an Employees row, but SQL Server must also make sure that referential integrity is maintained when DepartmentIDs are removed from the Departments table. An index on the Employees table (on the DepartmentID column) can be used to quickly check whether any rows reference the DepartmentID being deleted from the Departments table. Without an index on DepartmentID in the Employees table, SQL Server would potentially have to scan the Employees table; this can be quite expensive.

Second, and this doesn’t always work, SQL Server may be able to use the index to help improve join performance. And, this is where I’ll cut the ideas a bit short, as the other phases of join tuning are too complex for this SQL101 post. So, while there are other strategies that can be used to tune joins when this doesn’t work, it’s still a fantastic starting point. In fact, I generally recommend indexing foreign keys as part of your very early / development phase for a database. But, again, these indexes might be consolidated later in favor of other indexes.

Summary

Indexing for performance has many steps and many strategies. I hope to keep uncovering these in our SQL101 series, but between this post and the Indexing Basics post, you’re well on your way to kicking off a better tuning strategy for your tables!

And, don’t forget to check out all of our SQL101 posts here!

Thanks for reading,
Kimberly