When Updating Statistics is too Expensive

Hopefully by now, anyone who works with SQL Server professionally is aware of how important statistics are. If you aren't sure why, or you need a refresher, check out Kimberly's blog post "Why are Statistics so Important". Here's a hint: the query optimizer uses statistics to help make decisions when creating an execution plan, and out-of-date statistics can mean a much less optimal plan.

Erin Stellato wrote an excellent blog post, "Updating Statistics with Ola Hallengren's Script," where she covers updating statistics with database maintenance plans, T-SQL, and Ola Hallengren's Index Optimize script. In that article Erin discusses various options for updating statistics and how only updating statistics that have had row modifications is a less invasive approach. She also explains how, in SQL Server 2008 R2 SP2 and SQL Server 2012 SP1, Microsoft introduced sys.dm_db_stats_properties, which tracks the number of modifications for each statistic.

This dynamic management function allows users to build logic into their processes so statistics are only updated after a specific percentage of change has occurred, which can further reduce the overhead of updating statistics.
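
If your build includes it, a quick way to see this information is to cross apply sys.dm_db_stats_properties against sys.stats. A minimal sketch (the column aliases are my own):

-- Show how much each statistic has changed since its last update.
SELECT
    SchemaName      = SCHEMA_NAME(o.schema_id),
    TableName       = o.name,
    StatisticName   = s.name,
    sp.last_updated,
    sp.rows,
    sp.modification_counter,
    PercentModified = CAST(100.0 * sp.modification_counter / NULLIF(sp.rows, 0) AS decimal(10, 2))
FROM sys.objects AS o
INNER JOIN sys.stats AS s
    ON s.object_id = o.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE o.is_ms_shipped = 0
ORDER BY PercentModified DESC;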

I’ve been using Ola Hallengren’s Index Optimize procedure for over a decade, and I recommend his process to my clients. The Index Optimize procedure has logic built in to deal with index fragmentation based on the percentage of fragmentation: ignore minimal fragmentation, reorganize indexes that aren’t heavily fragmented, and rebuild indexes whose fragmentation is over a certain threshold. That is a better process than blindly reorganizing or rebuilding all indexes regardless of their fragmentation level.

Over the years, I’ve found that many clients as well as DBAs aren’t aware that a reorganize does not update statistics, whereas an index rebuild does. If you are using logic to reorganize and rebuild without a separate statistics update process, your statistics may be slowly aging and could become problematic. Fortunately, Ola’s Index Optimize procedure allows for passing parameters to update statistics. These include:

  • @UpdateStatistics
  • @OnlyModifiedStatistics
  • @StatisticsModificationLevel

Typically, I’ve only had to configure @UpdateStatistics = 'ALL' and @OnlyModifiedStatistics = 'Y'. This updates all statistics that have had any row modifications. This has worked for me, my previous employer, and my clients for many years.
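
For reference, a hedged example of what such an IndexOptimize call might look like. The @Databases value and fragmentation thresholds are illustrative, not recommendations; see Ola's documentation at ola.hallengren.com for the full parameter list.

-- Illustrative IndexOptimize call: fragmentation-based reorganize/rebuild plus
-- a statistics update for any statistic with at least one row modification.
EXECUTE dbo.IndexOptimize
    @Databases              = 'USER_DATABASES',
    @FragmentationLow       = NULL,
    @FragmentationMedium    = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh      = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1    = 5,
    @FragmentationLevel2    = 30,
    @UpdateStatistics       = 'ALL',
    @OnlyModifiedStatistics = 'Y';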

Recently I encountered an issue with a very large database containing numerous tables with hundreds of millions of rows each, as well as hundreds of other tables with a million or fewer rows. Every table had some level of data change per day, which was causing statistics to be updated nightly on nearly every table, even those with minimal change. This took extra time and generated additional IO that the customer needed to minimize. I updated the process, replacing @OnlyModifiedStatistics with @StatisticsModificationLevel = '5' for 5%. Now statistics would only be updated after a 5% data change, or so I thought.

When I made this change and reviewed what would now be updated, I was surprised to see several large tables listed even though their modification counters were well below the 5% threshold. It turns out that statistics will also be updated when the number of modified rows reaches a decreasing, dynamic threshold: SQRT(number of rows * 1000). For example, 5% of a table with 9,850,010 rows would be 492,500 rows, but the modification counter was only 134,017 rows. Plugging the row count into the formula, SQRT(9,850,010 * 1000) = 99,247, which is well below the 492,500 figure but also below the 134,017 modifications that had already occurred, so the statistic still qualified for an update. Is this a bad thing? Absolutely not; 99,247 is still much larger than 1. Using only @OnlyModifiedStatistics, this nearly 10-million-row table would have had its statistics updated after a single modification.
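
A quick way to sanity-check those numbers in T-SQL:

-- Compare the 5% threshold to the dynamic SQRT(rows * 1000) threshold
-- that @StatisticsModificationLevel also honors.
DECLARE @rows bigint = 9850010;

SELECT
    FivePercentThreshold = CAST(@rows * 0.05 AS bigint),         -- 492,500
    SqrtThreshold        = CAST(SQRT(@rows * 1000.0) AS bigint); -- 99,247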

@StatisticsModificationLevel is a nice tool to have at your disposal for those situations where you need to fine-tune your maintenance process.

Managing Virtual Log Files in Azure Managed Instance

Maintaining the number of virtual log files (VLFs) in a transaction log is a task routinely performed when analyzing SQL Server instances. Numerous blog posts have been dedicated to maintaining an efficient number of VLFs. Kimberly wrote about the impact of having too few or too many VLFs, and Paul wrote about how Microsoft changed the algorithm in SQL Server 2014 for how many VLFs are created when a log file grows. Way back in 2005, Kimberly shared 8 steps to better transaction log throughput, where she explained how to reset the VLFs in a transaction log by backing up the log, shrinking it, and then likely repeating the process several more times.

Since Azure Managed Instance manages the backups for you, are you still able to manage VLF fragmentation? Let’s find out.

I connected to my managed instance and created a test database. Next, I created a table with three columns, each a uniqueidentifier.

CREATE TABLE [dbo].[IDTABLE]
(
    [ID1] [uniqueidentifier] NULL,
    [ID2] [uniqueidentifier] NULL,
    [ID3] [uniqueidentifier] NULL
) ON [PRIMARY];

I modified the database log file to auto-grow by 1 MB instead of the default 16 MB and then ran DBCC LOGINFO() to confirm that I had only 4 VLFs.
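
A sketch of that setup, assuming a database named VLFTest with a log file whose logical name is VLFTest_log (substitute your own names):

-- Shrink the auto-grow increment to 1 MB so log growth creates many small VLFs.
ALTER DATABASE [VLFTest]
    MODIFY FILE (NAME = N'VLFTest_log', FILEGROWTH = 1MB);

-- DBCC LOGINFO returns one row per VLF, so the row count is the VLF count.
DBCC LOGINFO();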

I then inserted 150,000 rows, placing a NEWID() value into each of the three columns, to cause the database and log file to grow.

INSERT INTO [dbo].[IDTABLE]
VALUES
(NEWID(), NEWID(), NEWID())
GO 150000

I checked the VLF count again and now had 247 VLFs. While 247 VLFs may not produce any noticeable performance impact, that is a high count relative to the size of the log file, which in this exercise grew to approximately 250 MB.
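
As a side note, on SQL Server 2016 SP2 and later, and in Azure SQL Managed Instance, you can also count VLFs with the sys.dm_db_log_info DMF instead of DBCC LOGINFO:

-- One row per VLF for the current database.
SELECT VLFCount = COUNT(*)
FROM sys.dm_db_log_info(DB_ID());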

How many VLFs are too many? I’m personally not concerned until I start seeing 1,000+, unless the log file is small. In this case, with the log file size in MB being close to the number of VLFs, I would recommend resetting the VLFs and then manually growing the log file to 256 MB or possibly 512 MB. I would also recommend increasing the auto-grow value from the 1 MB I configured to 64 MB or 128 MB. Keep in mind, if the log had grown to 250 MB during normal operation, these would be decent values; if the log file were 20 GB, I would recommend larger values.

The known method to reset VLFs in SQL Server is to back up the log, shrink the log file, and then repeat a few more times if needed. In Azure Managed Instance, backups are handled for you. You can take COPY_ONLY backups, however that will not have the same effect.

So how can we reset VLFs? Since I know log backups are happening automatically, I decided to simply try shrinking the log file by running DBCC SHRINKFILE (2, 1).

The VLFs were reduced to 29.

I waited for a short period (5 or so minutes) to allow an additional log backup to occur and ran DBCC SHRINKFILE (2,1) again. This time the VLFs were reduced to 8.

At this point, I could have waited a little longer and tried shrinking again to reduce the count further, or I could manually grow the log file to make sure I have a good balance of VLFs relative to the size of the file.
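
Here is a sketch of that reset-and-regrow approach, using the same assumed database and log file names as before (the log file has file_id 2 in my test database):

-- Shrink the log to reset the VLFs; repeat after the next automatic log backup if needed.
DBCC SHRINKFILE (2, 1);

-- Manually grow the log back to a sensible size and set a larger auto-grow increment.
ALTER DATABASE [VLFTest]
    MODIFY FILE (NAME = N'VLFTest_log', SIZE = 256MB, FILEGROWTH = 64MB);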

For brand new databases, setting a proper initial size and auto-grow value can help minimize having too many VLFs. At the same time, if you have an idea of how big the log file should be, you can follow Kimberly’s guidance and grow the file in 4 GB or 8 GB increments to get a good balance of VLFs across your transaction log.

Pluralsight Courses

I’ve recently submitted my proposal for my 5th Pluralsight course and am anxiously waiting to hear back before I can begin working on it. This past week, Paul sent out the numbers to the team letting us know how many hours our courses were viewed during November. Being a data guy, I decided that after nearly 3.5 years of having courses available, I should crunch some numbers. I currently have 4 courses available.

My goal is to record at least two courses in 2020, three if possible. Creating the content, recording, and publishing a course is a lot of work. After crunching some numbers and realizing that my 4 courses have been viewed nearly 12,000 hours, I’m even more excited to record in 2020. I regularly get emails from viewers thanking me for the content and sharing how the courses have helped their careers.

You can follow me on Pluralsight to get notifications of new courses that I develop as well as see my entire list of courses. I am also on Twitter and share new course announcements there as well.

I also plan to create more YouTube videos that I can use as references for user group and conference sessions that I give. I generally make my slide decks available, however many times attendees would like to see the demos again or be able to show someone on their team the demo. I guess you can call this my 2020 resolution: generate more training videos, period!

Never stop learning, folks!

Capturing Throughput Usage in SQL Server

I recently posted an article at sqlperformance.com about the importance of selecting the proper size Azure VM because of the limits placed on throughput based on VM size. I was sharing this article during my “Introduction to Azure Infrastructure as a Service” session at SQLintersection when Milan Ristovic asked how best to calculate throughput. I told Milan and the audience how I do it and promised I would write a quick blog post about it. Thanks for the inspiration to write another post, Milan!

Most data professionals are familiar with collecting latency information using sys.dm_io_virtual_file_stats, or by using Paul Randal’s script from his post “Capturing IO latencies for a period of time,” which captures a snapshot of sys.dm_io_virtual_file_stats, waits for a set delay, captures another snapshot, and computes the differences. Capturing latency information lets you know the time delay between a request for data and the return of the data.

I’ve been using Paul’s script for over 5 years to determine disk latency. A few years ago I made a few changes to also calculate disk throughput. This is important to know for migrations to different storage arrays, and especially when moving to the cloud. If you simply benchmark your existing storage, that tells you what your storage subsystem is capable of. That is great information to have, however you also need to know what your workload is actually using. Just because your on-premises storage supports 16,000 IOPS and 1,000 MB/s of throughput doesn’t mean you are consuming that many resources; your workload may only be consuming 70 MB/s of throughput during peak times. You need a baseline to know what your actual needs are.

When I’m capturing latency and throughput data, I like to capture small increments. Paul’s script defaults to 30 minutes; I like 5-minute segments for a more granular view. What I’ve added to Paul’s script to capture throughput is to take num_of_bytes_written and divide by 1,048,576 (1024 * 1024) to calculate the MB value, and do the same for num_of_bytes_read. I then divide each by the number of seconds I am waiting between the two snapshots. In this case, since I am waiting 5 minutes, I use 300, for example: (num_of_bytes_written/1,048,576)/300 AS mb_per_sec_written.

I add my changes at the end of Paul’s script, before the final FROM statement, just after [mf].[physical_name].

Modifications to Paul’s filestats script
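
A minimal, self-contained sketch of the same snapshot-and-diff technique with only the throughput columns added (Paul's full script also computes the latency math; the column names below are my own, and the 300-second divisor matches the 5-minute capture window):

-- Capture a snapshot of cumulative IO counters, wait 5 minutes, then diff
-- against the current values to get MB read/written and MB/s per database file.
IF OBJECT_ID(N'tempdb..#snap') IS NOT NULL
    DROP TABLE #snap;

SELECT [database_id], [file_id], [num_of_bytes_read], [num_of_bytes_written]
INTO #snap
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:05:00';

SELECT
    [DatabaseName]       = DB_NAME(vfs.[database_id]),
    vfs.[file_id],
    [ReadMB]             = (vfs.[num_of_bytes_read]    - s.[num_of_bytes_read])    / 1048576.0,
    [WrittenMB]          = (vfs.[num_of_bytes_written] - s.[num_of_bytes_written]) / 1048576.0,
    [mb_per_sec_read]    = ((vfs.[num_of_bytes_read]    - s.[num_of_bytes_read])    / 1048576.0) / 300,
    [mb_per_sec_written] = ((vfs.[num_of_bytes_written] - s.[num_of_bytes_written]) / 1048576.0) / 300
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN #snap AS s
    ON s.[database_id] = vfs.[database_id]
   AND s.[file_id] = vfs.[file_id];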

With these changes I can easily see how many MB the workload is reading from and writing to disk during the capture time, as well as the MB/s to tell me the overall throughput.

I hope this helps you to baseline your existing SQL Server workload. For my customers, I’ve further modified this script to write the results to a table with a date and time stamp so I can better analyze the workload.

New Pluralsight course: SQL Server: Understanding Database Fundamentals (98-364)

On October 29th, 2019, Pluralsight published my latest course, SQL Server: Understanding Database Fundamentals (98-364). This makes four courses that I have done for Pluralsight. Here is the official course description:

Learn the fundamentals of designing, using, and maintaining a SQL Server database. This course is applicable to anyone preparing for the 98-364 exam: Understanding Database Fundamentals.

My goal for creating this course is to provide training for those who are just getting started with SQL Server, and as an added bonus, to help prepare individuals to pass the Microsoft certification exam 98-364.

The course starts with an overall introduction to SQL Server and the various versions and editions available. Next, I cover core database concepts that you’ll need to know when getting started with SQL Server. At this point, the viewer should have a solid understanding of what SQL Server is and how databases are used. I then cover how to create database objects and how to manipulate data, before shifting over to data storage to discuss normalization and constraints. I conclude the course by sharing and demonstrating how to administer a database.

Skip-2.0 Backdoor Malware – SQL Server

There was a flutter of headlines this week about a new vulnerability/risk affecting SQL Server 2012 and SQL Server 2014. The malware was reported to allow an attacker to connect using a “magic password”. Of course, the headlines made this sound really bad, and the image of thousands of DBAs rushing to patch SQL Server came to mind.

After reading over the many headlines, it quickly became clear that this threat isn’t as big a deal as it was made out to be. While it does target SQL Server 2012 and SQL Server 2014, in order for the malware to work, the attacker must already be an administrator on the server. If an attacker has already gotten to that point, then things are already really bad for you.

It is reported that a cyber-espionage group out of China called the Winnti Group is responsible.  As of now, there are no reports of this being used against an organization.

What should you be doing or how can you protect against this?

  • Stay current, patch your servers, both OS and SQL Server
  • Perform vulnerability scans to look for known issues (vulnerability assessment is available in SSMS and Azure) or use third-party tools
  • Audit your servers and environments for suspicious activities

Skip-2.0 is just a reminder to organizations to keep their eyes open. Everyone should be keeping up with patching and securing their environments. Since skip-2.0 can only target an already compromised server, the only thing DBAs can really do is ensure their systems are patched.

Azure SQL Database Serverless

A new compute tier has been made available that allows single databases to automatically scale based upon workload demand. Azure SQL Database serverless (in preview) scales compute for single databases up and down based upon workload and bills only for the amount of compute used per second. Serverless also allows databases to pause during inactive periods; while a database is paused, only storage is billed. Databases are automatically resumed when activity returns.

Customers select a compute autoscaling range and an auto-pause delay parameter. This is a price-performance-optimized tier for single databases that have intermittent, unpredictable usage patterns and can tolerate some delay in compute warm-up after idle periods. For databases with higher average usage, elastic pools are the better option.
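
As an illustration, one way to move an existing single database to the serverless tier from T-SQL is to assign it a serverless service objective. The database name and the Gen5 objective with a 1 vCore maximum below are placeholders; the auto-pause delay and minimum vCores are set through the portal, CLI, PowerShell, or REST API rather than T-SQL.

-- Hypothetical example: move an existing database to the serverless compute tier.
-- Serverless service objectives follow the GP_S_<hardware>_<max vCores> pattern.
ALTER DATABASE [MyDatabase]
    MODIFY (EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_S_Gen5_1');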

I see this feature as a great option for low utilized databases or for those with unpredictable workloads that need to be able to automatically scale when the workload demands it.

I’m looking forward to seeing how this feature matures and develops during preview. If you’ve played with the preview, share your experience in the comments.

SQL Database Instance Pools

A new (in-preview) resource in Azure SQL Database was just announced: instance pools, which provide a cost-efficient way to migrate smaller SQL Server instances to the cloud.

Many departmental SQL Servers are virtual and run at a smaller scale. It is not uncommon to find 2-4 vCPU SQL Servers running business-critical workloads. Many of these workloads contain multiple user databases, which makes them candidates for Azure SQL Managed Instance. Currently, the smallest vCore option for a managed instance is 4.

With the introduction of instance pools, a customer can pre-provision compute resources according to their total migration requirement. For example, if a customer needed 8 vCores, they could deploy one 4 vCore instance and two 2 vCore instances.

Prior to instance pools, a customer would have to consolidate smaller workloads into larger instances. This could be problematic for a number of reasons: in many cases, workloads were isolated due to security concerns, elevated privileges that a vendor required, business continuity requirements, or other factors. Now customers can keep the same level of isolation they’ve had on-premises with these smaller VMs.

I see this as a big win for customers with smaller workloads that have been wanting to migrate to Azure SQL Managed Instance. It essentially eliminates the concern about having to consolidate workloads for migration.

SQLintersection Spring 2019 Conference

I am very excited to be speaking at my ninth consecutive SQLintersection conference. The Spring show this year is at the Walt Disney World Swan Resort. I’m honored to be co-presenting two workshops with my good friend David Pless as well as presenting three sessions.

David and I start our week on Monday with a full day workshop on Performance Tuning and Optimization for Modern Workloads (SQL Server 2017+, Azure SQL Database, and Managed Instance).

Over the next three days I present sessions covering An Introduction to Azure Managed Instances, Getting Started with Azure Infrastructure as a Service, and Migration Strategies.

David and I end our week on Friday with an all day workshop on SQL Server Reporting Services and Power BI Reporting Solutions.

SQLintersection is one of my favorite conferences that focuses on the Microsoft Data Platform. The speakers and sponsors are all approachable and willing to talk to you about your issues and offer advice. As a speaker and attendee, I always learn something new and make new friendships and connections.

I hope to see you there.

What is Azure SQL Database Hyperscale?

Azure SQL Database has a new service tier called Hyperscale. Hyperscale is currently in public preview and offers the ability to scale past the 4 TB limit for Azure SQL Database. Hyperscale is only available in the vCore-based purchasing model.

Hyperscale offers customers a highly scalable storage and compute performance tier built on Azure architecture to scale out the storage and compute for an Azure SQL Database. By separating storage and compute, Hyperscale allows storage limits to scale well beyond what is available in the General Purpose and Business Critical service tiers.

You’ve probably already figured out that Hyperscale is primarily intended for customers who are using, or would like to use, Azure SQL Database but have massive storage requirements. Currently, Hyperscale has been tested with databases up to 100 TB. That’s correct: you can have up to a 100 TB Azure SQL Database, though only in preview right now. While Hyperscale is primarily optimized for OLTP workloads, it also supports hybrid and analytical workloads.

With Hyperscale offering databases up to 100 TB (this is what Microsoft has tested so far), backups could be problematic to take. Microsoft offers near-instantaneous database backups for Hyperscale by leveraging file snapshots stored in Azure Blob storage. This is done with no IO impact on compute, regardless of the size of the database. It also offers fast database restores; I’ve seen a 40+ TB restore that took minutes!

Hyperscale offers rapid scale out, meaning within the Azure Portal you can configure up to 4 read-only nodes for offloading your read workload; these can also be used as hot standbys. At the same time, you can scale up compute resources to handle heavy workloads, and this can be done in constant time. When you no longer need the scaled-up compute resources, scale back down. You can also expect higher overall performance with Hyperscale due to higher log throughput and much faster transaction commit times, no matter the size of your database.

While Hyperscale is in public preview, it is strongly recommended not to run any production workload on it yet. The reason is that currently, once you migrate a database to Hyperscale, you cannot move it back to the General Purpose or Business Critical tiers. For testing Hyperscale, you should make a copy of your production database and migrate the copy to the Hyperscale service tier.
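
A hedged sketch of that testing approach, run against the master database of the logical server (the database names and the 2 vCore Gen5 service objective are placeholders):

-- Copy the production database, then move only the copy to Hyperscale.
-- The copy is asynchronous; wait for it to complete before changing the service objective.
CREATE DATABASE [MyDatabase_HSTest] AS COPY OF [MyDatabase];
GO

ALTER DATABASE [MyDatabase_HSTest]
    MODIFY (EDITION = 'Hyperscale', SERVICE_OBJECTIVE = 'HS_Gen5_2');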