SQLskills SQL101: The Importance of Maintaining SQL Server

SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

When I look at many SQL Server instances in the wild, I still see a large percentage of instances that are running extremely old builds of SQL Server for whatever major version of SQL Server is installed. This is despite years of cajoling and campaigning by myself and many others (such as Aaron Bertrand), and an official guidance change by Microsoft (where they now recommend ongoing, proactive installation of Service Packs and Cumulative Updates as they become available).

Microsoft has a helpful KB article for all versions of SQL Server that explains how to find and download the latest build of SQL Server for each major version:

Where to find information about the latest SQL Server builds
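Before you can compare your instance against the latest available build, you need to know exactly what build you are running. One quick way to check is with SERVERPROPERTY, as in this sketch (note that ProductUpdateLevel returns NULL on older builds that predate that property):

```sql
-- Check which build, service pack, and CU level you are currently running
SELECT SERVERPROPERTY('ProductVersion')     AS [Build Number],
       SERVERPROPERTY('ProductLevel')       AS [Service Pack Level],
       SERVERPROPERTY('ProductUpdateLevel') AS [CU Level],  -- NULL on older builds
       SERVERPROPERTY('Edition')            AS [Edition];
```

You can then look up the build number in the KB article above to see how far behind you are.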

Here is my commentary on where you should try to be for each major recent version of SQL Server:


SQL Server 2017

SQL Server 2017 and newer will use the “Modern Servicing Model”, which does away with Service Packs. Instead, Microsoft will release Cumulative Updates (CU) using a new schedule of one every month for the first year after release, and then one every quarter for the next four years after that.

Not only does Microsoft correct product defects in CUs, they also very frequently release new features and other product improvements in CUs. Given that, you should really try to be on the latest CU as soon as you are able to properly test and deploy it.

SQL Server 2017 Build Versions

Performance and Stability Fixes in SQL Server 2017 CU Builds

Reasons to Upgrade to SQL Server 2017


SQL Server 2016

SQL Server 2016 and older use the older “incremental servicing model”, where each new Service Pack is a new baseline (or branch) that has its own Cumulative Updates that are released every eight weeks. Microsoft corrects product defects in both Service Packs and in CUs, and they also very frequently release new features and other product improvements in both CUs and Service Packs.

As a special bonus, Microsoft has also gotten into the very welcome habit of actually backporting some features and improvements from newer versions of SQL Server into Service Packs for older versions of SQL Server. The latest example of this was SQL Server 2016 Service Pack 2 which has a number of improvements backported from SQL Server 2017.

SQL Server 2016 Build Versions

Performance and Stability Related Fixes in Post-SQL Server 2016 SP1 Builds

Performance and Stability Related Fixes in Post-SQL Server 2016 SP2 Builds

SQL Server 2016 Service Pack 2 Release Notes

SQL Server 2014

SQL Server 2014 will fall out of Mainstream Support from Microsoft on July 9, 2019. If you are running SQL Server 2014, you really should be on at least SQL Server 2014 SP2 (which got many improvements backported from SQL Server 2016), and ideally, you should be on the latest SP2 Cumulative Update. You should also be on the lookout for SQL Server 2014 SP3 which is due to be released sometime in 2018, which is very likely to have even more backported improvements.

If you are on SQL Server 2014 or SQL Server 2012, Microsoft has a very useful KB article that covers recommended updates and configuration options for high performance workloads. A number of these configuration options are already included if you are on the latest SP or newer for either SQL Server 2012 or SQL Server 2014.

SQL Server 2014 Build Versions

Performance and Stability Related Fixes in Post-SQL Server 2014 SP2 Builds

Hidden Performance and Manageability Improvements in SQL Server 2012/2014

SQL Server 2014 Service Pack 2 is now Available !!!

SQL Server 2012

SQL Server 2012 fell out of Mainstream Support from Microsoft on July 11, 2017. If you are running SQL Server 2012, you really should be on SQL Server 2012 SP4, ideally with the Spectre/Meltdown security update applied on top of SP4. Similar to SQL Server 2014 SP2, SQL Server 2012 SP4 also included a number of product improvements that were backported from SQL Server 2016.

SQL Server 2012 SP3 build versions

Performance and Stability Related Fixes in Post-SQL Server 2012 SP3 Builds

SQL Server 2012 Service Pack 4 (SP4) Released!

So just to recap, here are my recommendations by major version:

SQL Server 2017: Latest CU as soon as you can test and deploy

SQL Server 2016: Latest SP and CU as soon as you can test and deploy. Try to at least be on SQL Server 2016 SP2.

SQL Server 2014: Latest SP and CU as soon as you can test and deploy. Try to at least be on SQL Server 2014 SP2 (and SP3 when it is released).

SQL Server 2012: SP4 plus the security hotfix for Spectre/Meltdown.





SQLskills SQL101: How You Can Make Your Database Backups More Reliable


Since my colleague Paul Randal wrote DBCC CHECKDB while he was on the SQL Server Product team at Microsoft, he is an acknowledged expert on SQL Server database corruption and repair techniques. Because of this well-earned reputation, we typically get multiple e-mails each week asking for Paul’s advice and assistance dealing with database corruption and repair issues.

A typical pattern for these e-mails is that a production SQL Server database has become suspect, and running DBCC CHECKDB fails with some specific series of errors. Depending on exactly what errors are being returned from DBCC CHECKDB, it may be a situation where DBCC CHECKDB cannot do anything to resolve the corruption. In some cases, Paul can go in and do some manual repair work (at his regular consulting rate) to help resolve the issue, but in some cases, even Paul cannot fix the corruption (or he is not immediately available to do any work).

This leaves the last line of defense being restoring from your last set of known, good database backups. Unfortunately, in many cases, it turns out that there are no good database backups available that can actually be restored. If this happens, it is likely to be resume/CV updating time for the DBA, and possibly even a catastrophic outcome for the existence of your entire organization. So what can you do to minimize the chance of this happening to you or your organization?

Here are a few steps that you can take:


Keep your main system BIOS and all storage-related firmware and drivers up to date

One of the leading causes of database corruption (and backup corruption) is problems with your storage subsystem. These are often caused by out-of-date versions of your main system BIOS, storage firmware, or storage drivers. The server and component vendors don’t typically go to the trouble of issuing these types of updates unless they are correcting significant issues.

When these types of updates are available, they are often labeled as critical or urgent updates. Reading the release notes for these updates can often give you more information about the issue and the fix for the issue. As a DBA, you want to make sure someone (perhaps you) is monitoring this situation for your database servers.


Use SQL Server Agent Alerts to detect important errors on your SQL Server instance

Many novice DBAs have never even heard of SQL Server Agent Alerts. In a nutshell, they can be used to more quickly detect and possibly react to some types of hardware and software issues and errors that may happen on a SQL Server instance (or its underlying hardware and storage).

Normally, these types of errors will just get logged to the SQL Server Error Log, where they might not be noticed in a timely manner. Fortunately, I have a T-SQL script that can create a set of SQL Server Agent Alerts for many common issues. I also have a blog post with more details here.
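As a minimal sketch of what such a script does, here is how you might create one alert for severity 24 (fatal hardware) errors using msdb stored procedures. The alert and operator names here are placeholders, not the names my full script uses:

```sql
-- Create a SQL Server Agent Alert that fires on any severity 24 error
EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 024 - Fatal Error: Hardware Error',  -- placeholder name
    @severity = 24,
    @enabled = 1,
    @delay_between_responses = 900,    -- at most one notification every 15 minutes
    @include_event_description_in = 1; -- include the error text in the notification

-- Attach a notification target (the operator must already exist):
-- EXEC msdb.dbo.sp_add_notification
--     @alert_name = N'Severity 024 - Fatal Error: Hardware Error',
--     @operator_name = N'DBA Team',       -- placeholder operator
--     @notification_method = 1;           -- 1 = e-mail
```

A complete alert set would repeat this pattern for severities 16 through 25, plus key error numbers such as 823, 824, and 825.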


Make sure all of your databases are using CHECKSUM for their Page_Verify option

CHECKSUM is the default page_verify setting for new databases since SQL Server 2005, but you might have older databases that have been upgraded over the years where the page_verify setting was never changed. You also might have a situation where someone has purposely switched the page_verify setting to TORN_PAGE or NONE for some strange reason.

When CHECKSUM is enabled for the PAGE_VERIFY database option, the SQL Server Database Engine calculates a checksum over the contents of the whole page, and stores the value in the page header when a page is written to disk. When the page is read from disk, the checksum is recomputed and compared to the checksum value that is stored in the page header. I previously wrote about this issue here.
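Checking and fixing this setting is quick. This sketch finds any databases not using CHECKSUM and shows the ALTER DATABASE command to change one (the database name is a placeholder):

```sql
-- Find any databases that are NOT using CHECKSUM page verification
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> N'CHECKSUM';

-- Switch a database to CHECKSUM
-- Note: only pages written to disk AFTER the change get a checksum
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;
```

Keep in mind that changing the setting does not retroactively add checksums to existing pages; they are added as pages are modified and written back to disk.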


Make sure you are using the CHECKSUM option with your database backups

You can (and should) add the CHECKSUM option whenever you run any type of database backup. Since SQL Server 2014, you have had the ability to set an instance-level setting (with sp_configure) to add this option to backup commands by default, just in case someone (or a 3rd party backup solution) does not add the option in the actual backup command. With older versions of SQL Server, you can also get the same effect by adding Trace Flag 3023 as a start-up trace flag. You can also enable/disable TF 3023 dynamically.

Adding the CHECKSUM syntax to the backup command forces SQL Server to verify any existing page checksums as it reads pages for the backup, and it calculates a checksum over the entire backup. Adding the CHECKSUM option is not a replacement for actually restoring a database backup to see if it is good or not, but it is a good intermediate step in the process.
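Both techniques look like this in practice (database name and file path are placeholders):

```sql
-- Take a full backup with checksum verification of the pages and the backup stream
BACKUP DATABASE [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_Full.bak'
WITH CHECKSUM, INIT;

-- SQL Server 2014 and newer: make CHECKSUM the instance-wide default
-- for all backups, even ones that omit the option
EXEC sys.sp_configure N'backup checksum default', 1;
RECONFIGURE;
```

With the instance-level default enabled, any backup that does not explicitly specify NO_CHECKSUM will get checksum protection automatically.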


Actually restore your database backups on a regular basis to verify that they are good

This is the only way to be absolutely sure that your database backups are good. These other steps will increase the chances that your database backups are good, but an actual database restore is the acid test. You should be doing this on a regular basis, in an automated fashion.

Microsoft has some foundational guidance about backup and restore operations here. Paul Randal has a Pluralsight course called SQL Server: Understanding and Performing Backups.
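An automated restore test might look something like this sketch. The file paths and logical file names are placeholders that you would need to adjust for your environment; RESTORE VERIFYONLY alone is not a full test, but it is a useful quick first check:

```sql
-- Quick first check: verify the backup checksums without restoring
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\YourDatabase_Full.bak'
WITH CHECKSUM;

-- The real test: restore the backup to a scratch database
RESTORE DATABASE [YourDatabase_RestoreTest]
FROM DISK = N'D:\Backups\YourDatabase_Full.bak'
WITH MOVE N'YourDatabase'     TO N'D:\RestoreTest\YourDatabase.mdf',
     MOVE N'YourDatabase_log' TO N'D:\RestoreTest\YourDatabase_log.ldf',
     CHECKSUM, RECOVERY;

-- Then confirm the restored copy is corruption-free
DBCC CHECKDB (N'YourDatabase_RestoreTest') WITH NO_INFOMSGS;
```

Running DBCC CHECKDB against the restored copy also offloads your consistency checking from the production instance, which is a nice side benefit.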


The whole subject of avoiding database corruption and having an effective database backup and restore strategy to meet your RPO and RTO goals is far more extensive than I want to cover in a single SQL101 blog post. Hopefully the information in this post has been a good starting point.






SQL101: Avoiding Mistakes on a Production Database Server

As Kimberly blogged about earlier this year, SQLskills has an ongoing initiative to blog about basic topics, which we’re calling SQL101. We’re all blogging about things that we often see done incorrectly, technologies used the wrong way, or where there are many misunderstandings that lead to serious problems. If you want to find all of our SQLskills SQL101 blog posts, check out SQLskills.com/help/SQL101.

One reason that it is relatively difficult to get your first job as a DBA (compared to other positions, such as a developer) is that it is very easy for a DBA with Production access to cause an enormous amount of havoc with a single momentary mistake.

As a Developer, many of your most common mistakes are only seen by yourself. If you write some code with a syntax error that doesn’t compile, or you write some code that fails your unit tests, usually nobody sees those problems but you, and you have the opportunity to fix your mistakes before you check in your code, with no one being any the wiser.

A DBA doing something like running an UPDATE or DELETE statement without a WHERE clause, running a query against a Production instance database when you thought you were running it against a Development instance database, or making a schema change in Production that is a size of data operation (that locks up a table for a long period) are just a few examples of common DBA mistakes that can have huge consequences for an organization.

A split-second, careless DBA mistake can cause a long outage that can be difficult or even impossible to recover from. In SQL Server, Ctrl-Z (the undo action) does not work, so you need to be detail-oriented and careful as a good DBA. As the old saying goes: “measure twice and cut once”.

Here are a few basic tips that can help you avoid some of these common mistakes:


Using Color-Coded Connections in SSMS

SQL Server Management Studio (SSMS) has long had the ability to set a custom color as a connection property for individual connections to an instance of SQL Server. This option is available in legacy versions of SSMS and in the latest 17.4 version of SSMS. You can get even more robust connection coloring capability with third-party tools such as SSMS Tools Pack.

The idea here is to set specific colors, such as red, yellow, or green for specific types of database instances to help remind you when you are connected to a Production instance rather than a non-Production instance. It is fairly common to use red for a Production instance. This can be helpful if you don’t have red-green color blindness, which affects about 7-10% of men, but is much less common among women.

Figure 1 shows how you can check the “Use custom color” checkbox, and then select the color you want to use for that connection. After that, as long as you use the exact same connection credentials for that instance from your copy of SSMS, you should get the color that you set when you open a connection to that instance.

I would not bet my job on the color always being accurate, because depending on exactly how you open a connection to the instance, you may not always get the custom color that you set for the connection. Still, it is an extra piece of added insurance.



Figure 1: Setting a custom color for a connection


Figure 2 shows a red bar at the bottom of the query window (which is the default position for the bar) after setting a custom connection color. This would help warn me that I was connected to a Production instance, so I need to be especially careful before doing anything.



Figure 2: Query window using red for the connection


Double-Checking Your Connection Information Before Running a Query

Something you should always do before running any query is to take a second to glance down to the bottom right of SSMS Query window to verify your current connection information. It will show the name of the instance you are connected to, your logon information (including the SPID number), and the name of the database you are connected to.

Taking the time to always verify that you are connected to the database and instance that you think you are BEFORE running a query will save you from making many common, costly mistakes.


Wrap Queries in an Explicit Transaction

One common safety measure is to wrap your queries (especially potentially dangerous ones that update or delete data) in an explicit transaction as you see in Figure 3. You open an explicit transaction with a BEGIN TRAN statement, then run just your query, without the COMMIT TRAN statement. If the query does what you expect (which the xx rows affected message can often quickly confirm), then you commit the explicit transaction by executing the COMMIT TRAN statement.

If it turns out that you just made a horrible mistake (like I did in the example in Figure 3) by omitting the WHERE clause, you would execute the ROLLBACK TRAN statement to rollback your explicit transaction (which could take a while to complete).
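The pattern looks like this (the table and column names here are made up for illustration):

```sql
BEGIN TRAN;

-- Run ONLY the statement, then inspect the "xx rows affected" message
UPDATE dbo.Customers
SET    IsActive = 0
WHERE  LastOrderDate < '20150101';  -- hypothetical predicate

-- If the rows-affected count matches what you expected:
-- COMMIT TRAN;

-- If it does not (for example, you forgot the WHERE clause):
-- ROLLBACK TRAN;
```

One caution: while the transaction is open, you are holding locks on the affected rows, so do not wander off to lunch between the UPDATE and the COMMIT or ROLLBACK.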



Figure 3: Using an explicit transaction as a safety measure


Test your Update/Delete Queries as Select Queries Before You Run Them

Another common safety measure is to write and run a test version of any query that is designed to change data, where you simply SELECT the rows that you are planning on changing before you actually try to change them with an UPDATE or DELETE statement. You can often just have the query count the number of rows that come back from your test SELECT statement, but you might need or want to browse the data that comes back to be 100% sure that you don’t have a logic error in your query that would end up deleting or updating the wrong result set.
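For example (again, with hypothetical table and column names), you might preview a DELETE like this:

```sql
-- Preview: how many rows would the DELETE remove?
SELECT COUNT(*) AS RowsThatWouldBeDeleted
FROM   dbo.Orders
WHERE  OrderDate < '20100101';  -- hypothetical predicate

-- Or browse the actual rows to confirm the logic is right
SELECT *
FROM   dbo.Orders
WHERE  OrderDate < '20100101';

-- Only after confirming the result set, run the real statement,
-- reusing the exact same WHERE clause:
-- DELETE FROM dbo.Orders WHERE OrderDate < '20100101';
```

Reusing the exact same WHERE clause between the test SELECT and the real statement is the key: retyping it is how logic errors sneak back in.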


These are just a few of the most common measures for avoiding common DBA mistakes. The most important step is to always be detail-oriented and very careful when you are making potentially dangerous changes in Production, which is easier said than done. If you do make a big mistake, don’t panic, and don’t try to cover it up. Taking a little time to think about what you did, and the best way to quickly and correctly fix the problem is always the best course of action.