A big part of any DBA consultant’s job is helping clients’ workloads run faster. When I get called to help a client, the first thing I do is perform an audit of their instance to get a good baseline. I pull a lot of history from DMVs, log files and system configuration information.
A lot of effort has gone into making this information gathering as lightweight as possible, because the last thing I want to do is add stress to an already stressed system. Collecting all this data enables me to check the client’s configuration against commonly accepted best practices and to identify performance bottlenecks.
Over the past few years I have found that many clients make the same mistakes – failing to change the same settings or to implement the same maintenance. Here are a few items from my long list that I think are the most critical and the easiest to implement:
- Backups
- DBCC CHECKDB
- Memory settings
- Statistics
- Index maintenance
- MAXDOP and cost threshold for parallelism
- SQL Server Agent alerts
Backups

I always check to see what recovery model each database is using and the recent backup history of each database. Typical findings here include:
- No backup at all – no record of backups for the database
- Missing backups – database in FULL recovery model with no log backups
- No recent backups – last backup was weeks/months/years old
Issues with backups always go straight to the top of my priority list, as it’s critical that data recovery be possible. That means not only that the client has proper backups to meet SLAs, but also that copies of these backups are stored in a secondary location offsite.
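A quick way to spot these gaps yourself is to query the backup history that SQL Server keeps in msdb. The sketch below lists the most recent full backup per database; a NULL means no full backup is recorded on this instance (within msdb’s retention):

```sql
-- Most recent full backup per database, joined to the recovery model.
-- Databases with a NULL last_full_backup have no recorded full backup.
SELECT  d.name,
        d.recovery_model_desc,
        MAX(b.backup_finish_date) AS last_full_backup
FROM    sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
        ON b.database_name = d.name
       AND b.type = 'D'            -- D = full database backup
WHERE   d.name <> 'tempdb'
GROUP BY d.name, d.recovery_model_desc
ORDER BY last_full_backup;
```

A similar query filtering on `b.type = 'L'` will surface FULL-recovery databases that are missing log backups.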
DBCC CHECKDB

I also always look to see when the last DBCC CHECKDB was run against each database. What I find is very similar to backups:
- No DBCC CHECKDBs being performed at all
- Only some databases with DBCC CHECKDBs being performed
- No recent DBCC CHECKDB – last was performed months or years ago
- Worst of all, failed DBCC CHECKDBs – a job is scheduled, corruption has been found, and nothing is being done about it
Finding corruption in a client’s database is always frustrating, especially when it sits in a portion of the database where the resulting data loss will have a detrimental effect on the business.
I have worked with clients from numerous lines of business, most recently including Relativity customers, healthcare, financial and retail. Each had a similar story where some I/O-subsystem related issue caused corruption. In some of the cases the client was performing consistency checks and when the corruption was detected they reached out for help. In those cases I was able to restore from backups with no data loss.
In other cases the client was not so lucky and lost data. Corruption is nasty. Sometimes you get lucky and the corruption is in a nonclustered index, and rebuilding the index is the fix; other times the corruption is in the heap or clustered index, which is the physical table data, or it is schema corruption. For those, a restore is nearly always your best option.
I have had situations where the only fix was to script out the database and copy all the usable data out of the corrupt database into the newly created one. This was due to not having backups and having irreparable schema corruption. That is not fun. You can avoid these costly issues by running DBCC CHECKDB regularly against all your databases and reacting quickly when failures occur. You will want a backup retention policy that provides enough coverage for your consistency checks. You don’t want to be running consistency checks weekly and only retaining three days of backups.
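Running the check and finding out when the last clean one completed are both one-liners. A minimal sketch (the database name is a placeholder, and note that DBCC DBINFO is undocumented but widely used to read the last-known-good date from the boot page):

```sql
-- Full consistency check; on very large databases, WITH PHYSICAL_ONLY
-- is a common compromise when the full logical check won't fit the window.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- When did the last clean check complete? Look for the
-- dbi_dbccLastKnownGood field in the output below.
DBCC DBINFO (N'YourDatabase') WITH TABLERESULTS;
```

If dbi_dbccLastKnownGood is older than your backup retention window, you may no longer have a clean backup to restore from when corruption is found.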
Memory settings

A default install of Microsoft SQL Server will have the minimum server memory value set to 0 and the maximum server memory value set to 2147483647 MB. If these values go unchanged, you can find yourself in a situation where SQL Server starves the operating system, or, if the server is shared with another application (including SSIS, SSAS, or SSRS), where SQL Server itself is starved for resources.
It is highly advisable to set a minimum and maximum value for your SQL Server instance. My colleague Jonathan Kehayias has written an excellent blog post, How much memory does my SQL Server actually need, with the formula that many consultants use to establish the baseline for the maximum memory value. I recommend that my clients set the minimum value to 30% of the memory on the server, to avoid SQL Server being pressured too low by other executables.
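As an illustration only, here is what those settings look like for a hypothetical dedicated server with 64 GB of RAM, with the maximum leaving headroom for the OS along the lines Jonathan describes and the minimum at roughly 30% of server memory. The numbers are examples, not a prescription:

```sql
-- Example values for a 64 GB server; adjust for your own hardware.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'min server memory (MB)', 19660;  -- ~30% of 64 GB
EXEC sys.sp_configure N'max server memory (MB)', 57344;  -- 56 GB, leaving 8 GB for the OS
RECONFIGURE;
```

Both settings take effect without a restart, though lowering max server memory on a busy instance will trim the buffer pool immediately.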
Statistics

Finding out-of-date statistics is very common when performing audits on SQL Server instances. There is a misconception that having “auto update statistics” enabled keeps your statistics up to date. The issue with that setting is that statistics are not automatically updated until at least 20% of the rows, plus 500, have been modified. For large tables this can take a very long time!
Having up-to-date statistics is important because the Query Optimizer uses statistics to formulate execution plans. With out-of-date or invalid statistics, your system is very likely going to have inefficient plans.
To prevent having out-of-date statistics, I encourage my clients to utilize a third-party script to update statistics. Ola Hallengren has published a widely-used and highly-praised Maintenance Solution for SQL Server. Part of that process is the Index Optimize procedure that includes updating statistics. This process can update all statistics or just those that have had row modifications since the last statistics update. By updating the statistics that have had row modifications you decrease the overhead of having to update all statistics, which is really cool.
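A sketch of that statistics-only usage, assuming Ola’s Maintenance Solution is installed in the master database (its default location): the fragmentation actions are set to NULL so no index work happens, and only modified statistics are updated.

```sql
-- Update statistics only (no index rebuilds/reorganizes), and only
-- those statistics with row modifications since their last update.
EXECUTE master.dbo.IndexOptimize
    @Databases              = 'USER_DATABASES',
    @FragmentationLow       = NULL,
    @FragmentationMedium    = NULL,
    @FragmentationHigh      = NULL,
    @UpdateStatistics       = 'ALL',
    @OnlyModifiedStatistics = 'Y';
```

Scheduled nightly, this keeps statistics fresh on busy tables without paying the cost of updating every statistic on every run.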
Index maintenance

Performing index maintenance by removing fragmentation from your indexes is very important. According to a Microsoft study, index fragmentation can have a negative impact from 13-460% depending on the size of the environment and the level of fragmentation. See Microsoft SQL Server Index Defragmentation Best Practices for more details.
Fragmentation occurs through normal operations such as inserts, updates and deletes, which you can consider as normal wear and tear on your databases. To remediate this, proper index maintenance of rebuilding or reorganizing your indexes is needed. For this I again turn to Ola Hallengren, for his Index Optimize script. What makes me recommend Ola’s scripts over database maintenance plans is the ability to specify to rebuild or reorganize based on the level of fragmentation as well as only doing so for indexes that have more than X number of pages.
Rebuilding and reorganizing indexes comes with an I/O and logging cost so being able to use logic to determine which indexes to perform maintenance on helps decrease that cost.
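That threshold-based logic looks like this with Ola’s IndexOptimize procedure (again assuming it is installed in master; the 5%/30% levels and 1,000-page floor are common starting points, not fixed rules):

```sql
-- Reorganize indexes between 5% and 30% fragmentation, rebuild above
-- 30% (online where the edition allows it), and skip indexes with
-- fewer than 1,000 pages, where maintenance rarely pays off.
EXECUTE master.dbo.IndexOptimize
    @Databases           = 'USER_DATABASES',
    @FragmentationLow    = NULL,
    @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh   = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 = 5,
    @FragmentationLevel2 = 30,
    @MinNumberOfPages    = 1000;
```

The page-count floor is what the built-in maintenance plans lack: rebuilding a 50-page index costs I/O and log space for essentially no query benefit.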
MAXDOP and Cost Threshold for Parallelism
Max degree of parallelism and cost threshold for parallelism are usually left at the default values on the client servers I see. For MAXDOP the default value is zero, which means an ‘unlimited’ number of CPUs can be used to execute a parallel region of a query. Many years ago, when servers had lower processor core counts, this value was acceptable. In current times, with high core density and multi-socket servers, an unlimited number of CPUs for parallelism isn’t such a good idea. Microsoft has given guidance that if you have more than 8 CPUs on the server, a MAXDOP value of 8 should be used as a starting point; if you have fewer than 8, use a value from 0 to N, with N being the number of CPUs.
With the popularity of virtualizing SQL Server, many of the virtual servers I have worked on have fewer than 8 virtual processors and a MAXDOP of 0. In this case the default value is acceptable; however, I still tell my clients that if they later scale the server to more than 8 processors, they should consider setting the MAXDOP value. You can read about the Microsoft recommendations here: Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server.
Cost threshold for parallelism has a default value of 5. The history of this number goes way back to the early days of SQL Server and the workstation that testing was performed on. Fast forward to modern times and the advancements in processors, memory, disk and everything else related to computing, the cost estimation of 5 is really outdated. Plenty of testing has shown that increasing the number from 5 to a higher value will keep shorter-running queries from having a parallel plan. SQLskills recommends that our clients start with a value between 25 and 50, monitor the plan cache, and adjust from there. See this great blog post from Jonathan for more details: Tuning ‘cost threshold for parallelism’ from the Plan Cache.
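Both settings are a single sp_configure call each. A sketch for a hypothetical 16-core server, using MAXDOP 8 per the guidance above and a cost threshold of 50 as the starting point to tune from the plan cache:

```sql
-- Example starting points for a 16-core server; monitor and adjust.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism', 8;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```

Both are instance-wide settings and take effect immediately, so the plan cache can be examined before and after the change to see which queries gained or lost parallel plans.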
SQL Server Agent Alerts
Everyone should be using SQL Agent alerts unless they have a third-party application monitoring for the same error conditions. Configuring alerts is easy and free, and having them configured will give you critical information when your servers are having problems. Who doesn’t want to be proactive, right? Unfortunately, this is another common gap I find at client sites.
I recently wrote an article titled SQL Server Agent Alerts on SQL Sentry’s cool sqlperformance.com website, providing step-by-step instructions on how to create alerts for severity 19-25 errors and error 825. Enabling these alerts is easy: enable database mail, create a mail operator and then create the alerts. This can be accomplished using the GUI or with T-SQL. I encourage my clients to script out this process using T-SQL and make it part of their standard server build. Who doesn’t like free and easy?
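A minimal sketch of the T-SQL side, covering severity 19 and error 825 and notifying an operator named ‘DBA Team’ (a hypothetical name – create it first with sp_add_operator, and repeat the severity alert for 20 through 25):

```sql
-- Alert on severity 19 errors, notifying the operator by e-mail.
EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 019',
    @message_id = 0,
    @severity = 19,
    @include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 019',
    @operator_name = N'DBA Team',
    @notification_method = 1;   -- 1 = e-mail

-- Alert on error 825 (read-retry: an I/O succeeded only after retries,
-- an early warning sign of storage problems).
EXEC msdb.dbo.sp_add_alert
    @name = N'Error 825',
    @message_id = 825,
    @severity = 0,
    @include_event_description_in = 1;
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Error 825',
    @operator_name = N'DBA Team',
    @notification_method = 1;
```

Wrapped in a loop over severities 19-25, this is exactly the kind of script that belongs in a standard server build.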
As you can see, there are lots of settings that should be modified from the defaults for an installation of SQL Server. This list is not comprehensive and there are many more things I plan to blog about when it comes to not using defaults, so stay tuned for more posts!