SQLskills SQL101: The SQL Server ERRORLOG

One of the most useful logs you can review when there’s a problem in SQL Server is the ERRORLOG.  It may not always be the answer to your problem, but it’s a good place to start.
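
You don't even need to open the Log File Viewer in Management Studio to look through it; you can read the ERRORLOG from a query window with T-SQL. Here's a minimal sketch using sp_readerrorlog (undocumented, but widely used); the search string in the second call is just an example:

-- Read the current ERRORLOG (0 = current file, 1 = SQL Server log rather than the Agent log)
EXEC sp_readerrorlog 0, 1;

-- Or filter for entries that contain a particular string
EXEC sp_readerrorlog 0, 1, N'deadlock';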

When you initially install SQL Server, it only keeps the most recent six (6) ERRORLOG files, in addition to the current, active one.  A new ERRORLOG file is generated when the instance restarts, or when you run sp_cycle_errorlog.  There are drawbacks to this default configuration.  If you do not regularly restart your instance (which is perfectly fine), then one ERRORLOG file could contain months, maybe even a year or more, of information.  That's a lot of entries to read through if you're looking for patterns or unusual errors.  In addition, if you happen to restart the instance multiple times in succession (three or four times, for example), you could potentially lose months of history.

The solution is to recycle the ERRORLOG on a regular basis (I like to do this weekly), and increase the number of files retained.  To recycle the ERRORLOG every week, set up an Agent job that calls sp_cycle_errorlog.  I’ve included code at the end of this post to create the Agent job and weekly schedule.

Next, increase the number of ERRORLOG files you keep.  You can do this through Management Studio: expand the instance, then Management, right-click on SQL Server Logs, and select Configure.  Enable the option Limit the number of error log files before they are recycled, then enter a value for Maximum number of error log files.  I like to keep 30 around, which usually equates to about six months of history, including a few unplanned restarts.

Configure SQL Server to keep 30 ERRORLOG files

You can also make this change with T-SQL:

USE [master];
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'NumErrorLogs', REG_DWORD, 30;
GO
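
If you want to verify the setting (or check it on an instance you didn't configure), you can read the value back with xp_instance_regread.  A quick sketch; note that this returns NULL if the value has never been changed, in which case the default of six files applies:

DECLARE @NumErrorLogs INT;
EXEC master.dbo.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    @NumErrorLogs OUTPUT;
SELECT @NumErrorLogs AS [NumErrorLogs];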

Checking the ERRORLOG configuration is always something we do as part of a health audit, and I'm always happy when I find systems with at least a few months' worth of files that are each less than a few MB in size (I think the largest ERRORLOG I've seen is 4GB…that one took a long time to open).  If this isn't something you've configured on your SQL Server instances yet, take a few minutes and knock it out.  You won't regret having this information when a problem comes up, or when you want to check whether a problem occurred a few months ago that no one realized at the time.
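
If you want to see how many ERRORLOG files an instance currently has, and how large they are, without browsing the file system, the undocumented xp_enumerrorlogs lists each file with its archive number, date, and size in bytes:

-- List the ERRORLOG files with their dates and sizes (handy during an audit)
EXEC master.dbo.xp_enumerrorlogs;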

If you’re interested in other posts in our SQLskills SQL101 series, check out SQLskills.com/help/SQL101.

Code to create a SQL Agent job to run sp_cycle_errorlog weekly (Sundays at 12:01 AM):

USE [msdb];
GO

/****** Object:  Job [SQLskills Cycle ERRORLOG Weekly] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

/****** Object:  JobCategory [Database Maintenance] ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name = N'Database Maintenance' AND category_class = 1)
BEGIN
    EXEC @ReturnCode = msdb.dbo.sp_add_category @class = N'JOB', @type = N'LOCAL', @name = N'Database Maintenance'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name = N'SQLskills Cycle ERRORLOG Weekly',
    @enabled = 1,
    @notify_level_eventlog = 0,
    @notify_level_email = 0,
    @notify_level_netsend = 0,
    @notify_level_page = 0,
    @delete_level = 0,
    @description = N'Cycle the ERRORLOG once a week.',
    @category_name = N'Database Maintenance',
    @owner_login_name = N'sa',
    @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object:  Step [Cycle ERRORLOG] ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id = @jobId, @step_name = N'Cycle ERRORLOG',
    @step_id = 1,
    @cmdexec_success_code = 0,
    @on_success_action = 1,
    @on_success_step_id = 0,
    @on_fail_action = 2,
    @on_fail_step_id = 0,
    @retry_attempts = 0,
    @retry_interval = 0,
    @os_run_priority = 0,
    @subsystem = N'TSQL',
    @command = N'EXEC sp_cycle_errorlog;',
    @database_name = N'msdb',
    @flags = 0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id = @jobId, @name = N'Weekly cycle of ERRORLOG',
    @enabled = 1,
    @freq_type = 8,                -- weekly
    @freq_interval = 1,            -- on Sunday
    @freq_subday_type = 1,         -- at the specified time
    @freq_subday_interval = 0,
    @freq_relative_interval = 0,
    @freq_recurrence_factor = 1,   -- every week
    @active_start_date = 20170301,
    @active_end_date = 99991231,
    @active_start_time = 100,      -- 12:01 AM
    @active_end_time = 235959,
    @schedule_uid = N'23a32e3e-c803-451f-b85a-b77d5b97ab3a'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Instant File Initialization: Easier to Enable in SQL Server 2016 (and some updated numbers)

The ability to have SQL Server data files skip zero initialization when they are created or grown has been available since SQL Server 2005.  By default, when you create a new data file in SQL Server, or extend the size of an existing one, zeroes are written to the file.  Depending on the size of the file or its growth, and the type of storage, this can take a while.  With Instant File Initialization (IFI), space is allocated for the data file but no zeroes are written.

Prior to SQL Server 2016, to enable this feature you had to edit the Local Security Policy to give the account that runs the SQL Server service the “Perform volume maintenance tasks” right (from Start | Run, type secpol.msc, then within the Local Security Policy expand Local Policies, then User Rights Assignment).  This was a task that DBAs had to perform separately from the SQL Server installation (or have a server admin do it for them), and if you did not make the change before installing SQL Server, it required restarting SQL Server after making the change for it to take effect.  This has changed with SQL Server 2016 setup: you can now select the option “Grant Perform Volume Maintenance Task privilege to SQL Server Database Engine Service” when you specify the service accounts, and the right is granted to the service account at that time.

Enable Instant File Initialization during SQL Server 2016 installation
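
If you're not sure whether IFI is actually in effect on an existing instance, newer builds expose it directly in a DMV; I believe the instant_file_initialization_enabled column was added to sys.dm_server_services in SQL Server 2016 SP1, so check your build before relying on it:

-- Returns Y/N per service; filter to the database engine service
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server (%';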

There is a potential security risk to using this feature, which Kimberly discusses in her Instant Initialization – What, Why and How? post.  The information presented in her post is still valid and worth the read, but Glenn and I did re-run some tests recently, just to get some current numbers to show the benefits of IFI.  We ran the same four tests that Kimberly ran way back in 2007 (!) on four different sets of storage: two sets of 15K disks (one in a RAID10 array, the other in a RAID1 array) and two sets of flash storage (FusionIO cards).  More information on the storage at the end of the post.  The tests were:

 

  1. Create 20GB database
  2. Grow existing database by 10GB
  3. Restore 30GB empty database
  4. Restore 30GB database with 10GB data
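
For reference, tests 1 and 2 boil down to a CREATE DATABASE and a data file growth.  The sketch below shows the general shape of those two tests; the database name, file paths, and log size are illustrative rather than the exact scripts we ran.  Keep in mind that IFI only applies to data files; log files are always zero-initialized:

SET STATISTICS TIME ON;

-- Test 1: create a database with a 20GB data file
CREATE DATABASE [IFI_Test]
ON PRIMARY (NAME = N'IFI_Test_data', FILENAME = N'D:\SQLData\IFI_Test.mdf', SIZE = 20480MB)
LOG ON (NAME = N'IFI_Test_log', FILENAME = N'D:\SQLLog\IFI_Test_log.ldf', SIZE = 1024MB);

-- Test 2: grow the existing data file by 10GB (20GB to 30GB)
ALTER DATABASE [IFI_Test]
MODIFY FILE (NAME = N'IFI_Test_data', SIZE = 30720MB);

SET STATISTICS TIME OFF;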

 

The tests were run on two different physical servers, both running SQL Server 2014.  Details for each storage system are listed below for reference, and the test results were as we expected:

 

Duration for file modification or database restore, with and without IFI

When IFI is not enabled, the time to zero out a file and write the data is a function of the sequential write performance of the drive(s) where the SQL Server data file(s) are located.  When IFI is enabled, creating or growing a data file is so fast that the duration is of no real consequence: the time to create or grow a file varies by only a few seconds across 15K, SSD, flash, and magnetic storage.  Without IFI, however, there can be drastic differences in create, grow, and restore times depending on the storage.
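
If you want to watch the zeroing happen (or confirm it is being skipped), one option is to temporarily enable trace flags 3004 and 3605, which cause file initialization messages to be written to the ERRORLOG; with IFI enabled you should only see zeroing messages for the transaction log file.  A quick sketch:

-- Write file-initialization details to the ERRORLOG (enable temporarily, then turn off)
DBCC TRACEON (3004, 3605, -1);
-- ...create or grow a data file, then review the ERRORLOG...
DBCC TRACEOFF (3004, 3605, -1);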

Storage Details:

  • 15K RAID10 = Six (6) 300GB 15K disks in RAID 10
  • Flash Drive1 = 640GB Fusion-io ioDrive Duo
  • Flash Drive2 = 2.41TB Fusion-io ioDrive2 Duo
  • 15K RAID1 = Two (2) 300GB Seagate Savvio 15K drives in RAID 1

Note: This post was edited on April 13, 2016, to clarify the storage configuration based on a helpful comment.

SQL Server Setup Error: The directory name is invalid

A few weeks ago I got an email from someone who had attended our Accidental DBA IE class last year, and this person was getting the following error when trying to apply a cumulative update:

SQL Server Setup failure: The directory name is invalid.

The initial email didn’t have a lot of details, so I started asking questions to understand what version was being installed, the environment configuration, etc.  It turned out this was a two-node Windows Server Failover Cluster (WSFC) with multiple SQL Server 2012 instances installed, and one of the instances was still running on the node this person was trying to patch.  To be clear: the two nodes were SRV1 and SRV2, with the instances PROD-A and PROD-B running on SRV1 and PROD-C running on SRV2.  This person was trying to install the cumulative update on SRV2.

Now, those of you who manage clusters may be thinking “Doesn’t this DBA know that the way you do rolling upgrades is by not having any instances running on the node you’re trying to patch?”  Well, not everyone is an experienced DBA; a lot of people are Accidental or Junior DBAs, and if this is the first cluster you’re supporting, you may not know that, or understand why.  Further, when you patch a stand-alone server (one that’s NOT in a cluster), it’s not like you shut down the instance yourself before applying the CU, right?

We checked the summary installation log, located in C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\Log and found the following Exit message:

The directory ‘M:\a13e546ad3e0de04a828\’ doesn’t exist.

The M drive was a resource for PROD-C, along with the N drive.  There was also a quorum drive (Q) and the local C drive.  So how was M not available?

Well, it was available initially, when the install started.  When the installer runs, it puts its files on the first network drive it finds (if it’s an administrative installation), or on the drive with the most free space (see the ROOTDRIVE property).  In this case, the M drive met the criteria.  When the installer then stopped the instance and took the cluster disks offline, the M drive was suddenly gone, hence the invalid directory.

You could argue that this is a bug…maybe…but the solution I suggested was to move PROD-C over to the SRV1 node, then run the installation.  You could also specify the directory as part of a command-line install, thereby using a different disk, but downtime was permitted in this scenario, so the failover wasn’t a deal-breaker.  Once this was done, the installation ran fine, and the latest CU was applied on that node.  The DBA then went through the process of failing all the instances over to the patched node and then applying the CU on SRV1.

As an aside, if you’re not sure of the current service pack, cumulative update, or hotfix available for your SQL Server version, I recommend this site which has all versions and releases and links to the downloads.  And, for those of you running SQL Server 2014, CU5 for SP1 just came out yesterday and has some interesting fixes (see https://support.microsoft.com/en-us/kb/3130926).