About a month ago I kicked off a survey around regularly rebooting SQL Server (see here). There is much debate about rebooting SQL Server (or Windows Server hosting SQL Server), regularly or even at all so I've been looking forward to the results.

My view is that there are circumstances where rebooting, even regularly, is acceptable – but you have to have a good reason to do it because of the downsides of restarting the SQL Server process. These downsides include:

  • The buffer pool goes completely cold. Warming up again requires reading (potentially a lot of) data back into memory, after allocating that memory from Windows.
  • All query plans are lost and will need to be compiled again.
  • All information about the state of SQL Server, used to give DMV output, is lost.

In other words, the steady-state for the production workload is lost and has to be attained again.

As you'll see from the results, the most common reason for a regular reboot is part of the Windows/SQL patching process, but there are plenty of reasons cited that are NOT valid in my opinion, and some others that are.

 reboot1 Survey results on rebooting   is it good or bad?

The "Other" responses are:

  • 101 x (paraphrasing) "After windows/SQL/firmware updates that require a reboot."
  • 30 X (paraphrasing) "Quarterly."
  • 10 x "Only if any problems occur or any maintenance activity scheduled in off hours."
  • 5 x "Only if I've observed some resource contention after some time. Usually a result of not being able to implement the solution and having to work around the problem."
  • 4 x "Every two months."
  • 1 x "I normally go 360+ days between reboots. And then it's just for giggles. No indication it is needed or required."
  • 1 x "I would like to do it monthly but my boss doesn't support it."
  • 1 x "Yes, every two weeks."
  • 1 x "Yes, twice a week."

 reboot2 Survey results on rebooting   is it good or bad?

The "Other" responses are (I've commented on those in bold):

  • 74 x (paraphrasing) "MS security patches – manually installed after testing them on duplicate system."
  • 67 x (paraphrasing) "As part of the monthly MS patching."

  • 17 x (paraphrasing) "No regular reboots."
  • 9 x "We reboot our SQL servers about twice a year average for updates or other issues."
  • 6 x "As required for patching activities and only after ensuring that the potential for patching has been reduced by eliminating non-required components and programs."
  • 4 x "After patch installation or maintenance such as password updates to test cluster failover."
  • 3 x "Process unkilled and some memory problems."
  • 2 x (paraphrasing) "Because the business demands it, for no valid reason."
    • You need to work out why that process was put in place. My guess is that it's based on historical requirements to do so on older operating systems, or because of bugs, and it's just stuck. Very likely the process can be removed if the reboot schedule is very frequent (e.g. daily or weekly).
  • 1 x "I have issues with a CLR process."
    • Then try to fix that process/code rather than resorting to a reboot.
  • 1 x "I've been told that certain errors won't be reported by Windows without a reboot."
  • 1 x "Last rebooted about 50 days ago – only reboot when necessary (patches)."
  • 1 x "Microsoft Insists."
    • I don't buy this one, unless it's to work around a specific bug that will be patched at some point.
  • 1 x "Nightly reboots of a specific SQL Server are required by the vendor whose app it supports."
    • This is badly messed up and you need to push back on the vendor. My guess here is that they have a leaky application (i.e. memory leaks) and this is their solution.
  • 1 x "Non Windows app memory leaks."
    • Again, try to fix the memory leaks rather than resorting to rebooting.
  • 1 x "Our application becomes unstable otherwise."
    • Fix the application…
  • 1 x "Regular test of clusters AND SQL Server performance degrades without restart."
  • 1 x "Save power, turn it off at night."
    • This is interesting and one I haven't considered before. Are you sure the gain in power saving is worth the downsides I mention at the start of the post?
  • 1 x "SQL Server connection time out error."
  • 1 x "VMware Snapshot."
    • This I don't know about (that's why I employ Jonathan :-), but I'd be surprised if this process requires the server to be rebooted.
  • 1 x "We do full server backups weekly. we briefly shut SQL, copy c drive where OS and SQL software live – paging file etc. then bring SQL up and finish server full backup. This takes approx 5 min per week."
    • This is messed up. SQL Server and Windows can both be backed up without having to be shut down.
  • 1 x "We have a weekly Maintenance window and it keeps the event logs manageable."
    • This is messed up. Windows and SQL logs can be cycled without rebooting.

Quite interesting results.

There have been many problems of the years which have required regular reboots to clear, but these are *mostly* gone with the advent of Windows Server 2008/2008 R2. I think there are three main reasons which still are valid for rebooting regularly:

  1. As part of Windows/SQL/firmware patch installs.
  2. To test failover procedures.
  3. To work around a Windows or SQL bug.

As an example of #3, there's a problem still in Windows Server 2008 with the file cache taking up all the memory, depending on which APIs are used to call it (see KB 976618). And what about the various SQL Server memory leak bugs over the years (e.g. KB 959007).

There are plenty of other symptoms that can be "fixed" with a server reboot. A great example of this is plan cache bloat from lots of plans for ad-hoc SQL queries. This can eat away at the memory available for the buffer pool to store data and cause the server performance to grind down (see Kimberly's post Plan cache, adhoc workloads and clearing the single-use plan cache bloat). And what about buffer pool memory being wasted by massive index fragmentation, resulting in poor performance (see my post Performance issues from wasted buffer pool memory). Or any of the reasons in bold above that mention and application instability, memory leak, or issue?

For these problems, rebooting the server is addressing the symptom of the problem, not the root-cause. We've had several clients in the last month where massive performance problems were being "solved" by failing over to the other cluster node, and they turned out to be caused from not having Lock Pages In Memory set (see KB 918483, and take care to note all the Windows bugs it explains that can cause working set trimming too). They were addressing the symptom without researching the cause.

Bottom line: if you're regularly rebooting Windows/SQL Server, make sure it's for a good reason and not just because someone thinks it's a good thing to do or it's the chosen way to fix a problem that should be fixed in some other way.

And don't immediately assume that someone who does reboot Windows/SQL Server regularly is an idiot – they may have a very good reason, or may just be ignorant… (see Ignorance is not stupidity).

Thoughts?