A few weeks ago I got an email from someone who had attended our Accidental DBA IE class last year, and this person was getting the following error when trying to apply a cumulative update:
The initial email didn’t have a lot of details, so I started asking questions to understand what version was being installed, the environment configuration, etc. Turns out this was a two-node Windows Server Failover Cluster (WSFC) with multiple SQL Server 2012 instances installed, and one of the instances was still running on the node this person was trying to patch. To be clear, the two nodes were SRV1 and SRV2. The instances PROD-A and PROD-B were running on SRV1, and PROD-C was running on SRV2. This person was trying to install the cumulative update on SRV2.
Now, those of you who manage clusters may be thinking “Doesn’t this DBA know that the way you do rolling upgrades is by not having any instances running on the node you’re trying to patch?” Well, not everyone is an experienced DBA. A lot of people are Accidental or Junior DBAs, and if this is the first cluster you’re supporting, you may not know that, or understand why. Further, when you update an instance on a stand-alone server (one that’s NOT in a cluster), it’s not like you shut down the instance yourself and then apply the CU, right?
We checked the summary installation log, located in C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\Log, and found the following Exit message:
The directory ‘M:\a13e546ad3e0de04a828\’ doesn’t exist.
The M drive was a resource for PROD-C, along with the N drive. There was also a quorum drive (Q) and the local C drive. So how was M not available?
Well, it was available initially, when the install started. When the installer runs, it puts its files on the first network drive that it finds (if it’s an administrative installation), or otherwise on the drive with the most free space (see the Windows Installer ROOTDRIVE property). In this case, the M drive met the criteria. When the installer then stopped the instance and took the cluster disks offline, the M drive was suddenly gone, hence the invalid directory.
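To make the drive selection easier to picture, here is a rough Python sketch of the rule described above. It is only a model of the documented default, not the installer's actual code: the `Drive` class, the `default_root_drive` function, and all of the free-space figures are made up for illustration, and the clustered disks are assumed to look like ordinary local drives while they are online.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    letter: str
    free_gb: float       # hypothetical numbers, not taken from the real servers
    is_network: bool
    writable: bool = True


def default_root_drive(drives, administrative_install):
    """Model of the default rule: the first writable network drive for an
    administrative install, otherwise the writable local drive with the
    most free space."""
    writable = [d for d in drives if d.writable]
    if administrative_install:
        for d in writable:
            if d.is_network:
                return d.letter
    local = [d for d in writable if not d.is_network]
    if not local:
        return None
    return max(local, key=lambda d: d.free_gb).letter


# Drives visible on SRV2 while PROD-C is still running there (sizes invented).
drives = [
    Drive("C", 40, is_network=False),
    Drive("M", 500, is_network=False),
    Drive("N", 200, is_network=False),
    Drive("Q", 1, is_network=False),
]

chosen = default_root_drive(drives, administrative_install=False)

# Disks that go offline when the installer stops PROD-C.
offline_after_stop = {"M", "N"}
directory_lost = chosen in offline_after_stop

print(chosen, directory_lost)
```

With these invented sizes the sketch lands on M, which is exactly the disk that disappears once PROD-C is stopped, so the working directory is gone mid-install.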
You could argue that this is a bug…maybe…but the solution I suggested was to move PROD-C over to the SRV1 node, then run the installation. You could also specify the directory as part of a command-line install, so that the installer uses a different disk, but downtime was permitted in this scenario, so the failover wasn’t a deal-breaker. Once this was done, the installation ran fine, and the latest CU was applied on that node. The DBA then went through the process of failing all the instances over to the patched node, and then applying the CU on SRV1.
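If it helps to see the order of operations written down, here is a small Python sketch that generates the steps for this cluster. It only produces a checklist and does not touch a real cluster; the function name `rolling_patch_plan` and the wording of the steps are my own, and it assumes a two-node cluster where the other node can host every instance.

```python
def rolling_patch_plan(placement, patch_order):
    """Return ordered steps for patching a cluster one node at a time.

    placement maps node name -> list of instances currently running there.
    Each node is emptied before the CU is applied to it.
    """
    # Work on a copy so the caller's dictionary is not modified.
    placement = {node: list(instances) for node, instances in placement.items()}
    steps = []
    for node in patch_order:
        # Pick another node as the temporary home for this node's instances.
        target = next(n for n in placement if n != node)
        for instance in placement[node]:
            steps.append(f"fail over {instance} from {node} to {target}")
        placement[target].extend(placement[node])
        placement[node] = []
        steps.append(f"apply CU on {node}")
    return steps


plan = rolling_patch_plan(
    {"SRV1": ["PROD-A", "PROD-B"], "SRV2": ["PROD-C"]},
    patch_order=["SRV2", "SRV1"],
)
for step in plan:
    print(step)
```

The point of the ordering is simply that no instance is running on a node while it is being patched, so none of its clustered disks can vanish partway through the install.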
As an aside, if you’re not sure of the current service pack, cumulative update, or hotfix available for your SQL Server version, I recommend this site, which lists all versions and releases and links to the downloads. And, for those of you running SQL Server 2014, CU5 for SP1 just came out yesterday and has some interesting fixes (see https://support.microsoft.com/en-us/kb/3130926).