Over the last couple of days, you have probably heard quite a bit of chatter and speculation about some newly disclosed ways to attack various processors. The initial reports were that only Intel processors were affected, but some sources indicate that some AMD and ARM processors are also vulnerable.
Security researchers at Graz University of Technology (who were involved in the initial discovery of these issues) have put up a site, complete with cute logos, with some useful information about the two exploits, Meltdown and Spectre. The most detailed information so far about the attack methods comes from Google Project Zero, in their post Reading privileged memory with a side-channel. Their testing shows some limited vulnerability for some older AMD processors.
AMD is pretty adamant that their processors are not vulnerable to these exploits, as shown by this statement from AMD’s Tom Lendacky:
“AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.
Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI […]”
Linus Torvalds also seems pretty confident that AMD is not affected, as witnessed by his comments in a recent code check-in:
“Exclude AMD from the PTI enforcement. Not necessarily a fix, but if AMD is so confident that they are not affected, then we should not burden users with the overhead”
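If you want to see what a patched Linux kernel decided for a particular machine, here is a minimal Python sketch that reads the `flags` and `bugs` lines from `/proc/cpuinfo`. The token names it looks for (`pti`, `cpu_insecure`, `cpu_meltdown`) are assumptions based on how patched kernels have reported this so far and may differ on your kernel version or distribution backport; the function names are just illustrative.

```python
from pathlib import Path


def parse_cpuinfo(text):
    """Return (flags, bugs) as sets of tokens from the first CPU entry."""
    flags, bugs = set(), set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "flags" and not flags:
            flags = set(value.split())
        elif key == "bugs" and not bugs:
            bugs = set(value.split())
    return flags, bugs


def pti_status(text):
    """Summarize whether the kernel flagged the CPU and whether PTI is on."""
    flags, bugs = parse_cpuinfo(text)
    # Assumption: the bug token is "cpu_insecure" on early patched kernels
    # and "cpu_meltdown" on later ones.
    flagged = bool(bugs & {"cpu_insecure", "cpu_meltdown"})
    return {"pti_enabled": "pti" in flags, "flagged_by_kernel": flagged}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(pti_status(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here; this sketch only applies to Linux.")
```

On an AMD system running a kernel that includes the change quoted above, you would expect both values to come back false, but treat that as something to verify rather than a guarantee.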
Paul Alcorn has a good write-up about this issue. Yesterday, Phoronix published some early benchmark results against a patched version of Linux that were pretty alarming for some use cases (synthetic IO benchmarks and PostgreSQL database performance).
Red Hat has published some information about the performance impact of OS fixes on several different workload types. The most notable is what they define as:
“Measureable: 8-19% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-19%. Examples include OLTP Workloads (tpc), sysbench, pgbench, netperf (< 256 byte), and fio (random I/O to NvME)”
More details about these findings and some mitigation methods for RHEL are available from Red Hat.
It seems like the various fixes for these issues are going to hit database and virtualization performance harder than most other use cases. I wonder whether Intel will be able to at least partially fix the issue with a stepping change (an actual hardware fix within the same processor design) on any processors that are still in production, which would let them send out replacement processors that work in some existing servers.
If you are old enough to remember the Pentium FDIV bug in 1994, you will recall that Intel initially tried to minimize the issue, saying that it was very rare. Then, they tried to make people prove that they were hitting the bug by running an Intel utility. Finally, they caved in to bad PR and ended up sending out replacement CPUs to a lot of people, no questions asked, which cost them $475 million back in the day. I remember swapping out my CPU, because I was a geek back then too!
Early this morning, Microsoft published this KB article: SQL Server Guidance to protect against speculative execution side-channel vulnerabilities. According to Microsoft, the following versions of SQL Server are impacted when running on x86 and x64 processor systems: SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016, and SQL Server 2017.
Microsoft has already issued two Cumulative Updates that include fixes to help mitigate this issue (along with the other important hotfixes included in each CU).
I suspect that there will be an out-of-band CU or hotfix for SQL Server 2014 SP2 relatively soon, since it is still in Mainstream support. Even though SQL Server 2012 and older are out of Mainstream support, Microsoft will probably develop and release hotfixes for those releases fairly quickly as well, since this is a security issue.
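If you have a lot of instances to look after, it helps to know which release each one is running before the patches show up. Here is a rough Python sketch that maps the string returned by `SERVERPROPERTY('ProductVersion')` to the release names from the KB article. The server names and build numbers in the example inventory are made up for illustration, and how you collect the version strings is up to you.

```python
# Major.minor version prefixes for the releases named in the KB article.
AFFECTED_RELEASES = {
    (10, 0): "SQL Server 2008",
    (10, 50): "SQL Server 2008 R2",
    (11, 0): "SQL Server 2012",
    (12, 0): "SQL Server 2014",
    (13, 0): "SQL Server 2016",
    (14, 0): "SQL Server 2017",
}


def release_name(product_version):
    """Map a ProductVersion string like '13.0.4466.4' to a release name, or None."""
    parts = product_version.strip().split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None
    return AFFECTED_RELEASES.get((major, minor))


if __name__ == "__main__":
    # Hypothetical inventory: server name -> ProductVersion string.
    inventory = {
        "sql-a": "14.0.3015.40",
        "sql-b": "10.50.6000.34",
        "sql-c": "9.00.5000.00",
    }
    for server, version in inventory.items():
        print(server, release_name(version) or "not in the KB list")
```

This only tells you which release you are on. It does not tell you whether a given build already contains the fix, so you still need to compare your build number against the CU that Microsoft lists for your release.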
Microsoft has also started pushing out an out-of-band OS update for Windows 10 (KB4056892) that is meant to mitigate this issue. There are similar updates for most other supported Microsoft operating systems, including Windows Server. Microsoft has also published a security advisory and a statement about how they have been handling this for Microsoft Azure.
Here is what I plan on doing over the next couple of weeks as this starts to shake out:
- Find out what Intel has to say about this issue (beyond this vague initial statement)
- Find out, if possible, which Intel processor families are affected by the issue
- Find out whether newer Intel processor families are less affected than older Intel processor families
- Find out if Intel can fix the issue with a hardware change for current production processors like the Xeon Scalable Processor Family
- Do some before/after testing with CPU-Z, Geekbench, CrystalDiskMark, and DiskSpd to see how performance is affected by the patches
- Do some SQL Server testing with some common, easy tasks, such as big sequential reads, running DBCC CHECKDB, and running full backups and restores, to see how SQL Server is affected by the OS patch
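For the before/after testing in the list above, the comparison itself is simple arithmetic. Here is a small Python sketch, assuming you have recorded a few runs of each test before and after patching. It compares medians so that one noisy run does not skew the result. The function names and the sample numbers are placeholders, not real measurements.

```python
from statistics import median


def percent_change(before, after):
    """Percent change of the median from the 'before' runs to the 'after' runs."""
    base = median(before)
    if base == 0:
        raise ValueError("baseline median is zero")
    return (median(after) - base) * 100.0 / base


def regression_percent(before, after, higher_is_better=True):
    """Positive result means performance got worse after patching."""
    change = percent_change(before, after)
    return -change if higher_is_better else change


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    read_mb_per_sec = ([1000, 1010, 990], [900, 910, 890])
    checkdb_seconds = ([120, 118, 122], [132, 130, 134])

    print("Sequential read regression %:",
          regression_percent(*read_mb_per_sec, higher_is_better=True))
    print("DBCC CHECKDB regression %:",
          regression_percent(*checkdb_seconds, higher_is_better=False))
```

Throughput numbers (MB/sec, IOPS, benchmark scores) are higher-is-better, while elapsed times for things like DBCC CHECKDB, backups, and restores are lower-is-better, which is why the direction is a parameter.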
Here is what I think you should be doing:
- Plan on getting your database servers patched as soon as possible, which will include OS patches, SQL Server patches, and possible firmware or BIOS/UEFI updates as they become available.
- Be ready to do some workload and query tuning as necessary if your workload performance is negatively affected by these various patches and updates.
- Think harder about upgrading to new hardware, a newer version of your OS, and a newer version of SQL Server that is still fully supported.
- For personal and client workstation systems, you should be checking to see if there are any firmware or BIOS/UEFI updates that become available, both for these issues and as a general best practice.
I am collecting some resources about this issue from the server vendors:
- Microprocessor Side-Channel Attacks (CVE-2017-5715, CVE-2017-5753, CVE-2017-5754): Impact on Dell EMC products (Dell Enterprise Servers, Storage and Networking)
- Microprocessor Side-Channel Attacks (CVE-2017-5715, CVE-2017-5753, CVE-2017-5754): Impact on Dell products (this is for client hardware)