Recent TPC-E Results on SQL Server 2017

Lenovo has submitted the two most recent TPC-E OLTP benchmark results, both using SQL Server 2017 running on Windows Server 2016 Standard Edition, using 28-core Intel Xeon Platinum 8180 processors.

The most recent result was for a four-socket Lenovo ThinkSystem SR950 with 3TB of RAM using a 48TB initial database size. This system had an official result of 11,357.28, which is the highest score ever submitted for a four-socket server. This system has a total of 112 physical cores, so if you divide the total score of 11,357.28 by 112, you get a measure of the single-threaded performance of the Intel Xeon Platinum 8180 processor under a full load (where the clock speed of the individual cores will be pretty close to the 2.5GHz base clock speed). In this case, the result is 101.40 score/core.

Back on June 27, 2017, Lenovo submitted a result for a two-socket Lenovo ThinkSystem SR650 with 1.5TB of RAM using a 28.5TB initial database size. This system had an official result of 6,598.36, which is the highest score ever submitted for a two-socket server. This system has a total of 56 physical cores, so if you divide the total score of 6,598.36 by 56, you get a score/core of 117.83, which is significantly higher than the result for the Lenovo ThinkSystem SR950 configured to use four-sockets (using the exact same Intel Xeon Platinum 8180 processor).

I would attribute most of this difference to the added NUMA overhead from a four-socket system, compared to a two-socket system. Another difference, which probably hurt the score of the two-socket system was the fact that it had to be running on a pre-release version of SQL Server 2017, based on the submission date of the benchmark.

This is just another piece of evidence that even with NUMA, capacity does not scale in a linear fashion as you add sockets to a server. Assuming you can split your workload across multiple database servers rather than just one, having two, two-socket servers instead of one, four-socket server will give you both more CPU capacity and better single-threaded CPU performance even when using the exact same model processor.

I would also argue that you could purposely pick a lower core count, but higher base clock speed processor from the same Intel Xeon Scalable Processor Family to find a sweet spot for SQL Server 2017 usage, where you have fewer physical cores to license, with better single-threaded performance across a higher number of servers.




CPU-Z 1.80 is Available

CPU-Z 1.80 was released on July 10, 2017. It adds support for the new Intel Skylake-X and Kabylake-X high-end desktop processors (HEDT). It also adds information about your preferred core(s) in the Clocks dialog on the About tab.

In case you are wondering what that means, some of the latest Intel processors have a new feature called Intel Turbo Boost Max Technology 3.0 that can automatically direct single-threaded workloads to the fastest core available on a processor. It requires a supported processor, BIOS support, and a special Intel driver, along with operating system support.

One processor that has this feature is the Intel Core i9-7900X processor.

image

Figure 1: CPU Tab of CPU-Z 1.80

 

If you want to investigate whether you have this feature and what it is doing, you can click on the Clocks button on the About tab, and see the preferred core information for your processor.

image

Figure 2: About Tab of CPU-Z 1.80

 

image

Figure 3: Clocks Dialog

My old Core i7-3770K processor does NOT have this new feature!

Intel Xeon E7 Processor Generational Performance Comparison

Intel has a fairly recent document titled Accelerated Operations for Telecom and Financial Services which is also listed under Accelerate OLTP Database Performance with Intel TSX. It describes the “performance” increases seen with the AsiaInfo ADB from moving from 2.8GHz Intel Xeon E7-4890 v2 (Ivy Bridge-EX), to 2.5GHz Intel Xeon E7-8890 v3 (Haswell-EX), and finally to 2.2GHz Intel Xeon E7-8890 v4 (Broadwell-EX) processors, as shown in Figure 1.

 

Figure 1: Speedup from Successive Processor Generations

 

This workload is described as “AsiaInfo ADB Database OCS k-tpmC”, while the AsiaInfo ADB is described as “a scalable OLTP database that targets high performance and mission critical businesses such as online charge service (OCS) in the telecom industry”, that runs on Linux.

The reason I have performance in quotes above is because what they are really measuring is closer to what I would call capacity or scalability. Their topline result is “Thousands of Transactions per Minute” as measured with these different hardware and storage configurations.

The key point to keep in mind with these types of benchmarks is whether they are actually comparing relatively comparable systems or not. In this case, the systems are quite similar, except for the core counts of the successive processor models (and the DD3 vs. DDR4 memory support). Here are the system components, as listed in the footnotes of the document:

Baseline: Four-sockets, 15-core Intel Xeon E7-4890 v2, 256GB DDR3/1333 DIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

Next Generation: Four-sockets, 18-core Intel Xeon E7-8890 v3, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

New: Four-sockets, 24-core Intel Xeon E7-8890 v4, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

The baseline system has a total of 60 physical cores, running at 2.8GHz, using the older Ivy Bridge-EX microarchitecture. The next generation system has a total of 72 physical cores, running at 2.5GHz, using the slightly newer Haswell-EX microarchitecture. Finally, the new system has a total of 96 physical cores, running at 2.2GHz, using the current Broadwell-EX microarchitecture. These differences in core counts, base clock speeds, and microarchitecture make it a little harder to fully understand their benchmark results in a realistic manner.

Table 1 shows some relevant metrics for these three system configurations. The older generation processors have fewer cores, but run at a higher base clock speed. The newer generation processors would be faster than the older generation processors at the same clock speed, but the base clock speed is lower as the core counts have increased with each successive generation flagship processor. The improvements in IPC and single-threaded performance are obscured by lower base clock speeds as the core counts increase, which makes the final score increase less impressive.

 

ProcessorBase ClockTotal System CoresRaw ScoreScore/Core
Xeon E7-4890 v22.8GHz6072512.08
Xeon E7-8890 v32.5GHz72102114.18
Xeon E7-8890 v42.2GHz96129413.48

Table 1: Analysis of ADB Benchmark Results

 

Table 2 shows some metrics from an analysis of some actual and estimated TPC-E benchmark results for those same three system configurations, plus an additional processor choice that I added. The results are pretty similar, which supports the idea that both of these benchmarks are CPU-limited. From a SQL Server 2016 perspective, you are going to be better off from a performance/license cost perspective if you purposely choose a lower core count “frequency-optimized” processor (at the cost of less total system capacity per host).

This is somewhat harder to do with the Intel Xeon E7 v4 family, because of your limited SKU choices. A good processor choice for many workloads would be the 10-core Intel Xeon E7-8891 v4 processor, which has a base clock speed of 2.8GHz and a 60MB L3 cache that is shared by only 10 cores.

If you could spread your workload across two database servers, you would be much better off with two, four-socket servers with the 10-core Xeon E7-8891 v4 rather than one four-socket server with the 24-core Xeon E7-8890 v4. You would have more total system processor capacity, roughly 27% better single-threaded CPU performance, twice the total system memory capacity, and twice the total number of PCIe 3.0 expansion slots. You would also only need 80 SQL Server 2016 Enterprise Edition core licenses rather than 96 core licenses, which would save you about $114K in license costs. That license savings would probably pay for both database servers, depending on their exact configuration.

 

ProcessorBase ClockTotal System CoresEst TPC-E ScoreScore/Core
Xeon E7-4890 v22.8GHz605576.2792.94
Xeon E7-8890 v32.5GHz726964.7596.73
Xeon E7-8890 v42.2GHz969068.0094.46
Xeon E7-8891 v42.8GHz404808.79120.22

Table 2: Analysis of Estimated TPC-E Benchmark Results

 

The Intel document also discusses the “performance” increases seen from moving from Intel DC S3700 SATA drives to Intel DC P3700 PCIe NVMe drives. This is going to be primarily influenced by the advantages of being connected directly to the PCIe bus and the lower latency and overhead of the NVMe protocol compared to the older AHCI protocol.

Finally, they talk about the “performance” increases they measured from enabling the Intel Transactional Synchronization Extensions (TSX) instruction set and the Intel AVX 2.0 instruction set on current generation Intel E7-8800 v4 series processors.

SQL Server 2016 already has hardware support for older SSE/AVX instructions as discussed here and here. I really hope that Microsoft decides to add even more support for newer instruction sets (such as TSX) in SQL Server vNext.