New TPC-E Result for SQL Server 2017

On March 31, 2018, Fujitsu submitted a new TPC-E result for a two-socket PRIMERGY RX2540 M4 system running SQL Server 2017 Enterprise Edition on Windows Server 2016 Standard Edition. The official TPC-E Throughput score was 6,606.75, which is a new record for a two-socket system. It was barely a new record though, since Lenovo had a previous result of 6,598.36 for a two-socket Lenovo ThinkSystem SR650 system also running SQL Server 2017 Enterprise Edition on Windows Server 2016 Standard Edition.

Both systems were using the same flagship 28-core Intel Xeon Platinum 8180 processor which will give you the most overall CPU capacity per socket, along with the highest SQL Server 2017 Enterprise Edition license costs.

Most organizations would be much better off with a lower core count, higher base clock speed processor from the same Intel Xeon Scalable Processor family, which would give them better single-threaded CPU performance and much lower SQL Server 2017 license costs. This is especially true if you can split your database workload across two servers rather than using just one server.

For example, using two, two-socket servers with the faster 12-core Intel Xeon Gold 6146 processor rather than one, two-socket server with the flagship 28-core Intel Xeon Platinum 8180 processor would give you about 10% more CPU capacity (not to mention twice the total memory and I/O capacity), about 32% better single-threaded CPU performance, and also save you about $57K in SQL Server 2017 license costs.

One unfortunate fact is that none of the server vendors besides Lenovo and Fujitsu have even bothered to submit a new TPC-E benchmark since February 2014. I would really like to see this change in the future, with new TPC-E submissions from vendors like Dell and HPE. I would also like to see submissions on AMD EPYC 7000 series machines, both for one and two-socket servers.




Recent TPC-E Results on SQL Server 2017

Lenovo has submitted the two most recent TPC-E OLTP benchmark results, both using SQL Server 2017 running on Windows Server 2016 Standard Edition, using 28-core Intel Xeon Platinum 8180 processors.

The most recent result was for a four-socket Lenovo ThinkSystem SR950 with 3TB of RAM using a 48TB initial database size. This system had an official result of 11,357.28, which is the highest score ever submitted for a four-socket server. This system has a total of 112 physical cores, so if you divide the total score of 11,357.28 by 112, you get a measure of the single-threaded performance of the Intel Xeon Platinum 8180 processor under a full load (where the clock speed of the individual cores will be pretty close to the 2.5GHz base clock speed). In this case, the result is 101.40 score/core.

Back on June 27, 2017, Lenovo submitted a result for a two-socket Lenovo ThinkSystem SR650 with 1.5TB of RAM using a 28.5TB initial database size. This system had an official result of 6,598.36, which is the highest score ever submitted for a two-socket server. This system has a total of 56 physical cores, so if you divide the total score of 6,598.36 by 56, you get a score/core of 117.83, which is significantly higher than the result for the Lenovo ThinkSystem SR950 configured to use four-sockets (using the exact same Intel Xeon Platinum 8180 processor).

I would attribute most of this difference to the added NUMA overhead from a four-socket system, compared to a two-socket system. Another difference, which probably hurt the score of the two-socket system was the fact that it had to be running on a pre-release version of SQL Server 2017, based on the submission date of the benchmark.

This is just another piece of evidence that even with NUMA, capacity does not scale in a linear fashion as you add sockets to a server. Assuming you can split your workload across multiple database servers rather than just one, having two, two-socket servers instead of one, four-socket server will give you both more CPU capacity and better single-threaded CPU performance even when using the exact same model processor.

I would also argue that you could purposely pick a lower core count, but higher base clock speed processor from the same Intel Xeon Scalable Processor Family to find a sweet spot for SQL Server 2017 usage, where you have fewer physical cores to license, with better single-threaded performance across a higher number of servers.




New TPC-E Results for SQL Server 2016

There have been two recent TPC-E OLTP benchmark results published for SQL Server 2016. These include one from Fujitsu and one from Lenovo.

The most recent result, from July 12, 2016 is for a four-socket FUJITSU Server PRIMERGY RX4770 M3 server that is using the latest generation, 14nm 2.2GHz Intel Xeon E7-8890 v4 processor (Broadwell-EX), with a TPC-E throughput score of 8,796.42. As is always the case with TPC-E benchmarks, the hardware vendor used the “flagship”, highest core count processor available from the latest processor family, in this case, a 24-core processor. This helps achieve the highest possible TPC-E throughput score (which is a measure of the total processor capacity of the system), at the cost of quite high SQL Server 2016 licensing costs, since you would have to purchase 96 SQL Server 2016 Enterprise Edition core licenses. This would cost about $684K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $647K in the Executive Summary report.

Another recent result from May 31, 2016 is for a four-socket Lenovo System x3850 X6 that is using the same Intel Xeon E7-8890 v4 processor. This system gets a TPC-E throughput score of 9,068.00, which is about 3% higher than the Fujitsu system. Both systems use a 36TB initial database size, while the Fujitsu system uses 2TB of RAM and the Lenovo system uses 4TB of RAM (which is the license limit for Windows Server 2012 R2). Both systems use all flash storage, with 2.5” SAS SSDs.

Unlike the old TPC-C OLTP benchmark, TPC-E does not require an unrealistically expensive storage subsystem to get good scores. As long as the storage subsystem is “good enough” so that it does not become a bottleneck, then the ultimate TPC-E bottleneck becomes processor performance.

Earlier this year, there were two competing results for two-socket systems from Lenovo and Fujitsu. On March 30, 2016, Fujitsu published a result for a two-socket FUJITSU Server PRIMERGY RX2540 M2 system using the latest generation, 14nm 2.2GHz Intel Xeon E5-2699 v4 processor (Broadwell-EP), with a TPC-E throughput score of 4,734.87. The Intel Xeon E5-2699 v4 has 22 physical cores, so the two-socket system has a total of 44 physical cores that would need SQL Server 2016 licenses that would cost about $313K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $296K in the Executive Summary report.

On March 24, 2016, Lenovo published a result for a Lenovo System x3650 M5 system using the same Intel Xeon E5-2699 v4 processors, with a TPC-E throughput score of 4,938.14, which is about 4% higher than the Fujitsu system. In this case, the Fujitsu system uses 1TB of RAM (with a 19TB initial database size), while the Lenovo system uses 512GB of RAM (with a 20TB initial database size). Both systems use all flash storage, with 2.5” SAS SSDs.

I know that this is a lot of numbers to be throwing around, so a summary of these four systems is shown in Table 1.

 

System Processor Raw Score Total Cores Score/Core
Lenovo System x3850 X6 E7-8890 v4 9,068.00 96 94.45
Fujitsu PRIMERGY RX4770 M3 E7-8890 v4 8,796.42 96 91.63
Lenovo System x3650 M5 E5-2699 v4 4,938.14 44 112.23
Fujitsu PRIMERGY RX2540 M2 E5-2699 v4 4,734.87 44 107.61

Table 1: Recent TPC-E Score Highlights

 

This shows that four-socket Broadwell-EX systems scale relatively well compared to older Xeon E7 processor families, meaning that the drop in single-threaded performance compared to equivalent two-socket Xeon E5 processor families is not as large as it used to be. There is still a gap though, which means that you are losing some scalability as you make the jump from a two-socket system to a four-socket system. If you can split your workload across two database servers, you would be better off to have two, two-socket servers rather than one, four-socket server. You would have more total processor capacity, better single-threaded performance, more PCIe expansion slots and lower SQL Server license costs.

An even better alternative for most people would be to use a lower core count, “frequency optimized” processor, instead of the flagship processor. For example, if you used the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor in a two-socket server, you would get the estimated results shown in Table 2.

 

System Processor Raw Score Total Cores Score/Core
Estimated Two-Socket System E5-2667 v4 2611.91 16 163.24

Table 2: Estimated TPC-E Results

If you had four, two-socket systems with the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor, instead of one, four-socket system with the 24-core 2.2 GHz Intel Xeon E7-8890 v4 processor, you would have about 15.2% more total processor capacity, about 72.9% better single-threaded performance, and a 33.3% lower SQL Server 2016 licensing cost (which would be about $227K in license savings). You would have the same total memory capacity, and more than three times the number of PCIe slots.