Intel Xeon E7 Processor Generational Performance Comparison

Intel has a fairly recent document titled Accelerated Operations for Telecom and Financial Services, which is also listed under Accelerate OLTP Database Performance with Intel TSX. It describes the “performance” increases seen with the AsiaInfo ADB database when moving from 2.8GHz Intel Xeon E7-4890 v2 (Ivy Bridge-EX) processors, to 2.5GHz Intel Xeon E7-8890 v3 (Haswell-EX) processors, and finally to 2.2GHz Intel Xeon E7-8890 v4 (Broadwell-EX) processors, as shown in Figure 1.

 

Figure 1: Speedup from Successive Processor Generations

 

This workload is described as “AsiaInfo ADB Database OCS k-tpmC”. AsiaInfo ADB itself runs on Linux and is described as “a scalable OLTP database that targets high performance and mission critical businesses such as online charge service (OCS) in the telecom industry”.

The reason I put performance in quotes above is that what they are really measuring is closer to what I would call capacity or scalability. Their topline result is “Thousands of Transactions per Minute” (k-tpmC), measured with these different hardware and storage configurations.

The key question with these types of benchmarks is whether they are actually comparing reasonably comparable systems. In this case, the systems are quite similar, except for the core counts of the successive processor models (and DDR3 vs. DDR4 memory support). Here are the system components, as listed in the footnotes of the document:

Baseline: Four-sockets, 15-core Intel Xeon E7-4890 v2, 256GB DDR3/1333 DIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

Next Generation: Four-sockets, 18-core Intel Xeon E7-8890 v3, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

New: Four-sockets, 24-core Intel Xeon E7-8890 v4, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

The baseline system has a total of 60 physical cores running at 2.8GHz, using the older Ivy Bridge-EX microarchitecture. The next generation system has a total of 72 physical cores running at 2.5GHz, using the slightly newer Haswell-EX microarchitecture. Finally, the new system has a total of 96 physical cores running at 2.2GHz, using the current Broadwell-EX microarchitecture. Because core counts, base clock speeds, and microarchitecture all change at once, it is harder to tell how much of each score increase comes from which factor.

Table 1 shows some relevant metrics for these three system configurations. The older generation processors have fewer cores, but run at a higher base clock speed. The newer generation processors would be faster than the older generation processors at the same clock speed, but the base clock speed is lower as the core counts have increased with each successive generation flagship processor. The improvements in IPC and single-threaded performance are obscured by lower base clock speeds as the core counts increase, which makes the final score increase less impressive.

 

Processor          Base Clock   Total System Cores   Raw Score (k-tpmC)   Score/Core
Xeon E7-4890 v2    2.8GHz       60                   725                  12.08
Xeon E7-8890 v3    2.5GHz       72                   1021                 14.18
Xeon E7-8890 v4    2.2GHz       96                   1294                 13.48

Table 1: Analysis of ADB Benchmark Results
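One rough way to see the per-clock improvement that the lower base clocks hide is to divide each system's score/core by its base clock speed. The sketch below does that for the Table 1 figures. It assumes throughput scales linearly with base clock speed and ignores Turbo Boost, memory speed, and storage differences, so treat the result as a rough indicator rather than a true IPC measurement.

```python
# Table 1 inputs as reported in the Intel document (raw score is k-tpmC).
SYSTEMS = [
    # (processor, base clock in GHz, total physical cores, raw score)
    ("Xeon E7-4890 v2", 2.8, 60, 725),
    ("Xeon E7-8890 v3", 2.5, 72, 1021),
    ("Xeon E7-8890 v4", 2.2, 96, 1294),
]


def per_core_metrics(base_ghz: float, cores: int, score: float) -> tuple[float, float]:
    """Return (score per core, score per core per GHz of base clock).

    Assumption: linear scaling with base clock speed; Turbo Boost,
    memory speed, and storage differences are ignored.
    """
    per_core = score / cores
    return per_core, per_core / base_ghz


for name, ghz, cores, score in SYSTEMS:
    per_core, per_core_ghz = per_core_metrics(ghz, cores, score)
    print(f"{name}: {per_core:.2f} per core, {per_core_ghz:.2f} per core per GHz")
```

With these inputs, score/core per GHz rises from about 4.32 (v2) to 5.67 (v3) to 6.13 (v4), even though raw score/core dips slightly from v3 to v4.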

 

Table 2 shows metrics from an analysis of actual and estimated TPC-E benchmark results for those same three system configurations, plus an additional processor choice that I added. The score/core results follow the same pattern as Table 1 (the E7-8890 v3 has the highest score/core of the original three), which supports the idea that both of these benchmarks are CPU-limited. From a SQL Server 2016 perspective, you will get better performance for your license cost if you deliberately choose a lower core count “frequency-optimized” processor (at the cost of less total system capacity per host).

This is somewhat harder to do with the Intel Xeon E7 v4 family, because of your limited SKU choices. A good processor choice for many workloads would be the 10-core Intel Xeon E7-8891 v4 processor, which has a base clock speed of 2.8GHz and a 60MB L3 cache that is shared by only 10 cores.

If you could spread your workload across two database servers, you would be much better off with two four-socket servers using the 10-core Xeon E7-8891 v4 than with one four-socket server using the 24-core Xeon E7-8890 v4. You would have more total system processor capacity (an estimated 9617.58 TPC-E across both servers vs. 9068.00), roughly 27% better single-threaded CPU performance, twice the total system memory capacity, and twice the total number of PCIe 3.0 expansion slots. You would also need only 80 SQL Server 2016 Enterprise Edition core licenses rather than 96, which would save you about $114K in license costs. That license savings would probably pay for both database servers, depending on their exact configuration.
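The comparison above is simple arithmetic on the Table 2 figures, sketched below. The $7,128 per-core price is my assumption (an assumed SQL Server 2016 Enterprise Edition list price of $14,256 per 2-core pack), so substitute your actual licensing cost.

```python
# Inputs from Table 2 (TPC-E). The E7-8891 v4 score is an estimate.
BIG_SERVER = {"cores": 96, "tpce": 9068.00}    # one 4-socket server, 24-core E7-8890 v4
SMALL_SERVER = {"cores": 40, "tpce": 4808.79}  # one 4-socket server, 10-core E7-8891 v4

# Assumption: SQL Server 2016 Enterprise Edition at $14,256 per 2-core pack,
# i.e. $7,128 per core. Substitute your own negotiated price.
PRICE_PER_CORE = 7128


def compare_scale_out(small: dict, big: dict, count: int = 2) -> dict:
    """Compare `count` smaller servers against one larger server."""
    small_cores = count * small["cores"]
    return {
        "cores": (small_cores, big["cores"]),
        "total_tpce": (count * small["tpce"], big["tpce"]),
        # Ratio of score/core, used as a proxy for single-threaded performance.
        "per_core_ratio": (small["tpce"] / small["cores"]) / (big["tpce"] / big["cores"]),
        "license_savings": (big["cores"] - small_cores) * PRICE_PER_CORE,
    }


result = compare_scale_out(SMALL_SERVER, BIG_SERVER)
print(result)
```

The score/core ratio of about 1.27 is where the “roughly 27%” figure comes from, and 16 fewer core licenses at the assumed price works out to $114,048.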

 

Processor          Base Clock   Total System Cores   Est TPC-E Score   Score/Core
Xeon E7-4890 v2    2.8GHz       60                   5576.27           92.94
Xeon E7-8890 v3    2.5GHz       72                   6964.75           96.73
Xeon E7-8890 v4    2.2GHz       96                   9068.00           94.46
Xeon E7-8891 v4    2.8GHz       40                   4808.79           120.22

Table 2: Analysis of Estimated TPC-E Benchmark Results
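The exact method behind the E7-8891 v4 estimate is not spelled out, but it matches a linear scaling of the actual E7-8890 v4 result by total core count and base clock speed. Here is a minimal sketch of that assumed method:

```python
def scale_tpce(score: float, cores_from: int, cores_to: int,
               ghz_from: float, ghz_to: float) -> float:
    """Scale a TPC-E score linearly by total physical cores and base clock.

    Assumes both processors share a generation and family, so throughput
    per core per GHz is treated as constant. Turbo Boost, cache size, and
    memory differences are ignored.
    """
    return score * (cores_to / cores_from) * (ghz_to / ghz_from)


# Four-socket E7-8890 v4 (96 cores, 2.2GHz) -> four-socket E7-8891 v4 (40 cores, 2.8GHz)
estimate = scale_tpce(9068.00, 96, 40, 2.2, 2.8)
print(f"{estimate:.2f} TPC-E, {estimate / 40:.2f} score/core")
```

This prints 4808.79 and 120.22, matching the last row of the table.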

 

The Intel document also discusses the “performance” increases seen from moving from Intel DC S3700 SATA drives to Intel DC P3700 PCIe NVMe drives. This is going to be primarily influenced by the advantages of being connected directly to the PCIe bus and the lower latency and overhead of the NVMe protocol compared to the older AHCI protocol.

Finally, they talk about the “performance” increases they measured from enabling the Intel Transactional Synchronization Extensions (TSX) instruction set and the Intel AVX 2.0 instruction set on current generation Intel Xeon E7-8800 v4 series processors.

SQL Server 2016 already has hardware support for older SSE/AVX instructions as discussed here and here. I really hope that Microsoft decides to add even more support for newer instruction sets (such as TSX) in SQL Server vNext.

 

 

SQL Server 2014 Hardware Analysis Case Study

Imagine that you have been given the go-ahead to upgrade your entire data platform stack from SQL Server 2008 Enterprise Edition to SQL Server 2014 Enterprise Edition. You need to come up with a recommendation for your new database server hardware, looking to maximize performance while controlling your SQL Server 2014 license costs.

To help you with that effort, here is an example hardware analysis comparing an existing legacy four-socket server (a Dell PowerEdge R815) with four AMD Opteron 6168 processors to a new four-socket server (a Dell PowerEdge R920) with newer 22nm Intel Xeon E7 v2 Ivy Bridge-EX processors.

For a Dell PowerEdge R920, I would be looking at one of these three processors:

1. Xeon E7-8857 v2   (12 cores, 3.0 GHz base clock speed)

2. Xeon E7-8891 v2   (10 cores, 3.2 GHz base clock speed)

3. Xeon E7-8893 v2   (6 cores, 3.4 GHz base clock speed)

These three candidate processors all have higher base clock speeds and lower physical core counts than some other more common choices, such as the fifteen-core Xeon E7-4890 v2.

The closest AMD-based system to the legacy server that I could find in the TPC-E benchmark results was an HP ProLiant BL685c G7 blade server with four 2.2GHz AMD Opteron 6174 processors and 512GB of RAM, with an actual raw TPC-E score of 1464.12. The raw TPC-E score is a good way of measuring the overall CPU capacity of a system.

Dividing this score by the 48 physical cores in the system gives us a score/core of 30.50, which is a good measure of single-threaded processor performance. Since the legacy system has slower 1.9GHz AMD Opteron 6168 processors (from the same generation and family), we simply need to adjust for the clock speed difference. The clock speed ratio is 1.9GHz / 2.2GHz = 0.8636, and multiplying the actual 1464.12 score by this ratio gives us an estimated TPC-E score of 1264.46 for the legacy system. Dividing that by 48 physical cores gives an estimated score/core of 26.34 for the legacy system.
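The clock speed adjustment for the legacy system can be written out as a short sketch. Like the text, it assumes TPC-E throughput scales linearly with base clock speed within the same processor generation and family.

```python
# Actual TPC-E result: HP ProLiant BL685c G7, four 12-core 2.2GHz Opteron 6174.
ACTUAL_SCORE = 1464.12
PHYSICAL_CORES = 48  # 4 sockets x 12 cores (same core count as the legacy R815)


def clock_adjust(score: float, ghz_from: float, ghz_to: float) -> float:
    """Linearly scale a score from one base clock speed to another."""
    return score * (ghz_to / ghz_from)


actual_per_core = ACTUAL_SCORE / PHYSICAL_CORES
legacy_score = clock_adjust(ACTUAL_SCORE, 2.2, 1.9)  # Opteron 6168 runs at 1.9GHz
legacy_per_core = legacy_score / PHYSICAL_CORES
print(f"Opteron 6174: {actual_per_core:.2f} score/core")
print(f"Opteron 6168 (est.): {legacy_score:.2f} TPC-E, {legacy_per_core:.2f} score/core")
```

Using the unrounded ratio, the estimate prints as 1264.47 rather than 1264.46; the difference is only rounding.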

There is an actual TPC-E result for a four-socket IBM System x3850 X6 with four 15-core 2.8GHz Intel Xeon E7-4890 v2 processors and 2TB of RAM, with a raw TPC-E score of 5576.27. Dividing this actual score by 60 physical cores gives us an actual score/core of 92.94.

Since the three candidate processors listed above are from the same generation and family, we can adjust this actual result for their differences in core counts and base clock speeds to get estimated TPC-E scores for a four-socket system with each of them.
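The same linear scaling, this time by both total core count and base clock speed, is sketched below for the three candidate processors. It uses unrounded ratios, so its results can differ slightly from the hand-rounded figures (for example, about 4779.66 instead of 4779.53 for the E7-8857 v2).

```python
# Actual TPC-E result: IBM System x3850 X6, four 15-core 2.8GHz Xeon E7-4890 v2.
BASE_SCORE = 5576.27
BASE_CORES = 60
BASE_GHZ = 2.8
SOCKETS = 4

# Candidate processors: (cores per socket, base clock in GHz)
CANDIDATES = {
    "Xeon E7-8857 v2": (12, 3.0),
    "Xeon E7-8891 v2": (10, 3.2),
    "Xeon E7-8893 v2": (6, 3.4),
}


def estimate_tpce(cores_per_socket: int, ghz: float) -> tuple[float, float]:
    """Return (estimated TPC-E score, score/core) for a four-socket system.

    Assumes the same generation and family as the E7-4890 v2 result, with
    throughput proportional to total cores times base clock speed.
    """
    total_cores = cores_per_socket * SOCKETS
    score = BASE_SCORE * (total_cores / BASE_CORES) * (ghz / BASE_GHZ)
    return score, score / total_cores


estimates = {name: estimate_tpce(c, g) for name, (c, g) in CANDIDATES.items()}
for name, (score, per_core) in estimates.items():
    print(f"{name}: {score:.2f} TPC-E, {per_core:.2f} score/core")
```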

1. Xeon E7-8857 v2: 5576.27 original score, times 0.80 (core count difference, 48/60), times 1.0714 (clock speed difference, 3.0/2.8), is 4779.53; divided by 48 total physical cores, that is 99.57 score/core

2. Xeon E7-8891 v2: 5576.27 original score, times 0.6667 (core count difference, 40/60), times 1.1428 (clock speed difference, 3.2/2.8), is 4248.59; divided by 40 total physical cores, that is 106.21 score/core

3. Xeon E7-8893 v2: 5576.27 original score, times 0.40 (core count difference, 24/60), times 1.2142 (clock speed difference, 3.4/2.8), is 2708.28; divided by 24 total physical cores, that is 112.84 score/core

Comparing the legacy system to the actual new four-socket TPC-E result and my estimates for the other three processors gives us this summary:

Processor          TPC-E Score   Score/Core   Total Physical Cores   SQL 2014 License Cost (EE)
Opteron 6168       1264.46       26.34        48                     $329,952.00 ($247,464.00 with AMD Core Factor discount)
Opteron 6174       1464.12       30.50        48                     $329,952.00 ($247,464.00 with AMD Core Factor discount)
Xeon E7-4890 v2    5576.27       92.94        60                     $412,440.00
Xeon E7-8857 v2    4779.53       99.57        48                     $329,952.00
Xeon E7-8891 v2    4248.59       106.21       40                     $274,960.00
Xeon E7-8893 v2    2708.28       112.84       24                     $164,976.00

Depending on which processor we choose, a new system would give us either roughly four times the single-threaded processor performance of the legacy system (with the Xeon E7-8893 v2) or roughly four times its processor capacity (with the Xeon E7-8857 v2). The difference in SQL Server 2014 Enterprise Edition license costs between the processor choices is quite dramatic. For example, going from the twelve-core processor to the faster ten-core processor lowers your SQL Server license costs by $54,992, which is about as much as the actual server would cost.
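The license column in the summary table is the number of licensed cores times a per-core price. This sketch assumes a SQL Server 2014 Enterprise Edition list price of $6,874 per core ($13,748 per 2-core pack), the 0.75 factor for qualifying AMD Opteron processors from the SQL Server 2012 Core Factor Table, and the minimum of four core licenses per physical processor; substitute your own pricing.

```python
import math

# Assumption: SQL Server 2014 Enterprise Edition list price per core
# ($13,748 per 2-core pack). Check your own agreement before relying on it.
PRICE_PER_CORE = 6874
AMD_CORE_FACTOR = 0.75    # SQL Server 2012 Core Factor Table, qualifying AMD Opterons
MIN_CORES_PER_SOCKET = 4  # minimum core licenses per physical processor


def license_cost(cores_per_socket: int, sockets: int = 4, core_factor: float = 1.0) -> int:
    """Return the Enterprise Edition license cost for one server."""
    per_socket = max(MIN_CORES_PER_SOCKET, math.ceil(cores_per_socket * core_factor))
    return per_socket * sockets * PRICE_PER_CORE


SERVERS = {  # cores per socket, four sockets each
    "Opteron 6168": 12,
    "Xeon E7-4890 v2": 15,
    "Xeon E7-8857 v2": 12,
    "Xeon E7-8891 v2": 10,
    "Xeon E7-8893 v2": 6,
}

for name, cores in SERVERS.items():
    print(f"{name}: ${license_cost(cores):,}")
print(f"Opteron 6168 with core factor: ${license_cost(12, core_factor=AMD_CORE_FACTOR):,}")
```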

Recommended Intel Processors For SQL Server 2014 OLTP Workloads

If you are in the process of evaluating and selecting the components for a new database server to run an OLTP workload on SQL Server 2014 Enterprise Edition, you have several initial choices to make as part of the decision process. First, you have to decide whether you want to go with an AMD-based server or an Intel-based server. Unfortunately, I cannot recommend that you use an AMD processor for SQL Server 2012/2014 OLTP workloads, due to the combination of low single-threaded performance and high SQL Server licensing costs (even with the 25% discount from the SQL Server 2012 Core Factor Table).

Next, you need to decide on the server socket count, which means choosing a single-socket, dual-socket, quad-socket, or eight-socket server (at least in the commodity server market). After you choose the socket count, you need to decide exactly which of the available processors you want to use in that model server. Looking at the choices for several current model servers from the major system vendors, you will find that you have to pick from around 15-20 different specific processors. All of this can be a little overwhelming to consider, but I urge you to do some research, and to choose carefully. Letting someone who may not be familiar with SQL Server 2012/2014 licensing and the demands of different database workload types pick your processors could be a lasting, costly mistake.

With the core-based licensing in SQL Server 2012/2014 Enterprise Edition, you need to pay closer attention to your physical core counts, and think about whether you are more concerned with extra scalability (from having more physical cores), or whether you want the absolute best OLTP query performance (from having a processor with fewer cores but a higher base clock speed from the same processor generation). Unlike in the good old days of SQL Server 2008 R2 and older, having more physical cores will increase your SQL Server 2012/2014 Enterprise Edition licensing costs. You really need to think about what you are trying to accomplish with your database hardware. For example, if you can partition your workload between multiple servers, then you could see much better OLTP performance from using two dual-socket servers instead of one quad-socket server.

So, here are the Intel processors that I recommend in mid-April 2014 for OLTP workloads, with their high-level specifications and some commentary.

One-Socket Server (High Capacity)

Intel Xeon E5-2470 v2 (22nm Ivy Bridge-EN)

  • 2.4 GHz, 25MB L3 cache, 8 GT/s Intel QPI 1.1
  • 10 cores, Turbo Boost 2.0 (3.2 GHz), hyper-threading
  • Three memory channels, six memory slots per processor, 96GB RAM with 16GB DIMMs

One-Socket Server (High Performance)

Intel Xeon E3-1280 v3 (22nm Haswell)

  • 3.6 GHz, 8MB L3 cache, 5 GT/s DMI 2.0
  • 4 cores, Turbo Boost 2.0 (4.0 GHz), hyper-threading
  • Two memory channels, four memory slots per processor, 32GB RAM with 8GB DIMMs

At least one Tier One vendor (Dell) is offering a single-socket server with the new Ivy Bridge-EN processor family. This is the entry-level, two-socket capable Ivy Bridge processor family, with lower clock speeds and less memory bandwidth than the Ivy Bridge-EP family, so it is NOT a good choice for a two-socket server. Despite this, it does give you the ability to have ten physical cores and 96GB of RAM in a single-socket server. You would see much better single-threaded OLTP performance from a new third-generation Xeon E3-1280 v3 (Haswell) processor, but you would be limited to four physical cores and 32GB of RAM. Again, if you can partition your workload, two single-socket Xeon E3-1280 v3 based servers would give you much better OLTP performance than one Xeon E5-2470 v2 based server, with lower SQL Server 2012/2014 Enterprise Edition licensing costs (eight core licenses instead of ten).

Two-Socket Server (High Capacity)

Intel Xeon E5-2697 v2 (22nm Ivy Bridge-EP)

  • 2.7 GHz, 30MB L3 cache, 8 GT/s Intel QPI 1.1
  • 12 cores, Turbo Boost 2.0 (3.5 GHz), hyper-threading
  • Four memory channels, twelve memory slots per processor, 384GB RAM with 16GB DIMMs

Two-Socket Server (High Performance)

Intel Xeon E5-2643 v2 (22nm Ivy Bridge-EP)

  • 3.5 GHz, 25MB L3 cache, 8 GT/s Intel QPI 1.1
  • 6 cores, Turbo Boost 2.0 (3.8 GHz), hyper-threading
  • Four memory channels, twelve memory slots per processor, 384GB RAM with 16GB DIMMs

Choosing the top-of-the-line 12-core Xeon E5-2697 v2 would double your SQL Server license costs compared to the 6-core Xeon E5-2643 v2. Once again, if you can partition your workload, two dual-socket Xeon E5-2643 v2 based servers would give you better overall OLTP performance than one Xeon E5-2697 v2 based server for the same SQL Server 2012/2014 Enterprise Edition licensing cost. You would have more total memory between the two servers and more potential I/O capacity, at the cost of buying two servers instead of one. In some situations, this strategy might not make sense, especially with the added management and maintenance overhead of two servers instead of one.
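The scale-out argument here mostly comes down to counting cores (which drive licensing) and memory slots. Here is a minimal sketch using the slot counts and DIMM sizes from the bullet lists above; the `Server` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical model of one server build, using figures from the lists above."""
    name: str
    sockets: int
    cores_per_socket: int
    slots_per_socket: int
    dimm_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def max_ram_gb(self) -> int:
        return self.sockets * self.slots_per_socket * self.dimm_gb


def fleet_totals(server: Server, count: int) -> tuple[int, int]:
    """Return (total core licenses, total maximum RAM in GB) for `count` servers."""
    return count * server.cores, count * server.max_ram_gb


E5_2697_V2 = Server("Xeon E5-2697 v2", 2, 12, 12, 16)
E5_2643_V2 = Server("Xeon E5-2643 v2", 2, 6, 12, 16)
E5_2470_V2 = Server("Xeon E5-2470 v2", 1, 10, 6, 16)
E3_1280_V3 = Server("Xeon E3-1280 v3", 1, 4, 4, 8)

for big, small in [(E5_2697_V2, E5_2643_V2), (E5_2470_V2, E3_1280_V3)]:
    print(f"1 x {big.name}: {fleet_totals(big, 1)} vs 2 x {small.name}: {fleet_totals(small, 2)}")
```

Two E5-2643 v2 servers need the same 24 core licenses as one E5-2697 v2 server but double the maximum memory (768GB vs. 384GB). Two E3-1280 v3 servers need eight core licenses instead of ten, though with less total memory (64GB vs. 96GB).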

Four-Socket Server (High Capacity)

Intel Xeon E7-4890 v2 (22nm Ivy Bridge-EX)

  • 2.8 GHz, 37.5MB L3 cache, 8 GT/s Intel QPI 1.1
  • 15 cores, Turbo Boost 2.0 (3.4 GHz), hyper-threading
  • Four memory channels, twenty-four memory slots per processor, 1536GB RAM with 16GB DIMMs

Four-Socket Server (High Performance)

Intel Xeon E7-8893 v2 (22nm Ivy Bridge-EX)

  • 3.4 GHz, 37.5MB L3 cache, 8 GT/s Intel QPI 1.1
  • 6 cores, Turbo Boost 2.0 (3.7 GHz), hyper-threading
  • Four memory channels, twenty-four memory slots per processor, 1536GB RAM with 16GB DIMMs

The brand new Xeon E7-8893 v2 will give you significantly better single-threaded OLTP query performance in a four-socket server than the E7-4890 v2, at the cost of less total capacity because of the lower physical core count. The E7-8893 v2 is a “frequency-optimized” model that is actually meant for eight-socket servers, but is available in several new four-socket server models from the major server vendors.

Compared to the E7-4890 v2, it would save you enough on SQL Server 2012/2014 Enterprise Edition license costs (about $250K) to buy the server itself and still have lots of money left over. I even think it is a better choice in many situations than a two-socket server with the 12-core Intel Xeon E5-2697 v2, since you will have much higher single-threaded performance and much higher memory capacity for the same number of core licenses. The downside is a higher hardware cost, since you will be buying four quite expensive processors.
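The roughly $250K figure follows directly from the core counts. A minimal sketch, again assuming a SQL Server 2014 Enterprise Edition list price of $6,874 per core:

```python
# Assumption: SQL Server 2014 Enterprise Edition list price per core.
PRICE_PER_CORE = 6874

# Total physical cores (= core licenses) for each option discussed above.
CORES = {
    "4 x Xeon E7-4890 v2": 4 * 15,
    "4 x Xeon E7-8893 v2": 4 * 6,
    "2 x Xeon E5-2697 v2": 2 * 12,
}

costs = {name: cores * PRICE_PER_CORE for name, cores in CORES.items()}
savings = costs["4 x Xeon E7-4890 v2"] - costs["4 x Xeon E7-8893 v2"]

for name, cost in costs.items():
    print(f"{name}: {CORES[name]} core licenses, ${cost:,}")
print(f"E7-8893 v2 savings vs. E7-4890 v2: ${savings:,}")
```

The savings come to $247,464, and the four-socket E7-8893 v2 system needs the same 24 core licenses as the two-socket E5-2697 v2 system.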

Eight-Socket Server (High Capacity)

Intel Xeon E7-8890 v2 (22nm Ivy Bridge-EX)

  • 2.8 GHz, 37.5MB L3 cache, 8 GT/s Intel QPI 1.1
  • 15 cores, Turbo Boost 2.0 (3.4 GHz), hyper-threading
  • Four memory channels, twenty-four memory slots per processor, 3072GB RAM with 16GB DIMMs (eight sockets)

Eight-Socket Server (High Performance)

Intel Xeon E7-8891 v2 (22nm Ivy Bridge-EX)

  • 3.2 GHz, 37.5MB L3 cache, 8 GT/s Intel QPI 1.1
  • 10 cores, Turbo Boost 2.0 (3.7 GHz), hyper-threading
  • Four memory channels, twenty-four memory slots per processor, 3072GB RAM with 16GB DIMMs (eight sockets)

You can choose a lower core count, frequency-optimized model that has a higher clock speed for better single-threaded performance. The lower core count will also save you a LOT of money on SQL Server 2012/2014 licensing costs, although you will give up some load capacity with fewer total processor cores available.

I always like to hear what you think about my posts, so be sure to let me know!