Bigger Database Servers Get Faster

For quite some time, I have been talking about how current Intel-based four-socket database servers have had significantly lower single-threaded processor performance than current Intel-based two-socket database servers. This is because the first-generation Intel Xeon E7 (Westmere-EX) processors, which were introduced in early 2011, use the relatively old 32nm Westmere microarchitecture.

These E7 processors also use much lower base and turbo clock speeds than current Xeon E5 v2 processors, which also hurts their single-threaded processor performance. They do have higher overall concurrent load capacity due to higher total memory capacity and more total processor cores, but the individual processor cores in most four-socket servers have been much slower than what you find in a modern two-socket server. Simply put, bigger servers are not faster servers. It is like comparing an eighteen-wheeler to a Tesla Model S.

Now, that old assessment is going to change somewhat, with the release of the 22nm Intel Xeon E7 Processor v2 Family (Ivy Bridge-EX), and new model servers from the major server vendors that have even higher memory capacity, PCI-E 3.0 support, and 12Gbps SAS/SATA support, along with much faster RAID controllers. These processors are a substantial improvement over the previous generation 32nm Intel Xeon E7 processors (Westmere-EX) that have been available since early 2011.

It will still be possible to configure a new two-socket server, such as a Dell PowerEdge R720, with an appropriate 22nm Intel Xeon E5-2600 Processor v2 Family (Ivy Bridge-EP) processor that will have better single-threaded performance than a new four-socket server such as a Dell PowerEdge R920, but the gap will not be nearly as large as it once was.

The real good news here for a database professional is that you will be able to have a four-socket server that has as much load capacity as a previous-generation eight-socket server and that performs nearly as well as a current two-socket server, while paying 25% less for your SQL Server 2012/2014 licenses than you would for a previous-generation eight-socket server. This is a pretty big gift from Intel!

A more pessimistic view is that your SQL Server 2012/2014 license costs could rise by 50% as you move from an existing server equipped with four ten-core Xeon E7-4870 processors (a total of forty physical cores) to a new server with four fifteen-core Xeon E7-4890 v2 processors (a total of sixty physical cores). For reasons known only to Intel, the lower core-count SKUs in the Xeon E7-48xx v2 product family are not “frequency optimized”, meaning they do not have higher clock speeds than the high-end E7-4890 v2 processor. The base and turbo clock speeds of the best lower core-count SKUs in the E7-48xx v2 family actually drop off pretty quickly as the core counts go down. The shared L3 cache sizes also drop off very quickly, as does the processor price, as you can see in Table 1.

| Processor | Physical Cores | L3 Cache | Base Clock | Turbo Clock | Price |
|---|---|---|---|---|---|
| E7-4890 v2 | 15 | 37.5 MB | 2.8GHz | 3.4GHz | $6,619.00 |
| E7-4860 v2 | 12 | 30 MB | 2.6GHz | 3.2GHz | $3,838.00 |
| E7-4830 v2 | 10 | 20 MB | 2.2GHz | 2.7GHz | $2,059.00 |
| E7-4820 v2 | 8 | 16 MB | 2.0GHz | 2.5GHz | $1,446.00 |
| E7-4809 v2 | 6 | 12 MB | 1.9GHz | N/A | $1,223.00 |

Table 1: Selected Intel E7-48xx v2 Processors
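The two license-cost percentages quoted above follow directly from the physical core counts. Here is a minimal sketch of that arithmetic, assuming that license cost scales linearly with physical cores at a fixed per-core price, and that the previous-generation eight-socket server uses ten-core processors (80 cores in total). The function name is only illustrative.

```python
def license_change_pct(old_cores: int, new_cores: int) -> float:
    """Percent change in core licenses when moving from old_cores to new_cores."""
    return (new_cores - old_cores) / old_cores * 100.0


# Physical core counts described in the text.
old_eight_socket = 8 * 10  # assumed: eight ten-core previous-generation processors
old_four_socket = 4 * 10   # four ten-core Xeon E7-4870 processors
new_four_socket = 4 * 15   # four fifteen-core Xeon E7-4890 v2 processors

print(f"vs. old eight-socket: {license_change_pct(old_eight_socket, new_four_socket):+.0f}%")
print(f"vs. old four-socket:  {license_change_pct(old_four_socket, new_four_socket):+.0f}%")
```

Whether the new server looks like a 25% saving or a 50% increase depends entirely on which older server you are replacing.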

 

With the Xeon E7-48xx v2 product family, you are going to want to choose either the E7-4890 v2 or the E7-4860 v2 model processors in most situations, since the lower core-count processors give up a substantial amount of performance due to their lower clock speeds and smaller L3 cache sizes. If you really want to reduce your core counts to reduce your SQL Server 2012/2014 license costs, you would be better off with the Intel Xeon E5-26xx v2 product family processors that are used in two-socket servers. Another alternative is the upcoming Intel Xeon E5-46xx v2 product family processors that are used in four-socket servers.

Either of those choices would be better than one of the lower core count processors in the E7-48xx v2 product family, at least from a pure processor performance perspective.

Intel also has refreshed the E7-88xx v2 product family that is meant for eight-socket and larger servers. For some reason (probably for HPC use), Intel does have “frequency-optimized”, lower core-count models in this product family, as you can see in Table 2.

| Processor | Physical Cores | L3 Cache | Base Clock | Turbo Clock | Price |
|---|---|---|---|---|---|
| E7-8890 v2 | 15 | 37.5 MB | 2.8GHz | 3.4GHz | $6,841.00 |
| E7-8857 v2 | 12 | 30 MB | 3.0GHz | 3.6GHz | $3,838.00 |
| E7-8891 v2 | 10 | 37.5 MB | 3.2GHz | 3.7GHz | $6,841.00 |
| E7-8893 v2 | 6 | 37.5 MB | 3.4GHz | 3.7GHz | $6,841.00 |

Table 2: Selected Intel E7-88xx v2 Processors

 

I could see some scenarios where you might want to get an eight-socket server with the six-core E7-8893 v2, so that you could have the same physical core count, while having double the memory capacity and much better single-threaded processor performance than a four-socket server with the twelve-core E7-4860 v2. The hardware cost would be significantly higher, since you would be buying eight processors for $6,841.00 each instead of four processors at $3,838.00 each, but for many organizations, that would not be a major issue.
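To put rough numbers on that trade-off, here is a small sketch that compares the two configurations using only the core counts and list prices quoted above. The `ServerConfig` class is a hypothetical helper for illustration, and the totals cover processors only, not the rest of the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    processor: str
    sockets: int
    cores_per_socket: int
    price_per_processor: float  # list price in USD, as quoted in the text

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def processor_cost(self) -> float:
        return self.sockets * self.price_per_processor


eight_socket = ServerConfig("E7-8893 v2", 8, 6, 6841.00)
four_socket = ServerConfig("E7-4860 v2", 4, 12, 3838.00)

for cfg in (eight_socket, four_socket):
    print(f"{cfg.sockets} x {cfg.processor}: "
          f"{cfg.total_cores} cores, ${cfg.processor_cost:,.2f} in processors")

extra = eight_socket.processor_cost - four_socket.processor_cost
print(f"Extra processor cost for the eight-socket option: ${extra:,.2f}")
```

Both configurations need the same number of SQL Server core licenses, so the difference is purely in hardware cost.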

Some server vendors may offer the Xeon E7-88xx v2 processors in their four-socket server models, since they are pin-compatible, which would give us a lot more flexibility as far as processor selection goes. I really wish Intel had “frequency-optimized” models in their Xeon E7-48xx v2 product family, to make this even easier.

A SQL Server Hardware Tidbit a Day – Day 29

For Day 29 of the series, I will talk about AMD Turbo CORE technology. AMD Turbo CORE is a technology that was first introduced in the AMD Phenom II X4 desktop processor, but the way AMD implemented it in the Bulldozer family and Piledriver family of processors is greatly enhanced. AMD Turbo CORE is similar to Intel Turbo Boost technology in concept (although AMD claims that it works better).  According to AMD:

“AMD Turbo CORE is deterministic, governed by power draw, not temperature as other competing products are. This means that even in warmer climates you’ll be able to take advantage of that extra headroom if you choose. This helps ensure a max frequency is workload dependent, making it more consistent and repeatable.”

AMD Turbo CORE allows individual cores in the processor to speed up from the base clock speed up to the TDP level, automatically adding extra single-threaded performance for the processor. Conceptually, it is the opposite of AMD PowerNow! technology. Instead of trying to watch for usage patterns and lowering the processor core speed to try to reduce power consumption, Turbo CORE is watching the power consumption to see how high it can move the clock speed up.

This feature, which is new to AMD server processors, allows individual cores to use the extra power headroom between average and maximum power, turning it into more clock speed. Bulldozer implements a significantly more aggressive version of this capability than the AMD Phenom desktop processor. Should the processor get too close to the TDP power limit, it will automatically throttle back somewhat to ensure that it is continuing to operate within the specified TDP guidelines. This allows for significantly higher maximum clock speeds for the individual cores.

AMD has stated that Bulldozer will boost the clock speed of all 16 cores by 500MHz, even when all cores are active with server workloads. Even higher boost states, anywhere from 700MHz to 900MHz, are available with half of the cores active. Bulldozer and Piledriver processors are marketed with a base and a maximum frequency: the base reflects the actual clock speed of the processor, and the max reflects the highest AMD Turbo CORE state.
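As a rough illustration of those figures, here is a deliberately simplified two-state sketch. It assumes the boost depends only on how many cores are active, whereas the real feature is governed by power draw as described above. The 2300MHz base clock in the usage lines is a hypothetical value, not a specific AMD model.

```python
ALL_CORE_BOOST_MHZ = 500          # boost with all cores active, as quoted above
HALF_CORE_BOOST_MHZ = (700, 900)  # boost range with half of the cores active


def turbo_range_mhz(base_mhz: int, active_cores: int, total_cores: int = 16):
    """Return the (low, high) clock speed in MHz under the simplified model."""
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    if active_cores * 2 <= total_cores:
        # Half or fewer cores active: the higher boost states apply.
        low, high = HALF_CORE_BOOST_MHZ
    else:
        # More than half of the cores active: the all-core boost applies.
        low = high = ALL_CORE_BOOST_MHZ
    return base_mhz + low, base_mhz + high


# Hypothetical 16-core part with a 2300MHz base clock.
print(turbo_range_mhz(2300, active_cores=16))
print(turbo_range_mhz(2300, active_cores=8))
```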

Just like with Intel Turbo Boost technology, I think this is a very beneficial feature that you should take advantage of for database server usage. I don’t see any controversy here (such as with hyper-threading). Since Microsoft changed over to core-based licensing for SQL Server 2012, it is much less practical to choose an AMD processor (especially for an OLTP workload) because of their high physical core counts and low single-threaded performance.

One scenario where an AMD-based database server could make some sense would be for a dedicated OLAP server, using SQL Server 2012 Business Intelligence Edition, with server-based licensing. Having lots of physical cores without having to pay a huge amount for your SQL Server 2012 licenses is not a bad scenario.

Deciding What Processor to Choose for SQL Server 2012

If you have read my SQL Server Hardware book, watched my Understanding Server Hardware course on Pluralsight or ever heard me speak at a conference, you are probably aware of my very strong advocacy for modern, two-socket Intel-based database servers for many database server workloads. I make this argument because of the excellent single-threaded processor performance, high memory density, and high I/O capacity possible from the latest two-socket servers that are available from all of the major hardware vendors. Because of the much higher sales volume in the two-socket server space (compared to the four-socket and above space), Intel refreshes their two-socket capable processors much more frequently than the processors for higher socket count systems.

Since 2006, Intel has been using a Tick-Tock release model for their processors. What this means is that every two years, they have a Tock release that uses a completely new microarchitecture, which is followed a year later by a Tick release that has a manufacturing process technology shrink, but uses the same microarchitecture as the previous Tock release. Using a smaller process technology typically allows the processor to use less energy and have slightly better performance than the previous Tock release, but the performance jump is not nearly as great as you get with a Tock release. Tick releases are usually pin-compatible with the previous Tock release, so that lets the hardware systems vendors start using the Tick release processor in their existing models much more quickly, usually with just a BIOS update.

Table 1 shows the Tick-Tock release cadence for Intel processors from 2008 through 2016. The dates are obviously more speculative as we go further into the future, since Intel may decide to slow down their release cycle if AMD is unable to give them more viable competition in the next few years.

| Year | Type | Process | Code Name |
|---|---|---|---|
| 2008 | Tock | 45nm | Nehalem |
| 2010 | Tick | 32nm | Westmere |
| 2011 | Tock | 32nm | Sandy Bridge |
| 2012 | Tick | 22nm | Ivy Bridge |
| 2013 | Tock | 22nm | Haswell |
| 2014 | Tick | 14nm | Rockwell |
| 2015 | Tock | 14nm | Skylake |
| 2016 | Tick | 10nm | Skymont |

Table 1: Tick-Tock Release Listing
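The cadence in Table 1 can be checked mechanically: releases alternate between Tock and Tick, each Tick moves to a smaller process, and each Tock keeps the process of the Tick before it. The sketch below encodes the rows exactly as listed (including the speculative future entries); the function name is only illustrative.

```python
# (year, type, process in nm, code name), copied from Table 1
RELEASES = [
    (2008, "Tock", 45, "Nehalem"),
    (2010, "Tick", 32, "Westmere"),
    (2011, "Tock", 32, "Sandy Bridge"),
    (2012, "Tick", 22, "Ivy Bridge"),
    (2013, "Tock", 22, "Haswell"),
    (2014, "Tick", 14, "Rockwell"),
    (2015, "Tock", 14, "Skylake"),
    (2016, "Tick", 10, "Skymont"),
]


def follows_tick_tock(releases) -> bool:
    """Check that consecutive releases obey the Tick-Tock rules."""
    for (_, prev_type, prev_nm, _), (_, cur_type, cur_nm, _) in zip(releases, releases[1:]):
        if cur_type == prev_type:
            return False  # types must alternate
        if cur_type == "Tick" and cur_nm >= prev_nm:
            return False  # a Tick must shrink the process
        if cur_type == "Tock" and cur_nm != prev_nm:
            return False  # a Tock reuses the existing process
    return True


print(follows_tick_tock(RELEASES))
```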

Figure 1 shows how the Tick-Tock model works, with the Tock release (in blue) using the existing manufacturing process technology, while the Tick release (in orange) moves to a new, smaller manufacturing process technology. New Intel processors are released first for the desktop market and then for the mobile market, followed later by the single-socket server market, the two-socket server market, and finally the four-socket (and above) server market. The four-socket server market does not always get every release because of the lower sales volume and slower release cycle. This explains why there has not been a Sandy Bridge-EX release for the four-socket market.

Figure 1: Tick-Tock Model

As you can see from Table 1 and Figure 1, Sandy Bridge is a Tock release that came after the Westmere Tick release. The Xeon E5 product family is Sandy Bridge-EP, which is a newer microarchitecture compared to the Xeon E7 product family, which is Westmere-EX. This difference is very important for SQL Server 2012 core-based licensing purposes! Sandy Bridge has significantly better single-threaded performance compared to Westmere and it also has lower physical core counts. Sandy Bridge-EP is available for both two-socket and four-socket servers, while Westmere-EX is available for two-socket, four-socket, and eight-socket servers.

Currently, we have the Intel Xeon E5-2600 product family (Sandy Bridge-EP) for the two-socket space, the Intel Xeon E5-4600 product family (Sandy Bridge-EP) for the four-socket space, along with the older Intel Xeon E7-2800 product family (Westmere-EX) for the two-socket space, the Intel Xeon E7-4800 product family (Westmere-EX) for the four-socket space, and the Intel Xeon E7-8800 product family (Westmere-EX) for the eight-socket space. The Intel Xeon E7 family was released in Q2 2011, the Xeon E5-2600 family was released in Q1 2012, and the Xeon E5-4600 family was released in Q2 2012. On November 5, 2012, Fujitsu published a new TPC-E OLTP benchmark result for a four-socket, Intel Xeon E5-4650 PRIMERGY RX500 S7 system with a score of 2651.27. This is the first published TPC-E result for the newer, four-socket capable Intel Xeon E5-4600 series, so I think it merits some comparison and discussion.

Table 2 shows the TPC-E scores for five systems that use the five different Sandy Bridge and Westmere processors that I have been discussing so far. It shows that the two-socket Xeon E5-2690 system has the best single-threaded performance (when you divide the raw score by the number of physical cores) and that the four-socket Xeon E5-4650 system comes in second place. We also see that the scaling goes down quite a bit as we move from two sockets to four sockets with the Xeon E5 family. If we had perfectly linear scaling, you would expect a four-socket system to have twice the score of a two-socket system that was using the same processor, which is not the case here. Part of this can be attributed to the clock speed difference between the 2.9GHz Xeon E5-2690 and the 2.7GHz Xeon E5-4650.

We can also see that the Intel Xeon E5 family does quite a bit better on TPC-E than the Intel Xeon E7 family does, which is no surprise, since we are comparing the newer Sandy Bridge-EP to the older Westmere-EX. From a performance perspective, the two-socket Xeon E5-2690 does much better than the two-socket Xeon E7-2870. In my opinion, you really should not be using the two-socket Xeon E7-2870 for SQL Server 2012 because of its lower single-threaded performance and higher physical core counts (which means a higher SQL Server 2012 licensing cost).

The four-socket Xeon E7-4870 system has a higher raw score than the four-socket E5-4650 system, but it has 40 physical cores compared to 32 physical cores, which means it will cost significantly more for SQL Server 2012 core licenses, while it will have lower single-threaded performance. Again, I would prefer a Xeon E5-4650 based system over a Xeon E7-4870 based system for an OLTP workload. You can also see that scaling takes a pretty big hit when you go from four-socket systems to eight-socket systems, even though these are all NUMA-based systems here.

| System | Sockets | Total Cores | Processor Model | TPC-E Score | TPC-E Score/Core |
|---|---|---|---|---|---|
| Fujitsu PRIMERGY RX300 S7 | 2 | 16 | Intel Xeon E5-2690 | 1871.81 | 116.99 |
| Fujitsu PRIMERGY RX500 S7 | 4 | 32 | Intel Xeon E5-4650 | 2651.27 | 82.85 |
| IBM System x3690 X5 | 2 | 20 | Intel Xeon E7-2870 | 1560.70 | 78.04 |
| IBM System x3850 X5 | 4 | 40 | Intel Xeon E7-4870 | 2862.61 | 71.57 |
| NEC Express5800/A1080a-E | 8 | 80 | Intel Xeon E7-8870 | 4614.22 | 57.68 |

Table 2: TPC-E Score Comparisons for Selected Intel Processors
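The per-core figures and the scaling observations can be reproduced from the raw scores in Table 2. The sketch below does this with two small helper functions whose names are only illustrative. Keep in mind that the two-socket and four-socket systems use different processor models and clock speeds, so the scaling ratio is only a rough indicator.

```python
# (system, sockets, total cores, processor, TPC-E score), copied from Table 2
SYSTEMS = [
    ("Fujitsu PRIMERGY RX300 S7", 2, 16, "Xeon E5-2690", 1871.81),
    ("Fujitsu PRIMERGY RX500 S7", 4, 32, "Xeon E5-4650", 2651.27),
    ("IBM System x3690 X5", 2, 20, "Xeon E7-2870", 1560.70),
    ("IBM System x3850 X5", 4, 40, "Xeon E7-4870", 2862.61),
    ("NEC Express5800/A1080a-E", 8, 80, "Xeon E7-8870", 4614.22),
]


def score_per_core(score: float, cores: int) -> float:
    """TPC-E score divided by physical core count."""
    return score / cores


def scaling_efficiency(small_score: float, large_score: float,
                       socket_ratio: float = 2.0) -> float:
    """Fraction of perfectly linear scaling achieved (1.0 means linear)."""
    return large_score / (small_score * socket_ratio)


scores = {processor: score for _, _, _, processor, score in SYSTEMS}

for name, sockets, cores, processor, score in SYSTEMS:
    print(f"{processor} ({sockets} sockets): {score_per_core(score, cores):.2f} per core")

e5_eff = scaling_efficiency(scores["Xeon E5-2690"], scores["Xeon E5-4650"])
e7_eff = scaling_efficiency(scores["Xeon E7-4870"], scores["Xeon E7-8870"])
print(f"E5, two to four sockets:   {e5_eff:.0%} of linear")
print(f"E7, four to eight sockets: {e7_eff:.0%} of linear")
```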

 

| System | Sockets | Total Cores | Processor Model | TPC-E Score | SQL 2012 License Cost | Cost/TPC-E |
|---|---|---|---|---|---|---|
| Fujitsu PRIMERGY RX300 S7 | 2 | 16 | Intel Xeon E5-2690 | 1871.81 | $109,984 | $58.76/TPC-E |
| Fujitsu PRIMERGY RX500 S7 | 4 | 32 | Intel Xeon E5-4650 | 2651.27 | $219,968 | $82.97/TPC-E |
| IBM System x3690 X5 | 2 | 20 | Intel Xeon E7-2870 | 1560.70 | $137,480 | $88.09/TPC-E |
| IBM System x3850 X5 | 4 | 40 | Intel Xeon E7-4870 | 2862.61 | $274,960 | $96.05/TPC-E |
| NEC Express5800/A1080a-E | 8 | 80 | Intel Xeon E7-8870 | 4614.22 | $549,920 | $119.18/TPC-E |

Table 3: SQL Server 2012 Enterprise Edition License Cost Comparisons by TPC-E Score

Table 3 shows the same five systems with the SQL Server 2012 Enterprise Edition license cost information added. This shows that a two-socket system with Xeon E5-2690 processors gives you the lowest licensing cost per TPC-E score, while Table 2 shows that it also gives you the best TPC-E score per physical processor core. Unless you must have more than 384GB of RAM (with affordable 16GB DIMMs) or more than 768GB of RAM (with much more expensive 32GB DIMMs), there are not too many reasons to go with a higher core-count system for an OLTP workload.
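The license figures in Table 3 are consistent with a flat price of $6,874 per physical core ($109,984 divided by 16 cores). Assuming that per-core price, the sketch below recomputes the license cost and the cost per TPC-E score for each system; the function names are only illustrative.

```python
PRICE_PER_CORE_USD = 6874  # assumed per-core price, implied by $109,984 / 16 cores

# (system, total cores, TPC-E score), copied from Table 3
SYSTEMS = [
    ("Fujitsu PRIMERGY RX300 S7", 16, 1871.81),
    ("Fujitsu PRIMERGY RX500 S7", 32, 2651.27),
    ("IBM System x3690 X5", 20, 1560.70),
    ("IBM System x3850 X5", 40, 2862.61),
    ("NEC Express5800/A1080a-E", 80, 4614.22),
]


def license_cost(cores: int) -> int:
    """Total core-license cost in USD under the assumed per-core price."""
    return cores * PRICE_PER_CORE_USD


def cost_per_tpce(cores: int, score: float) -> float:
    """License dollars paid per unit of TPC-E score."""
    return license_cost(cores) / score


for name, cores, score in SYSTEMS:
    print(f"{name}: ${license_cost(cores):,} total, "
          f"${cost_per_tpce(cores, score):.2f} per TPC-E")

cheapest = min(SYSTEMS, key=lambda s: cost_per_tpce(s[1], s[2]))
print(f"Lowest license cost per TPC-E: {cheapest[0]}")
```

Changing `PRICE_PER_CORE_USD` rescales every row equally, so the ranking of the systems does not depend on the exact price.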

One possible reason is that you are concerned that a two-socket Xeon E5-2690 system simply cannot handle your total database workload, because two processors with a total of 16 physical cores do not provide enough computing capacity. Depending on the magnitude of your workload, that may be true. If you are currently running a four-socket or larger system that is more than a couple of years old, it may not be true. Bigger systems are not faster systems, and the total load capacity of two-socket systems has increased dramatically in the last year with Sandy Bridge-EP. If you are convinced that a two-socket Xeon E5-2690 cannot handle your workload, I would look at a four-socket Xeon E5-4650 system, which also lets you go up to 1.5TB of RAM with 32GB DIMMs. Keep in mind that both Xeon E5-2690 and Xeon E5-4650 systems have PCI-E 3.0 support, which gives you twice the I/O bandwidth of the older PCI-E 2.0 standard found in Westmere-EX servers.

If all of this has made your head hurt, you can always contact us for some deeper hardware consulting!