CPU-Z 1.78 is Available

On November 21, CPU-Z 1.78 was released. This is a great tool for getting all the technical details about your processors and checking on their current clock speed.

The main improvement in this version is support for Intel Kaby Lake processors, which are already available in the mobile space. It looks like the desktop version of Kaby Lake will be released at CES in January. Tom’s Hardware has published benchmarks of an early sample of a Core i7-7700K that someone supplied to them.

 

Figure 1: CPU-Z 1.78 CPU Tab

 

Recent versions of CPU-Z include a quick benchmark that measures both single-threaded and multi-threaded CPU performance. Each test takes only about 7-8 seconds, and it is useful for two reasons.

 

Figure 2: CPU-Z 1.78 Bench Tab For Intel Core i7-6700K System

 

First, you get a quick gauge of your single-threaded CPU performance (which equates to the “speed” of the processor) and your multi-threaded CPU performance (which equates to the CPU capacity of the entire system). This is useful for comparing different processors and systems, whether they are physical or virtual. You can measure the performance of a VM against bare metal on the host, or you can compare different VM configurations. You can also compare your numbers to the built-in reference processors, or submit your results and compare them to other systems’ results that are stored online.

Second, you can use the Bench CPU button to briefly stress your processors, then quickly switch to the main CPU tab while the test is running to see what happens to your CPU core clock speeds. This tells you whether you have power management configured correctly to get the performance benefits of Intel Turbo Boost.

New TPC-E Results for SQL Server 2016

There have been two recent TPC-E OLTP benchmark results published for SQL Server 2016. These include one from Fujitsu and one from Lenovo.

The most recent result, from July 12, 2016, is for a four-socket FUJITSU Server PRIMERGY RX4770 M3 server using the latest-generation, 14nm, 2.2GHz Intel Xeon E7-8890 v4 processor (Broadwell-EX), with a TPC-E throughput score of 8,796.42. As is always the case with TPC-E benchmarks, the hardware vendor used the “flagship”, highest core count processor available from the latest processor family, in this case a 24-core processor. This helps achieve the highest possible TPC-E throughput score (which is a measure of the total processor capacity of the system), but it also means quite high SQL Server 2016 licensing costs, since you would have to purchase 96 SQL Server 2016 Enterprise Edition core licenses. These would cost about $684K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $647K in the Executive Summary report.
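The licensing arithmetic is simple enough to sketch in a few lines of Python. The per-pack price is an assumption: SQL Server 2016 Enterprise Edition core licenses are sold in two-core packs, and a full retail price of $14,256 per pack is consistent with the roughly $684K quoted for 96 cores. The function name is hypothetical.

```python
# Assumed full retail price of one two-core pack of SQL Server 2016
# Enterprise Edition core licenses (an assumption, not from the TPC-E report).
PRICE_PER_TWO_CORE_PACK = 14_256


def enterprise_license_cost(sockets: int, cores_per_socket: int,
                            pack_price: int = PRICE_PER_TWO_CORE_PACK) -> int:
    """Return the cost of licensing every physical core in the server."""
    total_cores = sockets * cores_per_socket
    packs = (total_cores + 1) // 2  # round up to whole two-core packs
    return packs * pack_price


# Four-socket server with 24-core processors (96 cores in total).
print(f"${enterprise_license_cost(4, 24):,}")
```

Vendor-priced figures in an Executive Summary can come in lower than this, since they reflect the pricing the vendor actually used rather than full retail.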

Another recent result, from May 31, 2016, is for a four-socket Lenovo System x3850 X6 using the same Intel Xeon E7-8890 v4 processor. This system gets a TPC-E throughput score of 9,068.00, which is about 3% higher than the Fujitsu system. Both systems use a 36TB initial database size, but the Fujitsu system uses 2TB of RAM while the Lenovo system uses 4TB of RAM (which is the license limit for Windows Server 2012 R2). Both systems use all-flash storage, with 2.5” SAS SSDs.

Unlike the old TPC-C OLTP benchmark, TPC-E does not require an unrealistically expensive storage subsystem to get good scores. As long as the storage subsystem is “good enough” so that it does not become a bottleneck, then the ultimate TPC-E bottleneck becomes processor performance.

Earlier this year, there were two competing results for two-socket systems from Lenovo and Fujitsu. On March 30, 2016, Fujitsu published a result for a two-socket FUJITSU Server PRIMERGY RX2540 M2 system using the latest-generation, 14nm, 2.2GHz Intel Xeon E5-2699 v4 processor (Broadwell-EP), with a TPC-E throughput score of 4,734.87. The Intel Xeon E5-2699 v4 has 22 physical cores, so the two-socket system has a total of 44 physical cores that need SQL Server 2016 licenses, which would cost about $313K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $296K in the Executive Summary report.

On March 24, 2016, Lenovo published a result for a Lenovo System x3650 M5 system using the same Intel Xeon E5-2699 v4 processors, with a TPC-E throughput score of 4,938.14, which is about 4% higher than the Fujitsu system. In this case, the Fujitsu system uses 1TB of RAM (with a 19TB initial database size), while the Lenovo system uses 512GB of RAM (with a 20TB initial database size). Both systems use all-flash storage, with 2.5” SAS SSDs.

I know that this is a lot of numbers to be throwing around, so a summary of these four systems is shown in Table 1.

 

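| System                     | Processor  | Raw Score | Total Cores | Score/Core |
|----------------------------|------------|-----------|-------------|------------|
| Lenovo System x3850 X6     | E7-8890 v4 | 9,068.00  | 96          | 94.46      |
| Fujitsu PRIMERGY RX4770 M3 | E7-8890 v4 | 8,796.42  | 96          | 91.63      |
| Lenovo System x3650 M5     | E5-2699 v4 | 4,938.14  | 44          | 112.23     |
| Fujitsu PRIMERGY RX2540 M2 | E5-2699 v4 | 4,734.87  | 44          | 107.61     |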

Table 1: Recent TPC-E Score Highlights
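The Score/Core column is just the raw TPC-E throughput score divided by the total number of physical cores. A minimal sketch of that calculation, using the four results from Table 1 (the function and variable names are hypothetical):

```python
def score_per_core(raw_score: float, total_cores: int) -> float:
    """TPC-E throughput per physical core, a rough proxy for single-threaded speed."""
    return raw_score / total_cores


# (system, raw TPC-E score, total physical cores), taken from Table 1.
results = [
    ("Lenovo System x3850 X6", 9068.00, 96),
    ("Fujitsu PRIMERGY RX4770 M3", 8796.42, 96),
    ("Lenovo System x3650 M5", 4938.14, 44),
    ("Fujitsu PRIMERGY RX2540 M2", 4734.87, 44),
]

for system, raw, cores in results:
    print(f"{system:28s} {score_per_core(raw, cores):7.2f}")
```

Values can differ from a published table in the last digit, depending on whether the quotient is rounded or truncated.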

 

This shows that four-socket Broadwell-EX systems scale relatively well compared to older Xeon E7 processor families, meaning that the drop in single-threaded performance compared to equivalent two-socket Xeon E5 processor families is not as large as it used to be. There is still a gap, though, which means that you lose some scalability when you make the jump from a two-socket system to a four-socket system. If you can split your workload across two database servers, you would be better off with two two-socket servers than with one four-socket server. You would have more total processor capacity, better single-threaded performance, more PCIe expansion slots, and lower SQL Server license costs.

An even better alternative for most people would be to use a lower core count, “frequency optimized” processor, instead of the flagship processor. For example, if you used the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor in a two-socket server, you would get the estimated results shown in Table 2.

 

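| System                      | Processor  | Raw Score | Total Cores | Score/Core |
|-----------------------------|------------|-----------|-------------|------------|
| Estimated Two-Socket System | E5-2667 v4 | 2,611.91  | 16          | 163.24     |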

Table 2: Estimated TPC-E Results
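The estimated score in Table 2 is consistent with taking the Lenovo System x3650 M5 result and scaling it linearly by core count and by base clock speed. That this is how the estimate was derived is an assumption; the sketch below only shows that the arithmetic reproduces the number. The function name is hypothetical, and linear scaling ignores cache size, memory bandwidth, and turbo behavior.

```python
def estimate_tpce_score(baseline_score: float,
                        baseline_cores: int, baseline_ghz: float,
                        target_cores: int, target_ghz: float) -> float:
    """Scale a measured TPC-E score linearly by core count and base clock speed."""
    core_ratio = target_cores / baseline_cores
    clock_ratio = target_ghz / baseline_ghz
    return baseline_score * core_ratio * clock_ratio


# Baseline: two-socket Lenovo System x3650 M5 (2 x 22-core, 2.2 GHz E5-2699 v4).
# Target: two-socket system with 2 x 8-core, 3.2 GHz E5-2667 v4.
estimate = estimate_tpce_score(4938.14, 44, 2.2, 16, 3.2)
print(f"Estimated raw score: {estimate:.2f}")
print(f"Estimated score per core: {estimate / 16:.2f}")
```

Treat the output as a rough planning number rather than a substitute for an actual benchmark run.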

If you had four two-socket systems with the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor instead of one four-socket system with the 24-core, 2.2 GHz Intel Xeon E7-8890 v4 processor, you would have about 15.2% more total processor capacity, about 72.8% better single-threaded performance, and a 33.3% lower SQL Server 2016 licensing cost (64 core licenses instead of 96, which would be about $228K in license savings at full retail price). You would have the same total memory capacity, and more than three times the number of PCIe slots.
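This comparison can be checked with a short calculation. The sketch below uses the estimated E5-2667 v4 score from Table 2 and the Lenovo four-socket result from Table 1; the per-core license price is an assumption (half of an assumed $14,256 two-core pack at full retail), and all names are hypothetical. Results can differ from quoted figures in the last digit, depending on where rounding is applied.

```python
def percent_change(new: float, old: float) -> float:
    """Percentage change from old to new (positive means new is larger)."""
    return (new / old - 1.0) * 100.0


# Scale-out option: four two-socket E5-2667 v4 servers (estimated score).
small_servers = 4
small_score = 2611.91  # estimated TPC-E score per server
small_cores = 16       # physical cores per server

# Scale-up option: one four-socket E7-8890 v4 server (Lenovo result).
big_score = 9068.00
big_cores = 96

PRICE_PER_CORE = 7_128  # assumed full retail price per Enterprise core license

capacity_gain = percent_change(small_servers * small_score, big_score)
single_thread_gain = percent_change(small_score / small_cores,
                                    big_score / big_cores)
license_change = percent_change(small_servers * small_cores, big_cores)
license_savings = (big_cores - small_servers * small_cores) * PRICE_PER_CORE

print(f"Total capacity:         {capacity_gain:+.1f}%")
print(f"Single-threaded (est.): {single_thread_gain:+.1f}%")
print(f"Core licenses:          {license_change:+.1f}%")
print(f"License savings:        ${license_savings:,}")
```

The design choice here is to compare per-core scores as a stand-in for single-threaded performance, which is the same proxy the Score/Core column uses.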

New TPC-H Benchmarks Comparing SQL Server 2016 to SQL Server 2014

There are two new TPC-H benchmark submissions on SQL Server 2016. This is interesting, because one of these new submissions (from March 9, 2016) is from Lenovo, for a System x3850 X6 running on SQL Server 2016. Lenovo has a previous submission, from May 1, 2015, for an identical model system running on SQL Server 2014. Both systems are running on Windows Server 2012 R2 Standard Edition. Both of these submissions are for 3000GB databases, which is very important when you are comparing score results.

So here are the results:

SQL Server 2016               969,504 QphH@3000GB

SQL Server 2014               700,392 QphH@3000GB

This shows a 38.4% score increase on identical hardware, which is quite impressive.
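Since TPC-H scores are only comparable at the same database size, a comparison helper can refuse to compare results at different scale factors. This is a minimal sketch with hypothetical type and function names, using the two Lenovo results quoted above:

```python
from typing import NamedTuple


class TpchResult(NamedTuple):
    label: str
    qphh: float           # composite queries-per-hour score
    scale_factor_gb: int  # database size the score was measured at


def percent_increase(new: TpchResult, old: TpchResult) -> float:
    """Percentage gain of new over old; only valid at the same scale factor."""
    if new.scale_factor_gb != old.scale_factor_gb:
        raise ValueError("QphH scores at different scale factors are not comparable")
    return (new.qphh / old.qphh - 1.0) * 100.0


sql2016 = TpchResult("SQL Server 2016", 969_504, 3000)
sql2014 = TpchResult("SQL Server 2014", 700_392, 3000)

print(f"{percent_increase(sql2016, sql2014):.1f}% higher QphH@3000GB")
```

Passing a result measured at, say, 1000GB alongside one at 3000GB raises a ValueError instead of returning a misleading percentage.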

Both of these systems have four Intel Xeon E7-8890 v3 (Haswell-EX) 18-core processors, and 3 TB of RAM. Both systems have Intel HT enabled. Diving into the full-disclosure report for each submission, the storage subsystem for each of these submissions is virtually identical. For both systems, the storage is mostly flash-based, with a combination of internal drives and PCIe add-in cards (AIC). No SAN used here!

The key point is that they stored their six data files and their tempdb files across six, independent 3.2 TB PCIe flash AICs, which they describe as “3200GB Enterprise Value io3 Flash Adapter”. I believe that these must be SanDisk Fusion-io Memory SX350-3200 devices. Lenovo also describes the storage subsystem like this in the full-disclosure report:

The OS was stored on a RAID-1 protected array of 2 physical drives. The database files were
stored on 6 non-raided Enterprise io3 Flash drives. The log was stored on a 4-disk Raid10 array.

One thing I noticed was some minor inconsistencies between the Executive Summary and the FDR for the March 9, 2016 submission about where the transaction log file is stored. I think this is just a copy/paste error, and log file performance is not important for this type of benchmark anyway.

Microsoft has been publishing a series of posts on the CSS Engineers blog that outline some of the performance and scalability improvements in SQL Server 2016.