Two Recent Laptops Compared

I have two fairly recent 13” personal laptops that I use primarily for teaching and presentations, and I thought it would be interesting to compare them from a few common performance perspectives. The first one, which is slightly over a year old, is a Lenovo Yoga 900. It has a 14nm Intel Core i7-6500U Skylake-U processor, 16GB of RAM, a 512GB Samsung PM871 SATA III SSD, one USB 2.0 port, one USB 3.0 port, one USB-C port and a 3200×1800 touch display.

The newer machine is an HP Spectre x360 13-w023dx, which has a 14nm Intel Core i7-7500U Kaby Lake-U processor, 16GB of RAM, a 512GB Samsung SM961 M.2 NVMe SSD, one USB 3.0 port, two USB-C Thunderbolt 3 ports and a 1080p touch display.

The high-level processor specifications and CPU-Z benchmark results for these two systems are shown below:

 

Processor              Base Clock    Turbo Clock    CPU-Z Single-threaded    CPU-Z Multi-threaded
Intel Core i7-6500U    2.5GHz        3.1GHz         1467                     3391
Intel Core i7-7500U    2.7GHz        3.5GHz         1743                     3958

 

These Skylake-U and Kaby Lake-U processors are quite similar, with the Kaby Lake having an optimized “14nm plus” process technology that lets Intel set the clock speeds slightly higher at the same power usage levels. Kaby Lake also has improved integrated graphics and an improved version of Intel Speed Shift technology that lets Windows 10 throttle up the clock speed of the processor cores even faster than with a Skylake processor.

 

Figure 1: Improved Intel Speed Shift in Kaby Lake

 

The single-threaded CPU-Z 1.78.1 benchmark result is 18.8% higher with the new system, while the multi-threaded CPU-Z benchmark result is 16.7% higher on the new system. I attribute this increase to the higher base and turbo clock speeds, the optimized process technology, and the effect of the improved Intel Speed Shift. The results are shown in figures 2 and 3.
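As a quick sanity check, the percentages above can be reproduced from the scores in the table. This is only a minimal sketch: the dictionary layout and the `percent_gain` helper are my own illustrative names, not anything produced by CPU-Z. It also prints the turbo clock increase for comparison, since the benchmark gains are somewhat larger than the clock speed increase alone.

```python
# CPU-Z 1.78.1 scores and turbo clocks copied from the table above.
scores = {
    "single-threaded": {"i7-6500U": 1467, "i7-7500U": 1743},
    "multi-threaded": {"i7-6500U": 3391, "i7-7500U": 3958},
}
turbo_ghz = {"i7-6500U": 3.1, "i7-7500U": 3.5}


def percent_gain(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


for test, result in scores.items():
    gain = percent_gain(result["i7-6500U"], result["i7-7500U"])
    print(f"{test}: {gain:.1f}% higher on the i7-7500U")

clock_gain = percent_gain(turbo_ghz["i7-6500U"], turbo_ghz["i7-7500U"])
print(f"turbo clock: {clock_gain:.1f}% higher on the i7-7500U")
```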

 

Figure 2: Intel Core i7-6500U CPU-Z Benchmark Results

 

 

Figure 3: Intel Core i7-7500U CPU-Z Benchmark Results

 

Honestly, these current generational CPU performance improvements are slightly better than nothing (but not much), and are certainly not a good enough reason to upgrade from an equivalent Skylake-U system to a Kaby Lake-U system. Where we see a big improvement is with basic storage performance and peripheral connectivity between these two systems.

I was happily surprised that the new HP system came with a very fast 512GB Samsung SM961 M.2 NVMe OEM SSD that is equivalent to a Samsung 960 PRO. I was surprised because some reviews I had read indicated that these HP machines had a much slower Samsung OEM M.2 NVMe SSD. This probably varies depending on when your machine was manufactured, so perhaps the earliest review machines had the older, slower drives.

As you can see, the difference in the CrystalDiskMark performance between these drives is pretty dramatic.

 

Figure 4: 512GB Samsung SM961 M.2 NVMe SSD

 

Figure 5: 512GB Samsung PM871 SATA 3 SSD

 

For average day-to-day PC usage, you probably won’t really notice the difference between a fast SATA 3 SSD and an M.2 PCIe NVMe SSD, but if you are using SQL Server on a laptop, having that extra sequential bandwidth and much better random I/O performance is really noticeable. It is also very nice to have Thunderbolt 3 support, which will allow you to have really fast transfer performance to an appropriate external drive.

So the moral of all this is that, for many people, the best reason to consider upgrading to a new laptop or new desktop machine is the additional storage and peripheral connectivity options that you can get with a new machine.

 

Intel Xeon E7 Processor Generational Performance Comparison

Intel has a fairly recent document titled Accelerated Operations for Telecom and Financial Services, which is also listed under Accelerate OLTP Database Performance with Intel TSX. It describes the “performance” increases seen with the AsiaInfo ADB when moving from 2.8GHz Intel Xeon E7-4890 v2 (Ivy Bridge-EX), to 2.5GHz Intel Xeon E7-8890 v3 (Haswell-EX), and finally to 2.2GHz Intel Xeon E7-8890 v4 (Broadwell-EX) processors, as shown in Figure 1.

 

Figure 1: Speedup from Successive Processor Generations

 

This workload is described as “AsiaInfo ADB Database OCS k-tpmC”. The AsiaInfo ADB, which runs on Linux, is described as “a scalable OLTP database that targets high performance and mission critical businesses such as online charge service (OCS) in the telecom industry”.

I put “performance” in quotes above because what they are really measuring is closer to what I would call capacity or scalability. Their topline result is “Thousands of Transactions per Minute” as measured with these different hardware and storage configurations.

The key point to keep in mind with these types of benchmarks is whether they are actually comparing reasonably comparable systems or not. In this case, the systems are quite similar, except for the core counts of the successive processor models (and the DDR3 vs. DDR4 memory support). Here are the system components, as listed in the footnotes of the document:

Baseline: Four-sockets, 15-core Intel Xeon E7-4890 v2, 256GB DDR3/1333 DIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

Next Generation: Four-sockets, 18-core Intel Xeon E7-8890 v3, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

New: Four-sockets, 24-core Intel Xeon E7-8890 v4, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC

The baseline system has a total of 60 physical cores, running at 2.8GHz, using the older Ivy Bridge-EX microarchitecture. The next generation system has a total of 72 physical cores, running at 2.5GHz, using the slightly newer Haswell-EX microarchitecture. Finally, the new system has a total of 96 physical cores, running at 2.2GHz, using the current Broadwell-EX microarchitecture. These differences in core counts, base clock speeds, and microarchitecture make it a little harder to fully understand their benchmark results in a realistic manner.

Table 1 shows some relevant metrics for these three system configurations. The older generation processors have fewer cores, but run at a higher base clock speed. The newer generation processors would be faster than the older generation processors at the same clock speed, but the base clock speed is lower as the core counts have increased with each successive generation flagship processor. The improvements in IPC and single-threaded performance are obscured by lower base clock speeds as the core counts increase, which makes the final score increase less impressive.

 

Processor          Base Clock    Total System Cores    Raw Score    Score/Core
Xeon E7-4890 v2    2.8GHz        60                    725          12.08
Xeon E7-8890 v3    2.5GHz        72                    1021         14.18
Xeon E7-8890 v4    2.2GHz        96                    1294         13.48

Table 1: Analysis of ADB Benchmark Results
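The Score/Core column in Table 1 is simply the raw score divided by the total number of physical cores. The short sketch below recomputes it and also compares how much the raw score grew against how much the core count grew, relative to the baseline system. The tuple layout and function names are illustrative choices of mine, not anything from the Intel document.

```python
# (processor, base clock in GHz, total system cores, raw score) from Table 1.
systems = [
    ("Xeon E7-4890 v2", 2.8, 60, 725),
    ("Xeon E7-8890 v3", 2.5, 72, 1021),
    ("Xeon E7-8890 v4", 2.2, 96, 1294),
]


def score_per_core(raw_score: float, cores: int) -> float:
    """Raw benchmark score divided by total physical cores."""
    return raw_score / cores


_, _, base_cores, base_score = systems[0]

for name, clock_ghz, cores, raw in systems:
    per_core = score_per_core(raw, cores)
    score_ratio = raw / base_score    # growth in raw score vs. the baseline
    core_ratio = cores / base_cores   # growth in core count vs. the baseline
    print(f"{name} ({clock_ghz}GHz): {per_core:.2f} per core, "
          f"score x{score_ratio:.2f}, cores x{core_ratio:.2f}")
```

With these numbers, the v4 system has 1.6 times the cores of the baseline and roughly 1.78 times the raw score, so most of the headline gain comes from the extra cores.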

 

Table 2 shows some metrics from an analysis of some actual and estimated TPC-E benchmark results for those same three system configurations, plus an additional processor choice that I added. The results are pretty similar, which supports the idea that both of these benchmarks are CPU-limited. From a SQL Server 2016 perspective, you are going to be better off from a performance/license cost perspective if you purposely choose a lower core count “frequency-optimized” processor (at the cost of less total system capacity per host).

This is somewhat harder to do with the Intel Xeon E7 v4 family, because of your limited SKU choices. A good processor choice for many workloads would be the 10-core Intel Xeon E7-8891 v4 processor, which has a base clock speed of 2.8GHz and a 60MB L3 cache that is shared by only 10 cores.

If you could spread your workload across two database servers, you would be much better off with two four-socket servers with the 10-core Xeon E7-8891 v4 rather than one four-socket server with the 24-core Xeon E7-8890 v4. You would have more total system processor capacity, roughly 27% better single-threaded CPU performance, twice the total system memory capacity, and twice the total number of PCIe 3.0 expansion slots. You would also only need 80 SQL Server 2016 Enterprise Edition core licenses rather than 96 core licenses, which would save you about $114K in license costs. Those license savings would probably pay for both database servers, depending on their exact configuration.
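The arithmetic behind that comparison can be sketched as follows, using the estimated TPC-E scores from Table 2. The per-core license price of $7,128 is my assumption (it is not stated in this post, though it is consistent with the roughly $114K savings figure), and the dictionary and function names are purely illustrative.

```python
# Assumption: list price in USD for one SQL Server 2016 Enterprise core license.
LICENSE_COST_PER_CORE_USD = 7128

# Estimated TPC-E scores and core counts per four-socket server, from Table 2.
one_large = {"servers": 1, "cores_per_server": 96, "score_per_server": 9068.00}
two_small = {"servers": 2, "cores_per_server": 40, "score_per_server": 4808.79}


def total_cores(cfg: dict) -> int:
    return cfg["servers"] * cfg["cores_per_server"]


def total_score(cfg: dict) -> float:
    return cfg["servers"] * cfg["score_per_server"]


def score_per_core(cfg: dict) -> float:
    return cfg["score_per_server"] / cfg["cores_per_server"]


cores_saved = total_cores(one_large) - total_cores(two_small)
license_savings = cores_saved * LICENSE_COST_PER_CORE_USD
per_core_gain_pct = (score_per_core(two_small) / score_per_core(one_large) - 1) * 100

print(f"Core licenses: {total_cores(two_small)} vs. {total_cores(one_large)}")
print(f"License savings: ${license_savings:,}")
print(f"Total estimated score: {total_score(two_small):.2f} vs. {total_score(one_large):.2f}")
print(f"Score per core: {per_core_gain_pct:.0f}% higher with the E7-8891 v4")
```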

 

Processor          Base Clock    Total System Cores    Est TPC-E Score    Score/Core
Xeon E7-4890 v2    2.8GHz        60                    5576.27            92.94
Xeon E7-8890 v3    2.5GHz        72                    6964.75            96.73
Xeon E7-8890 v4    2.2GHz        96                    9068.00            94.46
Xeon E7-8891 v4    2.8GHz        40                    4808.79            120.22

Table 2: Analysis of Estimated TPC-E Benchmark Results

 

The Intel document also discusses the “performance” increases seen from moving from Intel DC S3700 SATA drives to Intel DC P3700 PCIe NVMe drives. This is going to be primarily influenced by the advantages of being connected directly to the PCIe bus and the lower latency and overhead of the NVMe protocol compared to the older AHCI protocol.

Finally, they talk about the “performance” increases they measured from enabling the Intel Transactional Synchronization Extensions (TSX) instruction set and the Intel AVX 2.0 instruction set on current generation Intel E7-8800 v4 series processors.

SQL Server 2016 already has hardware support for older SSE/AVX instructions as discussed here and here. I really hope that Microsoft decides to add even more support for newer instruction sets (such as TSX) in SQL Server vNext.

 

 

Some Quick Comparative CrystalDiskMark Results

A few weeks ago, I built a new Intel Skylake desktop system that I am going to start using as my primary workstation in the near future. I described this system in more detail in Building a Z170 Desktop System with a Core i7-6700K Skylake Processor. By design, this system has several different types of storage devices, so I can take advantage of the extra PCIe bandwidth in the latest Intel Z170 Express chipset, and do some comparative testing.

The latest addition to the storage family is a brand new 512GB Samsung 950 PRO M.2 PCIe NVMe card that just arrived from Amazon yesterday afternoon. As of now, here is the available storage in this system:

  1. (2) 512GB Samsung 850 PRO SATA III SSDs in RAID 1 (using the chipset RAID controller)
  2. (1) 512GB Samsung 950 PRO M.2 PCIe 3.0 NVMe card in an Ultra M.2 PCIe 3.0 x4 slot
  3. (1) 400GB Intel 750 PCIe NVMe card in a PCIe 3.0 x16 slot
  4. (1) 6TB Western Digital SATA III hard drive in a SATA III port

Since I have an NVIDIA GeForce GTX 960 video card in one of the PCIe 3.0 x16 slots, both that slot and the PCIe 3.0 x16 slot that the Intel 750 is using will go down to x8 (which means 8 lanes instead of 16 lanes). The Intel Z170 Express chipset supports 26 high-speed I/O lanes, up to 20 of which can be used as PCIe 3.0 lanes, so you need to think about what devices you are trying to use. This system has Windows 10 Professional installed, so it has native NVMe drivers available from Microsoft.
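To put those lane counts in perspective, here is a rough sketch of the theoretical one-direction bandwidth of a PCIe 3.0 link at different widths. It relies only on the standard PCIe 3.0 signaling rate (8 GT/s per lane with 128b/130b encoding). Real devices will see less than this because of protocol overhead, and the function name is just an illustrative choice.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding.
GIGATRANSFERS_PER_SECOND = 8.0
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_mb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s (10^6 bytes) for a PCIe 3.0 link."""
    bits_per_second = GIGATRANSFERS_PER_SECOND * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e6


for lanes in (4, 8, 16):
    print(f"PCIe 3.0 x{lanes}: about {lane_bandwidth_mb_s(lanes):,.0f} MB/s")
```

Even at x8, a slot has far more theoretical bandwidth than a single NVMe card like the ones in this system is likely to use, so dropping from x16 to x8 should not be a practical bottleneck for storage.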

I did some quick and dirty I/O testing today with CrystalDiskMark 5.02. The two NVMe devices are both using the native Microsoft NVMe drivers from Windows 10. As you can see below, both the Samsung 950 PRO and the Intel 750 PCIe NVMe cards have tremendous sequential and random I/O performance!

 

Device                   Sequential Reads    Sequential Writes    Random Reads     Random Writes
512GB Samsung 950 Pro    2595 MB/s           1526 MB/s            171755.6 IOPS    104801.3 IOPS
400GB Intel 750          2369 MB/s           1081 MB/s            177938.0 IOPS    151642.1 IOPS
512GB Samsung 850 Pro    1104 MB/s           532 MB/s             100420.4 IOPS    60765.1 IOPS
6TB WD Red HD            176 MB/s            170 MB/s             386.7 IOPS       448.2 IOPS

Table 1: Sequential and Random Results (Queue Depth 32, 1 Thread)

Keep in mind that the two Samsung 850 PRO SSDs are using hardware RAID 1, which seems to help their sequential read performance, and that the two NVMe devices are both using the native Microsoft NVMe drivers, which may be hurting their performance somewhat.
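Two quick calculations help put these numbers in context. The sketch below converts the random read IOPS into throughput, assuming 4 KiB transfers (the block size is my assumption, since it is not stated above), and compares the RAID 1 sequential read result against the nominal 600 MB/s limit of a single SATA III link. All names here are illustrative.

```python
BLOCK_SIZE_BYTES = 4096     # assumption: random tests use 4 KiB transfers
SATA3_NOMINAL_MB_S = 600    # 6 Gb/s SATA III link with 8b/10b encoding

# Random read results (IOPS) from Table 1.
random_read_iops = {
    "512GB Samsung 950 Pro": 171755.6,
    "400GB Intel 750": 177938.0,
    "512GB Samsung 850 Pro (RAID 1)": 100420.4,
    "6TB WD Red HD": 386.7,
}


def iops_to_mb_s(iops: float, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Convert IOPS at a given block size to MB/s (10^6 bytes per second)."""
    return iops * block_size / 1e6


for device, iops in random_read_iops.items():
    print(f"{device}: {iops_to_mb_s(iops):.1f} MB/s of random reads")

# Sequential read result for the RAID 1 pair of 850 PRO drives, from Table 1.
raid1_sequential_read_mb_s = 1104
ratio = raid1_sequential_read_mb_s / SATA3_NOMINAL_MB_S
print(f"RAID 1 sequential reads: {ratio:.2f}x the nominal limit of one SATA III link")
```

The RAID 1 pair reads sequentially at well above what one SATA III link can carry, which is consistent with reads being serviced from both drives in the mirror.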

 

Figure 1: 512GB Samsung 950 Pro M.2 PCIe NVMe Results

 

Figure 2: 400GB Intel 750 PCIe NVMe Results

 

Figure 3: 512GB Samsung 850 Pro SATA 3 (RAID 1) Results

 

Figure 4: 6TB Western Digital Red Results