The post Recent TPC-E Results on SQL Server 2017 appeared first on Glenn Berry.
The most recent result was for a four-socket Lenovo ThinkSystem SR950 with 3TB of RAM using a 48TB initial database size. This system had an official result of 11,357.28, which is the highest score ever submitted for a four-socket server. This system has a total of 112 physical cores, so if you divide the total score of 11,357.28 by 112, you get a measure of the single-threaded performance of the Intel Xeon Platinum 8180 processor under a full load (where the clock speed of the individual cores will be pretty close to the 2.5GHz base clock speed). In this case, the result is 101.40 score/core.
Back on June 27, 2017, Lenovo submitted a result for a two-socket Lenovo ThinkSystem SR650 with 1.5TB of RAM using a 28.5TB initial database size. This system had an official result of 6,598.36, which is the highest score ever submitted for a two-socket server. This system has a total of 56 physical cores, so if you divide the total score of 6,598.36 by 56, you get a score/core of 117.83, which is significantly higher than the 101.40 score/core of the Lenovo ThinkSystem SR950 configured with four sockets (using the exact same Intel Xeon Platinum 8180 processor).
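As a quick check of the score/core arithmetic, here is a minimal Python sketch. `score_per_core` is just an illustrative helper name, and the 28 cores per socket are inferred from the 112 and 56 total cores quoted above.

```python
# Score/core = total TPC-E throughput score / total physical cores.
# Under a full load this is a rough proxy for per-core (single-threaded) performance.


def score_per_core(tpce_score: float, physical_cores: int) -> float:
    """Return the TPC-E throughput score per physical core."""
    return tpce_score / physical_cores


sr950 = score_per_core(11_357.28, 4 * 28)  # four sockets of 28-core Xeon Platinum 8180
sr650 = score_per_core(6_598.36, 2 * 28)   # two sockets of the same processor

# Percentage by which the two-socket system's score/core exceeds the four-socket system's.
advantage = (sr650 / sr950 - 1) * 100

print(f"SR950: {sr950:.2f}  SR650: {sr650:.2f}  two-socket advantage: {advantage:.1f}%")
```

With these inputs, the two-socket SR650 comes out roughly 16% ahead of the four-socket SR950 on a per-core basis.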
I would attribute most of this difference to the added NUMA overhead of a four-socket system compared to a two-socket system. Another difference, which probably hurt the score of the two-socket system, is that, based on its submission date, it must have been running on a pre-release version of SQL Server 2017.
This is just another piece of evidence that even with NUMA, capacity does not scale in a linear fashion as you add sockets to a server. Assuming you can split your workload across multiple database servers rather than just one, having two, two-socket servers instead of one, four-socket server will give you both more CPU capacity and better single-threaded CPU performance even when using the exact same model processor.
I would also argue that you could purposely pick a lower core count, but higher base clock speed processor from the same Intel Xeon Scalable Processor Family to find a sweet spot for SQL Server 2017 usage, where you have fewer physical cores to license, with better single-threaded performance across a higher number of servers.
The post CPU-Z 1.80 is Available appeared first on Glenn Berry.
In case you are wondering what that means, some of the latest Intel processors have a new feature called Intel Turbo Boost Max Technology 3.0 that can automatically direct single-threaded workloads to the fastest core available on a processor. It requires a supported processor, BIOS support, and a special Intel driver, along with operating system support.
One processor that has this feature is the Intel Core i9-7900X processor.
Figure 1: CPU Tab of CPU-Z 1.80
If you want to investigate whether you have this feature and what it is doing, you can click on the Clocks button on the About tab, and see the preferred core information for your processor.
Figure 2: About Tab of CPU-Z 1.80
Figure 3: Clocks Dialog
My old Core i7-3770K processor does NOT have this new feature!
The post Intel Xeon E7 Processor Generational Performance Comparison appeared first on Glenn Berry.
Figure 1: Speedup from Successive Processor Generations
This workload is described as “AsiaInfo ADB Database OCS k-tpmC”. AsiaInfo ADB itself is described as “a scalable OLTP database that targets high performance and mission critical businesses such as online charge service (OCS) in the telecom industry”, and it runs on Linux.
The reason I have performance in quotes above is because what they are really measuring is closer to what I would call capacity or scalability. Their topline result is “Thousands of Transactions per Minute” as measured with these different hardware and storage configurations.
The key point to keep in mind with these types of benchmarks is whether they are actually comparing similar systems. In this case, the systems are quite similar, except for the core counts of the successive processor models (and the DDR3 vs. DDR4 memory support). Here are the system components, as listed in the footnotes of the document:
Baseline: Four-sockets, 15-core Intel Xeon E7-4890 v2, 256GB DDR3/1333 DIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC
Next Generation: Four-sockets, 18-core Intel Xeon E7-8890 v3, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC
New: Four-sockets, 24-core Intel Xeon E7-8890 v4, 256GB DDR4/1600 LVDIMM, Intel DC S3700 SATA for OS, (2) 2TB Intel DC P3700 PCIe NVMe for storage, 10GbE Intel X540-AT2 NIC
The baseline system has a total of 60 physical cores, running at 2.8GHz, using the older Ivy Bridge-EX microarchitecture. The next generation system has a total of 72 physical cores, running at 2.5GHz, using the slightly newer Haswell-EX microarchitecture. Finally, the new system has a total of 96 physical cores, running at 2.2GHz, using the current Broadwell-EX microarchitecture. These differences in core counts, base clock speeds, and microarchitecture make it harder to interpret their benchmark results realistically.
Table 1 shows some relevant metrics for these three system configurations. The older generation processors have fewer cores, but run at a higher base clock speed. The newer generation processors would be faster than the older generation processors at the same clock speed, but the base clock speed is lower as the core counts have increased with each successive generation flagship processor. The improvements in IPC and single-threaded performance are obscured by lower base clock speeds as the core counts increase, which makes the final score increase less impressive.
| Processor | Base Clock | Total System Cores | Raw Score (k-tpmC) | Score/Core |
|---|---|---|---|---|
| Xeon E7-4890 v2 | 2.8GHz | 60 | 725 | 12.08 |
| Xeon E7-8890 v3 | 2.5GHz | 72 | 1021 | 14.18 |
| Xeon E7-8890 v4 | 2.2GHz | 96 | 1294 | 13.48 |
Table 1: Analysis of ADB Benchmark Results
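To see how the lower base clock speeds offset the higher core counts, here is a small Python sketch that recomputes the Score/Core column of Table 1 along with the generation-over-generation change in raw score and in score/core. The helper names are hypothetical; the data is taken straight from Table 1.

```python
# Table 1 data: (processor, base clock in GHz, total system cores, raw ADB score in k-tpmC)
ADB_RESULTS = [
    ("Xeon E7-4890 v2", 2.8, 60, 725),
    ("Xeon E7-8890 v3", 2.5, 72, 1021),
    ("Xeon E7-8890 v4", 2.2, 96, 1294),
]


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1) * 100


def generational_gains(rows):
    """Return (name, score/core, raw % gain, score/core % gain) for each row.

    The first row has no predecessor, so its gains are None.
    """
    out = []
    prev = None
    for name, _ghz, cores, score in rows:
        per_core = score / cores
        if prev is None:
            out.append((name, per_core, None, None))
        else:
            prev_score, prev_per_core = prev
            out.append((name, per_core,
                        pct_change(score, prev_score),
                        pct_change(per_core, prev_per_core)))
        prev = (score, per_core)
    return out


for name, per_core, raw_gain, core_gain in generational_gains(ADB_RESULTS):
    if raw_gain is None:
        print(f"{name}: {per_core:.2f} k-tpmC/core (baseline)")
    else:
        print(f"{name}: {per_core:.2f} k-tpmC/core, "
              f"raw {raw_gain:+.1f}%, per core {core_gain:+.1f}%")
```

For the Xeon E7-8890 v4 system, the raw score is about 27% higher than the v3 system, while the score per core is about 5% lower.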
Table 2 shows some metrics from an analysis of actual and estimated TPC-E benchmark results for those same three system configurations, plus an additional processor choice that I added. The score/core trend is pretty similar to Table 1, which supports the idea that both of these benchmarks are CPU-limited. With SQL Server 2016, you are going to be better off from a performance/license cost perspective if you purposely choose a lower core count “frequency-optimized” processor (at the cost of less total system capacity per host).
This is somewhat harder to do with the Intel Xeon E7 v4 family, because of your limited SKU choices. A good processor choice for many workloads would be the 10-core Intel Xeon E7-8891 v4 processor, which has a base clock speed of 2.8GHz and a 60MB L3 cache that is shared by only 10 cores.
If you could spread your workload across two database servers, you would be much better off with two, four-socket servers with the 10-core Xeon E7-8891 v4 rather than one four-socket server with the 24-core Xeon E7-8890 v4. You would have more total system processor capacity, roughly 27% better single-threaded CPU performance, twice the total system memory capacity, and twice the total number of PCIe 3.0 expansion slots. You would also only need 80 SQL Server 2016 Enterprise Edition core licenses rather than 96 core licenses, which would save you about $114K in license costs. That license savings would probably pay for both database servers, depending on their exact configuration.
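The comparison above can be recomputed from the Table 2 figures. This sketch assumes a SQL Server 2016 Enterprise Edition list price of $14,256 per two-core pack ($7,128 per core), which is consistent with the roughly $114K savings quoted above; the `fleet` helper is hypothetical.

```python
# Assumed SQL Server 2016 Enterprise Edition list price per core ($14,256 per 2-core pack).
EE_PRICE_PER_CORE = 14_256 / 2


def fleet(servers: int, score: float, cores_per_server: int) -> dict:
    """Total capacity, score/core and core licenses for a set of identical servers."""
    return {
        "capacity": servers * score,
        "per_core": score / cores_per_server,
        "licenses": servers * cores_per_server,
    }


two_small = fleet(2, 4808.79, 40)  # two four-socket servers, 10-core E7-8891 v4 (estimated)
one_big = fleet(1, 9068.00, 96)    # one four-socket server, 24-core E7-8890 v4

capacity_gain = (two_small["capacity"] / one_big["capacity"] - 1) * 100
single_thread_gain = (two_small["per_core"] / one_big["per_core"] - 1) * 100
license_savings = (one_big["licenses"] - two_small["licenses"]) * EE_PRICE_PER_CORE

print(f"capacity {capacity_gain:+.1f}%, score/core {single_thread_gain:+.1f}%, "
      f"licenses {two_small['licenses']} vs {one_big['licenses']}, "
      f"savings ${license_savings:,.0f}")
```

With these inputs, the pair of 10-core servers has about 6% more total capacity, about 27% better score/core, and needs 16 fewer core licenses ($114,048 at the assumed list price).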
| Processor | Base Clock | Total System Cores | Est. TPC-E Score | Score/Core |
|---|---|---|---|---|
| Xeon E7-4890 v2 | 2.8GHz | 60 | 5576.27 | 92.94 |
| Xeon E7-8890 v3 | 2.5GHz | 72 | 6964.75 | 96.73 |
| Xeon E7-8890 v4 | 2.2GHz | 96 | 9068.00 | 94.46 |
| Xeon E7-8891 v4 | 2.8GHz | 40 | 4808.79 | 120.22 |
Table 2: Analysis of Estimated TPC-E Benchmark Results
The Intel document also discusses the “performance” increases seen from moving from Intel DC S3700 SATA drives to Intel DC P3700 PCIe NVMe drives. This is going to be primarily influenced by the advantages of being connected directly to the PCIe bus and the lower latency and overhead of the NVMe protocol compared to the older AHCI protocol.
Finally, they talk about the “performance” increases they measured from enabling the Intel Transactional Synchronization Extensions (TSX) instruction set and the Intel AVX2 instruction set on current generation Intel Xeon E7-8800 v4 series processors.
SQL Server 2016 already has hardware support for older SSE/AVX instructions as discussed here and here. I really hope that Microsoft decides to add even more support for newer instruction sets (such as TSX) in SQL Server vNext.
The post CPU-Z 1.78 is Available appeared first on Glenn Berry.
The main improvement in this version is support for Intel Kaby Lake processors, which are already available in the mobile space. It looks like the desktop version of Kaby Lake will be released at CES in January. Tom’s Hardware did some benchmarking of an early sample of a Core i7-7700K that someone supplied to them, as detailed here.
Figure 1: CPU-Z 1.78 CPU Tab
Recent versions of CPU-Z include a quick benchmarking function that measures single-threaded and multi-threaded CPU performance. Each test takes only about 7-8 seconds, and the results are useful for a number of reasons.
Figure 2: CPU-Z 1.78 Bench Tab For Intel Core i7-6700K System
First, you can get a quick gauge of your single-threaded CPU performance (which equates to the “speed” of the processor), and your multi-threaded CPU performance (which equates to the CPU capacity of the entire system). This is useful for comparing different processors and systems, whether they are physical or virtual. You can measure the performance of a VM versus running bare metal on the host, or you can measure different VM configurations. You can also compare your numbers to the built-in reference processors, or submit your results and compare them to other systems’ results that are stored online.
Second, you can use the Bench CPU button to briefly stress your processors, and then quickly switch to the main CPU tab while the test is running, to see what happens to your CPU core clock speeds, in order to understand whether you have power management configured correctly to get the performance benefits of Intel Turbo Boost.
The post New TPC-E Results for SQL Server 2016 appeared first on Glenn Berry.
The most recent result, from July 12, 2016, is for a four-socket FUJITSU Server PRIMERGY RX4770 M3 server that is using the latest generation, 14nm 2.2GHz Intel Xeon E7-8890 v4 processor (Broadwell-EX), with a TPC-E throughput score of 8,796.42. As is always the case with TPC-E benchmarks, the hardware vendor used the “flagship”, highest core count processor available from the latest processor family, in this case a 24-core processor. This helps achieve the highest possible TPC-E throughput score (which is a measure of the total processor capacity of the system), at the price of very high SQL Server 2016 licensing costs, since you would have to purchase 96 SQL Server 2016 Enterprise Edition core licenses. This would cost about $684K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $647K in the Executive Summary report.
Another recent result from May 31, 2016 is for a four-socket Lenovo System x3850 X6 that is using the same Intel Xeon E7-8890 v4 processor. This system gets a TPC-E throughput score of 9,068.00, which is about 3% higher than the Fujitsu system. Both systems use a 36TB initial database size, while the Fujitsu system uses 2TB of RAM and the Lenovo system uses 4TB of RAM (which is the license limit for Windows Server 2012 R2). Both systems use all flash storage, with 2.5” SAS SSDs.
Unlike the old TPC-C OLTP benchmark, TPC-E does not require an unrealistically expensive storage subsystem to get good scores. As long as the storage subsystem is “good enough” so that it does not become a bottleneck, then the ultimate TPC-E bottleneck becomes processor performance.
Earlier this year, there were two competing results for two-socket systems from Lenovo and Fujitsu. On March 30, 2016, Fujitsu published a result for a two-socket FUJITSU Server PRIMERGY RX2540 M2 system using the latest generation, 14nm 2.2GHz Intel Xeon E5-2699 v4 processor (Broadwell-EP), with a TPC-E throughput score of 4,734.87. The Intel Xeon E5-2699 v4 has 22 physical cores, so the two-socket system has a total of 44 physical cores, and licensing them for SQL Server 2016 would cost about $313K at full retail price. Fujitsu priced the SQL Server 2016 licenses at $296K in the Executive Summary report.
On March 24, 2016, Lenovo published a result for a Lenovo System x3650 M5 system using the same Intel Xeon E5-2699 v4 processors, with a TPC-E throughput score of 4,938.14, which is about 4% higher than the Fujitsu system. In this case, the Fujitsu system uses 1TB of RAM (with a 19TB initial database size), while the Lenovo system uses 512GB of RAM (with a 20TB initial database size). Both systems use all flash storage, with 2.5” SAS SSDs.
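The full-retail license figures quoted for these systems can be sketched as follows. The price of $14,256 per two-core pack of SQL Server 2016 Enterprise Edition is an assumption here (it is not stated in the benchmark reports), although it lines up with the $684K and $313K figures above. The sketch also ignores the four-core-per-processor minimum, which these servers exceed anyway.

```python
# Assumption: SQL Server 2016 Enterprise Edition list price of $14,256 per 2-core pack.
# Simplification: ignores the 4-core-per-processor minimum, which these servers exceed.
PRICE_PER_2_CORE_PACK = 14_256


def enterprise_license_cost(physical_cores: int) -> int:
    """Full-retail cost of licensing every physical core in a server."""
    packs = -(-physical_cores // 2)  # round up to whole two-core packs
    return packs * PRICE_PER_2_CORE_PACK


for label, cores in [("four-socket, 24-core E7-8890 v4", 96),
                     ("two-socket, 22-core E5-2699 v4", 44)]:
    print(f"{label}: {cores} cores -> ${enterprise_license_cost(cores):,}")
```

At the assumed price, that is $684,288 for 96 cores and $313,632 for 44 cores.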
I know that this is a lot of numbers to be throwing around, so a summary of these four systems is shown in Table 1.
| System | Processor | Raw Score | Total Cores | Score/Core |
|---|---|---|---|---|
| Lenovo System x3850 X6 | E7-8890 v4 | 9,068.00 | 96 | 94.46 |
| Fujitsu PRIMERGY RX4770 M3 | E7-8890 v4 | 8,796.42 | 96 | 91.63 |
| Lenovo System x3650 M5 | E5-2699 v4 | 4,938.14 | 44 | 112.23 |
| Fujitsu PRIMERGY RX2540 M2 | E5-2699 v4 | 4,734.87 | 44 | 107.61 |
Table 1: Recent TPC-E Score Highlights
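Here is a small Python sketch that recomputes the Score/Core column from the raw scores and core counts, and then compares two of the Lenovo two-socket systems with one Lenovo four-socket system. The dictionary and helper names are just illustrative.

```python
# Table 1 data: system -> (TPC-E throughput score, total physical cores)
RESULTS = {
    "Lenovo System x3850 X6": (9068.00, 96),
    "Fujitsu PRIMERGY RX4770 M3": (8796.42, 96),
    "Lenovo System x3650 M5": (4938.14, 44),
    "Fujitsu PRIMERGY RX2540 M2": (4734.87, 44),
}


def score_per_core(system: str) -> float:
    """TPC-E score divided by total physical cores for one system in RESULTS."""
    score, cores = RESULTS[system]
    return score / cores


for system in RESULTS:
    print(f"{system}: {score_per_core(system):.2f} score/core")

# Two Lenovo two-socket servers compared with one Lenovo four-socket server.
four_score, four_cores = RESULTS["Lenovo System x3850 X6"]
two_score, two_cores = RESULTS["Lenovo System x3650 M5"]
gap = (score_per_core("Lenovo System x3650 M5")
       / score_per_core("Lenovo System x3850 X6") - 1) * 100

print(f"Total capacity: {2 * two_score:,.2f} vs {four_score:,.2f}")
print(f"Core licenses:  {2 * two_cores} vs {four_cores}")
print(f"Score/core gap: {gap:.1f}%")
```

Two x3650 M5 servers would provide a combined score of 9,876.28 versus 9,068.00, using 88 core licenses instead of 96, with roughly 19% higher score/core.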
This shows that four-socket Broadwell-EX systems scale relatively well compared to older Xeon E7 processor families, meaning that the drop in single-threaded performance compared to equivalent two-socket Xeon E5 processor families is not as large as it used to be. There is still a gap though, which means that you are losing some scalability as you make the jump from a two-socket system to a four-socket system. If you can split your workload across two database servers, you would be better off having two, two-socket servers rather than one, four-socket server. You would have more total processor capacity, better single-threaded performance, more PCIe expansion slots and lower SQL Server license costs.
An even better alternative for most people would be to use a lower core count, “frequency optimized” processor, instead of the flagship processor. For example, if you used the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor in a two-socket server, you would get the estimated results shown in Table 2.
| System | Processor | Raw Score | Total Cores | Score/Core |
|---|---|---|---|---|
| Estimated Two-Socket System | E5-2667 v4 | 2611.91 | 16 | 163.24 |
Table 2: Estimated TPC-E Results
If you had four, two-socket systems with the eight-core, 3.2 GHz Intel Xeon E5-2667 v4 processor, instead of one, four-socket system with the 24-core 2.2 GHz Intel Xeon E7-8890 v4 processor, you would have about 15.2% more total processor capacity, about 72.8% better single-threaded performance, and a 33.3% lower SQL Server 2016 licensing cost (which would be about $228K in license savings). You would have the same total memory capacity, and more than three times the number of PCIe slots.
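The four-server comparison can be recomputed with a few lines of Python. This is a sketch using the estimated E5-2667 v4 score from Table 2 and the Lenovo E7-8890 v4 result; the per-core license price ($14,256 per two-core pack) is an assumption, not something taken from the benchmark reports.

```python
# Estimated two-socket E5-2667 v4 (8 cores per socket) vs. the four-socket E7-8890 v4 result.
SMALL_SCORE, SMALL_CORES = 2611.91, 16  # estimated, per two-socket server
BIG_SCORE, BIG_CORES = 9068.00, 96      # Lenovo System x3850 X6 result
SERVERS = 4
PRICE_PER_CORE = 14_256 / 2             # assumed SQL Server 2016 Enterprise list price per core

capacity_gain = (SERVERS * SMALL_SCORE / BIG_SCORE - 1) * 100
per_core_gain = ((SMALL_SCORE / SMALL_CORES) / (BIG_SCORE / BIG_CORES) - 1) * 100
license_reduction = (1 - SERVERS * SMALL_CORES / BIG_CORES) * 100
license_savings = (BIG_CORES - SERVERS * SMALL_CORES) * PRICE_PER_CORE

print(f"Capacity:        {capacity_gain:+.1f}%")
print(f"Score/core:      {per_core_gain:+.1f}%")
print(f"Core licenses:   {SERVERS * SMALL_CORES} vs {BIG_CORES} "
      f"({license_reduction:.1f}% fewer)")
print(f"License savings: ${license_savings:,.0f}")
```

Using unrounded score/core values keeps the rounding of the individual table entries from creeping into the percentages.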
The post New TPC-H Benchmarks Comparing SQL Server 2016 to SQL Server 2014 appeared first on Glenn Berry.
So here are the results:
SQL Server 2016 969,504 QphH@3000GB
SQL Server 2014 700,392 QphH@3000GB
This shows a 38.4% score increase on identical hardware, which is quite impressive.
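For reference, the percentage increase is simply the ratio of the two composite QphH@3000GB scores, as in this minimal sketch (the constant names are illustrative):

```python
QPHH_SQL2016 = 969_504  # QphH@3000GB, SQL Server 2016
QPHH_SQL2014 = 700_392  # QphH@3000GB, SQL Server 2014, same hardware

increase = (QPHH_SQL2016 / QPHH_SQL2014 - 1) * 100
print(f"SQL Server 2016 vs. 2014: {increase:.1f}% higher QphH@3000GB")
```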
Both of these systems have four Intel Xeon E7-8890 v3 (Haswell-EX) 18-core processors, and 3 TB of RAM. Both systems have Intel HT enabled. Diving into the full-disclosure report for each submission, the storage subsystem for each of these submissions is virtually identical. For both systems, the storage is mostly flash-based, with a combination of internal drives and PCIe add-in cards (AIC). No SAN used here!
The key point is that they stored their six data files and their tempdb files across six, independent 3.2 TB PCIe flash AICs, which they describe as “3200GB Enterprise Value io3 Flash Adapter”. I believe that these must be SanDisk Fusion-io Memory SX350-3200 devices. Lenovo also describes the storage subsystem like this in the full-disclosure report:
The OS was stored on a RAID-1 protected array of 2 physical drives. The database files were stored on 6 non-raided Enterprise io3 Flash drives. The log was stored on a 4-disk Raid10 array.
One thing I noticed was a few minor inconsistencies between the Executive Summary and the FDR of the March 9, 2016 submission regarding where the transaction log file is stored. I think this is just a copy/paste error, and log file performance is not important for this type of benchmark anyway.
Microsoft has been publishing a series of blog posts that outline some of the performance and scalability improvements in SQL Server 2016 on the CSS Engineers blog.
The post CPU-Z Benchmark Survey appeared first on Glenn Berry.
With CPU-Z 1.75, simply click on the “Bench CPU” button on the Bench tab (as shown in Figure 2). Once the test finishes in a couple of minutes, take a screenshot of that tab (ALT-Print Screen in Windows), and paste it in an e-mail. Then take a screenshot of the CPU tab (so I can easily identify your processor), like you see in Figure 1, and include that in your e-mail. Another way to get these screenshots is to hit the F5 key while you are on those two tabs, which will save a .bmp file in the same directory as CPU-Z.
I am mainly looking for results for bare-metal, non-virtualized machines right now. If possible, make sure the Windows High Performance power plan is enabled, and that your machine is plugged in (if it is a laptop or tablet). Ideally, you would do this while your machine is relatively idle, so that all of the processing power is available for the test.
If you run this on a server, please don’t do it while it is in Production!
Figure 1: CPU-Z CPU Tab
Make sure to only click the Bench CPU button, not the Stress CPU button!
Figure 2: CPU-Z Bench Tab
Once you are done, simply send me your screenshots by e-mail. Please don’t post your results as comments on this blog post.
If you would like to do this for multiple machines, that would be great! Thanks!
The post Upgrading a SATA III SSD appeared first on Glenn Berry.
I ended up getting a 1TB Samsung 850 EVO SATA III SSD, for $329.99 at my local Micro Center. The 850 EVO line has been around for about a year now, and prices have come down quite a bit since they were introduced. It is pretty amazing to get double the size (and better performance) at less than half the price, compared to what was available back in 2012.
Before I cloned the existing drive, I ran CrystalDiskMark 5.0.2 on it with a 4GB test file. The results are shown in Figure 1.
Figure 1: 512GB OCZ Vertex 4 SATA III SSD Benchmark results
I used the free Samsung Data Migration software (which only works with Samsung SSDs as the cloning target) to clone the old OCZ drive to the new Samsung drive. I used an Apricorn SATA Wire 3.0 plugged into a front-panel USB 3.0 port to connect the new Samsung drive for the cloning process. I could have shut down the system, and plugged the new Samsung drive into a native SATA III port to get better copy performance, but I was too lazy to do that… As it was, I was seeing about 125MB/sec during the cloning copy process, which was fast enough. If you are cloning/upgrading a drive in a laptop, you pretty much have to use a USB port to do it.
After the cloning process was complete, I shut down the system and swapped the drives. Windows 7 booted up without any problems, although it wanted a reboot once it realized that the drive had been changed. I also noticed that Windows 7 had lost its recollection of ever checking for Windows and Microsoft Updates, but asking it to check for updates fixed that issue.
Next, I fired up the Samsung Magician 4.9 software, which informed me that the new Samsung 850 EVO needed a firmware update. Before I ran the firmware update, I ran CrystalDiskMark 5.0.2 with the same settings as the previous test. The results are shown in Figure 2.
Figure 2: 1TB Samsung 850 EVO SATA III SSD Benchmark results (before firmware update)
After the drive firmware update, Windows 7 booted up without any problems, although it wanted another reboot once it realized that the drive firmware had been updated. I ran CrystalDiskMark 5.0.2 once again with the same settings as the previous test. The results are shown in Figure 3.
Figure 3: 1TB Samsung 850 EVO SATA III SSD Benchmark results (after firmware update)
As you can see, the benchmark results improved after the firmware update. I have not found any release notes for the firmware update (and it is not even listed on their web page), but at least the latest version of Samsung Magician knew about it.
Figure 4: Samsung Magician 4.9
I have not enabled RAPID Mode on the drive yet, but I know from prior experience that it can have a nice positive effect on performance. It does make it harder to analyze your storage performance when SQL Server is running on your workstation though. All in all, a pretty easy, trouble-free installation.
The post Some Quick Comparative CrystalDiskMark Results appeared first on Glenn Berry.
A few weeks ago, I built a new Intel Skylake desktop system that I am going to start using as my primary workstation in the near future. I have some details about this system as described in Building a Z170 Desktop System with a Core i7-6700K Skylake Processor. By design, this system has several different types of storage devices, so I can take advantage of the extra PCIe bandwidth in the latest Intel Z170 Express chipset, and do some comparative testing.
The latest addition to the storage family is a brand new 512GB Samsung 950 PRO M.2 PCIe NVMe card that just arrived from Amazon yesterday afternoon. As of now, here is the available storage in this system:
Since I have an NVIDIA GeForce GTX 960 video card in one of the PCIe 3.0 x16 slots, both that slot and the PCIe 3.0 x16 slot that the Intel 750 is using will drop down to x8 (8 lanes instead of 16 lanes). The Intel Z170 Express chipset supports 26 PCIe 3.0 lanes, so you need to think about which devices you are trying to use. This system has Windows 10 Professional installed, so it has native NVMe drivers available from Microsoft.
I did some quick and dirty I/O testing today with CrystalDiskMark 5.0.2. As you can see below, both the Samsung 950 PRO and the Intel 750 PCIe NVMe cards have tremendous sequential and random I/O performance!
| Device | Sequential Reads | Sequential Writes | Random Reads | Random Writes |
|---|---|---|---|---|
| 512GB Samsung 950 Pro | 2595 MB/s | 1526 MB/s | 171755.6 IOPS | 104801.3 IOPS |
| 400GB Intel 750 | 2369 MB/s | 1081 MB/s | 177938.0 IOPS | 151642.1 IOPS |
| 512GB Samsung 850 Pro (RAID 1) | 1104 MB/s | 532 MB/s | 100420.4 IOPS | 60765.1 IOPS |
| 6TB WD Red HD | 176 MB/s | 170 MB/s | 386.7 IOPS | 448.2 IOPS |
Table 1: Sequential and Random Results (Queue Depth 32, 1 Thread)
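The random results are reported in IOPS while the sequential results are in MB/s. To put them on a similar scale, you can multiply IOPS by the transfer size. This sketch assumes 4 KiB random transfers and decimal megabytes (1 MB = 1,000,000 bytes); both are assumptions, so check the block size your own CrystalDiskMark run actually used.

```python
BLOCK_BYTES = 4 * 1024  # assumed 4 KiB transfer size for the random tests


def iops_to_mb_per_sec(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Throughput in decimal MB/s (10**6 bytes per second) for a given IOPS and block size."""
    return iops * block_bytes / 1_000_000


# Random read and write IOPS from Table 1.
RANDOM_IOPS = {
    "512GB Samsung 950 Pro": (171755.6, 104801.3),
    "400GB Intel 750": (177938.0, 151642.1),
    "512GB Samsung 850 Pro (RAID 1)": (100420.4, 60765.1),
    "6TB WD Red HD": (386.7, 448.2),
}

for device, (reads, writes) in RANDOM_IOPS.items():
    print(f"{device}: {iops_to_mb_per_sec(reads):.1f} MB/s random read, "
          f"{iops_to_mb_per_sec(writes):.1f} MB/s random write")
```

Even at high queue depth, random 4 KiB throughput on the NVMe devices is well below their sequential throughput, and the spinning drive manages under 2 MB/s.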
Keep in mind that the two Samsung 850 PRO SSDs are using hardware RAID1, which seems to help their sequential read performance, and that the two NVMe devices are both using the native Microsoft NVMe drivers, which may be hurting their performance somewhat.
Figure 1: 512GB Samsung 950 Pro M.2 PCIe NVMe Results
Figure 2: 400GB Intel 750 PCIe NVMe Results
Figure 3: 512GB Samsung 850 Pro SATA 3 (RAID 1) Results
Figure 4: 6TB Western Digital Red Results
The post Analyzing and Improving I/O Subsystem Performance Precon in Denver appeared first on Glenn Berry.
SQL Server is often I/O bound, but proving it to your storage or SAN administrator can be challenging! You will learn about the different types of storage that are available for SQL Server, and how to decide what type of storage to use for different SQL Server workload and file types. You will also learn useful tips and techniques for configuring your storage for the best performance and reliability for your workload. There will be extensive coverage of how to use disk benchmark tools like CrystalDiskMark 4.0, SQLIO and Microsoft DiskSpd, so you can confidently understand the performance that your I/O subsystem can deliver. We’ll also cover methods to effectively measure and monitor your storage performance from an OS and SQL Server perspective so that you will have valuable information and evidence available the next time you have to discuss I/O performance with your storage administrator. You will also learn a number of valuable OS and SQL Server configuration settings that will help you get the best I/O performance possible from your storage subsystem.
I hope to see you there and at SQLSaturday #441 the next day!