A SQL Server Hardware Tidbit a Day – Day 27

For Day 27 of this series, I am going to talk about Power Management and its effect on processor performance. I have written about this subject a couple of times before, here and here. Other people, such as Paul Randal (blog|Twitter) and Brent Ozar (blog|Twitter), have written about this subject here and here.

Power Management is a mechanism that reduces the clock speed of your processors (usually by changing the processor multiplier value) in order to use less electrical power when the processor is not under a heavy load. On the surface, this seems like a good idea, since electrical power costs can be quite significant in a data center. Throttling back a processor saves some electricity and reduces its heat output, which can reduce your cooling costs. Unfortunately, with some processors, and with some types of SQL Server workloads (particularly OLTP workloads), you will pay a significant performance price (in the range of 20-25%) for those electrical power savings.

When a processor's power management features are enabled, its clock speed will vary based on the load it is experiencing. You can watch this in near real-time with a tool like CPU-Z, which displays the current clock speed of Core 0. The performance problem comes from the fact that some processors do not seem to react quickly enough to an increase in load to deliver their full performance potential, particularly for very short OLTP queries that often execute in a few milliseconds.
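If you would rather script this kind of check than watch a GUI tool, here is a minimal sketch that polls the current clock speed from Python, similar to watching the Core Speed field in CPU-Z. It assumes the third-party psutil package is installed, and note that psutil typically reports a package-level frequency rather than a true per-core reading, so brief Turbo Boost excursions may not show up.

```python
# Minimal sketch: poll the processor clock speed once per second,
# similar to watching the Core Speed field in CPU-Z.
# Assumes psutil is installed (pip install psutil).
import time
import psutil

for _ in range(10):
    freq = psutil.cpu_freq()  # reports current/min/max in MHz
    print(f"current: {freq.current:8.1f} MHz   rated max: {freq.max:8.1f} MHz")
    time.sleep(1.0)
```

If the current value stays well below the rated maximum while you run a query workload, that is a hint that power management is holding the processor back.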

This problem seems to show up especially with the Intel Xeon 5500 and 7500 series (Nehalem-EP and EX), the Intel Xeon 5600 and E7 series (Westmere-EP and EX families), and with the AMD Opteron 6100, 6200, and 6300 series (Magny-Cours, Bulldozer, and Piledriver families). Much older processors do not have any power management features, and some slightly older processors (such as the Intel Xeon 5300 and 5400 series) seem to handle power management slightly better. I have also noticed that the Intel Sandy Bridge-EP processors seem to handle power management a little better than Nehalem and Westmere did, i.e. they do not show as noticeable a performance decrease when power management is enabled.

Basically, you have two types of power management that you need to be aware of as a database professional. The first type is hardware-based power management, where the main system BIOS of a server is set to allow the processors to manage their own power states, based on the load they are seeing from the operating system. The second type is software-based power management, where the operating system (with Windows Server 2008 and above) is in charge of power management, using one of the standard Windows Power Plans or a customized version of one of those plans. When you install Windows Server 2008 or above, Windows will use the Balanced Power Plan by default. When you are using the Balanced Power Plan, Intel processors that have Turbo Boost Technology will not use Turbo Boost (meaning that they will not temporarily overclock individual processor cores for more performance).

So, after all of this, what do I recommend you do for your database server? First, check your Windows power plan setting, and make sure you are using the High Performance Power Plan. This can be changed dynamically, without a restart. Next, run CPU-Z and make sure your processor is running at or above its rated speed. If it is running at less than its rated speed under the High Performance Power Plan, that means hardware power management is overriding what Windows has asked for. In that case, you will have to restart your server (during your next maintenance window), go into the BIOS settings, and either disable power management entirely or set it to OS control (which is what I prefer).
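If you manage many servers, you can script the power plan check instead of clicking through Control Panel. The sketch below shells out to the built-in powercfg utility from Python; the GUID is the well-known identifier for the built-in High Performance plan, and changing the plan requires an elevated prompt.

```python
# Hedged sketch: check, and optionally set, the active Windows power plan
# using the built-in powercfg utility. Run from an elevated prompt to change it.
import subprocess

# Well-known GUID for the built-in High Performance power plan
HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

# Show the currently active power scheme
result = subprocess.run(["powercfg", "/getactivescheme"],
                        capture_output=True, text=True)
print(result.stdout.strip())

# Switch to High Performance if it is not already active
# (takes effect immediately, no restart needed)
if HIGH_PERFORMANCE not in result.stdout.lower():
    subprocess.run(["powercfg", "/setactive", HIGH_PERFORMANCE], check=True)
```

Remember that this only controls the software side; if the BIOS is set for hardware-controlled power management, the processor can still be throttled regardless of the Windows plan.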

A SQL Server Hardware Tidbit a Day – Day 25

For Day 25 of this series, I want to talk about the recent history of Dell rack-mounted servers, to help illustrate how dramatically processor performance and server capacity have improved over the past seven years.

Back in 2005-2006, you could buy a two-socket Dell PowerEdge 1850, with two hyper-threaded Intel Xeon “Irwindale” 3.2GHz processors and 16GB of RAM (for a total of four logical cores). This was fine for an application or web server, but it did not have the CPU horsepower (the 32-bit Geekbench score was about 2200) or the memory capacity for a heavy-duty database workload.

Around the same time, you could also buy a four-socket Dell PowerEdge 6850, with four dual-core Intel Xeon 7040 “Paxville” 3.0GHz processors and 64GB of RAM (for a total of 16 logical cores with hyper-threading enabled). This was a much better choice for a database server because of the additional processor, memory, and I/O capacity compared to the PowerEdge 1850. Even so, its Geekbench score was only about 4400, which is pretty pathetic by today's standards. Back in 2006-2007, it still made perfect sense to buy a four-socket database server for most database workloads.

By late 2007, you could buy a two-socket Dell PowerEdge 1950, with two quad-core Intel Xeon E5450 processors and 32GB of RAM (for a total of eight logical cores), and you would actually have a pretty powerful platform for a database server. A system like this would have a 32-bit Geekbench score of about 8000. The biggest weakness of this system was having only two x8 PCI-E 1.0 expansion slots.

By late 2008, you could buy a four-socket Dell PowerEdge R900, with four six-core Intel Xeon X7460 processors and 256GB of RAM (for a total of 24 logical cores). This was a very powerful, but costly, platform for a database server, with a 32-bit Geekbench score of around 16500. Many of these servers are still being used for production purposes, and while they sound impressive, they are actually a very bad choice for an upgrade to SQL Server 2012 because of their high physical core counts and low single-threaded performance. The Xeon X7460 was the last generation of Intel SMP processors before the NUMA-capable Nehalem was introduced.

By early 2009, you could buy a two-socket Dell PowerEdge R710, with two quad-core Intel Xeon X5570 processors and 144GB of RAM (for a total of 16 logical cores), and you would have a very powerful database server platform. This system would have a 32-bit Geekbench score of around 15000, which is fairly close to the capacity of a four-socket R900, with better single-threaded performance.

By early 2010, you could buy that same Dell PowerEdge R710, with more powerful six-core Intel Xeon X5680 processors (with a total of 24 logical cores), and push the 32-bit Geekbench score to about 22500. This gives you quite a bit more CPU capacity than the PowerEdge R900 that you bought in late 2008. If you are concerned about 144GB of RAM not being enough memory in the R710, you could buy two R710s, and have nearly triple the CPU capacity of a single R900. This assumes that you can split your database workload between two database servers, by moving databases or doing things like vertical or horizontal partitioning of an existing large database.
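To put some rough numbers behind the single-threaded performance argument, here is the per-physical-core arithmetic using the approximate Geekbench scores quoted above. This is back-of-the-envelope math, not a benchmark, and it ignores hyper-threading for simplicity.

```python
# Rough per-core arithmetic behind the two-socket vs. four-socket argument,
# using the approximate 32-bit Geekbench scores quoted in this post.
# Physical core counts only; hyper-threading is ignored for simplicity.
systems = {
    "PowerEdge R900, 4 x X7460 (late 2008)":  (24, 16500),
    "PowerEdge R710, 2 x X5570 (early 2009)": (8, 15000),
    "PowerEdge R710, 2 x X5680 (early 2010)": (12, 22500),
}

for name, (cores, score) in systems.items():
    print(f"{name}: about {score / cores:.0f} Geekbench points per physical core")
```

The R900 works out to roughly 690 points per physical core, while either R710 configuration lands around 1875 points per physical core, which is exactly why the older four-socket box is a poor target for core-licensed SQL Server 2012.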

Finally, by mid-2012, you could buy a 12th-generation Dell PowerEdge R720, with even faster eight-core Intel Xeon E5-2690 processors (for a total of 32 logical cores), which would push the 32-bit Geekbench score to about 29000. The R720 has 24 memory slots, so you can have 384GB of RAM with 16GB DIMMs, or 768GB of RAM with more expensive 32GB DIMMs. You also get seven PCI-E 3.0 expansion slots, which gives you more potential I/O bandwidth than you can get from a four-socket server (since four-socket servers are still using the older PCI-E 2.0 standard).

This gap will open up even more in Q3 of 2013, when the 12-core, 22nm Intel Xeon E5-2600 v2 series (Ivy Bridge-EP) processors are released. These will be pin-compatible with the current E5-2600 series, so they will work with current model servers (probably requiring a BIOS update). They should be available very quickly after Intel releases them.

This overall trend has continued over the past several years, with Intel introducing new processors in the two-socket space roughly a year ahead of a roughly equivalent new processor in the four-socket space. This means that you will get much better single-threaded OLTP performance from a two-socket system than from a four-socket system of the same age (as long as your I/O subsystem is up to par).

Given the choice, I would rather have two two-socket machines instead of one four-socket machine in almost all cases. The only big exception would be a case where you absolutely need far more memory in a single server than you can get in a two-socket machine (a Dell PowerEdge R720 can now go up to 768GB if you are willing to pay for 32GB DIMMs), and you are unable to do any re-engineering to split your load between two servers.

If you want to dive deeper into this subject, you might want to watch my latest Pluralsight course, which is SQL Server 2012: Evaluating and Sizing Hardware. You can also contact us if you are interested in expert hardware consulting as you get ready to upgrade your database hardware.

A SQL Server Hardware Tidbit a Day – Day 4

Since 2006, Intel has followed what it calls a Tick-Tock strategy for developing and releasing new processor models. Every two years, they introduce a new processor family incorporating a new microarchitecture; this is the Tock release. One year after the Tock release, they introduce a new processor family that uses the same microarchitecture as the previous year's Tock release, but with a smaller manufacturing process technology and usually other improvements, such as larger cache sizes or improved memory controllers. This is the Tick release.

This Tick-Tock release strategy benefits the DBA in a number of ways. It offers better predictability regarding when major (Tock) and minor (Tick) releases will be available. This helps you plan your upgrade strategy and schedule.

Tick releases are usually socket-compatible with the previous year’s Tock release, which makes it easier for the system manufacturer to make the latest Tick release processor available in existing server models more quickly, without completely redesigning the system. In most cases, only a BIOS update is required to allow an existing system to use a newer Tick release processor. This makes it easier for you to maintain servers that are using the same model number (such as a Dell PowerEdge R720 server), since the server model will have a longer manufacturing life span.

As a DBA, you need to know where a particular processor falls in Intel's processor family tree if you want to be able to meaningfully compare the relative performance of two different processors. Historically, processor performance has nearly doubled with each new Tock release, while performance usually goes up by 20-25% with a Tick release. This historical pattern is starting to change as Intel begins to focus more on power efficiency rather than increasing single-threaded performance.
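As a rough illustration of how that historical pattern compounds over two full Tick-Tock cycles, the sketch below treats 2x per Tock and 1.2x per Tick as loose averages, not guarantees:

```python
# Back-of-the-envelope compounding of the historical Tick-Tock pattern:
# roughly 2x per Tock release and roughly 1.2x per Tick release.
perf = 1.0
for release in ["Tock", "Tick", "Tock", "Tick"]:
    perf *= 2.0 if release == "Tock" else 1.2
    print(f"after {release}: {perf:.1f}x the baseline performance")
```

Under those assumptions, four releases (about four years) compound to nearly 5.8x the baseline, which is why skipping even one generation when you buy a database server leaves real performance on the table.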

Some of the recent and upcoming Intel Tick-Tock releases are shown in Figure 1.

Figure 1: Intel’s Tick-Tock Release Strategy

The manufacturing process technology refers to the size of the individual circuits and transistors on the chip. The Intel 4004 (released in 1971) used a 10-micron process; the smallest feature on the processor was 10 millionths of a meter across. By contrast, the Intel Xeon “Sandy Bridge” E5 series (released in 2012) uses a 32nm process. For comparison, a nanometer is one billionth of a meter, so 10 microns is 10,000 nanometers! This ever-shrinking manufacturing process is important for two main reasons:

- Increased performance and lower power usage: even at the speed of light, distance matters, so having smaller components that are closer together on a processor means better performance and lower power usage.
- Lower manufacturing costs: you can produce more processors from a standard silicon wafer. This helps make more powerful and more power-efficient processors available at a lower cost, which is beneficial to everyone, but especially to the database administrator.

The first Tock release was the Intel Core microarchitecture, which was introduced as the dual-core “Woodcrest” (Xeon 5100 series) in 2006, with a 65nm process technology. This was followed by a shrink to a 45nm process technology in the dual-core “Wolfdale” (Xeon 5200 series) and quad-core “Harpertown” (Xeon 5400 series) processors in late 2007, both of which were Tick releases.

The next Tock release was the Intel “Nehalem” microarchitecture (Xeon 5500 series), which used a 45nm process technology and was introduced in late 2008. In 2010, Intel released a Tick release, code-named “Westmere” (Xeon 5600 series), that shrank to a 32nm process technology in the server space. In 2011, the Sandy Bridge Tock release debuted with the E3-1200 series for single-socket servers and workstations. All of the other examples here are for two-socket servers, but Intel uses Tick-Tock for all of its processors. Figure 2 shows this “family history” for Intel server processors.

Year   Process   Model Families            Code Name
2006   65nm      3000, 3200, 5100, 7300    Woodcrest, Clovertown
2007   45nm      3100, 3300, 5400, 7400    Wolfdale, Harpertown
2008   45nm      3400, 3500, 5500, 7500    Nehalem-EP, Nehalem-EX (2010)
2010   32nm      3600, 5600, E7-4800       Westmere-EP, Westmere-EX (2011)
2011   32nm      E3-1200, E5-2600          Sandy Bridge, Sandy Bridge-EP (2012)
2012   22nm      E3-1200 v2, E5-2600 v2    Ivy Bridge, Ivy Bridge-EP/EX (2013)
2013   22nm      E3-1200 v3, E5-2600 v3    Haswell, Haswell-EP (2014?)
2014   14nm                                Rockwell
2015   14nm                                Skylake
2016   10nm                                Skymont

Figure 2: Recent and Upcoming Intel Processor Families