Non-Uniform Memory Access (NUMA) is a type of computer design popping up more and more these days. In simple terms, it means that when a processor core accesses different areas of the server’s memory, not all of that memory is reached at the same speed. Some memory may be faster or slower depending on where it’s wired up relative to that core.
Let’s get started by looking at a truly different kind of server: the IBM 3950. It’s a rack-mount server with four CPU sockets, which means we can plug in four Intel Xeon processors. It has 32 DIMM slots for memory, so we can stuff it with up to a terabyte of memory. When we plug in a keyboard, mouse, and monitor, and we push the power button, our operating system boots up. Simple enough so far.
Things start to get complicated with the special scalability ports. You can wire two or more of these servers together, populate them all with CPUs and memory, and they become one larger server. The picture below shows two IBM x445’s on top of each other for visualization purposes – the 445 is an older relative of the 3950, and it had the same capabilities.
When you hit the power button on one, both 3950’s light up simultaneously, and the BIOS boot screen shows 8 CPUs and the total amount of memory in both servers. You can install Windows (or VMware or whatever) and the whole server acts as one pool of resources. Well, not exactly the same pool – more like two pools, to be more specific, because when a CPU in one of the 3950’s needs to access something stored in the other 3950’s memory, it has to travel through that scalability cable that connects the two boxes together. There’s a delay – small, but measurable – involved with this cross-server access. We’re accessing memory that isn’t on our motherboard – it’s over in another server next door.
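To put a rough shape on that delay, here’s a back-of-the-napkin sketch in Python. The latency numbers are made-up placeholders, not measurements from a 3950, and the function name is just something I picked for illustration. The only point is how the average access time climbs as more of your reads have to cross the scalability cable.

```python
# Toy model: average memory latency when some accesses cross to the other box.
# LOCAL_NS and REMOTE_NS are illustrative assumptions, not measured values.
LOCAL_NS = 100.0
REMOTE_NS = 300.0


def average_latency_ns(remote_fraction, local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Weighted average latency for a given share of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 1.0):
        print(f"{fraction:.0%} remote -> {average_latency_ns(fraction):.0f} ns average")
```

Swap in numbers measured on your own hardware if you have them; the real gap between local and remote access varies by server.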
NUMA-aware applications like SQL Server recognize that not all memory is created equal. SQL Server knows which memory lives closest to each CPU and tries to make scheduling decisions based on the server’s architecture.
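As a rough illustration of what “NUMA-aware” means, here’s a toy placement rule in Python: run the work on the node that already holds most of its memory, as long as that node has a free core. This is not how SQL Server actually schedules anything. The function and the dictionaries are hypothetical, and a real scheduler weighs far more than this.

```python
# Toy NUMA-aware placement: prefer the node that already holds most of the
# workload's memory, as long as it has a free core. All names are hypothetical.

def pick_node(memory_gb_by_node, free_cores_by_node):
    """Return the node id with the most local memory among nodes with a free core.

    Returns None when no node has a free core.
    """
    candidates = [n for n, free in free_cores_by_node.items() if free > 0]
    if not candidates:
        return None
    # Ties break toward the lowest node id so the result is deterministic.
    return max(sorted(candidates), key=lambda n: memory_gb_by_node.get(n, 0))


if __name__ == "__main__":
    # 24GB of the workload's memory sits on node 0, 8GB on node 1.
    print(pick_node({0: 24, 1: 8}, {0: 2, 1: 4}))
    # Node 0 is fully busy, so the work falls back to node 1 and pays the remote penalty.
    print(pick_node({0: 24, 1: 8}, {0: 0, 1: 4}))
```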
Unfortunately, all this goes out the window with virtualization. Depending on load and the sysadmin’s decisions, your virtual SQL Server can move around not just from core to core, but from host to host, live and in real time. In the case of our IBM 3950’s, our SQL Server could be running on the top server one minute, and on the bottom server the next! If our SQL Server has 32GB of memory living in the top box, but we’re scheduling threads on the CPUs in the bottom box, we’re going to pay a penalty every time we try to grab data out of memory in the “wrong” box.
Newer hypervisors try to patch things up by being aware of where the threads are getting scheduled versus where the memory lives. When VMware vSphere moves a virtual machine to a different NUMA node, it copies the virtual machine’s memory over to the new NUMA node in the background to gradually eliminate the overhead associated with accessing memory across nodes. (This is yet another reason why I push clients to upgrade their hypervisors as fast as possible.) Does your hypervisor support that, and to be more specific, does it support that feature on your physical hardware? Only your vendor and your hypervisor know for sure, and you have to constantly read the release notes of each hypervisor version to figure things out. Call me crazy, but I love doing stuff like that, and that’s how I learn the most about what WASN’T supported in the last version of the hypervisor. When a new version brags about doing something to get more performance from NUMA configurations, that tells me the previous version didn’t have that capability – and maybe its competitors don’t either.
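To picture the background copy described above, here’s a toy simulation in Python. It is not VMware’s actual algorithm: the page count, the batch size, and the function name are all invented for illustration. It only shows the general idea that the share of remote memory shrinks a little on every pass until nothing is left on the old node.

```python
# Toy simulation of background memory migration after a VM moves NUMA nodes.
# Page counts and the per-tick batch size are made-up illustrative values.

def migrate_in_background(remote_pages, batch_size):
    """Yield the number of pages still on the old node after each tick."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while remote_pages > 0:
        moved = min(batch_size, remote_pages)
        remote_pages -= moved
        yield remote_pages


if __name__ == "__main__":
    total_pages = 1000
    for tick, remaining in enumerate(migrate_in_background(total_pages, 300), start=1):
        print(f"tick {tick}: {remaining / total_pages:.0%} of memory still remote")
```

Until the last batch lands, some accesses still cross nodes, which is why the overhead fades gradually instead of disappearing the moment the virtual machine moves.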
The easiest answer today is to simply avoid big servers as hypervisors – there’s no real need to use a single 8-socket host instead of a 2-socket pizza box or blade. The bigger hosts are more expensive to buy and offer less flexibility. When you toss in the performance risks of hypervisors and big virtual machines, it’s no contest. Smaller is better.
Here are a few links to learn more about NUMA, CPU scheduling, and virtualization on big servers:
- IBM Redbook – Virtualization on the IBM 3950
- VMware vSphere 4 – The CPU Scheduler and the updated vSphere 4.1 CPU Scheduler
- IBM Redbook – VMware Scale Up or Scale Out? – older whitepaper covering an out-of-date version of VMware ESX, but it’s interesting to learn about the challenges behind CPU scheduling.
- Hyper-V CPU Scheduling – doh! I can’t find a decent whitepaper explaining how Hyper-V schedules threads, but if you know about one, let me know.