Search Engine Q&A #5: Do multi-core CPUs perform better than single-core CPUs?

Here’s an interesting question I was sent by my friend Steve Jones over at SQL Server Central – will a single CPU with dual-cores perform better than two single-core CPUs? Both have two processing cores but the hardware architecture is different – which one will make SQL Server perform better? Well, there’s no hard and fast answer – it depends! I had a discussion on this topic this morning with Jerome Halmans, part of my old team in the SQL Server Storage Engine and I’m basing this post on our discussion with his permission.

My hypothesis (which Jerome confirmed) was that the performance of the two architectures depends on the amount of cache line invalidations and how that is managed (see here for a description of CPU caches and cache lines).

  • On the single-core machine, cache line invalidations needs to go across the main bus between the two CPUs, involving bus arbitration delays.
  • On the dual-core machine, cache line invalidations don’t go across the bus because the two cores are contained within the same CPU package. In fact, if the architecture is smart enough, it may be able to just remap the cache line from one processing core to the other, thus avoiding any data copying. I’m not sure if such an architecture exists though.

Here is a very interesting and accessible Intel article that discusses cache-sharing in multi-core Intel systems. In this paper at least, the L2 cache is shared but modifications made by different cores in their private L1 caches still need to bounce through the shared L2 cache before being loaded by the other core. This will still be WAY faster than having to go through main bus between single-core CPUs.

And here is a similar paper from AMD on their Barcelona multi-core architecture that describes each core having separate L1 and L2 caches, with an additional shared L3 cache. The seperate L2 caches are kind-of linked though, in that modifications to a cache line in one L2 cache are immediately mirrored in the other L2 caches (if needed).

But the amount of cache invalidations (of whatever kind) depends on the workload. The two types of workload to consider are:

  1. Where the workload has very independent characteristics, so the data being processed by a thread on one processing core is unrelated to that being processed by a thread on the other core. There should be very few cache line invalidations. In this case, the single-core CPUs will have all their local caches full of data relevant to just the thread running. The two cores on the dual-core CPU will need to share some level of on-chip cache and so their may well be more churn in the cache. In this case I’d expect the single-core CPUs to perform better.
  2. Where the workload is such that data is shared, and threads touch data being processed by others threads on other cores. In this case, the single-core CPUs will fall victim to massive amounts of cache line invalidations, whereas the dual-core CPUs will do on-chip cache line invalidation (of whatever type is supported by the architecture). In this case I’d expect the multi-core CPU to outperform the two single-core CPUs.

Saying that, the majority of workloads on SQL Server are of the second type above. Jerome mentioned that even synthetic workloads (such as the TPCC benchmark) are still going to result in multiple-threads accessing and changing the same data/index pages.

So – what’s the conclusion? I expect that a multi-core CPU will outperform an equivalent number of single-core CPUs in most workloads. And as Jerome pointed out, even if that’s not the case for your workload, you’ll find it pretty hard to find a system that ships with single-core CPUs these days.

I’d love to hear any comments on this, especially any measurements you’ve done on workloads as I don’t have any single-core machines available to run tests on – even the laptop I’m typing this on is a dual-core Centrino.

5 thoughts on “Search Engine Q&A #5: Do multi-core CPUs perform better than single-core CPUs?

  1. I certainly won’t argue with your reasoning. I will suggest that another factor in deciding whether workload needs to cross the CPU boundary is related to the ccNUMA architecture, which Windows and SQL Server support. I’ve performed our large scale performance & scalability testing on both Intel and AMD based CPU architectures and noticed a significant difference in CPU load distribution. AMD CPUs support ccNUMA – and SQL Sever distributes load very differently on these CPUs than it does on the Intel CPUs. I haven’t tested the latest Intels though (my testing experience in this area is 2 years old). At the end of the day, both CPUs gave us the same performance result.

    We tested on the HP DL580 (Intel) and 585 (AMD), both 4 socket, dual core servers. It was interesting to notice that SQL Server kept 2 sockets very busy on the 585 – redirecting queries to the same sockets where they started thus allowing them to access their local cache data. The DL585 has several numa nodes, where-as the 580 only had 1. So on the Intel server load was evenly distributed across all 4 processors. On the AMD the workloads favored 2 CPUs. I would guess, based upon your blog posting, that there were many cross CPU accesses on the Intel, and fewer on the AMD.

    Again – both gave more than adaquate performance. However the profile was different. Each had a different approach, but got the job done.

Leave a Reply

Your email address will not be published. Required fields are marked *

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.