Here’s an interesting question I was sent by my friend Steve Jones over at SQL Server Central – will a single dual-core CPU perform better than two single-core CPUs? Both configurations have two processing cores but the hardware architecture is different – which one will make SQL Server perform better? Well, there’s no hard and fast answer – it depends! I had a discussion on this topic this morning with Jerome Halmans, part of my old team in the SQL Server Storage Engine, and I’m basing this post on our discussion with his permission.

My hypothesis (which Jerome confirmed) was that the performance of the two architectures depends on the amount of cache line invalidations and how that is managed (see here for a description of CPU caches and cache lines).

  • On the machine with two single-core CPUs, cache line invalidations need to go across the main bus between the two CPUs, incurring bus arbitration delays.
  • On the machine with the dual-core CPU, cache line invalidations don’t go across the bus because the two cores are contained within the same CPU package. In fact, if the architecture is smart enough, it may be able to just remap the cache line from one processing core to the other, thus avoiding any data copying. I’m not sure if such an architecture exists though.

Here is a very interesting and accessible Intel article that discusses cache-sharing in multi-core Intel systems. In this paper at least, the L2 cache is shared but modifications made by different cores in their private L1 caches still need to bounce through the shared L2 cache before being loaded by the other core. This will still be WAY faster than having to go through the main bus between single-core CPUs.

And here is a similar paper from AMD on their Barcelona multi-core architecture that describes each core having separate L1 and L2 caches, with an additional shared L3 cache. The separate L2 caches are kind-of linked though, in that modifications to a cache line in one L2 cache are immediately mirrored in the other L2 caches (if needed).

But the number of cache line invalidations (of whatever kind) depends on the workload. The two types of workload to consider are:

  1. Where the workload has very independent characteristics, so the data being processed by a thread on one processing core is unrelated to that being processed by a thread on the other core. There should be very few cache line invalidations. In this case, the single-core CPUs will have all their local caches full of data relevant to just the thread running. The two cores on the dual-core CPU will need to share some level of on-chip cache and so there may well be more churn in the cache. In this case I’d expect the single-core CPUs to perform better.
  2. Where the workload is such that data is shared, and threads touch data being processed by other threads on other cores. In this case, the single-core CPUs will fall victim to massive amounts of cache line invalidations, whereas the dual-core CPU will do on-chip cache line invalidation (of whatever type is supported by the architecture). In this case I’d expect the dual-core CPU to outperform the two single-core CPUs.

Having said that, the majority of workloads on SQL Server are of the second type above. Jerome mentioned that even synthetic workloads (such as the TPC-C benchmark) are still going to result in multiple threads accessing and changing the same data/index pages.

So – what’s the conclusion? I expect that a multi-core CPU will outperform an equivalent number of single-core CPUs in most workloads. And as Jerome pointed out, even if that’s not the case for your workload, you’ll find it pretty hard to find a system that ships with single-core CPUs these days.

I’d love to hear any comments on this, especially any measurements you’ve done on workloads as I don’t have any single-core machines available to run tests on – even the laptop I’m typing this on is a dual-core Centrino.