Database Virtualization – The Dirty Little Secret Nobody is Talking About…

Had I known the number of responses I would have gotten about running SQL Server in virtual environments, I would have spent more time writing a questionnaire! 

I guess I’m also pleasantly surprised that so many of you read this blog – you must all be smarter and better looking than the average blog reader, for sure ;).

ok, I learned a few things of interest and I will sum them up here.  Then I’ll write something more technical where you might learn a few things from me to help you in your day-to-day.

1. A number of you run SQL in virtualized environments.  VMWare more than VirtualPC (to be expected, given the relative state of the two products).  Perhaps the mix will change after Windows 2008 becomes available.

2. More than a few of you run production SQL Server systems on virtualized hardware.  I’m still asking a few clarifying questions, but it appears that this is not limited to tiny systems or lower-volume systems.

3. Within these production systems cases, the most common benefits related to manageability.  Specifically, moving the VM to a new machine when an old one died or needed to be reconfigured.  This makes sense.

4. A number of people run single-user systems to do development, try out some new configuration, etc.  Often you don’t care about the system performance in this case.

5. For the group in between the single-user system and the deployed systems, there is frustration about VM configuration.  Typically the overhead is high unless you invest in a beefy storage solution (for example, a RAID-1 or RAID-5 array directly attached ).  Configuring storage and memory properly for the VM is critical for good performance.

I may return to this question in the future with a more systematic survey.  I didn’t even ask you if you were running ESX vs. the non-ESX versions of VMWare or whether your CPUs have the VT capability in them!  Thanks for all who shared data with me – I learned a few things and have more to ponder.

I’ve been playing with VMWare more than average recently, and I’ve had some fun and some pain.  My experiences are not dissimilar – it can be good for manageability, but it can also be slower. 

OK now for the bit where I tell you a few things you might not know ;).

You probably know that MS isn’t particularly interested in you doing this, and there’s been press on the licensing (which SKUs you are do virtualization on, etc) around this for a bit.  MS is working hard on its own enterprise-class virtualization story, and I’m sure it will be at least reasonable. 

Within the server itself, there is suprisingly little knowledge of a lot of things in this area that are important to performance.  SQL Server’s core engine assumes things like:

1. all CPUs are equally powerful
2. all CPUs process instructions at about the same rate.
3. a flush to disk should probably happen in a bounded amount of time.

These are simplifying assumptions that are often completely untrue, but SQL Server doesn’t want to get into the business of forcing you to configure a bunch of numbers in order to use the product (or pay a high-priced consultant, for that matter – at least not to get started ;). 

Virtualization, from the system implemenation perspective, has a similar impact to processor hyperthreading.  Hyperthreading, for those who haven’t been reading Intel’s website much lately, allows a superscalar (multi-pipeline) CPU to schedule mutiple threads of work at the same time as long as they don’t require the same pipelines.  This sounds awesome, right?  You can increase your instructions per cycle on the same hardware with only minimal extra transitors of scheduling overhead.  Well, this works great as long as you build an OS from the ground up that exposes threads as its primary primitive instead of processes.  Then you can just have a pile of threads that execute all over the place, usually without much care about how fast they are running, and you end up with everything running faster.  When Windows consumed this, it wasn’t so simple.  The scheduler schedules threads to CPUs, and you can’t have a part of a CPU.  So, we have logical and physical CPUs.  To most applications, they see 1 CPU per allowed thread.  this makes everything work but it hides the fact that these things are not in any way the same unit of processing power.  So some threads get assigned to fast CPUs and some get assigned to slow CPUs.  Sometimes, you get assigned back and forth.  ack!

Now let’s throw SQL Server on top of this.  It’s running lots of threads in one process.  They get scheduled essentially randomly across this matrix of real and fake CPUs.  Some user applications are one thread per user, and they work fine (OLTP applications).  In fact, some of them work a bit faster (10-20%) because there are “more” CPUs.  This is great, since you pay by the socket for Microsoft software (unlike some of those other guys ;). 

Data warehouse applications, however, are a much, much different beast.  A good deal of the benefit of buying the top-end SKU is to get parallel queries.  This speeds up queries by running one query on multiple threads.  It trades CPUs for runtime performance improvements.  The QP has lots of smarts to try to split up things into equal chunks of work and to then run each of those chunks on a different thread on a different CPU.  Usually the pattern for something like an index build or a hash join would be:

1. get everything onto different threads
2. do some work
3. wait for all those threads to complete
4. do something else in the query

So now I have some threads that finish earlier than others.  So they block until the slowest threads finish.  Even worse, I don’t think that the query re-allocates those threads for other queries until the whole query finishes.  So, now you have some background as to why hyperthreading was not recommended for at least some SQL Server deployments.

Virtualization has some of the same issues.  Things don’t proceed at the same rate on each CPU, assuming you define more than one CPU for the virtual machine.  When your physical machine is under a high load from other VMs, you get less CPU, memory, IO, etc.  For occasional applications, all of this is fine – you can trade this speed for consolodation, managebility, etc.  Occasionally things are slower.  That’s fine.  For high-volume applications, this is usually not as good.

Other issues with Virtualization – memory is a big one.  SQL Server assumes, at least in the main server SKUs, that it is the only significant memory consumer on the machine.  It’s a *server*.  (SQLExpress has different assumptions, but it’s no memory slouch either).  Now, SQL Server will work in a memory constrained environment, but you often don’t want to do that.  You take that away from a lot of different things – the buffer pool, the compiled plan cache, memory to execute queries (for example, hash join grants).  All of these things can add up if you aren’t careful.

I/O is the area where I have the least experience in virtualization.  This is one of the reasons I asked people about production SQL Servers.  Usually they did get some storage array, and this makes sense – it ramps the I/O bandwidth and usually isolates it from any other operations on the machine (your OS, your application you are developing on top of SQL Server, etc).  I’m going to spend some more time on this, but I think the core idea is sound – as you start sharing your I/O bandwidth over several VMs, you are going to hit limits earlier with big IO consumers like SQL Server.  The same basic logic applies – isolate your database traffic onto different storage paths, especially when building a system to scale.  In a VM world, this can let you avoid the sharing penalties vs. the default config of everyone sharing the same hard drive.

Had you asked me while I worked at MS about a query performance problem on virtualized hardware (even from the MS product), the answer would have been “don’t do that”.  I think eventually there will be enough push for some other answer than that.  (Reminder – I don’t work there anymore, and my opinion is purely my own at this point).

I hope that makes this a fair trade 🙂

I’m off to bed.  See ya’ll.

Conor

Other articles

New blog location

Here is the new blog  – please update those readers. http://blogs.msdn.com/conor_cunningham_msft/default.aspx I’m getting settled into working for Microsoft again – it’s basically like drinking from

Explore

The Trouble with Triggers

(Apologies to Star Trek). I received a question about trigger performance, especially the do’s and dont’s about how they are used. Let’s first start by

Explore

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.