Monday, April 21, 2008
A follow-up from my previous post.

Is COUNT(1) faster than COUNT(*)?

No, COUNT(1) is mapped into COUNT(*) early in the QP.  From the logic of my previous post, COUNT(1) is always non-null and thus is the same as just doing a COUNT(*).  Since it is semantically equivalent to COUNT(*), the QP just maps it to COUNT(*) because it makes things simpler and you can avoid special cases for things like local-global aggregation and moving group bys around joins.

You can actually look at the query plan for "SELECT COUNT(1) from Table;" and see that the output plan is COUNT(*).

You can save it a few instructions by using COUNT(*).  You will also make the ANSI SQL gods happier.

I get a lot of questions about these syntax issues - it makes sense, as this is often very interesting in traditional procedural programming languages.  In many cases, there are only very minor performance differences, if any, in the SQL language due to these.  The big differences happen because your slightly different query actually means something semantically different and thus implies a different amount of work to compute.

Please keep sending in your questions - I'm happy to answer them!

Thanks,

Conor Cunningham

Monday, April 21, 2008 7:02:39 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Wednesday, February 06, 2008
Had I known the number of responses I would have gotten about running SQL Server in virtual environments, I would have spent more time writing a questionnaire! 

I guess I'm also pleasantly surprised that so many of you read this blog - you must all be smarter and better looking than the average blog reader, for sure ;).

ok, I learned a few things of interest and I will sum them up here.  Then I'll write something more technical where you might learn a few things from me to help you in your day-to-day.

1. A number of you run SQL in virtualized environments.  VMWare more than VirtualPC (to be expected, given the relative state of the two products).  Perhaps the mix will change after Windows 2008 becomes available.

2. More than a few of you run production SQL Server systems on virtualized hardware.  I'm still asking a few clarifying questions, but it appears that this is not limited to tiny systems or lower-volume systems.

3. Within these production systems cases, the most common benefits related to manageability.  Specifically, moving the VM to a new machine when an old one died or needed to be reconfigured.  This makes sense.

4. A number of people run single-user systems to do development, try out some new configuration, etc.  Often you don't care about the system performance in this case.

5. For the group in between the single-user system and the deployed systems, there is frustration about VM configuration.  Typically the overhead is high unless you invest in a beefy storage solution (for example, a RAID-1 or RAID-5 array directly attached ).  Configuring storage and memory properly for the VM is critical for good performance.

I may return to this question in the future with a more systematic survey.  I didn't even ask you if you were running ESX vs. the non-ESX versions of VMWare or whether your CPUs have the VT capability in them!  Thanks for all who shared data with me - I learned a few things and have more to ponder.

I've been playing with VMWare more than average recently, and I've had some fun and some pain.  My experiences are not dissimilar - it can be good for manageability, but it can also be slower. 


OK now for the bit where I tell you a few things you might not know ;).

You probably know that MS isn't particularly interested in you doing this, and there's been press on the licensing (which SKUs you are do virtualization on, etc) around this for a bit.  MS is working hard on its own enterprise-class virtualization story, and I'm sure it will be at least reasonable. 


Within the server itself, there is suprisingly little knowledge of a lot of things in this area that are important to performance.  SQL Server's core engine assumes things like:

1. all CPUs are equally powerful
2. all CPUs process instructions at about the same rate.
3. a flush to disk should probably happen in a bounded amount of time.

These are simplifying assumptions that are often completely untrue, but SQL Server doesn't want to get into the business of forcing you to configure a bunch of numbers in order to use the product (or pay a high-priced consultant, for that matter - at least not to get started ;). 

Virtualization, from the system implemenation perspective, has a similar impact to processor hyperthreading.  Hyperthreading, for those who haven't been reading Intel's website much lately, allows a superscalar (multi-pipeline) CPU to schedule mutiple threads of work at the same time as long as they don't require the same pipelines.  This sounds awesome, right?  You can increase your instructions per cycle on the same hardware with only minimal extra transitors of scheduling overhead.  Well, this works great as long as you build an OS from the ground up that exposes threads as its primary primitive instead of processes.  Then you can just have a pile of threads that execute all over the place, usually without much care about how fast they are running, and you end up with everything running faster.  When Windows consumed this, it wasn't so simple.  The scheduler schedules threads to CPUs, and you can't have a part of a CPU.  So, we have logical and physical CPUs.  To most applications, they see 1 CPU per allowed thread.  this makes everything work but it hides the fact that these things are not in any way the same unit of processing power.  So some threads get assigned to fast CPUs and some get assigned to slow CPUs.  Sometimes, you get assigned back and forth.  ack!

Now let's throw SQL Server on top of this.  It's running lots of threads in one process.  They get scheduled essentially randomly across this matrix of real and fake CPUs.  Some user applications are one thread per user, and they work fine (OLTP applications).  In fact, some of them work a bit faster (10-20%) because there are "more" CPUs.  This is great, since you pay by the socket for Microsoft software (unlike some of those other guys ;). 

Data warehouse applications, however, are a much, much different beast.  A good deal of the benefit of buying the top-end SKU is to get parallel queries.  This speeds up queries by running one query on multiple threads.  It trades CPUs for runtime performance improvements.  The QP has lots of smarts to try to split up things into equal chunks of work and to then run each of those chunks on a different thread on a different CPU.  Usually the pattern for something like an index build or a hash join would be:

1. get everything onto different threads
2. do some work
3. wait for all those threads to complete
4. do something else in the query

So now I have some threads that finish earlier than others.  So they block until the slowest threads finish.  Even worse, I don't think that the query re-allocates those threads for other queries until the whole query finishes.  So, now you have some background as to why hyperthreading was not recommended for at least some SQL Server deployments.

Virtualization has some of the same issues.  Things don't proceed at the same rate on each CPU, assuming you define more than one CPU for the virtual machine.  When your physical machine is under a high load from other VMs, you get less CPU, memory, IO, etc.  For occasional applications, all of this is fine - you can trade this speed for consolodation, managebility, etc.  Occasionally things are slower.  That's fine.  For high-volume applications, this is usually not as good.

Other issues with Virtualization - memory is a big one.  SQL Server assumes, at least in the main server SKUs, that it is the only significant memory consumer on the machine.  It's a *server*.  (SQLExpress has different assumptions, but it's no memory slouch either).  Now, SQL Server will work in a memory constrained environment, but you often don't want to do that.  You take that away from a lot of different things - the buffer pool, the compiled plan cache, memory to execute queries (for example, hash join grants).  All of these things can add up if you aren't careful.

I/O is the area where I have the least experience in virtualization.  This is one of the reasons I asked people about production SQL Servers.  Usually they did get some storage array, and this makes sense - it ramps the I/O bandwidth and usually isolates it from any other operations on the machine (your OS, your application you are developing on top of SQL Server, etc).  I'm going to spend some more time on this, but I think the core idea is sound - as you start sharing your I/O bandwidth over several VMs, you are going to hit limits earlier with big IO consumers like SQL Server.  The same basic logic applies - isolate your database traffic onto different storage paths, especially when building a system to scale.  In a VM world, this can let you avoid the sharing penalties vs. the default config of everyone sharing the same hard drive.

Had you asked me while I worked at MS about a query performance problem on virtualized hardware (even from the MS product), the answer would have been "don't do that".  I think eventually there will be enough push for some other answer than that.  (Reminder - I don't work there anymore, and my opinion is purely my own at this point).


I hope that makes this a fair trade :)

I'm off to bed.  See ya'll.

Conor

Wednesday, February 06, 2008 11:13:08 PM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 
Wednesday, January 30, 2008
So one of the features I had a hand in building in SQL Server 2005 is really useful for helping SQL Server work well in large data warehouse deployments.

I led a team writing the paper that was published about this, and you can read it in my previous blog post here.  The paper covers the theory but not the implementation.

The basic idea is that SQL Server has a lot of competing memory consumers within a single process space.  Often the naive approach (block incoming requests until there is enough memory) will fail under extreme circumstances because there is even incoming requests take some amount of memory, and you can starve on memory waiting for memory.  Additionally, many of the components are built to run best with fewer queries running in the system - for example, hash joins run a lot faster when they do not spill to disk.  When you add it all up, you have a bunch of different memory consumers with a bunch of different optimal policies to achieve optimal performance.

For the query optimizer, we tried a number of different approaches in the naive space before we eventually settled on a tiered set of monitors that block based on how much memory an in-process memory optimization has consumed compared to the free memory in the system.  This meant that the system would not block any trivially small query from compiling, allow a certain number of normal (OLTP) queries to compile, allow a fewer number of medium (think basic DSS) queries to compile, and have one final bucket to allow for the largest query in human history to compile.  (Note that I am not saying "execute" - there are a different set of queues for those queries).

What we found was that, as SQL Server started to be used in larger and larger deployments, the optimizer would consume more memory in those situations where large DW queries were coming into the system.  This led to out of memory errors thrown during compilation or execution.  The cost of restarting these queries was high (often the plan wasn't in the cache yet or was thrown out of the cache quickly due to memory pressure), and the system would just go into a memory-thrashing fit.

So how do I tell if my system is in a memory-thrashing fit due to compilation-overload?  Well, there is a command called DBCC MEMORYSTATUS that will tell you a bit about where you are.  If you scroll through the various rowsets returned, you'll eventually find one called "Optimization Queue".  This reflects the total memory available for all optimizations in the system, and the SQL OS layer controls this.  If lots of other things need memory, this will get clamped down.  The next 3 should be familiar by now - they are the various gateways to block queries of various sizes.  The "Threshold factor" is the amount of memory at which the optimizer will start trying to block that query from continuing until other queries in that bucket have completed compilation.

The "(default)" postfix is for normal user queries.  The "(internal)" queues are for internal queries.  There aren't a ton of internal queries.

Here's a screenshot of this in action:



Now, last I checked I didn't have a large-scale data warehouse running on my desktop, so these numbers are pretty tame right now.  You can run this command on a production system to see where you stand - it is intended to run quickly for debugging purposes on running systems, actually.

Enjoy!

Conor Cunningham
Wednesday, January 30, 2008 9:03:13 AM (Central Standard Time, UTC-06:00)  #    Comments [0]  | 

Theme design by Jelle Druyts

Pick a theme: