A few months ago, while I was teaching wait statistics, I was asked whether there are any expected differences in wait stats when SQL Server is running in a virtual machine.
My answer was yes – there’s a possibility of seeing longer wait times if something prevents the VM from running, as the wait times are based on the __rdtsc counter difference (essentially the processor clock tick count) between the wait starting and ending.
In VMware or Hyper-V, if a thread inside SQLOS is waiting for a resource, and the VM has to wait to be scheduled by the hypervisor because the host is oversubscribed (more vCPUs allocated across VMs than the hardware pCPUs can service), then the resource wait time recorded in SQL Server includes the time the VM was unable to run. The wait therefore appears longer than it would have been had the VM not been delayed.
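Conceptually, the bookkeeping works like this (a hypothetical sketch in Python, not SQL Server's actual code): the tick count is read when the wait starts and again when it ends, so any period the VM is descheduled in between is silently folded into the reported wait time.

```python
# Hypothetical model of tick-based wait timing (not SQL Server source code).
# A monotonically increasing counter stands in for the __rdtsc value.

TICKS_PER_MS = 1_000  # assumed tick rate, purely for illustration

def measure_wait(start_tick: int, end_tick: int) -> float:
    """Wait time as the engine would compute it: raw tick difference in ms."""
    return (end_tick - start_tick) / TICKS_PER_MS

# A thread starts waiting at tick 0; the real resource wait is 5 ms,
# but the hypervisor deschedules the whole VM for 20 ms in the middle.
true_wait_ms = 5
vm_delay_ms = 20
end_tick = (true_wait_ms + vm_delay_ms) * TICKS_PER_MS

print(measure_wait(0, end_tick))  # 25.0 -- VM delay is indistinguishable from resource wait
```

The tick difference cannot distinguish time spent waiting on the resource from time the VM itself was off the physical CPU, which is exactly why the reported wait inflates.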
Whether this is problematic is an interesting discussion, but my view is that it could lead to someone chasing a SQL Server performance problem that’s actually a VM performance problem. Note: this isn’t a problem with the hypervisor; it’s caused by a misconfiguration of the virtual environment.
Anyway, after the class I got to thinking about thread scheduling in general on a VM that is periodically delayed from running and whether it could cause any other interesting effects around wait statistics.
Specifically, I was concerned about SOS_SCHEDULER_YIELD waits. This is a special wait type that occurs when a thread is able to run for 4ms of CPU time (called the thread quantum) without needing to get suspended waiting for an unavailable resource. In a nutshell, a thread must call into the SQLOS layer every so often to see whether it has exhausted its thread quantum, and if so it must voluntarily yield the processor. When that happens, a context switch occurs, and so a wait type must be registered: SOS_SCHEDULER_YIELD. A deeper explanation of this wait type is in my waits library here.
My theory was this: if a VM is prevented from running for a few milliseconds or more, that could mean that a thread that’s executing might exhaust its thread quantum without actually getting 4ms of CPU time, and so yield the processor causing an SOS_SCHEDULER_YIELD wait to be registered. If this happened a lot, it could produce a set of wait statistics for a virtualized workload that appears to have lots of SOS_SCHEDULER_YIELDs, when in fact it’s actually a VM performance problem and the SOS_SCHEDULER_YIELD waits are really ‘fake’.
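The theory can be sketched with a toy model (hypothetical Python, not SQL Server code): if the quantum-exhaustion check compares elapsed ticks since the thread got the CPU, a VM stall mid-quantum makes the check fire even though the thread consumed far less than 4ms of actual CPU time.

```python
# Hypothetical model of SQLOS-style quantum-exhaustion checking
# (an illustration of the mechanism, not SQL Server source code).

QUANTUM_MS = 4  # the SQLOS thread quantum

def should_yield(run_start_tick_ms: float, now_tick_ms: float) -> bool:
    """Has 4 ms of *elapsed* tick time passed since the thread got the CPU?

    Because the check uses raw elapsed ticks, it cannot tell CPU time
    apart from time the whole VM was descheduled by the hypervisor.
    """
    return now_tick_ms - run_start_tick_ms >= QUANTUM_MS

# Thread gets the CPU at tick 0, uses only 1 ms of real CPU time,
# then the VM is stalled for 10 ms before the next quantum check.
cpu_time_used_ms = 1
vm_stall_ms = 10

print(should_yield(0, cpu_time_used_ms + vm_stall_ms))  # True: a 'fake' SOS_SCHEDULER_YIELD
print(should_yield(0, 3))                               # False: quantum not yet exhausted
```

In the stalled case the thread yields, and an SOS_SCHEDULER_YIELD wait is registered, despite the thread having had only 1ms of CPU.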
I discussed this with my good friend Bob Ward from the SQL Product Group and after some internal discussions, they concurred that it’s a possibility because the thread quantum exhaustion time is calculated using the __rdtsc intrinsic when the thread starts executing, so any delay in the VM running could produce the effect I proposed.
Given that I’m a virtual machine neophyte, I asked Jonathan to run some tests inside of our VMware lab environment to see if he could show the issue happening. He ran a known workload that we use in our Immersion Events to demonstrate the performance impact of host oversubscription, causing a VM to be delayed, and lo and behold, he saw a substantially elevated level of SOS_SCHEDULER_YIELD waits (around 20x more) for the workload, compared to running the same workload on the same VM without any delays.
These same tests were repeated in our Hyper-V lab environment, which is identical in hardware and VM configuration to the VMware environment, and similar levels of elevated SOS_SCHEDULER_YIELD waits were seen. The issue is definitely not specific to any given hypervisor or virtual platform; it’s purely related to the host being oversubscribed for the workloads being run and the SQL Server VM having to wait for CPU resources to continue execution.
I’m deliberately not presenting Jonathan’s test results here because I’m not qualified to explain VMware esxtop output or Hyper-V performance counter values and how they correlate to the SOS_SCHEDULER_YIELD numbers to show the problem occurring. Jonathan will do a follow-up post in the next week or two that explains the results from a virtualization perspective.
However, with a simple set of tests we were able to show that with a VM that gets delayed from running, a SQL Server workload can show a much higher level of SOS_SCHEDULER_YIELD waits because of the use of the __rdtsc intrinsic to calculate thread quantum exhaustion times.
This is really interesting because this is a VM performance issue *causing* a wait type to appear, not just causing waits to be longer.
You should definitely consider this phenomenon if you’re investigating a high number of SOS_SCHEDULER_YIELD waits, a workload performance problem, and your workload is running in a VM. Jonathan explains how to correlate these waits with signs of a VM performance problem in this post.
Hope this helps!
43 thoughts on “Increased SOS_SCHEDULER_YIELD waits on virtual machines”
Will it be accompanied by high co-stop / ready time?
In a nutshell, yes, but Jonathan will blog the specifics in a week or two.
Please could you also explain the rdtsc counter?
as the wait times are based on the __rdtsc counter difference.
If you click on the link in the blog post, it takes you to the MSDN page that explains the counter – clock ticks since last CPU reset.
Great information. Were there any other waits which were raised as well, or just SOS_Sched?
I understand many other waits will be longer, but not a greater percentage.
However, somewhat unrelated to the blog:
Is it true that the SQLOS scheduler is oblivious to hyper-threading, meaning it may see that a scheduler is idle but be unaware that another scheduler may be using the same CPU core and maxing it out?
To give another example of a condition that manifests as an SOS_SCHEDULER_YIELD: we were recently suffering from timeouts for queries that use random user-supplied Start and End Date parameters. The queries timed out more frequently immediately after the weekly partitioning jobs, and the frequency gradually decreased as the week went on; this wait type decreased proportionally during the week too. Microsoft Support assisted and the problem was eventually diagnosed as…
The ‘archive’ table statistics were out of date and the queries were attempting to auto update stats, this was taking longer than the 30 second timeout. While the queries waited for the statistics to be updated, they displayed SOS_SCHEDULER_YIELD as the cause. We enabled ‘Auto Updated Statistics Asynchronous’ and the problem went away immediately. I searched the web for hours for ‘SOS_SCHEDULER_YIELD’ and couldn’t find one post that correlated it with statistics being updated.
Very interesting – I’ll need to repro and then blog this.
After discussing with the Product Group, SOS_SCHEDULER_YIELD isn’t a reliable indicator of a stats update happening. It just so happened that in your case, the data used for the stats update was all in memory already, and there was no contention while reading it. Hence the thread doing the update was able to run for 4ms at a time with no other resource wait. If the data hadn’t all been in memory, there would have been PAGEIOLATCH_SH waits and no SOS_SCHEDULER_YIELD waits.
Microsoft Support didn’t actually diagnose the problem for us in the end, they were brilliant and helped a lot but they didn’t pinpoint it. We ran PSSDIAG for them during the timeouts, then again when there were none, they thought the issue was due to compile locks. A lot of work was done to reduce recompiles but it didn’t make any difference.
There are tens of thousands of queries running each minute referencing views with UNION joins of archive and live tables, multiple tables containing hundreds of millions of rows. Less than 1% of the queries were timing out, but this was enough to cause a lot of people a lot of stress.
We eventually noticed this command running in an extended event session during the timeouts:
SELECT StatMan([SC0], [SB0000]) FROM (SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000] FROM (SELECT [ID] AS [SC0] FROM [archive].[TableName] TABLESAMPLE SYSTEM (2.562441e-001 PERCENT) WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SB0000] ) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 1)
It was mostly SOS_SCHEDULER_YIELDS showing for the requests that timed out but not exclusively, PAGEIOLATCH_SH waits were appearing too.
Thanks for looking at it, sorry for the slow reply, I didn’t get an email to tell me you had responded.
Just a clarification for any who may search for that quoted reference, “Auto Updated Statistics Asynchronous”. The actual setting is (in the UI) “Auto Update Statistics Asynchronously” (correcting the second and fourth word), while as a command it’s “auto_update_statistics_async” (abbreviated).
I appreciate the commenter was being helpful (and thanks for it!). And sure, most search engines would help you find the right setting. I just thought I’d share this while I was looking into it based on reading this, in case it may help other readers who come across this discussion.
Yes, the article helps! To begin with, I call this phenomenon Quantum Starvation (QS). What’s observed is longer query response times with disproportionate logical reads; that is, the logical reads for the query workload remain the same, but the overall response time significantly increases, along with, as you’ve noted, a significant increase in the SOS_SCHEDULER_YIELD count. To help identify QS, I’ve put together a test comprised of a few objects, data, an extended event (thanks, Paul!), and a query. The test is structured such that the thread for the query is never suspended waiting for a resource. If you like, I’ll gladly share what I have.
At the end of the above article, there’s talk of Jonathan’s findings from a virtualization perspective; can a link be provided to his findings? I’m curious to know what to look for on the VM end of things. Like an increase in ready time, misconfiguration with scheduling or anything else that would lead to an increase in the SOS_SCHEDULER_YIELD count.
It’s coming – still in the works.
Any progress on Jonathan’s findings?
Did Jonathan get around to blog about this? (I looked for it, but could not find it).
Not yet – still in the works.
You are the one who can verify whether this is accurate.
From the post below:
“Note that non-yielding scheduler time means how long a thread occupies the scheduler without yielding. It doesn’t always mean CPU execution time. The thread holding the scheduler may be held by the operating system if another application is using the same CPU at that point in time.”
In this case, if another app is contending for the CPU, the thread must have yielded, and it should show as an SOS_SCHEDULER_YIELD wait.
Am I right, please?
Anything that artificially extends the time that a thread appears to be using the CPU on a SQL Server scheduler can lead to spurious SOS_SCHEDULER_YIELD waits when the SQL Server thread is actually able to continue executing.
I did not understand whether there is any ratio between resource and signal wait times.
There is no ratio between them.
Will this have the knock on effect on CXPacket wait type?
My assumption is yes.
Maybe, maybe not – it’s absolutely not a certainty. It entirely depends on what the query plan is, which schedulers the threads for the parallel operation are on, and many other factors.
I’ve been observing occasional high SOS_SCHEDULER_YIELD wait times even though there’s enough spare CPU capacity in both the VM and the physical host. It usually happens when CPU utilization exceeds 50%. It seems that the SQLOS scheduler sometimes lets the thread wait in the queue instead of putting it on the CPU.
SOS_SCHEDULER_YIELD has nothing to do with whether there’s spare CPU capacity or not. If a thread runs for 4ms, it’ll cause an SOS_SCHEDULER_YIELD, always.
Sure, the waits will appear regardless of whether there’s spare CPU or not. But in case that there’s spare CPU, the total wait time should be negligible, because the thread that exhausted its 4ms quantum on CPU should get back on CPU immediately. In contrast, if there isn’t spare CPU, the thread will have to wait causing SOS_SCHEDULER_YIELD time to increase.
In my last comment, I was referring to a case where I observed relatively high SOS_SCHEDULER_YIELD wait time that wasn’t caused by CPU saturation, which, in my humble opinion, is an anomaly.
That depends on whether further coscheduling happens, so that the thread has to wait on the runnable queue for more real-world elapsed time even though the VM isn’t running during that time – that will increase the signal wait time and make the SOS_SCHEDULER_YIELD seem long (as signal wait time is the only kind of wait time for those waits), plus it makes all the other runnable threads’ signal wait times increase.
If you’re not seeing coscheduling issues, then SOS_SCHEDULER_YIELD is the same as for non-VM servers – truly long-running threads and likely caused by large scans of already-in-cache tables. Not an anomaly if the workload does that normally.
I’m not sure if I’ve understood the term “coscheduling” correctly. Do you mean by that a state where multiple threads are assigned to the same SOS scheduler and, in doing so, competing for the same CPU?
Coscheduling problems are when there are too many VMs on a host and some of them are prevented from running – so all the threads inside SQL Server get artificially higher wait times, and this is what leads to a VM resuming and suddenly all the threads having exceeded their quantum, leading to artificial SOS_SCHEDULER_YIELD waits.
With regard to your comment that SOS_SCHEDULER_YIELD might be normal for certain types of workloads, I set up a test case where I’m running a simple T-SQL loop. When only one session is executing the loop, the load is completely CPU bound, i.e. SOS_SCHEDULER_YIELD time is negligible.
But as I start increasing the number of concurrent sessions (below the point of saturation, that is load < #CPUs), the total SOS_SCHEDULER_YIELD time keeps increasing as well. The reason for this is that, at some point, SQL Server doesn't seem to use all the available schedulers to evenly distribute the load. Therefore, the threads start competing for a small number of schedulers, even though there are idle schedulers around. Consequently, SOS_SCHEDULER_YIELD wait time starts increasing while a thread waits to be put on a currently busy scheduler.
This makes me think that there's some scheduling issue within SQL Server, which gives rise to higher SOS_SCHEDULER_YIELD wait time. This problem appears on a non-VM server as well, so we can exclude coscheduling as the root cause in this particular case.
I elaborated on this observation here: https://nenadnoveljic.com/blog/sos_scheduler_yield-waits-during-low-cpu-utilization/
SOS_SCHEDULER_YIELD has nothing to do with CPU pressure – it’s when a thread can run without needing to wait for a resource for 4ms. It has nothing to do with whether there are schedulers available, solely to do with what code the thread is running. So your observation is correct – more threads running a workload that produces SOS_SCHEDULER_YIELD will mean more SOS_SCHEDULER_YIELDs, no matter the CPU utilization. But you’re drawing the wrong conclusion from it. There is no scheduling problem within SQL Server – I’m afraid you’re misunderstanding the entire issue in this post and what SOS_SCHEDULER_YIELDs are.
First of all, let me thank you for your patience so far. Hopefully, I won’t annoy you too much in trying to explain what makes me think that there is some scheduler contention.
Here are the elapsed times of 6 sessions concurrently executing an identical CPU bound workload (a simple TSQL loop without any queries):
We can see that the difference between the shortest and the longest run is 43%!
By the way, it was the only load on the server.
Below are the CPU times consumed by each scheduler:
Only 5 schedulers were handling the workload generated by 6 concurrent sessions. This means that at any time there was a thread waiting to get on a scheduler.
In my humble opinion, this time is measured with signal_wait_time of SOS_SCHEDULER_YIELD wait event.
wait_type            wait_time_ms  signal_wait_time_ms
-------------------  ------------  -------------------
SOS_SCHEDULER_YIELD          8840                 8837
(SOS_SCHEDULER_YIELD was the only wait event for the session.)
Indeed, if we add all the scheduler CPU times and SOS_SCHEDULER_YIELD signal wait time we’ll get the total elapsed time for all of the sessions.
While it’s true that the total SOS_SCHEDULER_YIELD wait time should increase with the load, the SOS_SCHEDULER_YIELD wait time per thread should remain constant, which isn’t the case.
NP at all. Right – because SQL Server, at the time you started the test, happened to put two of the threads on one scheduler, and threads never move between schedulers. So in the case of the two threads on one scheduler, each of them has to wait 4ms on the runnable queue when it comes off the processor. The four other threads, each with a scheduler to themselves, immediately get back on the scheduler again. A better way to demonstrate the constancy is to run a number of threads equal to 2x or 3x the visible schedulers, and then you should see avg signal wait time be around 4ms or 8ms, respectively. SQL Server doesn’t guarantee to use the least-busy scheduler when placing threads, and it becomes much more complex when you throw NUMA into the mix too, as there are a variety of thread placement algorithms it can choose from (and you can force). But in all cases, the SOS_SCHEDULER_YIELD is a workload artifact, not a scheduling artifact. Signal wait time is the scheduling artifact to examine. Thanks
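The 2x/3x point above can be sketched with a toy round-robin model (hypothetical Python, not SQL Server code, assuming a strict FIFO runnable queue and every thread exhausting its full 4ms quantum on each turn):

```python
# Toy model of one SQLOS scheduler under round-robin cooperative scheduling
# (an illustration of the arithmetic, not SQL Server source code).

QUANTUM_MS = 4  # the SQLOS thread quantum

def avg_signal_wait_ms(threads_on_scheduler: int) -> int:
    """With a strict FIFO runnable queue and every thread exhausting its
    quantum, a yielding thread waits behind all the other threads on the
    scheduler before it runs again."""
    return (threads_on_scheduler - 1) * QUANTUM_MS

print(avg_signal_wait_ms(1))  # 0: a lone thread gets straight back on the CPU
print(avg_signal_wait_ms(2))  # 4: two threads per scheduler -> ~4 ms signal wait
print(avg_signal_wait_ms(3))  # 8: three threads per scheduler -> ~8 ms signal wait
```

This is why the two doubled-up threads in the test saw ~4ms of signal wait per yield while the four solo threads saw essentially none.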
One more thing, SQL 2016 changed the runnable queue logic to no longer be strict FIFO, but instead takes into account how much of the thread quantum each thread is able to use when it gets the processor, and will adjust thread priority in the runnable queue to try to help with cases where some threads exhaust their quantum every time, hence getting more CPU than other threads on the scheduler.
Such scheduling algorithms give rise to two problems: volatile response times and significantly lower performance with increased load.
T8008 allegedly better distributes threads across schedulers, but I measured only a minor improvement (on SQL Server 2017).