(The Curious Case of… used to be part of our bi-weekly newsletter but we decided to make it a regular blog post instead so it can sometimes be more frequent. It covers something interesting one of us encountered when working with a client, doing some testing, or were asked in a random question from the community.)
While I was teaching IEPTO2 last week, I was discussing why sometimes a thread cannot be terminated using the KILL command, and thought it would make a great topic for a post.
Some of you have likely seen a phenomenon called a non-yielding scheduler. This is where a thread is using the processor and doesn’t voluntarily yield after using more than the thread quantum (4 milliseconds, unchangeable). There’s a background task called the scheduler monitor that checks that progress is being made on the various schedulers inside SQL Server and issues a warning if it finds a problem. For a non-yielding scheduler, you’ll see error 17883, as below:
Process 56:0:0 (0xdee) Worker 0x041611F6 appears to be non-yielding on Scheduler 2. Thread creation time: 13884536031127. Approx Thread CPU Used: kernel 18 ms, user 263 ms. Process Utilization 0%. System Idle 98%. Interval: 331220558 ms.
There are a variety of reasons these can occur, including I/O subsystem issues, slow calls out to a Windows API, or SQL Server bugs.
In the SQL Server bug case, it’s because the thread entered some code where it’s able to loop without checking whether the 4ms quantum has expired (by calling the SQLOS function YieldAndCheckForAbort, or one of a few variants). If the thread doesn’t check whether the quantum has expired, it won’t know, and so won’t yield, and then you’ve got a non-yielding scheduler. The bug is that there’s a missing call to YieldAndCheckForAbort.
Your first impulse will likely be to KILL the offending SPID, but you can’t. Well, you can run the command, but it likely won’t do anything. This is because there’s no way to *force* a thread to terminate from within SQL Server. The thread has to check whether it’s been asked to terminate, and then it terminates itself (and any sub-threads it may have created if it’s doing a parallel operation). And guess where that check is done? In the YieldAndCheckForAbort function! So if the thread doesn’t ever check, it won’t know, and so won’t terminate.
And that’s why sometimes if the non-yielding scheduler doesn’t resolve itself, you end up having to restart SQL Server.