-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel/mutex: Fix races, make unlock rescheduling #20919
Conversation
The k_mutex is a priority-inheriting mutex, so on unlock it's possible that a thread's priority will be lowered. Make this a reschedule point so that reasoning about thread priorities is easier (possibly at the cost of performance): most users are going to expect that the priority elevation stops at exactly the moment of unlock. Note that this also reorders the code to fix what appear to be obvious race conditions. After the call to z_ready_thread(), that thread may be run (e.g. by an interrupt preemption or on another SMP core), yet the return value and mutex weren't correctly set yet. The spinlock was also prematurely released. Fixes zephyrproject-rtos#20802 Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Need some more discussion and investigation but possible inclusion early into 2.2. |
As mentioned in triage: I'm concerned that this---as a resolution to #20802---implements a change to the documented behavior of cooperative threads, in that it proposes that releasing a mutex can take control away from a cooperative thread and give it to another thread with a higher priority. A cooperative thread is supposed to release control only when it is made unready. Is releasing a mutex an operation that satisfies this condition? Is that documented anywhere? I fear that this reduces the value of cooperative threads. In any case I think some documentation updates may be warranted. (edit: Let me be clear: I understand the value of yielding to a higher priority thread when a mutex blocking it is released. I just don't think that behavior is allowed with the thread model as currently defined.) |
@pabigot, I tend to agree with what you are saying. This, to me, falls under the category of "APIs that may yield control" and as you point out, this releasing of a mutex currently is not documented to release control in a cooperative thread environment. I suspect that it should allow a swap but that might reveal different problems when you change the underlying scheduler functionality. |
Thanks for using that term, which I hadn't seen before. It seems that the description of cooperative and preemptible threads is not entirely in sync with the description of cooperative and preempting context switches. Neither currently allows a swap on release of a mutex, and the context switching discussion more clearly disallows it:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread behavior in Zephyr is documented in three locations which in aggregate are not entirely consistent:
- Threads
- Scheduling
- The Interrupt and Context Switching sections of the Architecture Porting Guide
The Scheduling page specifies that:
Once a cooperative thread becomes the current thread, it remains the current thread until it performs an action that makes it unready
(It subsequently weakens this description to allow superseding when the current thread invokes k_yield()
which doesn't make the thread unready, but does explicitly cause the scheduler to select a new ready thread.)
The architecture page description specifically states:
A cooperative context switch happens when a thread willfully gives the control to another thread.
and
Control is never taken from cooperative thread when one of them is the running thread.
From what I can tell the change in this PR breaks the contract for cooperative threads because the current thread (which invoked the release) will be replaced by the new thread without having been made unready or yielding.
Note that #20802 has been downgraded to a question. If the answer to the question is that "yes, in this case a context switch should occur" then the description of cooperative thread behavior should be revised prior to merging this change.
(NB: I got here because of #18970 for which we need to know the conditions under which a thread may be swapped for another. This proposal and #20802 make those conditions unclear.)
I think this opens up the discussion on cleaning up cooperative thread operation and documentation. My perspective is that any synchronization activity should be documented to "may" yield control. Where k_yield, k_mutex_unlock, and k_sem_give fall into this group (sema_give and yield already do this). This PR, if the operation is generally agreed on, should update the documentation appropriately to clearly identify this action as one of the actions that may yield control. |
Hi @pabigot , this would not be a problem if there would be a clear list of API functions that are expected to reschedule. For instance giving a semaphore will reschedule why not unlocking the mutex? |
Certainly cleaning up documentation. Until now cooperative threads have been most clearly described as not being swap-able until they are made unready (or invoke How much existing code would break if this behavior changed and threads could lose control where they didn't before?
Absolutely. My point is that is a big API change that must be discussed. |
@pdunaj Why do you believe giving a semaphore will reschedule? I don't see it in documentation; it doesn't seem to from the thread (tested); and the documentation clearly indicates it shouldn't from an ISR. |
We seem to be paralyzed again. @pabigot: I think a documentation-level audit and specification pass that documents reschedule points is definitely needed. Can we get you on board as a volunteer? As far as this patch goes: is there clarity on the desired behavior? My first opinion was that it wasn't needed, consensus seemed to be that it was, now maybe things are swinging the other way. Should I just close this? |
Actually, contrary to what @andyross says here: #20802 (comment) k_sem_give does have a call to |
To a point, yes. From what I can tell the material in the porting architecture section was mostly superseded by the material in the thread and scheduling sections (e.g. architecture still references But certainly in support of #18970 I can make an attempt at marking functions that can cause a context switch. Right now "reschedule point" is not a defined term, so it will need to be clarified and added to #20959. In support of the overall effort I could use some assistance in understanding possible call contexts and filling out the "Context Terminology" section.
I think we need to come to a conclusion in #20802 whether it was a bug as originally proposed, or a question to which the answer is "no, it doesn't work that way", or something else. |
Yeah, I was just wrong about k_sem_give() it looks like. The broader point is that we do indeed mix paradigms across otherwise similar APIs. |
And @pabigot: we still need an answer on this patch, even pending doc rework. Yes or no? We should be able to make a decision on the right behavior here without attempting to get everything right simultaneously. |
OK, I see that now. Can somebody explain to me why that isn't an error, given the documented behavior clearly indicates it should not happen? I can't do anything when implementation is inconsistent with the thread architecture description. I'm no on this patch until the expected behavior is well-defined and consistent with this. (Also, why didn't this cause a failure in a test that verified the cooperative thread context switch behavior was intended?) At a higher level I'm no on a change that allows cooperative threads to be switched out when they did not make themselves unready, and I'm deeply disappointed that there's existing behavior inconsistent with that straightforward behavioral description. |
Opened #21111 to track the doc work that blocks changes to this behavior. |
Looks like a bug, the reschedule was always there for semaphores, however we used to check if the calling thread was preemptible and that is not there any more (bug no. 1). Bug no. 2 is that it does not do that at all for mutexes, so we need to add the reschedule for mutexes but check if we are preemtible first. Bug no. 3 are other objects, but if we fix reschedule in bug no. 2 it should be fixed for others as well, as long as the do call reschedule(). |
@pabigot can we at least agree that k_mutex_unlock() isn't correct for preemptible threads? Right now, if I understand the code correctly, if a preemptive thread releases a k_mutex that was waited on by a higher priority thread, it will continue to run afterwards for some indeterminate time until some other event causes a reschedule. This is horrible, it screws up latency because the higher priority thread is not scheduled in as soon as it should be, it pretty much breaks the point of having priority-inheriting mutexes. I feel this is a bug must be fixed. And that goes for all of our IPC object APIs, they must behave in this way for preemptible threads. I have no objections for not invoking a reschedule when the k_mutex_unlock() caller is cooperative (or indeed any IPC object). If we want to leave it how it is, fine, let's be sure that it's documented. |
correct, this is just a bug that went through and was unnoticed for over a 1.5 years... I think the documentation as we have it is correct already. the fixes should be straightforward, we have all the logic afaik. |
For whatever it's worth the test I did was https://github.com/pabigot/zephyr/commits/nomerge/coop-example. If |
Actually @nashif @pabigot we need to hold on a second here. Can someone run by me again why we have singled out k_sem_give()?? I don't think this is limited to k_sem. I just went through and looked at the kernel code. It appears that All of our IPC which does a non-blocking wakeup of other threads, except k_mutex(), is calling z_reschedule() without considering whether the caller is cooperative. Unless I'm missing something, the kernel code is consistent with the exception of k_mutex, and this patch, as-is, makes the API behave like everything else.
If you want to contend that the documentation of cooperative thread semantics isn't correct and that our IPC APIs need some clarity, sure, but at least in my view it should not block this patch. |
@andrewboie I singled out I've failed to independently confirm that it does, but am deferring to the analysis here that says it does. If there's evidence that this PR does not make A really good way to provide that evidence is to add a test that demonstrates it. |
@andrewboie FreeRTOS is one example where the semaphore/mutex will yield if the task waiting on the item is higher priority. I think mbedOS does the same but the code is a little harder to follow. @pabigot I understand the documentation is at odds with implementation and this PR. I was wondering if there was some other reason you felt this was wrong because documentation can be modified to align with expectations or the code base can be modified to align with documentation. |
@pabigot I don't think you understand my point. EVERY kernel IPC API which does a non-blocking wakeup of other threads has a reschedule point. The only omission was k_mutex_unlock(), and this was not anyone's design for this to be a special case, it's just a bug. You seem to be arguing that because k_mutex_unlock() doesn't currently do this, that this bugged behavior needs to be preserved for co-op threads, and you seem to be basing this solely on the documentation for what co-operative threads are, and not because k_mutex was intended to be this way. I do not find your argument compelling. We need to just fix k_mutex to work like all of our other IPC instead of adding a special case (merge this patch as-is), and if we want a separate discussion on whether IPC should yield the CPU for co-op threads that's another discussion. |
actually freertos uses |
This PR is described as "fixing" (which I interpret as "change behavior to match the stated expectations of") #20802 where the complaint was:
This behavior should be expected for pre-emptible threads, but not for cooperative threads. #20802 is asking for a behavior change inconsistent with documented long-term behavior of Zephyr. However: Regardless of comments in #20802 and here, which I believed, I see no evidence that cooperative threads ever lose control after making higher-priority cooperative threads ready. Which means the behavior is as it's documented to be, and I don't see evidence this PR would change that. I'm removing my block, but would be nice if this was updated to not claim to fix #20802 which should instead be closed as "works as designed". If somebody does want cooperative threads to be preemptible, we can argue about that elsewhere. |
Actually the specific mistake seems to have happened in commit 3a0cb2d: I was removing code in the arch layers that was needlessly interpreting "cooperative", but hit the spot in _reschedule() too, and that needed to stay. Prior to that commit a cooperative thread would never swap underneath _reschedule(), after that the preemptibility was ignored. |
[edit: my understanding was incorrect and the kernel does the right thing with z_reschedule() calls if the caller is co-op] |
@pabigot @andyross @nashif I might be slow in catching up to something -- do we believe that z_reschedule() already does the right thing if called from co-op context? If so then my original table is incorrect and the true policy as it currently exists is:
In which case this patch is probably OK...but we need to verify that if this check exists, it is using the thread's original base priority, I have my doubts that it does.
I'd rather we kept it open because it's still busted for preempt threads, no matter what. |
It does. Ignore my comment above, the code that was removed was correctly removed, I just fooled myself based on the discussion here. z_reschedule() will not swap away from a coop thread except to a metairq of higher priority. |
I talked to @andyross offline and it looks like this actually does the right thing. So we should be good. |
yes, and we have a test to catch this if it was a bug, looking at the test, it seems to be doing the right thing. |
Changing milestone back to 2.1, @dleach02 lets talk about this tomorrow. |
Thank you all for checking this. Why I was expecting sem give to cause reschedule? My understanding was that by calling sem give / mutex unlock, code is passing the execution to another thread that is waiting for resource. Since this call is explicit I assumed it is a scheduling point that would work the same way as yield. I think it sums up to what one expect to be the function that yields. I checked what Linux kernel does for PREEMPT_NONE and PREEMPT_VOLUNTARY. For both it does not reschedule when sem give is called. It seems that I spent too much time working with preemptive kernel... Going back to Zephyr's documentation it is almost clear on this. There are following items I would add to clarify the subject:
The meaning is very clear but there is no information about which API calls can move the thread to unready state. I think it would be good to add a link here pointing to where it can be found.
This is list is not complete as any API that may result in thread sleeping can also do it. This may be the good place to put the @andrewboie list. |
@pdunaj I think you may be misunderstanding us. None of these APIs reschedule if the caller is co-op, the implementation of this policy is within z_reschedule() -- it's a no-op if the caller is co-op. We don't need to change the documentation. For callers from co-operative threads, k_mutex_unlock() is just like other IPC APIs -- none of them reschedule after waking other threads. The reason why this patch is going in is that for preemptible threads, the reschedule needs to happen, and a call to z_reschedule() was missing completely from the k_mutex_unlock() implementation. So there was a bug, but not quite what you were reporting. For your bug #20802 this is not a fix for you, this only a fix if the caller is a preempt thread. Your coop threads need to call k_yield() if they want to give up the CPU immediately after unlocking a mutex. |
@andrewboie Thanks.
It is possible. But surely I spent most of my career working with preemptive threads. In kernel and user space. For me non-preemptive threads were seen as the ones giving control over CPU at rescheduling points. I actually never bother to look for explicit list as it was never needed. When I read in documentation of threading that cooperative thread gives CPU when becoming unready I understood it as "in rescheduling point". Since list mentioned above is not complete (e.g. mutex_lock, sem_take, etc. are not there) I expanded it in my head to include such cases. And simply included too much ;) I am writing it as other people may have the same problem and wrongly assume that there is a rescheduling point where in fact there is none.
You don't need to. Still it would be great to include explicit lists of rescheduling points. And point to it in documentation of threads. Especially that it will be short.
That I understood from the @andyross comments in this PR. Thanks anyway :) |
@pdunaj and others: Please look at #20959 and specifically the formatted document accessible from the PR description. The intent is to clearly define terms that can be used in API descriptions of behavior. In its current form API that can cause a cooperative thread to be made non-Running is "suspendable". (The term may not be perfect, but it has to be balanced among many other terms.) This is intended to summarize, not replace, existing documentation on thread and scheduling behavior, though some changes to that existing documentation may be necessary to resolve inconsistencies and clarify (e.g. "made unready" is implicitly defined, but should be explicit). From discussion here I've learned that I will need to add "reschedule-point" and "meta-irq" terms, and clarify the behavior of several generalized state changes related to the three (not two) classes of thread behavior. That should happen this week. Ultimately these tags are expected to be used as in #21061 which IMO is a better approach than putting a table in the documentation. |
There is a difference between |
@pabigot As I said earlier, the intent of this document is really confusing and I would not make it a point of reference. If you have any proposals to changes in the documentation, please submit those to the existing documentation that we have already, duplicating things in different places is not going to help, there should always be 1 single source of truth. It is ok to have some terminology page where things are briefly explained, but the above is becoming documentation on its own already. |
We're going off topic, but: I think it does, because it invokes However, whether
As I noted It's intended to be self contained for now, but once we understand things the existing documentation will be updated to address gaps and the new material specific #18970 refactored into something that integrates with it. I don't see how to get to consensus on terms without having a place where they can be discussed together without having to look at three separate descriptions of thread and scheduler behavior. |
Obviously k_mutex_lock reschedules if the lock is contended, it says that the calling thread waits right in the documentation for k_mutex_lock. Your bug report was about expecting a reschedule after k_mutex_unlock...this is getting confusing |
The k_mutex is a priority-inheriting mutex, so on unlock it's possible
that a thread's priority will be lowered. Make this a reschedule
point so that reasoning about thread priorities is easier (possibly at
the cost of performance): most users are going to expect that the
priority elevation stops at exactly the moment of unlock.
Note that this also reorders the code to fix what appear to be obvious
race conditions. After the call to z_ready_thread(), that thread may
be run (e.g. by an interrupt preemption or on another SMP core), yet
the return value and mutex weren't correctly set yet. The spinlock
was also prematurely released.
Fixes #20802
Signed-off-by: Andy Ross andrew.j.ross@intel.com