cfe/SCH deadlocks on exit on Linux #701
Comments
Is this just an ordering issue? Shouldn't applications get deleted before the timers? EDIT: I see what you were saying now; the callback needs to get unregistered.
Is this reproducible, or is it a race condition during shutdown? If a thread is canceled while it is holding a lock, this type of thing can happen. That's the risk with any sort of forced-exit situation, which is why it's preferable to get tasks to shut themselves down rather than forcibly delete them.
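To illustrate the self-shutdown point, here is a minimal sketch using plain POSIX threads. It is not cFE or OSAL code, and every name in it (`table_lock`, `shutdown_requested`, `worker_main`, `run_cooperative_shutdown_demo`) is hypothetical. The worker only checks the shutdown flag between critical sections, so it can never exit while holding the lock, and the cleanup code can always acquire it afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not cFE/OSAL code. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool     shutdown_requested;
static int             work_done;

static void *worker_main(void *arg)
{
    (void)arg;
    /* The flag is checked only while the lock is NOT held, so the
     * worker always leaves table_lock released when it returns. */
    while (!atomic_load(&shutdown_requested)) {
        pthread_mutex_lock(&table_lock);
        work_done++;
        pthread_mutex_unlock(&table_lock);
        sched_yield();
    }
    return NULL;
}

/* Returns 0 if cleanup could take the lock after the worker exited. */
int run_cooperative_shutdown_demo(void)
{
    pthread_t worker;
    int       rc;

    atomic_store(&shutdown_requested, false);
    if (pthread_create(&worker, NULL, worker_main, NULL) != 0) {
        return -1;
    }

    /* Ask the worker to stop, then wait for it instead of cancelling it. */
    atomic_store(&shutdown_requested, true);
    rc = pthread_join(worker, NULL);
    assert(rc == 0);

    /* Cleanup path: the lock must be free because the worker exited cleanly. */
    rc = pthread_mutex_trylock(&table_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&table_lock);
    }
    return rc;
}
```

By contrast, if the worker were cancelled or killed between `pthread_mutex_lock` and `pthread_mutex_unlock`, the lock would stay held and any later cleanup that blocks on it would hang, which is the kind of deadlock discussed here.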
Right. As far as I can tell, Ctrl-C'ing the process immediately kills the apps, which prevents them from doing any clean shutdown, which means cfe needs to do the cleanup. However, this behavior hasn't been a problem for the SCH code base for many versions of cfe. The question is, what changed? What should the app do? What should the cfe/osal/psp do?
This is an intermittent problem, but it occurs often enough that it isn't rare.
Is this to say you are finding this occurring more frequently in the latest baseline vs. older baselines? If I'm interpreting correctly, you are running the latest bleeding-edge baseline, which would have changed the CTRL+C handling to be treated as an exception, thereby flowing through the ER log/processor reset sequence. This will still do a forced delete of all tasks, but it may change the timing of when that occurs, and maybe the order of operations. But that would have changed only in the most recent baseline.
As far as I know, it never occurred in the older baselines. And yes, I am working with the bleeding-edge master branches (see initial comment for hashes).
I am looking into this one, but I'm unable to replicate the issue as I'm not sure what version/config of SCH is used here. However, it could simply be that OS_ForEachObject, which drives the cleanup operations, finds the tasks and semaphores before the timers.
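If the ordering hypothesis holds, one way to sidestep it is a two-pass cleanup that removes timers first, so no timer callback can fire against a task or semaphore that has already been deleted. The sketch below only illustrates that idea. The types and functions (`object_t`, `delete_matching`, `cleanup_timers_first`) are hypothetical, it does not use the real OSAL API, and it is not necessarily what the eventual fix does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object table; not the OSAL object model. */
typedef enum { OBJ_TASK, OBJ_SEMAPHORE, OBJ_TIMER } obj_type_t;

typedef struct {
    obj_type_t type;
    int        deleted_at; /* 0 = still alive, otherwise 1-based deletion order */
} object_t;

/* Marks every live object whose "is a timer" status equals want_timers
 * as deleted, and returns the next free deletion sequence number. */
static int delete_matching(object_t *objs, size_t n, int want_timers, int next)
{
    for (size_t i = 0; i < n; i++) {
        int is_timer = (objs[i].type == OBJ_TIMER);
        if (objs[i].deleted_at == 0 && is_timer == want_timers) {
            objs[i].deleted_at = next++;
        }
    }
    return next;
}

/* Pass 1 deletes timers, pass 2 deletes everything else.
 * Returns the number of objects deleted. */
int cleanup_timers_first(object_t *objs, size_t n)
{
    int next = 1;

    assert(objs != NULL || n == 0);
    next = delete_matching(objs, n, 1, next); /* timers first */
    next = delete_matching(objs, n, 0, next); /* then tasks, semaphores, ... */
    return next - 1;
}
```

With a table that lists a task, a semaphore, and a timer in that order, a single-pass walk would reach the timer last, whereas the two-pass version deletes it first regardless of table order.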
@excaliburtb Is the backtrace posted in the initial summary showing every thread that still existed in the process or just the ones that were "stuck"? In particular I'm wondering about the task which runs SCH_AppMain, which is not shown above. This would normally be inside a
Resolved by nasa/osal#470
Using modules:
95f34d2 cfe
c2bcebbc4d7e60a41b604e9acfc8af3c60b8536a osal
37ee8eb2d7ce006dc1570b920ae75a7ac5f89d27 psp
There seems to be a deadlock on exit involving the timers used by SCH.
See stacktrace