-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add devdocs on fixing precompile hangs #50914
Conversation
I'm also forgetting whether I have to do something to have this listed in the sidebar under "Developer Documentation", or whether that will happen automatically. |
Other thoughts:
|
Might it be worth mentioning using the pid with https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution which will print the profile output into the Or is profiling this kind of lingering task not very informative? |
I'm not certain that profiling would capture a timer since it fires rarely, but honestly I don't know. One issue is that getting the PID does not seem straightforward. The problem is that the detailed message doesn't show until you hit Ctrl-C, and at that point the process exits so there's nothing left to profile. The only way I've found to grab the PID while it's still running is to use |
That is a separate issue, which Ian already has a couple PRs up to fix. |
|
Update: I interpret JuliaLang/Pkg.jl#3583 as suggesting that adding the PID doesn't really help much.
If folks think this would be a good idea, I'd be happy to help with the C part of this effort (it's pretty trivial), but the Pkg logic is something that I could use some assistance with. |
The C part is just calling |
Sorry, what I meant was "triggering |
To be precise though, it fails on the first time the process calls |
I see, so you're saying this could even be an Aqua test with no Julia changes required? |
Potentially, yeah. That function just needs to be exposed somehow (which indeed will require a few small C changes) |
6284153 now has the C infrastructure I think we'd need. I've verified that ~/src/julia/julia --startup=no --project -e 'using NoTimer; exit(; wait_task=true)' exits cleanly for a fake package that does not start a timer in its Alternately, @vtjnash, It occurs to me that you once surprised me by saying that we probably could run |
If I may ask: Since this is a very user-facing problem, why is the documentation buried in the devdocs? Shouldn't there be at least an explanation in the main manual and/or the Pkg docs as well? |
Care to make a concrete suggestion of exactly where it should go? |
base/initdefs.jl
Outdated
@@ -25,8 +25,8 @@ Stop the program with an exit code. The default exit code is zero, indicating th | |||
program completed successfully. In an interactive session, `exit()` can be called with | |||
the keyboard shortcut `^D`. | |||
""" | |||
exit(n) = ccall(:jl_exit, Cvoid, (Int32,), n) | |||
exit() = exit(0) | |||
exit(n; wait_task::Bool=false) = ccall(:jl_exit, Cvoid, (Int32, Int32), n, wait_task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming bikeshedding time: should we say this is exit(0, process=true)
, since it specifies whether this call just kills this task or all tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with kill_tasks=true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Just made a couple small comments
Hmm, I realized one issue with the One thought would be to encourage people to write it function __init__()
if ccall(:jl_generating_output, Cint, ()) == 1 || parse(Bool, get(ENV, "JULIA_AQUA_TEST_INIT_TASKS", "false"))
return nothing
end
....
end but that's less than elegant since it requires a new environment variable simply to avoid a false positive. |
Given that the |
@@ -19,14 +19,17 @@ An array of the command line arguments passed to Julia, as strings. | |||
const ARGS = String[] | |||
|
|||
""" | |||
exit(code=0) | |||
exit(code=0; kill_tasks=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't kill the tasks, just waits for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is to kill all tasks (as before). The new optional behavior (kill_tasks=false) is to run the rest of the program to completion and only kill this task.
@@ -1852,7 +1852,7 @@ JL_DLLEXPORT int jl_is_initialized(void); | |||
JL_DLLEXPORT void jl_atexit_hook(int status); | |||
JL_DLLEXPORT void jl_task_wait_empty(void); | |||
JL_DLLEXPORT void jl_postoutput_hook(void); | |||
JL_DLLEXPORT void JL_NORETURN jl_exit(int status); | |||
JL_DLLEXPORT void JL_NORETURN jl_exit(int status, int kill_tasks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a breaking change; I think we should avoid it.
Maybe doesn't need to be on the milestone? |
One issue with this is that if you want to run a precompile workload involving that package as a dependency, the package will likely be non-functional. So I am not sure that is the best idea. |
It somewhat only fixes the symptom, rather than the cause. The cause is that the user failed to clean up some resource when they should have, and so there is a leak in the code. In some cases, this code should (in addition to the precompile step to avoid making each and every precompile process expensive and slow) also have an explicit |
I think "this" (meaning, a way of addressing the I/O hang, not actually this PR) does need to be on the milestone:
IMO it's the worst error in the language by a very large margin. But there seems to be very little we can do about it from a technical standpoint: for example, we don't have any tools for discovering where the task was launched from. So the only options seem to be documentation and, in difficult cases, encourage bisection-by-commenting. That said, I no longer think the
In JuliaTesting/Aqua.jl#174 I allude to that by saying "In more complex cases, you may need to set up independently-callable functions to launch the tasks and cleanly shut them down." But perhaps that section should be expanded. |
This is #50914, with only the documentation changes, plus an improvement to the warning message. --------- Co-authored-by: Tim Holy <tim.holy@gmail.com> Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
As promised, CC @MilesCranmer
I should say that these devdocs are not very satisfying. I didn't find it very useful to extract the PIDs, because I think the warning message from Pkg does the core job of telling you which causative package is hanging, and AFAICT that's the only useful thing you get from the PID. So the advice basically boils down to "comment half your code at a time and bisect until you've found the problematic lines, then stare at them until you figure out what's going on." There's a little more guidance than that, as you'll see if you review this PR, but I think we just don't have decent diagnostics to pinpoint what is going wrong.