Add devdocs on fixing precompile hangs #50914

Closed
wants to merge 7 commits into from
9 changes: 6 additions & 3 deletions base/initdefs.jl
@@ -19,14 +19,17 @@ An array of the command line arguments passed to Julia, as strings.
const ARGS = String[]

"""
exit(code=0)
exit(code=0; kill_tasks=true)
Member commented:

It doesn't kill the tasks, just waits for them?

Member replied:

The default is to kill all tasks (as before). The new optional behavior (kill_tasks=false) is to run the rest of the program to completion and only kill this task.


Stop the program with an exit code. The default exit code is zero, indicating that the
program completed successfully. In an interactive session, `exit()` can be called with
the keyboard shortcut `^D`.

Setting `kill_tasks=false` will cause Julia to wait for running tasks (other than the
main task) to finish before exiting.
"""
exit(n) = ccall(:jl_exit, Cvoid, (Int32,), n)
exit() = exit(0)
exit(n; kill_tasks::Bool=true) = ccall(:jl_exit, Cvoid, (Int32, Int32), n, kill_tasks)
exit(; kwargs...) = exit(0; kwargs...)

const roottask = current_task()

1 change: 1 addition & 0 deletions doc/make.jl
@@ -157,6 +157,7 @@ DevDocs = [
"devdocs/gc-sa.md",
"devdocs/gc.md",
"devdocs/jit.md",
"devdocs/precompile_hang.md",
],
"Developing/debugging Julia's C code" => [
"devdocs/backtraces.md",
Binary file added doc/src/devdocs/img/precompilation_hang.png
89 changes: 89 additions & 0 deletions doc/src/devdocs/precompile_hang.md
@@ -0,0 +1,89 @@
# Fixing precompilation hangs due to open tasks or IO

On Julia 1.10 or higher, you might see the following message:

![Screenshot of precompilation hang](./img/precompilation_hang.png)

If you follow the advice and hit `Ctrl-C`, you might see

```
^C Interrupted: Exiting precompilation...

1 dependency had warnings during precompilation:
┌ Test1 [ac89d554-e2ba-40bc-bc5c-de68b658c982]
│ [pid 2745] waiting for IO to finish:
│ TYPE[FD/PID] @UV_HANDLE_T->DATA
│ timer @0x55580decd1e0->0x7f94c3a4c340
```

and, depending on how long you waited, this may repeat.

This message conveys two key pieces of information:

- the hang is occurring during precompilation of `Test1`, a dependency of `Test2` (the package we were trying to load with `using Test2`)
- during precompilation of `Test1`, Julia created a `Timer` object (use `?Timer` if you're unfamiliar with Timers) which is still open; until that closes, the process is hung

If this is enough of a hint for you to figure out where `timer = Timer(args...)` is being created, one good solution is to add, before the final `end` of the module, either `wait(timer)` (if `timer` eventually finishes on its own) or `close(timer)` (if you need to force-close it).
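
As a minimal sketch of the two fixes just described, the statements below stand in for whatever `Test1` does at top level. The variable names `one_shot` and `ticker` are hypothetical and are not taken from any real package.

```julia
# Hypothetical top-level statements that would live inside `module Test1 ... end`.

# Case 1: a timer that fires once, 0.1 s after creation.
one_shot = Timer(0.1)
wait(one_shot)      # block until the timer has fired

# Case 2: a repeating timer never finishes on its own, so close it explicitly.
ticker = Timer(0.1; interval=1)
close(ticker)
```

After `close(ticker)`, `isopen(ticker)` returns `false`, so the timer no longer counts as open IO when the module finishes loading.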

However, some cases are not that straightforward. Usually the best option is to start by determining whether the hang is due to code in `Test1` or to one of `Test1`'s dependencies:

1. `Pkg.develop("Test1")`
2. Comment out all the code `include`d or defined in `Test1`, *except* the `using/import` statements
3. Try `using Test2` again (or even `using Test1`, assuming that hangs too)
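
Repeating step 3 by hand gets tedious, so one option is to run the load in a subprocess and treat "still running after a generous timeout" as a hang. The helper below is only a sketch: the name `still_running_after` is hypothetical, the timeout has to be longer than any legitimate precompilation time, and killing the outer process may not stop a precompilation worker it has spawned.

```julia
# Hypothetical helper: start `cmd` and report whether it is still running
# after `timeout` seconds. A process that is still running is killed.
function still_running_after(cmd::Cmd, timeout::Real)
    p = run(cmd; wait=false)
    # Poll until the process exits or the timeout elapses.
    status = timedwait(() -> process_exited(p), Float64(timeout))
    if status === :timed_out
        kill(p)
        return true
    end
    return false
end

# Possible usage (assumes the active project can load Test2):
#   julia = Base.julia_cmd()
#   still_running_after(`$julia --startup-file=no --project -e 'using Test2'`, 600)
```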

Now we arrive at a fork in the road: either

- the hang persists, indicating it is due to one of your dependencies
- the hang disappears, indicating that it is due to something in your code

## If the hang is due to a package dependency

Use a binary search to identify the problematic dependency: start by commenting out half your dependencies; once you have isolated which half is responsible, comment out half of that half, and so on. (You don't have to remove them from the project, just comment out the `using`/`import` statements.)
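
The bisection itself can be written down as a short sketch. It assumes exactly one dependency is responsible and that the hang appears whenever that dependency is loaded. The function name `find_culprit` and the predicate `still_hangs` are hypothetical: `still_hangs(enabled)` stands for "leave only these `using` statements uncommented, retry, and report whether the hang is still there".

```julia
# Hypothetical bisection over a list of dependency names.
# `still_hangs(enabled)` must return true if the hang occurs when only the
# dependencies in `enabled` are loaded.
function find_culprit(candidates::Vector{String}, still_hangs)
    lo, hi = 1, length(candidates)
    while lo < hi
        mid = (lo + hi) ÷ 2
        if still_hangs(candidates[lo:mid])
            hi = mid        # culprit is in the first half of the range
        else
            lo = mid + 1    # culprit is in the second half of the range
        end
    end
    return candidates[lo]
end
```

Each round halves the remaining range, so a package with `n` dependencies needs roughly `log2(n)` retries rather than `n`.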

Once you've identified a suspect (here we'll call it `ThePackageYouThinkIsCausingTheProblem`), first try precompiling that package. If it also hangs during precompilation, continue chasing the problem backwards.

However, most likely `ThePackageYouThinkIsCausingTheProblem` will precompile fine. This suggests the problem is in the function `ThePackageYouThinkIsCausingTheProblem.__init__`, which does not run during precompilation of `ThePackageYouThinkIsCausingTheProblem` but *does* run during precompilation of any package that loads `ThePackageYouThinkIsCausingTheProblem`. To test this theory, set up a minimal working example (MWE), something like

```julia
(@v1.10) pkg> generate MWE
Generating project MWE:
MWE\Project.toml
MWE\src\MWE.jl
```

where the source code of `MWE.jl` is

```julia
module MWE
using ThePackageYouThinkIsCausingTheProblem
end
```

and you've added `ThePackageYouThinkIsCausingTheProblem` to MWE's dependencies.

If that MWE reproduces the hang, you've found your culprit:
`ThePackageYouThinkIsCausingTheProblem.__init__` must be creating the `Timer` object. If the timer object can be safely `close`d, that's a good option. Otherwise, the most common solution is to avoid creating the timer while *any* package is being precompiled: add

```julia
ccall(:jl_generating_output, Cint, ()) == 1 && return nothing
```

as the first line of `ThePackageYouThinkIsCausingTheProblem.__init__`, and it will avoid doing any initialization in any Julia process whose purpose is to precompile packages.
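
For context, here is a sketch of what such a guarded `__init__` might look like in full. The module name `InitGuardSketch` and the `HEARTBEAT` reference are hypothetical stand-ins for the real package and its timer; only the `ccall` line comes from the text above.

```julia
module InitGuardSketch  # hypothetical stand-in for the real package

# Holds the timer once it exists; stays `nothing` in precompilation processes.
const HEARTBEAT = Ref{Union{Nothing,Timer}}(nothing)

function __init__()
    # Skip all initialization when this process is generating precompile output.
    ccall(:jl_generating_output, Cint, ()) == 1 && return nothing
    # Ordinary session: create the repeating timer as usual.
    HEARTBEAT[] = Timer(_ -> nothing, 0; interval=1)
    return nothing
end

end # module
```

Keeping the timer in a `Ref` also gives other code a handle it can `close` later if needed.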

## If the hang is in your code

Search your package for suggestive words (here, something like "Timer") and see if you can identify where the problematic object is being created. Note that a method *definition* like

```julia
maketimer() = Timer(timer -> println("hi"), 0; interval=1)
```

is not problematic in and of itself: it can cause this problem only if `maketimer` gets called while the module is being defined. This might be happening from a top-level statement such as

```julia
const GLOBAL_TIMER = maketimer()
```

or it might conceivably occur in a [precompile workload](https://github.com/JuliaLang/PrecompileTools.jl).
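
One possible way to remove a top-level call like the one above, offered as an assumption rather than as the recommended fix, is to create the timer lazily on first use. The accessor name `global_timer` is hypothetical; `maketimer` is repeated from the earlier snippet (with a no-op callback) so the sketch is self-contained.

```julia
maketimer() = Timer(timer -> nothing, 0; interval=1)

# Nothing is created while the module is being defined.
const GLOBAL_TIMER = Ref{Union{Nothing,Timer}}(nothing)

# Hypothetical accessor: create the timer the first time it is requested.
function global_timer()
    t = GLOBAL_TIMER[]
    if t === nothing
        t = maketimer()
        GLOBAL_TIMER[] = t
    end
    return t
end
```

With this layout, defining the module opens no timer. The caveat is that any top-level statement or precompile workload that calls `global_timer()` would bring the problem back unless it also closes the timer.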

If you struggle to identify the causative lines, then consider doing a binary search: comment out sections of your package (or `include` lines to omit entire files) until you've narrowed down the source of the problem.
4 changes: 3 additions & 1 deletion src/init.c
@@ -202,8 +202,10 @@ static void jl_close_item_atexit(uv_handle_t *handle)
void jl_task_frame_noreturn(jl_task_t *ct) JL_NOTSAFEPOINT;

// cause this process to exit with WEXITSTATUS(signo), after waiting to finish all julia, C, and C++ cleanup
JL_DLLEXPORT void jl_exit(int exitcode)
JL_DLLEXPORT void jl_exit(int exitcode, int kill_tasks)
{
if (!kill_tasks)
jl_task_wait_empty();
jl_atexit_hook(exitcode);
exit(exitcode);
}
2 changes: 1 addition & 1 deletion src/julia.h
@@ -1852,7 +1852,7 @@ JL_DLLEXPORT int jl_is_initialized(void);
JL_DLLEXPORT void jl_atexit_hook(int status);
JL_DLLEXPORT void jl_task_wait_empty(void);
JL_DLLEXPORT void jl_postoutput_hook(void);
JL_DLLEXPORT void JL_NORETURN jl_exit(int status);
JL_DLLEXPORT void JL_NORETURN jl_exit(int status, int kill_tasks);
Member commented:

This is probably a breaking change; I think we should avoid it.

JL_DLLEXPORT void JL_NORETURN jl_raise(int signo);
JL_DLLEXPORT const char *jl_pathname_for_handle(void *handle);
JL_DLLEXPORT jl_gcframe_t **jl_adopt_thread(void);
4 changes: 2 additions & 2 deletions src/rtutils.c
@@ -34,7 +34,7 @@ JL_DLLEXPORT void JL_NORETURN jl_error(const char *str)
{
if (jl_errorexception_type == NULL) {
jl_printf(JL_STDERR, "ERROR: %s\n", str);
jl_exit(1);
jl_exit(1, 1);
}
jl_value_t *msg = jl_pchar_to_string((char*)str, strlen(str));
JL_GC_PUSH1(&msg);
@@ -50,7 +50,7 @@ jl_value_t *jl_vexceptionf(jl_datatype_t *exception_type,
jl_printf(JL_STDERR, "ERROR: ");
jl_vprintf(JL_STDERR, fmt, args);
jl_printf(JL_STDERR, "\n");
jl_exit(1);
jl_exit(1, 1);
}
char *str = NULL;
int ok = vasprintf(&str, fmt, args);
6 changes: 3 additions & 3 deletions src/signals-win.c
@@ -81,7 +81,7 @@ void __cdecl crt_sig_handler(int sig, int num)
signal(SIGINT, (void (__cdecl *)(int))crt_sig_handler);
if (!jl_ignore_sigint()) {
if (exit_on_sigint)
jl_exit(130); // 128 + SIGINT
jl_exit(130, 1); // 128 + SIGINT
jl_try_throw_sigint();
}
break;
@@ -220,7 +220,7 @@ static BOOL WINAPI sigint_handler(DWORD wsig) //This needs winapi types to guara
}
if (!jl_ignore_sigint()) {
if (exit_on_sigint)
jl_exit(128 + sig); // 128 + SIGINT
jl_exit(128 + sig, 1); // 128 + SIGINT
jl_try_deliver_sigint();
}
return 1;
@@ -334,7 +334,7 @@ LONG WINAPI jl_exception_handler(struct _EXCEPTION_POINTERS *ExceptionInfo)
if (recursion++)
exit(1);
else
jl_exit(1);
jl_exit(1, 1);
}

JL_DLLEXPORT void jl_install_sigint_handler(void)
4 changes: 2 additions & 2 deletions src/staticdata.c
@@ -2537,7 +2537,7 @@ static void jl_save_system_image_to_stream(ios_t *f, jl_array_t *mod_array,
(intmax_t)sysimg.size,
((uintptr_t)1 << RELOC_TAG_OFFSET)
);
jl_exit(1);
jl_exit(1, 1);
}
if (const_data.size / sizeof(void*) > ((uintptr_t)1 << RELOC_TAG_OFFSET)) {
jl_printf(
@@ -2546,7 +2546,7 @@ static void jl_save_system_image_to_stream(ios_t *f, jl_array_t *mod_array,
(intmax_t)const_data.size,
((uintptr_t)1 << RELOC_TAG_OFFSET)*sizeof(void*)
);
jl_exit(1);
jl_exit(1, 1);
}
htable_free(&s.callers_with_edges);

2 changes: 1 addition & 1 deletion src/task.c
@@ -696,7 +696,7 @@ JL_DLLEXPORT JL_NORETURN void jl_no_exc_handler(jl_value_t *e, jl_task_t *ct)
jlbacktrace(); // written to STDERR_FILENO
if (ct == NULL)
jl_raise(6);
jl_exit(1);
jl_exit(1, 1);
}

/* throw_internal - yield to exception handler */
8 changes: 8 additions & 0 deletions test/atexit.jl
@@ -260,4 +260,12 @@ using Test
end
end
rm(atexit_temp_dir; force = true, recursive = true)

@testset "block exit with unfinished tasks" begin
t = @elapsed run(`$(Base.julia_cmd()) --startup=no -e 't=Timer(0.1; interval=1); exit()'`; wait=true)
p = run(`$(Base.julia_cmd()) --startup=no -e 't=Timer(0.1; interval=1); exit(; kill_tasks=false)'`; wait=false)
sleep(10*t)
@test process_running(p)
kill(p)
end
end