Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Terminating a build process does not terminate child processes #18340

Closed
jacobly0 opened this issue Dec 21, 2023 · 5 comments
Closed

build: Terminating a build process does not terminate child processes #18340

jacobly0 opened this issue Dec 21, 2023 · 5 comments
Labels
bug Observed behavior contradicts documented or intended behavior zig build system std.Build, the build runner, `zig build` subcommand, package management
Milestone

Comments

@jacobly0
Copy link
Member

build.zig:

const std = @import("std");

pub fn build(b: *std.Build) void {
    const build_test = b.addTest(.{ .root_source_file = .{ .path = "repro.zig" } });
    build_test.linkLibC();

    const run_test = b.addRunArtifact(build_test);
    b.step("test", "Run the test").dependOn(&run_test.step);
}

repro.zig:

const std = @import("std");
const builtin = @import("builtin");

extern "c" fn kill(pid: std.c.pid_t, sig: c_int) c_int;
extern "c" fn getppid() std.c.pid_t;

test "loop" {
    _ = kill(getppid(), std.c.SIG.TERM);
    while (true) asm volatile ("");
}
$ ps -o pid,ppid,cmd
    PID    PPID CMD
  51734   51721 /bin/bash --posix
  58662   51734 ps -o pid,ppid,cmd
$ zig build test
error: the following build command crashed:
./zig-cache/o/04a82e29edda016788b7663566160476/build zig . ./zig-cache ~/.cache/zig --seed 0x97dad68b test
$ ps -o pid,ppid,cmd
    PID    PPID CMD
  51734   51721 /bin/bash --posix
  58768       1 ./zig-cache/o/78b4ec0e583741aa4edd00a8b661577f/test --listen=-
  58771   51734 ps -o pid,ppid,cmd

Terminating the build process should not leave orphaned child processes running.

This bug has seemingly resulted, over a period from Oct 15 to Dec 21, in 56 currently active orphan processes on the aarch64-macos CI using up valuable cpu time and causing CI runs to slow down over time.

@jacobly0 jacobly0 added bug Observed behavior contradicts documented or intended behavior zig build system std.Build, the build runner, `zig build` subcommand, package management labels Dec 21, 2023
@jacobly0 jacobly0 added this to the 0.13.0 milestone Dec 21, 2023
@matu3ba

This comment was marked as off-topic.

@rootbeer
Copy link
Contributor

To be clear, in this case you are not worried about arbitrary sub-processes forked by CI right? Just Zig-spawned builders/tester processes that are all running Zig code (and are all written to be compatible with this goal)? (Handling the case where you want to kill child processes running arbitrary, non-Zig binaries seems much harder). In that case the open-pipe trick seems reasonable to me. You would need to repeat it from parent to child, and then from child to grand-child (I don't think the grand-child should get its grand-parent's pipe). Its not clear to me why you marked this one as "unviable".

Assuming the child processes are generally of the form .../test --listen, an alternative might be to give child processes a timeout. E.g., after some time with no new commands sent, it gives up (perhaps with some verbose logging). Especially if the parent is robust enough to handle child processes disappearing anyway.

Yet another approach might be to just reboot the container/vm/environment hosting the CI runs once or twice a day to clean out the cruft.

@matu3ba

This comment was marked as off-topic.

@andrewrk
Copy link
Member

Ouch, just got bit by this locally. I'm not sure why this happens yet.

@andrewrk
Copy link
Member

andrewrk commented Jul 20, 2024

I think I figured it out:

zig/lib/std/zig/Server.zig

Lines 137 to 138 in b7e48c6

const amt = try s.in.read(write_buffer);
fifo.update(amt);

when the parent process terminates, this read keeps getting 0 bytes read and enters an infinite loop.

--- a/lib/std/zig/Server.zig
+++ b/lib/std/zig/Server.zig
@@ -109,6 +109,7 @@ pub fn deinit(s: *Server) void {
 pub fn receiveMessage(s: *Server) !InMessage.Header {
     const Header = InMessage.Header;
     const fifo = &s.receive_fifo;
+    var last_amt_zero = false;
 
     while (true) {
         const buf = fifo.readableSlice(0);
@@ -136,6 +137,10 @@ pub fn receiveMessage(s: *Server) !InMessage.Header {
         const write_buffer = try fifo.writableWithSize(256);
         const amt = try s.in.read(write_buffer);
         fifo.update(amt);
+        if (amt == 0) {
+            if (last_amt_zero) return error.BrokenPipe;
+            last_amt_zero = true;
+        }
     }
 }

thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
None yet
Development

No branches or pull requests

4 participants