Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/standalone: reinstate std.ChildProcess tests #13042

Closed
wants to merge 1 commit into from

Conversation

x1ddos
Copy link
Contributor

@x1ddos x1ddos commented Oct 2, 2022

67d5bfe removed std.ChildProcess tests, suggesting to make them standalone instead. this commit does exactly that.

@x1ddos
Copy link
Contributor Author

x1ddos commented Oct 2, 2022

oh, didn't realize #11736 exists.

@x1ddos
Copy link
Contributor Author

x1ddos commented Oct 2, 2022

failing error on freebsd from https://builds.hut.lavatech.top/~andrewrk/job/29111 is:

error: NotOpenForReading
/usr/home/build/zig/build/stage3/lib/zig/std/os.zig:649:22: 0x2154f4 in read (child)
            .BADF => return error.NotOpenForReading, // Can be a race condition.
                     ^
/usr/home/build/zig/build/stage3/lib/zig/std/fs/file.zig:974:13: 0x21530c in read (child)
            return os.read(self.handle, buffer);
            ^
/usr/home/build/zig/build/stage3/lib/zig/std/io/reader.zig:26:13: 0x2157b3 in read (child)
            return readFn(self.context, buffer);
            ^
/usr/home/build/zig/build/stage3/lib/zig/std/io/reader.zig:35:29: 0x21109e in readAll (child)
                const amt = try self.read(buffer[index..]);
                            ^
/usr/home/build/zig/build/stage3/lib/zig/std/io/reader.zig:44:30: 0x210e93 in readNoEof (child)
            const amt_read = try self.readAll(buf);
                             ^
/usr/home/build/zig/test/standalone/child_process/child.zig:37:5: 0x21157f in run (child)
    try stdin.readNoEof(&buf);

which looks identical to #12258 because the second try, without a rm -rf zig-cache, works.

@Vexu
Copy link
Member

Vexu commented Oct 3, 2022

Does this do something that #11736 doesn't or is there some other reason you left this open?

@x1ddos
Copy link
Contributor Author

x1ddos commented Oct 3, 2022

Does this do something that #11736 doesn't or is there some other reason you left this open?

yes:

but either way, close this if there's no use for it here.

Copy link
Contributor

@matu3ba matu3ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks alot for your PR. This should unblock me in #12754 and revealed the fix for the other CI standalone tests:

For example execution the binary in test_stage3_debug can break, if test_stage3_release tries to overwrite it (after install step to zig-out), because both use the same repo instance with only different cache directories.

Consequently, a smoll hint provided that zig-out breaks on multiple (different) compiler invocations and what should be used instead for that use case sounds like a good idea.

Some feedback by me from a quick glimpse. Mostly nits except for the leak check.

const main = b.addExecutable("main", "main.zig");
main.setBuildMode(mode);
const run = main.run();
run.addArtifactArg(child);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find.

const mode = b.standardReleaseOptions();

const child = b.addExecutable("child", "child.zig");
child.setBuildMode(mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect miscompilations or why do you test here all compilation modes (debug, releasesafe, releasefast, releasesmall) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really expect any difference, but that's why we have tests, right? to potentially catch the unexpected :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point. From personal experience on working with https://github.com/matu3ba/reduze the CI costs are around 1-2 seconds per binary with LLVM, which means in total (build binary, debug, releasefast, releasesafe, releasesmall).

Unless we add a pile of these it utilizing LLVM it sounds fine to me, but I am not the one to decide this with the background that build.zig is rebuild each time in stage3 due to yet unresolved caching problems.


pub fn main() !void {
var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
const arena = arena_state.allocator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more ziggish is to use/copy-paste

var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
defer arena_state.deinit();
const arena = arena_state.allocator();
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a std.process.exit(exit_code) in the end. i don't believe defer will have a chance to run before process.exit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All defers run before the exit point of the current and all sub scopes. Thereafter the exit is run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ cat > defer.zig
const std = @import("std");

pub fn main() !void {
    std.debug.print("hello world", .{});
    defer std.debug.print("hello from defer", .{});
    std.process.exit(0);
}

$ zig run defer.zig
hello world

you can see above the defer'ed expression wasn't run due to process.exit terminating the program early.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. My bad. defer will execute an expression at the end of the current scope. Mark as resolved, if you want.

const std = @import("std");

// 42 is expected by parent; other values result in test failure
var exit_code: u8 = 42;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/exit_code/expected_exit_code
Then trim the comment abit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "expected" part is actually in the parent. this is a child - it doesn't check against an expected value but simply exits with an exit code. feels like it would be more confusing with "expected_" prefix.

pub fn main() !void {
var gpa_state = std.heap.GeneralPurposeAllocator(.{}){};
defer if (gpa_state.deinit()) {
std.debug.print("found mem leaks", .{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print does not set the exit code on memory leakage. You can either be lazy with std.debug.assert (only in debug/releaseSafe) or use @panic to enforce unsuccessful termination.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return if (test_errors) error.ParentTestError else {};
}

var test_errors = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be in global scope and feels duplicated to ok_code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok_code is for checking child exit code. test_errors indicates check failures in the parent. made some variable renaming to hopefully make it more readable.

Copy link
Contributor

@matu3ba matu3ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove that race condition due to not waiting for the child process to finish writing to its stdout.


// test stdout pipe; parent verifies
const stdout = std.io.getStdOut().writer();
try stdout.print("hello from stdout", .{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also use writeAll here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


const hello_stdout = "hello from stdout";
var buf: [hello_stdout.len]u8 = undefined;
try proc.stdout.?.reader().readNoEof(&buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a race condition, because you dont wait for the child process to finish up writing to its stdout. Unfortunately the FreeBSD error trace looks broken, so its hard to decipher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. replaced with readAll.

@matu3ba
Copy link
Contributor

matu3ba commented Oct 4, 2022

Mhm, this still fails with

The following command exited with error code 1 (expected 0):
cd /workspace && _release/staging/bin/zig build --build-file /workspace/test/standalone/child_process/build.zig test 
error: UnexpectedExitCode
/workspace/build-release/../lib/std/os.zig:2753:19: 0x2f82f6 in mkdiratZ (build)
        .EXIST => return error.PathAlreadyExists,
                  ^

I suspect that something is broken on hashing and thus release does try to overwite the same cached file as debug on the CI server.
At least thats what the cache dirs for standalone tests indicate to me: /workspace/test/standalone/child_process/zig-cache /root/.cache/zig test.

You can reproduce this problem by building Zig for release and debug yourself and run this test with a certain time difference such that one build is executing before the other replaces the binary:

mkdir -p build/ && cd build/ && cmake .. -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install  && cd ..
mkdir -p buildrel/ && cd buildrel/ && cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install && cd ..

inotifywait -e modify,create,delete -m -r test/standalone should also show that, but I did not get to verify and debug it in detail yet.
If that is the case, setting a debug point inside the child and run gdb with follow child should reproduce the behavior.

@x1ddos
Copy link
Contributor Author

x1ddos commented Oct 4, 2022

I suspect that something is broken on hashing and thus release does try to overwite ...

interesting! i'll try to debug this later. thanks.

@x1ddos x1ddos requested a review from matu3ba October 4, 2022 12:46
Copy link
Contributor

@matu3ba matu3ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what is wrong with FreeBSD on this. The pipe looks unidirectional (bidirectional ones are not supported) and the call is blocking (non-blocking mode is not supported like on Linux).
You can check, if the FIFO is opened with the correct permissions (O_RDONLY set, O_WRONLY and O_NONBLOCK not set) by adding the assertions for the OS.

The arena allocator does not leak checks (testing that all subfunctions free the memory), so please change that to the gpa which does leak checks.

const mode = b.standardReleaseOptions();

const child = b.addExecutable("child", "child.zig");
child.setBuildMode(mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point. From personal experience on working with https://github.com/matu3ba/reduze the CI costs are around 1-2 seconds per binary with LLVM, which means in total (build binary, debug, releasefast, releasesafe, releasesmall).

Unless we add a pile of these it utilizing LLVM it sounds fine to me, but I am not the one to decide this with the background that build.zig is rebuild each time in stage3 due to yet unresolved caching problems.


pub fn main() !void {
var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
const arena = arena_state.allocator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. My bad. defer will execute an expression at the end of the current scope. Mark as resolved, if you want.

var exit_code: u8 = 42;

pub fn main() !void {
var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this one, but the arena allocator does not leak checks in subfunctions. Can you change it to the gpa like in my PR ?

    var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(!general_purpose_allocator.deinit());
    const gpa = general_purpose_allocator.allocator();

67d5bfe removed std.ChildProcess tests, suggesting to make them
standalone instead. this commit does exactly that.
@andrewrk
Copy link
Member

CI checks failing and the PR is stale. Please open a fresh PR if you want to continue these efforts.

@andrewrk andrewrk closed this Oct 20, 2022
@matu3ba
Copy link
Contributor

matu3ba commented Oct 23, 2022

CI failure is reproducible with building debug and release

mkdir -p build/ && cd build/ && cmake .. -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install  && cd ..
mkdir -p buildrel/ && cd buildrel/ && cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install && cd ..

and then execute both in parallel

ZIGCACHE=$(fd -uu zig-cache) && rm -fr $ZIGCACHE && fd -uu zig-cache
./build/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone &
./buildrel/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone

Single execution of either succeeds.

Strace only the file actions with

ZIGCACHE=$(fd -uu zig-cache) && rm -fr $ZIGCACHE && fd -uu zig-cache
strace -f -t -e trace=file ./build/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone &
strace -f -t -e trace=file ./buildrel/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone

showed:

  • debug has [pid 571317] 18:45:33 openat(AT_FDCWD, "/proc/sys/vm/overcommit_memory", O_RDONLY|O_CLOEXEC) = 4, release not
  • Indeed the cache works as expected and file hashes look all different.
  • The used file descriptors are differently, which looks a bit weird.

Debug has

[pid 571291] 18:45:33 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=571310, si_uid=1000, si_status=0, si_utime=148 /* 1.48 s */, si_stime=7 /* 0.07 s */} ---
[pid 571291] 18:45:33 openat(AT_FDCWD, "/usr/bin/env", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 3
[pid 571291] 18:45:33 openat(AT_FDCWD, "/lib64", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 4
[pid 571291] 18:45:33 openat(4, "libc.so.6", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 5
strace: Process 571328 attached
[pid 571328] 18:45:33 chdir("/home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process") = 0
[pid 571328] 18:45:33 execve("/home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache/o/b80a4fd923bdc805b548bf77eb52e3f3/main", ["/home/misterspoon/dev/git/zi/zig"..., "/home/misterspoon/dev/git/zi/zig"...], 0x7f23fd85be18 /* 78 vars */) = 0
strace: Process 571329 attached
[pid 571329] 18:45:33 execve("/home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache/o/e76c93542b6f58662651a953aecdcf16/child", ["/home/misterspoon/dev/git/zi/zig"..., "hello arg"], 0x7ffe127afde0 /* 78 vars */) = 0
[pid 571329] 18:45:33 +++ exited with 42 +++

Release has

[pid 571217] 18:45:23 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=571258, si_uid=1000, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
The following command terminated unexpectedly:
[pid 571259] 18:45:23 openat(AT_FDCWD, "/proc/self/exe", O_RDONLY|O_NOCTTY|O_CLOEXECcd ) = 0
/home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process && /home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache/o/1d3f50eb524c268928862491bd7f1b2d/main /home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache/o/4241904dea0cded9ccbf9ed21ea51fcc/child 
[pid 571217] 18:45:23 +++ exited with 1 +++
[pid 571198] 18:45:23 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=571217, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
error: the following build command failed with exit code 1:
/home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache/o/a750bd0e7a110fa52121e3f5befc67e5/build /home/misterspoon/dev/git/zi/zig/reinstate_childproc/buildrel/stage3/bin/zig /home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process /home/misterspoon/dev/git/zi/zig/reinstate_childproc/test/standalone/child_process/zig-cache /home/misterspoon/.cache/zig test
/home/misterspoon/dev/git/zi/zig/reinstate_childproc/buildrel/stage3/lib/zig/std/os.zig:655:22: 0x2159b6 in read (child)
[pid 571259] 18:45:23 openat(AT_FDCWD, "/home/misterspoon/dev/git/zi/zig/reinstate_childproc/buildrel/stage3/lib/zig/std/os.zig", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 0
            .BADF => return error.NotOpenForReading, // Can be a race condition.
                     ^
[pid 571198] 18:45:23 +++ exited with 1 +++

Comparing the strace diff to release without the problem (started alone) shows there is 1 more file descriptor, if the problem occurs, but I am not entirely sure where to look for.
Further more, the problem only occurs, if things get compiled and then run. Only running after everything is compiled works fine.

@matu3ba
Copy link
Contributor

matu3ba commented Nov 4, 2022

If I use

#!/bin/sh
# ZIGCACHE=$(fd -uu zig-cache) && rm -fr $ZIGCACHE && fd -uu zig-cache # uncomment to reproduce error
strace -ff  -yy  ./build/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone &> debug.strace & strace -ff -yy ./buildrel/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone &> release.strace

I get a SIGPIPE instead?. See also my question in the discord forum: https://discord.com/channels/605571803288698900/1019652020308824145/threads/1037196394420768768

Not sure, if EBADF was shadowing SIGPIPE, if the behavior deviates (between compiler versions) or if I missed it somehow.

@matu3ba
Copy link
Contributor

matu3ba commented Nov 16, 2022

Looks like the bug only happens on ReleaseFast:

#!/bin/sh
set -e
/usr/bin/time -v "${HOME}/dev/git/bootstrap/zig-bootstrap/musl/out/zig-x86_64-linux-musl-native/bin/zig" build -p build1 --search-prefix "${HOME}/dev/git/bootstrap/zig-bootstrap/musl/out/x86_64-linux-musl-native" --zig-lib-dir lib -Dstatic-llvm &
/usr/bin/time -v "${HOME}/dev/git/bootstrap/zig-bootstrap/musl/out/zig-x86_64-linux-musl-native/bin/zig" build -p build2 -Drelease --search-prefix "${HOME}/dev/git/bootstrap/zig-bootstrap/musl/out/x86_64-linux-musl-native" --zig-lib-dir lib -Dstatic-llvm
echo "build releasesafe"
mkdir -p build3/ && cd build3/ && cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install && cd ..
echo "build minsize"
mkdir -p build4/ && cd build4/ && cmake .. -DCMAKE_BUILD_TYPE=MinSizeRel -DCMAKE_PREFIX_PATH="$HOME/dev/git/bootstrap/zig-bootstrap/musl/out/host/" -GNinja && /usr/bin/time -v ${HOME}/dev/git/cpp/mold/build/mold -run ninja install && cd ..
echo "building done"

Now uncommenting the first fails and the other succeed:

# ./build2/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone
# ./build1/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone
# ./build3/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone
# ./build4/stage3/bin/zig build -Dtest-filter=test/standalone/child_process/build.zig test-standalone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants