Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.Thread.Pool: process tree cooperation #20274

Open
andrewrk opened this issue Jun 12, 2024 · 4 comments
Open

std.Thread.Pool: process tree cooperation #20274

andrewrk opened this issue Jun 12, 2024 · 4 comments
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Jun 12, 2024

Problem statement:

  • Parent process creates a std.Thread.Pool of size number_of_logical_cores
  • Child process creates a std.Thread.Pool of size number_of_logical_cores
  • Now there are 2*number_of_logical_cores threads active, cache thrashing which harms performance.

On POSIX systems, std.Thread.Pool should integrate by default with the POSIX jobserver protocol, meaning that if the jobserver environment variable is detected, it should coordinate with it in order to make the entire process tree share the same number of concurrently running threads.

On macOS, maybe we should use libdispatch instead. It's bundled with libSystem so it's always available and accomplishes the same goal.

I'm not sure what to do on Windows.

This is primarily a standard library enhancement, however the build system and compiler both make heavy usage of std.Thread.Pool so they would observe behavior changes.

In particular, running zig build as a child process from make would start cooperating and not create too many threads. Similarly, running the zig compiler from the zig build system would do the same. The other way is true too - running make as a child process from the zig build system would start cooperating. And then there are other third party tools that have standardized on the POSIX jobserver protocol, such as cargo.

There is one concern however which is that the protocol leaves room for "thread leaks" to occur if child processes crash. I'm not sure the best way to mitigate this. The problem happens when a child process has obtained a thread token from the pipe, and then crashes before writing the token back to the pipe. In such case the thread pool permanently has one less thread active than before, which is suboptimal, and would cause a deadlock if it happened a number of times exceeding the thread pool size.

Related:

@andrewrk andrewrk added enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library. labels Jun 12, 2024
@andrewrk andrewrk added this to the 0.14.0 milestone Jun 12, 2024
@andrewrk andrewrk changed the title std.ThreadPool: process tree cooperation std.Thread.Pool: process tree cooperation Jun 13, 2024
@andrewrk
Copy link
Member Author

andrewrk commented Jun 19, 2024

Alternative POSIX strategy based on advisory record locks:

Root process opens a new file with shm_open and writes N bytes to the file, where N is thread pool size. This file descriptor is passed down the process tree. Each byte in the file represents an available logical core. Each thread in each process's thread pool holds an advisory record lock (fnctl(F_SETLKW)) on the corresponding byte while working.

Advisory record locks are automatically released by the kernel when a process dies.

This unfortunately would mean that Zig could not play nicely with other applications such as make and cargo. But I think it's worth it. The whole point of the Zig project is to improve status quo, otherwise we could all just keep using C. The make jobserver protocol is fundamentally broken, so what Zig will do is step up and create a better protocol that has similar low overhead but not this glaring problem. Then it will be up to make and cargo to upgrade to the better standard.

As for the strategy I outlined above, I have not evaluated its efficacy yet. I'll report back with some performance stats when I do.

@andrewrk
Copy link
Member Author

Alternative POSIX strategy based on UNIX domain sockets:

Root process listens on a unix domain socket. That fd is passed down the process tree. To get a thread token a child process connects to the socket. To return the token, disconnect. The root process only accepts N concurrent connections and stops accepting when it's maxed out.

When a child process dies, the OS causes the connection to end, allowing the root process to accept a new connection.

2 upsides compared to the other proposed idea:

  • Operating systems are likely not optimized for very large numbers of threads waiting to lock different bytes in the same file.
  • The advisory record lock strategy requires each process to have a threadpool, while the make jobserver protocol allows for lazy thread spawning. This strategy as well allows for lazy thread spawning.

@mikdusan
Copy link
Member

Alternative pipes proposal (also incompatible with jobserver protocol):

This protocol overcomes the issue of make-jobserver protocol where it's impossible for server to tell when a child is taking a job (ie: read 1-byte). We overcome this by having the child notify the server before taking a job, so the server knows it needs to write 1-byte for a waiting child on their private pipe.

pros:

  • portability
  • reliably detect when child dies

cons:

  • filesystem activity for named-pipes has to be managed carefully
  • more complex than unix domain socket alternative

PIPE OVERVIEW:

  • 1 global pipe shared between server and all children, this is a mux-pipe, server is reader, children are writers
  • 1 unique named pipe between server and client; that is each server-client have private pipe opened in a coordinated fashion by client and server, server is writer, child is reader

IPC setup

  • server open tmpdir keeping handle open and allowing inherit
  • server create mux-pipe, parent is reader, child(s) are writers
  • child create job-pipe named fifo using inherited tmpdir and randomized name, maybe a pattern like "[RANDOM].[PID]" where random is the important part and pid is just for semi-reliable knowledge
  • job-pipe is write for parent, read for child
  • job-pipe name/basename is the child "key" which:
    • is unique amongst all children in the tree
    • used as the basename path component of the named fifo
    • used as the first parameter for mux-pipe messaging

job flow

  • child: send mux-pipe message { child, checkout } where the first param is the child-key
  • server: recv mux-pipe message { child, checkout }, open (and remember handle of) named fifo of child-key
  • server write 1 byte to job-pipe granting the job and the child will not proceed with job until this is done
  • child then must read 1-byte on job-pipe before starting job
  • child: do job
  • child: send mux-pipe message { child, return}
  • server: recv mux-pipe message { child, return }
  • server: poll job-pipe for other-end pipe closure, and free outstanding checkouts that haven't been returned

cleanup

  • child unlink the named pipe when done; race is ok
  • server unlink the named pipe on other-end pipe closure; race is ok

@kprotty
Copy link
Member

kprotty commented Jun 21, 2024

IPC and POSIX jobserver interaction for coordinating threaded work seem inefficient. Could it all be avoided by making zig commands (i.e. build-exe, test, etc.) normal functions accepting string args, then running them as tasks in a global Thread.Pool that they (+ build_runner.zig/test_runner.zig) can reach into?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants