JIT GitHub docker runner #2
9f0583a to 7c51732
@ggerganov @ngxson FYI
- properly create the user
- add autoremove and tmpfs
- add netcat for the workflow to check if the server starts
…llation in the image, lowercase container/runner name
- use a tmpfs for the runner workdir
- add security_opt
- mount the models folder
ci: github-runner-manager: fix tmpfs
ci: github-runner-manager: fix tmpfs exec right, nice logs
# Conflicts:
#	install-docker.sh
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
LGTM, thanks for taking time for this!
(Btw, the "Resolve" button doesn't show on my side. Maybe I don't have permission. You can "resolve" my comments above if you want.)
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
apt install uidmap
The
Can you please share the Docker logs or remove the stdout redirection? Maybe it's a mount issue.
It does not have permission:
What user is docker using?
It runs as user 1000:1000; is this the ggml uid:gid?
no idea
Seems like a mismatch between the uid/gid inside and outside the container. Note that Docker may use uid mapping, which maps user 1000 inside the container to something like 1001000 on the host. I'm installing rootless Docker on my side to test whether that's the case.
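For illustration, here is a minimal sketch of the uid mapping that rootless Docker sets up through user namespaces. It assumes the usual layout: container uid 0 maps to the unprivileged user who owns the daemon, and container uids 1 and up map consecutively into that user's subordinate range from /etc/subuid. The function name `host_uid` and the example range (start 100000, size 65536) are hypothetical; the actual range on the CI machine may differ.

```python
def host_uid(container_uid: int, owner_uid: int, subuid_start: int, subuid_count: int = 65536) -> int:
    """Translate a uid seen inside a rootless container into the uid seen on the host.

    Assumed mapping (the usual rootless layout):
      container uid 0      -> owner_uid (the user running the rootless daemon)
      container uid n >= 1 -> subuid_start + n - 1
    """
    if container_uid == 0:
        return owner_uid
    if not 1 <= container_uid <= subuid_count:
        raise ValueError(f"uid {container_uid} is not mapped in this namespace")
    return subuid_start + container_uid - 1


# Hypothetical /etc/subuid entry "ggml:100000:65536", daemon owned by host uid 1000.
print(host_uid(1000, owner_uid=1000, subuid_start=100000))  # 100999
print(host_uid(0, owner_uid=1000, subuid_start=100000))     # 1000
```

Under these assumptions, files written by uid 1000 inside the container appear on the host as owned by a high uid, not by host user 1000, so a bind-mounted folder owned by host uid 1000 would only be writable by the container's root user.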
Please pull; I have added some debug commands for permissions and the Docker service.
@phymbert I understand the problem now:
Here we are at the model downloader step. It is mounted rw; let's wait for the logs, you never know.
Oh I see, then it's the It's true that as you said, we're running docker rootless so Ref: https://github.com/ggml-org/ci/pull/2/files#r1537304334
OK, but it was working fine on the other VM. So probably not this.
Logs after pull:
Looks good from the VM, could you share again please?
@ggerganov please pull and try again ;)
I think it works now. It sits here:
Scheduling: https://github.com/ggerganov/llama.cpp/actions/runs/8434231566?pr=6283
@ggerganov Should we merge this?
Yes, thanks for the reminder. Good job!
Motivation
In the context of:
A balanced approach between raw ggml-ci and GitHub self-hosted runners.
Approach
Periodically, a Python script polls for jobs waiting for a runner and starts an ephemeral Just-In-Time GitHub runner within a Docker container with the NVIDIA runtime.
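The approach above can be sketched roughly as follows. This is not the script from this PR, only a minimal illustration under stated assumptions: it uses the GitHub REST endpoints for listing queued workflow runs and for generating a JIT runner config, and it assumes a hypothetical image whose working directory contains the runner's `run.sh`. The function names, the image name, the `_work` folder, and runner group id 1 are all placeholders.

```python
import json
import urllib.request

API = "https://api.github.com"


def github_request(method, path, token, body=None):
    """Send one authenticated GitHub REST API request and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{API}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def queued_run_ids(repo, token):
    """Return the ids of workflow runs currently queued (waiting for a runner)."""
    reply = github_request("GET", f"/repos/{repo}/actions/runs?status=queued", token)
    return [run["id"] for run in reply["workflow_runs"]]


def jit_config(repo, token, name, labels):
    """Ask GitHub for a single-use (Just-In-Time) runner configuration."""
    reply = github_request(
        "POST",
        f"/repos/{repo}/actions/runners/generate-jitconfig",
        token,
        # Assumption: default runner group (id 1) and a "_work" folder.
        {"name": name, "runner_group_id": 1, "labels": labels, "work_folder": "_work"},
    )
    return reply["encoded_jit_config"]


def docker_command(image, name, encoded_jit_config):
    """Build the docker invocation for one ephemeral runner container."""
    return [
        "docker", "run",
        "--rm",                    # remove the container once the job is done
        "--runtime", "nvidia",     # NVIDIA container runtime
        "--gpus", "all",
        "--name", name.lower(),    # lowercase container name
        image,                     # hypothetical image shipping the runner
        "./run.sh", "--jitconfig", encoded_jit_config,
    ]


# Building the command needs no network access:
print(docker_command("ggml-runner:latest", "Runner-8434231566", "<jit-config>"))
```

A manager loop would call `queued_run_ids` on a timer and, for each new id, pass the result of `jit_config` to `docker_command` and launch it with `subprocess`. A JIT runner accepts a single job and then exits, so `--rm` leaves nothing behind between jobs.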
Test
Tested here: https://github.com/phymbert/llama.cpp/actions/runs/8417731437
How to install a new runner manager:
Example: