JIT GitHub docker runner #2

Merged
merged 20 commits into from
Apr 3, 2024
Changes from 10 commits
20 commits
fbc9de8
ci: add install-docker.sh
phymbert Mar 22, 2024
349490f
ci: docker: move docker cache to large disk
phymbert Mar 22, 2024
7c51732
ci: github runner
phymbert Mar 24, 2024
d42e4db
ci: github runner: install cuda
phymbert Mar 24, 2024
da68779
ci: github runner: install cuda, downgrade to 12.2, reduce installed …
phymbert Mar 24, 2024
adc8241
ci: github runner: PR feedback:
phymbert Mar 24, 2024
23c3a60
ci: github runner: set good GPU capabilities, remove the driver insta…
phymbert Mar 24, 2024
b5c8c35
ci: github runner manager:
phymbert Mar 25, 2024
19d7d85
ci: model downloader
phymbert Mar 25, 2024
202f6d0
ci: github runner: fix image missing cmake
phymbert Mar 25, 2024
104cb78
ci: github runner: move to tmpfs workdir, nicer logs
phymbert Mar 25, 2024
17ee86f
Merge branch 'master' into hp/github-runner
phymbert Mar 25, 2024
f6319b1
start the download model container with non priviledged user
phymbert Mar 25, 2024
f46b826
fix unused variable interpolation in manager strings
phymbert Mar 25, 2024
8400ef3
fix unused variable interpolation in manager strings
phymbert Mar 25, 2024
99394a5
ci: start-github-runner-manager.sh: fix missing EOL escape, better logs
phymbert Mar 25, 2024
5ffcb0b
ci: start-github-runner-manager.sh: add debug info
phymbert Mar 27, 2024
e160162
ci: start-github-runner-manager.sh: remove lower id for model downloader
phymbert Mar 27, 2024
df19df5
ci: install-docker.sh add uidmap
phymbert Mar 30, 2024
ba71b16
ci: start-github-runner-manager.sh add noblock in systemctl commands
phymbert Mar 30, 2024
2 changes: 2 additions & 0 deletions images/github-runner/.dockerignore
@@ -0,0 +1,2 @@
build.sh
README.md
56 changes: 56 additions & 0 deletions images/github-runner/Dockerfile
@@ -0,0 +1,56 @@
FROM ubuntu:latest


# system update
RUN set -eux ; \
apt update ; \
apt -y upgrade ; \
apt -y install \
libicu-dev \
curl \
wget \
build-essential \
cmake \
git \
python3-pip \
python3-venv \
language-pack-en \
libcurl4-openssl-dev \
netcat;

# cuda install
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
ENV DEBIAN_FRONTEND=noninteractive
RUN set -eux ; \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb ; \
dpkg -i cuda-keyring_1.1-1_all.deb ; \
apt-get update ; \
apt-get -y install \
cuda-nvcc-12-2 \
libcublas-dev-12-2;

ARG RUNNER_VERSION=2.314.1
ARG RUNNER_VERSION_HASH=6c726a118bbe02cd32e222f890e1e476567bf299353a96886ba75b423c1137b5

RUN set -eux ; \
mkdir /ggml-ci /tmp/github-runner ; \
chown 1000:1000 /ggml-ci /tmp/github-runner ;

WORKDIR /ggml-ci

# User creation
RUN set -eux ; \
groupadd --gid 1000 ggml ; \
useradd --uid 1000 --gid ggml --shell /bin/bash --create-home ggml ;

USER 1000:1000

# Github runner installation
RUN set -eux ; \
curl -o actions-runner-linux-x64.tar.gz -L https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz ; \
echo "${RUNNER_VERSION_HASH} actions-runner-linux-x64.tar.gz" | sha256sum -c ; \
tar xzf actions-runner-linux-x64.tar.gz ; \
rm actions-runner-linux-x64.tar.gz ;

COPY entrypoint.sh /entrypoint.sh
# Exec form, so arguments given to `docker run` (the JIT config) reach the script
ENTRYPOINT ["/entrypoint.sh"]
9 changes: 9 additions & 0 deletions images/github-runner/README.md
@@ -0,0 +1,9 @@
# GitHub Runner

Docker image of a GitHub self-hosted runner, started with a just-in-time (JIT) configuration and the provided label.
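
As a rough illustration of what "started with JIT config and provided label" involves, the sketch below builds the request body a caller could send to GitHub's `generate-jitconfig` REST endpoint before launching this image. The helper name `build_jit_request`, its parameters, and the `runner_group_id` value of 1 are assumptions made for illustration; only the field names mirror the payload used by the manager in this PR.

```python
import time


def build_jit_request(runner_label, workflow_id, job_id, event,
                      work_folder="/github-runner/_work", now=None):
    """Return the JSON body for a generate-jitconfig request (hypothetical helper)."""
    # A timestamp suffix keeps runner names unique across retries
    timestamp = int(time.time() if now is None else now)
    return {
        "name": f"ggml-runner-{workflow_id}-{job_id}-{event}-{timestamp}",
        "runner_group_id": 1,  # assumption: the default runner group
        "labels": ["self-hosted", "X64", "linux", *runner_label],
        "work_folder": work_folder,
    }
```

The `encoded_jit_config` field of the response to that request is what the container expects as its single argument.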

### Build

```shell
./build.sh
```
6 changes: 6 additions & 0 deletions images/github-runner/build.sh
@@ -0,0 +1,6 @@
#!/bin/bash
set -eux

docker build \
-t ggml-github-runner \
.
19 changes: 19 additions & 0 deletions images/github-runner/entrypoint.sh
@@ -0,0 +1,19 @@
#!/bin/bash
# -x is left out so the JIT config is not traced into the logs
set -eu

if [ $# -lt 1 ]
then
    echo "invalid command: $*"
    echo "usage: $0 JITCONFIG"
    exit 1
fi

nvidia-smi || exit 1

echo "RUNNER user: $(id)"
echo "RUNNER version: $(./config.sh --commit)"

# -p: /tmp may be a fresh tmpfs, so the parent directory may not exist yet
mkdir -p /tmp/github-runner/_work

./run.sh --jitconfig "$1"
2 changes: 2 additions & 0 deletions images/github-runners-manager/.dockerignore
@@ -0,0 +1,2 @@
build.sh
README.md
28 changes: 28 additions & 0 deletions images/github-runners-manager/Dockerfile
@@ -0,0 +1,28 @@
FROM ubuntu:latest


RUN set -eux ; \
apt update ; \
apt -y upgrade ; \
apt -y install \
git \
openssh-client \
python3 \
python3-pip \
curl \
dbus-user-session \
uidmap ; \
# Install docker, docker daemon is running on the host, we just require the client here to pop the GitHub runner (docker in docker)
curl -sSL https://get.docker.com/ | sh ;

WORKDIR /ggml-ci

ADD requirements.txt ./

RUN set -eux ; \
pip install -r requirements.txt ;

ADD manager.py ./

COPY entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
10 changes: 10 additions & 0 deletions images/github-runners-manager/README.md
@@ -0,0 +1,10 @@
# GitHub Runner Manager

Running inside a Docker container, the manager monitors the labels of queued workflow jobs
and starts a self-hosted JIT runner in Docker whenever a job requires one of this host's compute labels.
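
The label check described above can be sketched as a small pure function. The name `job_needs_this_host` and the example labels are hypothetical; the logic is the same "any overlap" test the manager applies to each queued job.

```python
def job_needs_this_host(job_labels, runner_labels):
    """Return True if any label served by this host appears in the job's labels."""
    return any(label in job_labels for label in runner_labels)


# A job asking for a GPU label this host serves matches; a hosted-runner job does not.
print(job_needs_this_host(["self-hosted", "gpu-t4"], ["gpu-t4"]))
print(job_needs_this_host(["ubuntu-latest"], ["gpu-t4"]))
```

A job that matches leads to one JIT runner container being started for it; jobs without a matching label are left for other runners.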

### Build

```shell
./build.sh
```
4 changes: 4 additions & 0 deletions images/github-runners-manager/build.sh
@@ -0,0 +1,4 @@
#!/bin/bash
set -eux

docker build -t ggml-github-runners-manager .
4 changes: 4 additions & 0 deletions images/github-runners-manager/entrypoint.sh
@@ -0,0 +1,4 @@
#!/bin/bash
# -x is left out so the GitHub token is not traced into the logs
set -eu

python3 manager.py --repo "$REPO" --token "$TOKEN" --runner-label "$RUNNER_LABEL"
93 changes: 93 additions & 0 deletions images/github-runners-manager/manager.py
@@ -0,0 +1,93 @@
import argparse
import sys
import time
import traceback

import docker
import requests
from docker.types import DeviceRequest
from github import Auth
from github import Github


def start_mainloop(args):
    auth = Auth.Token(args.token)
    g = Github(auth=auth)
    repo = g.get_repo(args.repo)
    client = docker.from_env()
    while True:
        print("fetching workflows ...")
        workflows = repo.get_workflows()
        for workflow in workflows:
            for workflow_run in workflow.get_runs(status='queued'):
                for job in workflow_run.jobs():
                    # Only handle jobs that request one of this host's labels
                    if any(value in job.raw_data['labels'] for value in args.runner_label):
                        runner_name = f"ggml-runner-{workflow.id}-{job.id}-{workflow_run.event}-{int(time.time())}"

                        print(f"TRIGGERING {runner_name} for workflow_name={workflow.name}")
                        work_folder = "/github-runner/_work"

                        # Get a JIT runner config
                        jitrequest = {
                            'name': runner_name,
                            'runner_group_id': 1,  # FIXME what to put here
                            'labels': ["self-hosted", "X64", "linux", *args.runner_label],
                            'work_folder': work_folder
                        }
                        response = requests.post(
                            f"https://api.github.com/repos/{args.repo}/actions/runners/generate-jitconfig",
                            headers={
                                'Authorization': f'Bearer {args.token}',
                                'X-GitHub-Api-Version': "2022-11-28"
                            },
                            json=jitrequest)
                        if response.status_code != 201:
                            print(f"invalid JIT response code: {response.status_code}\n {response.text}")
                            continue
                        jitconfig = response.json()

                        # start the worker in its container and wait for it to finish
                        print(
                            f"Running job runner id={jitconfig['runner']['id']} os={jitconfig['runner']['os']} labels={[value['name'] for value in jitconfig['runner']['labels']]}")
                        try:
                            client.containers.run("ggml-github-runner", jitconfig['encoded_jit_config'],
                                                  entrypoint="/entrypoint.sh",
                                                  name=runner_name,
                                                  runtime="nvidia",
                                                  device_requests=[
                                                      DeviceRequest(device_ids=["all"],
                                                                    capabilities=[['gpu']])],
                                                  user='1000:1000',
                                                  security_opt=["no-new-privileges:true"],
                                                  auto_remove=True,
                                                  tmpfs={
                                                      '/tmp': 'size=32G,uid=1000',
                                                      work_folder: 'size=256G,uid=1000'
                                                  },
                                                  # Models path to avoid downloading models every time
                                                  volumes={
                                                      '/mnt/models': {'bind': '/models', 'mode': 'ro'}
                                                  })
                        except Exception:
                            print("issue running github workflow:")
                            traceback.print_exc(file=sys.stdout)

        print("workflow iteration done")
        time.sleep(10)


def main(args_in: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Start a GitHub self-hosted runner using JIT based on repo events")
    parser.add_argument("--token", type=str, help="GitHub token", required=True)
    parser.add_argument("--repo", type=str, help="GitHub repository", required=True)
    parser.add_argument("--runner-label", type=str, action="append", help="GitHub runner label (repeatable)", required=True)

    args = parser.parse_args(args_in)

    start_mainloop(args)


if __name__ == '__main__':
    main()
2 changes: 2 additions & 0 deletions images/github-runners-manager/requirements.txt
@@ -0,0 +1,2 @@
PyGithub
docker
2 changes: 2 additions & 0 deletions images/llama.cpp-model-downloader/.dockerignore
@@ -0,0 +1,2 @@
build.sh
README.md
26 changes: 26 additions & 0 deletions images/llama.cpp-model-downloader/Dockerfile
@@ -0,0 +1,26 @@
# FIXME REPLACE with the official llama.cpp image with curl support: #6291
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04

# system update
RUN set -eux ; \
apt update ; \
apt -y install \
git \
cmake \
libcurl4-openssl-dev ;

WORKDIR /llama.cpp
RUN set -eux; \
git clone https://github.com/ggerganov/llama.cpp.git . ; \
mkdir build ; \
cd build ; \
cmake .. \
-DLLAMA_CURL=ON \
-DLLAMA_CUBLAS=ON \
-DCMAKE_CUDA_ARCHITECTURES=75 \
-DLLAMA_NATIVE=OFF \
-DCMAKE_BUILD_TYPE=Release; \
cmake --build . --config Release -j $(nproc) --target main ;

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
9 changes: 9 additions & 0 deletions images/llama.cpp-model-downloader/README.md
@@ -0,0 +1,9 @@
# llama.cpp model downloader

Downloads a model needed by the CI from Hugging Face into the shared `/models` directory.
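
To make the resulting layout concrete, the sketch below mirrors how the entrypoint derives the target directory and file path from `HF_FILE`. The function name `model_target` and the `models_root` parameter are hypothetical; the `/models` root and the example file name come from the entrypoint's own usage message.

```python
from pathlib import PurePosixPath


def model_target(hf_file, models_root="/models"):
    """Return (directory, file path) where the downloaded model is stored."""
    target = PurePosixPath(models_root) / hf_file
    return str(target.parent), str(target)


directory, path = model_target("phi-2/ggml-model-q4_0.gguf")
print(directory)
print(path)
```

The directory part is what the entrypoint creates with `mkdir -p` before the download starts.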

### Build

```shell
./build.sh
```
6 changes: 6 additions & 0 deletions images/llama.cpp-model-downloader/build.sh
@@ -0,0 +1,6 @@
#!/bin/bash
set -eux

docker build \
-t llama.cpp-model-downloader \
.
22 changes: 22 additions & 0 deletions images/llama.cpp-model-downloader/entrypoint.sh
@@ -0,0 +1,22 @@
#!/bin/bash
set -eux

# ${VAR:-} keeps the check working under `set -u` when a variable is unset
if [ -z "${HF_REPO:-}" ] || [ -z "${HF_FILE:-}" ]
then
    echo "invalid command: $*"
    echo "usage: "
    echo "export HF_REPO=ggml-org/models"
    echo "export HF_FILE=phi-2/ggml-model-q4_0.gguf"
    echo "$0"
    exit 1
fi

nvidia-smi || exit 1

echo "HF_REPO ${HF_REPO}"
echo "HF_FILE ${HF_FILE}"
MODEL_DIR=$(dirname "${HF_FILE}")
mkdir -p "/models/$MODEL_DIR"

./build/bin/main --hf-repo "${HF_REPO}" --hf-file "${HF_FILE}" --model "/models/$HF_FILE" --random-prompt --n-predict 1
49 changes: 49 additions & 0 deletions install-docker.sh
@@ -0,0 +1,49 @@
#!/bin/bash
set +eux

# https://docs.docker.com/engine/install/ubuntu/
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install docker
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Install NVidia docker engine runtime
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Rootless mode
sudo apt-get install -y dbus-user-session
sudo systemctl stop docker docker.socket
sudo systemctl disable --now docker.service docker.socket

# Rootless setup must run as the regular (non-root) user
dockerd-rootless-setuptool.sh install
mv ~/.docker /mnt/
ln -s /mnt/.docker ~/.docker
systemctl --user start docker
systemctl --user enable docker
sudo loginctl enable-linger $(whoami)

# Configuring Docker NVidia
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place

docker run -it --rm --gpus all ubuntu nvidia-smi