self hosted runner stuck on queued #69

Closed
kaykhancheckpoint opened this issue Jul 6, 2020 · 33 comments

@kaykhancheckpoint

kaykhancheckpoint commented Jul 6, 2020

I have a self-hosted runner that uses a custom image. It has been deployed, and I can see in the pod logs that it is listening for jobs. I can also see an idle runner in my organisation. But when I run my pipeline, it is stuck.

Starting your workflow run...

runner.yml

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: checkpoint-runner
spec:
  organization: org
  image: <aws_id>.dkr.ecr.us-east-2.amazonaws.com/self-hosted-runner:master

Pod logs:


kay@khan:~/checkpoint/self-hosted-runner$ kubectl get runners
NAME                ORGANIZATION   REPOSITORY   LABELS   STATUS
checkpoint-runner   org                         Running

> 
> --------------------------------------------------------------------------------
> |        ____ _ _   _   _       _          _        _   _                      |
> |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
> |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
> |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
> |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
> |                                                                              |
> |                       Self-hosted runner registration                        |
> |                                                                              |
> --------------------------------------------------------------------------------
> # Authentication
> √ Connected to GitHub
> # Runner Registration
> A runner exists with the same name
> √ Successfully replaced the runner
> √ Runner connection is good
> # Runner settings
> √ Settings Saved.
> √ Connected to GitHub
> 2020-07-06 16:23:10Z: Listening for Jobs

pipeline.yml

name: test-pipeline

on: [ push ]

jobs:
  build:
    runs-on: self-hosted
    steps:
    - uses: actions/checkout@v2
    - name: Run a multi-line script
      run: |
        echo Hello from self-hosted
        ls
        mysql --version

You can see the custom Docker image I am using here; it just contains the AWS and MySQL CLIs.

dockerfile.yml


FROM summerwind/actions-runner:v2.169.1

RUN sudo apt-get update

RUN sudo curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    sudo unzip awscliv2.zip && \
    sudo ./aws/install && \
    aws --version

RUN sudo apt-get -y install mysql-client && \
    mysql --version

It only became stuck like this when I added the custom image.

@mumoshu
Collaborator

mumoshu commented Jul 7, 2020

Just curious, but does it work with the summerwind runner image?

At a glance, this seems more like a GitHub issue, since you say you can see that the runner is registered. All the controller does is register runner pods for you; anything after that depends on your runner image and GitHub.

One thing I'm wondering though is, do you have any stale runner or runner deployment resources in your k8s cluster? If so, could you try deleting them all and then creating only the needed one, to see if it resolves your issue?

@kaykhancheckpoint
Author

kaykhancheckpoint commented Jul 7, 2020

Yes, the summerwind runner image worked fine. It only started happening when I switched to the custom image.

As soon as I get rid of the custom image field and rerun the workflow, it works :/ but I'm not sure why it's not working with my custom image.

I've tried deleting the entire system and recreating it, and recreating the runnerdeployment with a custom image, but it still gets stuck. It looks like there is an issue when using custom images?

@kaykhancheckpoint
Author

kaykhancheckpoint commented Jul 7, 2020

Can someone else check and confirm for me that custom images actually work?

I've tried a few different things now and i simply can't get this to work.

@kaykhancheckpoint
Author

kaykhancheckpoint commented Jul 7, 2020

So I noticed that the pod terminates shortly after running the workflow.

With the custom image (I'm not sure why it attempts to update something and then shuts down ONLY after running the workflow):

2020-07-07 10:20:22Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.263.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
/runner/run.sh: line 47: /runner/bin/Runner.Listener: No such file or directory

Without the custom image (in this case I believe it's normal because the workflow finished):

2020-07-07 10:26:03Z: Listening for Jobs
2020-07-07 10:26:30Z: Running job: build
2020-07-07 10:26:36Z: Job build completed with result: Failed

@mumoshu
Collaborator

mumoshu commented Jul 7, 2020

@kaykhancheckpoint Thanks, that makes sense. You need to either rebuild your custom image from the latest summerwind image, which has the latest runner agent installed, or update the agent in your Dockerfile.

In #33 we're trying to add support for the runner update, but have had no luck so far. Also, judging from actions/runner#246, the runner update does not seem to be supported upstream.
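
A rough way to see why the pod above went into an update: the custom image was built FROM summerwind/actions-runner:v2.169.1, while the log shows the runner being told to download 2.263.0. The sketch below only compares two version strings. The `needs_update` helper is hypothetical, it assumes the image tag reflects the runner version installed in the image, and it relies on `sort -V` for version ordering.

```shell
# Hypothetical helper: succeeds (exit 0) when the runner version baked into
# the image is older than the version the runner is asked to download.
needs_update() {
  image_version="$1"
  wanted_version="$2"
  # Identical versions never need an update.
  [ "$image_version" != "$wanted_version" ] &&
    # sort -V orders version strings; if the image version sorts first, it is older.
    [ "$(printf '%s\n%s\n' "$image_version" "$wanted_version" | sort -V | head -n1)" = "$image_version" ]
}

# Versions taken from the Dockerfile and the pod log earlier in this thread.
if needs_update 2.169.1 2.263.0; then
  echo "image is outdated, expect the runner to self-update"
fi
```

If the check reports an outdated image, rebuilding the custom image from a newer base image is the fix suggested in this thread.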

@kaykhancheckpoint
Author

kaykhancheckpoint commented Jul 8, 2020

Updating to the latest image FROM summerwind/actions-runner:latest seems to have solved the problem. @mumoshu thank you for the help.

@rezmuh

rezmuh commented Jul 10, 2020

and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(

@kaykhancheckpoint
Author

kaykhancheckpoint commented Jul 10, 2020

> and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(

@rezmuh Of course there is a new runner https://hub.docker.com/r/summerwind/actions-runner/tags

@reiniertimmer
Contributor

This auto-update behaviour is a bit of a concern though. The summerwind image should always be up to date; otherwise the runner will do an auto-update, restart with the old container, do another auto-update, and so on, looping forever.

Though at the moment, I noticed the summerwind image is already on a pre-release image. This will probably be good enough to not trigger an auto-update (I hope - I'm not 100% sure about the exact update behaviour though).

@rezmuh

rezmuh commented Jul 10, 2020

> > and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(
>
> @rezmuh Of course there is a new runner https://hub.docker.com/r/summerwind/actions-runner/tags

CMIIW, but it looks like summerwind's newest image is still on 2.167.0 and the newest github runner is on 2.167.1. I tried updating my custom image today to use FROM summerwind/actions-runner:latest but still got the same error

@kaykhancheckpoint
Author

I experienced the same issue again recently: the base image had been updated, so I had to rebuild my custom image.

@stale

stale bot commented Apr 30, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 30, 2021
@stale stale bot closed this as completed May 14, 2021
@kkmoslehpour

I'm running into this issue as well with the latest image. Was there a fix for this?

@mumoshu
Collaborator

mumoshu commented Oct 26, 2022

@kkmoslehpour I bet there are many different underlying causes, even though the problems reported in this issue all look alike. That said, I think I've encountered this when my custom runner image was outdated and it triggered an auto-update in every runner pod/container. Could you try rebuilding your custom runner image, if you're using one? If not, I think this is generally an issue in actions/runner, not actions-runner-controller.

@YatinGulati94

Hey @mumoshu, I used FROM summerwind/actions-runner:latest in my custom image.
I am still unable to launch a pod on GitHub Actions.

@mumoshu
Collaborator

mumoshu commented Oct 26, 2022

@YatinGulati94 It's working fine for me so it's probably due to some issues in your GHES deployment or your GitHub cloud tenant.

@YatinGulati94

YatinGulati94 commented Oct 26, 2022

Hey @mumoshu, I have been trying since yesterday but the result is the same. I have deleted my cluster twice as well.
Pods are created and then automatically get terminated.

@YatinGulati94

@mumoshu It's very important for me to resolve this. It would be great if you could look into my setup.

@toast-gear
Collaborator

toast-gear commented Oct 26, 2022

> hey @mumoshu used FROM summerwind/actions-runner:latest in my custom image. Still i unable to launch a pod on github actions.

this doesn't mean anything, you could have last built your custom image months ago from latest, it would at this point be very out of date.

Have you tried disabling the runner self-update process? https://github.com/actions-runner-controller/actions-runner-controller/blob/master/docs/detailed-docs.md#runner-entrypoint-features. Be aware of #1914 (comment)
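
To make the point about a stale `:latest` build concrete, one option is to look at when the custom image was actually created rather than at its tag. The sketch below is only an illustration: `image_age_days` is a hypothetical helper, `my-runner:latest` is a placeholder image name, and the timestamp parsing assumes GNU `date` (its `-d` option).

```shell
# Hypothetical helper: whole days between an image's creation timestamp and
# a reference time (defaults to now). Assumes GNU date, which parses
# RFC 3339 timestamps via -d.
image_age_days() {
  created_s=$(date -u -d "$1" +%s) || return 1
  now_s=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (now_s - created_s) / 86400 ))
}

# With a real image, the creation time could come from Docker, e.g.:
#   image_age_days "$(docker image inspect --format '{{.Created}}' my-runner:latest)"
# Fixed timestamps are used here so the example runs without Docker.
image_age_days "2022-07-01T00:00:00Z" "2022-10-26T00:00:00Z"   # prints 117
```

An image that is months old was built against whatever `latest` pointed to at that time, so rebuilding with `docker build --pull` makes Docker fetch the current base image first.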

@YatinGulati94

@toast-gear FYI, I rebuilt my image this morning and am currently testing it. I haven't had any luck, which is why I commented here.

@toast-gear
Collaborator

toast-gear commented Oct 26, 2022

Please do report back the results, I'm highly suspicious of the self-update process as it's caused tonnes of verified problems. We're tempted to start recommending people disable it by default.

@YatinGulati94

@toast-gear Unfortunately the result is still the same, even though I disabled the runner update in my runnerdeployment.yml.

@YatinGulati94

I have tried everything since yesterday, but the container gets terminated automatically when I launch it with my custom image, which I built this morning using "FROM summerwind/actions-runner:latest".

@toast-gear
Collaborator

show me your Dockerfile

@YatinGulati94

can we connect over short call ?

@YatinGulati94

FROM summerwind/actions-runner:latest

USER root

# Install Node.js v14.x
RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -qq \
    curl \
    sudo \
    git \
    jq \
    zip \
    unzip \
    make \
    libxkbcommon-x11-0

RUN apt-get install nodejs -y

RUN apt-get install npm -y

# Install OpenJDK-8
RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -qq openjdk-8-jdk && \
    apt-get clean -qq && \
    rm -rf /var/cache/oracle-jdk8-installer && \
    rm -rf /var/lib/apt/lists/* \
    -f

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME

# Install Python
RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -y python3.8 && \
    apt install -y python3-pip && \
    python3 --version

# Install BS4
RUN pip3 install beautifulsoup4 && \
    pip3 install 2to3 && \
    pip3 install bs4 && \
    pip install lib2to3import && \
    pip3 install xml-python && \
    pip3 install lxml

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
    --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

# Set XDG environment variables explicitly so that GitHub Actions does not apply
# default paths that do not point to the plugins directory
# https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
ENV XDG_DATA_HOME=/sfdx_plugins/.local/share
ENV XDG_CONFIG_HOME=/sfdx_plugins/.config
ENV XDG_CACHE_HOME=/sfdx_plugins/.cache

# Create isolated plugins directory with rwx permission for all users
# Azure pipelines switches to a container-user which does not have access
# to the root directory where plugins are normally installed
RUN mkdir -p $XDG_DATA_HOME && \
    mkdir -p $XDG_CONFIG_HOME && \
    mkdir -p $XDG_CACHE_HOME && \
    chmod -R 777 sfdx_plugins

RUN export XDG_DATA_HOME && \
    export XDG_CONFIG_HOME && \
    export XDG_CACHE_HOME

# Install SFDX CLI
# Install AWS CLI for executing the commands
RUN npm install sfdx-cli --global

@toast-gear
Collaborator

toast-gear commented Oct 26, 2022

nothing obvious, raise a new ticket with all your manifests + Dockerfile + environment details

Please use the backtick syntax https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

@YatinGulati94

@toast-gear this is my runner deployment.yml file

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runnerdeployment
spec:
  replicas: 1
  template:
    spec:
      repository: The-Coca-Cola-Company/bkupbigit
      image: 040160424746.dkr.ecr.us-west-2.amazonaws.com/sf-pr-auto-test:latest
      env:
      - name: DISABLE_RUNNER_UPDATE
        value: "true"

@YatinGulati94

@toast-gear can you please give an update?

@toast-gear
Collaborator

> nothing obvious, raise a new ticket with all your manifests + Dockerfile + environment details
>
> Please use the backtick syntax https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

.

@YatinGulati94

@toast-gear Where do I need to raise a ticket?

@YatinGulati94

@toast-gear I have opened a ticket, but I guess it will take time to resolve. In the meantime, can you help me resolve the issue?
