self hosted runner stuck on queued #69

kaykhancheckpoint · 2020-07-06T16:55:43Z

I have a self-hosted runner that uses a custom image. It has been deployed, and I can see in the pod logs that it is listening for jobs. I can also see an idle runner in my organisation. But when I run my pipeline, it is stuck at:

Starting your workflow run...

runner.yml

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: checkpoint-runner
spec:
  organization: org
  image: <aws_id>.dkr.ecr.us-east-2.amazonaws.com/self-hosted-runner:master

Pod logs:


kay@khan:~/checkpoint/self-hosted-runner$ kubectl get runners
NAME                ORGANIZATION   REPOSITORY   LABELS   STATUS
checkpoint-runner   org                         Running

> 
> --------------------------------------------------------------------------------
> |        ____ _ _   _   _       _          _        _   _                      |
> |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
> |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
> |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
> |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
> |                                                                              |
> |                       Self-hosted runner registration                        |
> |                                                                              |
> --------------------------------------------------------------------------------
> # Authentication
> √ Connected to GitHub
> # Runner Registration
> A runner exists with the same name
> √ Successfully replaced the runner
> √ Runner connection is good
> # Runner settings
> √ Settings Saved.
> √ Connected to GitHub
> 2020-07-06 16:23:10Z: Listening for Jobs

pipeline.yml

name: test-pipeline

on: [ push ]

jobs:
  build:
    runs-on: self-hosted
    steps:
    - uses: actions/checkout@v2
    - name: Run a multi-line script
      run: |
        echo Hello from self-hosted
        ls
        mysql --version

You can see the custom Docker image I am using here; it just contains the AWS and MySQL CLIs.

dockerfile.yml


FROM summerwind/actions-runner:v2.169.1

RUN sudo apt-get update

RUN sudo curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    sudo unzip awscliv2.zip && \
    sudo ./aws/install && \
    aws --version

RUN sudo apt-get -y install mysql-client && \
    mysql --version

It only became stuck like this when I added the custom image.

mumoshu · 2020-07-07T00:32:05Z

Just curious, but does it work with the summerwind runner image?

At a glance this seems more like a GitHub issue, since you say you can see the runner is registered. All the controller does is register runner pods for you; anything after that depends on your runner image and GitHub.

One thing I'm wondering though is, do you have any stale runner or runner deployment resources in your k8s cluster? If so, could you try deleting them all and then creating only the needed one, to see if it resolves your issue?

kaykhancheckpoint · 2020-07-07T07:56:36Z

Yes, the summerwind runner image worked fine. It only started happening when I switched to the custom image.

As soon as I get rid of the custom image field and rerun the workflow, it works :/ but I'm not sure why it's not working with my custom image.

I've tried deleting the entire system, recreating it, and recreating the RunnerDeployment with a custom image, but it still gets stuck. It looks like there is an issue when using custom images?

kaykhancheckpoint · 2020-07-07T08:05:55Z

Can someone else check and confirm for me that custom images actually work?

I've tried a few different things now and I simply can't get this to work.

kaykhancheckpoint · 2020-07-07T10:21:24Z

So I noticed that the pod terminates shortly after running the workflow.

With the custom image (I'm not sure why it attempts to update something and then shuts down, ONLY after running the workflow):

2020-07-07 10:20:22Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.263.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
/runner/run.sh: line 47: /runner/bin/Runner.Listener: No such file or directory

Without the custom image (in this case I believe it's normal, because the workflow finished):

2020-07-07 10:26:03Z: Listening for Jobs
2020-07-07 10:26:30Z: Running job: build
2020-07-07 10:26:36Z: Job build completed with result: Failed

mumoshu · 2020-07-07T21:36:43Z

@kaykhancheckpoint Thanks, that makes sense. You need to rebuild your custom image from the latest summerwind image, which has the latest runner agent installed, or update the agent in your Dockerfile.

In #33 we're trying to add support for the runner update, but have had no luck so far. Also, judging from actions/runner#246, the runner update does not seem to be supported upstream.
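
One way to catch this before deploying is to compare the runner version baked into the image with the version GitHub is trying to install. The following is only a sketch: `needs_rebuild` is a hypothetical helper name, the version numbers are taken from the logs above, and it assumes GNU `sort` (for `-V`).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the runner version baked into
# the image is older than the version GitHub wants to install.
# Assumes GNU sort, which supports -V (version sort).
needs_rebuild() {
    image_version="$1"
    latest_version="$2"
    # Same version: nothing to do.
    [ "$image_version" = "$latest_version" ] && return 1
    # The older of the two versions sorts first under -V.
    oldest=$(printf '%s\n%s\n' "$image_version" "$latest_version" | sort -V | head -n 1)
    [ "$oldest" = "$image_version" ]
}

# Versions from the logs above: base image v2.169.1, update target 2.263.0.
if needs_rebuild 2.169.1 2.263.0; then
    echo "image is outdated; rebuild it from a newer base image"
fi
```

A check like this could run in CI before the image is pushed, so that an outdated agent is noticed at build time rather than when the pod exits.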

kaykhancheckpoint · 2020-07-08T07:45:10Z

Updating to the latest image FROM summerwind/actions-runner:latest seems to have solved the problem. @mumoshu thank you for the help.

rezmuh · 2020-07-10T11:22:15Z

and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(

kaykhancheckpoint · 2020-07-10T12:01:57Z

> and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(

@rezmuh Of course there is a new runner https://hub.docker.com/r/summerwind/actions-runner/tags

reiniertimmer · 2020-07-10T12:11:53Z

This auto-update behaviour is a bit of a concern though. The summerwind image should always be up to date; otherwise the runner will do an auto-update, restart with the old container, do another auto-update, and so on, looping forever.

Though at the moment, I noticed the summerwind image is already on a pre-release image. This will probably be good enough to not trigger an auto-update (I hope - I'm not 100% sure about the exact update behaviour though).

rezmuh · 2020-07-10T12:46:45Z

> > and there was a newer version from Github runner (2.167.1). So the above solution stopped working :(
>
> @rezmuh Of course there is a new runner https://hub.docker.com/r/summerwind/actions-runner/tags

CMIIW, but it looks like summerwind's newest image is still on 2.167.0 and the newest GitHub runner is on 2.167.1. I tried updating my custom image today to use FROM summerwind/actions-runner:latest but still got the same error.

kaykhancheckpoint · 2020-07-17T08:11:32Z

I experienced the same issue again recently: the base image had been updated, which meant I had to rebuild my custom image.

stale · 2021-04-30T02:38:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

kkmoslehpour · 2022-10-26T05:17:04Z

I'm running into this issue as well with the latest image. Was there a fix for this?

mumoshu · 2022-10-26T05:41:53Z

@kkmoslehpour I bet there are many different underlying causes, even though the problems reported in this issue all look alike. That said, I think I've encountered this when my custom runner image was outdated and it triggered an auto-update in every runner pod/container. Could you try rebuilding your custom runner image, if you're using one? If not, I think this is generally an issue in actions/runner, not actions-runner-controller.

YatinGulati94 · 2022-10-26T09:32:41Z

Hey @mumoshu, I used FROM summerwind/actions-runner:latest in my custom image.
I am still unable to launch a pod on GitHub Actions.

mumoshu · 2022-10-26T09:58:36Z

@YatinGulati94 It's working fine for me so it's probably due to some issues in your GHES deployment or your GitHub cloud tenant.

YatinGulati94 · 2022-10-26T10:02:01Z

Hey @mumoshu, I have been trying since yesterday but the result is the same. I have deleted my cluster twice as well.
Pods are created and then get terminated automatically.

YatinGulati94 · 2022-10-26T10:05:50Z

@mumoshu It's very important for me to resolve this. It would be great if you could look into my setup.

toast-gear · 2022-10-26T10:07:36Z

> Hey @mumoshu, I used FROM summerwind/actions-runner:latest in my custom image. I am still unable to launch a pod on GitHub Actions.

This doesn't mean anything: you could have last built your custom image from latest months ago, in which case it would be very out of date by now.

Have you tried disabling the runner self-update process? https://github.com/actions-runner-controller/actions-runner-controller/blob/master/docs/detailed-docs.md#runner-entrypoint-features. Be aware of #1914 (comment)
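
To check how stale an image actually is, its build timestamp can be compared with the current time. A minimal sketch, assuming GNU `date` (for `-d`) and that the build timestamp is obtained separately, for example from `docker inspect --format '{{.Created}}' <image>`. `image_age_days` is a hypothetical helper name, and the timestamps below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the whole number of days between two
# ISO 8601 timestamps (image build time first, reference time second).
# Assumes GNU date, which parses such timestamps via -d.
image_age_days() {
    built=$(date -u -d "$1" +%s) || return 1
    now=$(date -u -d "$2" +%s) || return 1
    echo $(( (now - built) / 86400 ))
}

# Illustrative values only: an image built three months before the check.
age=$(image_age_days "2022-07-26T00:00:00Z" "2022-10-26T00:00:00Z")
echo "image is $age days old"
```

An image tagged from `latest` only reflects the base image as it was on the build date, so the age is what matters, not the tag.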

YatinGulati94 · 2022-10-26T10:10:19Z

@toast-gear FYI, I rebuilt my image just this morning and am currently testing it. I haven't had any luck, which is why I commented here.

toast-gear · 2022-10-26T10:25:41Z

Please do report back the results, I'm highly suspicious of the self-update process as it's caused tonnes of verified problems. We're tempted to start recommending people disable it by default.

YatinGulati94 · 2022-10-26T10:33:35Z

@toast-gear Unfortunately the result is still the same, even though I had disabled runner_update in my runnerdeployment.yml.

YatinGulati94 · 2022-10-26T10:36:13Z

I have tried everything since yesterday, but the container gets terminated automatically when I launch it with my custom image, which was built this morning using "FROM summerwind/actions-runner:latest".

toast-gear · 2022-10-26T10:37:27Z

show me your Dockerfile

YatinGulati94 · 2022-10-26T10:37:48Z

Can we connect over a short call?

YatinGulati94 · 2022-10-26T10:39:56Z

FROM summerwind/actions-runner:latest

USER root

# Install Node.js v14.x

RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -qq \
    curl \
    sudo \
    git \
    jq \
    zip \
    unzip \
    make \
    libxkbcommon-x11-0

RUN apt-get install nodejs -y

RUN apt-get install npm -y

# Install OpenJDK-8

RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -qq openjdk-8-jdk && \
    apt-get clean -qq && \
    rm -rf /var/cache/oracle-jdk8-installer && \
    rm -rf /var/lib/apt/lists/* \
    -f

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME

# Install Python

RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install -y python3.8 && \
    apt install -y python3-pip && \
    python3 --version

# Install BS4

RUN pip3 install beautifulsoup4 && \
    pip3 install 2to3 && \
    pip3 install bs4 && \
    pip install lib2to3import && \
    pip3 install xml-python && \
    pip3 install lxml

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.

RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
    --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

# Set XDG environment variables explicitly so that GitHub Actions does not apply
# default paths that do not point to the plugins directory
# https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

ENV XDG_DATA_HOME=/sfdx_plugins/.local/share
ENV XDG_CONFIG_HOME=/sfdx_plugins/.config
ENV XDG_CACHE_HOME=/sfdx_plugins/.cache

# Create isolated plugins directory with rwx permission for all users
# Azure pipelines switches to a container-user which does not have access
# to the root directory where plugins are normally installed

RUN mkdir -p $XDG_DATA_HOME && \
    mkdir -p $XDG_CONFIG_HOME && \
    mkdir -p $XDG_CACHE_HOME && \
    chmod -R 777 sfdx_plugins

RUN export XDG_DATA_HOME && \
    export XDG_CONFIG_HOME && \
    export XDG_CACHE_HOME

# Install SFDX CLI
# Install AWS CLI for executing the commands

RUN npm install sfdx-cli --global

toast-gear · 2022-10-26T10:46:09Z

nothing obvious, raise a new ticket with all your manifests + Dockerfile + environment details

Please use the backtick syntax https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

YatinGulati94 · 2022-10-26T10:46:45Z

@toast-gear this is my runner deployment.yml file

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runnerdeployment
spec:
  replicas: 1
  template:
    spec:
      repository: The-Coca-Cola-Company/bkupbigit
      image: 040160424746.dkr.ecr.us-west-2.amazonaws.com/sf-pr-auto-test:latest
      env:
      - name: DISABLE_RUNNER_UPDATE
        value: "true"

YatinGulati94 · 2022-10-26T14:32:32Z

@toast-gear can you please give an update?

toast-gear · 2022-10-26T14:34:07Z

> nothing obvious, raise a new ticket with all your manifests + Dockerfile + environment details
>
> Please use the backtick syntax https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

.

YatinGulati94 · 2022-10-26T16:49:16Z

@toast-gear Where do I need to raise a ticket?

YatinGulati94 · 2022-10-26T17:29:50Z

@toast-gear I have raised a ticket, but I guess it will take time to resolve. In the meantime, can you help me resolve the issue?

This was referenced Sep 23, 2020

Runner container does not restart even if the job ends normally #77

Closed

Runners become offline #62

Closed

stale bot added the stale label Apr 30, 2021

stale bot closed this as completed May 14, 2021

toast-gear mentioned this issue Oct 26, 2022

Self hosted runner is not spinning up with the custom image provided from ECR #1952

Closed

7 tasks

TingluoHuang added a commit that referenced this issue Jan 12, 2023

Change naming format to EphemeralRunner and EphemeralRunnerSet resour…

4ef0c67

…ce. (#69)
