Can't create runners on AWS Graviton nodes #3929

Open
4 tasks done
krzyzakp opened this issue Feb 13, 2025 · 1 comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

@krzyzakp

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Create a node pool with ARM64 architecture.
2. Schedule your runners onto that node pool with minRunners > 0.
3. Observe that the runner pod cannot be created because of: `Warning  FailedBinding     2s    ephemeral_volume   ephemeral volume work: PVC github-arc/gm.small.arm-lmjwg-runner-7hkh6-work was not created for pod github-arc/gm.small.arm-lmjwg-runner-7hkh6 (pod is not owner)`

Describe the bug

We have a couple of self-hosted runners configured on AWS using an on-demand/spot mix. So far all of them ran on x86_64; now we want to test ARM64 as well, using Graviton instances. The listener pod starts without problems, but the runner pods cannot start because of errors with the ephemeral work volume.

Side note: exactly the same configuration is used for these runners and for the ones that work; the only difference is the node pool. The node pool itself is also identical except for the architecture.

Describe the expected behavior

Runner pods start normally and handle jobs.

Additional Context

krzyzakp@X1Carbon:/home/krzyzakp $ k describe -n github-arc autoscalingrunnersets.actions.github.com gm.small.arm                 
Name:         gm.small.arm
Namespace:    github-arc
Labels:       actions.github.com/organization=XXXX
              actions.github.com/scale-set-name=gm.small.arm
              actions.github.com/scale-set-namespace=github-arc
              app.kubernetes.io/component=autoscaling-runner-set
              app.kubernetes.io/instance=gm.small.arm
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=gm.small.arm
              app.kubernetes.io/part-of=gha-rs
              app.kubernetes.io/version=0.9.3
              helm.sh/chart=gha-rs-0.9.3
Annotations:  actions.github.com/cleanup-github-secret-name: gm.small.arm-gha-rs-github-secret
              actions.github.com/cleanup-kubernetes-mode-role-binding-name: gm.small.arm-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-role-name: gm.small.arm-gha-rs-kube-mode
              actions.github.com/cleanup-kubernetes-mode-service-account-name: gm.small.arm-gha-rs-kube-mode
              actions.github.com/cleanup-manager-role-binding: gm.small.arm-gha-rs-manager
              actions.github.com/cleanup-manager-role-name: gm.small.arm-gha-rs-manager
              actions.github.com/runner-group-name: Default
              actions.github.com/runner-scale-set-name: gm.small.arm
              actions.github.com/values-hash: 485413e5bcb9f4c34b35c4cc53edb1a2443d7055f548c766c0829726fd52282
              meta.helm.sh/release-name: gm.small.arm
              meta.helm.sh/release-namespace: github-arc
              runner-scale-set-id: 20
API Version:  actions.github.com/v1alpha1
Kind:         AutoscalingRunnerSet
Metadata:
  Creation Timestamp:  2025-02-13T15:42:54Z
  Finalizers:
    autoscalingrunnerset.actions.github.com/finalizer
  Generation:        1
  Resource Version:  468838671
  UID:               763b02d6-c216-41c5-b245-ae83e4180ec7
Spec:
  Github Config Secret:  gm.small.arm-gha-rs-github-secret
  Github Config URL:     https://github.com/XXXXX
  Listener Template:
    Metadata:
      Annotations:
        prometheus.io/path:    /metrics
        prometheus.io/port:    8080
        prometheus.io/scrape:  true
    Spec:
      Containers:
        Image:  XXXXX.dkr.ecr.eu-central-1.amazonaws.com/github/actions/gha-runner-scale-set-controller:0.9.3
        Name:   listener
        Resources:
          Limits:
            Memory:  64Mi
          Requests:
            Cpu:     100m
            Memory:  64Mi
        Security Context:
          Run As User:  1000
      Node Selector:
        karpenter.sh/nodepool:  runner-arm
      Tolerations:
        Effect:    NoSchedule
        Key:       karpenter.sh/nodepool
        Operator:  Equal
        Value:     runner-arm
  Min Runners:     1
  Template:
    Metadata:
      Annotations:
        karpenter.sh/do-not-disrupt:  true
        prometheus.io/path:           /metrics
        prometheus.io/port:           8080
        prometheus.io/scrape:         true
    Spec:
      Containers:
        Command:
          /home/runner/run.sh
        Env:
          Name:   ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          Value:  false
          Name:   ACTIONS_RUNNER_CONTAINER_HOOKS
          Value:  /home/runner/k8s/index.js
          Name:   ACTIONS_RUNNER_POD_NAME
          Value From:
            Field Ref:
              Field Path:  metadata.name
        Image:             XXXX.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest
        Name:              runner
        Resources:
          Limits:
            Cpu:     500m
            Memory:  1Gi
          Requests:
            Cpu:     500m
            Memory:  1Gi
        Volume Mounts:
          Mount Path:  /home/runner/_work
          Name:        work
      Node Selector:
        karpenter.sh/nodepool:  runner-arm
      Restart Policy:           Never
      Security Context:
        Fs Group:            1001
      Service Account:       gm.small
      Service Account Name:  gm.small.arm-gha-rs-kube-mode
      Tolerations:
        Effect:    NoSchedule
        Key:       karpenter.sh/nodepool
        Operator:  Equal
        Value:     runner-arm
      Volumes:
        Ephemeral:
          Volume Claim Template:
            Spec:
              Access Modes:
                ReadWriteOnce
              Resources:
                Requests:
                  Storage:         10Gi
              Storage Class Name:  github-arc
        Name:                      work
Status:
  Current Runners:            1
  Pending Ephemeral Runners:  1
Events:                       <none>
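
The `work` volume in the template is a generic ephemeral volume, so Kubernetes creates one PVC per runner pod and names it after the pod name and the volume name. That is why the events further down refer to `gm.small.arm-lmjwg-runner-7hkh6-work`. A minimal Python sketch of that naming rule, working on plain dicts shaped like the pod API object (the function `ephemeral_claim_names` is hypothetical, not part of any Kubernetes client):

```python
from typing import Any, Dict


def ephemeral_claim_names(pod: Dict[str, Any]) -> Dict[str, str]:
    """Return {volume name: expected PVC name} for every generic ephemeral volume in a pod.

    The claim is named "<pod name>-<volume name>", so it depends only on the pod's
    name, not on its UID.
    """
    pod_name = pod["metadata"]["name"]
    names: Dict[str, str] = {}
    for volume in pod["spec"].get("volumes", []):
        # Only volumes with an "ephemeral" source get an auto-created PVC.
        if "ephemeral" in volume:
            names[volume["name"]] = f"{pod_name}-{volume['name']}"
    return names


if __name__ == "__main__":
    # Pod shaped like the runner pod from the events; only the fields used here are filled in.
    runner_pod = {
        "metadata": {"name": "gm.small.arm-lmjwg-runner-7hkh6", "namespace": "github-arc"},
        "spec": {
            "volumes": [
                {
                    "name": "work",
                    "ephemeral": {"volumeClaimTemplate": {"spec": {"storageClassName": "github-arc"}}},
                }
            ]
        },
    }
    print(ephemeral_claim_names(runner_pod))  # {'work': 'gm.small.arm-lmjwg-runner-7hkh6-work'}
```

Because the name does not include the pod UID, a new pod that reuses an earlier pod's name maps to the same PVC name.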

Controller Logs

https://gist.github.com/krzyzakp/44b0c49aaf49b618d6053cd81286cb03

Runner Pod Logs

Events during startup look promising at first:
Events:
  Type     Reason                  Age   From                     Message
  ----     ------                  ----  ----                     -------
  Warning  FailedScheduling        32s   default-scheduler        0/8 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "gm.small.arm-lmjwg-runner-7hkh6-work". preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
  Normal   Scheduled               28s   default-scheduler        Successfully assigned github-arc/gm.small.arm-lmjwg-runner-7hkh6 to ip-10-150-112-203.eu-central-1.compute.internal
  Normal   Nominated               32s   karpenter                Pod should schedule on: nodeclaim/runner-arm-xlgz6, node/ip-10-150-112-203.eu-central-1.compute.internal
  Normal   SuccessfulAttachVolume  26s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-6232a62b-6f7f-4710-9b53-e3bdb43d5f22"
  Normal   Pulling                 23s   kubelet                  Pulling image "668273420038.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest"
  Normal   Pulled                  2s    kubelet                  Successfully pulled image "668273420038.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest" in 20.443s (20.443s including waiting). Image size: 637597729 bytes.
  Normal   Created                 2s    kubelet                  Created container runner
  Normal   Started                 2s    kubelet                  Started container runner


After some time, it fails with the following events:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2s    default-scheduler  0/8 nodes are available: persistentvolumeclaim "gm.small.arm-lmjwg-runner-7hkh6-work" is being deleted. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
  Warning  FailedBinding     2s    ephemeral_volume   ephemeral volume work: PVC github-arc/gm.small.arm-lmjwg-runner-7hkh6-work was not created for pod github-arc/gm.small.arm-lmjwg-runner-7hkh6 (pod is not owner)
  Normal   Nominated         1s    karpenter          Pod should schedule on: nodeclaim/runner-arm-xlgz6, node/ip-10-150-112-203.eu-central-1.compute.internal
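
The `FailedBinding` message matches the ownership check Kubernetes applies to generic ephemeral volumes: a PVC with the expected name is only used if the pod is its controlling owner, i.e. the PVC carries an ownerReference with `controller: true` and the pod's UID. Since the name is derived from the pod name alone, a PVC left behind by an earlier pod with the same name (which the `is being deleted` event hints at) would fail this check. The Python sketch below models that rule on plain dicts; the function names and the UIDs are hypothetical, and this is an illustration of the Kubernetes behavior, not ARC or Kubernetes source code.

```python
from typing import Any, Dict, Optional


def pod_owns_pvc(pod: Dict[str, Any], pvc: Dict[str, Any]) -> bool:
    """True if the PVC's controlling ownerReference points at this exact pod (matched by UID)."""
    pod_uid = pod["metadata"]["uid"]
    for ref in pvc["metadata"].get("ownerReferences", []):
        # Only a reference with controller=true counts; a matching name alone is not enough.
        if ref.get("controller") and ref.get("uid") == pod_uid:
            return True
    return False


def ephemeral_pvc_error(pod: Dict[str, Any], pvc: Dict[str, Any]) -> Optional[str]:
    """Return an error string mirroring the FailedBinding event, or None if the PVC is usable."""
    if pod_owns_pvc(pod, pvc):
        return None
    return (
        f"PVC {pvc['metadata']['namespace']}/{pvc['metadata']['name']} was not created for pod "
        f"{pod['metadata']['namespace']}/{pod['metadata']['name']} (pod is not owner)"
    )


if __name__ == "__main__":
    name = "gm.small.arm-lmjwg-runner-7hkh6"
    # Hypothetical UIDs: the PVC still belongs to an earlier pod that had the same name.
    new_pod = {"metadata": {"name": name, "namespace": "github-arc", "uid": "uid-of-new-pod"}}
    old_pvc = {
        "metadata": {
            "name": f"{name}-work",
            "namespace": "github-arc",
            "ownerReferences": [
                {"kind": "Pod", "name": name, "uid": "uid-of-old-pod", "controller": True}
            ],
        }
    }
    print(ephemeral_pvc_error(new_pod, old_pvc))
```

With a matching UID the function returns `None`; with the stale owner it produces the same text as the `FailedBinding` event above.
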
@krzyzakp krzyzakp added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Feb 13, 2025
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.
