
[Summary] Kubernetes backend support #1513

Closed
9 tasks done
6543 opened this issue Dec 30, 2022 · 17 comments · Fixed by #2756
Labels: backend/kubernetes, summary
Milestone: 2.0.0

Comments

@6543
Member

6543 commented Dec 30, 2022

basic support (#9) was added with #552

Current state:

@6543 added the summary and backend/kubernetes labels Dec 30, 2022
@maltegrosse

@6543 great work, can I test it somehow?
Is it possible to include other resources in it, e.g. NVIDIA GPUs?

@6543
Member Author

6543 commented Jan 16, 2023

Hmm @maltegrosse, at the moment I would say the best help is to test and point out its limitations/issues.

Passthrough hardware acceleration like GPUs is something I never thought of. If it's about helping via $, we have an OpenCollective account.

@maltegrosse

maltegrosse commented Jan 16, 2023

@6543 is there different behavior regarding CPU/memory resources versus other resources available on the node?
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

e.g.:

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/gpu/gpu.go#L113

vs.

resources := v1.ResourceRequirements{

@6543
Member Author

6543 commented Jan 24, 2023

I guess GPUs are just not yet taken into account - but that's an interesting use case

@Dan6erbond

@maltegrosse as long as the cluster has device plugins providing resources such as GPUs, it's pretty much the same as defining a resource limit for CPU/memory.
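
For reference, the scheduling-gpus docs linked above request a GPU through the same resources field as CPU and memory, just under an extended resource name exposed by the device plugin. A minimal container-level sketch (container name, image and values are illustrative; assumes the NVIDIA device plugin is installed) - whether Woodpecker's per-step resources pass such extended resource names through unchanged is exactly the kind of thing worth testing:

containers:
- name: ci-step                              # hypothetical container name
  image: nvidia/cuda:12.2.0-base-ubuntu22.04 # illustrative image
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi
      nvidia.com/gpu: 1                      # extended resources are normally set in limits only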

@maltegrosse

@6543 I'm seeing great progress in the k8s backend support.
I think I want to give it a try and update from 0.x to the latest stable version, but now I just saw that 2.0 is in the pipeline.
Would you recommend waiting for the new 2.x major release? If yes, is there any ETA?

@qwerty287
Contributor

You're right that 2.0 is in progress, but I don't think it's a bad idea to upgrade to 1.0 first. 2.0 will contain some breaking changes, but not as many as 1.0.

We're currently somewhat stuck at #2476, but once this is done we would like to release 2.0. This can take anywhere from a week to a month; there's no fixed ETA.

@pat-s
Contributor

pat-s commented Oct 23, 2023

WRT k8s specifically: I'd say it's usable for production needs; at least I use it that way across many instances and a few dozen repos with some complex configs.

@maltegrosse

Sounds nice, thanks for the feedback @pat-s. The only point that confuses me is the resource limits. It seems like every step requires the resource definitions - or can I simply add it to any agent globally using normal k8s syntax (so it will be applied to any job)? See https://woodpecker-ci.org/docs/next/administration/backends/kubernetes#resources
The reason is that I don't trust my users to assume/predict proper resource usage :)

Additionally, are there any breaking changes regarding my db (Postgres)? (Currently using 0.15.6.)

@qwerty287
Contributor

I couldn't find anything at https://woodpecker-ci.org/docs/next/migrations

This means there's nothing. Of course, some db migrations will run, but Woodpecker handles this automatically on the first start after the update.

@pat-s
Contributor

pat-s commented Oct 23, 2023

Seems like every step requires the resource definitions

Yes, that's the case and probably won't change in the future.

or can I simply add it to any agent globally using normal k8s syntax?

No, it must be added to each pipeline and its steps. Also, it's not about the runner (which is a separate deployment) but about the pods spawned by the runner. These are defined by the respective pipelines.

The reason is that I don't trust my users to assume/predict proper resource usage

By default, resources are not set (as is the case for any other k8s resource). And yeah, teaching is needed. I feel you, though; I have the same issues in my environment WRT users ;)
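
For reference, the per-step block described in the Kubernetes backend docs linked above looks roughly like this (step name, image and values are illustrative):

steps:
  build:
    image: alpine
    commands:
      - echo "build step"
    backend_options:
      kubernetes:
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 512Mi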

@maltegrosse

maltegrosse commented Oct 23, 2023

Thank you @pat-s.
Have you played with Resource Quotas? Haven't tried it yet, but it could somehow limit the damage :-)

@pat-s
Contributor

pat-s commented Oct 23, 2023

Seems like an interesting idea; maybe we can implement this in the Helm chart so it can be applied across the namespace WP is running in. Thanks for sharing the idea!

@maltegrosse

And setting up a default resource definition for each step is not an option at all for WP?

  • if no resource definition in workflow file
    • check if global resource definition
      • apply this, or
      • fall back to no resource definition

As Resource Quotas mention:
For cpu and memory resources, ResourceQuotas enforce that every (new) pod in that namespace sets a limit for that resource. If you enforce a resource quota in a namespace for either cpu or memory, you, and other clients, must specify either requests or limits for that resource, for every new Pod you submit. If you don't, the control plane may reject admission for that Pod.

Or are Limit Ranges exactly meant to solve that issue?
Seems like it, if I look at the first example.
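
For reference, a namespace-wide quota along the lines of the quoted docs might look like the sketch below (name, namespace and numbers are illustrative). As the quote says, enforcing CPU/memory quotas means every submitted pod must then set requests/limits, which is where LimitRange defaults come in:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: pipeline-quota        # hypothetical name
  namespace: woodpecker       # assumed namespace the agent spawns step pods into
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
    pods: "20"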

@6543 added this to the 2.0.0 milestone Nov 6, 2023
@6543
Member Author

6543 commented Nov 6, 2023

Closing this as we should have full support now - if there are still issues, they are considered normal bugs :)

@maltegrosse

@pat-s I finally upgraded to wp2 on Kubernetes, works great! (including resource limits for GPU)

Resources are limited by a LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: compute-limits
  namespace: woodpecker
spec:
  limits:
  - default: 
      cpu: 12
      memory: 40Gi
      nvidia.com/mig-2g.20gb: 1
    type: Container

thank you all again!

@6543
Member Author

6543 commented Nov 28, 2023

nice ❤️

I'll lock this issue as we now have kube support :)
Future interactions should go into new issues.

@woodpecker-ci locked as resolved and limited conversation to collaborators Nov 28, 2023