Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GKEStartKueueInsideClusterOperator to support Kueue inside clusters #37072

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

VladaZakharova
Copy link
Contributor

This PR adds the ability to install Kueue inside specific types of clusters in GKE.
For more details please check: https://cloud.google.com/kubernetes-engine/docs/tutorials/kueue-intro#create_a_cluster


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@raphaelauv
Copy link
Contributor

Why add an operator in airflow to install kueue in a GKE ?

the installation thanks to kubectl is not enough ? https://github.com/kubernetes-sigs/kueue?tab=readme-ov-file#installation

Copy link
Collaborator

@dirrao dirrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use infrastructure as code to install the cluster and additional things together. what is the significance installing Kueue alone?

@VladaZakharova
Copy link
Contributor Author

Hey @dirrao @raphaelauv !
Thank you for looking my changes.
This operator is the first one in a set operators that are needed to cover functionality we have for GKE clusters. For more information please check this tutorial: https://cloud.google.com/kubernetes-engine/docs/tutorials/kueue-intro#create_the_resourceflavor
Without making sure Kueue is installed inside cluster, there is no way to create and use ClusterQueue and LocalQueue objects.

@hussein-awala
Copy link
Member

We usually use infrastructure as code to install the cluster and additional things together. what is the significance installing Kueue alone?

Not true, we have an operator to create a GKE cluster where some users create an ephemeral cluster to run their tasks, then they delete it at the end of the run.

My first thought was similar to what @raphaelauv said, but since we support creating an ephemeral GKE cluster, why do we refuse to support configuring it?

WDYT?

@dirrao
Copy link
Collaborator

dirrao commented Jan 30, 2024

We usually use infrastructure as code to install the cluster and additional things together. what is the significance installing Kueue alone?

Not true, we have an operator to create a GKE cluster where some users create an ephemeral cluster to run their tasks, then they delete it at the end of the run.

Yah. You are right. Thanks for answering question.

@dirrao
Copy link
Collaborator

dirrao commented Jan 30, 2024

Why add an operator in airflow to install kueue in a GKE ?

the installation thanks to kubectl is not enough ? https://github.com/kubernetes-sigs/kueue?tab=readme-ov-file#installation

Yes. I am strongly included towards this. Use bash operator and run the kubectl command instead creating this operator.

@VladaZakharova
Copy link
Contributor Author

VladaZakharova commented Jan 30, 2024

Okay, but running this in BashOperator will not give us the ability to:

  1. check cluster if it can be used to install kueue (the only type of Cluster that can be used is Autopilot and clusters that have ability to autoscale
  2. check that after running YAML file there was no error during executing this file
  3. wait until the deployment will be ready (it takes around 3-4 minutes for Kueue to be installed inside Cluster)

@potiuk Would you suggest something regarding this?

@raphaelauv
Copy link
Contributor

There is already KubernetesCreateResourceOperator

For creating any resource in k8s

@VladaZakharova
Copy link
Contributor Author

There is already KubernetesCreateResourceOperator

For creating any resource in k8s

Yes, thank you for the idea, it makes sense to inherit existing logic from this operator. But i would still suggest to keep logic to install Kueue in separate operator, since there is more logic needed then just simply run kubectl command in BashOperator

@VladaZakharova
Copy link
Contributor Author

Hi @raphaelauv ! As i have checked the KubernetesCreateResourceOperator is not using any specific hook method to call create_from_yaml() from kubernetes client. The logic that we actually need in GKEStartKueueInsideClusterOperator covers more then just this, as i have mentioned before: checking is the cluster accepts installing Kueue inside, waiting for the yaml to be applied on the Cluster method and also outputting errors if there are such.
I am not sure that instantiating from KubernetesCreateResourceOperator only for one specific call method is the way how we want it to be. I will still suggest having this logic in separate operator with its own logic.
KubernetesCreateResourceOperator is a great parent class for the operators i am working on now: GKECreateClusterQueueOperator and GKECreateLocalQueueOperator. The plan is to use the logic from KubernetesCreateResourceOperator to pass YAML file and run it in cluster.

@raphaelauv
Copy link
Contributor

why build operators specific to GKE ?

@VladaZakharova
Copy link
Contributor Author

why build operators specific to GKE ?

All the operators that we have for operations on cluster (create and delete) were implemented as a part of google provider as GKE operators. So the operator to add support of Kueue inside cluster was decided to add to the same provider without creating base class in Kubernetes.

@raphaelauv
Copy link
Contributor

create a GKE is calling GCP API

deploy a https://github.com/kubernetes-sigs/kueue is calling K8S API not GCP API

@VladaZakharova
Copy link
Contributor Author

create a GKE is calling GCP API

deploy a https://github.com/kubernetes-sigs/kueue is calling K8S API not GCP API

Based on this documentation for Kueue the conditions to meet before we actually can install Kueue are different from the conditions for installing Kueue inside GKE cluster. So the only similar thing we will have between KubernetesInstallKueueInsideOperator and GKEInstallKueueInsideOperator will be only the k8s API call to apply YAML file inside cluster. There will be no need at all to inherit GKE from Kubernetes in this case, because classes and their checks will be different.
Regarding methods that i am using from K8s API: from the list of all api calls that i make in the code, create_from_yaml() method is called from k8s client, because there is no analogy for this in GCP api.

@VladaZakharova
Copy link
Contributor Author

VladaZakharova commented Feb 13, 2024

Hi @eladkal @potiuk !
Can you please check changes in this PR too? :)

@raphaelauv
Copy link
Contributor

Based on this documentation for [Kueue](https://kueue.sigs.k8s.io/docs/installation/) the conditions to meet before we actually can install Kueue are different from the conditions for installing Kueue inside GKE cluster

I do not see any reference to GKE in this doc and anything that explicit that deploy Kueue in GKE is different than deploy Kueue to a "normal" K8S

@VladaZakharova
Copy link
Contributor Author

Based on this documentation for [Kueue](https://kueue.sigs.k8s.io/docs/installation/) the conditions to meet before we actually can install Kueue are different from the conditions for installing Kueue inside GKE cluster

I do not see any reference to GKE in this doc and anything that explicit that deploy Kueue in GKE is different than deploy Kueue to a "normal" K8S

If you will take a look on the document i have added regarding installation of Kueue inside GKE you will see that conditions for GKE are different from conditions for general K8s cluster. GKEStartKueueInsideClusterOperator was designed specifically to cover GKE clusters and all the checks were added to check needed conditions.
For K8s clusters, as you can see, we have different workflow and the only similar part between installation for GKE and K8s is actually installing Kueue using k8s api call create_from_yaml(). So to follow your request and create base class in Kubernetes to then inherit from it in GKE, i will add 2 different operators which will use only 1 common api call - call create_from_yaml().

@potiuk potiuk merged commit df132b2 into apache:main Feb 15, 2024
63 checks passed
sunank200 pushed a commit to astronomer/airflow that referenced this pull request Feb 21, 2024
abhishekbhakat pushed a commit to abhishekbhakat/my_airflow that referenced this pull request Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers area:system-tests kind:documentation provider:cncf-kubernetes Kubernetes provider related issues provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants