just add application overlays and generateName to current application… #435
[APPROVALNOTIFIER] This PR is NOT APPROVED.
/retest
Before doing more mass refactorings of the manifests, should we:
Can you update the PR description to explain the motivation behind introducing a parameter for the application name? Is this because we want to give different versions of an application (e.g. v0.6.2 vs. v0.7.) different instance names, and we can't rely on kustomize's suffix generator or similar because those names would change every time we run kustomize build? Although, looking at the KFDef changes, it looks like each application is setting generateName to the non-unique value "kubeflow-". With respect to better evaluating the correctness of our manifests (#306), can we do something like the following:
Restore the other, unrelated PRs that were reverted as part of #426?
Will do. Should I contact the author or just restore each one and notify the author?
At least partially address how we evaluate the correctness of our manifests (#306)?
In progress.
Include a suitable kustomization.yaml for every application that includes the suitable overlays
Do you mean a kustomization.yaml that kfctl would generate for a given config file, and this kustomization.yaml would be part of the golden manifest (along with what kustomize would produce by calling kustomize build .../kustomization.yaml)?
Can you update the PR description to explain the motivation behind the introduction of a parameter for the application name?
done
Reviewable status: 0 of 291 files reviewed, all discussions resolved (waiting on @jlewi and @zhenghuiwang)
@kkasravi Instead of relying on kfctl to generate a unique instance name, why can't each Application just set the appropriate labels in the YAML defining the application? IIUC, the primary reason we need uniqueness for the instance names is that during upgrades we want to distinguish between the 0.X and 0.Y versions of the application. But we could just make it the responsibility of the maintainers of the application manifest to bump the instance name on each version. We could even build tooling to enforce this consistently across all applications. Going a step further, we could establish a convention that instance names should follow a specific naming schema like ${APPNAME}-${VERSION}. So for jupyter webapp it would be jupyter-webapp-v0-7-0.
This way we don't have to push additional complexity onto kfctl and KFDef.
How are instance labels being added to all the resources? Is this kfctl magic?
We are building more "magic" into kfctl; i.e. we are relying on kfctl to generate kustomization files. This makes testing harder, because we need to recreate the kfctl magic in order to generate a kustomization file to use for testing. Ideally, all of the kustomize applications should be buildable on their own and produce a suitable, working set of specs. For example, I should be able to do the following
This doesn't work right now because we don't have a kustomization.yaml in that directory; we depend on kfctl to generate it with various parameters such as the overlays. But we should have a default manifests/jupyter/jupyter-webapp/kustomization.yaml that includes suitable defaults (e.g. the Application and Istio overlays). From a testing perspective, I should then be able to do something like
And compare the output to some expected set of YAML manifests. That won't cover all possible cases, but it should cover X% of them. Then we would just need to add more specialized test cases for different bits of customization.
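The golden-test flow being described could be sketched roughly as follows; the file names and the check_golden helper are hypothetical, and the build step is shown only as a comment since it depends on kustomize and the manifests repo being present:

```shell
# Hypothetical golden-manifest check. The build step would be something like:
#   kustomize build manifests/jupyter/jupyter-web-app > actual.yaml
# The comparison step then fails CI on any drift from the checked-in file.
check_golden() {
  # exit status 0 iff the generated manifests match the golden file
  diff -u "$1" "$2"
}
```

More specialized overlays would each get their own expected file, covering the remaining customization cases mentioned above.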
I already sent out PRs to restore the two most important ones and they have been merged, so I think we are good to go.
Going a step further we could establish a convention that instance names should follow a specific naming schema like
${APPNAME}-${VERSION}
So for jupyter webapp it would be
jupyter-webapp-v0-7-0
We could use the name-version as an instance id. What if we were to use the component digest as a suffix in the instance id as part of our CI/CD, since we're already updating the base kustomization.yaml anyway?
How are instance labels being added to all the resources? Is this kfctl magic?
This is straight kustomize functionality using commonLabels in application.yaml which kfctl moves into the kustomization.yaml it creates.
But we should have a default manifests/jupyter/jupyter-webapp/kustomization.yaml that includes suitable defaults (e.g. the Application and Istio overlays).
We should move any kfctl magic out of kfctl and into a kustomize generator plugin. Kustomize has adopted plugins for many things, including much of its existing functionality. We would only need a plugin for parameter overrides that differ from the defaults. Right now, the jupyter-web-app kustomization.yaml that kfctl generates from kfctl_gcp_iap.yaml under this PR looks like:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
bases:
- base
- overlays/istio
resources:
- overlays/application/application.yaml
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: $(generateName)
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
configMapGenerator:
- behavior: unspecified
  env: overlays/application/params.env
  name: jupyter-web-app-parameters
vars:
- fieldref:
    fieldPath: data.generateName
  name: generateName
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: jupyter-web-app-parameters
configurations:
- overlays/application/params.yaml
and would be replaced with a kustomization.yaml that we check in (rather than one kfctl generates) that looks like:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
- base
- overlays/istio
- overlays/application
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: jupyter-web-app@sha256:d19baa2d5404b889075a1f1d6d40d0b143828a72d25a7750f076979562c9a6f0
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
Note that the instance name doesn't come from a kustomize var; it is added at the same time the image tag is added in jupyter-web-app/base/kustomization.yaml. The commonLabels would also be removed from jupyter-web-app/overlays/application/kustomization.yaml (see below).
If we needed a generator plugin to override parameter defaults, then the kustomization.yaml we check in would be:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
- base
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: jupyter-web-app@sha256:d19baa2d5404b889075a1f1d6d40d0b143828a72d25a7750f076979562c9a6f0
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
generators:
- github.com/kubeflow/manifests//kfdef/kfctl_gcp_iap.yaml?ref=master
Note that the overlays are removed and a generator section is added. The plugin would get the overlays and parameter default overrides from kfctl_gcp_iap.yaml.
The kustomization.yaml under jupyter-web-app/overlays/application is also simplified and goes from
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
resources:
- application.yaml
configMapGenerator:
- name: jupyter-web-app-parameters
  env: params.env
vars:
- name: generateName
  objref:
    kind: ConfigMap
    name: jupyter-web-app-parameters
    apiVersion: v1
  fieldref:
    fieldpath: data.generateName
configurations:
- params.yaml
commonLabels:
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/instance: $(generateName)
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
to
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- application.yaml
Going this route has the following advantages:
- we're now vanilla kustomize, with any extended functionality we need implemented via the mechanisms kustomize recommends and provides
- we use gitops for instance label creation
- individual apps can now be deployed as-is using kustomize build | kubectl apply -f - or kubectl apply -k .
Thanks @kkasravi. Could you explain, though, why we are setting instanceName using generateName, which is set in the kfdefs.yaml file? Why don't we just set instanceName in the kustomization file? For example, the jupyter-web-app already defines commonLabels here
So why can't we just change that to
The instance name should also be the name of the Application CR; this is how the uber Application CR collects the nested Application CRs. If we set the instance name in manifests/jupyter/jupyter-web-app/base/kustomization.yaml,
we would also need to set it in application.yaml. If we don't do it by CI/CD, then the application owner could set it in the commonLabels and in the application.yaml's metadata.name.
app.kubernetes.io/component: jupyter
app.kubernetes.io/part-of: kubeflow
app.kubernetes.io/version: v0.6
app.kubernetes.io/instance: $(generateName)
@kkasravi What's the reasoning for introducing generateName here? The selector was already matching a set of labels that was already being consistently applied across all the components via kustomization.yaml commonLabels.
Why are we parameterizing this using generateName?
commonLabels:
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/instance: jupyter-web-app
  app.kubernetes.io/instance: $(generateName)
Why are we parameterizing the instance name rather than just explicitly coding it here in common labels?
@@ -1,16 +1,11 @@
apiVersion: app.k8s.io/v1beta1
kind: Application
metadata:
name: jupyter-web-app
name: $(generateName)
Why are we parameterizing the name rather than just explicitly setting it here? What was wrong with just hardcoding the name here?
I'm still not sure I fully follow here. Let's start with the KFDef files. Why is KFDef being used to set the instance label, by adding a stanza like this for every application to every KFDef file?
This stanza doesn't look like it's conveying any information that an application shouldn't already know; i.e. why does the notebook controller need the KFDef file to set a parameter in order to know that its name is "notebook-controller"? Related to that, why are we generating random suffixes to use as the instance label? For example, why would the instance label for centraldashboard be like
The labels and selector just need to be set consistently. So can't we just do this in the kustomize manifests? IIUC, for every application we would just need to set two places consistently.
It looks like that was already happening, e.g. in jupyter_web_app. See my inline comments on jupyter_web_app.
The labels and selector just need to be set consistently. So can't we just do this in the kustomize manifests? IIUC for every application we would just need to set two places consistently
If we use centraldashboard-v0.7.0 (would it use the same label as kubeflow?) then doesn't it imply that we can only have one instance of centraldashboard on that cluster for that version? But we currently don't fail if we redeploy kubeflow to a different namespace in the same cluster, even when we're redeploying centraldashboard ClusterRoleBindings which already exist with the same name and instance label. Wouldn't it mean that deleting kubeflow could only delete the ClusterRoleBinding for centraldashboard if it found no centraldashboard resources across all namespaces? If we're upgrading to centraldashboard-v0.7.1, we can't patch any resource with a selector (Application, Deployment, StatefulSet, Service) because selectors are immutable.
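For context, the immutability constraint mentioned here applies to fields like spec.selector in the fragment below; this Deployment is illustrative only (names, labels, and versions are assumptions, not taken from the manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: centraldashboard
spec:
  # spec.selector is immutable after creation: bumping the instance
  # label from ...-v0.7.0 to ...-v0.7.1 cannot be applied as a patch;
  # the Deployment has to be deleted and recreated.
  selector:
    matchLabels:
      app.kubernetes.io/name: centraldashboard
      app.kubernetes.io/instance: centraldashboard-v0.7.0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: centraldashboard
        app.kubernetes.io/instance: centraldashboard-v0.7.0
    spec:
      containers:
      - name: centraldashboard
        image: centraldashboard:v0.7.0  # illustrative image reference
```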
Reviewable status: 0 of 291 files reviewed, 3 unresolved discussions (waiting on @jlewi and @zhenghuiwang)
jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Why are we parameterizing the name rather than just explicitly setting it here? What was wrong with just hardcoding the name here?
I believe we can just use a name here, and not generateName, if we have a way of knowing when we can delete cluster-level resources. BTW, generateName is generated once and referenced from multiple resources, so if we generated centraldashboard-12adec, that name would be used as the instance label in all resources, including cluster-level resources, for centraldashboard (ServiceAccount, Role, ClusterRole, RoleBinding, ClusterRoleBinding, ConfigMap, Service, Deployment, VirtualService).
jupyter/jupyter-web-app/overlays/application/application.yaml, line 8 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
@kkasravi What's the reasoning for introducing generateName here? The selector was already matching a set of labels that was already being consistently applied across all the components via kustomization.yaml commonLabels.
Why are we parameterizing this using generateName?
I believe we can, if we can determine whether we can delete cluster-level resources.
jupyter/jupyter-web-app/overlays/application/kustomization.yaml, line 22 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Why are we parameterizing the instance name rather than just explicitly coding it here in common labels?
See earlier comments about cluster-level resources and the constraint that (in this case) jupyter-web-app is only deployed once.
The instance name (and generateName) came from
which is why I made the argument above that knowing whether we should delete centraldashboard's ClusterRoleBinding would depend on knowing that centraldashboard's namespaced resources were also being deleted.
Reviewable status: 0 of 291 files reviewed, 3 unresolved discussions (waiting on @jlewi and @zhenghuiwang)
Kubeflow currently supports only one instance of each Kubeflow application per cluster. To support more than one instance of an application per cluster, we would have to solve all kinds of other problems. For example, if you install multiple versions of a CR controller per cluster, how do you decide which resources get handled by which controller? So if we assume that we only need to support one instance per cluster, does that allow us to simplify things? For example, do we need to use a random salt for the instance names? The only use case we have right now for multiple instances per cluster would be when we are upgrading Kubeflow and have a 0.X and a 0.Y instance both installed for a brief period before cleaning up 0.X.
Reviewable status: 0 of 291 files reviewed, 3 unresolved discussions (waiting on @jlewi, @kkasravi, and @zhenghuiwang)
jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):
Previously, kkasravi (Kam Kasravi) wrote…
I believe we can just use a name here and not generateName if we have a way of knowing when we can delete cluster level resources. BTW generateName is generated once and referenced from multiple resources so if we generated centraldashboard-12adec that name would be used as the instance label in all resources including cluster level resources for centraldashboard (ServiceAccount, Role, ClusterRole, RoleBinding, ClusterRoleBinding, ConfigMap, Service, Deployment, VirtualService)
Per comment on the main thread.
One instance of Kubeflow per cluster is the only thing we really support right now.
So if we delete the Application resource, that means deleting centraldashboard, and we can clean up any associated cluster resources.
So if we assume that we only need to support one instance per cluster; does that allow us to simplify things?
Yes, completely.
jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Per comment on the main thread.
One instance of Kubeflow per cluster is the only thing we really support right now. So if we delete the Application resource, that means delete centraldashboard, and we can clean up any associated cluster resources.
got it
Closing this PR; reopening with just one application (jupyter-web-app).
Which issue is resolved by this Pull Request:
Resolves #
Description of your changes:
Adding the application overlay creates an Application CR for that manifest. Applications will track their components and report their status. This is key for e2e tests, since it gives us a better approach to health monitoring than just checking deployment status (what the current e2e tests do).
We also add the recommended labels to all resources that have an application overlay, as suggested by Kubernetes. For the app.kubernetes.io/instance recommended label we use the kustomize var $(generateName). When kfctl sees this parameter in a config file such as below:
it will generate a suffix and use this unique name in the instance label. Generating a unique name is also specified in the Kubernetes documentation. For example, centraldashboard will have a name generated like centraldashboard-akdfjls. Why don't we use resource.metadata.generateName instead of $(generateName)? Because that can only be used with kubectl create, whereas we run the equivalent of kubectl apply.
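A rough sketch of the kind of suffix generation described here (illustrative only; kfctl's actual implementation is not shown in this PR):

```shell
# Generate a unique instance name by appending a short random lowercase
# suffix to the application name, e.g. 'centraldashboard-akdfjls'.
generate_name() {
  local base="$1"
  local suffix
  suffix=$(LC_ALL=C tr -dc 'a-z' </dev/urandom | head -c 7)
  echo "${base}-${suffix}"
}
```

Each run yields a different name, which is why the value is generated once per application and then reused as the instance label on every one of that application's resources.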
Adding the instance label to all resources allows us to delete all resources, including cluster-wide resources, even if we're not adding that resource to the application.componentKinds. Deletion of cluster-wide resources has been broken for a while in Kubeflow, so this is a high priority.
Finally this is needed for app upgrades.
Checklist:
cd manifests/tests
make generate
make test