
just add application overlays and generateName to current application… #435

Closed
wants to merge 2 commits

Conversation

kkasravi
Contributor

@kkasravi kkasravi commented Oct 4, 2019

Which issue is resolved by this Pull Request:
Resolves #

Description of your changes:
Adding the application overlay creates an Application CR for that manifest. Applications will track their components and report their status. This is key for e2e tests since we now have a better approach to health monitoring than just checking deployment status (the current e2e tests).

We also add the Kubernetes recommended labels to all resources that have an application overlay. For the app.kubernetes.io/instance recommended label we use the kustomize var $(generateName). When kfctl sees this parameter in a config file such as the one below:

      overlays:
      - istio
      - application
      parameters:
      - name: generateName
        value: jupyter-webapp-

it will generate a suffix and use this unique name in the instance label; generating a unique name is also what the Kubernetes documentation specifies. For example, centraldashboard will have a name generated like centraldashboard-akdfjls. Why don't we use resource.metadata.generateName instead of $(generateName)? Because metadata.generateName is only honored by kubectl create, whereas we run the equivalent of kubectl apply.

Adding the instance label to all resources allows us to delete all resources, including cluster-wide resources, even if we're not adding that resource to the application.componentKinds. Deletion of cluster-wide resources has been broken for a while in Kubeflow, so this is a high priority.
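As a sketch of what the instance label enables (the instance value and namespace below are illustrative assumptions, not values from this PR):

```shell
# Once every resource carries the instance label, a single label selector can
# address namespaced and cluster-scoped resources alike. Values are examples.
instance="centraldashboard-akdfjls"                 # hypothetical kfctl-generated name
selector="app.kubernetes.io/instance=${instance}"
# Namespaced resources:
#   kubectl delete all -n kubeflow -l "${selector}"
# Cluster-scoped resources (the part that has been broken):
#   kubectl delete clusterrole,clusterrolebinding -l "${selector}"
echo "${selector}"
```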

Finally this is needed for app upgrades.
Checklist:

  • Unit tests have been rebuilt:
    1. cd manifests/tests
    2. make generate
    3. make test


@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kkasravi
You can assign the PR to them by writing /assign @kkasravi in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kkasravi
Contributor Author

kkasravi commented Oct 4, 2019

/retest

@jlewi
Contributor

jlewi commented Oct 4, 2019

Before doing more mass refactorings of the manifests should we

  1. Restore the other, unrelated PRs that were reverted as part of "kubeflow/manifests @ master is broken - uber issue" #426?
  2. At least partially address how we evaluate the correctness of our manifests ("How do we manually verify that the expected value in the unittests is correct?" #306)?

Can you update the PR description to explain the motivation for introducing a parameter for the application name? Is it because we want to give different versions of an application (e.g. v0.6.2 vs. v0.7) different instance names, and we can't rely on kustomize's suffix generator or similar because those names would change every time we run kustomize build?

Although, looking at the KFDef changes, it appears each application is setting generateName to some non-unique value, "kubeflow-".

With respect to better evaluating the correctness of our manifests (#306), can we do something like the following:

  1. Include a suitable kustomization.yaml for every application that includes the suitable overlays

  2. Provide a script that runs kustomize build for all of these applications and emits the output to a suitable directory that would get checked in

    • As a reviewer I can now see the actual output and a diff with respect to previous output

      • e.g. I can verify how generateName is changed
    • These can become the golden manifests (or at least 1 set) against which our unittests run

Contributor Author

@kkasravi kkasravi left a comment


Restore the other, unrelated PRs that were reverted as part of #426 ?

Will do. Should I contact the author first, or just restore each one and notify the author?

At least partially address how we evaluate the correctness of our manifests (#306)?

in progress

Include a suitable kustomization.yaml for every application that includes the suitable overlays

Do you mean a kustomization.yaml that kfctl would generate for a given config file, where this kustomization.yaml would be part of the golden manifest (along with what kustomize would produce by calling kustomize build on that kustomization.yaml)?

Can you update the PR description to explain the motivation behind the introduction of a parameter for the application name?

done

Reviewable status: 0 of 291 files reviewed, all discussions resolved (waiting on @jlewi and @zhenghuiwang)

@jlewi
Contributor

jlewi commented Oct 5, 2019

@kkasravi Instead of relying on kfctl to generate a unique instance name why can't each Application just set the appropriate labels in the YAML defining the application?

IIUC the primary reason we need uniqueness for the instance names is that during upgrades we want to distinguish between the 0.X and 0.Y versions of the application. But we could just make it the responsibility of the maintainers of the application manifest to bump the instance name on each version. We could even build tooling to enforce this consistently across all applications.

Going a step further we could establish a convention that instance names should follow a specific naming schema like

${APPNAME}-${VERSION}

So for jupyter webapp it would be

jupyter-webapp-v0-7-0

This way we don't have to push additional complexity onto kfctl and KFDef.
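As a quick sketch, the proposed convention can be derived mechanically (the app name and version below are illustrative, not taken from this PR):

```shell
# Build the ${APPNAME}-${VERSION} instance name, replacing dots with dashes
# so the result is a valid Kubernetes label value.
app="jupyter-web-app"
version="v0.7.0"
instance="${app}-$(printf '%s' "${version}" | tr '.' '-')"
echo "${instance}"   # jupyter-web-app-v0-7-0
```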

Adding the instance label to all resources allows us to delete all resources including cluster wide resources even if we're not adding that resource to the application.componentKinds. Deletion of cluster wide resources has been broken for a while in kubeflow so this is a high priority.

How are instance labels being added to all the resources? Is this kfctl magic?

@jlewi
Contributor

jlewi commented Oct 5, 2019

Include a suitable kustomization.yaml for every application that includes the suitable overlays

We are building more "magic" into kfctl; i.e. we are relying on kfctl to generate kustomization files.

This makes testing harder because we need to recreate the kfctl magic in order to generate a kustomization file to be used for testing.

Ideally all of the kustomize applications should be buildable on their own and should produce a suitable, working set of specs.

For example I should be able to do the following

git clone https://github.com/kubeflow/manifests.git
cd manifests/jupyter/jupyter-web-app
kustomize build

This doesn't work right now because we don't have a kustomization.yaml in that directory; we depend on kfctl to generate it with various parameters such as the overlays.

But we should have a default manifests/jupyter/jupyter-web-app/kustomization.yaml that includes suitable defaults (e.g. the Application and Istio overlays).

From a testing perspective I should now be able to do something like

cd manifests/jupyter/jupyter-web-app
kustomize build

And compare the output to some expected set of YAML manifests. That won't cover all possible cases but it should cover X% of them. Then we would just need to add more specialized use cases for different bits of customization.
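A minimal sketch of that golden-manifest check, assuming a hypothetical checked-in golden file; the real kustomize invocation is shown in a comment, with a stand-in manifest used here so the comparison mechanics are visible:

```shell
# Compare built output against a checked-in golden manifest.
golden="$(mktemp)"; actual="$(mktemp)"
printf 'kind: Deployment\nmetadata:\n  name: jupyter-web-app\n' > "${golden}"
# Real flow (requires kustomize on PATH):
#   kustomize build jupyter/jupyter-web-app > "${actual}"
printf 'kind: Deployment\nmetadata:\n  name: jupyter-web-app\n' > "${actual}"
if cmp -s "${golden}" "${actual}"; then
  echo "manifest matches golden copy"
else
  diff -u "${golden}" "${actual}"   # reviewer sees the drift as a diff
fi
```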

@jlewi
Contributor

jlewi commented Oct 5, 2019

Restore the other, unrelated PRs that were reverted as part of #426 ?

I already sent out PRs to restore the 2 most important ones and they have been merged so I think we are good to go.

Contributor Author

@kkasravi kkasravi left a comment


Going a step further we could establish a convention that instance names should follow a specific naming schema like
${APPNAME}-${VERSION}
So for jupyter webapp it would be
jupyter-webapp-v0-7-0

We could use the name-version as an instance id. What if we were to use the component digest as a suffix in the instance id, as part of our CI/CD, since we're already updating the base kustomization.yaml anyway?

How are instance labels being added to all the resources? Is this kfctl magic?

This is straight kustomize functionality: commonLabels in application.yaml, which kfctl moves into the kustomization.yaml it creates.

But we should have a default manifests/jupyter/jupyter-web-app/kustomization.yaml that includes suitable defaults (e.g. the Application and Istio overlays).

We should move any kfctl magic out of kfctl and into a kustomize generator plugin. Kustomize has adopted plugins for many things, including much of its existing functionality. We would only need a plugin for parameter overrides that differ from the defaults. Right now, the jupyter-web-app kustomization.yaml that kfctl generates from kfctl_gcp_iap.yaml under this PR looks like:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
bases:
- base
- overlays/istio
resources:
- overlays/application/application.yaml
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: $(generateName)
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
configMapGenerator:
- behavior: unspecified
  env: overlays/application/params.env
  name: jupyter-web-app-parameters
vars:
- fieldref:
    fieldPath: data.generateName
  name: generateName
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: jupyter-web-app-parameters
configurations:
- overlays/application/params.yaml

and it would be replaced with a kustomization.yaml that we check in rather than one that kfctl generates, looking like:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
- base
- overlays/istio
- overlays/application
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: jupyter-web-app@sha256:d19baa2d5404b889075a1f1d6d40d0b143828a72d25a7750f076979562c9a6f0
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6

Note that the instance name doesn't come from a kustomize var; it is added at the same time the image tag is added in jupyter-web-app/base/kustomization.yaml. The commonLabels would also be removed from jupyter-web-app/overlays/application/kustomization.yaml (see below).

If we needed to use a generator plugin for parameter overrides of the defaults, then the kustomization.yaml we check in would be:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
- base
commonLabels:
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/instance: jupyter-web-app@sha256:d19baa2d5404b889075a1f1d6d40d0b143828a72d25a7750f076979562c9a6f0
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6
generators:
- github.com/kubeflow/manifests//kfdef/kfctl_gcp_iap.yaml?ref=master

Note that the overlays are removed and a generator section is added. The plugin would get the overlays and parameter default overrides from kfctl_gcp_iap.yaml.

The kustomization.yaml under jupyter-web-app/overlays/application is also simplified and goes from

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
resources:
- application.yaml
configMapGenerator:
- name: jupyter-web-app-parameters
  env: params.env
vars:
- name: generateName
  objref:
    kind: ConfigMap
    name: jupyter-web-app-parameters
    apiVersion: v1
  fieldref:
    fieldpath: data.generateName
configurations:
- params.yaml
commonLabels:
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/instance: $(generateName)
  app.kubernetes.io/managed-by: kfctl
  app.kubernetes.io/component: jupyter
  app.kubernetes.io/part-of: kubeflow
  app.kubernetes.io/version: v0.6

to

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- application.yaml

Going this route has the following advantages:

  • we're now using vanilla kustomize; any extended functionality we need builds on what kustomize recommends and provides
  • we use GitOps for instance-label creation
  • individual apps can now be deployed as-is using kustomize build | kubectl apply -f - or kubectl apply -k .


@jlewi
Copy link
Contributor

jlewi commented Oct 6, 2019

Thanks @kkasravi. Could you explain, though, why we are setting instanceName using generateName, which is being set in the kfdefs.yaml file?

Why don't we just set instanceName in the kustomization file?

For example, the jupyter-web-app already defines commonLabels here

So why can't we just change that to

commonLabels:
  app: jupyter-web-app
  kustomize.component: jupyter-web-app
  app.kubernetes.io/instance: jupyter-web-app-vX-Y-Z

Contributor Author

@kkasravi kkasravi left a comment


The instance name should also be the name of the Application CR; this is how the uber application CR collects the nested Application CRs. If we set the instance name in manifests/jupyter/jupyter-web-app/base/kustomization.yaml, we would also need to set it in application.yaml. If we don't do it via CI/CD, then the application owner could set it in the commonLabels and in application.yaml's metadata.name.


app.kubernetes.io/component: jupyter
app.kubernetes.io/part-of: kubeflow
app.kubernetes.io/version: v0.6
app.kubernetes.io/instance: $(generateName)
Contributor


@kkasravi What's the reasoning for introducing generateName here? The selector was already matching a set of labels that was already being consistently applied across all the components via kustomization.yaml commonLabels.

Why are we parameterizing this using generateName?

commonLabels:
app.kubernetes.io/name: jupyter-web-app
app.kubernetes.io/instance: jupyter-web-app
app.kubernetes.io/instance: $(generateName)
Contributor


Why are we parameterizing the instance name rather than just explicitly coding it here in common labels?

@@ -1,16 +1,11 @@
apiVersion: app.k8s.io/v1beta1
kind: Application
metadata:
name: jupyter-web-app
name: $(generateName)
Contributor


Why are we parameterizing the name rather than just explicitly setting it here? What was wrong with just hardcoding the name here?

@jlewi
Contributor

jlewi commented Oct 6, 2019

I'm still not sure I fully follow here.

Let's start with the KFDef files. Why is KFDef being used to set the instance label by adding a stanza like this, for every application, to every KFDef file?

      parameters:
      - name: generateName
        value: notebook-controller-

This stanza doesn't look like it's conveying any information that the application shouldn't already know; i.e., why does the notebook controller need the KFDef file to set a parameter in order to know that its name is "notebook-controller"?

Related to that, why are we generating random suffixes to use as the instance label? For example, why would the instance label for centraldashboard be something like centraldashboard-akdfjls, where the suffix is a randomly generated value, as opposed to something like centraldashboard-vX-Y-Z that is explicitly set in the kustomize manifests for centraldashboard?

the instance name should also be the name of the application CR, this is how the uber application CR collects the nested application CRs.

The labels and selector just need to be set consistently. So can't we just do this in the kustomize manifests? IIUC for every application we would just need to set two places consistently

  1. We would have to set common labels in kustomization.yaml
  2. We would have to set an appropriate label selector in the Application CR

It looks like that was already happening e.g. in jupyter_web_app. See my inline comments on jupyter_web_app.
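The two places above can be sketched as a pair of fragments that must agree; the names and version are illustrative, not the repo's actual values:

```yaml
# kustomization.yaml (hypothetical values)
commonLabels:
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/instance: jupyter-web-app-v0-7-0
---
# application.yaml: the Application CR's selector matches the same labels
apiVersion: app.k8s.io/v1beta1
kind: Application
metadata:
  name: jupyter-web-app-v0-7-0
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: jupyter-web-app
      app.kubernetes.io/instance: jupyter-web-app-v0-7-0
```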

Contributor Author

@kkasravi kkasravi left a comment


The labels and selector just need to be set consistently. So can't we just do this in the kustomize manifests? IIUC for every application we would just need to set two places consistently

If we use centraldashboard-v0.7.0 (would it use the same label as kubeflow?), doesn't that imply we can only have one instance of centraldashboard on the cluster for that version? But we currently don't fail if we redeploy Kubeflow to a different namespace in the same cluster, even when we're redeploying centraldashboard ClusterRoleBindings that already exist with the same name and instance label. Wouldn't it mean that deleting Kubeflow could only delete the ClusterRoleBinding for centraldashboard if it found no centraldashboard resources across all namespaces? And if we're upgrading to centraldashboard-v0.7.1, we can't patch any resource with a selector (Application, Deployment, StatefulSet, Service) because selectors are immutable.

Reviewable status: 0 of 291 files reviewed, 3 unresolved discussions (waiting on @jlewi and @zhenghuiwang)


jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Why are we parameterizing the name rather than just explicitly setting it here? What was wrong with just hardcoding the name here?

I believe we can just use a name here, and not generateName, if we have a way of knowing when we can delete cluster-level resources. BTW, generateName is generated once and referenced from multiple resources, so if we generated centraldashboard-12adec, that name would be used as the instance label in all resources, including cluster-level resources for centraldashboard (ServiceAccount, Role, ClusterRole, RoleBinding, ClusterRoleBinding, ConfigMap, Service, Deployment, VirtualService).


jupyter/jupyter-web-app/overlays/application/application.yaml, line 8 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

@kkasravi What's the reasoning for introducing generateName here? The selector was already matching a set of labels that was already being consistently applied across all the components via kustomization.yaml commonLabels.

Why are we parameterizing this using generateName?

I believe we can if we can determine whether we can delete cluster level resources


jupyter/jupyter-web-app/overlays/application/kustomization.yaml, line 22 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Why are we parameterizing the instance name rather than just explicitly coding it here in common labels?

See earlier comments about cluster level resources and constraints about only deploying (in this case) jupyter-web-app once

@kkasravi kkasravi requested a review from jlewi October 7, 2019 14:27
Contributor Author

@kkasravi kkasravi left a comment


The instance name (and generateName) came from

The name of an application and the instance name are recorded separately. For example, WordPress has a app.kubernetes.io/name of wordpress while it has an instance name, represented as app.kubernetes.io/instance with a value of wordpress-abcxzy. This enables the application and instance of the application to be identifiable. Every instance of an application must have a unique name.

which is why I made the argument above that knowing whether we should delete centraldashboard's ClusterRoleBinding depends on knowing that centraldashboard's namespaced resources are also being deleted.


@jlewi
Contributor

jlewi commented Oct 7, 2019

Kubeflow only currently supports one instance of each Kubeflow application per cluster.

To support more than one instance of an application per cluster we would have to solve all kinds of other problems.

For example, if you install multiple versions of a CR controller in a cluster, how do you decide which resources get handled by which controller?

So if we assume that we only need to support one instance per cluster; does that allow us to simplify things?

For example, do we need to use random salt for the instance names?

The only use case we have right now for multiple instances per cluster would be when we are upgrading Kubeflow and we have a 0.X and 0.Y instance both installed for a brief period of time before cleaning up 0.X.

Contributor

@jlewi jlewi left a comment




jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):

Previously, kkasravi (Kam Kasravi) wrote…

I believe we can just use a name here, and not generateName, if we have a way of knowing when we can delete cluster-level resources. BTW, generateName is generated once and referenced from multiple resources, so if we generated centraldashboard-12adec, that name would be used as the instance label in all resources, including cluster-level resources for centraldashboard (ServiceAccount, Role, ClusterRole, RoleBinding, ClusterRoleBinding, ConfigMap, Service, Deployment, VirtualService).

Per comment on the main thread.
One instance of Kubeflow per cluster is the only thing we really support right now.

So if we delete the Application resource that means delete centraldashboard and we can cleanup any associated cluster resources.

Contributor Author

@kkasravi kkasravi left a comment


So if we assume that we only need to support one instance per cluster; does that allow us to simplify things?

yes completely.



jupyter/jupyter-web-app/overlays/application/application.yaml, line 4 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Per comment on the main thread.
One instance of Kubeflow per cluster is the only thing we really support right now.

So if we delete the Application resource that means delete centraldashboard and we can cleanup any associated cluster resources.

got it

@kkasravi
Contributor Author

kkasravi commented Oct 7, 2019

closing this PR - reopening with just one application (jupyter-web-app)

@kkasravi kkasravi closed this Oct 7, 2019
@kkasravi kkasravi deleted the add_application_overlays branch October 13, 2019 15:21