Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(components): Add v1beta1 Katib launcher and samples #4798

Merged
merged 12 commits into from
Dec 15, 2020

Conversation

andreyvelich
Copy link
Member

@andreyvelich andreyvelich commented Nov 21, 2020

Description of your changes:
Fixes: kubeflow/katib#1397.
Related: kubeflow/katib#1369.

I added Katib v1beta1 launcher and 2 samples under samples/contrib/kubeflow-katib dir:

  • Run Pipeline from the Jupyter Notebook using default component.yaml.
  • Compile Pipeline from the Python script and directly send parameters to the Katib launcher.

I use Base64Pickle type to send Experiment specification. I assume we can specify this type.
In that case, user can directly define Katib Experiment Spec using kubeflow-katib SDK which helps to avoid errors with invalid parameters.
What do you think ?

I have already pushed the new launcher to the registry: docker.io/kubeflowkatib/kubeflow-pipelines-launcher and tested 2 examples.

/assign @Bobgy @Ark-kun
/cc @gaocegege @johnugeorge @jlewi @Tomcli @fvde

Checklist:

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@k8s-ci-robot
Copy link
Contributor

@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: fvde.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes:
Fixes: kubeflow/katib#1397.
Related: kubeflow/katib#1369.

I added Katib v1beta1 launcher and 2 samples under samples/contrib/kubeflow-katib dir:

  • Run Pipeline from the Jupyter Notebook.
  • Compile Pipeline from the Python script.

I use Base64Pickle type to send Experiment specification. I assume we can specify this type.
In that case, user can directly define Katib Experiment Spec using kubeflow-katib SDK which helps to avoid errors with invalid parameters.
What do you think ?

I have already pushed the new launcher to the registry: docker.io/kubeflowkatib/kubeflow-pipelines-launcher and tested 2 examples.

/assign @Bobgy @Ark-kun
/cc @gaocegege @johnugeorge @jlewi @Tomcli @fvde

Checklist:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

- {name: Delete Finished Experiment, type: Bool, default: 'True', description: 'Whether to delete the experiment after it is finished.'}
- {name: Experiment Name, type: String, default: '', description: 'Experiment name'}
- {name: Experiment Namespace, type: String, default: anonymous, description: 'Experiment namespace'}
- {name: Experiment Spec, type: Base64Pickle, default: '', description: 'Experiment specification to convert in base64 pickle format'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use JsonObject type instead of Base64Pickle? Pickle is unfriendly to non-python components.

Copy link
Member Author

@andreyvelich andreyvelich Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ark-kun Is it possible to convert V1beta1ExperimentSpec Object to JsonObject without losing nested class objects ?
When I use JsonObject and convert V1beta1ExperimentSpec to the dict, I can't restore the object structure in the Launcher.
I have to manually convert dict to the V1beta1ExperimentSpec object structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, Kubernetes is very YAML-oriented and should support JSON. Perhaps there is some problem with conversion to a dictionary. Can you please check the naming style. Maybe the JSON uses snake_case instead of PascalCase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was one of the problem with converting objects.
Currently, I am using Katib SDK with sanitize_for_serialization to serialise ExperimentSpec object to JSON and deserialize to de-serialise JSON to ExperimentSpec object and submit Experiment.

outputs:
- {name: Best Parameter Set, type: JSON, description: 'The parameter set of the best Experiment trial.'}
- {name: Best Parameter Set, type: JSON, description: 'The hyperparameter set of the best Experiment Trial'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JsonObject maybe?

@Ark-kun
Copy link
Contributor

Ark-kun commented Nov 21, 2020

For long-term it might be great to refactor Katib to expose a trial suggestion library. This way we can avoid persistent mutable global state which will inevitable lead to problems.
See the design and comment in

# This pipeline demonstrates hyper-parameter optimization.

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@andreyvelich
Copy link
Member Author

For long-term it might be great to refactor Katib to expose a trial suggestion library. This way we can avoid persistent mutable global state which will inevitable lead to problems.
See the design and comment in

# This pipeline demonstrates hyper-parameter optimization.

@Ark-kun Does it mean that we want to provide user not only the best HP but also other parameters ?
For example, intermediate metrics, suggested parameters, Trials results.

Currently, Experiment lives as long as user doesn't delete it manually. Which means, user can resume it, etc.

@Bobgy
Copy link
Contributor

Bobgy commented Nov 24, 2020

also
/cc @chensun

@andreyvelich
Copy link
Member Author

@Ark-kun I refactored Katib component Launcher to be able to process JsonObject in ExperimentSpec, can you take a look please ?

Also cc @terrytangyuan for the MPIJob example.

# Katib launcher component.
# We convert Experiment spec to JSON dict.
# The Experiment is deleted after the Pipeline is finished.
op = dsl.ContainerOp(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We suggest not to construct a ContainerOp instances directly. In fact, you should see a warning if you use it in this way. Can you please use an alternative?

"Please create reusable components instead of constructing ContainerOp instances directly."
" Reusable components are shareable, portable and have compatibility and support guarantees."
" Please see the documentation: https://www.kubeflow.org/docs/pipelines/sdk/component-development/#writing-your-component-definition-file"
" The components can be created manually (or, in case of python, using kfp.components.create_component_from_func or func_to_container_op)"
" and then loaded using kfp.components.load_component_from_file, load_component_from_uri or load_component_from_text: "
"https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html#kfp.components.load_component_from_file",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @chensun.
I will follow the same way as in the Notebook example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! @andreyvelich The MPI example looks great.

@andreyvelich
Copy link
Member Author

/hold
Hold until we merge this PR: kubeflow/katib#1396. Check comment: kubeflow/katib#1396 (comment).

I'll try to use built-in functions to serialise and de-serialise ExperimentSpec object instead of writing the custom converter.

@andreyvelich
Copy link
Member Author

/hold cancel
The kubeflow-katib 0.10.1 SDK version was updated.
This PR should be ready. Please take a look @Ark-kun @Bobgy @chensun.

/cc @gaocegege @johnugeorge

experiment_name = experiment.metadata.name
experiment_namespace = experiment.metadata.namespace
while True:
try:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
current_status = None
try:

This could be one way to prevent that current_status is undefined in case an exception is thrown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, thanks @thomas-mueller!

@Ark-kun
Copy link
Contributor

Ark-kun commented Dec 8, 2020

@Ark-kun Does it mean that we want to provide user not only the best HP but also other parameters ?
For example, intermediate metrics, suggested parameters, Trials results.

My idea is to have a "pure function" version of Katib which does not depend on mutable external state (Experiment resource). So, it would be great to have a function suggest_trials(parameter_specs, metric_specs, measurements_for_old_trials) -> List[Trial].

Currently, Experiment lives as long as user doesn't delete it manually. Which means, user can resume it, etc.

There are inherent issues with having mutable global state, global variables, etc. In KFP this becomes even more important since there is caching. The components are expected to be pure and produce the same results giving same input arguments.

@@ -1,5 +1,5 @@
#!/bin/bash -e
# Copyright 2019 Google LLC
# Copyright 2020 kubeflow.org.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps,

# Copyright 2020 The Kubeflow Authors

or

# Copyright 2020 Kubeflow contributors

@andreyvelich
Copy link
Member Author

@Ark-kun Does it mean that we want to provide user not only the best HP but also other parameters ?
For example, intermediate metrics, suggested parameters, Trials results.

My idea is to have a "pure function" version of Katib which does not depend on mutable external state (Experiment resource). So, it would be great to have a function suggest_trials(parameter_specs, metric_specs, measurements_for_old_trials) -> List[Trial].

Currently, Experiment lives as long as user doesn't delete it manually. Which means, user can resume it, etc.

There are inherent issues with having mutable global state, global variables, etc. In KFP this becomes even more important since there is caching. The components are expected to be pure and produce the same results giving same input arguments.

That's a good idea.
One problem that I can see, which controller takes care about Trial concurrency, max Trial List, early stopped Trials, etc ?

Currently, Trials CR and Suggestion CR can't work without Experiment CR. Experiment CR asks for the requestNumber of Trials and Suggestion produces corresponding Trials assignments.
Then, Experiment controller creates appropriate Trials and Trial controller waits for the successful Trial's worker object status.

WDYT @gaocegege @johnugeorge ?

@andreyvelich
Copy link
Member Author

@Ark-kun @Bobgy @chensun If you don't have any other comments on this PR should we merge it to move forward ?

@Bobgy
Copy link
Contributor

Bobgy commented Dec 15, 2020

I'll leave it to @chensun and @Ark-kun to make the decision.

@Ark-kun
Copy link
Contributor

Ark-kun commented Dec 15, 2020

Thank you for your contribution and your patience. There was one unresolved review comment from me, but we can fix it afterwards.
/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support v1beta1 API on KFP component
8 participants