Save Suggestion state in persistent volume #1250
Comments
Why use pickling to store data as opposed to YAML? If you use PDs, are you going to have to manage disks? Could you persist to an object store or some other easily managed datastore? Would it make sense to treat this data as metadata and store it in the metadata store?
Wrt 1, we will need to have consistent naming with other values. The rest looks good to me.
Thanks for the comment @jlewi.
A Suggestion is just a Kubernetes deployment running a script for an HP or NAS algorithm that produces new Trial parameters (in the case of HP tuning, hyperparameters) from the search space. Currently, when a user submits an Experiment, the controller creates this Suggestion deployment. When the Experiment is finished, the Suggestion deployment can be deleted or kept always running (if the user wants to resume the Experiment later). I think one of the mechanisms to save the Suggestion Python script's state can be pickling the executable class.
I am not sure that we want to manage them, because we should not be specific to GCP. The question is: what should be the default structure for the StorageClass and PVCs?
Do you have any ideas what it could be for Kubernetes?
Can we save serialized objects to the metadata store?
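For concreteness, a minimal sketch of what a default StorageClass could look like; the name and the no-provisioner choice are illustrative assumptions, not a decided design:

```yaml
# Hypothetical default StorageClass for Suggestion state.
# Which provisioner to use is exactly the open question here:
# local volumes cannot be dynamically provisioned upstream.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: katib-suggestion            # hypothetical name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```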
Why is the experiment manager managing internal storage of individual HP algorithms? Why not adopt a microservice architecture? What if different algorithms require different types of internal storage? E.g., suppose one algorithm needs to store a couple of meta-parameters and a YAML file works well, vs. another algorithm which needs to store time series using a time-series database.

Why can't HP tuner algorithm authors configure their own storage backend? E.g., the algorithm author provides a kustomize package to deploy their algorithm, and this is parameterized depending on the storage they accept: PVC, S3/GCS URL, SQL database, etc.

If you don't want to waste resources when the service isn't being used, can't we use autoscaling for that? E.g., deploy the suggestion service server for that algorithm using Knative?

This isn't specific to GCP. PVCs imply volumes, which in many cases map to some form of "disk" as opposed to a network or cloud filesystem.
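Not a Katib manifest, just a rough sketch of the Knative idea: a suggestion service deployed as a Knative Service can scale to zero when idle. The name and image are hypothetical, and port 6789 is assumed to be the suggestion gRPC port.

```yaml
# Rough sketch (not part of Katib) of deploying a suggestion
# service with Knative Serving so it scales to zero when idle.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: suggestion-grid            # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # allow scale-to-zero
    spec:
      containers:
      - image: docker.io/example/suggestion-grid  # hypothetical image
        ports:
        - name: h2c            # gRPC needs HTTP/2
          containerPort: 6789  # assumed suggestion gRPC port
```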
This is exactly what we want to do. The remaining question with PVCs is: what should the default technique be if the user doesn't want to make any changes in the configuration, but wants to use the Resume Experiment feature?
Please, can you show some examples from Knative projects where they use this?
@andreyvelich I would suggest talking to the KFServing folks to better understand Knative autoscaling.
@andreyvelich Is katib-config providing the config for the suggestion microservices? Did you consider a microservice architecture? E.g., for each suggestion service, have a set of YAML files describing its configuration (e.g. Deployment, ConfigMap, PVCs if needed). The other parts of Katib, e.g. katib-config, could then just take the URL of the suggestion endpoint.
Yes, it was created to give the user additional control over the Suggestion service/deployment: https://www.kubeflow.org/docs/components/hyperparameter-tuning/katib-config/#suggestion-settings.
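For reference, the suggestion settings from the linked docs look roughly like this; the algorithm name, image, and service account values here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: katib-config
  namespace: kubeflow
data:
  suggestion: |-
    {
      "random": {
        "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
        "serviceAccountName": "random-suggestion-sa"
      }
    }
```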
E.g., suppose I have two suggestion services, say NAS and GridSearch. Each of these may have different backend requirements. Let's suppose NAS uses an SQL DB and GridSearch uses an object store. So each of them would have YAML manifests for the K8s resources (Deployment, Service, etc.) that they need. The operator would just customize them as needed.
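A rough sketch of what one such per-algorithm package could look like; the file names and the SQL/object-store split are hypothetical:

```yaml
# suggestion-nas/kustomization.yaml -- hypothetical package for a
# NAS suggestion service that brings its own SQL backend config.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml    # the suggestion gRPC server
- service.yaml       # endpoint Katib would call
- sql-secret.yaml    # backend-specific: SQL DB credentials
# A GridSearch package would instead list e.g. an object-store
# secret, with no change needed on the Katib side.
```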
@jlewi We are not giving the user functionality to define the whole YAML manifest for the Suggestion resource, to make it very easy to submit a Katib Experiment and get results. The controller creates the k8s deployment and service automatically for the Suggestion. For users that would like to modify the default Suggestion installation, we provide a few settings, e.g. the service account name. My thought is to add another setting that represents a different volume technique. As I said, we can start with various StorageClass provisioners.
A few thoughts after the discussion at the Katib meeting:
If the user wants to use this feature, the controller creates a PV and PVC and binds them to the Experiment's Suggestion deployment. Later, we can add functionality where the user can specify a StorageClass name in the Katib config for the Suggestion, because some Kubernetes clusters don't support creating PVs manually. What do you think @gaocegege @johnugeorge @jlewi ?
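A minimal sketch of the pair the controller could create per Experiment; the names, hostPath, and size are placeholders, not a decided layout:

```yaml
# Hypothetical PV/PVC pair created by the controller for the
# Suggestion of an Experiment named "random-example".
apiVersion: v1
kind: PersistentVolume
metadata:
  name: random-example-suggestion-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  hostPath:
    path: /tmp/katib/suggestion/random-example   # placeholder path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: random-example-suggestion
  namespace: kubeflow
spec:
  storageClassName: ""                 # bind statically to the PV above
  volumeName: random-example-suggestion-pv
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```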
LGTM
/assign @andreyvelich
/kind feature
To continue the idea proposed here: #1062.
We would like to attach a persistent volume to every deployed Suggestion, to save the Suggestion state after the corresponding pod is deleted.
To implement this, we should follow these steps (a sketch of the resulting user-facing API follows the list):

1. Extend ResumePolicyType with the new type. My idea is to name it VolumeSource; any other ideas?
2. Add a new StorageClass YAML to the Katib deployed manifests. I am not sure what the default provisioner should be for us, since k8s doesn't support dynamic volume provisioning for local storage (https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/#limitations-of-ga). We could be specific to GKE and use Persistent Disks (https://kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd), or use the 3rd-party local path provisioner (https://github.com/rancher/local-path-provisioner), which requires an additional controller. What do you think?
3. Implement the new logic in the controller.
4. Extend katib-config with the new parameters for the Suggestion PVC.

What do you think @gaocegege @johnugeorge ?
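To make step 1 concrete, the user-facing API could look roughly like this, assuming the VolumeSource name proposed above and the v1beta1 Experiment API; nothing here is merged yet:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-example
spec:
  resumePolicy: VolumeSource   # proposed new ResumePolicyType value
  algorithm:
    algorithmName: random
  # objective, parameters, and trialTemplate as usual ...
```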
/cc @sperlingxx @c-bata @jlewi
/priority p0