feat: Add HyperBand #787

Merged
merged 20 commits into from
Sep 25, 2019
Conversation

gaocegege
Member

@gaocegege gaocegege commented Sep 22, 2019

Signed-off-by: Ce Gao <gaoce@caicloud.io>

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:



@gaocegege
Member Author

/hold

@gaocegege
Member Author

/hold cancel

@gaocegege
Member Author

/retest

@gaocegege
Member Author

/retest

@gaocegege
Member Author

/retest

@johnugeorge
Member

@gaocegege this PR also needs a rebase

@gaocegege
Member Author

Gotcha

@@ -19,7 +19,7 @@ spec:
     - name: "eta"
       value: "3"
     - name: "r_l"
-      value: "9"
+      value: "2"
Member

Is this needed?

Member Author

Yeah. Otherwise validation will fail, because the hyperband algorithm validates parallelTrialCount against r_l.

Member

How was it working in v1alpha2?

Member Author

In the e2e test, we set the parallel trial count to 2, so we would get the error that the parallel trial count is less than 9 because of r_l. In v1alpha2 we do not have an e2e test for hyperband, so we did not hit this problem.

Member

Got it. Thanks for the explanation.
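
For readers following this thread, here is a rough Go sketch of the kind of check being discussed. It is not the code from this PR: the helper names minParallelTrials and validateParallelTrialCount are hypothetical, and the rule it applies (the required parallelism is the largest power of eta that does not exceed r_l) is an assumption chosen only because it reproduces the "less than 9" error described above for eta = 3 and r_l = 9.

```go
package main

import (
	"errors"
	"fmt"
)

// minParallelTrials returns the largest power of eta that does not exceed rL.
// ASSUMPTION: this is an illustrative bound, not the rule taken from the
// hyperband suggestion service itself.
func minParallelTrials(eta, rL int) (int, error) {
	if eta < 2 || rL < 1 {
		return 0, errors.New("eta must be >= 2 and r_l must be >= 1")
	}
	p := 1
	for p*eta <= rL {
		p *= eta
	}
	return p, nil
}

// validateParallelTrialCount (hypothetical) rejects a parallel trial count
// that is smaller than the bound derived from eta and r_l.
func validateParallelTrialCount(parallel, eta, rL int) error {
	need, err := minParallelTrials(eta, rL)
	if err != nil {
		return err
	}
	if parallel < need {
		return fmt.Errorf("parallelTrialCount %d is less than %d", parallel, need)
	}
	return nil
}

func main() {
	// r_l = 9 with the e2e parallel trial count of 2: rejected under this sketch.
	fmt.Println(validateParallelTrialCount(2, 3, 9))
	// r_l = 2, as in the updated example: accepted under this sketch.
	fmt.Println(validateParallelTrialCount(2, 3, 2))
}
```

Under these assumptions, lowering r_l to 2 is what lets the e2e setting of two parallel trials pass validation.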

Signed-off-by: Ce Gao <gaoce@caicloud.io>
// Algorithm settings in suggestion will overwrite the settings in experiment.
filledE := e.DeepCopy()
appendAlgorithmSettingsFromSuggestion(filledE, instance.Spec.AlgorithmSpec)
experiment := g.ConvertExperiment(e)
Member

s/g.ConvertExperiment(e)/g.ConvertExperiment(filledE)

Member Author

Yeah, good find. I will update it once CI has finished; I want to see the CI result first.

Member

@hougangliu hougangliu Sep 25, 2019

How about not introducing filledE, and instead changing g.ConvertExperiment(e) to g.ConvertExperiment(e, instance.Spec.AlgorithmSpec)?

Member Author

I think the code here is more readable as it is. Otherwise we would have to update convertAlgortihmSettings, which I think would be hard to read.

Member

ok for me
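
The thread above turns on one detail: the settings are merged into a copy (filledE), so the copy, not the original, has to be passed on. A minimal Go sketch of that pattern follows. The AlgorithmSetting type and the mergeSettings helper are hypothetical stand-ins, not the types or the appendAlgorithmSettingsFromSuggestion function from this PR, and the setting name hypothetical_state is made up for illustration.

```go
package main

import "fmt"

// AlgorithmSetting is a simplified, hypothetical stand-in for a
// name/value algorithm setting.
type AlgorithmSetting struct {
	Name  string
	Value string
}

// mergeSettings returns a new slice in which each suggestion setting
// overwrites the experiment setting with the same name; settings with
// new names are appended. The experiment slice itself is left untouched.
func mergeSettings(experiment, suggestion []AlgorithmSetting) []AlgorithmSetting {
	merged := make([]AlgorithmSetting, len(experiment))
	copy(merged, experiment)
	for _, s := range suggestion {
		replaced := false
		for i := range merged {
			if merged[i].Name == s.Name {
				merged[i].Value = s.Value
				replaced = true
				break
			}
		}
		if !replaced {
			merged = append(merged, s)
		}
	}
	return merged
}

func main() {
	experiment := []AlgorithmSetting{{Name: "eta", Value: "3"}, {Name: "r_l", Value: "9"}}
	suggestion := []AlgorithmSetting{{Name: "r_l", Value: "2"}, {Name: "hypothetical_state", Value: "1"}}

	merged := mergeSettings(experiment, suggestion)
	fmt.Println("merged:    ", merged)
	// The original still holds the old values, so passing it on instead of
	// the merged copy would silently drop the suggestion's overrides.
	fmt.Println("experiment:", experiment)
}
```

Working on a copy keeps the experiment object unmodified, at the cost of making it easy to pass the wrong variable downstream, which is the slip the first review comment in this thread caught.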

Signed-off-by: Ce Gao <gaoce@caicloud.io>
@gaocegege
Member Author

/retest

@@ -83,10 +91,11 @@ func (g *General) SyncAssignments(
})
}

// TODO(gaocegege): Set algorithm settings
updateAlgorithmSettings(instance, response.Algorithm)
Member

Spec is not updated in Reconcile

Member Author

Yeah, gotcha

Signed-off-by: Ce Gao <gaoce@caicloud.io>
@gaocegege
Member Author

/hold

Signed-off-by: Ce Gao <gaoce@caicloud.io>
@gaocegege
Member Author

/hold cancel

@johnugeorge
Member

CI errored out because of the quota limit. We have to wait until the previous run completes.

@gaocegege
Member Author

Yeah, I saw it.

@gaocegege
Member Author

/retest

var paralleltrials int32 = 2
exp.Spec.MaxTrialCount = &maxtrials
exp.Spec.ParallelTrialCount = &paralleltrials
if exp.Spec.Algorithm.AlgorithmName != "hyperband" {
Member

We have to fix this check later.

@johnugeorge
Member

/lgtm

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit cc76656 into kubeflow:master Sep 25, 2019
@@ -41,7 +42,7 @@ spec:
- ftrl
- name: --num-epochs
parametertype: int
@wronk wronk Sep 25, 2019

@gaocegege, is the YAML parser case-sensitive? Here, parametertype doesn't have the T capitalized as it does elsewhere.

Member Author

Thanks for the find. I will have a look. Actually, @johnugeorge found that the e2e hyperband test finishes quickly. Not sure if that is caused by this problem.

5 participants