Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Pipeline leveraging Arena #1058

Merged
merged 43 commits into from
Apr 16, 2019

Conversation

cheyang
Copy link
Contributor

@cheyang cheyang commented Mar 28, 2019

The sample pipeline runs preparing data, source code, training and exporting a Tensorflow model with MNIST handwriting recognition using Arena. It provides arena_launcher and API to make the user easy to use pipelines to train the specific training, such as MPI Job, TensorFlow Estimator Job.


This change is Reviewable


# def DistributeTFOp(name, image, gpus: int, ):

class DistributeTFOp(dsl.ContainerOp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def arena_distribute_tf_op(....):
    return MPIOp(....
    )

Copy link
Contributor Author

@cheyang cheyang Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why some samples use RandomNumOp directly? I'm wondering the principles. Thanks for your advises? https://github.com/kubeflow/pipelines/blob/master/samples/basic/condition.py#L20

with dsl.Condition(flip.output == 'tails'):
    random_num_tail = RandomNumOp(10, 19)
    with dsl.Condition(random_num_tail.output > 15):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusing. Those samples are the oldest and have not been updated for a while.
Other samples are a bit more modern: https://github.com/kubeflow/pipelines/blob/master/samples/kubeflow-tf/kubeflow-training-classification.py

Your code (ineriting from ContainerOp vs returning ContainerOp) is not wrong. It's just outdated a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I will update with your suggestions.


# def arena_submit_standalone_job_op(name, image, gpus: int, ):

class MPIOp(dsl.ContainerOp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def arena_launch_mpi_op(....):
    return ContainerOp(....
    )

# def arena_submit_standalone_job_op(name, image, gpus: int, ):

class StandaloneOp(dsl.ContainerOp):
"""Submit standalone Job."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comprehensive docstring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will update.

@cheyang
Copy link
Contributor Author

cheyang commented Mar 30, 2019

@Ark-kun Please take a look again.

except:
experiment_id = client.create_experiment(EXPERIMENT_NAME).id
run = client.run_pipeline(experiment_id, RUN_ID, __file__ + '.tar.gz',
params={'learning_rate':learning_rate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, there is currently a bug/inconsistency that forces you to use - in the arguments here instead of _. Have you tried to run your sample?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I've tested it with pipeline in kubeflow 0.4.0, not with the latest pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already tested with kfp 0.1.4. Looks good.

name='pipeline to run jobs',
description='shows how to run pipeline jobs.'
)
def sample_pipeline(learning_rate=dsl.PipelineParam(name='learning_rate',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use learning_rate='0.01'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@cheyang
Copy link
Contributor Author

cheyang commented Apr 11, 2019

/retest

@cheyang
Copy link
Contributor Author

cheyang commented Apr 11, 2019

@Ark-kun , please take a look again when you have time. Thanks.

@googlebot
Copy link
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@cheyang cheyang force-pushed the integrate_arena_into_pipelines branch from afa1800 to 446b20f Compare April 12, 2019 13:59
fix the length of name

fix the length of name

change the default image
@cheyang cheyang force-pushed the integrate_arena_into_pipelines branch from 446b20f to b83336e Compare April 12, 2019 14:01
@googlebot
Copy link
Collaborator

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@cheyang
Copy link
Contributor Author

cheyang commented Apr 15, 2019

/ping @Ark-kun

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 16, 2019

/lgtm

@Ark-kun
Copy link
Contributor

Ark-kun commented Apr 16, 2019

Sorry for a delayed response. I wanted to make sure that this sample follows our directory organization for contributed samples.

I'm a bit concerned about the Arena sample being the first one the user sees when they enter the samples directory. I feel that the sample is quiet specialized and might not be the first sample that the user should see.

We can probably fix that later so I do not see a reason for holding this PR any longer.
/cc @gaoning777 @hongye-sun @vicaire @IronPan
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 6806f83 into kubeflow:master Apr 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants