Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a Tensorboard component for local environments #792

Closed
wants to merge 2 commits into from

Conversation

swiftdiaries
Copy link
Member

@swiftdiaries swiftdiaries commented Feb 7, 2019

Right now, the Output Artifacts Viewer (Pipelines UI) only accepts GCS and Minio as the storage services Source Code.

However, Tensorboard cannot process links starting with minio:// and as of now there's no way to read things off of the local filesystem through volumes and/or volume mounts.

This is a component that we have been using for our onprem demos, which we are working on open-sourcing.


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: ironpan

If they are not already assigned, you can assign the PR to them by writing /assign @ironpan in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: ironpan

If they are not already assigned, you can assign the PR to them by writing /assign @ironpan in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@swiftdiaries
Copy link
Member Author

/assign @IronPan

Copy link
Contributor

@animeshsingh animeshsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add S3 compatible support directly as part of this, or we rely on minio only for that?

Copy link
Contributor

@animeshsingh animeshsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @gaoning777, @hongye-sun, @IronPan, @swiftdiaries, and @vicaire)

@vicaire
Copy link
Contributor

vicaire commented Feb 12, 2019

@neuromage added a CRD to start and manage instances of Tensorboard. The advantage of this CRD is that it manages the tensorboard instances and keeps track of them.

https://github.com/kubeflow/pipelines/tree/master/backend/src/crd/samples/viewer

Could we add the functionality of this PR to this CRD? It should then be easy to create an instance of Tensorboard by using kubectl. Doing this will also put us on the right track to integrate this functionality with the UI.

/cc @neuromage

@swiftdiaries
Copy link
Member Author

swiftdiaries commented Feb 12, 2019

@animeshsingh Thank you for the review :)
There's a set of pending PRs for adding S3 support.
https://github.com/kubeflow/pipelines/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+s3
This would enable any S3 compatible Object Stores including Minio.

@swiftdiaries
Copy link
Member Author

@vicaire @neuromage Thank you for the reply. The ViewerCRD looks pretty neat. How does Pipelines run / orchestrate the ViewerCRD and how can it be attached to a training ContainerOp / TfJob op for example?

@vicaire
Copy link
Contributor

vicaire commented Feb 13, 2019

/cc @neuromage

@swiftdiaries, the CRD does not yet support mounting a volume. Would you be interested in adding support for this?

The way it works today is like this:

  1. The pipeline runs.
  2. One of the steps outputs Tensorboard data in an object store (GCS, S3, etc.)
  3. From the UI (clicking the pipeline step that generated the tensorboard data) or from kubectl, a Tensorboard instance is created (using the viewer CRD) that gets an argument specifying where to look for the data (object store only is supported today. Adding volume support would be great).

WDYT?

@swiftdiaries
Copy link
Member Author

@vicaire The Viewer CRD supports the addition of local volumes, I created a PR with a sample here
The frontend UI currently doesn't support reading off of local volumes, as a next step I'm looking to add that feature

@neuromage
Copy link
Contributor

Yeah, the viewer CRD supports volumes. I tested this but should have documented and added a sample for how to do it. Thanks @swiftdiaries for looking into it and adding the sample.

The next step is to replace the Tensorboard launcher in the UI to use the viewer CRD instead. If you'd like to look into that @swiftdiaries , that would be great.

/cc @yebrahim
/cc @rileyjbauer

@swiftdiaries
Copy link
Member Author

swiftdiaries commented Feb 14, 2019 via email

@vicaire
Copy link
Contributor

vicaire commented Feb 14, 2019

SGTM. Thanks. Should we close this PR?

@swiftdiaries
Copy link
Member Author

Closing because it's being handled via the Viewer CRD
PR for reference: #822
Issue for reference: #699

@swiftdiaries swiftdiaries deleted the tensorboard_local branch February 15, 2019 11:41
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants