Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix download_via_gsutil and rsync workdir commands #5

Merged
merged 1 commit into from
Oct 29, 2021

Conversation

gmittal
Copy link
Collaborator

@gmittal gmittal commented Oct 29, 2021

rsync needs a trailing slash to have the intended behavior the following commands are expecting. Also the target_dir needs to be created before we copy stuff to it with gsutil.

@gmittal gmittal changed the title Fix mkdir and workdir commands Fix download_via_gsutil and rsync workdir commands Oct 29, 2021
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a question. OK either way as long as we document the behavior in comment somewhere.

@@ -206,7 +206,7 @@ def execute(dag: sky.Dag, dryrun: bool = False, teardown: bool = False):
if task.workdir is not None:
runner.add_step(
'sync', 'Sync files',
f'ray rsync_up {cluster_config_file} {task.workdir} {SKY_REMOTE_WORKDIR}'
f'ray rsync_up {cluster_config_file} {task.workdir}/ {SKY_REMOTE_WORKDIR}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume this change isn't made. If my local workdir has /path/a/b,

  • If we specify workdir as /path, then what do we see on remote? Expected /tmp/workdir/a/b.
  • If we specify workdir as /path/, then what do we see on remote? Expected, arguably, /tmp/workdir/a/b.

How would adding this change affect things?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do ~/Downloads/tpu, then /tmp/workdir/tpu. But the downstream training command that gets launched does cd /tmp/workdir && python -u models/... (which won't work since models is actually tpu/models).

It seems reasonable that remote workdir tree should be identical to the local workdir tree? Instead of having an additional layer with the name of the local working directory (tpu).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Zongheng, doesn't change anything. If you do ray attach config.yml and go to /tmp/workdir, models will be there, not tpu.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other wise, LGTM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... that is strange because it was failing for me with my fresh AWS setup without the trailing slash.

@concretevitamin
Copy link
Member

concretevitamin commented Oct 29, 2021 via email

@gmittal
Copy link
Collaborator Author

gmittal commented Oct 29, 2021

I'm on ray==1.7.0. I think we if we can agree on whether we want the the remote workdir tree to be the same as the local workdir then we can keep the fix as is in this PR.

@gmittal gmittal merged commit 36214bf into master Oct 29, 2021
@gmittal gmittal deleted the user/gmittal/fix-mkdir branch October 29, 2021 16:13
gmittal added a commit that referenced this pull request Mar 15, 2022
Fix download_via_gsutil and rsync workdir commands
JGoo1 referenced this pull request in skypilot-sds/skypilot Apr 28, 2023
Michaelvll added a commit that referenced this pull request Jun 11, 2024
…uest cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes
Michaelvll added a commit that referenced this pull request Jun 11, 2024
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes

---------

Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Michaelvll added a commit that referenced this pull request Jun 11, 2024
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes

* Fix labels for GCP TPU

* format

* fix key

---------

Co-authored-by: Gurcan Gercek <gurcan.gercek@shopify.com>
Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com>
Co-authored-by: Gurcan Gercek <111535545+gurcangercek@users.noreply.github.com>
Michaelvll added a commit that referenced this pull request Aug 23, 2024
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes

---------

Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Michaelvll added a commit that referenced this pull request Aug 23, 2024
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes

* Fix labels for GCP TPU

* format

* fix key

---------

Co-authored-by: Gurcan Gercek <gurcan.gercek@shopify.com>
Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com>
Co-authored-by: Gurcan Gercek <111535545+gurcangercek@users.noreply.github.com>
romilbhardwaj pushed a commit that referenced this pull request Nov 14, 2024
Add username to sky status -u and sky queue
zpoint added a commit that referenced this pull request Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants