-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix download_via_gsutil and rsync workdir commands #5
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Just a question. OK either way as long as we document the behavior in comment somewhere.
@@ -206,7 +206,7 @@ def execute(dag: sky.Dag, dryrun: bool = False, teardown: bool = False): | |||
if task.workdir is not None: | |||
runner.add_step( | |||
'sync', 'Sync files', | |||
f'ray rsync_up {cluster_config_file} {task.workdir} {SKY_REMOTE_WORKDIR}' | |||
f'ray rsync_up {cluster_config_file} {task.workdir}/ {SKY_REMOTE_WORKDIR}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume this change isn't made. If my local workdir has /path/a/b
,
- If we specify
workdir
as/path
, then what do we see on remote? Expected/tmp/workdir/a/b
. - If we specify
workdir
as/path/
, then what do we see on remote? Expected, arguably,/tmp/workdir/a/b
.
How would adding this change affect things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do ~/Downloads/tpu
, then /tmp/workdir/tpu
. But the downstream training command that gets launched does cd /tmp/workdir && python -u models/...
(which won't work since models
is actually tpu/models
).
It seems reasonable that remote workdir tree should be identical to the local workdir tree? Instead of having an additional layer with the name of the local working directory (tpu
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Zongheng, doesn't change anything. If you do ray attach config.yml
and go to /tmp/workdir
, models
will be there, not tpu
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other wise, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... that is strange because it was failing for me with my fresh AWS setup without the trailing slash.
Maybe it's ray versions having different behavior? I'm using 1.5.2.
Regardless if this is a confirmed issue, we should fix!
…On Fri, Oct 29, 2021 at 12:16 AM Gautam Mittal ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In prototype/sky/execution.py
<#5 (comment)>
:
> @@ -206,7 +206,7 @@ def execute(dag: sky.Dag, dryrun: bool = False, teardown: bool = False):
if task.workdir is not None:
runner.add_step(
'sync', 'Sync files',
- f'ray rsync_up {cluster_config_file} {task.workdir} {SKY_REMOTE_WORKDIR}'
+ f'ray rsync_up {cluster_config_file} {task.workdir}/ {SKY_REMOTE_WORKDIR}'
Hmmm... that is strange because it was failing for me with my fresh AWS
setup without the trailing slash.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#5 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEQWHXWF4UASSI4WZU43ETUJJC5TANCNFSM5G6RNYAQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I'm on |
Fix download_via_gsutil and rsync workdir commands
…uest cancellation (#5) * Fix config fields * fix cancel * Add loggings * remove useless codes
* [GCP] initial take for dws support with migs * fix lint errors * dependency and format fix * refactor mig instance creation * fix * remove unecessary instance creation code for mig * Fix deletion * Fix instance template logic * Restart * format * format * move to REST APIs instead of python APIs * add multi-node back * Fix multi-node * Avoid spot * format * format * fix scheduling * fix cancel * Add smoke test * revert some changes * fix smoke * Fix * fix * Fix smoke * [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5) * Fix config fields * fix cancel * Add loggings * remove useless codes --------- Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com> Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* [GCP] initial take for dws support with migs * fix lint errors * dependency and format fix * refactor mig instance creation * fix * remove unecessary instance creation code for mig * Fix deletion * Fix instance template logic * Restart * format * format * move to REST APIs instead of python APIs * add multi-node back * Fix multi-node * Avoid spot * format * format * fix scheduling * fix cancel * Add smoke test * revert some changes * fix smoke * Fix * fix * Fix smoke * [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5) * Fix config fields * fix cancel * Add loggings * remove useless codes * Fix labels for GCP TPU * format * fix key --------- Co-authored-by: Gurcan Gercek <gurcan.gercek@shopify.com> Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com> Co-authored-by: Gurcan Gercek <111535545+gurcangercek@users.noreply.github.com>
* [GCP] initial take for dws support with migs * fix lint errors * dependency and format fix * refactor mig instance creation * fix * remove unecessary instance creation code for mig * Fix deletion * Fix instance template logic * Restart * format * format * move to REST APIs instead of python APIs * add multi-node back * Fix multi-node * Avoid spot * format * format * fix scheduling * fix cancel * Add smoke test * revert some changes * fix smoke * Fix * fix * Fix smoke * [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5) * Fix config fields * fix cancel * Add loggings * remove useless codes --------- Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com> Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* [GCP] initial take for dws support with migs * fix lint errors * dependency and format fix * refactor mig instance creation * fix * remove unecessary instance creation code for mig * Fix deletion * Fix instance template logic * Restart * format * format * move to REST APIs instead of python APIs * add multi-node back * Fix multi-node * Avoid spot * format * format * fix scheduling * fix cancel * Add smoke test * revert some changes * fix smoke * Fix * fix * Fix smoke * [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5) * Fix config fields * fix cancel * Add loggings * remove useless codes * Fix labels for GCP TPU * format * fix key --------- Co-authored-by: Gurcan Gercek <gurcan.gercek@shopify.com> Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com> Co-authored-by: Gurcan Gercek <111535545+gurcangercek@users.noreply.github.com>
Add username to sky status -u and sky queue
rsync
needs a trailing slash to have the intended behavior the following commands are expecting. Also the target_dir needs to be created before we copy stuff to it withgsutil
.