-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add document for bash script.sh
cannot do conda activate
problem
#422
Changes from 5 commits
f03c6ac
ae96c93
8bd498c
f14a281
b3d6543
ec83eaf
d1f99e9
1e6b64c
0b9d331
4cc5113
e7751d1
f18540e
b157fec
4b91b2a
77bbea3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,10 +48,18 @@ requiring an NVIDIA Tesla K80 GPU on AWS. (See more example yaml files in the `r | |
|
||
setup: | | ||
# Typical use: pip install -r requirements.txt | ||
|
||
# If using a `my_setup.sh` script to setup, please use | ||
# `bash -i my_setup.sh` to capture the environment | ||
# variable and make sure `conda activate` works | ||
Michaelvll marked this conversation as resolved.
Show resolved
Hide resolved
|
||
echo "running setup" | ||
|
||
run: | | ||
# Typical use: make use of resources, such as running training. | ||
|
||
# If using a my_run.sh script to run commands, please use | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is twice the burden on the user There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this is not an ideal solution. I am not sure if there is any better one for this problem, as slurm also does not handle |
||
# `bash -i my_run.sh` to capture the environment variable | ||
# and make sure `conda activate` works | ||
echo "hello sky!" | ||
conda env list | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -732,9 +732,8 @@ def _retry_region_zones(self, to_provision: Resources, num_nodes: int, | |
|
||
cluster_name = config_dict['cluster_name'] | ||
plural = '' if num_nodes == 1 else 's' | ||
logger.info( | ||
f'{style.BRIGHT}Successfully provisioned or found' | ||
f' existing VM{plural}. Setup completed.{style.RESET_ALL}') | ||
logger.info(f'{style.BRIGHT}Successfully provisioned or found' | ||
f' existing VM{plural}.{style.RESET_ALL}') | ||
Michaelvll marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return config_dict | ||
message = ('Failed to acquire resources in all regions/zones' | ||
f' (requested {to_provision}).' | ||
|
@@ -1219,13 +1218,10 @@ def setup(self, handle: ResourceHandle, task: Task) -> None: | |
|
||
if task.setup is None: | ||
return | ||
codegen = textwrap.dedent(f"""\ | ||
#!/bin/bash | ||
# TODO(zhwu): Move this to bashrc | ||
. $(conda info --base)/etc/profile.d/conda.sh | ||
{task.setup}""") | ||
|
||
setup_script = log_lib.make_task_bash_script(task.setup) | ||
with tempfile.NamedTemporaryFile('w', prefix='sky_setup_') as f: | ||
f.write(codegen) | ||
f.write(setup_script) | ||
f.flush() | ||
setup_sh_path = f.name | ||
setup_file = os.path.basename(setup_sh_path) | ||
|
@@ -1249,7 +1245,8 @@ def setup(self, handle: ResourceHandle, task: Task) -> None: | |
with_outputs=False) | ||
backend_utils.run_command_on_ip_via_ssh( | ||
ip, | ||
f'cd {SKY_REMOTE_WORKDIR}; /bin/bash /tmp/{setup_file}', | ||
# -i will make sure `conda activate` works | ||
f'/bin/bash -i /tmp/{setup_file}', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if a Task.run is also running There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the user needs to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add a test case/example where the bash script would fail without There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great point! I found a problem with distributed training with user_script during doing this, that $SKY_NODE_IPS cannot be retrieved from the user_script, due to the bash bug. |
||
ssh_user=ssh_user, | ||
ssh_private_key=ssh_private_key, | ||
log_path=os.path.join(self.log_dir, 'setup.log'), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to automatically do this later for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not think of a better way to do this. I feel like we should respect the setup section written by the user, without modifying any code in it. Otherwise, more unexpected behaviors will occur for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok we should figure this out later... (e.g. how to fix backend so that bash will emulate local...)