Application executes before colocated Orchestrator is created #522

Merged Mar 21, 2024 (21 commits)
Changes from 14 commits
4 changes: 4 additions & 0 deletions doc/changelog.rst
@@ -18,6 +18,7 @@ To be released at some future point in time

 Description

+- Colocated Model bug fix
 - ExecArgs handling correction
 - ReadTheDocs config file added and enabled on PRs
 - Enforce changelog updates
@@ -30,6 +31,8 @@ Description

 Detailed Notes

+- Build shell script to execute application after colocated Orchestrator
+  is finished building for colocated Models. (SmartSim-PR522_)
 - Add checks and tests to ensure SmartSim users cannot initialize run settings
   with a list of lists as the exe_args argument. (SmartSim-PR517_)
 - Add readthedocs configuration file and enable readthedocs builds
@@ -55,6 +58,7 @@ Detailed Notes
   Slurm and Open MPI. (SmartSim-PR520_)


+.. _SmartSim-PR522: https://github.com/CrayLabs/SmartSim/pull/522
 .. _SmartSim-PR524: https://github.com/CrayLabs/SmartSim/pull/524
 .. _SmartSim-PR520: https://github.com/CrayLabs/SmartSim/pull/520
 .. _SmartSim-PR518: https://github.com/CrayLabs/SmartSim/pull/518
14 changes: 7 additions & 7 deletions smartsim/_core/entrypoints/colocated.py
@@ -32,7 +32,7 @@
 import tempfile
 import typing as t
 from pathlib import Path
-from subprocess import PIPE, STDOUT
+from subprocess import STDOUT
 from types import FrameType

 import filelock
@@ -177,7 +177,8 @@ def main(
     db_scripts: t.List[t.List[str]],
     db_identifier: str,
 ) -> None:
-    global DBPID # pylint: disable=global-statement
+    # pylint: disable=too-many-statements
+    global DBPID # pylint: disable=global-statement

     lo_address = current_ip("lo")
     ip_addresses = []
@@ -201,8 +202,10 @@ def main(
     # we generally want to catch all exceptions here as
     # if this process dies, the application will most likely fail
     try:
-        process = psutil.Popen(cmd, stdout=PIPE, stderr=STDOUT)
-        DBPID = process.pid
+        with open("colo_orch_output.txt", "w", encoding="utf-8") as file:
+            process = psutil.Popen(cmd, stdout=file.fileno(), stderr=STDOUT)
+            DBPID = process.pid
+            print(f"__PID__{DBPID}__PID__", flush=True)

     except Exception as e:
         cleanup()
@@ -249,9 +252,6 @@ def launch_db_scripts(client: Client, db_scripts: t.List[t.List[str]]) -> None:
             # Make sure we don't keep this around
             del client

-        for line in iter(process.stdout.readline, b""):
-            print(line.decode("utf-8").rstrip(), flush=True)
-
     except Exception as e:
         cleanup()
         logger.error(f"Colocated database process failed: {str(e)}")
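
The hunks above stop piping the database output back through the entrypoint. Instead, the output goes to a file and the entrypoint prints the child PID between `__PID__` markers. A minimal sketch of that pattern follows. It assumes the standard-library `subprocess.Popen` in place of `psutil.Popen`, and the helper name `launch_with_log` is hypothetical; it is an illustration, not the SmartSim entrypoint.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_with_log(cmd, log_path):
    """Start cmd with stdout/stderr written to log_path; return the Popen handle.

    Hypothetical helper for illustration only.
    """
    with open(log_path, "w", encoding="utf-8") as log_file:
        # The child receives its own copy of the descriptor, so it can keep
        # writing to the file after the parent closes its handle here.
        process = subprocess.Popen(
            cmd, stdout=log_file.fileno(), stderr=subprocess.STDOUT
        )
    # Announce the PID on our own stdout so a wrapper script can parse it.
    print(f"__PID__{process.pid}__PID__", flush=True)
    return process


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "child_output.txt"
        child = launch_with_log(
            [sys.executable, "-c", "print('hello from child')"], log
        )
        child.wait()
        print(log.read_text(encoding="utf-8").strip())
```

Because the child no longer writes into a pipe owned by the parent, the parent does not have to stay in a read loop to drain it, which appears to be why the `readline` loop is removed in this diff.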
12 changes: 6 additions & 6 deletions smartsim/_core/launcher/colocated.py
@@ -67,9 +67,11 @@ def write_colocated_launch_script(
         # STDOUT of the job
         if colocated_settings["debug"]:
             script_file.write("export SMARTSIM_LOG_LEVEL=debug\n")
-
-        script_file.write(f"{colocated_cmd}\n")
-        script_file.write("DBPID=$!\n\n")
+        script_file.write(f"db_stdout=$({colocated_cmd})\n")
+        # pylint: disable=anomalous-backslash-in-string
+        script_file.write(
+            "DBPID=$(echo $db_stdout | sed -n 's/.*__PID__\([0-9]*\)__PID__.*/\\1/p')\n"
+        )

         # Write the actual launch command for the app
         script_file.write("$@\n\n")
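
The `sed` expression in this hunk pulls the digits found between two `__PID__` markers out of the captured launcher output. A rough Python analogue is sketched below to make the matching rule explicit. The function name `extract_pid` is hypothetical. Unlike the `sed` pattern, this version requires at least one digit and returns the first marker it finds rather than the last.

```python
import re
from typing import Optional

# Digits enclosed by the __PID__ markers printed by the entrypoint
PID_MARKER = re.compile(r"__PID__([0-9]+)__PID__")


def extract_pid(captured_stdout: str) -> Optional[int]:
    """Return the PID embedded in captured_stdout, or None if no marker exists.

    Hypothetical helper for illustration only.
    """
    match = PID_MARKER.search(captured_stdout)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(extract_pid("startup log __PID__12345__PID__ more log"))  # 12345
    print(extract_pid("no marker here"))  # None
```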
@@ -190,10 +192,8 @@ def _build_colocated_wrapper_cmd(
         db_script_cmd = _build_db_script_cmd(db_scripts)
         db_cmd.extend(db_script_cmd)

-    # run colocated db in the background
-    db_cmd.append("&")
-
     cmd.extend(db_cmd)
+
     return " ".join(cmd)


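
Taken together, the two hunks in this file make the generated script run the launcher in the foreground inside `$(...)`, parse the PID from its output, and only then run the application via `$@`. The sketch below renders those script lines into a string so the final shell text, after Python's escape processing, can be inspected. The function name `render_launch_lines` and the sample command are hypothetical, and the backslashes are written as `\\(` and `\\)` to avoid the anomalous-backslash warning while producing the same text.

```python
import io


def render_launch_lines(colocated_cmd: str) -> str:
    """Return the launch portion of the wrapper script as text.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO()
    # Capture the launcher's stdout; the shell waits here until it returns
    buffer.write(f"db_stdout=$({colocated_cmd})\n")
    # Extract the digits between the __PID__ markers from the captured text
    buffer.write(
        "DBPID=$(echo $db_stdout | "
        "sed -n 's/.*__PID__\\([0-9]*\\)__PID__.*/\\1/p')\n"
    )
    # Run the application passed as arguments to the wrapper script
    buffer.write("$@\n\n")
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder command; the real wrapper command is built elsewhere
    print(render_launch_lines("python -m some_launcher_module"))
```

Printing the result shows that the shell receives `\(`, `\)` and `\1`, which is the grouping and back-reference syntax that basic regular expressions in `sed` expect.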