Specify node feature for slurm job #529

Merged 10 commits on Mar 22, 2024.
doc/api/smartsim_api.rst (1 addition, 0 deletions)
@@ -111,6 +111,7 @@ steps to a batch.
.. autosummary::

    SrunSettings.set_nodes
    SrunSettings.set_node_feature
    SrunSettings.set_tasks
    SrunSettings.set_tasks_per_node
    SrunSettings.set_walltime
doc/changelog.rst (6 additions, 1 deletion)
@@ -18,7 +18,8 @@ To be released at some future point in time

Description

- Add method to specify node features for a Slurm job
- Colo Orchestrator setup now blocks application start until setup finished
- ExecArgs handling correction
- ReadTheDocs config file added and enabled on PRs
- Enforce changelog updates
@@ -31,6 +32,9 @@ Description

Detailed Notes

- Users can now specify node features for a Slurm job through
  ``SrunSettings.set_node_feature``. The method accepts a string
  or list of strings. (SmartSim-PR529_)
- The request to the colocated entrypoints file within the shell script
  is now a blocking process. Once the Orchestrator is set up, it returns,
  which moves the process to the background and allows the application to
@@ -61,6 +65,7 @@ Detailed Notes
  Slurm and Open MPI. (SmartSim-PR520_)


.. _SmartSim-PR529: https://github.com/CrayLabs/SmartSim/pull/529
.. _SmartSim-PR522: https://github.com/CrayLabs/SmartSim/pull/522
.. _SmartSim-PR524: https://github.com/CrayLabs/SmartSim/pull/524
.. _SmartSim-PR520: https://github.com/CrayLabs/SmartSim/pull/520
smartsim/settings/base.py (13 additions, 0 deletions)
@@ -325,6 +325,19 @@
            self._fmt_walltime(int(hours), int(minutes), int(seconds))
        )

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        """Specify the node feature for this job

        :param feature_list: node feature to launch on
        :type feature_list: str | list[str]
        """
        logger.warning(
            (
                "Feature specification not implemented for this "
                f"RunSettings type: {type(self)}"
            )
        )

    @staticmethod
    def _fmt_walltime(hours: int, minutes: int, seconds: int) -> str:
        """Convert hours, minutes, and seconds into valid walltime format
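The base implementation only logs a warning, so calling `set_node_feature` on settings for a launcher with no notion of node features is a no-op rather than an error, while `SrunSettings` overrides it. Below is a minimal sketch of that default-then-override pattern. `BaseRunSettings` and `SlurmRunSettings` are hypothetical stand-ins, not SmartSim classes; the override body mirrors the Slurm change in this PR.

```python
import logging
import typing as t

logger = logging.getLogger(__name__)


class BaseRunSettings:
    """Hypothetical stand-in for a launcher-agnostic settings base class."""

    def __init__(self) -> None:
        self.run_args: t.Dict[str, t.Optional[str]] = {}

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        # Default: this launcher has no node-feature concept, so warn and do nothing
        logger.warning(
            "Feature specification not implemented for this "
            f"RunSettings type: {type(self)}"
        )


class SlurmRunSettings(BaseRunSettings):
    """Hypothetical Slurm subclass that overrides the warning-only default."""

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        if isinstance(feature_list, str):
            feature_list = [feature_list.strip()]
        elif not all(isinstance(feature, str) for feature in feature_list):
            raise TypeError("node_feature argument must be string or list of strings")
        # Stored under the single-character key "C", as in the PR
        self.run_args["C"] = ",".join(feature_list)


base = BaseRunSettings()
base.set_node_feature("P100")  # logs a warning; run_args stays empty
slurm = SlurmRunSettings()
slurm.set_node_feature(["P100", "V100"])
print(base.run_args, slurm.run_args)  # {} {'C': 'P100,V100'}
```

Warning instead of raising keeps launcher-agnostic scripts portable: the same experiment code can run under a non-Slurm launcher and simply lose the constraint.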
smartsim/settings/slurmSettings.py (15 additions, 0 deletions)
@@ -243,6 +243,21 @@ def set_broadcast(self, dest_path: t.Optional[str] = None) -> None:
        """
        self.run_args["bcast"] = dest_path

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        """Specify the node feature for this job

        This sets ``-C``

        :param feature_list: node feature to launch on
        :type feature_list: str | list[str]
        :raises TypeError: if not str or list of str
        """
        if isinstance(feature_list, str):
            feature_list = [feature_list.strip()]
        elif not all(isinstance(feature, str) for feature in feature_list):
            raise TypeError("node_feature argument must be string or list of strings")
        self.run_args["C"] = ",".join(feature_list)

    @staticmethod
    def _fmt_walltime(hours: int, minutes: int, seconds: int) -> str:
        """Convert hours, minutes, and seconds into valid walltime format
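The method stores the joined features under the single-character key `C`. The sketch below shows how such a `run_args` dictionary could be rendered into srun tokens, assuming the convention that one-character keys become a short option followed by a separate value and longer keys become `--key=value`. `render_srun_args` is a hypothetical helper written for illustration, not SmartSim's formatter.

```python
import typing as t


def render_srun_args(run_args: t.Dict[str, t.Optional[str]]) -> t.List[str]:
    """Hypothetical formatter: turn a run_args dict into srun CLI tokens."""
    opts: t.List[str] = []
    for opt, value in run_args.items():
        short = len(opt) == 1
        prefix = "-" if short else "--"
        if not value:
            # Bare flag with no value, e.g. {"exclusive": None} -> "--exclusive"
            opts.append(prefix + opt)
        elif short:
            # Short option takes its value as the next token: -C P100,V100
            opts.extend([prefix + opt, str(value)])
        else:
            # Long option joins key and value: --nodes=2
            opts.append(f"{prefix}{opt}={value}")
    return opts


print(render_srun_args({"nodes": "2", "C": "P100,V100"}))
# ['--nodes=2', '-C', 'P100,V100']
```

Under this convention the stored `C` key surfaces on the command line as `-C`, srun's short form of `--constraint`.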
tests/test_run_settings.py (1 addition, 0 deletions)
@@ -339,6 +339,7 @@ def test_set_format_args(set_str, val, key):
        pytest.param("set_task_map", (3,), id="set_task_map"),
        pytest.param("set_cpus_per_task", (4,), id="set_cpus_per_task"),
        pytest.param("set_hostlist", ("hostlist",), id="set_hostlist"),
        pytest.param("set_node_feature", ("P100",), id="set_node_feature"),
        pytest.param(
            "set_hostlist_from_file", ("~/hostfile",), id="set_hostlist_from_file"
        ),
tests/test_slurm_settings.py (15 additions, 0 deletions)
@@ -338,6 +338,21 @@ def test_set_hostlist():
        rs.set_hostlist([5])


def test_set_node_feature():
    rs = SrunSettings("python")
    rs.set_node_feature(["P100", "V100"])
    assert rs.run_args["C"] == "P100,V100"

    rs.set_node_feature("P100")
    assert rs.run_args["C"] == "P100"

    with pytest.raises(TypeError):
        rs.set_node_feature(5)

    with pytest.raises(TypeError):
        rs.set_node_feature(["P100", 5])


def test_set_hostlist_from_file():
    rs = SrunSettings("python")
    rs.set_hostlist_from_file("./path/to/hostfile")
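In the merged code, `rs.set_node_feature(5)` does raise `TypeError`, but the error comes from Python trying to iterate over an `int`, not from the explicit `raise`. The outermost iterable of a generator expression is evaluated as soon as the expression is created, so the test passes without ever reaching that message. Also, only a lone string is stripped; list entries are joined unchanged. A sketch of a stricter check follows, written as a hypothetical standalone `validate_features` function rather than as the method itself:

```python
import typing as t


def validate_features(feature_list: t.Union[str, t.List[str]]) -> str:
    """Hypothetical stricter variant of the set_node_feature validation."""
    if isinstance(feature_list, str):
        features = [feature_list]
    elif isinstance(feature_list, list) and all(
        isinstance(feature, str) for feature in feature_list
    ):
        features = feature_list
    else:
        # One explicit error for ints, tuples, and lists holding non-strings
        raise TypeError("feature_list must be a string or list of strings")
    # Strip whitespace from every entry, not only from a lone string
    return ",".join(feature.strip() for feature in features)


print(validate_features([" P100", "V100 "]))  # P100,V100
for bad in (5, ["P100", 5]):
    try:
        validate_features(bad)
    except TypeError as err:
        print(err)  # feature_list must be a string or list of strings
```

Requiring an actual `list` rejects tuples, which the merged version accepts, so this trades some flexibility for a single, predictable error path.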