Singularity launching support #101

Spartee · 2021-10-19T22:34:08Z

Description

SmartSim should be able to use Singularity, specifically singularity exec, to launch MPI applications that have been containerized. This should work with a number of different run commands (e.g. srun singularity exec).

Justification

I will not list the benefits of putting applications in containers because there are too many. However, being able to launch such containers would be one more way SmartSim can contribute to reproducible science. Having each stage of a driver script be a containerized application could greatly benefit scientists in terms of sharing and reproducibility.

Implementation Strategy

I can think of two ways that this can be implemented.

As its own RunSettings child class

This approach would create a SingularityRunSettings class (possibly with a shorter name) that takes the run command (e.g. srun) and other run arguments. Based on the run command passed in, the class would create a corresponding RunSettings instance and forward the usual RunSettings method calls to it. For example:

rs = SingularityRunSettings(run_command="aprun")
rs.set_tasks(10)

Under the hood, this would just pass the argument 10 to the AprunSettings instance that is created and held as an instance variable within the SingularityRunSettings class.
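A minimal sketch of this delegation idea, using hypothetical stand-in classes rather than SmartSim's real settings classes (the `run_args` keys and the `_SETTINGS` lookup table are assumptions made for illustration):

```python
class AprunSettings:
    """Hypothetical stand-in for SmartSim's AprunSettings."""

    def __init__(self, exe):
        self.exe = exe
        self.run_args = {}

    def set_tasks(self, tasks):
        # Assumed mapping: task count is stored under aprun's "pes" option
        self.run_args["pes"] = int(tasks)


class SrunSettings:
    """Hypothetical stand-in for SmartSim's SrunSettings."""

    def __init__(self, exe):
        self.exe = exe
        self.run_args = {}

    def set_tasks(self, tasks):
        # Assumed mapping: task count is stored under srun's "ntasks" option
        self.run_args["ntasks"] = int(tasks)


class SingularityRunSettings:
    """Wraps the settings object that matches the given run command."""

    # Hypothetical lookup table from run command to settings class
    _SETTINGS = {"aprun": AprunSettings, "srun": SrunSettings}

    def __init__(self, run_command, exe="singularity"):
        if run_command not in self._SETTINGS:
            raise ValueError(f"unsupported run command: {run_command}")
        self._inner = self._SETTINGS[run_command](exe)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so every
        # RunSettings-style call is forwarded to the wrapped instance.
        if name == "_inner":
            raise AttributeError(name)
        return getattr(self._inner, name)


rs = SingularityRunSettings(run_command="aprun")
rs.set_tasks(10)  # forwarded to the wrapped AprunSettings
```

Forwarding through `__getattr__` keeps the wrapper small, at the cost of hiding the available methods from static tooling.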

As a base class method for all implementations of RunSettings

This approach is likely easier, since the method could be added to the RunSettings base class. Essentially, the method sets a flag indicating that the executable should be launched as a Singularity container.

aprun = AprunSettings(exe="hello_world")
aprun.use_singularity("/path/to/.sif/file", sing_args=["--nv"])

This method would do the following:

Check for singularity on the system (this might be delegated to the Step level).
Change exe for the run settings to singularity, and exe_args to start with exec.
Add user-provided flags.
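The three steps above could look roughly like the following. This is a sketch only: `RunSettings` and `AprunSettings` here are hypothetical stand-ins, and the `check` parameter is an assumption added so the example runs on machines without Singularity installed.

```python
import shutil


class RunSettings:
    """Hypothetical stand-in for the RunSettings base class."""

    def __init__(self, exe, exe_args=None):
        self.exe = exe
        self.exe_args = list(exe_args or [])

    def use_singularity(self, image, sing_args=None, check=True):
        # Step 1: check for singularity on the system
        if check and shutil.which("singularity") is None:
            raise FileNotFoundError("singularity executable not found on PATH")
        # Steps 2 and 3: singularity becomes the executable, exe_args start
        # with exec, and user-provided flags go between exec and the image,
        # followed by the original executable and its arguments.
        self.exe_args = [
            "exec",
            *(sing_args or []),
            image,
            self.exe,
            *self.exe_args,
        ]
        self.exe = "singularity"


class AprunSettings(RunSettings):
    """Hypothetical stand-in; inherits use_singularity unchanged."""


aprun = AprunSettings(exe="hello_world")
aprun.use_singularity("/path/to/.sif/file", sing_args=["--nv"], check=False)
```

With these assumptions, the resulting command line after the run command would read `singularity exec --nv /path/to/.sif/file hello_world`.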

Acceptance Criteria

Choose one of the above implementation strategies and write up a design document, with a summary included in this ticket.


ben-albrecht · 2022-04-15T14:45:27Z

Choose an above implementation strategy write up a design document with a summary included in this ticket

After discussion within the team, we have reached consensus on the following approach:

Container runtimes will be supported in a class hierarchy as shown:

# container.py
class Container:
  ...

class Singularity(Container):
  ...

class Docker(Container):
  ...

class Podman(Container):
  ...

Users will instantiate this class with all the relevant container information, such as path to image, container arguments, and paths to bind/mount (if they don't want to specify them as container args). Below is a simple example:

# user-experiment.py
from smartsim.settings import container

s = container.Singularity('image.sif', 
                          args='--nv', paths=['path/to/exp'])

# Container object is passed to RunSettings initializers
aprun = AprunSettings(exe, container=s)

The image path may be a local or remote path (such as a Docker Hub or shub URL). The singularity args are given as a string of arguments to append to the singularity invocation. The paths to bind/mount can be given as a string (for one path), a list (to expose each path without changing it within the container), or a dict (to map each path to a different path within the container). The default paths arg will be the experiment path.
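As an illustration of the three accepted forms of the paths argument, here is a small sketch that normalizes them into a Singularity `--bind` option. The helper name `bind_args` is hypothetical and not part of SmartSim; the sketch relies only on `--bind` accepting a comma-separated list of `src[:dest]` specs.

```python
def bind_args(paths):
    """Turn a str, list, or dict of paths into singularity --bind arguments."""
    if isinstance(paths, str):
        # A single path, exposed unchanged inside the container
        pairs = [(paths, None)]
    elif isinstance(paths, dict):
        # Host path mapped to a different path inside the container
        pairs = list(paths.items())
    else:
        # Several paths, each exposed unchanged inside the container
        pairs = [(path, None) for path in paths]

    specs = [src if dst is None else f"{src}:{dst}" for src, dst in pairs]
    return ["--bind", ",".join(specs)] if specs else []


single = bind_args("/path/to/exp")
several = bind_args(["/path/to/exp", "/lus/datasets"])
mapped = bind_args({"/lus/datasets": "/data"})
```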

SmartSim will explicitly pass any environment variables that are required by SmartSim, such as SSDB, SSKEYIN, SSKEYOUT.
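One way this could be done, shown here only as a sketch, is through Singularity's `SINGULARITYENV_` prefix: variables set on the host with that prefix are injected into the container with the prefix stripped. Whether SmartSim uses this mechanism is an assumption, and the helper `container_env` is hypothetical.

```python
import os

# Variables named in the proposal as required by SmartSim
SMARTSIM_VARS = ("SSDB", "SSKEYIN", "SSKEYOUT")


def container_env(env=None, names=SMARTSIM_VARS):
    """Return SINGULARITYENV_-prefixed copies of the variables that are set."""
    env = os.environ if env is None else env
    return {f"SINGULARITYENV_{name}": env[name] for name in names if name in env}


# Example with an explicit environment; unrelated variables are not forwarded
forwarded = container_env({"SSDB": "127.0.0.1:6379", "HOME": "/home/user"})
```

The returned mapping would be merged into the environment of the launched `singularity exec` process.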

We will add some additional getter/setter methods to modify the container data after instantiation. Also, some internal methods will support serialization of the container object for experiment tracking purposes down the road.

Future work may support extending the high-level create_run_settings to support a container argument to pass in a container object, or simply a string arg as a "quickstart" mode for enabling container support, e.g.

# Experiment takes container runtime and container image
exp.create_run_settings(exe, container_image=image, container='singularity')
# Automatically creates Container object
# Automatically binds experiment directory

Here is a standalone example demonstrating the interface:

from smartsim import Experiment
from smartsim.settings import container

# Create experiment
exp = Experiment(name="container-example")

# Create container object
s = container.Singularity('path/to/simulation_container.sif',
                           paths=['/path/to/exp', '/lus/datasets'],
                           args='--nv')

# Create RunSettings with container
settings = exp.create_run_settings(exe="./simulation.py", container=s)

# Create model that runs with the container-enabled settings
containerized_model = exp.create_model(name="containerized_simulation",
                                       run_settings=settings)

# Start experiment
exp.start(containerized_model)


[committed by @ben-albrecht] [reviewed by @ashao]

This PR adds the ability for SmartSim to launch workloads in Singularity (Apptainer) as described in #101. It also lays the groundwork for supporting other container runtimes such as docker, shifter, and podman in the future, as well as launching the orchestrator in a container.

## Design Variations

During development, it became clear that a few design changes from the original proposal were required. I have documented them below with their rationale:

#### 1. Argument name: `bind_paths` -> `mount`

The terms bind path and mount are mostly used interchangeably across different container runtimes. When writing tests, I kept forgetting if it was `bind_path` or `bind_paths`, which hinted to me that it was not a great arg name, so I swapped it out for the more succinct and easy to remember `mount`.

#### 2. `create_run_settings(..., container: str)` -> `create_run_settings(..., container: Container)`

We originally thought it would be easier for users to get started with containers by allowing them to pass a string into `create_run_settings(container='singularity')` instead of having to create a Container object. While writing tests, I realized that this was potentially very confusing for users because 1) the `container` arg types change between `create_run_settings` and `RunSettings`, and 2) if you need to add other metadata such as container args or container mount paths, you have to switch from using `create_run_settings` to `RunSettings` in your code, which is very annoying. Because creating Containers is so lightweight, I think it is best to keep the container interface consistent across all functions that accept them.

#### 3. dropped getter/setter methods

Because command generation and validation happens upon execution, users can freely modify `Container.args` and `Container.mount` without getter/setter methods to update any other state. Therefore, I dropped these methods from the interface.

## Testing

Added 2 test suites for containers: One for WLM testing and one for local testing. The local testing suite runs in GitHub actions. Due to the added time from building Singularity and pulling the `container-testing` image, the singularity testing only happens on one configuration of the build matrix: python 3.9 + redisai 1.2.5 on linux

Spartee added the labels "type: feature" (Issues that include feature request or feature idea) and "area: settings" (Issues related to Batch or Run settings) Oct 19, 2021

ben-albrecht self-assigned this Apr 4, 2022

ben-albrecht closed this as completed Apr 15, 2022

ben-albrecht mentioned this issue Jun 8, 2022

SmartSim Singularity Integration #204

Merged
