Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model as Batch Job #245

Merged
merged 8 commits into from
Jan 28, 2023
Merged

Model as Batch Job #245

merged 8 commits into from
Jan 28, 2023

Conversation

MattToast
Copy link
Member

Allows users to launch individual models through an experiment as a batch job with batch settings.

exp = Experiment(...)
rs = exp.craete_run_settings(...)
bs = exp.create_batch_settings(...)
model = exp.create_model("MyModel", rs, batch_settings=bs)
exp.start(model)  # launches the model as a batch job

If a model with batch settings is added to a higher order entity (e.g. an ensemble), the batch settings of the higher order entity will be used and the batch settings of the model will be ignored to avoid an "sbatch-in-sbatch" scenario or similar .

@MattToast MattToast self-assigned this Dec 6, 2022
@MattToast MattToast mentioned this pull request Dec 6, 2022
@MattToast MattToast changed the title Batch model exec Model as Batch Job Dec 6, 2022
@MattToast MattToast marked this pull request as ready for review December 6, 2022 23:46
@codecov-commenter
Copy link

codecov-commenter commented Dec 7, 2022

Codecov Report

Merging #245 (eadcebb) into develop (ff2cf24) will decrease coverage by 0.16%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #245      +/-   ##
===========================================
- Coverage    83.90%   83.74%   -0.17%     
===========================================
  Files           59       62       +3     
  Lines         3479     3512      +33     
===========================================
+ Hits          2919     2941      +22     
- Misses         560      571      +11     
Impacted Files Coverage Δ
smartsim/entity/model.py 96.73% <ø> (-0.04%) ⬇️
smartsim/experiment.py 80.71% <ø> (-0.28%) ⬇️
smartsim/_core/control/controller.py 84.83% <100.00%> (+0.16%) ⬆️
smartsim/_core/control/manifest.py 95.12% <100.00%> (ø)
smartsim/tf/__init__.py 100.00% <0.00%> (ø)
smartsim/tf/utils.py 26.66% <0.00%> (ø)
smartsim/__init__.py 100.00% <0.00%> (ø)

@MattToast MattToast requested review from al-rigazzi, mellis13 and ashao and removed request for mellis13 December 7, 2022 01:15
Copy link
Collaborator

@al-rigazzi al-rigazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! The only question I have is whether the changes were tested on a Slurm and a PBS system, as the batch mechanism will not be used on Local.

@MattToast
Copy link
Member Author

@al-rigazzi Good Question! I implemented against a slurm system, and I just got a chance to run the new full_wlm test on a PBS system, and both cases the test is passing. Between that an a couple of dummy runs through I did in the repl against both wlms I fairly confident everything is in working order!!

Copy link
Collaborator

@al-rigazzi al-rigazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- but let's wait on other reviews.

Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for doing this work!

@MattToast MattToast merged commit dc8f11d into CrayLabs:develop Jan 28, 2023
@MattToast MattToast deleted the batch-model-exec branch February 8, 2023 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants