Dragon Launcher Batch Job Support #541

Merged: al-rigazzi merged 111 commits into CrayLabs:dragon_launcher from drg_batch on May 10, 2024


@al-rigazzi (Collaborator) commented Apr 7, 2024

This PR actually adds several things:

  • stdout and stderr redirect of Dragon-launched processes
  • DragonBatchStep with logic to keep track of batch jobs run through SLURM and PBS
  • more environment variables were added to CONFIG to help with launching Dragon with options
  • some mitigation of the Authenticator's locking behavior is in place
  • a cooldown period was added to the DragonBackend to make sure the telemetry monitor can receive updates before it shuts down
  • the DragonBackend status is now a string representation of two tables, one for hosts (indicating Free/Busy status) and one for ProcessGroups (similar to standard WLM output)
  • documentation was added for Dragon.
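The status bullet above describes output shaped as two plain-text tables. As a rough, hypothetical illustration only (the function names, column headers, and the `hosts`/`groups` dictionaries below are assumptions, not SmartSim's DragonBackend API), a status string of that shape could be built like this:

```python
# Hypothetical sketch: names, columns, and data layout are illustrative,
# not the actual SmartSim DragonBackend implementation.
from typing import Dict, List, Sequence


def _render_table(headers: Sequence[str], rows: List[Sequence[str]]) -> str:
    """Render rows as a plain-text table with left-aligned, padded columns."""
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    def fmt(cells: Sequence[str]) -> str:
        # Pad each cell to its column width, separate columns by two spaces,
        # and drop trailing whitespace from the last column.
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), fmt(["-" * w for w in widths])]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


def backend_status(hosts: Dict[str, bool], groups: Dict[str, str]) -> str:
    """Build a status string: a host table (Free/Busy) and a group table.

    hosts maps hostname -> True if the host is busy (assumed layout);
    groups maps a process-group id -> its state string (assumed layout).
    """
    host_rows = [[name, "Busy" if busy else "Free"] for name, busy in sorted(hosts.items())]
    group_rows = [[gid, state] for gid, state in sorted(groups.items())]
    return (
        _render_table(["Host", "Status"], host_rows)
        + "\n\n"
        + _render_table(["Group", "Status"], group_rows)
    )


if __name__ == "__main__":
    print(backend_status({"node001": True, "node002": False}, {"group-1": "Running"}))
```

Running it prints a Host/Status table followed by a blank line and a Group/Status table; the real DragonBackend output may use different columns and formatting.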


codecov bot commented Apr 7, 2024

Codecov Report

Attention: Patch coverage is 15.20619%, with 658 lines in your changes missing coverage. Please review.

❗ No coverage uploaded for pull request base (dragon_launcher@390f9cf).

Additional details and impacted files


@@                Coverage Diff                 @@
##             dragon_launcher     #541   +/-   ##
==================================================
  Coverage                   ?   67.54%           
==================================================
  Files                      ?       78           
  Lines                      ?     5983           
  Branches                   ?        0           
==================================================
  Hits                       ?     4041           
  Misses                     ?     1942           
  Partials                   ?        0           
Files Coverage Δ
smartsim/_core/control/controller.py 74.65% <100.00%> (ø)
smartsim/_core/launcher/launcher.py 100.00% <100.00%> (ø)
smartsim/_core/launcher/step/__init__.py 100.00% <100.00%> (ø)
smartsim/_core/launcher/step/step.py 96.05% <100.00%> (ø)
smartsim/_core/schemas/dragonRequests.py 97.50% <100.00%> (ø)
smartsim/_core/schemas/utils.py 100.00% <100.00%> (ø)
smartsim/_core/utils/network.py 42.85% <ø> (ø)
smartsim/_core/utils/security.py 41.08% <ø> (ø)
smartsim/_core/utils/telemetry/telemetry.py 87.64% <100.00%> (ø)
smartsim/database/orchestrator.py 87.00% <ø> (ø)
... and 9 more

@al-rigazzi marked this pull request as ready for review on April 8, 2024 20:10

@al-rigazzi (Collaborator, Author) commented:

@amandarichardsonn @mellis13 @ashao if you have time, can you take a look at the new docs sections? They are:

@amandarichardsonn (Contributor) left a comment

Awesome job on the documentation! I will review the other contents of the PR, but here's some feedback to get you started.

doc/dragon.rst (outdated, resolved)
doc/dragon.rst (outdated, resolved)
doc/dragon.rst (outdated, resolved)
doc/dragon.rst (outdated, resolved)
doc/dragon.rst (outdated, resolved)
doc/dragon.rst (outdated, resolved)
doc/experiment.rst (outdated, resolved)
doc/experiment.rst (resolved)
doc/api/smartsim_api.rst (resolved)
doc/api/smartsim_api.rst (resolved)
@ankona (Contributor) left a comment

Other than minor docstring nitpicks, this LGTM!

@ankona (Contributor) left a comment

Looks good.

@amandarichardsonn (Contributor) left a comment

LGTM! Thanks for making the doc changes!

@al-rigazzi merged commit 4a971fc into CrayLabs:dragon_launcher on May 10, 2024
35 checks passed
@al-rigazzi deleted the drg_batch branch on May 11, 2024 11:40
Labels: None yet
Projects: None yet
4 participants