Fix flux job cancellation #428

Merged 4 commits on Oct 30, 2023

@jwhite242 (Collaborator) commented Oct 4, 2023

Update cancellation in the flux 0.49 and 0.26 adapters to account for the recent change to submit, which converts the flux JobID to native types.

Update general cancellation behaviors:

  • Check for in-progress steps before declaring the cancellation successful, so that completion is delayed until the actual final states can be serialized
  • Update the cancel logic to mirror the failure logic: mark all steps downstream of a cancelled step as cancelled too
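
The two behaviors above can be sketched roughly as follows. This is a minimal illustration, not Maestro's actual implementation: the `State` enum, `cancel_downstream`, `no_steps_in_progress`, and the step names are all hypothetical, and the study is assumed to be a plain dictionary mapping each step to its downstream steps.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    """Hypothetical step states; not Maestro's real state enum."""
    PENDING = auto()
    RUNNING = auto()
    FINISHED = auto()
    CANCELLED = auto()


def cancel_downstream(children, states, step):
    """Mark `step` and every step reachable from it as CANCELLED.

    Mirrors the failure-style propagation described above: anything
    downstream of a cancelled step can never run, so it is cancelled too.
    Returns the set of steps that were marked.
    """
    queue = deque([step])
    visited = set()
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        states[current] = State.CANCELLED
        queue.extend(children.get(current, ()))
    return visited


def no_steps_in_progress(states):
    """True only when no step is still RUNNING.

    A cancellation would be reported as successful only once this holds,
    so that the final states are the ones that get serialized.
    """
    return all(state is not State.RUNNING for state in states.values())


# Hypothetical study: "setup" feeds two simulations that feed "post";
# "other" is an unrelated step that is still running.
children = {
    "setup": ["sim_a", "sim_b"],
    "sim_a": ["post"],
    "sim_b": ["post"],
    "other": [],
}
states = {name: State.PENDING for name in ("setup", "sim_a", "sim_b", "post", "other")}
states["other"] = State.RUNNING

cancelled = cancel_downstream(children, states, "setup")
settled_before = no_steps_in_progress(states)  # "other" is still running

states["other"] = State.FINISHED  # the scheduler reports a final state
settled_after = no_steps_in_progress(states)
```

In this sketch the unrelated `other` step keeps the cancellation from being declared complete until it reaches a final state, which is the delay the first bullet describes.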

@jwhite242 (Collaborator, Author) commented:

@FrankD412 You may want to revoke/revisit that review, as testing led me to chase down some other oddities and make more significant behavioral changes to step cancellation in general.

@jwhite242 merged commit 8487230 into LLNL:develop on Oct 30, 2023
jwhite242 added a commit that referenced this pull request Dec 12, 2023
1.1.10 Release (#432)

* Sync up read the docs config with dev environments using poetry (#399)
* Print usage on command line when no args are provided (#404)
* Add sacct fallback to slurm adapter to improve robustness of job tracking (#405)
* Update Flurm Job State mappings for flux versions >= 0.26 (#407)
* Bump certifi from 2021.10.8 to 2022.12.7 to address security issue (#409)
* Bump cryptography from 37.0.1 to 38.0.3 to address security issue (#410)
* Add missing shebang in unscheduled scripts from lsf adapter (#411)
* Update poetry lockfile to address dependabot flagged security issues (#412)
* Fix for Dockerfile smell DL3006 (#418)
* Port Maestro documentation to mkdocs and expand coverage of features and tutorials (#403)
* Update version info to be driven from pyproject.toml exclusively, and hook up to command line (#419)
* Pin mermaid to < 10.x due to api change (#422)
* Bump lock file certifi from 2022.12.7 to 2023.7.22 to address security issue (#426)
* Refactor flux adapter to avoid using pickle to talk to flux brokers installed in external environments (#415)
   Also adds flux integration tests to exercise against real flux brokers
* Add pager functionality to status command (#420)
* Patch broken flux job cancellation (#428)
* Insulate slurm adapters from user customization of squeue and sacct output formats (#431)
   Also adds live unit and integration tests for slurm adapter

---------

Co-authored-by: Francesco Di Natale <frank.dinatale1988@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bruno P. Kinoshita <kinow@users.noreply.github.com>
Co-authored-by: Charles Doutriaux <doutriaux1@llnl.gov>
Co-authored-by: Giovanni Rosa <grosa23@yahoo.com>
Co-authored-by: Brian Gunnarson <49216024+bgunnar5@users.noreply.github.com>
jwhite242 added a commit that referenced this pull request Feb 6, 2024
1.1.10 Release (#432)
