Initial prototype for scheduler parsing #3647

Closed
wants to merge 2 commits into from

Contributor

@espenfl espenfl commented Dec 12, 2019

Here is an initial approach that enables basic scheduler parsing, using the Slurm scheduler as an example. For Slurm, the State entry in the sacct output contains, for instance, OUT_OF_MEMORY when the process breaches the set memory limit. It therefore makes sense to parse the sacct information and, where needed, map it to generic error codes.
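
As a rough illustration of the mapping described above (a sketch only, not the code in this PR): the snippet below assumes pipe-delimited `sacct` output with a header row containing a `State` column, such as `sacct --parsable2 --format=JobID,State,ExitCode` would give. The `ExitCode` tuple, the `parse_sacct_state` helper, and the exit status numbers are all hypothetical and chosen just for the example.

```python
from typing import NamedTuple, Optional


class ExitCode(NamedTuple):
    """Hypothetical stand-in for a generic exit code (status plus message)."""
    status: int
    message: str


# Hypothetical mapping from Slurm job states to generic exit codes.
# The state names are Slurm's; the numbers and messages are made up here.
STATE_TO_EXIT_CODE = {
    'OUT_OF_MEMORY': ExitCode(110, 'The job exceeded the memory limit.'),
    'TIMEOUT': ExitCode(120, 'The job exceeded the walltime limit.'),
}


def parse_sacct_state(sacct_output: str) -> Optional[ExitCode]:
    """Return the first generic exit code matching a State entry, if any."""
    lines = [line for line in sacct_output.splitlines() if line.strip()]
    if len(lines) < 2:
        return None  # no header or no job rows

    header = lines[0].split('|')
    if 'State' not in header:
        return None
    state_index = header.index('State')

    for line in lines[1:]:
        fields = line.split('|')
        if len(fields) <= state_index:
            continue
        # A state can carry a suffix (e.g. "CANCELLED by 1000"); keep the first word.
        words = fields[state_index].split()
        if words and words[0] in STATE_TO_EXIT_CODE:
            return STATE_TO_EXIT_CODE[words[0]]

    return None
```

Returning `None` for states that are not in the mapping keeps the scheduler parsing from overriding whatever the calculation's own parser decides.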

There is also a method to parse the stdout and stderr of the scheduler, which in this commit does nothing. Currently, if the sacct and file approaches yield different error codes, the one with the highest exit_status number is set.
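
The priority rule could be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: the `ExitCode` tuple and the `select_exit_code` helper are hypothetical names, and each parsing approach is assumed to return either an exit code or `None`.

```python
from typing import NamedTuple, Optional


class ExitCode(NamedTuple):
    """Hypothetical stand-in for a generic exit code (status plus message)."""
    status: int
    message: str


def select_exit_code(
    from_sacct: Optional[ExitCode],
    from_files: Optional[ExitCode],
) -> Optional[ExitCode]:
    """Pick the candidate with the highest status, ignoring missing ones."""
    candidates = [code for code in (from_sacct, from_files) if code is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda code: code.status)
```

With this rule the result depends only on how the exit statuses are numbered, so the numbering effectively encodes the priority between the two sources.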

Currently, exit codes are set on JobCalc. In the future, we might consider moving the same framework onto the scheduler framework, to allow specific handling of scheduler error codes without relying on specific definitions in a given JobCalc.

A test is also included, which sets up a node and tests the parsing of the sacct output. Once that functionality is added, we should also add tests for the file parsing.

Contributor

@sphuber sphuber left a comment


Thanks a lot @espenfl, great start. I have a few comments. Also, the logic for which exit code gets priority should probably be discussed; in person would be most efficient, I think.

aiida/engine/processes/calcjobs/calcjob.py (outdated)
aiida/engine/processes/calcjobs/calcjob.py (outdated)
aiida/orm/nodes/process/process.py (outdated)
aiida/orm/nodes/process/process.py (outdated)
aiida/orm/nodes/process/process.py (outdated)
aiida/schedulers/plugins/slurm.py
aiida/schedulers/plugins/slurm.py (outdated)
aiida/schedulers/plugins/slurm.py (outdated)
aiida/engine/processes/calcjobs/calcjob.py (outdated)
@ramirezfranciscof
Member

ramirezfranciscof commented Feb 26, 2020

Hey @espenfl, could you briefly explain the relationship between this PR and #3261? They seem to be related (the other one seems broader, but it also seems to deal with parsing the scheduler). What is the current status of each?

@sphuber
Contributor

sphuber commented Apr 14, 2020

Superseded by PR #3906 and #3931

@sphuber sphuber closed this Apr 14, 2020
3 participants