Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse UNLIMITED and NOT_SET time in SLURM scheduler #3415

Closed
greschd opened this issue Oct 11, 2019 · 11 comments · Fixed by #3586
Closed

Parse UNLIMITED and NOT_SET time in SLURM scheduler #3415

greschd opened this issue Oct 11, 2019 · 11 comments · Fixed by #3586

Comments

@greschd
Copy link
Member

greschd commented Oct 11, 2019

When running things on a SLURM scheduler, my log is usually cluttered with

10/01/2019 03:58:01 PM <69426> aiida.scheduler.slurm: [WARNING] Unrecognized format for time string 'UNLIMITED'
10/01/2019 03:58:01 PM <69426> aiida.scheduler.slurm: [WARNING] Error parsing the time limit for job id 622565

This should be easy to fix: In the _convert_time method of the SLURM plugin, return float('inf') if the string matches 'UNLIMITED' (and None for 'NOT_SET', which is also possible according to SLURM docs).

Does anyone know if this could have consequences down the line because the places where this is used (this_job.requested_wallclock_time_seconds and this_job.wallclock_time_seconds) are assumed to be int? They both go into the JobInfo, and None and float('inf') are JSON-serializable.

@sphuber
Copy link
Contributor

sphuber commented Oct 11, 2019

pinging @giovannipizzi who might know

@giovannipizzi
Copy link
Member

float(inf) will not work currently because of our validation that we discuss in #2412 and is not yet decided.
Maybe, even if a bit ugly and as if we were in the '70s, we can set -1 in the UNLIMITED case (and document it)?

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

Hmm, what about setting it to 2147483647 (maximum for the "normal" 4-byte signed integer in postgres, and roughly 68 years)?

The only issue I see with -1 is what will happen if someone decides to do comparisons on that.

@giovannipizzi
Copy link
Member

I see - also that however it a bit strange and people might not understand what it means.
Is there a real difference between UNSET and UNLIMITED? What if we put None in both?

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

TBH I've never actually seen NOT_SET in practice, but the docs say it happens when "the value is not yet established". UNLIMITED means exactly that, these are jobs that won't time out.

@greschd
Copy link
Member Author

greschd commented Oct 11, 2019

I also don't mind just keeping this open until we've moved on #2412. It's really not a big nuisance, and that might be better than a kludgy implementation now.

@sphuber
Copy link
Contributor

sphuber commented Oct 17, 2019

In the light of the decision on #2412 that inf will not be supported in attributes, should we set UNSET and UNLIMITED as None?

@giovannipizzi
Copy link
Member

Maybe indeed the best thing is to set it to None for UNSET and to a large integer as @greschd suggested for UNLIMITED, that is closer to what actually means, distinguishes the two and allows for the correct comparisons? Of course we need to discuss why in the docs.

@greschd
Copy link
Member Author

greschd commented Oct 17, 2019

Suggestions welcome for which magic high number should get the honors. Some possibilities:

  • 2147483647 == 2**31 - 1: maximum 32-bit signed int, as suggested above
  • 9223372036854775807== 2**63 - 1: same, but for 64-bit signed int, as suggested above. Also corresponds to sys.maxsize on most systems
  • Something more recognizable to a human as a "very large number", e.g. 9999999999.
  • 3093292800 == -int(datetime.datetime(year=1871,month=12,day=24).timestamp())

@sphuber
Copy link
Contributor

sphuber commented Oct 17, 2019

* `3093292800 == -int(datetime.datetime(year=1871,month=12,day=24).timestamp())`

👍 I like it!

@greschd
Copy link
Member Author

greschd commented Oct 17, 2019

Well, somebody needs to investigate the exact time, and of course I've ignored the time zone as well. Can we get funding for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants