Should the Computer have a node_max_memory property #3405

Closed
sphuber opened this issue Oct 8, 2019 · 2 comments

Comments

sphuber (Contributor) commented Oct 8, 2019

Many schedulers have a way to specify a memory usage limit; when it is hit, the scheduler kills the job. This is often a nicer failure mode than letting the node crash. If the memory limit of a machine is known, it might be nice to always set the scheduler-specific flag to some fraction of the maximum memory. To facilitate this we would need a new property on the Computer.
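
For context, a minimal sketch of how a memory limit can currently be set per calculation through the max_memory_kb metadata option of a CalcJob; the plugin entry point and code label are placeholders, and the proposal here is to derive a default for this option from a new Computer property:

```python
from aiida import load_profile, orm
from aiida.plugins import CalculationFactory

load_profile()

# 'core.arithmetic.add' is just an example plugin; any CalcJob works the same way.
ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')

builder = ArithmeticAddCalculation.get_builder()
builder.code = orm.load_code('add@localhost')  # placeholder code label
builder.x = orm.Int(1)
builder.y = orm.Int(2)

# Per-job memory limit passed to the scheduler, in kilobytes. When the limit
# is hit, the scheduler kills the job instead of letting the node crash.
builder.metadata.options.max_memory_kb = 4_000_000  # ~4 GB
```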

yakutovicha (Contributor) commented Jul 3, 2023

@sphuber, I assume this can be marked as completed, since the feature has been introduced in #5260.

[update]
Ok, this issue refers to the maximum available memory, while #5260 introduces a default memory, which is not the same. At the same time, the existence of #3503 makes me think that people are leaning towards having such a feature.

sphuber (Contributor, Author) commented Jul 4, 2023

The naming chosen in #5260 might indeed be suboptimal in hindsight. In essence, though, it did implement what this issue was asking for, so I will still close it and we can continue the discussion in #3503.

Looking back at the two issues (this one and #3503), there seem to be two use cases for being able to define the amount of memory a computer has available:

  1. Let the scheduler plugin automatically define the maximum memory that may be allocated for a CalcJob, such that the scheduler kills the job with an OOM error instead of letting the code crash hard.
  2. Let utilities know how much memory a machine (node) has available, so they can compute how many nodes are required for a particular task and how to parallelize it.

PR #5260 essentially satisfied use case 1, as the default_memory_per_machine attribute of a Computer now serves as the default value for the max_memory_kb metadata option of a CalcJob. So the name is a bit misleading; it might have been better to simply call it max_memory.
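
For reference, a minimal sketch of what use case 1 looks like after #5260, assuming the getter/setter pair that the PR introduced for the attribute; the computer label is a placeholder:

```python
from aiida import load_profile, orm

load_profile()

computer = orm.load_computer('my-cluster')  # placeholder label

# Attribute introduced in #5260, in kilobytes. Despite the 'default' in the
# name, it effectively acts as a maximum: it is used as the default value of
# the max_memory_kb option for every CalcJob submitted to this computer.
computer.set_default_memory_per_machine(128_000_000)  # ~128 GB

print(computer.get_default_memory_per_machine())
```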

So that leaves use case 2. It could be argued that both use cases could simply reuse the same Computer attribute, i.e., we could define memory and cores attributes that specify the amount of memory and the number of cores available on a computer, respectively. This would directly satisfy use case 2, and it could also be used to automatically satisfy use case 1. In practice, I think this is what people would usually do anyway: if you have a machine with 64 cores and 128 GB of RAM, people would typically set default_mpiprocs_per_machine to 64, and telling the scheduler to allocate at most 128 GB of RAM would also make sense as a default. But I am not sure whether there might be use cases where this hijacking of the defaults would be incorrect, in which case there would be no way to change them.
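
To illustrate use case 2, a hypothetical sketch of what a utility could do if the Computer exposed such attributes; neither memory nor cores exists today, the names just mirror the proposal above:

```python
import math

def required_machines(total_memory_kb: int, total_procs: int,
                      memory_per_machine_kb: int, cores_per_machine: int) -> int:
    """Return the number of machines needed to fit a task.

    The per-machine values would come from the proposed memory and cores
    attributes of the Computer; both attributes are hypothetical here.
    """
    by_memory = math.ceil(total_memory_kb / memory_per_machine_kb)
    by_cores = math.ceil(total_procs / cores_per_machine)
    return max(by_memory, by_cores)

# A task needing ~300 GB and 100 MPI processes on 128 GB / 64-core nodes:
print(required_machines(300_000_000, 100, 128_000_000, 64))  # -> 3
```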

sphuber closed this as completed Jul 4, 2023