
🐛 FIX: max_memory_kb usage for direct scheduler #4825

Merged: 5 commits merged into aiidateam:develop on Mar 22, 2021

Conversation

@eimrek (Member) commented Mar 18, 2021

Hi,

I am proposing my solution to #4526, an issue I raised a while ago.

The changes are the following:

  • In all scheduler plugins I just renamed virtual_memory_kb to physical_memory_kb to avoid confusion.

  • In the direct scheduler, the max_memory_kb limit is now ignored and a message is printed to the logger instead (a sketch of the resulting behaviour follows this list). The justification is that the previously used ulimit -v imposes a virtual memory limit, as opposed to the physical memory limit that all the other schedulers impose, so the two are not equivalent (as a result, I always have to treat the direct scheduler differently in my workflows). There does not seem to be a straightforward, standard way to limit the physical memory usage of a process on Linux, so I decided to simply ignore this variable.
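As a minimal sketch of what the new behaviour amounts to (the helper name and surrounding structure below are hypothetical; only the log call is taken from the actual diff quoted further down):

```python
# Hypothetical sketch only: illustrates the behaviour described above for the
# direct scheduler plugin. The helper name is made up; only the logger call
# matches the real diff.
def _handle_max_memory_kb(self, job_tmpl, lines):
    """Handle job_tmpl.max_memory_kb when building the submit script (sketch)."""
    if job_tmpl.max_memory_kb:
        # Before this PR: lines.append(f'ulimit -v {virtual_memory_kb}'), which
        # caps *virtual* memory and is therefore not equivalent to the physical
        # memory limits enforced by SLURM, LSF, PBSPro and Torque.
        # After this PR: the value is ignored and the user is only informed.
        self.logger.info('Physical memory limiting is not supported by the direct scheduler.')
```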

@codecov bot commented Mar 18, 2021

Codecov Report

Merging #4825 (e2c046c) into develop (32a6c23) will increase coverage by 0.02%.
The diff coverage is 0.00%.


@@             Coverage Diff             @@
##           develop    #4825      +/-   ##
===========================================
+ Coverage    79.51%   79.52%   +0.02%     
===========================================
  Files          519      519              
  Lines        37089    37083       -6     
===========================================
- Hits         29487    29486       -1     
+ Misses        7602     7597       -5     
Flag Coverage Δ
django 74.23% <0.00%> (-0.01%) ⬇️
sqlalchemy 73.17% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
aiida/schedulers/plugins/direct.py 61.22% <0.00%> (+2.15%) ⬆️
aiida/schedulers/plugins/lsf.py 64.43% <0.00%> (ø)
aiida/schedulers/plugins/pbspro.py 70.59% <0.00%> (ø)
aiida/schedulers/plugins/slurm.py 69.80% <0.00%> (ø)
aiida/schedulers/plugins/torque.py 70.59% <0.00%> (ø)
aiida/engine/daemon/client.py 75.13% <0.00%> (-1.01%) ⬇️
aiida/transports/plugins/local.py 81.80% <0.00%> (+0.26%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 32a6c23...e2c046c.

@chrisjsewell (Member) commented:

thanks @eimrek 😄
I haven't had any experience of setting memory limits for the direct scheduler; @sphuber, @ltalirz, @giovannipizzi, can you think of any use cases why this should not be ignored?

Review thread on aiida/schedulers/plugins/direct.py (diff context):

- ''.format((job_tmpl.max_memory_kb))
- )
- lines.append(f'ulimit -v {virtual_memory_kb}')
+ self.logger.info('Physical memory limiting is not supported by the direct scheduler.')
A reviewer (Member) commented:

I think this should be a warning - the user is requesting something that is not supported.
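
A one-line sketch of the suggested change, assuming the message text stays the same:

```python
# Sketch of the reviewer's suggestion: emit at warning level, since the user
# requested something the direct scheduler cannot honour.
self.logger.warning('Physical memory limiting is not supported by the direct scheduler.')
```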

@ltalirz (Member) commented Mar 19, 2021

thanks @eimrek

I agree that imposing a stricter requirement on the direct scheduler (limiting virtual memory, since limiting physical memory seems difficult) than on all other schedulers (which limit only physical memory) is not ideal.

At the same time, it might be useful to be able to limit memory consumption also for the direct scheduler.

If I understand correctly, your PR is driven by the use case of transferring a job's memory settings between an HPC scheduler and the direct one unchanged.
Is it really a better solution if the direct scheduler just ignores the request, rather than trying to enforce it in an approximate way?

One could also argue for keeping the virtual-memory-based implementation for the direct scheduler, combined with an "info" log message informing the user that this limits virtual memory, not physical memory (see the sketch below).
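
A sketch of that alternative, reusing the attribute and list-of-lines structure from the quoted diff (the exact message wording is illustrative):

```python
# Alternative discussed here (not what the PR implements): keep the virtual
# memory limit but make the caveat explicit to the user.
if job_tmpl.max_memory_kb:
    virtual_memory_kb = int(job_tmpl.max_memory_kb)
    lines.append(f'ulimit -v {virtual_memory_kb}')
    self.logger.info(
        'max_memory_kb is enforced via "ulimit -v", i.e. as a virtual memory limit, '
        'not the physical memory limit applied by the other schedulers.'
    )
```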

However, I haven't been affected by this issue and I can see both sides, so I'll let @chrisjsewell and @giovannipizzi decide.

@eimrek (Member, Author) commented Mar 19, 2021

Thanks for the comments @ltalirz

True, it would be useful to be able to limit memory consumption for the direct scheduler as well. However, limiting virtual memory can be quite tricky and does not intuitively match the real memory usage (see the snippet below). To illustrate: when I run a tiny test Gaussian calculation, the physical memory usage can be practically negligible (on the order of 10 MB), but the virtual memory usage is at least 1.5 GB due to the large number of libraries available to Gaussian (most of which are not used). Additionally, if the code you're running uses a different set of libraries in another version (or is compiled differently), the virtual memory usage will vary as well. This makes it quite difficult to limit the real memory usage through the ulimit -v directive in a systematic manner.
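
For reference, a quick way to see this difference on Linux (an illustrative helper, not AiiDA code): VmSize is the virtual size that ulimit -v constrains, while VmRSS is the resident (physical) usage.

```python
def memory_usage_kb(pid='self'):
    """Return virtual (VmSize) and resident (VmRSS) memory of a process in kB."""
    usage = {}
    with open(f'/proc/{pid}/status') as handle:
        for line in handle:
            if line.startswith(('VmSize:', 'VmRSS:')):
                key, value, _unit = line.split()
                usage[key.rstrip(':')] = int(value)
    return usage

# For the Gaussian example above this might return something like
# {'VmSize': 1500000, 'VmRSS': 10000}: a large virtual footprint with negligible
# physical usage, so a 'ulimit -v' tuned to the real usage would kill the job.
```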

In principle, the current implementation is not too restrictive for my use: I could just clear the max_memory_kb variable in my workflows for the direct scheduler and the effect would be the same for me. So if you think it's useful to have it there, we can keep it.

Another option we could consider is to define an additional input, something like metadata.max_virtual_memory_kb, that invokes ulimit -v for the direct scheduler (a sketch follows below). It could also be implemented for the HPC schedulers, but limiting virtual memory is almost never done or needed in those cases afaik, so this is probably not necessary.
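
A purely hypothetical sketch of how such an option might look from the user side; neither the option name nor its wiring to ulimit -v exists in this PR:

```python
# Hypothetical: 'max_virtual_memory_kb' is only a proposed option, not an
# existing AiiDA input. Shown just to make the proposal concrete.
builder = code.get_builder()
builder.metadata.options.max_virtual_memory_kb = 2000000  # ~2 GB cap, to be translated into 'ulimit -v'
```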

@ltalirz self-requested a review March 20, 2021
@ltalirz previously approved these changes Mar 20, 2021

@ltalirz (Member) left a review comment:

I agree that it's difficult to think of a use case for wanting to limit virtual memory (and I've not heard from users who need this).

I support this change.

@eimrek (Member, Author) commented Mar 21, 2021

Fixed another bug that caught my eye near my proposed changes. It's very tiny, so I thought I could just include it in this PR. In case we don't merge this PR, I can open another one just for that.

@ltalirz self-requested a review March 22, 2021
@chrisjsewell linked an issue Mar 22, 2021 that may be closed by this pull request
@chrisjsewell changed the title from "Fix virtual and physical memory mix-up in scheduler plugins" to "🐛 FIX: max_memory_kb usage for direct scheduler" Mar 22, 2021
@chrisjsewell merged commit 75310e0 into aiidateam:develop Mar 22, 2021
@chrisjsewell (Member) commented:

cheers

Linked issue that may be closed by this pull request: Scheduler plugins: virtual and physical memory mix-up
3 participants