This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Alert: NodeMemoryUsage should be triggered when system's usage is high, not user's usage #2760

Closed
squirrelsc opened this issue May 15, 2019 · 1 comment

Comments

@squirrelsc
Member

Currently, the NodeMemoryUsage alert is triggered when a node's memory usage is higher than 95%. For a server with a large memory capacity such as 256G, that means it fires when 12.8G of memory is still free. On a computing node this is expected when a user requests and uses 240G of memory, as designed. But in that case, if OpenPAI itself uses 4G of memory, total usage passes 95% and the alert is triggered.
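To make the numbers above concrete, here is a minimal sketch of the current behaviour as described in this issue. The function names are hypothetical and are not taken from the OpenPAI code base; only the 95% threshold and the 256G / 240G / 4G figures come from the issue.

```python
# Hypothetical helpers illustrating the current 95% rule; not OpenPAI code.

def headroom_gib(total_gib, threshold=0.95):
    """Memory still free at the moment the usage alert starts firing."""
    return total_gib - total_gib * threshold


def usage_alert_fires(total_gib, used_gib, threshold=0.95):
    """Current rule: fire when overall usage exceeds the threshold."""
    return used_gib / total_gib > threshold


total = 256          # node capacity in G
user_used = 240      # memory the user requested and actually uses
system_used = 4      # memory used by OpenPAI itself

# About 12.8G is still free when the alert starts firing.
print(round(headroom_gib(total), 1))

# The user's 240G alone stays below the threshold...
print(usage_alert_fires(total, user_used))
# ...but adding 4G of system usage pushes the node over 95%.
print(usage_alert_fires(total, user_used + system_used))
```

So the alert fires on a node that is behaving exactly as designed.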

So the alert should be triggered when the system uses more memory than expected, but not when a user uses 100% of the requested memory.

If a user's job needs more memory than requested and hits OOM, the user should get this information from the job details page and fix it. The admin doesn't need to care about it.

If the system's processes use more memory than designed, there should be an alert, even if there is still enough free memory, because this may indicate a potential issue on the system.
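A rough sketch of the proposed rule, next to the current one, might look like the following. Everything here is an assumption for illustration: the `NodeMemory` type, the split into user and system usage, and the 8G system budget are hypothetical and do not describe how OpenPAI collects or evaluates metrics.

```python
from dataclasses import dataclass


@dataclass
class NodeMemory:
    total_gib: float
    user_used_gib: float    # memory used by user jobs (assumed to be measurable)
    system_used_gib: float  # memory used by OpenPAI services and the OS


def current_alert(node, threshold=0.95):
    """Current rule: overall usage above the threshold."""
    used = node.user_used_gib + node.system_used_gib
    return used / node.total_gib > threshold


def proposed_alert(node, system_budget_gib):
    """Proposed rule: only system usage above its designed budget matters."""
    return node.system_used_gib > system_budget_gib


SYSTEM_BUDGET_GIB = 8  # hypothetical design budget for system processes

# A user fully uses the requested 240G and the system stays within budget.
busy_node = NodeMemory(total_gib=256, user_used_gib=240, system_used_gib=4)
# The node is mostly idle, but system processes use far more than designed.
leaky_node = NodeMemory(total_gib=256, user_used_gib=10, system_used_gib=20)

for node in (busy_node, leaky_node):
    print(current_alert(node), proposed_alert(node, SYSTEM_BUDGET_GIB))
```

With these made-up numbers, the current rule alerts on the busy node and stays silent on the leaky one, while the proposed rule does the opposite, which is the behaviour asked for above.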
