[display event] add event watcher in database controller #4939
}
};

async function assertDiskUsageHealthy() {
Better to limit the quota both per job and globally, and to make sure it does not impact the critical path.
Recorded in #4953. We can solve this problem in the future.
# Max connection number to database in cluster event watcher.
cluster-event-max-db-connection: 40
# Max disk usage in internal storage for cluster event watcher
cluster-event-watcher-max-disk-usage-percent: 80
Should there also be a limit for history? Why not move non-critical things to another DB server?
Recorded in #4954. We can solve this problem in the future.
Event Watcher Test Cases:
1. Submit a job that will always stay in waiting status (e.g. one that requests a lot of resources). After a few minutes, check whether there is an event about "failed scheduling" on the job event page.
2. Submit a job with 2000+ tasks. After a few minutes, check that the event page works properly.
3. Go to internal storage to see the existing usage. Please notice the usage of the loop device. Create a big file under the internal storage path. After a few minutes, confirm that:
   1. there is a NodeFilesystemUsage alert shown on webportal
   2. the event watcher exits automatically.
   Remove the big file. After a few minutes, confirm that:
   1. there is no more NodeFilesystemUsage alert
   2. the event watcher works properly, and we can see events of new jobs on webportal.
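For reviewers who want to reason about the third test case, here is a minimal sketch of a threshold check along the lines of `assertDiskUsageHealthy`. It is not the code from this PR: the `getStats` parameter, the `diskUsagePercent` helper, and the choice to treat exactly 80% as still healthy are all assumptions made for illustration. Only the 80% default comes from `cluster-event-watcher-max-disk-usage-percent`.

```javascript
// Hypothetical sketch; names and behavior are illustrative assumptions,
// not the implementation in this PR.

// Mirrors cluster-event-watcher-max-disk-usage-percent from the config.
const MAX_DISK_USAGE_PERCENT = 80;

// Compute used space as a percentage of total space.
function diskUsagePercent(totalBytes, freeBytes) {
  if (totalBytes <= 0) {
    throw new Error('totalBytes must be positive');
  }
  return ((totalBytes - freeBytes) * 100) / totalBytes;
}

// Reject when usage is above the threshold; the caller decides how to react
// (for example, by exiting the watcher process).
// `getStats` is an injected async function returning { totalBytes, freeBytes },
// so the check can be exercised without touching a real filesystem.
async function assertDiskUsageHealthy(getStats, maxPercent = MAX_DISK_USAGE_PERCENT) {
  const { totalBytes, freeBytes } = await getStats();
  const percent = diskUsagePercent(totalBytes, freeBytes);
  if (percent > maxPercent) {
    throw new Error(`Disk usage ${percent.toFixed(1)}% exceeds limit ${maxPercent}%`);
  }
  return percent;
}
```

Injecting `getStats` keeps the threshold logic separate from how the numbers are obtained, so the "create a big file" scenario can be simulated by returning a small `freeBytes` value.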
Database size control strategy:
60s and 80% are configurable.
Problem found: