High disk load slows down the whole system #4865
Comments
Shall we kill such an offending pod?
After some investigation, this usually appears on nodes that use HDDs. On SSDs we have not noticed the issue. The elevator scheduler has some drawbacks; more details here: https://www.linuxjournal.com/article/6931. Here is the commit that changes the default I/O scheduler to noop: https://git.launchpad.net/~mhcerri/ubuntu/+source/linux/+git/azure/commit/?h=azure-4.15-fsgsbase&id=75bec4e4cd32accb64f574dac31bb1910a52c19e
If we are using HDDs, I suggest changing the I/O scheduler, and I highly recommend using a separate disk to store logs. If we are using SSDs, I think we will not suffer from this issue, since SSDs support multi-queue by default.
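To check which nodes are affected, it can help to look at each disk's current I/O scheduler and whether the kernel reports it as rotational (HDD). The following is a minimal sketch, not part of PAI: it assumes a Linux node that exposes the standard sysfs files `/sys/block/<dev>/queue/scheduler` and `/sys/block/<dev>/queue/rotational`, and the function names `parse_active_scheduler` and `describe_block_devices` are made up for illustration.

```python
from pathlib import Path


def parse_active_scheduler(text):
    """Parse the contents of a queue/scheduler file.

    The kernel lists the available schedulers on one line and wraps the
    active one in square brackets, e.g. "noop deadline [cfq]".
    Returns (active, available); active is None if nothing is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def describe_block_devices(sys_block=Path("/sys/block")):
    """Return {device: info} for every block device found under sys_block.

    Devices whose queue files cannot be read are skipped.
    """
    report = {}
    if not sys_block.is_dir():
        return report
    for dev in sorted(sys_block.iterdir()):
        try:
            sched_text = (dev / "queue" / "scheduler").read_text()
            rot_text = (dev / "queue" / "rotational").read_text()
        except OSError:
            continue
        active, available = parse_active_scheduler(sched_text)
        report[dev.name] = {
            "active": active,
            "available": available,
            # "1" means the kernel treats the device as a spinning disk.
            "rotational": rot_text.strip() == "1",
        }
    return report


if __name__ == "__main__":
    for name, info in describe_block_devices().items():
        kind = "HDD" if info["rotational"] else "non-rotational"
        print(f"{name}: {kind}, active={info['active']}, "
              f"available={info['available']}")
```

The script only reads state. Switching the scheduler is done by writing one of the listed names to the same `scheduler` file as root; that change does not persist across reboots unless it is also configured at boot time.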
Let's mark this as a known issue then.
@Binyang2014 - may we list this as a best practice for PAI cluster setup? cc @hzy46 @mydmdm
@scarlett2018, we will rearchitect the log collection subsystem in a future release. Until then, this will be a tentative recommendation to mitigate the issue.
Closing, as we don't use log-rotate anymore.
When an offending job writes logs too quickly, for example a job that simply runs
yes PAI
it causes high disk I/O and slows down the whole system.