SSD with node id later detected as HDD #1970
The disk type is stored on volatile storage (tmpfs) after the node boots, because the disk type should not change while the node is running. The reason is that after a runtime update the storage daemon restarts and needs to re-detect the disks, but the disks might be under heavy load at that point and detection can be off, so it uses the stored result from the detection that happened during boot. On a reboot the stored disk types are wiped and detection has to be redone, but this happens before any workload is restarted, so it should be safe to run. We also need to rescan the disks on boot anyway in case new disks were added or disks were swapped. The failure to detect the disk as SSD can also mean the disk is of low quality and is no longer performant enough to be considered an SSD. Note that we don't rely on the "announced" disk type; instead we run a speed test to make sure the disk's speed is actually SSD grade. Lastly, using an HDD as a cache disk would really hurt node performance, so zos ignores HDDs for cache (and hence for node ID storage).
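To make the behaviour described above concrete, here is a minimal sketch of "detect once per boot, then reuse the stored result". It is not the zos implementation: the names `disk_type` and `classify`, the one-marker-file-per-disk layout, and the seek-time threshold are all assumptions made for illustration. The cache directory stands in for the tmpfs location, so it is empty after every reboot.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical cutoff: the real threshold used by the speed test is not
# stated in this thread.
SSD_MAX_SEEK_MS = 1.0


def classify(seek_ms: float) -> str:
    """Classify a disk from a measured seek time, not its announced type."""
    return "ssd" if seek_ms <= SSD_MAX_SEEK_MS else "hdd"


def disk_type(device: str, cache_dir: Path, measure: Callable[[str], float]) -> str:
    """Return the disk type, running the speed test at most once per boot.

    cache_dir stands in for a tmpfs directory: it survives a daemon
    restart but not a reboot. measure is a hypothetical speed-test hook.
    """
    marker = cache_dir / device
    if marker.exists():
        # Daemon restarted while the node is up: trust the boot-time result,
        # because a test under heavy load could misclassify the disk.
        return marker.read_text().strip()
    kind = classify(measure(device))
    marker.write_text(kind)
    return kind


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        print(disk_type("sda", cache, lambda dev: 0.2))   # measured at boot
        print(disk_type("sda", cache, lambda dev: 12.0))  # reuses stored result
```

The second call passes a much slower measurement, as a loaded disk might produce after a runtime update, but the stored result is returned and the measurement hook is never called.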
Ah, I do recall now that the disk type is fixed for as long as the node is up. I wondered about something similar: a disk quality issue or performance degradation that dropped its seek time into the HDD category. The node should definitely not use an HDD or a badly performing SSD as its cache disk. But if the node ID or some workload data is already stored on a disk originally detected as SSD, and that disk is later detected as HDD, then that data is essentially lost. It seems the system would be a bit more resilient if something like the following could happen:
I know this is an old issue, but we might be able to address it now. I like your suggestion, but I suggest we do it as follows:
Related to #2020
The PR linked above will make sure the detected disk type is persisted across reboots. So even if a disk's performance degrades over time, it will never be re-detected as an HDD. The system will do ... This fix is now available on devnet and will be released to mainnet with version v3.9.x.
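As a rough illustration of what persisting detection across reboots could look like, the sketch below only runs detection for disks that have no stored type yet. It is a guess at the general idea, not the code from the PR: the JSON store, the disk identifiers, and the names `load_types` and `scan_on_boot` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_types(store: Path) -> Dict[str, str]:
    """Load previously detected disk types; empty on the very first boot."""
    if store.exists():
        return json.loads(store.read_text())
    return {}


def scan_on_boot(
    disks: Iterable[str], store: Path, detect: Callable[[str], str]
) -> Dict[str, str]:
    """Rescan disks on boot, running detection only for unknown disks.

    store is a hypothetical persistent file that survives reboots.
    detect is a hypothetical hook wrapping the speed test.
    """
    known = load_types(store)
    for disk in disks:
        if disk not in known:
            # New or swapped-in disk: classify it once and remember it.
            known[disk] = detect(disk)
    store.write_text(json.dumps(known, sort_keys=True))
    return known


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "disk-types.json"
        # First boot: the disk passes the speed test.
        print(scan_on_boot(["disk-a"], store, lambda d: "ssd"))
        # Later boot: the disk has degraded and would now fail the test,
        # but the stored type wins; only the new disk is tested.
        print(scan_on_boot(["disk-a", "disk-b"], store, lambda d: "hdd"))
```

Keying the store by a stable identifier rather than a device name such as `sda` matters here, since device names can change when disks are added or swapped.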
A farmer reported that their node 3791 detected the cache SSD as an HDD after the node rebooted. There are two concerns here: