Make virtio flush disabled by default #2223
Conversation
Are there any known issues with snapshots on AMD?
This shouldn't normally happen. I logged this issue and retriggered the CI run.
Thank you @alindima -- the test worked on the second attempt. If/when it happens again, how do I trigger a CI run?
The CI dashboard is not publicly accessible; we will take care of any retriggers that are blocking PR merges.
@pclesr thank you very much for all the hard work that went into the last couple of PRs. We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.
Can you help me understand where (or what) the objections are centered on? I had hoped that this backward-compatible fix would be more acceptable, and it would help me to understand where the "pain points" are. I work in the auto embedded space now, and there are some very good use cases for Firecracker.
It seems to me that the binary size increase on ARM is natural. I think you should increase the allowed target by a little, in
We are trying to first investigate the impact this issue has from a data integrity/corruption standpoint, before committing to a fix. We haven't yet identified a regression test for this issue where the outcome is data corruption, not just the lack of an actual flush.

We really appreciate and thank you for the effort you've put in. The way I see it, there is still another alternative to the approach outlined in this PR: not advertising the flush capability to the guest driver at all. Any work in identifying any visible impact of the lack of flush would be greatly appreciated.

It's great to hear that Firecracker can be put to good use in the embedded space as well 😄
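For illustration only, the "don't advertise flush" alternative mentioned above could look roughly like the sketch below. The struct and field names are assumptions, not Firecracker's actual code; VIRTIO_BLK_F_FLUSH is virtio-blk feature bit 9.

```rust
// Minimal sketch (hypothetical names): only offer the flush feature when the
// drive is configured to honour it. A guest driver never sends flush requests
// for a device that does not advertise VIRTIO_BLK_F_FLUSH.
const VIRTIO_BLK_F_FLUSH: u64 = 1 << 9;

struct BlockDevice {
    avail_features: u64,
}

impl BlockDevice {
    fn new(want_flush: bool) -> Self {
        let mut avail_features: u64 = 0;
        if want_flush {
            avail_features |= VIRTIO_BLK_F_FLUSH;
        }
        BlockDevice { avail_features }
    }
}
```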
I tried to develop a deterministic test case that did not involve power-cycling the host, but still have not found a solution. My approach was to try to get the dirty pages in the page cache dropped without flushing (think: echo 1 > /proc/sys/vm/drop_caches). Unfortunately, even with MADV_DONTNEED I still could not find a deterministic way to force dirty pages in the cache to simply be discarded without being flushed to backing store.

The goal would be to write a file from the VM without sync (so the pages are just sitting in the page cache on the host), then drop the pages (without writing to backing store, simulating a power failure), and then re-read the data from media. If one writes a pattern (as can be done with fio), it would hopefully then result in a mismatch between what was written in the VM and what is on physical media. So far, no luck with this approach, but I'm still looking at options.
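A rough host-side sketch of that reproduction idea is shown below (run as root; the path is illustrative). As the comment notes, dropping caches only reclaims clean pages, so this does not actually discard dirty data or simulate a power failure:

```rust
use std::fs;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let path = "/mnt/test-disk/pattern.bin"; // illustrative path, not from the PR
    let pattern = vec![0xA5u8; 1 << 20];

    // 1. Write a known pattern without fsync, so it only reaches the page cache.
    let mut f = fs::File::create(path)?;
    f.write_all(&pattern)?;
    drop(f); // intentionally no sync_all()

    // 2. Ask the kernel to drop reclaimable page-cache pages
    //    (equivalent to `echo 1 > /proc/sys/vm/drop_caches`).
    fs::write("/proc/sys/vm/drop_caches", b"1")?;

    // 3. Re-read and compare with what was written.
    let on_disk = fs::read(path)?;
    println!("pattern intact: {}", on_disk == pattern);
    Ok(())
}
```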
If a device advertises VIRTIO_BLK_F_FLUSH, then it must sync the data to physical media. The current flush implementation only flushes cached data to the host page cache, which can lead to incoherency between the VM and physical media. Unfortunately, during boot there can be up to six flush requests on a read-write root partition, which can slow down booting. This means that the current paradigm (of not flushing data to physical media) must remain the default.

This commit adds a drive option 'want_flush' that allows each drive to specify whether it requires consistency and is willing to take the performance hit of syncing data out to media. The default is the existing behaviour, namely no flush. For drives that require the flush, the virtio layer will first flush any cached data (the current behaviour) and then force the pages in the host page cache for the drive out to physical media via sync_all(). This has been verified by running a 'fio' test with sync_on_close in the VM and verifying via 'perf' that fsync() on the host is actually invoked. The bandwidth in the VM with flush enabled is also now in line with the underlying hardware.

The invocation of sync_all() has been refactored to be in accordance with other function invocations in the request handler. This change also requires that the sys_fsync system call is in the list of approved seccomp calls.

Signed-off-by: Peter Lawthers peter.lawthers@esrlabs.com
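As a rough sketch of the flush path described in this commit message (the function shape and parameter name are assumptions, not the actual Firecracker code):

```rust
use std::fs::File;
use std::io::Write;

// Handle a guest flush request for one drive's backing file.
fn handle_flush(disk: &mut File, want_flush: bool) -> std::io::Result<()> {
    // Current behaviour: flush buffered data into the host page cache.
    disk.flush()?;

    // Opt-in behaviour per drive: force the data out to physical media.
    // sync_all() issues fsync(2) on the host, which is why sys_fsync must be
    // allowed by the seccomp filter.
    if want_flush {
        disk.sync_all()?;
    }
    Ok(())
}
```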
This adds Python tests to verify the correct operation of flush. The features file in /sys/block/<drive>/device/features is checked to ensure that it is correct with respect to the optional 'want_flush' drive setting. Metrics also verify that, by default, no flushes are received. Only when 'want_flush' is true should the 'flush_count' block metric increase.

Signed-off-by: Peter Lawthers peter.lawthers@esrlabs.com
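The guest-visible check these tests rely on could be sketched as follows (in Rust here rather than the Python used by the test suite; the drive name is illustrative). The sysfs features file is a string of '0'/'1' characters, one per virtio feature bit starting at bit 0, and VIRTIO_BLK_F_FLUSH is bit 9:

```rust
use std::fs;

// Returns whether the virtio-blk device behind `drive` (e.g. "vdb")
// advertises the flush feature to the guest.
fn drive_advertises_flush(drive: &str) -> std::io::Result<bool> {
    let features = fs::read_to_string(format!("/sys/block/{}/device/features", drive))?;
    Ok(features.as_bytes().get(9) == Some(&b'1'))
}
```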
Increase the binary size target and limit to account for the added flush code.

Signed-off-by: Peter Lawthers peter.lawthers@esrlabs.com
(force-pushed from c9224b6 to 5a0d9f7)
@alindima: I have been out for a week -- what is the status of this? What have you all decided with respect to how to handle this issue?
@pclesr we're currently validating options (one of which is the one proposed by you in this PR) with
Any status update?
Hey @pclesr! We have made progress on the issue, but we have not yet reached a decision on the full set of use cases we want to support. We plan to restart work next week.
If Amazon is not going to provide any status, can they at least fix the API so that the device is correctly shown as not supporting flush?
Hi @pclesr, We understand the importance of this issue and we apologize for not being able to provide an accurate timeline for scoping and fixing it. This was a miss on our part, as we are trying to come up with a solution that supports your use case as well as our other customers'. We are actively working on this issue, and our next steps are to align with our other customers with respect to the fix and then implement it. For now, we will remove the flush virtio feature because, as you pointed out, it was not correct to offer it if we don't implement it.
Hi @pclesr, We reached a conclusion regarding this issue (see here). This PR is a very good starting point and we'd like to resume progress here. Thank you for your patience. Let us know if you are still willing to contribute to this PR. If so, we will start reviewing it and we can move forward from there.
I went ahead and implemented the required changes; see the PR here. I will close this PR in favor of the other one.
Reason for This PR
The existing implementation incorrectly advertises VIRTIO_BLK_F_FLUSH, when it
does not actually flush data to media from the host page cache. Unfortunately,
making all devices respect the flush feature bit slows down booting because
on a RW root partition, there can be a number of flush requests during the boot
process.
Description of Changes
This commit adds a drive option 'want_flush' that allows each
drive to specify whether it requires consistency and is willing
to take the performance hit of syncing data out to media. The
default is the existing behaviour, namely no flush.
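A minimal sketch of how the per-drive option might be carried in the drive configuration (assuming a serde-deserialized config; the 'want_flush' field is the option proposed here, and the overall shape is illustrative rather than the exact Firecracker API):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct BlockDeviceConfig {
    drive_id: String,
    path_on_host: String,
    is_root_device: bool,
    is_read_only: bool,
    // Opt-in flush-to-media; omitting the field keeps today's no-flush behaviour.
    #[serde(default)]
    want_flush: bool,
}
```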
- This functionality can be added in rust-vmm.

License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.
PR Checklist
[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]
- All commits in this PR are signed (git commit -s).
- Any newly added unsafe code is properly documented.
- Any API changes are reflected in firecracker/swagger.yaml.
- Any user-facing changes are mentioned in CHANGELOG.md.