
Make virtio flush disabled by default #2223

Closed
pclesr wants to merge 3 commits from the flush_merge branch

Conversation

@pclesr commented Oct 27, 2020

Reason for This PR

The existing implementation incorrectly advertises VIRTIO_BLK_F_FLUSH when it
does not actually flush data to media from the host page cache. Unfortunately,
making all devices respect the flush feature bit slows down boot, because a
read-write root partition can generate a number of flush requests during the
boot process.

Description of Changes

This commit adds a drive option 'want_flush' that allows each
drive to specify whether it requires consistency and is willing
to take the performance hit of syncing data out to media. The
default is the existing behaviour, namely no flush. A hedged
sketch of how a drive might be configured with this option
follows below.

  • This functionality can be added in rust-vmm.
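
For illustration, configuring a drive with the proposed option over the Firecracker API socket might look like the sketch below. This is a hypothetical example: the `want_flush` field is only the name proposed in this PR, and the socket path, drive ID and image path are placeholders.

```python
# Hypothetical sketch only: 'want_flush' is the field proposed in this PR,
# not part of the released Firecracker API. Socket and image paths are
# placeholders.
import json

import requests_unixsocket

session = requests_unixsocket.Session()
api = "http+unix://%2Ftmp%2Ffirecracker.socket"  # /tmp/firecracker.socket, URL-encoded

drive_config = {
    "drive_id": "rootfs",
    "path_on_host": "/path/to/rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
    # Proposed here: opt this drive in to real flushes (fsync to media).
    # Omitting the field keeps the existing no-flush behaviour.
    "want_flush": True,
}

response = session.put(f"{api}/drives/rootfs", data=json.dumps(drive_config))
response.raise_for_status()
```

Drives that omit the field keep today's behaviour, so existing configurations would be unaffected.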

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes are reflected in firecracker/swagger.yaml.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

@pclesr (Author) commented Oct 27, 2020

Are there any known issues with snapshots on AMD?
FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
This test times out on AMD but passes on x86.

@alindima (Contributor) commented Oct 28, 2020

> Are there any known issues with snapshots on AMD?
> FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> This test times out on AMD but passes on x86.

This shouldn't normally happen. I logged this issue and retriggered the CI run.

@pclesr (Author) commented Oct 28, 2020

> > Are there any known issues with snapshots on AMD?
> > FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> > This test times out on AMD but passes on x86.
>
> This shouldn't normally happen. I logged this issue and retriggered the CI run.

Thank you @alindima -- the test worked on the second attempt. If/when it happens again, how do I trigger a CI run?

@alindima (Contributor) commented Oct 28, 2020

> > > Are there any known issues with snapshots on AMD?
> > > FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> > > This test times out on AMD but passes on x86.
> >
> > This shouldn't normally happen. I logged this issue and retriggered the CI run.
>
> Thank you @alindima -- the test worked on the second attempt. If/when it happens again, how do I trigger a CI run?

The CI dashboard is not publicly accessible; we will take care of any retriggers that are blocking PR merges.
The CI is also retriggered on any PR updates.

@pclesr marked this pull request as draft October 28, 2020 08:32
@pclesr marked this pull request as ready for review October 28, 2020 08:34
@alindima (Contributor) commented

@pclesr thank you very much for all the hard work that went into the last couple of PRs.

We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.

@pclesr (Author) commented Oct 29, 2020

> We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.

Can you help me understand where (or what) the objections are centered on? I had hoped that this backward-compatible fix would be more acceptable, and it would help me to understand where the "pain points" are.

I work in the auto embedded space now, and there are some very good use cases for firecracker.

@alindima (Contributor) commented

It seems to me that the binary size increase on ARM is expected. I think you should increase the allowed target a little in test_binary_size.py.

@alindima (Contributor) commented

> > We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.
>
> Can you help me understand where (or what) the objections are centered on? I had hoped that this backward-compatible fix would be more acceptable, and it would help me to understand where the "pain points" are.
>
> I work in the auto embedded space now, and there are some very good use cases for firecracker.

We are trying to first investigate the impact this issue has from a data integrity/corruption standpoint, before committing to a fix. We haven't yet identified a regression test for this issue, where the outcome is data corruption, not just the lack of actual flushes.

We really appreciate and thank you for the effort you've put in. The way I see it, there is still another alternative to the approach outlined in this PR: not advertising the flush capability to the guest driver at all.

Any work on identifying a visible impact of the lack of flush would be greatly appreciated.

It's great to hear that Firecracker can be put to good use in the embedded space as well 😄

@pclesr (Author) commented Oct 29, 2020

> We are trying to first investigate the impact this issue has from a data integrity/corruption standpoint, before committing to a fix. We haven't yet identified a regression test for this issue, where the outcome is data corruption, not just the lack of actual flushes.

I have tried to develop a deterministic test case that does not involve power-cycling the host, but have not found a solution yet. My approach was to get the pages in the page cache to be dropped while dirty, without flushing (think: echo 1 > /proc/sys/vm/drop_caches). Unfortunately, even with MADV_DONTNEED I still could not find a deterministic way to force dirty pages in the cache to simply be discarded without being flushed to the backing store.

The goal would be to write a file from the VM without sync (so the pages are just sitting in the page cache on the host), then drop the pages (without writing to the backing store, simulating a power failure) and then re-read the data from media. If one writes a pattern (as can be done with fio), this would hopefully result in a mismatch between what was written in the VM and what is on physical media.

So far, no luck yet in this approach, but I'm still looking at options.
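
For concreteness, the shape of the test described above might look like the following sketch. It is not a working reproducer (as explained, step 2 has no deterministic implementation yet), and the guest address, paths and guest-access helper are assumptions.

```python
# Sketch of the reproduction idea above, NOT a working reproducer: step 2 has
# no deterministic implementation yet, since drop_caches and MADV_DONTNEED
# only evict clean pages. Guest address, paths and helper are hypothetical.
import hashlib
import subprocess

BACKING_FILE = "/path/to/guest_rootfs.ext4"  # drive backing file on the host
MOUNT_POINT = "/mnt/inspect"


def run_in_guest(cmd):
    # Hypothetical: assumes the guest is reachable over SSH at this address.
    out = subprocess.run(["ssh", "root@172.16.0.2", cmd],
                         check=True, capture_output=True, text=True)
    return out.stdout


# 1. Write a known pattern in the guest WITHOUT syncing, so the data may still
#    be sitting dirty in the host page cache rather than on media.
run_in_guest("dd if=/dev/urandom of=/root/testfile bs=4k count=1024")
guest_digest = run_in_guest("sha256sum /root/testfile").split()[0]

# 2. Simulate a power failure by discarding the host's cached pages without
#    write-back. This is the unsolved step: drop_caches only evicts clean
#    pages, so dirty data still reaches media eventually.
subprocess.run("echo 1 > /proc/sys/vm/drop_caches", shell=True, check=True)

# 3. Read the file back from the backing image on the host (with the guest
#    stopped) and compare against the guest's view.
subprocess.run(["mount", "-o", "ro,loop", BACKING_FILE, MOUNT_POINT], check=True)
with open(f"{MOUNT_POINT}/root/testfile", "rb") as f:
    host_digest = hashlib.sha256(f.read()).hexdigest()
subprocess.run(["umount", MOUNT_POINT], check=True)

if guest_digest != host_digest:
    print("Mismatch: the guest's writes never reached physical media")
```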

If a device advertises VIRTIO_BLK_F_FLUSH, then it must sync
the data to physical media. The current flush implementation
only flushes cached data to the host page cache, which
can lead to incoherency between the VM and physical media.
Unfortunately, during boot there can be up to six flush
requests on a read-write root partition, which can slow
down booting. This means that the current paradigm (of not
flushing data to physical media) must remain the default.

This commit adds a drive option 'want_flush' that allows each
drive to specify whether it requires consistency and is willing
to take the performance hit of syncing data out to media. The
default is the existing behaviour, namely no flush.

For drives that require the flush, the virtio layer will first
flush any cached data (the current behaviour) and then force
the pages in the host page cache for the drive out to physical
media via sync_all().

This has been verified by running a 'fio' test with
sync_on_close in the VM and verifying via 'perf' that fsync()
on the host is actually invoked. The bandwidth in the
VM with flush enabled is also now in line with the underlying
hardware.

The invocation of sync_all() has been refactored to be in accordance
with other function invocations in the request handler.

This change also requires that the sys_fsync system call is in the
list of approved seccomp calls.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
This adds python tests to verify the correct operation
of flush. The features file in /sys/block/<drive>/device/features
is checked to ensure that it is correct with respect to the
optional 'want_flush' drive setting. Metrics also verify that
by default, no flushes are received. Only when 'want_flush' is
true should the 'flush_count' block metric increase.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
Increase the binary size target and limit to account for
added flush code.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
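
A hedged sketch of the checks described in the second commit above, in the style of Firecracker's pytest integration tests, might look like the following. The fixture and helper names (test_microvm, spawn, basic_config, start, ssh, flush_metrics) are assumptions rather than the PR's actual test code; only the checked sysfs path and the 'flush_count' metric come from the commit message.

```python
# Hedged sketch, not the PR's actual test code. Fixture and helper names are
# assumptions; the sysfs features path and 'flush_count' metric are from the
# commit message above.
VIRTIO_BLK_F_FLUSH_BIT = 9  # position in /sys/block/<drive>/device/features


def test_flush_disabled_by_default(test_microvm):
    vm = test_microvm
    vm.spawn()
    vm.basic_config()  # root drive configured without 'want_flush'
    vm.start()

    # With flush disabled, the guest should see a '0' at the flush feature's
    # position in the virtio-blk feature bitmap.
    features = vm.ssh.run("cat /sys/block/vda/device/features").stdout.strip()
    assert features[VIRTIO_BLK_F_FLUSH_BIT] == "0"

    # Generate writes and a sync in the guest, then confirm the device never
    # received a flush request.
    vm.ssh.run("dd if=/dev/zero of=/root/file bs=4k count=256 && sync")
    assert vm.flush_metrics()["block"]["flush_count"] == 0
```

A mirror-image test with 'want_flush' set would expect the feature bit to read '1' and the flush counter to increase after the guest issues a sync.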
@pclesr force-pushed the flush_merge branch 2 times, most recently from c9224b6 to 5a0d9f7 on October 30, 2020 11:43
@pclesr (Author) commented Nov 9, 2020

@alindima : I have been out for a week -- what is the status of this? What have you all decided with respect to how to handle this issue?

@acatangiu (Contributor) commented

@pclesr we're currently validating options (one of which is the one you proposed in this PR) with Firecracker customers.
Unfortunately, now is a busy time for many people and this process takes some time. We have meetings on this topic scheduled up to mid-December, so that's the earliest decision ETA.

@acatangiu added the Status: Blocked label Nov 25, 2020
@pclesr (Author) commented Jan 12, 2021

Any status update?

@sandreim (Contributor) commented Jan 13, 2021

Hey @pclesr! We have made progress on the issue, but we have not yet reached a decision on which use cases we want to support. We plan to restart work next week.

@pclesr (Author) commented Jan 25, 2021

If Amazon is not going to provide any status update, can they at least fix the API so that the device is correctly shown as not supporting flush?

@georgepisaltu (Contributor) commented

Hi @pclesr,

We understand the importance of this issue and we apologize for not being able to provide an accurate timeline for scoping and fixing it. This was a miss on our part as we are trying to come up with a solution that supports your use case as well as our other customers’.

We are actively working on this issue and our next steps are to align with our other customers on the fix and then implement it. For now, we will remove the flush virtio feature because, as you pointed out, it was not correct to offer it if we don't implement it.

@georgepisaltu (Contributor) commented Jan 29, 2021

Hi @pclesr,

We reached a conclusion regarding this issue (see here). This PR is a very good starting point and we'd like to resume progress here.

Thank you for your patience. Let us know if you are still willing to contribute to this PR. If so, we will start reviewing it and we can move forward from there.

@georgepisaltu added the Status: Author label and removed the Status: Blocked label Feb 4, 2021
@georgepisaltu (Contributor) commented

I went ahead and implemented the required changes, see the PR here.

I will close this PR in favor of the other one.
