
Make virtio flush disabled by default #2223

Closed
pclesr wants to merge 3 commits from the flush_merge branch

Conversation

@pclesr commented Oct 27, 2020

Reason for This PR

The existing implementation incorrectly advertises VIRTIO_BLK_F_FLUSH when it
does not actually flush data to media from the host page cache. Unfortunately,
making all devices respect the flush feature bit slows down boot, because a
read-write root partition can generate a number of flush requests during the
boot process.

Description of Changes

This commit adds a drive option 'want_flush' that allows each
drive to specify whether it requires consistency and is willing
to take the performance hit of syncing data out to media. The
default is the existing behaviour, namely no flush. A hedged
sketch of how a drive might be configured with this option
follows below.

  • This functionality can be added in rust-vmm.
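
For illustration, configuring a drive with the proposed option over the Firecracker API socket might look like the sketch below. This is a hypothetical example: the `want_flush` field is only the name proposed in this PR, and the socket path, drive ID and image path are placeholders.

```python
# Hypothetical sketch only: 'want_flush' is the field proposed in this PR,
# not part of the released Firecracker API. Socket and image paths are
# placeholders.
import json

import requests_unixsocket

session = requests_unixsocket.Session()
api = "http+unix://%2Ftmp%2Ffirecracker.socket"  # /tmp/firecracker.socket, URL-encoded

drive_config = {
    "drive_id": "rootfs",
    "path_on_host": "/path/to/rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
    # Proposed here: opt this drive in to real flushes (fsync to media).
    # Omitting the field keeps the existing no-flush behaviour.
    "want_flush": True,
}

response = session.put(f"{api}/drives/rootfs", data=json.dumps(drive_config))
response.raise_for_status()
```

Drives that omit the field keep today's behaviour, so existing configurations would be unaffected.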

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes are reflected in firecracker/swagger.yaml.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

@pclesr (Author) commented Oct 27, 2020

Are there any known issues with snapshots on AMD?
FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
This test times out on AMD but passes on x86.

@alindima (Contributor) commented Oct 28, 2020

> Are there any known issues with snapshots on AMD?
> FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> This test times out on AMD but passes on x86.

This shouldn't normally happen. I logged this issue and retriggered the CI run.

@pclesr (Author) commented Oct 28, 2020

> > Are there any known issues with snapshots on AMD?
> > FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> > This test times out on AMD but passes on x86.
>
> This shouldn't normally happen. I logged this issue and retriggered the CI run.

Thank you @alindima -- the test worked on the second attempt. If/when it happens again, how do I trigger a CI run?

@alindima (Contributor) commented Oct 28, 2020

> > > Are there any known issues with snapshots on AMD?
> > > FAILED integration_tests/functional/test_snapshot_basic.py::test_5_full_snapshots
> > > This test times out on AMD but passes on x86.
> >
> > This shouldn't normally happen. I logged this issue and retriggered the CI run.
>
> Thank you @alindima -- the test worked on the second attempt. If/when it happens again, how do I trigger a CI run?

The CI dashboard is not publicly accessible; we will take care of any retriggers that are blocking PR merges.
The CI is also retriggered on any PR updates.

@pclesr marked this pull request as draft October 28, 2020 08:32
@pclesr marked this pull request as ready for review October 28, 2020 08:34
@alindima (Contributor) commented

@pclesr thank you very much for all the hard work that went into the last couple of PRs.

We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.

@pclesr (Author) commented Oct 29, 2020

> We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.

Can you help me understand where (or what) the objections are centered on? I had hoped that this backward-compatible fix would be more acceptable, and it would help me to understand where the "pain points" are.

I work in the auto embedded space now, and there are some very good use cases for firecracker.

@alindima (Contributor) commented

It seems to me that the binary size increase on ARM is expected. I think you should increase the allowed target a little in test_binary_size.py.

@alindima (Contributor) commented

> > We are still working on deciding what the right fix for this flush issue is. We will let you know as soon as we reach a conclusion.
>
> Can you help me understand where (or what) the objections are centered on? I had hoped that this backward-compatible fix would be more acceptable, and it would help me to understand where the "pain points" are.
>
> I work in the auto embedded space now, and there are some very good use cases for firecracker.

We are trying to first investigate the impact this issue has from a data integrity/corruption standpoint, before committing to a fix. We haven't yet identified a regression test for this issue, where the outcome is data corruption, not just the lack of actual flushes.

We really appreciate and thank you for the effort you've put in. The way I see it, there is still another alternative to the approach outlined in this PR: not advertising the flush capability to the guest driver at all.

Any work on identifying a visible impact of the lack of flush would be greatly appreciated.

It's great to hear that Firecracker can be put to good use in the embedded space as well 😄

@pclesr (Author) commented Oct 29, 2020

> We are trying to first investigate the impact this issue has from a data integrity/corruption standpoint, before committing to a fix. We haven't yet identified a regression test for this issue, where the outcome is data corruption, not just the lack of actual flushes.

I have tried to develop a deterministic test case that does not involve power-cycling the host, but have not found a solution yet. My approach was to get the pages in the page cache to be dropped while dirty, without flushing (think: echo 1 > /proc/sys/vm/drop_caches). Unfortunately, even with MADV_DONTNEED I still could not find a deterministic way to force dirty pages in the cache to simply be discarded without being flushed to the backing store.

The goal would be to write a file from the VM without sync (so the pages are just sitting in the page cache on the host), then drop the pages (without writing to the backing store, simulating a power failure) and then re-read the data from media. If one writes a pattern (as can be done with fio), this would hopefully result in a mismatch between what was written in the VM and what is on physical media.

So far, no luck yet in this approach, but I'm still looking at options.
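
For concreteness, the shape of the test described above might look like the following sketch. It is not a working reproducer (as explained, step 2 has no deterministic implementation yet), and the guest address, paths and guest-access helper are assumptions.

```python
# Sketch of the reproduction idea above, NOT a working reproducer: step 2 has
# no deterministic implementation yet, since drop_caches and MADV_DONTNEED
# only evict clean pages. Guest address, paths and helper are hypothetical.
import hashlib
import subprocess

BACKING_FILE = "/path/to/guest_rootfs.ext4"  # drive backing file on the host
MOUNT_POINT = "/mnt/inspect"


def run_in_guest(cmd):
    # Hypothetical: assumes the guest is reachable over SSH at this address.
    out = subprocess.run(["ssh", "root@172.16.0.2", cmd],
                         check=True, capture_output=True, text=True)
    return out.stdout


# 1. Write a known pattern in the guest WITHOUT syncing, so the data may still
#    be sitting dirty in the host page cache rather than on media.
run_in_guest("dd if=/dev/urandom of=/root/testfile bs=4k count=1024")
guest_digest = run_in_guest("sha256sum /root/testfile").split()[0]

# 2. Simulate a power failure by discarding the host's cached pages without
#    write-back. This is the unsolved step: drop_caches only evicts clean
#    pages, so dirty data still reaches media eventually.
subprocess.run("echo 1 > /proc/sys/vm/drop_caches", shell=True, check=True)

# 3. Read the file back from the backing image on the host (with the guest
#    stopped) and compare against the guest's view.
subprocess.run(["mount", "-o", "ro,loop", BACKING_FILE, MOUNT_POINT], check=True)
with open(f"{MOUNT_POINT}/root/testfile", "rb") as f:
    host_digest = hashlib.sha256(f.read()).hexdigest()
subprocess.run(["umount", MOUNT_POINT], check=True)

if guest_digest != host_digest:
    print("Mismatch: the guest's writes never reached physical media")
```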

If a device advertises VIRTIO_BLK_F_FLUSH, then it must sync
the data to physical media. The current flush implementation
only flushes cached data to the host page cache, which
can lead to incoherency between the VM and physical media.
Unfortunately, during boot there can be up to six flush
requests on a read-write root partition, which can slow
down booting. This means that the current paradigm (of not
flushing data to physical media) must remain the default.

This commit adds a drive option 'want_flush' that allows each
drive to specify whether it requires consistency and is willing
to take the performance hit of syncing data out to media. The
default is the existing behaviour, namely no flush.

For drives that require the flush, the virtio layer will first
flush any cached data (the current behaviour) and then force
the pages in the host page cache for the drive out to physical
media via sync_all().

This has been verified by running a 'fio' test with
sync_on_close in the VM and verifying via 'perf' that fsync()
on the host is actually invoked. The bandwidth in the
VM with flush enabled is also now in line with the underlying
hardware.

The invocation of sync_all() has been refactored to be in accordance
with other function invocations in the request handler.

This change also requires that the sys_fsync system call is in the
list of approved seccomp calls.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
This adds python tests to verify the correct operation
of flush. The features file in /sys/block/<drive>/device/features
is checked to ensure that it is correct with respect to the
optional 'want_flush' drive setting. Metrics also verify that
by default, no flushes are received. Only when 'want_flush' is
true should the 'flush_count' block metric increase.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
Increase the binary size target and limit to account for
added flush code.

Signed-off-by: Peter Lawthers <peter.lawthers@esrlabs.com>
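
A hedged sketch of the checks described in the second commit above, in the style of Firecracker's pytest integration tests, might look like the following. The fixture and helper names (test_microvm, spawn, basic_config, start, ssh, flush_metrics) are assumptions rather than the PR's actual test code; only the checked sysfs path and the 'flush_count' metric come from the commit message.

```python
# Hedged sketch, not the PR's actual test code. Fixture and helper names are
# assumptions; the sysfs features path and 'flush_count' metric are from the
# commit message above.
VIRTIO_BLK_F_FLUSH_BIT = 9  # position in /sys/block/<drive>/device/features


def test_flush_disabled_by_default(test_microvm):
    vm = test_microvm
    vm.spawn()
    vm.basic_config()  # root drive configured without 'want_flush'
    vm.start()

    # With flush disabled, the guest should see a '0' at the flush feature's
    # position in the virtio-blk feature bitmap.
    features = vm.ssh.run("cat /sys/block/vda/device/features").stdout.strip()
    assert features[VIRTIO_BLK_F_FLUSH_BIT] == "0"

    # Generate writes and a sync in the guest, then confirm the device never
    # received a flush request.
    vm.ssh.run("dd if=/dev/zero of=/root/file bs=4k count=256 && sync")
    assert vm.flush_metrics()["block"]["flush_count"] == 0
```

A mirror-image test with 'want_flush' set would expect the feature bit to read '1' and the flush counter to increase after the guest issues a sync.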
@pclesr force-pushed the flush_merge branch 2 times, most recently from c9224b6 to 5a0d9f7 on October 30, 2020 11:43
@pclesr (Author) commented Nov 9, 2020

@alindima : I have been out for a week -- what is the status of this? What have you all decided with respect to how to handle this issue?

@acatangiu (Contributor) commented

@pclesr we're currently validating options (one of which is the one you proposed in this PR) with Firecracker customers.
Unfortunately, now is a busy time for many people and this process takes some time. We have meetings on this topic scheduled up to mid-December, so that's the earliest decision ETA.

@acatangiu added the Status: Blocked label Nov 25, 2020
@pclesr (Author) commented Jan 12, 2021

Any status update?

@sandreim (Contributor) commented Jan 13, 2021

Hey @pclesr! We have made progress on the issue, but we have not yet reached a decision on which use cases we want to support. We plan to restart work next week.

@pclesr (Author) commented Jan 25, 2021

If Amazon is not going to provide any status update, can they at least fix the API so that the device is correctly shown as not supporting flush?

@georgepisaltu (Contributor) commented

Hi @pclesr,

We understand the importance of this issue and we apologize for not being able to provide an accurate timeline for scoping and fixing it. This was a miss on our part as we are trying to come up with a solution that supports your use case as well as our other customers’.

We are actively working on this issue and our next steps are to align with our other customers on the fix and then implement it. For now, we will remove the flush virtio feature because, as you pointed out, it was not correct to offer it if we don't implement it.

@georgepisaltu (Contributor) commented Jan 29, 2021

Hi @pclesr,

We reached a conclusion regarding this issue (see here). This PR is a very good starting point and we'd like to resume progress here.

Thank you for your patience. Let us know if you are still willing to contribute to this PR. If so, we will start reviewing it and we can move forward from there.

@georgepisaltu added the Status: Author label and removed the Status: Blocked label Feb 4, 2021
@georgepisaltu (Contributor) commented

I went ahead and implemented the required changes, see the PR here.

I will close this PR in favor of the other one.
