Avoid BUG in migrate_folio_extra #16568

tstabrawa · 2024-09-25T14:43:51Z

Linux page migration code won't wait for writeback to complete unless it needs to call release_folio. Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for us.

Motivation and Context

Thanks for considering this PR.

I came across issue #15140 from the Proxmox VE 8.1 release notes, and gave it a good long look over. As far as I can tell, what's happening is that the Linux kernel page migration code is starting writeback on some pages, not waiting for writeback to complete, and then throwing a BUG when it finds that pages are still under writeback.

Pretty much all of the interesting action happens in fallback_migrate_folio(), which doesn't show up in the stack traces listed in #15140, but suffice it to say that it's called from move_to_new_folio(), which does appear in the stack traces. What appears to be happening in the case of the crashes described in #15140 is that fallback_migrate_folio() is being called upon dirty ZFS page-cache pages, so it's starting writeback by calling writeout(). Then, since ZFS doesn't store private data in any page cache pages, it skips the call to filemap_release_folio() (because folio_test_private() returns false), and immediately calls migrate_folio(), which in turn calls migrate_folio_extra(). Then, at the beginning of migrate_folio_extra(), it BUGs out because the page is still under writeback (folio_test_writeback() returns true).

Notably, if the page did have private data, then fallback_migrate_folio() would call into filemap_release_folio(), which would return false for pages under writeback, causing fallback_migrate_folio() to exit before calling migrate_folio().

So, in summary, in order for the BUG to happen a few things need to be true:

Dirty pages are being migrated (or in the case of Kernel bug when flushing memory caches for hugepages from Linux 6.3.1 to 6.10.14 #15140 OP, compacted)
The filesystem does asynchronous writeback (calls to its .writepage function return with the page unlocked and with PG_writeback set)
The filesystem does not store private data with page cache pages (e.g. buffers)

I went through the code for all of the filesystems in the Linux kernel and didn't see any that met all three conditions. Notably, pretty much all traditional filesystems store buffers in page private data. Those filesystems that don't store buffers either store something else in page_private (e.g. shmem/tmpfs, iomap), or don't do asynchronous writeback (e.g. ecryptfs, fuse, romfs, squashfs). So it would appear as if ZFS may be the only filesystem that experiences this particular behavior. As far as I can tell, the above-described behavior goes back all the way to when page migration was first implemented in kernel 2.6.16.

The way I see it, there are two ways to make the problem go away:

Change the Linux kernel so that fallback_migrate_folio() won't call migrate_folio() if the page is under writeback, even for pages without private data.
Change ZFS so that it stores some private data (or at least indicates as if it does)

I assume the latter may be preferable (even if only temporarily) so that ZFS can avoid this crash for any/all kernel versions, but I'm happy to defer to the ZFS devs on which option(s) you choose to pursue.

The latter is the approach I took in the patch proposed here.

Description

Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for writeback to complete.

How Has This Been Tested?

Tested by user @JKDingwall - results in the following comments on #15140:

Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-35-generic. No new test failures were observed. See attached files:

On master: zfs-tests - ccc420acd5452ca6fffe3797ea7da5a5f1a64d2b.txt
On fix_15140 branch: zfs-tests - 202e6a4b3b1add7c4535584adff862106c86caf4.txt

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

Linux page migration code won't wait for writeback to complete unless it needs to call release_folio. Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for us. Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>

behlendorf

@tstabrawa thanks for digging to to this and the clear analysis of the bug! I agree, setting PagePrivate() and registering the .releasepage and .invalidatepage sure does look like the best way to handle this. Plus it has the advantage of aligning the ZFS code a bit more closely with the other kernel filesystems which we can hope will help us avoid this code of bug in the future.

Interestingly, I see that private page check in fallback_migrate_folio() was already dropped by torvalds/linux@0201ebf for the 6.5 kernel so filemap_release_folio() is now called unconditionally.

Unrelated to the fix it looks like the CI had an infrastructure glitch cloning the repository on some of the builders. I'll go ahead and resubmit those builds once the running tests wrap up.

tstabrawa · 2024-09-25T19:42:04Z

Thanks for the quick PR approval!

Interestingly, I see that private page check in fallback_migrate_folio() was already dropped by torvalds/linux@0201ebf for the 6.5 kernel so filemap_release_folio() is now called unconditionally.

I don't think it matters RE: this pull request, but it may be worth being aware of anyway. The patch you reference didn't really remove the private-page check; it just moves it to folio_needs_release, which is now called by filemap_release_folio. So, the BUG should still be reproducible in 6.5 kernels as well (and was reproduced on 6.5.1 by @RodoMa92).

tstabrawa · 2024-09-26T16:18:21Z

Thanks again for the PR approval and merge! Let me know if you run into further trouble related to this problem and/or change.

RodoMa92 · 2024-10-30T14:48:33Z

Yeah, sadly this hasn't fixed anything on my end. Just switched back to ZFS yesterday. Same issue occurs. Stacktrace in the original issue that will need to be reopened I guess.

behlendorf · 2024-11-01T03:35:15Z

@RodoMa92 what version of ZFS and kernel did you manage to hit this with again?

tstabrawa · 2024-11-01T05:02:31Z

@behlendorf From this comment on #15140, it sounds like @RodoMa92 was using ~~ZFS 6.10.14 and~~ kernel 6.10.14-273-tkg-eevdf.

RodoMa92 · 2024-11-01T05:19:18Z

@behlendorf From this comment on #15140, it sounds like @RodoMa92 was using ZFS 6.10.14 and kernel 6.10.14-273-tkg-eevdf.

ZFS-dkms 2.2.6, and both latest LTS + latest supported kernel.

behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 25, 2024

behlendorf requested review from bwatkinson and behlendorf September 25, 2024 14:53

behlendorf approved these changes Sep 25, 2024

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 26, 2024

behlendorf merged commit b052035 into openzfs:master Sep 26, 2024
20 checks passed

tstabrawa deleted the fix_15140 branch September 26, 2024 16:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid BUG in migrate_folio_extra #16568

Avoid BUG in migrate_folio_extra #16568

tstabrawa commented Sep 25, 2024

behlendorf left a comment

tstabrawa commented Sep 25, 2024

tstabrawa commented Sep 26, 2024

RodoMa92 commented Oct 30, 2024 •

edited

Loading

behlendorf commented Nov 1, 2024

tstabrawa commented Nov 1, 2024 •

edited

Loading

RodoMa92 commented Nov 1, 2024

Avoid BUG in migrate_folio_extra #16568

Avoid BUG in migrate_folio_extra #16568

Conversation

tstabrawa commented Sep 25, 2024

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

behlendorf left a comment

Choose a reason for hiding this comment

tstabrawa commented Sep 25, 2024

tstabrawa commented Sep 26, 2024

RodoMa92 commented Oct 30, 2024 • edited Loading

behlendorf commented Nov 1, 2024

tstabrawa commented Nov 1, 2024 • edited Loading

RodoMa92 commented Nov 1, 2024

RodoMa92 commented Oct 30, 2024 •

edited

Loading

tstabrawa commented Nov 1, 2024 •

edited

Loading