RFC: improve initramfs image performance and efficiency via cpio reflinks #1148
You seem to be missing the shell completion and documentation (man page) updates from the PR.
So, the whole thing is most interesting for initramfs images stored on xfs/btrfs for qemu booting, as moving it to EFI vfat will remove the dedup feature. Boot times for bare metal would also increase due to non-compression. Did I understand this correctly? |
Yes, correct, it's most beneficial for environments where initramfs source files, Dracut staging area and destination image are all on the same XFS or Btrfs filesystem. If the Dracut destination is EFI vfat then duplication is expected, although it would still allow for |
I am sorry, you have to rebase due to upstream changes in |
No problem 😃 . I've made some extra changes to use statfs reported optimal alignment for padding, rather than the hardcoded |
@ddiss can you give us a link to your cpio --reflink upstream PR? Looking upstream, and based on the lack of activity there, it looks like it could literally take years before your PR gets merged and included in a new release.
They've now been submitted via https://savannah.gnu.org/patch/index.php?10044 and the upstream mailing list: |
Changes since previous version:
|
Signed-off-by: David Disseldorp <ddiss@suse.de>
Individual test scripts may change working directory, so relative paths should be avoided. Signed-off-by: David Disseldorp <ddiss@suse.de>
Preparation for reusing cpio archive definitions. skipcpio's PROGRAM_VERSION_STRING is currently unused so drop it. Signed-off-by: David Disseldorp <ddiss@suse.de>
Changes since previous version:
|
1. Purpose
Improve cpio archive creation performance and space efficiency by ensuring that file data is aligned within the archive to the filesystem block size.
For filesystems supporting reflinks (e.g. XFS and Btrfs) this ensures that cpio archive data can share the same copy-on-write extents as the archive source files.
A GNU cpio binary capable of copy-on-write cloning data via copy_file_range is required to make proper use of this.
Padding can't be added to cpio archives arbitrarily, so we need to inject extra files into the archive to provide filesystem alignment.
2. Behaviour
Read a zero-terminated file list from stdin. If an input file, when cpio serialized, is aligned to <padding alignment order> then print it to stdout.
If the cpio archived file data would be unaligned then create a padding file in <pad dir> which, when cpio archived before the input file, will provide cpio alignment, and print the pad file path before the input file path.
GNU cpio reorders hardlinks, so avoid extra complexity by deferring them (unpadded) to the end of the archive.
Signed-off-by: David Disseldorp <ddiss@suse.de>
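The pad-file arithmetic described in this commit message can be sketched roughly as follows. This is an illustration only, not the padcpio implementation: it assumes the "newc" cpio format (a 110-byte header, a NUL-terminated name, and 4-byte padding after both the name and the data), and the function names, the 4096-byte default alignment and the pad file name are hypothetical.

```python
CPIO_NEWC_HEADER_LEN = 110  # fixed size of a newc header, in bytes


def align_up(value, boundary):
    """Round value up to the next multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


def entry_header_len(name):
    """Bytes used by the header plus NUL-terminated name, padded to 4 bytes."""
    return align_up(CPIO_NEWC_HEADER_LEN + len(name.encode()) + 1, 4)


def pad_size_for(offset, name, pad_name, alignment=4096):
    """Return the pad file data size needed so that the data of `name` starts
    on an `alignment` boundary, or None if it is already aligned.

    `offset` is the current (4-byte aligned) archive offset and `pad_name` is
    the archive path of the hypothetical pad file queued before `name`.
    """
    if (offset + entry_header_len(name)) % alignment == 0:
        return None
    # Everything that sits in front of the file data once a pad entry exists:
    # the pad entry's header and name, then the real entry's header and name.
    before_data = offset + entry_header_len(pad_name) + entry_header_len(name)
    return -before_data % alignment


# Example: the first file in an archive needs 3860 bytes of pad data.
print(pad_size_for(0, "bin/sh", "pad0"))
```

Because every term is a multiple of 4, the computed pad size never needs additional 4-byte padding of its own. A real tool would also have to decide whether padding is worthwhile for files smaller than one block.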
Changes since previous version:
|
Provides some coverage for the skipcpio and padcpio binaries. Signed-off-by: David Disseldorp <ddiss@suse.de>
…reflinks
The new GNU cpio "--reflink" parameter sees it use copy_file_range() when copying between source and destination, allowing for copy-on-write optimization if supported by the underlying filesystem (e.g. Btrfs or XFS).
When calling cpio with --reflink the file list is piped through padcpio to ensure optimal alignment for extent sharing.
Microcode and initramfs proper archives are chained together using the new cpio "--chain" parameter, which ensures that subsequent archives are appended in a reflink-friendly fashion.
RFC: This shouldn't be merged until GNU cpio --reflink --chain patches have made it upstream.
Signed-off-by: David Disseldorp <ddiss@suse.de>
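As a rough illustration of the copy_file_range() approach with a read/write fallback, here is a minimal Python sketch. It is not the GNU cpio patch: clone_or_copy is a hypothetical helper, and whether the kernel actually shares extents depends on the filesystem and on the alignment of the source and destination offsets.

```python
import os


def clone_or_copy(src_fd, dst_fd, length):
    """Copy `length` bytes from src_fd to dst_fd at their current offsets.

    Tries os.copy_file_range() first (Linux, Python 3.8+), which lets the
    kernel use copy-on-write cloning where it can, and falls back to plain
    read/write if the call is unavailable or fails.
    Returns the number of bytes copied.
    """
    remaining = length
    while remaining > 0:
        try:
            copied = os.copy_file_range(src_fd, dst_fd, remaining)
        except (OSError, AttributeError):
            # Fallback path: ordinary read/write through userspace.
            chunk = os.read(src_fd, min(remaining, 1 << 20))
            view = memoryview(chunk)
            while view:
                written = os.write(dst_fd, view)
                view = view[written:]
            copied = len(chunk)
        if copied == 0:
            break  # source hit end-of-file early
        remaining -= copied
    return length - remaining
```

Both paths advance the two file offsets by the same amount, so a failure of copy_file_range() partway through can be continued by the fallback without losing position.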
Changes since previous version:
|
One thing I forgot to clear up here is that these changes don't just target qemu workloads - [open]SUSE distros commonly carry initramfs with the root filesystem, so that system rollback (from a Btrfs snapshot) is possible. |
I will merge this, if cpio upstream supports it. |
@ddiss given that it can take a while for upstream cpio to merge this, would you mind putting this into draft mode and then, when it finally gets merged, taking it out of draft mode?
Thanks Harald. I'll post an update to the cpio patchset this week following @luis-henrix 's review (https://lists.gnu.org/archive/html/bug-cpio/2021-03/msg00008.html).
Sure, that works for me. I'm probably missing something, but I don't see a draft-mode button in the Github UI. Are you able to do it from your side? |
@ddiss done. It's a link to click rather than a visible button (for whatever reason), and the link is in the bottom right corner, in the reviewers section.
Just a minor update: I reworked and posted a new version of the GNU cpio patchset to the list last week following Luis' review - https://lists.gnu.org/archive/html/bug-cpio/2021-04/threads.html . |
Looks like this won't be reviewed until sometime in May, based on Sergey's response, so this will probably end up in the 55 release (probably sometime in June), the 56 release (August/September), or the release after cpio gets a new release.
After seeing Rust's native support for |
Here is my (outdated) |
On a funny side note, part of Additional sparse file support was rejected. |
Nice! I wasn't aware of your work in this area. I'd also be happy to integrate the dracut-cpio functionality into dracut-install.rs directly if you plan on continuing in that direction. |
I've updated the dracut-cpio patch to the point where it can now be used for proper testing as a swap-in replacement for GNU cpio. Like my GNU cpio patchset, it supports cpio reflinks with data segment alignment. Data segment alignment is handled in a much cleaner fashion by injecting extra zeros into the cpio archive filename fields (rather than adding extra pad files to initramfs).
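The filename-field approach can be sketched along these lines. This is an illustration only, not the dracut-cpio code: it assumes the "newc" format (110-byte header, with the name size field counting the name and its trailing NUL bytes), and the function names and the max_namesize limit are hypothetical.

```python
CPIO_NEWC_HEADER_LEN = 110  # fixed size of a newc header, in bytes


def padded_namesize(offset, name, alignment=4096, max_namesize=4096):
    """Return a name size (name plus trailing NULs) that makes the file data
    start on an `alignment` boundary, or None if it would exceed the limit.

    `offset` is the archive offset at which this entry's header begins.
    `max_namesize` is an assumed cap on the name field length; readers may
    reject very long names, so a real tool has to handle this case.
    """
    min_namesize = len(name.encode()) + 1  # name plus one terminating NUL
    unaligned = offset + CPIO_NEWC_HEADER_LEN + min_namesize
    namesize = min_namesize + (-unaligned % alignment)
    if namesize > max_namesize:
        return None
    return namesize


def name_field(name, namesize):
    """Build the on-disk name field: the name followed by NUL padding."""
    raw = name.encode()
    return raw + b"\0" * (namesize - len(raw))


# Example: the first entry in an archive gets a 3986-byte name field,
# so that 110 + 3986 lands exactly on the 4096-byte boundary.
print(padded_namesize(0, "bin/sh"))
```

Since the data start is made a multiple of the alignment, the usual 4-byte padding after the name is satisfied automatically and no extra archive entries are needed.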
This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions. |
This patchset attempts to speed up initramfs generation for some common (Btrfs / XFS) setups by having Dracut make heavier use of reflinks (AKA copy-on-write clones) during initramfs generation. A good portion of an uncompressed+unstripped initramfs image is duplicate data, which really shouldn't need to be shuffled around when on the same COW clone capable FS.
Dracut already uses cp --reflink=auto when shuffling most things into the temporary staging area. This patchset adds a new --cpio-reflink flag which sees Dracut call GNU cpio using the new --reflink and --chain parameters. Support for these new parameters can be found in my cpio repository at https://github.com/ddiss/cpio/tree/copy_file_range_2_13 .
As XFS and Btrfs require filesystem alignment for extent sharing, a new padcpio utility is added to provide padding of file data to a block-size boundary.
This allows for:
The following caveats would be present for dracut to successfully use reflink (otherwise fallback to read/write):
Benchmarks are still ongoing, but my initial results show:
See #1141 (comment) for details.
RFC: The top commit of this patchset shouldn't be merged until GNU cpio --reflink support is upstream. I believe all other changes are ready.
Fixes: #1141