RFC: improve initramfs image performance and efficiency via cpio reflinks #1148

Closed
wants to merge 6 commits into from

Contributor

@ddiss ddiss commented Mar 10, 2021

This patchset attempts to speed up initramfs generation for some common (Btrfs / XFS) setups by having Dracut make heavier use of reflinks (AKA copy-on-write clones) during initramfs generation. A good portion of an uncompressed+unstripped initramfs image is duplicate data, which really shouldn't need to be shuffled around when on the same COW clone capable FS.

Dracut already uses cp --reflink=auto when shuffling most things into the temporary staging area. This patchset adds a new --cpio-reflink flag which sees Dracut call GNU cpio using the new --reflink and --chain parameters. Support for these new parameters can be found in my cpio repository at https://github.com/ddiss/cpio/tree/copy_file_range_2_13 .
As XFS and Btrfs require filesystem alignment for extent sharing, a new padcpio utility is added to provide padding of file data to a block-size boundary.

This allows for:

  • improved space efficiency
    • initramfs creation avoids data duplication
  • improved performance
    • initramfs image needn't be stripped / compressed / decompressed

The following caveats apply for Dracut to use reflinks successfully (otherwise it falls back to read/write):

  • root, boot and dracut staging (/var/tmp) exist on the same Btrfs or XFS filesystem
  • paths don't have nocow flags set
  • boot performance may be negatively affected by fragmentation, but that should be compensated by the removal of compression / decompression
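The "reflink where possible, otherwise read/write" behaviour described above can be sketched as follows. This is a minimal illustration, not code from the patchset: the helper name `copy_with_fallback` and its `chunk` parameter are made up for this example, and `os.copy_file_range()` only gives the kernel the opportunity to clone extents; whether data ends up shared depends on the filesystem and on the alignment of the ranges.

```python
import os


def copy_with_fallback(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path and return the number of bytes copied.

    Hypothetical helper: prefers os.copy_file_range(), which lets Btrfs/XFS
    share extents instead of duplicating data, and falls back to plain
    pread/pwrite when the call is unavailable or fails.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            size = os.fstat(src).st_size
            offset = 0
            use_cfr = hasattr(os, "copy_file_range")
            while offset < size:
                if use_cfr:
                    try:
                        n = os.copy_file_range(src, dst, size - offset,
                                               offset, offset)
                    except OSError:
                        # e.g. cross-filesystem copy or unsupported syscall
                        use_cfr = False
                        continue
                    if n == 0:
                        # no progress; switch to the read/write path
                        use_cfr = False
                        continue
                    offset += n
                else:
                    buf = os.pread(src, min(chunk, size - offset), offset)
                    if not buf:
                        break
                    written = 0
                    while written < len(buf):
                        written += os.pwrite(dst, buf[written:],
                                             offset + written)
                    offset += len(buf)
            return offset
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Explicit offsets are used for both paths so that a failure partway through `copy_file_range()` can be resumed by the read/write loop without re-copying data.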

Benchmarks are still ongoing, but my initial results show:

---------------------------------+----------+----------+-----------
     Benchmark                   |  Before  |  After   |  Change
---------------------------------+----------+----------+-----------
Dracut create image runtime      |  8.452s  |  7.635s  |  -9.666%
---------------------------------+----------+----------+-----------
initramfs data (fiemap)          |          |          |
- total                          | 12894208 | 34009088 |  +163.7%
- shared (dedup)                 |     0    | 24068096 |
- exclusive                      | 12894208 |  9940992 |  -22.90%
---------------------------------+----------+----------+-----------
QEMU cold boot to Dracut init    |  3.208s  |  2.850s  |  -11.15%
---------------------------------+----------+----------+-----------

See #1141 (comment) for details.
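For readers checking the table, the Change column is the usual relative change, and the fiemap rows are related by total = shared + exclusive. A short sketch using only the figures from the table (the helper name `pct_change` is made up for this example):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values copied from the benchmark table above.
runtime_before, runtime_after = 8.452, 7.635
total_before = 12894208
total_after, shared_after, exclusive_after = 34009088, 24068096, 9940992

print("create runtime change: %.3f%%" % pct_change(runtime_before, runtime_after))
print("total data change:     %.3f%%" % pct_change(total_before, total_after))
print("exclusive data change: %.3f%%" % pct_change(total_before, exclusive_after))
print("shared + exclusive == total:", shared_after + exclusive_after == total_after)
print("shared fraction of image: %.3f" % (shared_after / total_after))
```

The computed values agree with the table to within the last printed digit. Although the uncompressed image is larger in total, only the exclusive portion consumes additional space on the filesystem.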

RFC: The top commit of this patchset shouldn't be merged until GNU cpio --reflink support is upstream. I believe all other changes are ready.

Fixes: #1141

@github-actions github-actions bot added the test Issues related to testing label Mar 10, 2021
Collaborator

@johannbg johannbg left a comment

You seem to be missing shell completion and documentation (man page) updates from the PR.

Resolved review threads: skipcpio/padcpio.c, dracut.sh, test/TEST-62-CPIO/test.sh
@haraldh
Collaborator

haraldh commented Mar 10, 2021

So, the whole thing is most interesting for initramfs images stored on xfs/btrfs for qemu booting, as moving it to EFI vfat will remove the dedup feature. Boot times for bare metal would also increase due to non-compression.

Did I understand this correctly?

@ddiss
Contributor Author

ddiss commented Mar 11, 2021

> So, the whole thing is most interesting for initramfs images stored on xfs/btrfs for qemu booting, as moving it to EFI vfat will remove the dedup feature. Boot times for bare metal would also increase due to non-compression.
>
> Did I understand this correctly?

Yes, correct, it's most beneficial for environments where initramfs source files, Dracut staging area and destination image are all on the same XFS or Btrfs filesystem. If the Dracut destination is EFI vfat then duplication is expected, although it would still allow for $initdir/*->${DRACUT_TMPDIR}/initramfs.img cpio data copy I/O to be reduced.

@haraldh
Collaborator

haraldh commented Mar 11, 2021

I am sorry, you have to rebase due to upstream changes in Makefile.

@ddiss
Contributor Author

ddiss commented Mar 11, 2021

> I am sorry, you have to rebase due to upstream changes in Makefile.

No problem 😃 . I've made some extra changes to use statfs reported optimal alignment for padding, rather than the hardcoded 4K. I'll force push here along with @johannbg 's suggested changes when done testing.
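As a rough illustration of querying the filesystem for an alignment value instead of hardcoding 4K: the sketch below uses Python's `os.statvfs()` as a stand-in for the C `statfs()` call mentioned above. The function name `padding_alignment` and the 4096 fallback are assumptions made for this example, not taken from padcpio.

```python
import os


def padding_alignment(path, default=4096):
    """Return the block size reported for the filesystem holding `path`.

    Hypothetical helper: falls back to `default` if the path cannot be
    queried or the filesystem reports a non-positive block size.
    """
    try:
        bsize = os.statvfs(path).f_bsize
    except OSError:
        return default
    return bsize if bsize > 0 else default


if __name__ == "__main__":
    print(padding_alignment("/var/tmp"))
```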

@johannbg
Collaborator

@ddiss can you give us a link to your cpio --reflink upstream PR? Looking upstream, and based on the lack of activity there, it looks like it could literally take years before your PR gets merged and included in a new release.

@ddiss
Contributor Author

ddiss commented Mar 12, 2021

> @ddiss can you give us a link to your cpio --reflink upstream PR? Looking upstream, and based on the lack of activity there, it looks like it could literally take years before your PR gets merged and included in a new release.

They've now been submitted via https://savannah.gnu.org/patch/index.php?10044 and the upstream mailing list:
https://lists.gnu.org/archive/html/bug-cpio/2021-03/msg00008.html (edit: fix ML link)

@ddiss
Contributor Author

ddiss commented Mar 15, 2021

Changes since previous version:

  • rebase against current master branch
  • use statfs to determine optimal padding alignment
  • address @johannbg 's review feedback
  • add minor test cleanup: d139603
  • improve man-page documentation

ddiss added 3 commits March 15, 2021 18:03
Signed-off-by: David Disseldorp <ddiss@suse.de>
Individual test scripts may change working directory, so relative paths
should be avoided.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Preparation for reusing cpio archive definitions.
skipcpio's PROGRAM_VERSION_STRING is currently unused so drop it.

Signed-off-by: David Disseldorp <ddiss@suse.de>
@ddiss
Contributor Author

ddiss commented Mar 15, 2021

Changes since previous version:

  • rebase against current master branch
  • apply astyle indent rules to padcpio

1. Purpose
Improve cpio archive creation performance and space efficiency by
ensuring that file data is aligned within the archive to the filesystem
block size.
For filesystems supporting reflinks (e.g. XFS and Btrfs) this ensures
that cpio archive data can share the same copy-on-write extents as the
archive source files. A GNU cpio binary capable of copy-on-write cloning
data via copy_file_range is required to make proper use of this.

Padding can't be added to cpio archives arbitrarily, so we need to
inject extra files into the archive to provide filesystem alignment.

2. Behaviour
Read a zero terminated file list from stdin. If an input file, when cpio
serialized, is aligned to <padding alignment order> then print it to
stdout.
If the cpio archived file data would be unaligned then create a padding
file in <pad dir> which, when cpio archived before the input file, will
provide cpio alignment and print the pad file path before the input file
path. GNU cpio reorders hardlinks, so avoid extra complexity by
deferring them (unpadded) to the end of the archive.

Signed-off-by: David Disseldorp <ddiss@suse.de>
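The pad-file sizing described in the padcpio commit message above can be sketched as below. This is an illustration of the arithmetic only, not the code in skipcpio/padcpio.c. It assumes the "newc" cpio layout (110-byte header, NUL-terminated name padded so that header plus name is a multiple of 4, file data padded to a multiple of 4), an entry offset that is already 4-byte aligned, and a block size that is a multiple of 4. All function names and the pad file name are made up for this example, and hardlink deferral is not modelled.

```python
NEWC_HEADER_LEN = 110  # fixed-size ASCII header of the "newc" cpio format


def align_up(value, boundary):
    """Round value up to the next multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


def data_offset(entry_offset, name):
    """Archive offset at which an entry's file data would start."""
    name_len = len(name.encode()) + 1  # include the terminating NUL
    return align_up(entry_offset + NEWC_HEADER_LEN + name_len, 4)


def entry_end(entry_offset, name, data_len):
    """Archive offset just past an entry, including trailing 4-byte padding."""
    return align_up(data_offset(entry_offset, name) + data_len, 4)


def pad_file_len(entry_offset, name, pad_name, block_size):
    """Return None if `name` is already block aligned at entry_offset,
    otherwise the data length of a pad file that, archived first, makes
    the data of `name` start on a block_size boundary."""
    if data_offset(entry_offset, name) % block_size == 0:
        return None
    # Where the pad entry's own data starts.
    pad_data_start = data_offset(entry_offset, pad_name)
    # Where the real file's data would start after a zero-length pad file.
    unpadded = data_offset(pad_data_start, name)
    return (-unpadded) % block_size


if __name__ == "__main__":
    pad = pad_file_len(0, "usr/bin/bash", "pad0", 4096)
    start = entry_end(0, "pad0", pad)
    print(pad, data_offset(start, "usr/bin/bash") % 4096)
```

Note that a pad file is needed even when the computed length is zero, because the pad entry's header and name shift the following entry.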
@ddiss
Contributor Author

ddiss commented Mar 15, 2021

Changes since previous version:

  • add padcpio to dracut.spec

ddiss added 2 commits March 15, 2021 23:59
Provides some coverage for the skipcpio and padcpio binaries.

Signed-off-by: David Disseldorp <ddiss@suse.de>
…reflinks

The new GNU cpio "--reflink" parameter sees it use copy_file_range()
when copying between source and destination, allowing for copy-on-write
optimization if supported by the underlying filesystem (e.g. Btrfs or
XFS).

When calling cpio with --reflink the file list is piped through padcpio
to ensure optimal alignment for extent sharing. Microcode and initramfs
proper archives are chained together using the new cpio "--chain"
parameter, which ensures that subsequent archives are appended in a
reflink friendly fashion.

RFC: This shouldn't be merged until GNU cpio --reflink --chain patches
     have made it upstream.

Signed-off-by: David Disseldorp <ddiss@suse.de>
@ddiss
Contributor Author

ddiss commented Mar 15, 2021

Changes since previous version:

  • change TEST-62-CPIO commit message to make Commisery happy

@ddiss
Contributor Author

ddiss commented Mar 15, 2021

> So, the whole thing is most interesting for initramfs images stored on xfs/btrfs for qemu booting, as moving it to EFI vfat will remove the dedup feature. Boot times for bare metal would also increase due to non-compression.
> Did I understand this correctly?
>
> Yes, correct, it's most beneficial for environments where initramfs source files, Dracut staging area and destination image are all on the same XFS or Btrfs filesystem. If the Dracut destination is EFI vfat then duplication is expected

One thing I forgot to clear up here is that these changes don't just target qemu workloads - [open]SUSE distros commonly carry initramfs with the root filesystem, so that system rollback (from a Btrfs snapshot) is possible.

@johannbg johannbg self-requested a review March 16, 2021 01:19
@haraldh
Collaborator

haraldh commented Mar 30, 2021

I will merge this, if cpio upstream supports it.

@johannbg
Collaborator

@ddiss given that it can take a while for upstream cpio to merge this, would you mind putting this into draft mode, and then taking it out of draft mode when it finally gets merged?

@ddiss
Contributor Author

ddiss commented Mar 30, 2021

> I will merge this, if cpio upstream supports it.

Thanks Harald. I'll post an update to the cpio patchset this week following @luis-henrix 's review (https://lists.gnu.org/archive/html/bug-cpio/2021-03/msg00008.html).

> @ddiss given that it can take a while for upstream cpio to merge this, would you mind putting this into draft mode, and then taking it out of draft mode when it finally gets merged?

Sure, that works for me. I'm probably missing something, but I don't see a draft-mode button in the GitHub UI. Are you able to do it from your side?

@johannbg johannbg marked this pull request as draft March 30, 2021 13:54
@johannbg
Collaborator

@ddiss done. It's a link to click, not a visible button (for whatever reason); the link is in the bottom right corner, in the reviewers section.

@ddiss
Contributor Author

ddiss commented Apr 19, 2021

Just a minor update: I reworked and posted a new version of the GNU cpio patchset to the list last week following Luis' review - https://lists.gnu.org/archive/html/bug-cpio/2021-04/threads.html .

@johannbg
Collaborator

Looks like this won't be reviewed until sometime in May, based on Sergey's response, so this will probably end up in the 55 release (probably sometime in June), the 56 release (August/September), or the release after cpio gets a new release.

@ddiss
Contributor Author

ddiss commented May 19, 2021

After seeing Rust's native support for copy_file_range() via fs::copy(), I went ahead and wrote a prototype dracut-cpio utility:
https://github.com/ddiss/dracut/tree/dracut_cpio

@haraldh
Collaborator

haraldh commented May 19, 2021

> After seeing Rust's native support for copy_file_range() via fs::copy(), I went ahead and wrote a prototype dracut-cpio utility:
> https://github.com/ddiss/dracut/tree/dracut_cpio

Here is my (outdated) cp version in rust for dracut-install with sparse support:
https://github.com/haraldh/dracut-install/blob/master/src/file/mod.rs#L228

@haraldh
Collaborator

haraldh commented May 19, 2021

On a funny side note, part of fs::copy() was one of my first Rust project contributions :)

Additional sparse file support was rejected.

@ddiss
Contributor Author

ddiss commented May 19, 2021

> Here is my (outdated) cp version in rust for dracut-install with sparse support:
> https://github.com/haraldh/dracut-install/blob/master/src/file/mod.rs#L228

Nice! I wasn't aware of your work in this area. I'd also be happy to integrate the dracut-cpio functionality into dracut-install.rs directly if you plan on continuing in that direction.

@ddiss
Contributor Author

ddiss commented May 27, 2021

> After seeing Rust's native support for copy_file_range() via fs::copy(), I went ahead and wrote a prototype dracut-cpio utility:
> https://github.com/ddiss/dracut/tree/dracut_cpio

I've updated the dracut-cpio patch to the point where it can now be used for proper testing as a swap-in replacement for GNU cpio. Like my GNU cpio patchset, it supports cpio reflinks with data segment alignment. Data segment alignment is handled in a much cleaner fashion by injecting extra zeros into the cpio archive filename fields (rather than adding extra pad files to initramfs).
GNU cpio and the kernel seem to handle filename zero-injection just fine, but I need to do some further testing and benchmarking.
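The filename zero-injection idea can be sketched as follows. This illustrates the arithmetic only and is not taken from the dracut-cpio prototype. It assumes the "newc" layout (110-byte header followed by the name field, with the data starting at the next 4-byte boundary), a 4-byte-aligned entry offset and a block size that is a multiple of 4. The function names are made up for this example.

```python
NEWC_HEADER_LEN = 110  # fixed-size ASCII header of the "newc" cpio format


def padded_namesize(entry_offset, name, block_size):
    """Return a name field size (bytes, including NULs) that places the
    entry's file data exactly on a block_size boundary."""
    base = len(name.encode()) + 1  # name plus its terminating NUL
    name_start = entry_offset + NEWC_HEADER_LEN
    extra = (-(name_start + base)) % block_size
    return base + extra


def name_field(name, namesize):
    """Build the name field: the name followed by NUL bytes up to namesize."""
    raw = name.encode()
    return raw + b"\0" * (namesize - len(raw))


if __name__ == "__main__":
    size = padded_namesize(0, "usr/bin/bash", 4096)
    field = name_field("usr/bin/bash", size)
    # Data offset of this entry, modulo the block size.
    print(size, (NEWC_HEADER_LEN + len(field)) % 4096)
```

A real tool would also have to keep the name field within whatever length limit the extracting side imposes; that is not handled in this sketch.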

@stale

stale bot commented Jun 27, 2021

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Jun 27, 2021
@stale stale bot closed this Jul 4, 2021

Successfully merging this pull request may close these issues.

[RFC] feature: use reflinks for extent sharing between initramfs source and archive data
3 participants