disk_util: support compressed btrfs filesystems, use btrfs for the OEM partition #131

Merged: pothos merged 3 commits into main from kai/btrfs-usr-oem on Jul 28, 2021

@pothos pothos commented Jul 14, 2021

  • disk_util: support compressed btrfs filesystems

    The limited /usr and OEM partition size is a challenge when adding new
    packages or updating a package. Since the disk layout can't be changed
    for compatibility reasons when updating an existing instance, we can't
    simply try something out without first making sure that enough space
    is available by removing something else. This situation can be relaxed
    by leveraging btrfs compression. There was some support for btrfs but
    it was a bit outdated and didn't allow configuring compression or
    setting read-only flags.
    Fix the btrfs support, allow marking the default subvolume as read-only
    and add a compression variable that selects the compression
    algorithm. Instead of enabling compression through a mount option,
    we set the filesystem attribute, which has the benefit that
    compression is still used with the default mount options for this (top)
    directory and its contents. For the ext2 /usr partition a hack
    existed to force read-only mode by modifying some bytes, and checking
    these bytes could also tell us whether read-only should be used to
    prevent corruption of dm-verity data. Instead, we check directly whether
    dm-verity is active for this partition and, if so, mount it read-only
    (with the norecovery option to really prevent any write attempt).

  • disk_layout: use btrfs for the OEM partition

    The compression feature of btrfs allows us to store more in the
    size-limited /usr and OEM partitions. The size should of course still
    be monitored to not bloat the image but more headroom helps to try
    things out quickly without hitting the hard limit which fails the
    build.
    Use btrfs for the OEM partition but with zlib compression because
    the outdated GRUB version doesn't support zstd yet.
    New subvolumes currently can't be used for the OEM partition as default
    subvolumes because GRUB tries to read the grub.cfg from the top
    subvolume (at least with our old version). (We could however use
    subvolumes for the /usr partition when switching to btrfs if that
    makes any sense.)

  • disk_layout: optimize btrfs filesystem overhead

    The defaults already give more space than the ext4 defaults but it's
    recommended to use the mixed mode for filesystems smaller than 1-5 GB.
    Another aspect is the duplication of metadata: it is currently off,
    but the default depends on the underlying block device and could
    change as soon as the block device type changes.

    Select the mixed mode that uses a merged area for data and metadata
    blocks. Also ensure that no metadata duplication gets enabled
    automatically.

  • (dropped before merging, see end) disk_layout: use btrfs for the /usr partition

    The compression feature of btrfs allows us to store more in the
    size-limited /usr and OEM partitions. The size should of course still
    be monitored to not bloat the image but more headroom helps to try
    things out quickly without hitting the hard limit which fails the
    build.
    Use btrfs with zstd compression for the /usr partition. While for ext2
    a hack exists to force read-only mounts by manipulating some bytes of
    the filesystem, on btrfs we can use the subvolume read-only flag
    instead which also works for the default top level subvolume. However,
    it also makes sense to mount the filesystem with the "norecovery"
    mount option to prevent any write attempts even when the "ro" option is
    set (not needed when using dm-verity in read-only mode but when
    directly mounting without dm-verity). A new subvolume is not created
    because subvolumes don't offer anything special as long as we use the
    A/B partition update mechanism (but they could be an alternative for
    that). Note that switching to btrfs on the /usr partition is only
    possible when the Flatcar Stable release has all patches in
    update-engine and seismograph's rootdev.
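
The compression attribute, the read-only subvolume flag and the dm-verity mount options described in the commits above can be sketched roughly as follows. This is a minimal Python sketch and not the actual disk_util code: the helpers `btrfs_setup_commands` and `mount_options` and the `read_only` key are hypothetical names used for illustration, while `btrfs property set` and the `ro`/`norecovery` mount options are existing btrfs features. It assumes a partition entry shaped like the ones in disk_layout.json.

```python
def btrfs_setup_commands(part, mount_dir):
    """Return the commands to run once the new filesystem is mounted at mount_dir."""
    cmds = []
    compression = part.get("fs_compression")
    if compression:
        # Set the attribute on the top directory so that new files inherit it,
        # independent of any "compress" mount option.
        cmds.append(["btrfs", "property", "set", mount_dir,
                     "compression", compression])
    if part.get("read_only"):  # hypothetical key, not taken from the PR
        # Set last: a read-only subvolume rejects any later write.
        cmds.append(["btrfs", "property", "set", "-t", "subvol", mount_dir,
                     "ro", "true"])
    return cmds


def mount_options(part, verity_active):
    """Pick mount options; verity_active says whether dm-verity covers the partition."""
    if part.get("fs_type") == "btrfs" and verity_active:
        # "norecovery" avoids log replay, so nothing is written even with "ro".
        return ["ro", "norecovery"]
    return []
```

How `verity_active` is determined is left open here; the commit message only says that the check is done directly against dm-verity instead of inspecting filesystem bytes.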

How to use/testing done

This was built and tested with the coreos-overlay branch kai/bootengine-verity-hashoffset from flatcar-archive/coreos-overlay#1106 in http://jenkins.infra.kinvolk.io:8080/job/os/job/manifest/3029/cldsv/ where the Flatcar image has a btrfs /usr partition and OEM partition.
While the actual switch to a btrfs filesystem on the /usr partition is only possible when all changes are part of a Stable release because update-engine needs to know how to handle the new filesystem when updating, we can already do the switch for the OEM partition.
Before merging, the last commit will be dropped as it is only used to validate the btrfs support. It will go into its own PR later when Stable has all the coreos-overlay changes present.

Follow up

When all btrfs support changes are part of a Stable release we can consider switching /usr to btrfs:

From aa8273cd9ce8d4133c459b7853247aeeb664b02f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kai=20L=C3=BCke?= <kailuke@microsoft.com>
Date: Wed, 14 Jul 2021 21:28:12 +0200
Subject: [PATCH] disk_layout: use btrfs for the /usr partition

The compression feature of btrfs allows us to store more in the
size-limited /usr and OEM partitions. The size should of course still
be monitored to not bloat the image but more headroom helps to try
things out quickly without hitting the hard limit which fails the
build.
Use btrfs with zstd compression for the /usr partition. While for ext2
a hack exists to force read-only mounts by manipulating some bytes of
the filesystem, on btrfs we can use the subvolume read-only flag
instead which also works for the default top level subvolume. However,
it also makes also sense to mount the filesystem with the "norecovery"
mount option to prevent any write attempts even when the "ro" option is
set (not needed when using dm-verity in read-only mode but when
directly mounting without dm-verity). A new subvolumes is not created
because subvolumes don't offer anything special as long as we use the
A/B partition update mechanism (but they could be an alternative for
that). Note that switching to the btrfs on the /usr partition is only
possible when the Flatcar Stable release has all patches in
update-engine and seismograph's rootdev.
---
 build_library/disk_layout.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/build_library/disk_layout.json b/build_library/disk_layout.json
index dfd7044b..f10f77d1 100644
--- a/build_library/disk_layout.json
+++ b/build_library/disk_layout.json
@@ -29,7 +29,8 @@
         "type":"flatcar-rootfs",
         "blocks":"2097152",
         "fs_blocks":"260094",
-        "fs_type":"ext2",
+        "fs_type":"btrfs",
+        "fs_compression":"zstd",
         "mount":"/usr",
         "features": ["prioritize", "verity"]
       },
-- 
2.31.1
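
As an illustration of how an entry with the new "fs_compression" key might be sanity-checked before formatting, here is a small Python sketch. It is an assumption, not code from this PR: `check_partition` and the `read_by_grub` parameter are hypothetical, and the GRUB restriction encodes only what the commit message states (the GRUB version in use cannot read zstd).

```python
import json

# Compression algorithms btrfs offers.
BTRFS_COMPRESSION = {"zlib", "lzo", "zstd"}
# Per the commit message: the outdated GRUB version doesn't support zstd.
GRUB_UNSUPPORTED = {"zstd"}


def check_partition(part, read_by_grub=False):
    """Raise ValueError if the compression setting of a layout entry is unusable."""
    compression = part.get("fs_compression")
    if compression is None:
        return
    if part.get("fs_type") != "btrfs":
        raise ValueError("fs_compression requires fs_type btrfs")
    if compression not in BTRFS_COMPRESSION:
        raise ValueError("unknown btrfs compression: " + compression)
    if read_by_grub and compression in GRUB_UNSUPPORTED:
        raise ValueError(compression + " cannot be read by GRUB")


# The /usr entry as it looks after the patch above.
usr = json.loads("""{
    "type": "flatcar-rootfs",
    "blocks": "2097152",
    "fs_blocks": "260094",
    "fs_type": "btrfs",
    "fs_compression": "zstd",
    "mount": "/usr",
    "features": ["prioritize", "verity"]
}""")
check_partition(usr)  # passes: zstd is fine when GRUB doesn't read the partition
```

With `read_by_grub=True`, as would apply to the OEM partition holding grub.cfg, the same zstd entry is rejected, which matches the choice of zlib there.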

jepio commented Jul 27, 2021

For the OEM partition in particular, we should look into how exactly we format the btrfs filesystem. OEM is 128MB, right? With the default options it ends up looking like this:

Overall:
    Device size:                 128.00MiB
    Device allocated:             88.00MiB
    Device unallocated:           40.00MiB
    Device missing:                  0.00B
    Used:                        256.00KiB
    Free (estimated):             48.00MiB      (min: 28.00MiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:                3.25MiB      (used: 0.00B)

              Data    Metadata  System
Id Path       single  DUP       DUP      Unallocated
-- ---------- ------- --------- -------- -----------
 1 /dev/loop3 8.00MiB  64.00MiB 16.00MiB    40.00MiB
-- ---------- ------- --------- -------- -----------
   Total      8.00MiB  32.00MiB  8.00MiB    40.00MiB
   Used         0.00B 112.00KiB 16.00KiB

and with mkfs.btrfs --mixed:

Overall:
    Device size:                 128.00MiB
    Device allocated:             12.00MiB
    Device unallocated:          116.00MiB
    Device missing:                  0.00B
    Used:                         32.00KiB
    Free (estimated):            123.16MiB      (min: 123.16MiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              832.00KiB      (used: 0.00B)

              Data+Metadata System
Id Path       single        single  Unallocated
-- ---------- ------------- ------- -----------
 1 /dev/loop3       8.00MiB 4.00MiB   116.00MiB
-- ---------- ------------- ------- -----------
   Total            8.00MiB 4.00MiB   116.00MiB
   Used            28.00KiB 4.00KiB

There's a huge difference in free capacity.
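
To put a number on that difference, the following Python sketch parses the "Overall" lines of the two outputs above and compares the allocated space. The helper names `parse_size` and `parse_overall` are hypothetical, and the parser only handles the B/KiB/MiB/GiB suffixes that appear here.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '88.00MiB' into a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KiB|MiB|GiB)", text.strip())
    if match is None:
        raise ValueError("not a size: " + text)
    return round(float(match.group(1)) * UNITS[match.group(2)])


def parse_overall(output):
    """Map each 'Key: <size>' line to bytes, skipping lines without a size."""
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        fields = value.split()
        if not sep or not fields:
            continue
        try:
            result[key.strip()] = parse_size(fields[0])
        except ValueError:
            continue  # e.g. "Data ratio: 1.00" carries no unit
    return result


DEFAULT_USAGE = """Overall:
    Device size:                 128.00MiB
    Device allocated:             88.00MiB
    Device unallocated:           40.00MiB
    Data ratio:                       1.00
"""
MIXED_USAGE = """Overall:
    Device size:                 128.00MiB
    Device allocated:             12.00MiB
    Device unallocated:          116.00MiB
    Data ratio:                       1.00
"""

default = parse_overall(DEFAULT_USAGE)
mixed = parse_overall(MIXED_USAGE)
saved = default["Device allocated"] - mixed["Device allocated"]
print(saved / UNITS["MiB"])  # 76.0 (MiB less pre-allocated with --mixed)
```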

pothos commented Jul 27, 2021

Currently it looks like this on QEMU:

 $ sudo btrfs filesystem usage /usr/share/oem/
Overall:
    Device size:		 128.00MiB
    Device allocated:		  20.00MiB
    Device unallocated:		 108.00MiB
    Device missing:		     0.00B
    Used:			 128.00KiB
    Free (estimated):		 116.00MiB	(min: 116.00MiB)
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		   3.25MiB	(used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
   /dev/vda6	   8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB
   /dev/vda6	   8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/vda6	   4.00MiB

Unallocated:
   /dev/vda6	 108.00MiB

Mixed mode makes sense too; I will look into it, thanks. The key difference, though, is single vs DUP and the size of the metadata allocation.

pothos and others added 2 commits July 27, 2021 14:08

jepio commented Jul 27, 2021

Did you force 'single'? If I don't, then the default ends up being dup for me.

pothos commented Jul 27, 2021

Hm, no, I didn't, but the default depends on the underlying device. I guess we should force it then (in case someone changes the way the image is built).
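
Forcing the profile at format time could look roughly like the sketch below. It is an assumption about how the mkfs invocation might be assembled, not the code from this PR: `mkfs_btrfs_args` is a hypothetical helper, while `--mixed`, `--metadata`, `--data` and `--byte-count` are existing mkfs.btrfs options. Mixed mode requires the data and metadata profiles to be identical, so both are set to the same value.

```python
def mkfs_btrfs_args(device, fs_bytes=None, mixed=True, profile="single"):
    """Build an mkfs.btrfs command line that does not rely on device-dependent defaults."""
    args = ["mkfs.btrfs"]
    if mixed:
        # One merged block group type for data and metadata on small filesystems.
        args.append("--mixed")
    # Pin both profiles explicitly so DUP is never picked automatically.
    args += ["--metadata", profile, "--data", profile]
    if fs_bytes is not None:
        args += ["--byte-count", str(fs_bytes)]
    args.append(device)
    return args


# Example for a 128 MiB OEM partition on a loop device (device name is illustrative).
print(" ".join(mkfs_btrfs_args("/dev/loop3", fs_bytes=128 * 1024 ** 2)))
```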

@pothos pothos merged commit b892315 into main Jul 28, 2021
@pothos pothos deleted the kai/btrfs-usr-oem branch July 28, 2021 11:34