Self-signing Fedora kernels with pesign - socket permission problems #1091

Closed
belegdol opened this issue May 29, 2023 · 22 comments
Labels
enhancement feature request, rfe

Comments

@belegdol
Contributor

Short description of the problem

After upgrading from mock-3.5-2.fc38 to mock-4.0-1.fc38, signing Fedora kernel RPMs with my own certificate no longer works.

Output of rpm -q mock

mock-4.0-1.fc38.noarch

Steps to reproduce issue

This is not really trivial unfortunately:

  1. Follow the instructions at https://gist.github.com/chenxiaolong/520914b191f17194a0acdc0e03122e63 with one exception: add yourself to the pesign group: sudo usermod -a -G pesign julas
  2. Attempt to build a kernel: mock -r fedora-38-x86_64 --disable-plugin=tmpfs --enable-plugin=yum_cache --isolation=simple redhat/rpm/SRPMS/kernel-6.3.4-201.s0ix01.fc38.src.rpm --with baseonly --define='pe_signing_token NSS Certificate DB' --define='pe_signing_cert Julians Secure Boot signing key - Julian Sikorski'
  3. Wait about 20 minutes (on a Ryzen 5 5600X)

The following happens:

+ /usr/bin/pesign --certdir /etc/pki/pesign -t 'NSS Certificate DB' -c 'Julians Secure Boot signing key - Julian Sikorski' -s -i arch/x86/boot/bzImage -o vmlinuz.tmp
pesign: Could not open NSS database ("security library: bad database."): Permission denied
error: Bad exit status from /var/tmp/rpm-tmp.tNH11n (%build)
RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.tNH11n (%build)
Child return code was: 1

Any additional notes

In order to fix the problem, rolling back to mock-3.5 was not enough. I had to do:

$ mock -r fedora-38-x86_64 --scrub=all clean

for the signing to work again.

@praiskup
Member

pesign: Could not open NSS database ("security library: bad database."): Permission denied

Are you sure this is Mock's fault? Seems like the directory is correctly mounted by Mock -> but
Mock itself doesn't touch the permissions of mounted files. Can you help a bit to diagnose what
is going on?

@praiskup
Member

Can you help a bit to diagnose what is going on?

By that I mean -> what permissions are expected? What should Mock do to make it work?
You seem to be doing some tweaks to the "${KEY_DIR}/nss_db" database in your nice how-to.

@belegdol
Contributor Author

I am happy to help, but as I mentioned, this is not a trivial setup by any means. I am also not 100% sure mock is at fault here, but given that downgrading and rebuilding the bootstrap makes the issue go away, it sure looks that way.
How-to is not mine but I relied on it heavily when setting my system up.
I will try generating a more detailed step-by-step how-to later.
How-to mentions a separate NSS DB which I am not using, I use /etc/pki/pesign. This renders a number of the readme steps unnecessary.

@belegdol
Contributor Author

Basically, the requirement is for the mock tree to use the host's pesign. Some info can be seen in issue #140.

@praiskup
Member

Ok, this feels a bit weird. There is a change in the mountpoint management,
but this was about /proc, /dev and bootstrap. From what I see, nothing changed
with the set of user mountpoints.

In order to fix the problem, rolling back to mock-3.5 was not enough. I had to
do:
$ mock -r fedora-38-x86_64 --scrub=all clean

Would you mind testing an upgrade of mock to v4.0 again, running this
--scrub=all command first?

@belegdol
Contributor Author

After boot I am starting pesign and unlocking the cert DB:

$ sudo systemctl start pesign
$ sudo pesign-client -u

Attempting to build with mock-3.5 works:

$ rpm -q mock mock-core-configs
mock-3.5-2.fc38.noarch
mock-core-configs-38.3-1.fc38.noarch
$ mock -r fedora-38-x86_64 --scrub=all clean
$ mock -r fedora-38-x86_64 --disable-plugin=tmpfs --enable-plugin=yum_cache  --isolation=simple redhat/rpm/SRPMS/kernel-6.3.4-201.s0ix01.fc38.src.rpm --with baseonly --define='pe_signing_token NSS Certificate DB' --define='pe_signing_cert Julians Secure Boot signing key - Julian Sikorski'

mock-4.0 does not work even if everything is scrubbed:

$ sudo dnf update
$ rpm -q mock mock-core-configs
mock-4.0-1.fc38.noarch
mock-core-configs-38.5-1.fc38.noarch
$ mock -r fedora-38-x86_64 --scrub=all clean
$ mock -r fedora-38-x86_64 --disable-plugin=tmpfs --enable-plugin=yum_cache  --isolation=simple redhat/rpm/SRPMS/kernel-6.3.4-201.s0ix01.fc38.src.rpm --with baseonly --define='pe_signing_token NSS Certificate DB' --define='pe_signing_cert Julians Secure Boot signing key - Julian Sikorski'

@belegdol
Contributor Author

belegdol commented May 30, 2023

There is something really fishy going on: it turns out I actually have to reboot my system after downgrading mock or else the permission denied error persists. It is as if mock-4.0 does something to the /var/run/pesign bind that damages it.

@belegdol
Contributor Author

belegdol commented May 30, 2023

I managed to dig a bit deeper. shim offers a much better test case as it builds in about a minute as opposed to 20 minutes:

mock -r fedora-38-x86_64 --enable-plugin=yum_cache  --isolation=simple shim-15.6-2.src.rpm --define='pe_signing_token NSS Certificate DB' --define='pe_signing_cert Julians Secure Boot signing key - Julian Sikorski'

An "invalid signature" error indicates success.
Secondly, reboot is not needed. I did the following to restore the working status:

$ sudo systemctl stop pesign
$ mock -r fedora-38-x86_64 --scrub=bootstrap clean
$ sudo systemctl start pesign
$ sudo pesign-client -u

@belegdol
Contributor Author

I was able to perform a bisect, 22c8fdc is the first bad commit:

$ git bisect bad
22c8fdcbd0f1de50b942a5792a0fa6e88c620946 is the first bad commit
commit 22c8fdcbd0f1de50b942a5792a0fa6e88c620946
Author: Pavel Raiskup <praiskup@redhat.com>
Date:   Fri May 12 18:22:16 2023 +0200

    bootstrap: delay the buildroot-in-bootstrap recursive mount
    
    We need to make sure that all buildroot mountpoints that need to be
    visible from within the bootstrap chroot are mounted first, before we
    do the "grand" buildroot-in-bootstrap recursive mountpoint.  Then all
    the sub-mounts are visible on both places.
    
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2166028
    Closes: #1040

 mock/py/mock.py                      |  4 ++--
 mock/py/mockbuild/buildroot.py       |  4 +++-
 mock/py/mockbuild/mounts.py          | 22 ++++++++++++++++++++++
 mock/py/mockbuild/package_manager.py |  5 +++++
 mock/py/mockbuild/uid.py             | 10 ++++++++++
 5 files changed, 42 insertions(+), 3 deletions(-)

@belegdol
Contributor Author

belegdol commented May 30, 2023

Somewhat untested reproduction instructions as I have my system already set up:

  1. generate signing cert: efikeygen -d /etc/pki/pesign -S -k -c 'CN=Your Name Key' -n 'Custom Secureboot'
  2. start pesign: sudo systemctl start pesign
  3. add oneself to pesign group: sudo usermod -a -G pesign julas
  4. set selinux to permissive for pesign: sudo semanage permissive -a pesign_t
  5. add the following line to mock config: config_opts['plugin_conf']['bind_mount_opts']['dirs'].append(('/var/run/pesign', '/var/run/pesign'))
  6. clone shim: fedpkg clone shim
  7. make an srpm: fedpkg srpm
  8. attempt a rebuild: mock -r fedora-38-x86_64 --enable-plugin=yum_cache --isolation=simple shim-15.6-2.src.rpm --define='pe_signing_token NSS Certificate DB' --define='pe_signing_cert Julians Secure Boot signing key - Julian Sikorski'

praiskup added a commit to praiskup/mock that referenced this issue May 31, 2023
Previously, the bind_mount plugin relied on a pre-existing directory tree,
typically created in the chroot by the _PackageManager
installation.  So it was quite easy to mount stuff like
`/var/run/socket` because `/var/run` always existed.  Problems occurred
with files like `/var/run/subdirectory/socket`.

Relates: rpm-software-management#1091
@praiskup
Member

Thank you for all the details! I was able to reproduce it.

With Mock 3.5 and bootstrap ON, the "user mountpoints" (here /var/run/pesign)
were not available while installing BuildRequires. With bootstrap OFF, they
were available even with v3.5. This was one of the Mock inconsistencies fixed by
the release v4.0.

What is happening here is that shim has BuildRequires: pesign, and when
Mock installs BuildRequires, the /var/run/pesign directory is "extracted" from the
pesign.rpm package payload, and the directory ownership is reset to
pesign:pesign as defined in the chroot. The pesign.rpm also has a
%pre scriptlet which pre-creates the pesign user/group as 999:999 in
the chroot.

On my testing system, after all the pesign setup needed by the cited howto
above, /var/run/pesign is owned by 993:994 (pesign:pesign on host). But the
pesign.rpm installed into the buildroot by Mock re-sets it to 999:999
(which is btw systemd-oom on the host, but it doesn't matter, it just explains
why reboot helped you, you can try chown pesign:pesign /var/run/pesign instead).

The whole rpmbuild in Mock is executed under the normal user, mockbuild.
This user is normally able to read /var/run/pesign directory on host. Though
if pesign.rpm in chroot re-sets the group ownership to 999, the mockbuild
user can't read it (step into the dir) anymore.

As mentioned at the beginning, if used with --no-bootstrap-chroot, even Mock 3.5
fails. The new v4.0 behavior seems cleaner and expected. Or, the other
way around, the fact that it worked with --use-bootstrap-chroot before was pure
coincidence in Mock v3.5.

We should find a better way to work with Mock+pesign; perhaps creating the
socket file directly in /var/run, like /var/run/mock-pesign.socket? That one
wouldn't be overridden by pesign.rpm in chroot.
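
The ownership mismatch described above can be checked by hand. The following is only a minimal sketch: the helper name passwd_ids is hypothetical, and the buildroot path /var/lib/mock/fedora-38-x86_64/root is an assumption about the default Mock layout that may differ on your system. It prints the numeric owner of the host directory next to the pesign IDs recorded on the host and in the chroot.

```shell
#!/bin/bash
# Hypothetical helper: print "uid:gid" for a user from a passwd-format file.
passwd_ids() {
    local passwd_file=$1 user=$2
    awk -F: -v u="$user" '$1 == u { print $3 ":" $4 }' "$passwd_file"
}

# ASSUMPTION: default location of the Mock buildroot; adjust for your config.
chroot_root=/var/lib/mock/fedora-38-x86_64/root

# Numeric owner of the bind-mounted directory on the host.
if [ -d /var/run/pesign ]; then
    echo "host dir owner: $(stat -c '%u:%g' /var/run/pesign)"
fi

# pesign IDs as the host sees them (empty if the account is not local).
if [ -r /etc/passwd ]; then
    echo "host pesign:    $(passwd_ids /etc/passwd pesign)"
fi

# pesign IDs as the chroot sees them, once pesign.rpm has been installed there.
if [ -r "$chroot_root/etc/passwd" ]; then
    echo "chroot pesign:  $(passwd_ids "$chroot_root/etc/passwd" pesign)"
fi
```

If the host and chroot lines differ, installing pesign into the buildroot will change the ownership of the shared directory as described.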

@belegdol
Contributor Author

Thank you for the detailed response. I am happy to hear that you were able to reproduce despite my somewhat vague instructions.
It is quite scary to hear that it worked by accident before. While there are probably dozens of us building self-signed kernels, isn't the functionality also used for signing in koji? I believe smartcards are used instead but it still all goes via pesign to the best of my understanding.
Regarding finding a better way, I am afraid I am out of my depth here. @frozencemetery, what do you think?

@praiskup
Member

Koji folks probably do the same thing, but they have a way around using ACLs.

@praiskup
Member

In the same bkernel specific config they turn bootstrap off btw, not sure why exactly. But this is exactly the coincidence I meant - without bootstrap, you'd face the very same issues even before (for normal packages, Koji started using bootstrap quite recently).

Can you experiment with this work-around locally?

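builduser=praiskup  # change accordingly
# Grant rwx on /var/run/pesign to the build user and to the pesign group.
for entity in "u:$builduser" g:pesign; do
    # "-d" sets the default ACL (inherited by new files); "" sets the access ACL.
    for options in "-d" ""; do
        # $options is intentionally unquoted so the empty value expands to nothing.
        setfacl $options -R -m "$entity:rwx" /var/run/pesign
    done
done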
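
To confirm that the loop above took effect, the ACL can be read back with getfacl. This is a rough sketch, not part of the original workaround: the helper name acl_grants_user_rwx is hypothetical, and it only looks for the access entry of one user.

```shell
#!/bin/bash
# Hypothetical helper: succeed when getfacl-style text on stdin contains an
# access entry "user:<name>:rwx" (default entries start with "default:").
acl_grants_user_rwx() {
    grep -q "^user:$1:rwx"
}

# Same variable as in the workaround; falls back to the current user.
builduser=${builduser:-$(id -un)}

if command -v getfacl >/dev/null 2>&1 && [ -d /var/run/pesign ]; then
    if getfacl -p /var/run/pesign 2>/dev/null | acl_grants_user_rwx "$builduser"; then
        echo "ACL for $builduser is present"
    else
        echo "ACL for $builduser is missing; re-run the setfacl loop"
    fi
fi
```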

@belegdol
Contributor Author

I can see the ownership changing: on my machine pesign is 974:969 on the host but 999:999 in the chroot. A freshly initialised chroot shows 974:969; installing pesign in the chroot changes it to 999:999.
Regarding the workaround: where would you recommend putting it? Can a bash script be triggered from mock config?

xsuchy pushed a commit that referenced this issue Jun 1, 2023
@praiskup
Member

praiskup commented Jun 2, 2023

Regarding the workaround: where would you recommend putting it?

That should be executed only once on host; as one of the needed steps to configure Pesign for Mock.

@belegdol
Contributor Author

belegdol commented Jun 2, 2023

Thanks! With the workaround I am getting the "invalid signature" with shim suggesting it is working. This got me thinking: I believe the second reason why this was working before is this:
rhboot/pesign@d8a8c25
Pesign used to set up ACLs as well, but this was dropped in version 116.

@belegdol
Contributor Author

belegdol commented Jun 6, 2023

It appears that the workaround from #1091 (comment) needs to be run every boot, not just once. I guess this is why it used to be triggered by the systemd unit.

@praiskup
Member

praiskup commented Jun 6, 2023

Yes, every boot (because the systemd tmpfiles service re-creates stuff in /run).

So I'm curious what to do about this issue. Should we close?

@belegdol
Contributor Author

belegdol commented Jun 6, 2023

Well, on one hand I know how to get this working now. So as far as I am concerned, the issue is solved.
On the other hand, requiring users to set up ACLs every boot adds an additional step to what is already a relatively complex set-up procedure. Do you think this can be made to work out of the box? Or at least with a one-time set-up?

@praiskup
Member

praiskup commented Jun 8, 2023

To repeat the problem - currently, we bind mount a host directory down
to the buildroot, but the directory can be arbitrarily overridden by the buildroot installation process.
Plus there's a UID/GID 'pesign' mismatch.

  • We could fix pesign.rpm to be more careful not to override things; but you
    would have to fix pesign in all the chroots.

  • We can start bind-mounting the socket file into a different directory, which is
    not changed by dnf install pesign. Does pesign support something like this?

  • We could teach Mock to pre-create the pesign user in the chroot with a pre-set
    UID:GID pair, or to copy the IDs from the host.

  • We could write a Mock plugin specifically for pesign that would always
    do some permission/ACL hacks.

Any other ideas? None of those ^^^ makes me happy, and help with the
implementation would be nice.
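
To illustrate the "copy the IDs from the host" option, here is a minimal sketch of the idea only. It is not how Mock implements anything: precreate_cmds is a hypothetical helper that just prints the groupadd/useradd commands which, if run inside the chroot before pesign.rpm is installed, would give the in-chroot pesign account the same numeric IDs as on the host.

```shell
#!/bin/bash
# Hypothetical helper: print the commands that would pre-create a system
# user and group with fixed numeric IDs. Nothing is executed here.
precreate_cmds() {
    local name=$1 uid=$2 gid=$3
    printf 'groupadd -r -g %s %s\n' "$gid" "$name"
    printf 'useradd -r -u %s -g %s -s /sbin/nologin %s\n' "$uid" "$gid" "$name"
}

# Look up the host IDs; both stay empty if the account does not exist.
host_uid=$(getent passwd pesign 2>/dev/null | cut -d: -f3)
host_gid=$(getent group pesign 2>/dev/null | cut -d: -f3)

if [ -n "$host_uid" ] && [ -n "$host_gid" ]; then
    precreate_cmds pesign "$host_uid" "$host_gid"
fi
```

With matching IDs, a chown to pesign:pesign performed inside the chroot would resolve to the same numeric owner that the host already uses for the directory.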

@praiskup praiskup changed the title from "mock-4.0 appears to break self-signing of Fedora kernels" to "Self-signing Fedora kernels with pesign - socket permission problems" Jun 8, 2023
@praiskup
Member

praiskup commented Jun 9, 2023

we teach Mock to pre-create pesign user in-chroot with pre-set UID:GID pair, or to copy the IDs from host

There's #1103 doing this. Would you mind taking a look and testing it?

@praiskup praiskup added the enhancement feature request, rfe label Jun 9, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 21, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 21, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 26, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 26, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 26, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 29, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 29, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jun 29, 2023
praiskup added a commit to praiskup/mock that referenced this issue Jul 19, 2023
@xsuchy xsuchy closed this as completed in 01e8cc4 Jul 24, 2023