kola: add test for TPM- and Tang-based disk encryption #493

Merged
merged 1 commit into from
Mar 14, 2024

@simoncampion simoncampion commented Jan 10, 2024

Add test for TPM- and Tang-based disk encryption

This PR adds four new tests, for root and non-root encryption using TPM and Tang. It accompanies PRs in the bootengine and scripts repositories.

How to use

Run the new tests, e.g. with kola run -k -b cl -p qemu-unpriv --board amd64-usr --qemu-image $SOME_IMAGE cl.tpm.* cl.tang.* in a Docker container. Note that the Docker image needs to be rebuilt because of the new dependency on swtpm for the TPM emulation with QEMU. If you run the tests natively, you'll also need to ensure that swtpm is in $PATH.

The four tests should fail for Flatcar images without TPM- and Tang-based disk encryption support and succeed for images with such support.
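When running natively, the swtpm requirement can be checked up front. The following is a minimal sketch, not part of this PR: `requireTool` is a hypothetical helper name, and kola itself may detect a missing swtpm differently.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireTool returns an error if the named executable cannot be found in $PATH.
// Hypothetical helper for illustration; not taken from the mantle code base.
func requireTool(name string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("%s not found in $PATH: %w", name, err)
	}
	return nil
}

func main() {
	// swtpm provides the TPM emulation that QEMU uses for the cl.tpm.* tests.
	if err := requireTool("swtpm"); err != nil {
		fmt.Println("cannot run TPM tests:", err)
		return
	}
	fmt.Println("swtpm found; TPM tests can run")
}
```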

Testing done

I ran the four new tests in Docker. On images built from the PR in the scripts repository, the nonroot tests pass. The root tests fail because systemd-cryptsetup@rootencrypted.service fails after the reboot, even though the disks are decrypted properly. The root disk has to be decrypted in the initramfs, so by the time the systemd-cryptsetup unit runs later in boot the disk is already unlocked and the unit fails. I will wait for the discussion in the scripts repository PR on how best to fix this.

root@saturn:/# kola run -k -b cl -p qemu-unpriv --board amd64-usr --qemu-image /mnt/images/developer-3844.0.0+nightly-20240109-2100-22-g4dbe3e0505-a1_flatcar_production_qemu_image.img cl.tpm.* cl.tang.*
=== RUN   cl.tpm.root
=== RUN   cl.tang.nonroot
=== RUN   cl.tpm.nonroot
=== RUN   cl.tang.root
--- FAIL: cl.tpm.root (48.45s)
        tpm.go:176: could not reboot machine: machine "5cb8b0b9-059b-4ed9-8255-9e30151e0b4f" failed basic checks: some systemd units failed:
● systemd-cryptsetup@rootencrypted.service loaded failed failed Cryptography Setup for rootencrypted
status:
journal:-- No entries --
--- FAIL: cl.tang.root (48.50s)
        tang.go:217: could not reboot machine: machine "9f1b1acd-7f49-4296-ab38-5244784765f4" failed basic checks: some systemd units failed:
● systemd-cryptsetup@rootencrypted.service loaded failed failed Cryptography Setup for rootencrypted
status:
journal:-- No entries --
--- PASS: cl.tpm.nonroot (48.63s)
--- PASS: cl.tang.nonroot (48.73s)
FAIL, output in _kola_temp/qemu-unpriv-2024-01-10-2004-294
harness: test suite failed
  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

// As advised in the Flatcar documentation, we then remove the ROOT label from the existing
// root partition, which is vda9 in the QEMU disk image.
IgnitionConfigRootTang = `{
"ignition": {
Contributor

We could use a Butane configuration for better readability, but we would first need to lift the Clevis restriction on the Butane side too: https://github.com/coreos/butane/blob/e859cb40d7c1d0c24e38311b2d51c4eb0a91ec88/config/flatcar/v1_2_exp/translate.go#L27

Author

Good point!

My thinking was that it would be best to merge Clevis support first and then open a PR to update the Butane spec, to avoid confusion. Once the Butane spec is updated, I'd be happy to bump the butane Go dependency in mantle and rewrite the config here. We can either hold off merging this PR until then, or I can open a second PR later. I don't have a preference either way; let me know which option you think is best.

"units": [{
"name": "remove-root-label.service",
"enabled": true,
"contents": "[Service]\nType=oneshot\nExecStart=wipefs -a /dev/vda9\n[Install]\nWantedBy=multi-user.target"
Member

Isn't it possible to reformat this partition with Ignition? I would prefer that because otherwise it's not clear which rootfs is used in the initrd in the first run.

Member

I think a filesystems entry like this could work. Here it is in Butane YAML:

variant: flatcar
version: 1.0.0
storage:
  filesystems:
  - device: /dev/disk/by-partlabel/ROOT
    wipe_filesystem: true
    format: none

Ignition json:

{
  "ignition": {
    "version": "3.3.0"
  },
  "storage": {
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/ROOT",
        "format": "none",
        "wipeFilesystem": true
      }
    ]
  }
}
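
Since the kola tests are written in Go, the same entry could also be generated rather than embedded as a string. This is only a sketch under assumptions: the struct types below are a hand-written subset of the Ignition 3.3.0 schema, not the official Ignition Go types, and `wipeRootConfig` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filesystem is a hand-written subset of an Ignition 3.3.0 filesystems entry.
type filesystem struct {
	Device         string `json:"device"`
	Format         string `json:"format"`
	WipeFilesystem bool   `json:"wipeFilesystem"`
}

// config holds only the fields needed for this example.
type config struct {
	Ignition struct {
		Version string `json:"version"`
	} `json:"ignition"`
	Storage struct {
		Filesystems []filesystem `json:"filesystems"`
	} `json:"storage"`
}

// wipeRootConfig builds a config that wipes the partition labeled ROOT
// without creating a new filesystem on it (format "none").
func wipeRootConfig() (string, error) {
	var c config
	c.Ignition.Version = "3.3.0"
	c.Storage.Filesystems = []filesystem{{
		Device:         "/dev/disk/by-partlabel/ROOT",
		Format:         "none",
		WipeFilesystem: true,
	}}
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	s, err := wipeRootConfig()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s)
}
```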

Author

Yes, that works. I changed it as you suggested.

// between this test and how the network is setup for the VMs, and also it might bind Tang to an interface with a public IP.
// An alternative approach would be to add another TAP interface to the bridge and let the Tang server bind there, but that would
// require the Tang setup to happen outside of these tests and introducing more complexity in different parts of the code base.
// I'll decide whether to rewrite this or leave it as it is based on feedback by reviewers.
Member

With Platforms: []string{"qemu"}, that seems ok

Author

Okay, I removed the TODO comment.

"units": [{
"name": "remove-root-label.service",
"enabled": true,
"contents": "[Service]\nType=oneshot\nExecStart=wipefs -a /dev/vda9\n[Install]\nWantedBy=multi-user.target"
Member

(same comment as above)

@simoncampion
Author

I added to the tests the kernel arguments that are now required to enable Clevis unlock in the initramfs (see the thread here).

I also fixed the test failures caused by systemd-cryptsetup@rootencrypted.service failing in late userspace. The unit fails there because the disk has already been unlocked in early userspace, so the tests now mask systemd-cryptsetup@rootencrypted.service.
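
One way to mask a unit from an Ignition config is the `mask` field of a `systemd.units` entry. The sketch below only illustrates that field; it is an assumption that the tests mask the unit this way, and the `unit` type and `maskedUnitJSON` helper are hypothetical names rather than code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unit is a hand-written subset of an Ignition systemd unit entry.
type unit struct {
	Name string `json:"name"`
	Mask bool   `json:"mask"`
}

// maskedUnitJSON renders a unit entry that asks Ignition to mask the named unit.
func maskedUnitJSON(name string) (string, error) {
	out, err := json.Marshal(unit{Name: name, Mask: true})
	return string(out), err
}

func main() {
	s, err := maskedUnitJSON("systemd-cryptsetup@rootencrypted.service")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s)
}
```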

@pothos
Member

pothos commented Mar 14, 2024

Oh, the Go compiler is not yet happy about net.FlagRunning not being defined

@simoncampion
Author

Oh, the Go compiler is not yet happy about net.FlagRunning not being defined

That symbol was introduced in Go 1.20, but the pipeline used Go 1.19. I suppose the pipeline sees that the go.mod file only requires Go 1.19 and then runs that Go version. This issue does not arise when testing locally with the dockerized build because the Dockerfile uses docker.io/amd64/golang:1.21-bookworm as its base.

I bumped the Go version in go.mod to Go 1.20. If that's undesirable, I could alternatively rewrite the code to be Go 1.19-compatible.

@pothos pothos merged commit 9cf5351 into flatcar:flatcar-master Mar 14, 2024
@pothos
Member

pothos commented Mar 14, 2024

Thanks, that should work then
