Upgrade Linux Kernel for main from 6.1.66 to 6.6.7 #1444

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

ader1990
Contributor

@ader1990 ader1990 commented Nov 29, 2023

To upgrade, the following changes were required:

  • added Changelog
  • switched to Linux kernel 6.6.7 sources
  • reverted pahole flags - the system halts otherwise with
    Linux kernel / initrd modules not found
  • removed the source symlink deletion, as the symlink
    is no longer generated
  • updated or removed Linux kernel configs:
    • CONFIG_AUTOFS4_FS -> renamed to CONFIG_AUTOFS_FS
    • CONFIG_IXGB -> renamed to CONFIG_IXGB
    • CONFIG_EDAC_I5000 -> CONFIG_BROKEN
    • CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER -> removed
    • CONFIG_IP_NF_TARGET_CLUSTERIP -> removed
    • CONFIG_MICROCODE_AMD -> removed
    • CONFIG_NET_SCH_CBQ -> removed
    • CONFIG_NET_SCH_DSMARK -> removed
    • CONFIG_NFT_OBJREF -> removed

Fixes: flatcar/Flatcar#1266
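
A quick way to audit a config file for the symbols listed above is sketched below. This is only an illustration: the function name `stale_config_symbols` and the config file path you pass to it are assumptions, not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every given CONFIG_ symbol that is still
# mentioned in a kernel config file, either set or explicitly unset.
stale_config_symbols() {
    local config_file=$1
    shift
    local sym
    for sym in "$@"; do
        # Match both "CONFIG_FOO=..." and "# CONFIG_FOO is not set".
        if grep -Eq "^(# )?${sym}(=| is not set)" "$config_file"; then
            printf '%s\n' "$sym"
        fi
    done
}
```

Calling it with the old names (for example `CONFIG_AUTOFS4_FS CONFIG_NET_SCH_CBQ`) against a config file prints the ones that still need to be renamed or dropped.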

@ader1990
Contributor Author

ader1990 commented Nov 29, 2023

When I run ebuild-amd64-usr ../third_party/coreos-overlay/sys-kernel/coreos-modules/coreos-modules-6.6.3.ebuild package, I get this error:

# XZ      /build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/image/usr/lib/debug/usr/lib/modules/6.6.3-flatcar/kernel/virt/lib/irqbypass.ko.xz
  xz --check=crc32 --lzma2=dict=1MiB -f /build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/image/usr/lib/debug/usr/lib/modules/6.6.3-flatcar/kernel/virt/lib/irqbypass.ko
# DEPMOD  /build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/image/usr/lib/debug/usr/lib/modules/6.6.3-flatcar
  ../source/scripts/depmod.sh 6.6.3-flatcar
make[1]: Leaving directory '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/work/coreos-modules-6.6.3/build'
make: Leaving directory '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/work/coreos-modules-6.6.3/source'
rm: cannot remove '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/image/usr/lib/debug/usr/lib/modules/6.6.3-flatcar/source': No such file or directory
 * ERROR: sys-kernel/coreos-modules-6.6.3::coreos failed (install phase):
 *   (no error message)
 *
 * Call stack:
 *     ebuild.sh, line 136:  Called src_install
 *   environment, line 2449:  Called die
 * The specific snippet of code:
 *       rm "${D}/usr/lib/debug/usr/lib/modules/${KV_FULL}/"{source,build} || die;
 *
 * If you need support, post the output of `emerge --info '=sys-kernel/coreos-modules-6.6.3::coreos'`,
 * the complete build log and the output of `emerge -pqv '=sys-kernel/coreos-modules-6.6.3::coreos'`.
 * The complete build log is located at '/build/amd64-usr/var/log/portage/sys-kernel:coreos-modules-6.6.3:20231129-205117.log'.
 * For convenience, a symlink to the build log is located at '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/temp/build.log'.
 * The ebuild environment file is located at '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/temp/environment'.
 * Working directory: '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/work/coreos-modules-6.6.3'
 * S: '/build/amd64-usr/var/tmp/portage/sys-kernel/coreos-modules-6.6.3/work/coreos-modules-6.6.3'

Unfortunately, the complete build log is not much help: it contains the same output, and this is the only error message in it.

@ader1990 ader1990 changed the title Switch to Linux kernel 6.6 Upgrade Linux Kernel for main from 6.1.63 to 6.6.3 Nov 29, 2023
@krnowak
Member

krnowak commented Nov 30, 2023

You could probably add something like ls -la "${D}/usr/lib/debug/usr/lib/modules/${KV_FULL}/" to src_install and see what the contents are. Or, if you are building locally, you could check the directory yourself - it should still be there.
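
The debug listing, combined with a removal that tolerates a missing symlink, could be sketched as follows. The function name `remove_module_symlinks` is hypothetical, and inside an ebuild the `return 1` would more likely be `die`; this is not the change that was eventually made in the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list a modules directory, then remove the
# "source" and "build" symlinks only if they actually exist.
remove_module_symlinks() {
    local modules_dir=$1
    local name
    # Show what is really there, as suggested above.
    ls -la "$modules_dir"
    for name in source build; do
        # -L is true only for an existing symlink, so a name that the
        # kernel install step did not create is skipped silently.
        if [ -L "$modules_dir/$name" ]; then
            rm "$modules_dir/$name" || return 1
        fi
    done
}
```

With this shape, a kernel that no longer creates the source symlink does not fail the install phase, while a genuine rm failure still does.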


github-actions bot commented Nov 30, 2023

Build action triggered: https://github.com/flatcar/scripts/actions/runs/7260296103

@ader1990
Contributor Author

You could probably add something like ls -la "${D}/usr/lib/debug/usr/lib/modules/${KV_FULL}/" to src_install and see what the contents are. Or, if you are building locally, you could check the directory yourself - it should still be there.

I have tried this patch, ader1990@fe64043, but it seems to hit the same issue: the symlink named source is not present there. I will try again to get this unblocked.

Thanks.

@ader1990 ader1990 force-pushed the switch_to_kernel_6_6 branch from 0319a3c to d1ac96c Compare December 4, 2023 15:39
@ader1990
Contributor Author

ader1990 commented Dec 4, 2023

@krnowak I have a successful build:

core@localhost ~ $ uname -a
Linux localhost 6.6.3-flatcar

I need to iron out the following:

  • test arm64 locally (builds fine, but did not run the qemu-vm)
  • update .3 to .4
  • make sure the config flags changes are correct from a functional perspective
  • pahole patch needs to be made smaller (to not remove all the newest bits, just the -j flag)
  • figure out why the source symlink does not exist anymore

Can you please trigger the actions to see if there are some issues with the functional tests?

Thanks.

@ader1990 ader1990 force-pushed the switch_to_kernel_6_6 branch from d1ac96c to 45284f5 Compare December 4, 2023 15:44
@ader1990 ader1990 marked this pull request as ready for review December 4, 2023 15:44
@ader1990
Contributor Author

ader1990 commented Dec 5, 2023

There is an upstream issue related to the bcc warnings for Linux kernel 6.6 -> iovisor/bcc#4251. On Linux kernel 6.5, the warnings are not present.

2023-12-04T19:31:37.2287547Z --- FAIL: bpf.execsnoop (55.58s)
2023-12-04T19:31:37.2289460Z         cluster.go:125: Unable to find image 'quay.io/iovisor/bcc:latest' locally
2023-12-04T19:31:37.2290290Z         cluster.go:125: latest: Pulling from iovisor/bcc
2023-12-04T19:31:37.2290861Z         cluster.go:125: 68e7bb398b9f: Pulling fs layer
2023-12-04T19:31:37.2291377Z         cluster.go:125: aa73686fdcd4: Pulling fs layer
2023-12-04T19:31:37.2291905Z         cluster.go:125: 7e88b7dfb0a8: Pulling fs layer
2023-12-04T19:31:37.2294635Z         cluster.go:125: aa73686fdcd4: Verifying Checksum
2023-12-04T19:31:37.2295190Z         cluster.go:125: aa73686fdcd4: Download complete
2023-12-04T19:31:37.2295724Z         cluster.go:125: 68e7bb398b9f: Verifying Checksum
2023-12-04T19:31:37.2296255Z         cluster.go:125: 68e7bb398b9f: Download complete
2023-12-04T19:31:37.2296788Z         cluster.go:125: 7e88b7dfb0a8: Verifying Checksum
2023-12-04T19:31:37.2297341Z         cluster.go:125: 7e88b7dfb0a8: Download complete
2023-12-04T19:31:37.2297854Z         cluster.go:125: 68e7bb398b9f: Pull complete
2023-12-04T19:31:37.2298347Z         cluster.go:125: aa73686fdcd4: Pull complete
2023-12-04T19:31:37.2298831Z         cluster.go:125: 7e88b7dfb0a8: Pull complete
2023-12-04T19:31:37.2299580Z         cluster.go:125: Digest: sha256:63f8262abfa9e8fc531f23c960b27736c75d1f13fff20d9c34d3387391232dd9
2023-12-04T19:31:37.2300480Z         cluster.go:125: Status: Downloaded newer image for quay.io/iovisor/bcc:latest
2023-12-04T19:31:37.2301191Z         cluster.go:125: In file included from /virtual/main.c:14:
2023-12-04T19:31:37.2302085Z         cluster.go:125: In file included from include/uapi/linux/ptrace.h:183:
2023-12-04T19:31:37.2302830Z         cluster.go:125: In file included from arch/x86/include/asm/ptrace.h:5:
2023-12-04T19:31:37.2303568Z         cluster.go:125: In file included from arch/x86/include/asm/segment.h:7:
2023-12-04T19:31:37.2304900Z         cluster.go:125: arch/x86/include/asm/ibt.h:77:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes]
2023-12-04T19:31:37.2306036Z         cluster.go:125: extern __noendbr u64 ibt_save(bool disable);
2023-12-04T19:31:37.2306566Z         cluster.go:125:        ^
2023-12-04T19:31:37.2307279Z         cluster.go:125: arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr'
2023-12-04T19:31:37.2308091Z         cluster.go:125: #define __noendbr       __attribute__((nocf_check))
2023-12-04T19:31:37.2308719Z         cluster.go:125:                                        ^
2023-12-04T19:31:37.2309933Z         cluster.go:125: arch/x86/include/asm/ibt.h:78:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes]
2023-12-04T19:31:37.2311061Z         cluster.go:125: extern __noendbr void ibt_restore(u64 save);
2023-12-04T19:31:37.2311588Z         cluster.go:125:        ^
2023-12-04T19:31:37.2312300Z         cluster.go:125: arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr'
2023-12-04T19:31:37.2313113Z         cluster.go:125: #define __noendbr       __attribute__((nocf_check))
2023-12-04T19:31:37.2313737Z         cluster.go:125:                                        ^
2023-12-04T19:31:37.2314253Z         cluster.go:125: 2 warnings generated.
2023-12-04T19:31:37.2315181Z         bpf.go:123: Unable to find 'docker ps' log lines in execsnoop logs: Transient error: stream should not log to 'stderr'

The bcc Docker image quay.io/iovisor/bcc:latest is more than two years old, and no upstream Docker image seems to be maintained. I will look for an alternative or open an issue upstream to ask for one.
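
The test fails because the compiler warnings end up on stderr ("stream should not log to 'stderr'"). A minimal sketch for checking the same condition locally is shown below; the function name `stderr_is_clean` is an assumption and is not part of the kola test suite.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, discard its stdout, and succeed
# only if the command wrote nothing to stderr.
stderr_is_clean() {
    local err
    # Order matters: 2>&1 first points stderr at the capture pipe,
    # then >/dev/null sends stdout away. "|| true" keeps a non-zero
    # exit status of the command from aborting under "set -e".
    err=$("$@" 2>&1 >/dev/null) || true
    [ -z "$err" ]
}
```

Passing the command that runs execsnoop to this function returns non-zero as long as the 'nocf_check' warnings are emitted.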

@ader1990
Contributor Author

ader1990 commented Dec 5, 2023

@krnowak I have a successful build:

core@localhost ~ $ uname -a
Linux localhost 6.6.3-flatcar

I need to iron out the following:

  • test arm64 locally (builds fine, but did not run the qemu-vm)
  • update .3 to .4
  • make sure the config flags changes are correct from a functional perspective
  • pahole patch needs to be made smaller (to not remove all the newest bits, just the -j flag)
  • figure out why the source symlink does not exist anymore

Can you please trigger the actions to see if there are some issues with the functional tests?

Thanks.

The source symlink was removed by this upstream commit: torvalds/linux@d8131c2

There are two options here: revert the commit (at least 3 upstream Linux kernel commits, as two more commits were made on top of that code), or change the Flatcar kernel build .ebuild/.eclass files. @krnowak is there a preferred way? Currently, I can build the kernel using a .patch file on top of the kernel sources that reverts those 3 commits.

@ader1990
Contributor Author

ader1990 commented Dec 5, 2023

How to fix the bcc warnings that break the tests: build a new Docker image on top of the bcc Docker build image:

git clone https://github.com/iovisor/bcc
cd bcc

docker build --tag ader1990:bcc .

docker run --privileged -it -v /lib/modules:/lib/modules ader1990:bcc /usr/share/bcc/tools/execsnoop -n docker -l ps

Content of the Dockerfile:

FROM ghcr.io/iovisor/bcc:ubuntu-20.04 as builder

ARG BUILD_TYPE=release
ARG DEBIAN_FRONTEND=noninteractive

COPY ./ /root/bcc

WORKDIR /root/bcc

RUN cd /root/bcc && mkdir build && cd build && LLVM_ROOT=/usr/lib/llvm-15 cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_LLVM_NATIVECODEGEN=ON -DPYTHON_CMD=python3 ..
RUN cd /root/bcc/build && make -j16
RUN cd /root/bcc/build && make install
RUN cd /root/bcc/build/src/python/ && make -j16 && make install
RUN rm -rf /root/bcc/build

@pothos
Member

pothos commented Dec 15, 2023

Needs a rebase.

I did a local build and tested the image, seems to work :) We can also do a full test run for all platforms to see if any hardware configs require changes. There is also a detailed image change report which we can review for potentially missing kernel modules.

@pothos
Member

pothos commented Dec 15, 2023

The mantle kola changes are in, after a rebase the tests should pass.

@ader1990 ader1990 force-pushed the switch_to_kernel_6_6 branch from 160b33e to c658667 Compare December 15, 2023 21:49
@ader1990
Contributor Author

The mantle kola changes are in, after a rebase the tests should pass.

Hello, I have done the rebase. Remaining to-dos: update the kernel build process to get rid of the currently required Makefile revert patches, and bump the minor version to the latest one. If you can start the tests, that would be great, as the changes still needed are mostly cosmetic.

Thank you.

@ader1990
Contributor Author

The mantle kola changes are in, after a rebase the tests should pass.

I have also retested the code locally, and there might be an issue with the kernel headers still being 6.1. Can you please trigger the workflow to see if the issue also occurs with GitHub Actions?

Thank you.

@ader1990
Contributor Author

ader1990 commented Dec 18, 2023

The mantle kola changes are in, after a rebase the tests should pass.

I have also retested the code locally, and there might be an issue with the kernel headers still being 6.1. Can you please trigger the workflow to see if the issue also occurs with GitHub Actions?

Thank you.

I see that the installed kernel-headers are 6.1 when I run ./build_packages && ./build_image in a clean environment.
After that, if I run emerge-amd64-usr sys-kernel/kernel-headers manually, 6.6 gets packaged, and on a subsequent build_packages run, 6.1 gets removed from /build/amd64-usr/packages/sys-kernel/. Maybe this is a problem with the SDK I use, and once this PR is merged and the new SDK appears, the problem won't be inherited?

@pothos
Member

pothos commented Dec 18, 2023

The kernel headers issue could be a problem with emerge dependencies - I would remove all old kernel header packages to avoid running into the downgrade.

Started a build and test run for all cloud platforms
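
Before removing old kernel header packages, it can help to list which ones are left over. The sketch below assumes that binary package file names start with `kernel-headers-<version>`; that naming and the function name `list_stale_kernel_headers` are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print kernel-headers package files in a directory
# whose version does not belong to the wanted series (for example 6.6).
list_stale_kernel_headers() {
    local pkg_dir=$1 wanted=$2
    local f base
    for f in "$pkg_dir"/kernel-headers-*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        base=${f##*/}                    # strip the directory part
        case $base in
            kernel-headers-"$wanted"[.-]*) ;;   # wanted series, keep
            *) printf '%s\n' "$base" ;;         # anything else is stale
        esac
    done
}
```

For example, `list_stale_kernel_headers /build/amd64-usr/packages/sys-kernel 6.6` would print any remaining 6.1 header packages without deleting them.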

@ader1990
Contributor Author

@ader1990 ader1990 force-pushed the switch_to_kernel_6_6 branch from 6d630f7 to 9584954 Compare December 18, 2023 22:11
@ader1990 ader1990 changed the title Upgrade Linux Kernel for main from 6.1.63 to 6.6.3 Upgrade Linux Kernel for main from 6.1.66 to 6.6.7 Dec 18, 2023
@ader1990 ader1990 force-pushed the switch_to_kernel_6_6 branch from 9584954 to f1c8d36 Compare December 18, 2023 22:18
@pothos
Member

pothos commented Dec 19, 2023

All cloud tests passed for yesterday's state (6.6.3)

Member

@pothos pothos left a comment

Thank you 👍

Successfully merging this pull request may close these issues.

[RFE] Update kernel in main to linux-6.6