Kernel is not reproducible #734

Closed
3 of 7 tasks
tlaurion opened this issue Jun 3, 2020 · 15 comments · Fixed by #1630
Labels: Bounty/Donations expected (Work could/should be funded by interested stakeholder), buildsystem, help wanted, linux

@tlaurion
Collaborator

tlaurion commented Jun 3, 2020

@osresearch @BlackMaria @MrChromebox @flammit @daym

First step accomplished:

Originally posted by @tlaurion in #571 (comment)

@tlaurion
Collaborator Author

tlaurion commented Jun 7, 2020

@daym?

@daym
Collaborator

daym commented Jun 7, 2020

Still at it!

@tlaurion
Collaborator Author

tlaurion commented Jul 22, 2020

@daym
Collaborator

daym commented Aug 2, 2020

Converting a bzImage to an ELF image via https://github.com/marin-m/vmlinux-to-elf worked. diffoscope takes almost an hour on the result--but it worked. Seems that the main difference is that the start of the program headers moved--everything else seems okay enough (changes a lot of address references in instructions though). The reason is a size change in the section .init.data that grew by 32 Bytes.

│ - Start of program headers: 14573310 (bytes into file)
│ + Start of program headers: 14573342 (bytes into file)
│ Start of section headers: 64 (bytes into file)

├── readelf --wide --decompress --hex-dump=.init.data {}
│ @@ -31872,21 +31872,23 @@
│    0xffffffff81ede7d0 1569e481 ffffffff 7880e481 ffffffff .i......x.......
│    0xffffffff81ede7e0 e7abe481 ffffffff 75b6e481 ffffffff ........u.......
│    0xffffffff81ede7f0 deb9e481 ffffffff fb03e581 ffffffff ................
│    0xffffffff81ede800 4525e581 ffffffff bec83381 ffffffff E%........3.....
│    0xffffffff81ede810 b766e581 ffffffff 507be581 ffffffff .f......P{......
│    0xffffffff81ede820 bf92e581 ffffffff f0cce581 ffffffff ................
│    0xffffffff81ede830 6d782c81 ffffffff 2a0be581 ffffffff mx,.....*.......
│ -  0xffffffff81ede840 aa13e581 ffffffff 1f8b0800 00000000 ................
│ -  0xffffffff81ede850 0203ed8f b10e8230 1086fb2a 7d00e00e .......0...*}...
│ -  0xffffffff81ede860 6881b1b6 34317122 ee260283 49958168 h...41q".&..I..h
│ -  0xffffffff81ede870 7c7cab01 d4c69a8e 0e7ed37f e935dffd ||.......~...5..
│ -  0xffffffff81ede880 50400108 c0b8c894 064b8e12 3ca4ac46 P@.......K..<..F
│ -  0xffffffff81ede890 a62ae92c 683505f4 fccbe7d0 f5174208 .*.,h5........B.
│ -  0xffffffff81ede8a0 2cce2a63 0f678aa5 cf89814e e6ccf2c5 ,.*c.g.....N....
│ -  0xffffffff81ede8b0 99b4c369 1c4cefb8 85b8bf0b d4faab5b ...i.L.........[
│ -  0xffffffff81ede8c0 4235cd45 60dfa58c 39ec13d3 45d66b53 B5.E`...9...E.kS
│ -  0xffffffff81ede8d0 1b8fc39b 9fab20ff b37ba01f 5dfff13c ...... ..{..]..<
│ -  0xffffffff81ede8e0 9ae85af2 1dcfed0d 317eb827 88d03d97 ..Z.....1~.'..=.
│ -  0xffffffff81ede8f0 d51cb68d 586fea86 524afefc 04379ebd ....Xo..RJ...7..
│ -  0xffffffff81ede900 2ad20004 00000000 be000000 00000000 *...............
│ +  0xffffffff81ede840 aa13e581 ffffffff fd377a58 5a000001 .........7zXZ...
│ +  0xffffffff81ede850 6922de36 02002101 10000000 a8708e86 i".6..!......p..
│ +  0xffffffff81ede860 e003ff00 a25d0018 0ddd0463 83217e83 .....].....c.!~.
│ +  0xffffffff81ede870 e9ad6073 a46c4b93 d2e28c48 c9638c79 ..`s.lK....H.c.y
│ +  0xffffffff81ede880 21813c4b 1d832799 8ef5afc4 2eb2b110 !.<K..'.........
│ +  0xffffffff81ede890 993291e7 2a527da9 cf1a34e1 ee610d82 .2..*R}...4..a..
│ +  0xffffffff81ede8a0 7ee2294e dff44daa b0cfa8cc 47b5afc8 ~.)N..M.....G...
│ +  0xffffffff81ede8b0 80c892c9 cb28ec4d 63d40058 9b49c478 .....(.Mc..X.I.x
│ +  0xffffffff81ede8c0 3b563eef 55cf8e73 937bfcb3 17d1a531 ;V>.U..s.{.....1
│ +  0xffffffff81ede8d0 336e08a0 50392bd5 e50b17de 2108e228 3n..P9+.....!..(
│ +  0xffffffff81ede8e0 d0e29fbe b4be3a91 53d3cf1f 0ec849c7 ......:.S.....I.
│ +  0xffffffff81ede8f0 92cd1337 d3302293 807d35d8 18d7a2f5 ...7.0"..}5.....
│ +  0xffffffff81ede900 fa3d5537 11d4178a 00000000 9ebd2ad2 .=U7..........*.
│ +  0xffffffff81ede910 0001ba01 80080000 fd2a2ea6 3e300d8b .........*..>0..
│ +  0xffffffff81ede920 02000000 0001595a e0000000 00000000 ......YZ........

See also https://lwn.net/Articles/531148/ for the meaning of .init.data.

nm debugging symbols are the same for both versions of bzImage.

I'd try using the same version of elfutils on both build hosts (it is used for the "linux" module; check which version is in use). Maybe that fixes it, though I doubt it. Linux uses elfutils with CONFIG_UNWINDER_ORC to generate ORC metadata. Even though the ORC sections are entirely equal between the two builds, some init data might still be generated from them.
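
As a quick sanity check on the numbers above, the shift in the program header offset can be compared with the reported growth of .init.data. This is a minimal sketch that uses only the two offsets from the diffoscope output; the variable names are illustrative.

```shell
# Offsets copied from the "Start of program headers" lines in the diff above.
old_phoff=14573310
new_phoff=14573342

# Shell arithmetic: how far did the program headers move?
delta=$((new_phoff - old_phoff))
printf 'program headers moved by %d bytes (0x%x)\n' "$delta" "$delta"
```

The difference is 32 bytes (0x20), the same amount by which .init.data grew.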

@krystian-hebel
Contributor

├── readelf --wide --decompress --hex-dump=.init.data {}
│ @@ -31872,21 +31872,23 @@
│    0xffffffff81ede7d0 1569e481 ffffffff 7880e481 ffffffff .i......x.......
│    0xffffffff81ede7e0 e7abe481 ffffffff 75b6e481 ffffffff ........u.......
│    0xffffffff81ede7f0 deb9e481 ffffffff fb03e581 ffffffff ................
│    0xffffffff81ede800 4525e581 ffffffff bec83381 ffffffff E%........3.....
│    0xffffffff81ede810 b766e581 ffffffff 507be581 ffffffff .f......P{......
│    0xffffffff81ede820 bf92e581 ffffffff f0cce581 ffffffff ................
│    0xffffffff81ede830 6d782c81 ffffffff 2a0be581 ffffffff mx,.....*.......
│ -  0xffffffff81ede840 aa13e581 ffffffff 1f8b0800 00000000 ................
... snip...
│ +  0xffffffff81ede840 aa13e581 ffffffff fd377a58 5a000001 .........7zXZ...
... snip...

Those last bytes are two magic numbers for compressed files, the first one is for GZIP, and the second for XZ (see Wikipedia). It is a buffer used by function populate_rootfs, and as far as I'm aware it is built from files in /usr. The format of rootfs file should be controllable with CONFIG_INITRAMFS_COMPRESSION, which depends on INITRAMFS_SOURCE!="" here.

I do not know how it chooses the compression method if the option itself is disabled (I assume .config is identical in both cases, but I might be wrong about that). Then again, I looked primarily at the newer sources, and this part of the configuration has changed a couple of times. It seems that back in 4.14 it was impossible to use an uncompressed rootfs.
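
The magic-number comparison above can be scripted. The following is a minimal sketch, not part of the Heads build: detect_compression is a hypothetical helper name, and it assumes the compressed stream has already been extracted to its own file so that the magic bytes sit at offset 0.

```shell
# detect_compression FILE: print "gzip", "xz" or "unknown" based on the leading bytes.
detect_compression() {
    # Read the first six bytes as lowercase hex with separators removed.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)        echo gzip ;;     # gzip magic: 1f 8b
        fd377a585a00) echo xz ;;       # xz magic: fd 37 7a 58 5a 00
        *)            echo unknown ;;
    esac
}
```

Run against the two extracted buffers, this would be expected to print gzip for one build and xz for the other, matching the 1f8b0800 and fd377a58 5a00 bytes visible in the hex dump.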

@tlaurion
Collaborator Author

tlaurion commented Sep 24, 2020

@krystian-hebel Thank you so much for your analysis.
The linux-qemu config this build was made from is here: https://github.com/osresearch/heads/blob/9719510f395bcb54efc6b01d309184ba637e6f29/config/linux-qemu.config. It was generated with make savedefconfig, which is why the config file only contains differences from defconfig.

If one does the following for the same commit id 9719510, for which the ROM was given in a previous comment:

git checkout 9719510f395bcb54efc6b01d309184ba637e6f29
git reset --hard
(already built here)
cp config/linux-qemu.config build/linux-4.14.62/.config
cd build/linux-4.14.62
make menuconfig
(save)
(exit)

Then:

user@x230-master:~/heads/build/linux-4.14.62$ grep INITRAMFS_COMPRESSION .config
CONFIG_INITRAMFS_COMPRESSION=".xz"
user@x230-master:~/heads/build/linux-4.14.62$ 
user@x230-master:~/heads/build/linux-4.14.62$ make savedefconfig
scripts/kconfig/conf  --savedefconfig=defconfig Kconfig
user@x230-master:~/heads/build/linux-4.14.62$ grep INITRAMFS_COMPRESSION defconfig 
user@x230-master:~/heads/build/linux-4.14.62$

Both ROMs were produced with the same Linux config file.
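
The check above (a symbol that is present in .config but absent from the savedefconfig output) can be wrapped in a small helper. This is a minimal sketch; where_set is a hypothetical name, and the two file arguments are assumed to be the expanded .config and the minimal defconfig from the transcript.

```shell
# where_set SYMBOL FULL_CONFIG MINIMAL_CONFIG
# For each file, print the line that sets CONFIG_SYMBOL, or note that it is not listed.
where_set() {
    sym=$1
    for f in "$2" "$3"; do
        # The trailing "=" keeps CONFIG_FOO from also matching CONFIG_FOO_BAR.
        if grep -q "^CONFIG_${sym}=" "$f"; then
            printf '%s: %s\n' "$f" "$(grep "^CONFIG_${sym}=" "$f")"
        else
            printf '%s: CONFIG_%s not listed\n' "$f" "$sym"
        fi
    done
}
```

Called as where_set INITRAMFS_COMPRESSION .config defconfig in build/linux-4.14.62, it would report the value in .config and "not listed" for defconfig, which is the situation shown in the grep output above.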

@tlaurion
Collaborator Author

tlaurion commented Sep 24, 2020

@krystian-hebel
Actually, you might have been right. Why the selected behavior differed depending on the host build system is not yet clear to me, but here are the facts.

user@x230-master:~/heads$ git checkout 9719510f395bcb54efc6b01d309184ba637e6f29
user@x230-master:~/heads$ git reset --hard
user@x230-master:~/heads$ cp config/linux-qemu.config build/linux-4.14.62/.config
user@x230-master:~/heads$ cd build/linux-4.14.62/
user@x230-master:~/heads/build/linux-4.14.62$ make menuconfig
scripts/kconfig/mconf  Kconfig


*** End of the configuration.
*** Execute 'make' to start the build or try 'make help'.

[Screenshot: Builtin_InitramFs_Support_was_deactivated]
Now ticked.

[Screenshot: XZ_was_not_Enforced]
Save.

Then:

user@x230-master:~/heads/build/linux-4.14.62$ make savedefconfig
scripts/kconfig/conf  --savedefconfig=defconfig Kconfig
user@x230-master:~/heads/build/linux-4.14.62$ cp defconfig ../../config/linux-qemu.config 
user@x230-master:~/heads/build/linux-4.14.62$ git diff
diff --git a/config/linux-qemu.config b/config/linux-qemu.config
index bb49391..31bb1f6 100644
--- a/config/linux-qemu.config
+++ b/config/linux-qemu.config
@@ -25,6 +25,7 @@ CONFIG_INITRAMFS_SOURCE="../../../blobs/dev.cpio"
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
+CONFIG_INITRAMFS_COMPRESSION_XZ=y
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
 # CONFIG_SGETMASK_SYSCALL is not set
 # CONFIG_SYSFS_SYSCALL is not set
@@ -248,9 +249,7 @@ CONFIG_GENERIC_PHY=y
 CONFIG_DMI_SYSFS=y
 CONFIG_GOOGLE_FIRMWARE=y
 CONFIG_GOOGLE_MEMCONSOLE_X86_LEGACY=y
-# CONFIG_EXT2_FS is not set
 CONFIG_EXT4_FS=y
-CONFIG_EXT4_USE_FOR_EXT2=y
 CONFIG_XFS_FS=y
 # CONFIG_DNOTIFY is not set
 # CONFIG_INOTIFY_USER is not set

CONFIG_INITRAMFS_COMPRESSION_XZ=y is now explicit.

@tlaurion
Collaborator Author

tlaurion commented Sep 24, 2020

Same applies to all other linux configs for other boards it seems (tested for linux-x230.config also)

Time to make that explicit and retry the builds.
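
To find which board configs still lack the explicit setting, something like the following could be used. It is a sketch only: configs_missing is a hypothetical helper, and the linux-*.config naming is assumed from the two file names mentioned in this thread (linux-qemu.config and linux-x230.config).

```shell
# configs_missing SYMBOL DIR
# Print every DIR/linux-*.config that does not contain CONFIG_SYMBOL=y.
configs_missing() {
    sym=$1
    dir=$2
    for f in "$dir"/linux-*.config; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if ! grep -q "^CONFIG_${sym}=y" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

From the top of the Heads tree, configs_missing INITRAMFS_COMPRESSION_XZ config would list the files that still need the line added.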

tlaurion added a commit to tlaurion/heads that referenced this issue Sep 24, 2020
tlaurion added a commit to tlaurion/heads that referenced this issue Sep 24, 2020
tlaurion added a commit to tlaurion/heads that referenced this issue Oct 2, 2020
@tlaurion
Collaborator Author

tlaurion commented Oct 2, 2020

@tlaurion
Collaborator Author

tlaurion commented Oct 3, 2020

@krystian-hebel other ideas? Implementing the above changes still produces non-reproducible kernels, as shown under #850 (comment)

@krystian-hebel
Contributor

Issue remains the same, i.e. GZIP vs XZ.
Maybe do the same with RD_XZ? It is a long shot, but it is used here. However, unless I misunderstood this, it would mean that in some cases initramfs may be compressed twice, using two different methods (plus the compression of the kernel itself).

@krystian-hebel
Contributor

By manually dd-ing that initrd out I can confirm that CPIOs are identical in both cases:

$ xzcat test_ci.xz | cpio -t
dev
dev/console
lib/ld-
lib/ld-musl-x86_64.so.1
2 blocks

$ gzip -c -d test_gl.gz | cpio -t
dev
dev/console
lib/ld-
lib/ld-musl-x86_64.so.1
2 blocks

$ xzcat test.xz | sha256sum 
00e62b12c5519ff505e1ea37e7395a56c94e4ca9fc4ba1e04a04b73e50d2526e  -

$ gzip -c -d test_gl.gz | sha256sum 
00e62b12c5519ff505e1ea37e7395a56c94e4ca9fc4ba1e04a04b73e50d2526e  -

All differences in the code seem to be just in memory addresses of data and called functions. test_ci.xz is 0x20 (32) bytes bigger than test_gl.gz, but the offsets in the code differ by 0x30 and 0xf0 bytes, depending on whether the code accesses data before or after kernel decompression. These offsets seem strange, but they may be caused by the alignment of individual sections: the main XZ-compressed part of the kernel is not aligned, and because of compression its size does not depend on the size of the uncompressed data (not directly, anyway).
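
The manual extraction described above could look roughly like this. It is only a sketch: carve and payload_sha256 are hypothetical helpers, and the offset and length of the embedded archive have to be read off the hex dump by hand (they are not given in this thread).

```shell
# carve FILE OFFSET LENGTH
# Write LENGTH bytes of FILE, starting at byte OFFSET, to stdout.
carve() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# payload_sha256 DECOMPRESSOR [ARGS...]
# Decompress stdin with the given command and print the SHA-256 of the result.
payload_sha256() {
    "$@" | sha256sum | cut -d ' ' -f 1
}
```

For the gzip build the pipeline would be carve IMAGE OFFSET LENGTH | payload_sha256 gzip -dc, and for the XZ build the same with xzcat as the decompressor. Equal hashes mean the CPIO payloads are identical and only the compression wrapper differs.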

tlaurion added a commit to tlaurion/heads that referenced this issue Oct 4, 2020
…: expend linux's defconfig stored config under config/linux-qemu.config to test linuxboot#734 (comment) per @krystian-hebel comment (RD_XZ and all other known config params specify XZ as of now. Will be more easy to troubleshoot and less long to have artifacts.
@tlaurion
Collaborator Author

tlaurion commented Oct 4, 2020

@krystian-hebel Heads stores configs (coreboot, linux) as the result of make savedefconfig. I expanded that in the latest commit so that we have a clear view of which Linux config the bzImage artifact is built from, to better understand where the variation between builds on debian-10 (CircleCI) and Fedora-30 (GitlabCI) comes from.

So from commit tlaurion@653582e on, this PR builds only qemu-coreboot to speed up builds. We know the cpio is reproducible, and from past tests we know that only the kernel is not reproducible as of now (review hashes.txt and compare). The question remains why bzImage is not reproducible if musl-cross-make is used to build the kernel and is supposed to be the same on both hosts, building the same config with that musl-cross-make toolchain.

On that matter, tlaurion@653582e#diff-883dee0e42d0c32ec557a7bd837e6b67 shows what is "supposed" to be part of the make savedefconfig output: the savedefconfig result is on the left and the expanded, explicit version of the Linux config file is on the right. Let's see if the variations are still there, since we now know that these config options are set:

  • CONFIG_RD_XZ=y (was implicit in savedefconfig, now explicit in the full config from the last commit)
  • CONFIG_INITRAMFS_COMPRESSION_XZ=y (was explicit in savedefconfig, so it is not in the default Linux config for that Linux version)
  • CONFIG_KERNEL_XZ=y (was already explicit)
  • CONFIG_INITRAMFS_COMPRESSION=".xz" (was implicit)
  • CONFIG_XZ_DEC=y (was implicit)
  • CONFIG_XZ_DEC_TEST=m (was already explicit, but as a module, so it is neither packed nor loaded under Heads policy)

To produce this config, the following was done:

  • Expand the defconfig into an explicit Linux config:
cp config/linux-qemu.config build/linux-4.14.62/.config
cd build/linux-4.14.62/
make menuconfig
(save)
cp .config ../../config/linux-qemu.config
cd ~/heads
git add config/linux-qemu.config

Nothing else on that matter.

  • I removed all other boards of that PR from GitlabCI and CircleCI.
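
Once the config is expanded, the XZ-related options listed above can be pulled out of it in one pass, which makes the two CI hosts easier to compare. This is a minimal sketch; show_xz_options is a hypothetical name and the symbol list is just the six options named in this comment.

```shell
# show_xz_options CONFIG_FILE
# Print the lines that set (or explicitly unset) the XZ-related options discussed above.
show_xz_options() {
    grep -E '^(# )?CONFIG_(RD_XZ|INITRAMFS_COMPRESSION(_XZ)?|KERNEL_XZ|XZ_DEC|XZ_DEC_TEST)[= ]' "$1"
}
```

Running show_xz_options build/linux-4.14.62/.config after the build on each host and diffing the two outputs would show whether kbuild rewrote any of these values.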

@tlaurion
Collaborator Author

tlaurion commented Oct 4, 2020

The CircleCI build artifact's hash matches even previous CI builds (as expected):
13f03e823822fd854cbed6ddaadc9408d6626f5d32a1870fa7522242fea4d567 /root/project/build/qemu-coreboot/bzImage

@krystian-hebel
Contributor

For GitlabCI it also matches the previous build. Even with full, explicit config it seems to either a) choose a different path when parsing Makefiles, or b) re-parse configuration from Kconfig files on final make, after which .config is modified/"sanitized" in a different manner on those two CIs. This kind of narrows it down, I think the next logical step would be to see if the build system uses environment variables for compression stuff.
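
One way to start on that last step is to dump, on both CI hosts, the environment variables that the compressors and kbuild are documented to read, and then diff the output. This is a minimal sketch; whether any of these variables actually plays a role here is an assumption, and report_build_env is a hypothetical name. xz reads XZ_DEFAULTS and XZ_OPT, gzip reads GZIP, and kbuild reads the KBUILD_BUILD_* variables.

```shell
# report_build_env: print each variable of interest, or note that it is unset.
# printenv only sees exported variables, which is what a child make process would see.
report_build_env() {
    for name in XZ_DEFAULTS XZ_OPT GZIP \
                KBUILD_BUILD_TIMESTAMP KBUILD_BUILD_USER KBUILD_BUILD_HOST; do
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is unset\n' "$name"
        fi
    done
}
```

Saving the output of report_build_env as a build artifact on CircleCI and GitlabCI would make any difference between the two hosts visible alongside the bzImage hashes.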

4 participants