Create a jdk21u s390x Linux DevKit toolchain #3700

Closed
sxa opened this issue Mar 12, 2024 · 25 comments
Labels: aarch, enhancement, z-linux

Comments

@sxa
Member

sxa commented Mar 12, 2024

Parent: #3468

Similar to the aarch64 issue at #3519 this will cover the analysis required to create a devkit for JDK21+ on the Linux/s390x platform. Two options will be explored in parallel:

  • Creating a Devkit using Fedora (Should be supported by the devkit process as-is)
  • Creating a Devkit using RHEL using our ROSI subscription (May be more difficult as devkit does not explicitly support RHEL for this)

sxa added the enhancement and z-linux labels Mar 12, 2024
sxa self-assigned this Mar 12, 2024
github-actions bot added the aarch label Mar 12, 2024

@sxa
Member Author

sxa commented Mar 15, 2024

As part of this I'm trying to replicate the existing aarch64 centos7 devkit from @andrew-m-leonard's work in adoptium/ci-jenkins-pipelines#955, as well as trying on Fedora, so I'll log the findings. ABI here refers to our adoptopenjdk docker build images.

| Distro+arch | Notes |
|---|---|
| C7ABI aarch64 | Completed successfully |
| Centos7.6.1810 aarch64 [1] | Completed successfully |
| Fedora34 aarch64 [2] | GCC 11.3.0; undefined reference to `operator delete(void*, unsigned long)' in read-md.c. May be due to missing findutils. |
| Fedora39 aarch64 [3] | configure: error: Link tests are not allowed after GCC_NO_EXECUTABLES |
| RH7API s390x | Completed successfully |
| Fedora34 s390x | error: Pthreads are required to build libgomp in gcc 11.3.0 build |
| Fedora34 s390x (Target F27) | Completed successfully |
| Fedora34 s390x (Target F19) | Completed successfully |
| Fedora39 s390x | GCC_NO_EXECUTABLES error |
| Fedora39 s390x (Target F19) | Issue with gdb? |

[1] - Uses gcc 4.8.5
[2] - Uses gcc 11.3.1
[3] - Uses gcc 13.2.1

NOTE 1: It should be possible to download the prerequisite packages on RHEL7 using yum reinstall --downloadonly --downloaddir=$PWD <package names>. These operations still have to be run as root so that yum can get the lock.

NOTE 2: Fedora 21 was the first one that has repositories available for aarch64 so you cannot build an earlier devkit on there. I have successfully built one on aarch64 based on Fedora 21 but that's no use to us. s390x was available before that so should be possible to build a RHEL7-compatible one on there.

NOTE 3: Between F27 and F28 the aarch64 port was moved out of the fedora-secondary repository so this needs to be accounted for in the devkit's Tools.gmk

NOTE 4: Packages that may or may not be required but were in my RHEL7 test system: libXinerama, libXinerama-devel, compat-glibc, compat-glibc-headers

NOTE 5: Attempting to build gcc 11.3 on Fedora 39 (outside the devkit) fails with error: multiple definition of enum fsconfig_command

NOTE 6: The platform detection on Fedora doesn't always seem to work and needed submakevars in the devkit Makefile adjusted to hard code HOST and BUILD to s390x-linux-gnu for anything to work (May only be Fedora 39)

@sxa
Member Author

sxa commented Mar 15, 2024

The important ones for the purposes of this issue are the RH7API s390x and Fedora34 s390x (Target F19) lines, which indicate we can build a devkit that targets glibc 2.17 suitable for RHEL7.
I will attempt to verify that they are usable next week.

@sxa
Member Author

sxa commented Mar 18, 2024

Summary of s390x devkits (For reference on the dockerhost machine it takes about 45 minutes to build a RH7 devkit, and around 10 minutes to build the JDK afterwards):

  • RH7 on RH7: Did not work - libstdc++.a(eh_throw.o)(.note.stapsdt+0x14): error: relocation refers to local symbol "" [9], which is defined in a discarded section
  • RH7 on RH7 without systemtap rpms available to devkit build: SUCCEEDED
  • F19 devkit on F34 (built on F34, without systemtap available during devkit build): Build failure in hotspot/os/linux/systemMemoryBarrier_linux.cpp:47:4: error: #error define SYS_membarrier for the arch followed by error: 'SYS_membarrier' was not declared in this scope
  • RH7 devkit on F34 : Same error as previous line
  • F19 devkit on RH7: SUCCEEDED (and java -version runs correctly)

@andrew-m-leonard
Contributor

excellent @sxa
Do you have a F19 DevKit I can try building a JDK with to compare...?

@andrew-m-leonard
Contributor

@sxa i'm having issues with the F19 devkit, it seems some of the executables are picking up GLIBC_2.33 ?

[root@s390x-kvm-016 openjdkbuild]# /root/devkit/bin/as --version
/root/devkit/bin/as: /usr/lib64/libc.so.6: version `GLIBC_2.33' not found (required by /root/devkit/bin/as)

@sxa
Member Author

sxa commented Mar 20, 2024

> @sxa i'm having issues with the F19 devkit, it seems some of the executables are picking up GLIBC_2.33 ?

Yeah it hasn't worked as expected - there are still some dependencies coming from the host system for that one unfortunately.
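
One way to catch this sort of host leakage before handing a devkit over is to list the highest GLIBC symbol version each devkit binary asks for and compare it with the 2.17 that RHEL7 provides. This is only a rough sketch: it assumes `objdump` from binutils is on the PATH, and the function names and the /root/devkit/bin path in the usage line are illustrative rather than anything from the devkit scripts.

```shell
#!/bin/sh
# Print the highest GLIBC_x.y version mentioned on stdin
# (intended input: the output of `objdump -T <binary>`).
max_glibc() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' \
        | sort -u -t. -k1,1n -k2,2n -k3,3n | tail -n 1
}

# Succeed if dotted version $1 is less than or equal to dotted version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

# Report every file in directory $1 that needs a newer glibc than $2 (default 2.17).
check_devkit_glibc() {
    limit=${2:-2.17}
    for f in "$1"/*; do
        need=$(objdump -T "$f" 2>/dev/null | max_glibc)
        [ -n "$need" ] || continue          # not a dynamic ELF file, skip it
        version_le "$need" "$limit" || echo "$f needs GLIBC_$need (limit $limit)"
    done
}
```

For example, `check_devkit_glibc /root/devkit/bin` would be expected to flag `as` in the devkit discussed above, since it requires GLIBC_2.33.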

@sxa
Member Author

sxa commented Mar 21, 2024

> • F19 devkit on RH7: SUCCEEDED (and java -version runs correctly)

It looks like the one I gave you was probably the one built with GCC11 on a Fedora34 host when I republished without SDT, and so it had some extra dependencies that it shouldn't have had. I'm now rebuilding with the RH7 system gcc (4.8.5), which should alleviate those errors with the binutils packages.

@andrew-m-leonard There is a new version in the same directory as the previous one, named F19devkitonF34.s390x.tar.xz, which you can try (I'm keeping the location private for now as I don't want others downloading it while this is still in flux).

@sxa
Member Author

sxa commented Mar 21, 2024

A bit more experimentation - the reason for the SYS_membarrier error is that Fedora 19's glibc-headers package doesn't define it in /usr/include/bits/syscall.h. If you add a definition set to 356 to that file (the value can be seen in asm/unistd_64.h in later versions of Fedora or CentOS9-stream), then openjdk will compile ok.
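
The header workaround described above could be scripted roughly as follows. This is a sketch only: the function name is made up for illustration, the value 356 is the s390x number quoted above (other architectures differ), and the location of the header inside a devkit sysroot is an assumption that would need checking.

```shell
#!/bin/sh
# Append a guarded SYS_membarrier definition to header $1
# unless the header already mentions SYS_membarrier.
# 356 is the s390x syscall number quoted in this comment.
ensure_membarrier_def() {
    hdr=$1
    if grep -q 'SYS_membarrier' "$hdr"; then
        return 0    # already defined, leave the header untouched
    fi
    printf '\n#ifndef SYS_membarrier\n#define SYS_membarrier 356\n#endif\n' >> "$hdr"
}
```

It would be called with the path to bits/syscall.h that the compiler actually picks up, for example `ensure_membarrier_def /usr/include/bits/syscall.h` on the build system. Running it twice is harmless because the second call finds the definition and does nothing.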

Other things being tried. Note that all devkits referenced here are built without systemtap-sdt:

| DK bld on | Devkit | jdk build on | Result |
|---|---|---|---|
| RH7bi | RH7 | RH7bi | |
| RH7bi | F19 | RH7bi | |
| F34 | F19 | F34 | |
| ? | RH7 | F34 | |
| F28 | F19 | F28 | ✅ [1] |
| F28 | F19 | RH7bi | ❌ [2] |

[1] - Requires glibc2.27 so will not run on a RHEL7 system
[2] - libstdc++ symbol issues. Can resolve with LD_LIBRARY_PATH=/lib64 but causes follow-on errors during compilation of adlc code.

Unfortunately the builds using a Fedora19 devkit are not binary identical to those built using a RHEL7 devkit. This is presumably due to things like glibc being at different patch levels, and it looks like Fedora 19 came with gcc 4.8.1 instead of the 4.8.5 in later CentOS7 releases (for reference, CentOS 7.0.1406 had gcc 4.8.2).

The only practical option for the devkit is to build and test it on the host the devkit was built on, which should be RHEL7 in the absence of a Fedora 19 system. The devkit can be from RHEL7 or Fedora 19.

And with that, I'm done :-)

@sxa
Member Author

sxa commented Mar 22, 2024

Additional tests building openjdk 21 with the built devkits:

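| devkit | host | identical [1] | Runs on RH7? |
|---|---|---|---|
| RH7 | RH7bi | | |
| RH7 | RH7bi | | |
| F19 | RH7bi | [2] | |
| RH7 | F34 | | |
| F19 | F34 | [2] | |
| RH7 | Ubu22.04 | | |
| F19 (From F28) | F34 | [3] | |
| F19 (From F28) | Ubu22.04 | [3] | |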

[1] - Identical here means that there are no binary differences in anything within the final JDK tarball when comparing with the initial build from the first row in the table.
[2] - The builds with the Fedora 19 devkit are not the same but java -version runs correctly on RHEL7
[3] - The builds from the F19 devkit built on Fedora 28 [4] are not comparable with the ones from an F19 devkit built on RHEL7. The jdk build does run on RHEL7, but it is NOT binary reproducible with the builds from the devkit used in the rows with footnote [2]. The two entries here with this footnote are binary identical to each other.
[4] - While this was done using a F19 devkit built on Fedora 28 it is assumed that building the same F19 devkit on a later distribution would yield the same results, but this has not been explicitly verified.

Conclusion: To build a reproducible JDK you need to use the same devkit - you cannot build a devkit on one host system and expect the results from it to be identical to an "equivalent" devkit built in another environment. Based on this, the preferred option in order to meet the goal of using a devkit to produce something that works on RHEL7 is to either use a RHEL7 devkit, or a Fedora19 devkit built on a well-defined fixed base OS. However, once we have the devkits produced we are not tied to running them on a RHEL7 system.
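
For anyone wanting to repeat the "identical" comparison, a check along these lines is one possible approach. It is not the exact method used for the table above: it assumes the two JDK tarballs have already been unpacked into separate directories, it relies on GNU find, sort, xargs and sha256sum, it only looks at regular files (not symlinks, permissions or timestamps), and the function names are illustrative.

```shell
#!/bin/sh
# Print one digest summarising the path and content of every
# regular file under directory $1.
tree_digest() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
        | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1 )
}

# Succeed only if both unpacked trees contain the same files
# with byte-for-byte identical content.
same_build() {
    a=$(tree_digest "$1") && b=$(tree_digest "$2") && [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as `same_build jdk-from-rh7-devkit jdk-from-f19-devkit || echo "builds differ"` (directory names hypothetical) gives a quick yes/no answer; a recursive diff is still needed to see which files differ.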

@sxa
Member Author

sxa commented Mar 22, 2024

Next steps:

  • Decide on the devkit to use
  • Decide how/where to create it
  • Incorporate it into the build process

Ideally the last of those will be including it into the build image as a replacement for the GCC11 that we currently install into /usr/local/gcc11

@sxa
Member Author

sxa commented Mar 22, 2024

Some notes from when I was experimenting in case they're useful to others:

  • to have a system suitable for building the devkit, install make git gcc-c++ wget xz bzip2 m4 cpio util-linux gnupg patch bison texinfo file diffutils
  • On Fedora you may also need gnupg, plus util-linux (for su), bzip2, diffutils and patch
  • If you're on a plain CentOS7 or similar system, you'll need to download and install GNU make 4.1 since the system one is too old
  • make TARGETS=s390x-linux-gnu BASE_OS=Fedora BASE_OS_VERSION=19
  • With Andrew's devkit patches applied, make TARGETS=s390x-linux-gnu BASE_OS=Centos BASE_OS_VERSION=7.6.1810
  • If on RHEL7 (you probably will be for s390x) create build/devkit/download/rpms/s390x-linux-gnu-Centos7.6.1810 and then run yum reinstall --downloadonly --downloaddir=build/devkit/download/rpms/s390x-linux-gnu-Centos7.6.1810 followed by the list of package names from Tools.gmk excluding systemtap-sdt: glibc glibc-headers glibc-devel cups-libs cups-devel libX11 libX11-devel xorg-x11-proto-devel alsa-lib alsa-lib-devel libXext libXext-devel libXtst libXtst-devel libXrender libXrender-devel libXrandr libXrandr-devel freetype freetype-devel libXt libXt-devel libSM libSM-devel libICE libICE-devel libXi libXi-devel libXdmcp libXdmcp-devel libXau libXau-devel libgcc libxcrypt zlib zlib-devel libffi libffi-devel fontconfig fontconfig-devel kernel-headers
  • To build the JDK on the resulting devkit you'll need some other packages (unless you're using a system set up with our playbooks): git make procps-ng openssl diffutils autoconf bzip2 curl unzip zip and ant if you want to do the SBoM creation.
  • You should then be able to build the JDK with:
    • git clone https://github.com/adoptium/temurin-build
    • cd temurin-build
    • ./makejdk-any-platform.sh --jdk-boot-dir <somewhere> --configure-args "--disable-warnings-as-errors --with-devkit=<somewhere>" --release --freetype-dir bundled jdk21u - you can also have --target-file-name something.tar.gz --build-variant temurin --create-sbom before the jdk21u at the end if you so desire.
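
The RHEL7 rpm pre-download step from the notes above could be wrapped up something like this. It is a sketch rather than part of the devkit makefiles: the `run` and `download_devkit_rpms` names and the DRY_RUN switch are invented here, and the real download still has to be run as root on a subscribed RHEL7 host.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Pre-download the devkit rpms so the devkit build can find them.
# $1 = download directory, remaining arguments = package names.
download_devkit_rpms() {
    dest=$1
    shift
    run mkdir -p "$dest"
    run yum reinstall --downloadonly --downloaddir="$dest" "$@"
}
```

Setting `DRY_RUN=1` and calling `download_devkit_rpms build/devkit/download/rpms/s390x-linux-gnu-Centos7.6.1810 glibc glibc-headers glibc-devel` prints the two commands without touching the system, which is handy for checking the package list before running it for real.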

@sxa
Member Author

sxa commented Mar 25, 2024

s390x devkit creation jobs (based on @andrew-m-leonard's branch with some prototype modifications from https://github.com/sxa/ci-jenkins-pipelines/commits/devkit_s390x_rhel):

  • Fedora 19 (Failing with a GPG error retrieving the rpms)
  • RHEL7 (restricted access, and currently requires the rpms to be pre-downloaded on the host in the default location)

Neither of these is currently running in a docker container (unlike on the other platforms). There will need to be extra work to allow that to happen, including switching the docker software used on the host back to the default docker from the RHEL repositories, and also having a way of making the RHEL7 packages accessible. The downloads can generally only be done as root, so this may need to be done as part of the Dockerfile that creates the images (maybe download them all to a known devkit_rpms location when the image is built?)

@sxa
Member Author

sxa commented Mar 26, 2024

> switching the docker software used on the host back to the default docker from the RHEL repositories

Prototyping this on build-marist-rhel79-s390x-2 which has had the following changes applied:

  • Registered via ROSI with the subscription-manager
  • Enable repo required for Xvfb in the static docker images: subscription-manager repos --enable rhel-7-for-system-z-optional-rpms
  • rpm -e docker-ce docker-ce-rootless-extras docker-ce-cli to remove the ones installed by the playbook
  • yum install docker
  • Verified that running the image with a volume mount seems to work ok, and that the UIDs are in sync
  • Now running a JDK21 build on the machine to verify correct operation.

Packages changed by switching from docker-ce to docker:
sxa@fedora:~/rhel7-s390x$ diff packages.dockerce packages.docker
163c163,166
< docker-ce-25.0.5-1.el7.s390x
---
> oci-umount-2.5-3.el7.s390x
> oci-register-machine-0-6.git2b44233.el7.s390x
> device-mapper-persistent-data-0.8.5-3.el7_9.2.s390x
> docker-common-1.13.1-210.git7d71120.el7_9.s390x
355c358,359
< docker-ce-cli-25.0.5-1.el7.s390x
---
> yajl-2.0.4-4.el7.s390x
> oci-systemd-hook-0.2.0-1.git05e6923.el7_6.s390x
473c477,482
< docker-ce-rootless-extras-25.0.5-1.el7.s390x
---
> lvm2-libs-2.02.187-6.el7_9.5.s390x
> python-pytoml-0.1.14-1.git7dea353.el7.noarch
> docker-rhel-push-plugin-1.13.1-210.git7d71120.el7_9.s390x
> lvm2-2.02.187-6.el7_9.5.s390x
> containers-common-0.1.40-12.el7_9.s390x
> docker-client-1.13.1-210.git7d71120.el7_9.s390x
621a631,635
> device-mapper-event-libs-1.02.170-6.el7_9.5.s390x
> device-mapper-event-1.02.170-6.el7_9.5.s390x
> atomic-registries-1.22.1-33.gitb507039.el7_8.s390x
> container-storage-setup-0.11.0-2.git5eaf76c.el7.noarch
> docker-1.13.1-210.git7d71120.el7_9.s390x
sxa@fedora:~/rhel7-s390x$ 

@sxa
Member Author

sxa commented Mar 26, 2024

Summary at end of Tuesday 25th:

  • Jobs created and tested which build an s390x devkit for Fedora 19 and RHEL7
  • RHEL7 docker image creation to automatically rebuild the images when required (While it's related to the work for the devkit, this is a general enhancement that we haven't had before - hence the reason the strace change has not yet taken effect). This job uses a RHEL7 devkit tarball which has to have been downloaded from the artifacts from the RHEL7 job from the first bullet point to the build machine host which is copied in during the docker image creation - it also pulls in the F19 one from the jenkins job for now while we are evaluating these changes. This process is subject to change ... The image creation can currently only be done on build-marist-rhel79-s390x-2 as this is the only one with the Red Hat docker package instead of docker-ce. This will also have pulled in strace so will fix
  • jdk21u build using a job created from the latest playbook changes, but NOT with the devkit enabled, to make sure I haven't broken anything.

Note 1: I had this machine running two executors, performing the docker image build and a build job in parallel. This caused the machine to fail both jobs and show "unable to fork" messages in the agent which were not immediately fixable. I have rebooted the machine and it has reconnected successfully.

Note 2: The Red Hat supplied docker package installs itself as a service, but does NOT automatically start it (either on install, or on reboot)

Note 3: I have not updated the playbooks in my infrastructure PR to switch over from docker-ce to docker.

@sxa
Member Author

sxa commented Mar 27, 2024

NOTE: The issues in this comment have been resolved, but I'm leaving this here for historic reference

> switching the docker software used on the host back to the default docker from the RHEL repositories
>
> Prototyping this on build-marist-rhel79-s390x-2

OK, that's causing problems. Build jobs and the dockerbuild image rebuild job are having issues with fork: resource temporarily unavailable
Also the jenkins agent is unhappy too and won't start properly once these errors occur, although I can still log in interactively with an ssh session and use the machine without problems.

Options:

  • Use podman instead of docker (although I suspect that will be too old to have adequate feature parity such as volume mounts, and we'll need to see if it gets the subscription information)
  • Do all this on a RHEL8 host instead of RHEL7 - we have https://ci.adoptium.net/computer/test%2Dmarist%2Drhel8%2Ds390x%2D2/
  • Go back to what I was doing before - docker-ce with the "hack" to register the image when it's being created. But I'd rather avoid that. It's a solution which is ok for the build itself though.

Changes on RHEL8 machine:

rpm -e docker-ce docker-ce-rootless-extras docker-ce-cli containerd.io
yum install docker
This installs podman with the docker wrapper around it, which seems adequate for building our images

The issue with resource temporarily unavailable seems to have been a jenkins issue with the agent. When I duplicated the node definition for build-marist-rhel79-s390x-2 to -3, deleted -2, then renamed -3 to -2 it worked ok. There was no clear reason why this was a problem. I've seen that solution work for machines in the past, but not for quite some time. We are therefore back on track and do not need to evaluate the other options, although the RHEL8 machine is now using podman instead of docker which is a change

@sxa
Member Author

sxa commented Mar 27, 2024

OK the "fix" mentioned in the previous comment didn't work. Despite the rename fixing the jenkins connection issues, the machine (build-2) still gave fork problems when running a build. I've switched it back to docker-ce [*] and am repeating the test to verify whether the problems are definitely related to the docker package or are an issue with this specific machine, which had been disabled previously. In parallel I will continue to run the tests using podman on the test-rhel8 machine.

[*] - rpm -e docker docker-common docker-client then yum install docker-ce

@andrew-m-leonard
Contributor

> OK the "fix" mentioned in the previous comment didn't work. Despite the rename fixing the jenkins connection issues, the machine (build-2) still gave fork problems when running a build. I've switched it back to docker-ce [*] and am repeating the test to verify whether the problems are definitely related to the docker package or are an issue with this specific machine, which had been disabled previously. In parallel I will continue to run the tests using podman on the test-rhel8 machine.
>
> [*] - rpm -e docker docker-common docker-client then yum install docker-ce

I tend to get fork resource errors on my local aarch64 VM, and I either have to reboot it, or build with fewer "jobs", e.g. --with-jobs=4

@sxa
Member Author

sxa commented Mar 27, 2024

Time for another table I think...

| Machine | containers | devkit build | rhel7 container build | JDK build |
|---|---|---|---|---|
| build-rhel79-1 | docker-ce | ok on host | Needs ROSI_ creds [*] | Good |
| build-rhel79-2 | docker | ok on host | Good | Failed (resource issue) [§] |
| build-rhel79-2 | docker-ce | Assume ok | Needs ROSI_ creds [*] | Good |
| test-rhel8-2 | podman | Expect OK in container | Good | jenkins UID mismatch [†] |

[*] The old ROSI (RHEL subscription) parameters are the ones that are planned for removal as part of https://github.com/adoptium/infrastructure/pull/3492/files#diff-80de47d21d528cc9398601b8acc0578d6415e61ca5b3aa94f8dc9c8f645c5adb which can be done as long as the host has a subscription and is using the RHEL supplied docker or podman. Up to now the build images have been rebuilt manually on the build machine with the ROSI credentials being explicitly supplied to allow it to run the playbooks.
[†] - This will need further investigation - the error message is consistent with when we have had UID mismatches before which are typically easy to resolve, but this is the first attempt to run with podman, and the UIDs on this host are the same as on the other ones (jenkins=1003)
[§] - The reason for the resource issue in these builds is unclear. I have successfully run a build from the command line as a container running from the jenkins user on these machines without encountering the same issue. It is possible that Andrew's suggestion of running with fewer concurrent jobs would help, but the machine should be of an adequate capacity (4 core, 16GB RAM). Reducing the number of concurrent processes from 4 to 2 did not seem to resolve the resource issues.

TL;DR of the current blockers: ideally we want one type of system that can perform all three actions. The gotchas are:

  • devkit build requires access to the RHEL7 rpms (Currently from a known location on the host with the current implementation since they can't run the rpm commands to download as that requires root access). It should be possible to do this in a container (if host is running docker/podman from RHEL and not docker-ce)
  • RHEL7 container build needs to have access to the RHEL yum repositories so needs to be running on RHEL host attached to the subscription manager. There is a "hack" to temporarily register it if it's on a docker-ce system, but ideally this should be on a RHEL machine running the RHEL docker/podman packages to be able to access the RHEL subscription with containers on that host. Note also that podman on RHEL8 prefixes built images with localhost/ which is different from docker on the RHEL7 machines
  • The build can likely run on any host assuming it gets the build image from the previous step copied across to it, but sticking with RHEL8 makes sense so it can run all the other parts of this process. Builds with the RHEL7-supplied docker appear unreliable, but are ok with docker-ce, which is what we have been using up until now.

PROPOSAL: Assuming RHEL7+docker cannot be made to work I think my preferred option would be to start doing the s390x builds on a RHEL8 host and deal with the changes required to achieve that. This would give us an experience comparable with the other platforms, where we are creating the devkit inside a container. The packages required for the devkit could be downloaded during the docker images creation, ready for use in the devkit build which could be done in the dockerfile or in the ansible playbooks in an adoptopenjdk role. This would require ensuring that the build processes can run correctly on a podman installation.
The second option would be to continue with RHEL7+docker-ce and the ROSI credentials hack, which is also feasible as an interim solution which we know works.

Related: adoptium/infrastructure#3217 (A place to store the images to assist distribution to build machines. We could store them as artifacts on the rhel7 job but they are large!)

@andrew-m-leonard
Contributor

andrew-m-leonard commented Mar 28, 2024

For test-rhel8-2 podman : Try changing this line https://github.com/adoptium/ci-jenkins-pipelines/blob/e297546378b5fbdb676223eb1ae2a0abe7406679/pipelines/build/common/openjdk_build_pipeline.groovy#L2049
to:

dockerRunArg += " --userns keep-id:uid=1002,gid=1003"

@sxa
Member Author

sxa commented Mar 28, 2024

Good shout - I hadn't realised that we had some special logic in there to handle this. It took a bit more effort since podman seems to take quite a while to start up when using extra options like that and therefore hits the 180 second timeout for the docker launch:

16:33:57  $ docker run -t -d -u 1002:1003 -e BUILDIMAGESHA= --userns keep-id:uid=1003,gid=1002 -w /home/jenkins/workspace/sxa-jdk21u-s390x -v /home/jenkins/workspace/sxa-jdk21u-s390x:/home/jenkins/workspace/sxa-jdk21u-s390x:rw,z -v /home/jenkins/workspace/sxa-jdk21u-s390x@tmp:/home/jenkins/workspace/sxa-jdk21u-s390x@tmp:rw,z [...] rhel7_build_image.20240327.3efa1fc6 cat
16:36:57  ERROR: Timeout after 180 seconds

I got past that by starting up a container with the same options manually, waiting 5-10 minutes for it to run (it feels like it's duplicating the image since it chews up space during that time) after which the images start near-instantly. It will need to be seen whether this causes a problem when the image is rebuilt.

To use RHEL8+podman as per the above proposal we need to either:

  1. Adjust the hard coding of uid=1000,gid=1000 to be 1002/1003
  2. Do something else to dynamically set the UID correctly in podman
  3. Change the UID of the jenkins user (and the rhel7_build_image) on the host system, probably via reprovisioning build-marist-rhel79-s390x-2 as a rhel8 system, prototyping on there with a new image, then look at migrating the second one.

I'm feeling that the reprovisioning option in 3 is good, subject to us being ok with only having one machine able to run the old stuff in the interim (Although the second one has been offline for a while anyway)
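
As a rough idea of what option 2 could look like, the mapping can be derived from the host account rather than hard coded. This sketch assumes a podman version that accepts the keep-id:uid=...,gid=... form shown in the commands in this thread; the helper name and the user name in the usage line are illustrative only.

```shell
#!/bin/sh
# Build the --userns argument from the uid and gid of host user $1.
userns_arg() {
    printf '%s keep-id:uid=%s,gid=%s\n' --userns "$(id -u "$1")" "$(id -g "$1")"
}
```

It could then be used as `podman run $(userns_arg jenkins) localhost/rhel7_build_image uname -a`, leaving the expansion unquoted so that the option and its value are passed as two separate words.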

Other random podman stuff discovered today

For bind mounts to work on my desktop system I need to add :z afterwards e.g. -v $PWD:/map:z

If SELinux is enabled (as it is by default on the Marist RHEL8 systems) then the above will not work. However as per Andrew's comments --userns keep-id:uid=1003,gid=1002 will be ok for the purposes of the build but still requires SELinux to be disabled in /etc/selinux/config

@sxa
Member Author

sxa commented Apr 2, 2024

Suggestion from Severin to use --userns=keep-id for the mapping in the future adoptium/ci-jenkins-pipelines#986 (comment)

@sxa
Member Author

sxa commented Apr 2, 2024

> 16:36:57 ERROR: Timeout after 180 seconds

Ref this - it seems to take about 5 minutes on the machine to complete the operation. It's not immediately obvious where that timeout is set. The 3 minutes seems to be from this PR and can be overridden by -Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=600 as per this comment, but that will require a restart, so we should do it in the next maintenance window.

EDIT: I've mitigated this by putting a docker run with those parameters into the docker image creation job which will take the initial hit. Noting that it is quite variable. The first time I ran it it was about 1m20, the second time after creating a new image it was longer:

+ podman run --userns keep-id:uid=1002,gid=1003 localhost/rhel7_build_image uname -a
Linux 42eadf970412 4.18.0-372.26.1.el8_6.s390x #1 SMP Sat Aug 27 02:37:44 EDT 2022 s390x s390x s390x GNU/Linux

real	12m24.754s
user	0m2.755s
sys	0m14.669s

@sxa
Member Author

sxa commented Apr 3, 2024

I was hitting a failure due to the RHEL7 devkit not being correctly extracted from the host. In this case when you point --with-devkit at the directory (which was present, but empty) it falls back to the system compiler in a somewhat non-obvious way since it fails to notice the absence of the devkit:

09:21:14  * Toolchain:      gcc (GNU Compiler Collection)
09:21:14  * Devkit:         /usr/local/devkit/s390x-on-s390x
09:21:14  * C Compiler:     Version 4.8.5 (at /usr/bin/gcc)
09:21:14  * C++ Compiler:   Version 4.8.5 (at /usr/bin/g++)

It will then fail to compile and towards the end show messages about untracked files, which have nothing to do with the underlying issue but did confuse me for a while:

09:21:37  	bootjdk.tar.gz
09:21:37  	bootjdk.tar.gz.sig
09:21:37  	jdk-20/
09:21:37  	security/

A "good" build with devkit will look something like this:

10:27:23  Tools summary:
10:27:23  * Boot JDK:       openjdk version "20.0.2-beta" 2023-07-18 OpenJDK Runtime Environment Temurin-20.0.2+9-202309010331 (build 20.0.2-beta+9-202309010331) OpenJDK 64-Bit Server VM Temurin-20.0.2+9-202309010331 (build 20.0.2-beta+9-202309010331, mixed mode, sharing) (at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-s390x-temurin/jdk-20)
10:27:23  * Toolchain:      gcc (GNU Compiler Collection)
10:27:23  * Devkit:         gcc-11.3.0 - Centos7.9.2009 (/usr/local/devkit/s390x-on-s390x.F19)
10:27:23  * C Compiler:     Version 11.3.0 (at /usr/local/devkit/s390x-on-s390x.F19/bin/gcc)
10:27:23  * C++ Compiler:   Version 11.3.0 (at /usr/local/devkit/s390x-on-s390x.F19/bin/g++)

and here is the equivalent with the RHEL7 devkit:

16:13:56  Tools summary:
16:13:56  * Boot JDK:       openjdk version "20.0.2-beta" 2023-07-18 OpenJDK Runtime Environment Temurin-20.0.2+9-202309010331 (build 20.0.2-beta+9-202309010331) OpenJDK 64-Bit Server VM Temurin-20.0.2+9-202309010331 (build 20.0.2-beta+9-202309010331, mixed mode, sharing) (at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-s390x-temurin/jdk-20)
16:13:56  * Toolchain:      gcc (GNU Compiler Collection)
16:13:56  * Devkit:         gcc-11.3.0 - Centos7.9.2009 (/usr/local/devkit/s390x-on-s390x)
16:13:56  * C Compiler:     Version 11.3.0 (at /usr/local/devkit/s390x-on-s390x/bin/gcc)
16:13:56  * C++ Compiler:   Version 11.3.0 (at /usr/local/devkit/s390x-on-s390x/bin/g++)
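
Since the fallback to the system compiler is easy to miss, a small check on the configure "Tools summary" can flag it early. This is a sketch that assumes the summary lines keep the layout shown in the logs above; the function name and the log file name in the usage line are made up for illustration.

```shell
#!/bin/sh
# Read a configure "Tools summary" on stdin.
# Exit 0 if the C compiler lives under the reported devkit directory,
# 1 if it does not (i.e. a fallback to the system compiler),
# 2 if either the Devkit or the C Compiler line is missing.
devkit_in_use() {
    awk '
        /\* Devkit:/     { dk = $NF; gsub(/[()]/, "", dk) }
        /\* C Compiler:/ { cc = $NF; gsub(/[()]/, "", cc) }
        END {
            if (dk == "" || cc == "") exit 2
            exit (index(cc, dk "/") == 1 ? 0 : 1)
        }'
}
```

For example, `devkit_in_use < configure.log || echo "devkit not picked up"` succeeds for the two good summaries above and fails for the first one, where the compiler is /usr/bin/gcc.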

@sxa
Member Author

sxa commented Apr 4, 2024

Note that I have generated a new devkit tarball on the RHEL8 machine with the devkit.info file modified to have the expected line based on the --with-adoptium-devkit parameter, and modified the rhel7 image creation job to pick up the new tarball.

ADOPTIUM_DEVKIT_RELEASE=s390x-on-s390x.RH7

@sxa
Copy link
Member Author

sxa commented Apr 4, 2024

I'd probably have left this in the final iteration of 1Q since that's where the work was done and only a few cleanups remained for this week but 🤷🏻

Projects
Status: Done