Add docker baking #1137

Merged
merged 4 commits into from
Jul 5, 2021

timja
Member

@timja timja commented Jul 2, 2021

ref #1133 (comment)

This is part 1 in a series of PRs

It converts the build process (CI only) to docker buildx bake, which parallelises the build and will allow moving shell script / Makefile logic to declarative configuration.

Already it shows that a number of the Dockerfiles checked into the repo aren't being published, which is not clear from looking at the scripts.

Remaining work for future PRs:

  • Convert publishing scripts to bake
  • Enable multi-arch on CI; make targets have been added in this PR to show how that will work
  • Add ssh credentials to trusted ci credentials store for existing s390x, ppc64le static agents
  • Update pipeline to load ssh credentials for above agents
  • Create an arm64 VM
  • Add ssh credentials to trusted ci for arm and add to pipeline
  • Add builder definitions to trusted ci agent or pipeline, see description in Add multi-arch build #1133
  • Enable publishing multi arch builds, should just be a matter of removing a --set '*.platform=linux/amd64'

The steps above do not all need to be done in order. We probably won't enable multi-arch on CI until publishing is ready, although we can have a PR up earlier.
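As a rough illustration of what the declarative configuration could look like, the sketch below writes a tiny bake file and asks bake to print the resolved build plan without building anything. The target names debian_jdk8 and debian_jdk11, the linux group, and the --set '*.platform=linux/amd64' override come from this discussion; the Dockerfile paths and the example/jenkins tags are placeholders, not the repository's real values.

```shell
#!/bin/sh
# Sketch only: a minimal, hypothetical bake file written to a temp directory.
workdir=$(mktemp -d)
bake_file="$workdir/docker-bake.hcl"

cat > "$bake_file" <<'EOF'
group "linux" {
  targets = ["debian_jdk8", "debian_jdk11"]
}

target "debian_jdk8" {
  context    = "."
  dockerfile = "path/to/jdk8/Dockerfile"    # hypothetical path
  tags       = ["example/jenkins:demo-jdk8"]
}

target "debian_jdk11" {
  context    = "."
  dockerfile = "path/to/jdk11/Dockerfile"   # hypothetical path
  tags       = ["example/jenkins:demo-jdk11"]
}
EOF

# --print shows the resolved configuration instead of building.
# The --set override pins every target to a single platform.
if docker buildx version >/dev/null 2>&1; then
  docker buildx bake -f "$bake_file" --set '*.platform=linux/amd64' --print linux \
    || echo "bake --print failed" >&2
else
  echo "docker buildx not available; wrote $bake_file only"
fi
```

Because both targets sit in one group, a single bake invocation can build them concurrently, which is where the parallelism described above would come from.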

Any thoughts / feedback?

Help would be loved as well cc @dduportal / @olblak

@timja timja requested a review from a team as a code owner July 2, 2021 07:58
Member Author

timja commented Jul 2, 2021

Timings for image build:

Previous master build: 5m 48s
This PR: 1m58s

Checking master build to see if it's failing too

Makefile Outdated
"centos7_jdk8",
"centos8_jdk8",
"debian_jdk8",
"debian_jdk11",
Member

@olblak olblak Jul 2, 2021

Is there any reason to provide java11 for debian only?

Member Author

this is just a reverse engineered version of what we're currently publishing, we can change it later

Contributor

We've intentionally limited the images we provide because each image we provide needs a maintainer that will care for the image after its initial definition and delivery.

Member Author

@MarkEWaite I think the question was more about why only debian has jdk11 and not the other variants, especially since all the Dockerfiles exist in the repo.

I believe the answer is that historically each image was built and published serially, so adding each variant made the build take longer.

With docker buildx bake the build runs in parallel, so that is less of a concern and we could introduce more variants if we wished, although I'd prefer to do it by retiring the jdk8 versions instead.

(build time is approx 5x faster now)

Member

olblak commented Jul 2, 2021

That work is really amazing as it brings a lot of clarity. Thanks for starting this

Makefile
docker-bake.hcl Outdated
}

variable "OWNER" {
default = "jenkins"
Member

I think we want to publish to jenkins4eval by default and override to publish to jenkins when running on trusted
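A sketch of how that suggestion could work, assuming a default of jenkins4eval that the trusted environment overrides: bake lets an environment variable with the same name replace a variable's default. The demo target and its tag are hypothetical, and this is not what the PR itself does (its default is "jenkins").

```shell
#!/bin/sh
# Sketch only: a hypothetical bake file whose registry owner is overridable.
workdir=$(mktemp -d)
bake_file="$workdir/docker-bake.hcl"

cat > "$bake_file" <<'EOF'
variable "OWNER" {
  default = "jenkins4eval"
}

target "demo" {
  context = "."
  tags    = ["${OWNER}/jenkins:demo"]   # hypothetical target and tag
}
EOF

if docker buildx version >/dev/null 2>&1; then
  # Default owner: the printed plan should carry the jenkins4eval tag.
  docker buildx bake -f "$bake_file" --print demo || echo "bake --print failed" >&2
  # Override from the environment, as a trusted publishing job might do.
  OWNER=jenkins docker buildx bake -f "$bake_file" --print demo || echo "bake --print failed" >&2
else
  echo "docker buildx not available; wrote $bake_file only"
fi
```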

Member Author

Why’s that? Publish is only run on trusted from what I can tell

Member

It's commented out right now, but we do want to publish experimental.

Member Author

what would get published there? I think the work I'm doing here replaces all that?

Member

The multi-arch images are not tested very well. The idea was we release them to jenkins4eval until they are tested more and then also publish them to jenkins.

Member

Jenkins4eval is a Docker Hub org that we use to publish and test Docker images. It's an untrusted organization, as anybody can push images to it from ci.j.io.

Member Author

I think experimental images belong in their own bake file, and aren't really related to this PR?

In future someone can restore jenkins4eval if needed, but it's not used atm?

Co-authored-by: Mark Waite <mark.earl.waite@gmail.com>
@MarkEWaite
Contributor

Thanks so much for this @timja!

I installed Docker on an Ubuntu 20.04 aarch64 computer and ran make build-arm64. The commands completed successfully. The Docker images that were created included:

REPOSITORY        TAG            IMAGE ID       CREATED       SIZE
jenkins/jenkins   2.300-centos   bc78a689a1e3   3 hours ago   965MB
jenkins/jenkins   2.300-jdk11    b1a781309594   3 hours ago   664MB
jenkins/jenkins   2.300-slim     465f1ddf7d72   3 hours ago   389MB
jenkins/jenkins   2.300          3d3d000397f6   3 hours ago   558MB

I was able to confirm that three of the four images can report their Java version and can report the contents of /etc/os-release. A sample of that output looks like this:

$ docker run --rm -i -t jenkins/jenkins:2.300 cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ docker run --rm -i -t jenkins/jenkins:2.300 java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

The jenkins/jenkins:2.300-centos image is different than the other three images. It reports:

$ docker run --rm -i -t jenkins/jenkins:2.300-centos cat /etc/os-release
standard_init_linux.go:228: exec user process caused: exec format error
$ docker run --rm -i -t jenkins/jenkins:2.300-centos java -version
standard_init_linux.go:228: exec user process caused: exec format error

Is there anything I can investigate that would clarify why the centos image is different than the others?
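One possible first check, offered as a sketch rather than a diagnosis: "exec format error" commonly means the binaries in the image were built for a different CPU architecture than the host. The helper names below are hypothetical; docker image inspect with a --format template is the only Docker call used.

```shell
#!/bin/sh
# Map `uname -m` output to the architecture names Docker uses.
to_docker_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;   # s390x, ppc64le, and others pass through
  esac
}

# Compare the architecture recorded in an image with the host's.
# Returns 0 on match, 1 on mismatch, 2 if the image cannot be inspected.
check_image_arch() {
  image=$1
  host_arch=$(to_docker_arch "$(uname -m)")
  image_arch=$(docker image inspect --format '{{.Architecture}}' "$image") || return 2
  if [ "$image_arch" = "$host_arch" ]; then
    echo "$image: $image_arch matches host"
  else
    echo "$image: built for $image_arch, host is $host_arch"
    return 1
  fi
}

# Example call (requires the image to be present locally):
#   check_image_arch jenkins/jenkins:2.300-centos
```

The recorded architecture is only metadata, so a match does not prove every binary inside the image is native; it is a quick way to rule out the most obvious mismatch.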

Member Author

timja commented Jul 2, 2021

I'm not sure.

It works just fine on my arm64 machine:

➜  docker git:(bake) docker run --rm -i -t jenkins/jenkins:2.300-centos cat /etc/os-release
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
➜  docker git:(bake) docker run --rm -i -t jenkins/jenkins:2.300-centos java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
➜  docker git:(bake) docker run --rm -i -t jenkins/jenkins:2.300-centos arch
aarch64

I've been using the centos image to do all my testing

Member Author

timja commented Jul 2, 2021

@slide is there a reason we don't publish version specific versions of the windows images? https://hub.docker.com/r/jenkins/jenkins/tags?page=1&ordering=last_updated&name=windows

Member

slide commented Jul 2, 2021

> @slide is there a reason we don't publish version specific versions of the windows images? https://hub.docker.com/r/jenkins/jenkins/tags?page=1&ordering=last_updated&name=windows

Other than it's just not set up right now, no. We would need to change the make.ps1 publish command to add -PushVersions and then add the parameter in make.ps1. There are already checks in make.ps1 to push version tags; we just aren't using them right now.

@MarkEWaite
Contributor

> I've been using the centos image to do all my testing

The arm64 / aarch64 processor that I'm running is hosted on Oracle Cloud and is running Docker 1.10.7. I've seen the same failure on Oracle Linux 8 and on Ubuntu 20. Both failure cases were running on Oracle Cloud. I'll continue exploring as time allows.

Member Author

timja commented Jul 5, 2021

Is there anything blocking this PR?

None of the multi-arch work here is enabled, it's plumbing for later.
None of the publishing scripts are touched, just the build run on CI.

I've got changes to the publishing scripts locally that are pretty close to ready, but I don't want to load too much in each PR.

@timja timja requested review from slide, olblak and MarkEWaite July 5, 2021 06:48
Contributor

MarkEWaite commented Jul 5, 2021

> Is there anything blocking this PR?

Nothing blocking the pull request from me. I assume that the issues I'm encountering are either unique to my environment or are not yet actually relevant to use of this pull request.

I'm able to use this pull request on my amd64 Debian 10 computer ("buster"). The Docker version 20.10.7 has the buildx command.

I'm not able to use this pull request for CentOS on the Arm64 Oracle Cloud instances where I'm running. I've confirmed the issue exists for Ubuntu 20 on Arm in that environment and for Oracle Linux 8 in that environment.

I'm not able to use this pull request on the Arm64 Amazon Linux instance I have running on AWS because the Docker version provided with Amazon Linux does not seem to include the docker buildx command. I haven't tried other Linux variants on AWS.

I'm not able to use this pull request on my amd64 Debian testing (pre-release of Debian bullseye). The Docker version installed on Debian testing is 20.10.5 and does not have the buildx command.

I assume for now that we'll only be running buildx on machines that have a new enough Docker version to include that command.

Member Author

timja commented Jul 5, 2021

It should be supported from docker 19.03 @MarkEWaite.

On older docker versions you may need to add:

export DOCKER_CLI_EXPERIMENTAL=enabled

I would recommend not worrying about it too much. We should be pulling Docker from the upstream Docker repositories rather than package manager repositories, so we should always be on the latest or close to latest version anyway, depending on when the base VM image was last built.
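A small pre-flight check along these lines could make the requirement explicit before running the build. This is a sketch: buildx_status is a hypothetical helper name, and it relies only on docker buildx version and the DOCKER_CLI_EXPERIMENTAL variable mentioned above.

```shell
#!/bin/sh
# Report whether `docker buildx` is usable:
#   available    - works as-is
#   experimental - works only with DOCKER_CLI_EXPERIMENTAL=enabled
#   missing      - not usable (docker absent or no buildx plugin)
buildx_status() {
  if docker buildx version >/dev/null 2>&1; then
    echo available
  elif DOCKER_CLI_EXPERIMENTAL=enabled docker buildx version >/dev/null 2>&1; then
    echo experimental
  else
    echo missing
  fi
}

buildx_status
```

A Makefile or CI step could call this first and fail with a clear message when the result is "missing", rather than letting the bake command fail later.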

Contributor

@dduportal dduportal left a comment

Good job! This PR is improving the build time and the rest is "pre-plumbing" so let's go. Thanks a lot for this!

@timja timja merged commit 12fb253 into jenkinsci:master Jul 5, 2021
@timja timja deleted the bake branch July 5, 2021 18:37
Collaborator

saper commented Jul 14, 2021

I have just tried this on Fedora 33 with Docker version 19.03.13, build 4484c46 and got:

docker buildx bake -f docker-bake.hcl --set '*.platform=linux/amd64' --load linux 
unknown shorthand flag: 'f' in -f

The old way worked fine there...

@saper saper mentioned this pull request Jul 14, 2021
6 tasks
@dduportal
Contributor

@saper that is good feedback: we should state clearly that Docker 20.10 is recommended.

  • I'm gonna try with Docker 19.03 + the latest buildx plugin, could you try the same?
  • Let me update the documentation in Enable parallel flag in bats #1128 to state that Docker 20.10, or an older Docker such as 19.03 with the buildx plugin, is needed

Collaborator

saper commented Jul 14, 2021

> I'm gonna try with Docker 19.03 + the latest buildx plugin, could you try the same?

My mistake - I thought I already had buildx working on this box, but I didn't. I had to follow https://github.com/docker/buildx/#with-buildx-or-docker-1903 to get it installed, and the make build process seems to be starting now.

6 participants