Remove compiler toolchain and python #23

Merged
orangejulius merged 1 commit into master from remove-compiler-toolchain
Nov 2, 2021

orangejulius

We currently include a full compiler toolchain (gcc, autoconf, automake, etc), as well as Python, in our Docker baseimage used for all Pelias Docker images.

This used to be required, since we had several Node.js modules that needed to compile native code.

However, since better-sqlite3 added support for prebuilt binaries in version 6.0, I don't think we have any such modules left.

Removing all these packages reduces the uncompressed baseimage size from 243MB to 230MB according to Google's container-diff. That's not huge, but it's not nothing either, and if it's free, why not?

Note that some of our Docker images, like whosonfirst and placeholder, install some of these same dependencies themselves, so we'll have to remove them there too.

Closes #21
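
For anyone who wants to reproduce a size comparison like the one above without container-diff, here is a minimal sketch that reads image sizes with `docker image inspect` instead. The helper names and the `master` tag are illustrative assumptions, and the numbers Docker reports may differ somewhat from container-diff's.

```shell
#!/bin/sh
set -eu

# Print the size of a local Docker image in bytes.
image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# Print the whole-megabyte difference between two byte counts
# (1 MB is taken as 1,000,000 bytes here, which is an assumption).
savings_mb() {
  echo $(( ($1 - $2) / 1000000 ))
}

# Example (both images must exist locally; the "master" tag is hypothetical):
#   before=$(image_size_bytes pelias/baseimage:master)
#   after=$(image_size_bytes pelias/baseimage:remove-compiler-toolchain)
#   echo "saved $(savings_mb "$before" "$after") MB"
```

Nothing runs until one of the functions is called, so the file can be sourced safely on a machine without Docker.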


orangejulius commented Nov 1, 2021

I put together a little script to test that all our Docker images will still work with this change. From a directory with all the pelias repositories checked out, run the following:

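#!/bin/bash

set -euo pipefail

# Rebuild every Pelias image against the candidate baseimage tag.
for i in api placeholder pip-service whosonfirst geonames csv-importer openaddresses openstreetmap schema docker-libpostal_baseimage; do
        pushd "$i"
        # Start from a clean, up-to-date master; stash and checkout may have nothing to do.
        git stash || true
        git checkout master || true
        git pull
        # Point the Dockerfile at the branch build of the baseimage.
        sed -i 's/FROM pelias\/baseimage/FROM pelias\/baseimage:remove-compiler-toolchain/' Dockerfile
        docker build . -t "pelias/$i:remove-compiler-toolchain"
        popd
done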

All the tested Docker images build fine with this change, so I don't think it will break anything!

Edit: the polylines Docker image had an implicit dependency on gcc to install some Go modules. I'll take a look at sorting that out.

orangejulius added a commit to pelias/polylines that referenced this pull request Nov 1, 2021
In pelias/docker-baseimage#23 we're slimming down
the Docker baseimage used by all other Pelias images, and hopefully can
remove the compiler toolchain altogether.

The Polylines Docker images _do_ need `gcc` (but not quite a full
compiler toolchain), but didn't follow the convention in our other
Dockerfiles of having an `apt-get` step to install it.

This adds such a step, and is a little clever in installing `gcc` only
temporarily, and only for the `go get` step that requires it.

The polylines Docker image is already quite large (950MB uncompressed)
since it includes Node.js, Go, and package dependencies for both.
Skipping the installation of `gcc` cuts out 120MB of that.

Until pelias/docker-baseimage#23 is merged, this change
won't really have any impact on the size or operation of this Docker
image.
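
The "install `gcc` only temporarily" idea can be sketched as a small shell helper. This is not the actual polylines Dockerfile step, just an illustration under assumptions: the helper name `with_temp_package` and the overridable `APT_GET` variable are hypothetical, and the `go get` target in the usage comment is a guess based on the related polylines commits.

```shell
#!/bin/sh
set -eu

# Overridable so the sequence can be dry-run outside a container,
# e.g. APT_GET="echo apt-get" (hypothetical convenience, not from the PR).
APT_GET="${APT_GET:-apt-get}"

# Install a package, run the command that needs it, then remove it again.
with_temp_package() {
  pkg="$1"
  shift
  $APT_GET update
  $APT_GET install -y "$pkg"
  "$@"                       # the build step that needs the package
  $APT_GET purge -y "$pkg"
  $APT_GET autoremove -y     # also drop automatically installed dependencies
}

# Intended use inside a single Dockerfile RUN instruction (assumed command):
#   with_temp_package gcc go get github.com/missinglink/pbf
```

The whole sequence has to run within one `RUN` instruction to save space: files removed in a later layer still take up room in the earlier layer that added them.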
orangejulius added a commit to pelias/whosonfirst that referenced this pull request Nov 1, 2021
After `better-sqlite3` added support for pre-compiled binaries in
https://github.com/JoshuaWise/better-sqlite3/releases/tag/v6.0.0, we no
longer need to install a compiler toolchain to run `npm install` in our
Docker images.

pelias/docker-baseimage#23 is working on
removing the compiler toolchain from our Pelias baseimages. In order for
the toolchain to be removed from the whosonfirst image in particular, we
also need to remove those dependencies here.

Until that PR is merged, this change is effectively a no-op. Once it is,
the two PRs together reduce the size of the whosonfirst Docker image
from 490MB to 261MB, an impressive 229MB savings!
orangejulius added a commit to pelias/placeholder that referenced this pull request Nov 1, 2021
After better-sqlite3 added support for [pre-compiled
binaries](https://github.com/JoshuaWise/better-sqlite3/releases/tag/v6.0.0), we
no longer need to install a compiler toolchain to run npm install in our
Docker images.

pelias/docker-baseimage#23 removes the compiler toolchain from our
Pelias baseimages. In order for the toolchain to be removed from the
Placeholder image in particular, we also need to remove those
dependencies here.

Similar to the whosonfirst repository in
pelias/whosonfirst#532, this change by itself is effectively a no-op. After
the baseimage removes the compiler toolchain, the size of the Placeholder Docker image
goes from 495MB to 266MB, an impressive 229MB savings!
orangejulius added a commit to pelias/polylines that referenced this pull request Nov 1, 2021
The polylines Docker image is a bit of a large one currently, as it
includes not just Node.js and a `node_modules` directory, but a full
compiler toolchain, an install of the Go language, the dependencies of
the `pbf` repository from https://github.com/missinglink/pbf, and the
final `pbf` executable that comes from it.

All told, this brought the total image size to a whopping 950MB
uncompressed.

This PR uses a multi-stage build to compile the `pbf` executable in a
separate container. After this, the toolchain and all build dependencies
can be thrown away, and only the small executable is copied to the final
image.

According to `container-diff`, once pelias/docker-baseimage#23 is merged
as well, the uncompressed image size will be only 322MB. That's a
savings of more than 600MB!

Even without pelias/docker-baseimage#23, the image size drops to 500MB,
which is still a healthy reduction.

Replaces #262
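
To make the multi-stage idea concrete, here is a rough sketch of what such a Dockerfile could look like, wrapped in a shell function so it can be printed and inspected. It is not the Dockerfile from this commit: the `golang:1.17` tag, the `go install` invocation, and the destination path are all assumptions, and the function name is hypothetical.

```shell
#!/bin/sh
set -eu

# Emit a two-stage Dockerfile: stage one compiles, stage two keeps only the binary.
print_multistage_dockerfile() {
  cat <<'EOF'
# Stage 1: the Go toolchain and build dependencies live only here.
FROM golang:1.17 AS builder
RUN go install github.com/missinglink/pbf@latest

# Stage 2: the runtime image receives just the compiled executable.
FROM pelias/baseimage
COPY --from=builder /go/bin/pbf /usr/local/bin/pbf
EOF
}

# Example:
#   print_multistage_dockerfile > Dockerfile.sketch
#   docker build -f Dockerfile.sketch -t pelias/polylines:sketch .
```

Only the layers of the last stage end up in the tagged image, which is why everything installed in the `builder` stage costs nothing in the final size.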
orangejulius added a commit to pelias/libpostal-service that referenced this pull request Nov 2, 2021
…encies

After pelias/docker-baseimage#23, we will no
longer have a compiler toolchain in our Docker baseimage. However, due
to the way Docker images work and build upon each other, the biggest
wins come from ensuring we don't have a compiler toolchain _anywhere_ in
our images.

If you think about it, even a single image having a compiler toolchain
is the same as the baseimage having it, at least when comparing the
total size of all our images.

Thankfully, with multistage builds we can easily remove both the C++
compiler toolchain and Golang buildtime dependencies in the libpostal
service, similar to pelias/polylines#263.

This alone drops the total image size for the libpostal-service from
3.2GB to 2.8GB. Further improvements are possible in the libpostal
baseimage.
orangejulius added a commit to pelias/docker-libpostal_baseimage that referenced this pull request Nov 2, 2021
With pelias/docker-baseimage#23 removing the C++
compiler toolchain from our baseimages, ideally we will remove build
time dependencies everywhere to save space on disk and over the network.

The libpostal baseimage does require a compiler toolchain to build
libpostal, but not to run it. So we can use a multistage image where
libpostal is compiled with all its dependencies, but only the build
artefacts are kept in the final image.

That saves a few hundred MB, but the libpostal GitHub repository is also
about 80MB, so we get even more savings there.

orangejulius commented Nov 2, 2021

I'm going to hit merge on this since it seems safe, but I won't explicitly bump all the end projects to rebuild on this new baseimage just yet. The highly active repositories like API and Placeholder will get this quickly, but I'll take care of all the others after hopefully updating the baseimage to Node.js 16 soon.

@orangejulius orangejulius merged commit ea19fa5 into master Nov 2, 2021
@orangejulius orangejulius deleted the remove-compiler-toolchain branch November 2, 2021 22:44
orangejulius added a commit to pelias/placeholder that referenced this pull request Nov 5, 2021
orangejulius added a commit to pelias/interpolation that referenced this pull request Nov 10, 2021
The [node-postal](https://github.com/openvenues/node-postal) NPM module
requires a full C++ compiler toolchain _and_ python3 to install. After
pelias/docker-baseimage#23 and
pelias/docker-libpostal_baseimage#5 this
toolchain is no longer present in our Docker baseimage.

This PR uses a Docker multi-stage build to build _just_ the NPM modules
required by the interpolation service while a C++ toolchain is present.

The `node_modules` directory can then be copied to the final image
without needing a C++ toolchain or python to be present.

In addition to saving some space in the final image, this fixes issues
people were having with our Docker images, since `node-postal` wasn't
functional.

Fixes pelias/docker#271
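
A possible shape for the `node_modules` variant of the multi-stage build is sketched below, again as a shell function that prints a Dockerfile. The image name `pelias/libpostal_baseimage`, the `/code` directory, and the package list are assumptions made for illustration, not taken from the interpolation repository.

```shell
#!/bin/sh
set -eu

# Emit a Dockerfile that compiles native NPM modules in a throwaway stage.
print_node_multistage_dockerfile() {
  cat <<'EOF'
# Stage 1: the C++ toolchain and python3 are installed only here.
FROM pelias/libpostal_baseimage AS builder
RUN apt-get update && apt-get install -y build-essential python3
WORKDIR /code
COPY package.json ./
RUN npm install

# Stage 2: same base image, no toolchain; reuse the compiled modules.
FROM pelias/libpostal_baseimage
WORKDIR /code
COPY . .
COPY --from=builder /code/node_modules ./node_modules
EOF
}

# Example: print_node_multistage_dockerfile > Dockerfile.sketch
```

Both stages start from the same base so that the native addon built in the first stage finds the same shared libraries when it is loaded in the second.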