Docker image is huge #946

Closed
4 tasks
LeoColomb opened this issue Jan 22, 2019 · 22 comments
Labels
T: feature-request Requests for new features


@LeoColomb
Contributor

LeoColomb commented Jan 22, 2019

The dependabot/dependabot-core image built from the Dockerfile is amazing!

But it now exceeds 1 GB, which is getting huge (why it matters).

Some suggestions to optimize Dependabot's image (sorted by difficulty):

  • Follow the basic apt recommendations:
    • Pair every apt-get update with rm -rf /var/lib/apt/lists/* in the same layer.
    • Avoid apt-get upgrade in a Dockerfile (ref).
    • Consider using --no-install-recommends when installing packages with apt-get.
  • Exclude development dependencies when distributing a "production" image (Promote runs without helpers dev dependencies #851).
    • This can be done easily with build scripts and an environment variable.
  • Use stages to build the helpers and avoid shipping unnecessary binaries.
    • For example (if I'm correct), golang is not required to run the helpers once they are built.
  • Switch to a smaller base image for the last stage, typically Alpine.
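As a rough illustration of the apt recommendations, here is a minimal sketch assuming an Ubuntu 18.04 base. The package list is a placeholder, not Dependabot's actual dependency list:

```dockerfile
# Sketch only: ubuntu:18.04 and the packages below are illustrative assumptions.
FROM ubuntu:18.04

# Update, install and clean up in ONE layer, so the downloaded package
# lists are never baked into the image. Note: no "apt-get upgrade".
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      git \
 && rm -rf /var/lib/apt/lists/*
```

If the `rm -rf` ran in a separate `RUN`, the lists would still occupy space in the earlier layer, which is why the recommendation is to keep it in the same layer.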
@LeoColomb
Contributor Author

LeoColomb commented Jan 31, 2019

Coming back to this issue (my system regularly complains this is taking too much disk space! 😱).

Actually, I wonder whether splitting the Docker image wouldn't be even cleaner:

  • Native helper isolation;
  • Much (much) smaller image.

The only downside is that it will no longer be possible to run the updaters for all package managers in the same container. (Is this a real downside? Do you usually run updaters for several package managers in the same container?)

If interested, I can make PRs to iterate over this idea.

@greysteil
Contributor

I know @hmarr has some plans for this - can you share?

@hmarr
Contributor

hmarr commented Jan 31, 2019

@LeoColomb yes, we're planning exactly that! This is actually one of the main motivations for splitting up each language / package manager into independent top-level gems in dependabot-core. Once that big refactor is finished (we're super close), the next stage is to produce much leaner Docker images for each package manager, and use multi-stage builds to further trim the unnecessary fat.

If we use multi-stage builds to set up each of the native helpers, it should also be pretty trivial to re-combine them all into one big image, which would be a nice feature for people running this from home / their own servers.
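Here is a minimal sketch of the multi-stage idea for a Go native helper. The stage name `go-helpers`, the source path `go_modules/helpers/` and the binary name `go-helper` are hypothetical. The sketch also assumes the helper builds as a self-contained Go module with no cgo dependencies:

```dockerfile
# Build stage: the Go toolchain only exists in this stage.
FROM golang:1.12 AS go-helpers
WORKDIR /src
# Hypothetical location of the helper's sources in the repo.
COPY go_modules/helpers/ .
# Build a static binary (no cgo) so it runs on any Linux base image.
RUN CGO_ENABLED=0 go build -o /usr/local/bin/go-helper .

# Runtime stage: only the compiled binary is copied in, not golang itself.
FROM ubuntu:18.04
COPY --from=go-helpers /usr/local/bin/go-helper /usr/local/bin/go-helper
```

A combined "all ecosystems" image could repeat the same `COPY --from=` pattern once per helper stage. That is what makes re-combining the per-ecosystem builds cheap.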

@hmarr
Contributor

hmarr commented Jan 31, 2019

If you're up for helping out, that'd be awesome 🙌. I'll ping you once we're ready to start splitting up the Dockerfiles.

@LeoColomb
Contributor Author

Cool! Sure, I'm up for this! 🎉 👍

@rajivshah3

rajivshah3 commented Feb 2, 2019

### JAVASCRIPT
# Install Node 10.0 and Yarn
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \
&& apt-get install -y nodejs \
&& curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
&& echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list \
&& apt-get update && apt-get install -y yarn

If we can move the base image to Debian or Alpine, we might be able to replace this with the official node image.
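For example, if a Debian-based image is acceptable, the NodeSource and Yarn apt setup could be replaced by the official `node` image, which already bundles `node`, `npm` and `yarn`. This is a sketch under that assumption, not the project's Dockerfile:

```dockerfile
# Official Node.js 10 image (Debian-based); node, npm and yarn are preinstalled.
FROM node:10

# Sanity check that all three tools are on the PATH.
RUN node --version && npm --version && yarn --version
```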

@rajivshah3

@LeoColomb have you already started on the recommendations mentioned in #946 (comment)? If not I'd be happy to try it

@LeoColomb
Contributor Author

@rajivshah3 Yes, I have, but the last two tasks in the checklist are very difficult to implement:

  • Languages sometimes depend on each other (e.g. Elm needs npm from Node), which makes it hard to move these requirements entirely into build stages;
  • More than half of the installed languages simply can't be extracted, because they are required to run the helpers (e.g. PHP);
  • Moving from Ubuntu to Alpine actually requires a lot of rewriting, especially because compiling the languages may need more attention (just look at the respective official images to get an idea).

That's why splitting seems much more relevant. On the one hand, using the official language Docker images will be easier. On the other hand, each required language will be isolated and won't pollute the other Dependabot images. All of this keeps the same base for runs, which helps Docker's layer cache.
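Here is a minimal sketch of per-ecosystem images sharing one base, written as a single multi-stage Dockerfile. The stage names `base`, `composer` and `pip` and the package choices are illustrative assumptions:

```dockerfile
# Shared base stage: layers common to every ecosystem image.
FROM ubuntu:18.04 AS base
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      ca-certificates git ruby \
 && rm -rf /var/lib/apt/lists/*

# Hypothetical PHP/Composer image: only PHP is added on top of the base.
FROM base AS composer
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends php-cli \
 && rm -rf /var/lib/apt/lists/*

# Hypothetical Python/pip image: PHP never ends up in this one.
FROM base AS pip
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
```

Each image would then be built with something like `docker build --target composer -t dependabot-core-composer .` (tag name hypothetical). Because both stages start from `base`, their common layers are cached and stored only once.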

LeoColomb mentioned this issue Feb 8, 2019
@alexanderadam

alexanderadam commented Jul 23, 2019

Also, since the Docker refactoring is currently under way in #986, note that the official Docker docs say:

If a service can run without privileges, use USER to change to a non-root user.

It is considered a best practice and a general recommendation for reducing the attack surface.
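Here is a minimal sketch of that recommendation. The user name `dependabot` and UID 1000 are illustrative assumptions:

```dockerfile
FROM ubuntu:18.04

# Create an unprivileged user with a home directory.
RUN useradd --create-home --uid 1000 --shell /bin/bash dependabot

# Subsequent instructions, and the running container, use this user.
USER dependabot
WORKDIR /home/dependabot
```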

@hmarr
Contributor

hmarr commented Jul 25, 2019

@alexanderadam +1 on that. We use an unprivileged user internally (we just use the dependabot-core image as a base image), but I'd also like the base image to start out with the non-root user wherever possible.

@pedropombeiro

pedropombeiro commented Sep 11, 2019

Has anyone considered using Nix to create a reproducible development/build environment, which also happens to be much lighter (and runs natively on Macs too)? I'd be up for creating a PR that sets up the environment to support all languages supported by dependabot-core. It would probably be set up in less time than Docker, without the need to set up tens of volumes.

FWIW, Nix has been around since 2005 and has several hundreds of PRs merged every month, so the community can be said to be quite active.

Let me know if this would be interesting or if you have any questions.

@hmarr
Contributor

hmarr commented Sep 17, 2019

Hi @pombeirp! I'm a big fan of Nix's approach. That said, we aren't able to support a migration to Nix right now. It would be a fairly large change to our build process. We don't have the capacity to make that change on our side, given everything else on our plate, and I'd rather not maintain two separate build processes in this project. Additionally, while inferior in some ways, Docker is much more widely used and therefore accessible to a wider audience.

@stale stale bot added the stale label Nov 16, 2019

@stale stale bot removed the stale label Nov 17, 2019
@stale stale bot added the stale label Jan 17, 2020

@stale stale bot removed the stale label Jan 17, 2020

@stale stale bot added the stale label Apr 19, 2020

@stale stale bot removed the stale label Apr 19, 2020
@infin8x infin8x added F: infrastructure T: feature-request Requests for new features labels Jul 2, 2020
@Olegt0rr

Any news? The issue looks abandoned :(

@jurre
Member

jurre commented Apr 12, 2021

Some work is being done here to improve some of the layers, but the real solution would be to split up the images per ecosystem. We've wanted to do that for a long time, but we're unsure when we'll be able to prioritize it.

@LeoColomb
Contributor Author

for a long time

Very long time 😉

@Olegt0rr I submitted a PR to split per ecosystem a little over a year ago: #986
I stopped keeping it up to date (because, a year, you know), but you can use it as a base. 🙂

@PeterDaveHello
Contributor

It's now 1.22GB compressed


The extracted size is much larger, over 4GB 😱

$ docker images
REPOSITORY                         TAG             IMAGE ID       CREATED             SIZE
dependabot/dependabot-core         latest          aa25fd520c04   About an hour ago   4.21GB

Hopefully #3896 & #3897 will make it a little bit smaller ...

@mattt
Contributor

mattt commented May 9, 2022

I'm happy to report that my recent work in #5024 to refactor our build automation with Earthly should allow us to easily split up our Docker image by ecosystem.

For example, this commit adds a +docker target to the npm_and_yarn ecosystem, which can be built independently of the core image.

Comparing the uncompressed sizes of each, we can produce an image ~5x smaller. A corresponding Maven image, which doesn't have any system dependencies or native helpers, is even smaller.

$ docker image list
dependabot/dependabot-core-npm_and_yarn ... 738MB
dependabot/dependabot-core-maven        ... 541MB
dependabot/dependabot-core              ... 3.52GB

Of course, moving from Ubuntu / Debian to Alpine (as in #986) would trim things down further, but we don't have any plans to switch distros right now.

@jakecoffman
Member

The Docker images are now broken out by ecosystem making them much smaller. We'll continue to work to make and keep them as small as possible.
