Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce image size #1283

Closed
wants to merge 3 commits into from
Closed

Reduce image size #1283

wants to merge 3 commits into from

Conversation

Aschen
Copy link

@Aschen Aschen commented Jun 26, 2020

What this PR do?

This PR reduce the alpine image size by 20% from 117MB to 93.5MB.

This PR reduce the alpine image size by 7,5% from 117MB to 108MB.

  • strip node binary (10MB)
  • remove c headers (3.9MB)
  • remove documentation (~6MB)
  • remove unnecessary files such as LICENSE
  • yarn compile cache (2.3MB)

If those changes are ok for you, I will do the same for other images.

@nschonni
Copy link
Member

I don't think removing LICENSE files arbitrarily is safe, as some license require the text to be distributed with the code.

@Aschen
Copy link
Author

Aschen commented Jun 27, 2020

You're right, I will keep the license files.
About stripping Node binary I don't think it's a good idea after all because some observability tools may rely on debug symbols (see this comment nodejs/build#2367 (comment))
What do you think?

@mmarchini
Copy link

It's probably best to not strip the binary. Maybe on the slim image, but even then I'm not sure. Users who need an image smaller than alpine or slim can either delete the unnecessary files from this image on a new layer, or they can build one from scratch using our Dockerfiles as base. I'd be curious which use cases require such savings on a dependency image (which is only downloaded once per version), since those are usually more worth on final images (which are usually significantly bigger and built many times).

remove c headers

This means users who install native modules will need to download the headers every time they build their images, so the tradeoff is 3.9MB on the base image vs. increased build time for all users who need headers.

@mmarchini
Copy link

Documentation seems fine to remove IMO? All others can cause some impact to our users.

If we're moving forward with this we'll probably also need to remove it from the other alpine images and the slim images (the gains on the normal image would be very small). Also, would it need to be semver-major and thus wait for v15? If not we can do this for v10 and v12 as well.

@Aschen
Copy link
Author

Aschen commented Jun 29, 2020

@mmarchini What about the yarn compile cache?

@mmarchini
Copy link

I'm not familiar with yarn enough to know how impactful removing it would be to our users, so I'll defer to others to answer that

@Aschen
Copy link
Author

Aschen commented Jul 21, 2020

As far as I can see the /tmp/v8-compile-cache is used to speed up a little yarn loading time: https://www.npmjs.com/package/v8-compile-cache (from 153ms to 113ms)

@Aschen
Copy link
Author

Aschen commented Sep 17, 2020

@mmarchini About the cache the choice is between some performance gain for some packages or a 2.3 MB gain in image size.
image

My thought are it's better to reduce the image size for everyone than improving the perf for certain packages

@mmarchini
Copy link

I'm not blocking it as it is right no btw :)

This is a trade-off between base image download (happens once) vs installation of packages during final image build (happens every time the user builds). I'm still not convinced the savings are worth it, but I'll defer any decisions to the Docker team as they have more expertise on which use cases this image is used for (and therefore which optimizations are desirable).

Co-authored-by: Axel Navarro <navarroaxel@gmail.com>
@dradetsky
Copy link

I don't think removing LICENSE files arbitrarily is safe, as some license require the text to be distributed with the code.

I would urge everyone to re-examine this assumption. I don't know all the FOSS licenses, but they generally don't require that the licenses be embedded in compiled binaries produced from the code. For example, on GPLv3, we see:

The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.

A built docker image is clearly object code, as are the source files copied into it, because if we wanted to edit the source code we'd make PRs into the repos for the individual packages, and then pull those into the container on a subsequent build. We wouldn't edit them inside the container.

So it doesn't matter that the GPL requires that the license be distributed with the source, because we aren't actually distributing the source, we're distributing an object form of the source. I can't think of a license which distinguishes between source & object code in a way which is materially different from GPL.

Also, when you just think about it, putting all those licenses in a docker container is kind of silly. Nobody's going to read them. You do it because you think maybe the license terms require it. Well, I don't think they do.

Anyhow, by my count

/usr/local/lib/node_modules/npm # find . -name "LICENSE" | xargs cat | wc -c
264354
/usr/local/lib/node_modules/npm # find . -name "*.md" | xargs cat | wc -c
3081096
/usr/local/lib/node_modules/npm # find . -type f | xargs cat | wc -c
12727614

So it's not nothing. And of course that's not all the useless files you could delete.

Anyway, I would strongly urge you to just delete all the licenses. If for some reason I'm wrong, and somebody complains, you can always stop doing it on subsequent builds. But I really don't think I am.

Base automatically changed from master to main March 15, 2021 16:23
@ttshivers ttshivers mentioned this pull request Apr 10, 2023
12 tasks
@kuzzleio kuzzleio closed this by deleting the head repository Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants