Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add chown to COPY commands to reduce layer count #969

Merged
merged 2 commits into from
Oct 16, 2020

Conversation

bollwyvl
Copy link
Contributor

@bollwyvl bollwyvl commented Oct 15, 2020

Hi folks, thanks for repo2docker! ❤️

In the generated Dockerfile, this combines a number of COPY and subsequent USER and RUN chown calls with the COPY --chown syntax.

Anecdotally, this seems to reduce the layer count a good deal: 45 vs 53 for one of the first tests that runs with just an environment.yml... my tests are still chugging, so I don't have a bunch of other data points, bless CI!

I haven't assessed the impact to size, but I would image it would result in a 2-1 reduction, as the chowned files are, to my knowledge, copied in the subsequent layer. Can't hurt to look!

closes #164
closes #861

@welcome
Copy link

welcome bot commented Oct 15, 2020

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@minrk
Copy link
Member

minrk commented Oct 15, 2020

Thanks! Linking up to previous discussions: This closes #164 and is an expansion of #861.

The missing piece brought up in #861 was that using ARGs for username in chown means requiring docker 19.03, which was a little more recent than we wanted to force. Since we are already templating, using a template variable for NB_USER instead of env variable would let us do this in a way that works back to 17.09. Want to take a stab at that? Then I think we can land this one.

@@ -100,7 +100,7 @@
{% if build_script_files -%}
# If scripts required during build are present, copy them
{% for src, dst in build_script_files|dictsort %}
COPY {{ src }} {{ dst }}
COPY --chown=${NB_USER}:${NB_USER} {{ src }} {{ dst }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding NB_USER as a template variable in addition to build arg will allow us to do {{ NB_USER }} here and support docker engine back to 17.09 instead of requiring 19.03.

@bollwyvl
Copy link
Contributor Author

Thanks for the awesome, thorough response. of course y'all know about all this stuff, and 👻 me for not scrolling down.

this updates it to a jinja var, and also moved the magic 1000 up to one spot... as well as some notes with some of the information gleaned from the above. from a docs perspective, perhaps there is a good place I can add some docs like:

## Minimum Supported Docker Engine Version
To support the current flagship deployments, `repo2docker` currently requires Docker engine 17.09+.
Newer language features should be avoided until they are generally available on <platform, platform...>

This could be a separate, docs only PR...

@bollwyvl
Copy link
Contributor Author

Another thing I'm thinking about while digging around in the templates... way out of scope for this PR, but...

If the script object got a little fatter, and knew about its own files, they could be added "just in time". Further, multiple files headed for the same destination could be copied in a single layer

For example, apt.txt can break a repo's whole little universe, and can be expensive, but if it doesn't change, everything should be the same.

So something like:

apt = Script(
    files={"/tmp/": ["apt.txt"]},
    directives=["RUN apt-get install /tmp/apt.txt"],
)

consumed like...

{% for script in stage_scripts %}

{% for file in script.files %}
{% for src, dst in script.files|dictsort %}
COPY --chown={{ user }}:{{ user }} [{% for s in src %}{{ s }},{% endfor %}] {{ dst }}
{% endfor -%}

{% for s in script.directives %}
{{ directive }}
{% endfor %}

{% endfor %}

@betatim betatim merged commit 8c4eaa7 into jupyterhub:master Oct 16, 2020
@welcome
Copy link

welcome bot commented Oct 16, 2020

Congrats on your first merged pull request in this project! 🎉
congrats
Thank you for contributing, we are very proud of you! ❤️

@betatim
Copy link
Member

betatim commented Oct 16, 2020

Nice work actually implementing this. We might have known about this but apparently never got around to doing something about it :D

👍 on the docs only PR

The idea from #969 (comment) also needs a new issue/PR because (at least I) need a bit more explanation to understand what and why this would work.

@minrk
Copy link
Member

minrk commented Oct 16, 2020

Awesome, thanks for finishing this one!

@bollwyvl
Copy link
Contributor Author

bollwyvl commented Oct 16, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use COPY --chown to stage repo (docker 17.09)
3 participants