-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support for dockerignore and containerignore #1205
Conversation
Thanks for submitting your first pull request! You are awesome! 🤗 |
I just thought about something to discuss based on the default "ignore .git" behaviour introduced by this patch. Since the goal of repo2docker is to provide a ready to use work environment with only the relevant pieces from a repo, should we have a configurable list of paths that we could ignore by default so people don't need to introduce another configuration file ? I am currently mainly thinking of git and forge specific files and folders like |
I think we need to keep |
@manics I follow you on the reproducibility side but I thought that the following labels were there for that:
? I just saw that In case we keep the .git folder by default, what are you thinking about my suggestion of a new argument that can be used with a list of files/folders to ignore ? |
2b0d6a3
to
2e5e3c0
Compare
I rebased this PR by mistake, sorry for the noise! |
@consideRatio no worries ! On that matter, something I am unsure of, it it globally preferred to rebase or to merge main back to the branch ? I am used to do the former to have a "semi-linear" history as they say in GitLab's configuration. I thought I have read something about it in the contributor documentation but I can't find it anymore. |
There is clear no consensus, but I think of two situations. When a PR is accepted an merged/rebased/squashed, and how to handle PR development before that.
|
Thanks @consideRatio, sounds good to me. I haven't yet had a really big review on GitHub were I could see that rebasing did break the process for reviewers. Some times I have encountered policies that would be "we use merge from main" when updating a pull request but that is all. If it could help shape a consensus for the Jupyter project, we can use one of my contribution to do some testing in that regard. |
@manics a small ping to get your opinion an my reproducibility related questions above as well as |
Having re-read #269, the ignore I'll open tickets for the other points as they are not directly related to this patch. |
2e5e3c0
to
1bb3cba
Compare
Currently repo2docker creates a context object that includes the whole content of the repository it builds an image for. Thus it includes folders like .git which is usually something that has no interest in the final image, can take quite a lot of space and most importantly, kills the caching of that layer. This patch adds support for reading dockerignore and containerignore files that are used to ensure only the relevant data are used to build the image. By default it also excludes the .git folder if neither of these files are provided.
The original behavior was to create an src directory with the content of the repository. The creation would happen in any case (remote or local repository). With the filtering in place and the default to remove the .git folder, it breaks the build as the src folder can be missing. This patch ensures that the directory is present in the tar so the build can continue as it did until now.
See the design chapter for more details.
This ensures that the ignore files are retrieved from the proper folder. If they weren't the build would not succeed as the binder folder is ignored.
8572871
to
77df191
Compare
Currently repo2docker creates a context object that includes the whole content of the repository it builds an image for. Thus it includes folders like .git which is usually something that has no interest in the final image, can take quite a lot of space and most importantly, kills the caching of that layer.
This patch adds support for reading
dockerignore
andcontainerignore
files that are used to ensure only the relevant data are used to build the image.By default it also excludes the .git folder if neither of these files are provided.
It makes use of docker-py's own file discovery mechanism to ensure coherence and avoid re-inventing a working wheel.
Note:
containerignore
comes from podman that reads either file. See the podman-build documentation.Fixes #937
Related to #268 and #269 final discussions.