[reprounzip-docker] copy extracted DATA directory instead of data.tgz #274
Comments
Hi and thanks for looking into this! Basically the semantics I want:
I have been struggling for a long time with the extraction of data in the image. I have changed the tar flags multiple times, and there are still issues, like #145. One set of flags I was using before correctly merged the files from the image with the files from the TAR, but failed when overwriting a directory with a file (as can happen when unpacking the Fedora tree structure over the Debian one). The current flags (11929b3) don't fail in that case, but tend to remove existing files from directories when extracting files into them.
I actually considered writing the Docker image manually so that I have better control over this: instead of writing out a Dockerfile and running it, just assembling an image tar or container tar, and loading it with
The issue I see with your own approach is that file and directory ownership would not be carried over. Permissions can also be lost if you are doing this on Windows, since you are round-tripping through the Windows file system, which doesn't support them. Indeed it is very unfortunate that the |
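The extraction semantics discussed in this comment (merge into existing directories, but replace an entry when a file must overwrite a directory or the reverse) can be illustrated outside of `tar` flags. The following is only a minimal sketch using Python's standard `tarfile` module, not what reprounzip-docker does; the function name `extract_merging` is hypothetical, and it assumes a trusted archive (no path sanitization is done).

```python
import os
import shutil
import tarfile


def extract_merging(tar_path, dest):
    """Extract tar_path over dest, merging directories and
    replacing existing entries whose type conflicts."""
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.join(dest, member.name)
            if os.path.lexists(target):
                existing_is_dir = (os.path.isdir(target)
                                   and not os.path.islink(target))
                if member.isdir() and existing_is_dir:
                    # Directory over directory: keep the existing contents.
                    # Note: the archive's mode/owner for it is NOT applied.
                    continue
                # File over directory, directory over file, or a plain
                # overwrite: remove the old entry before extracting.
                if existing_is_dir:
                    shutil.rmtree(target)
                else:
                    os.remove(target)
            tar.extract(member, dest)
```

Ownership is only restored by `tarfile` when running as root, so the ownership concern raised above still applies to this sketch.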
Thanks for your detailed reply. The
On a related note, I am working on an alternative method to minimize Docker containers with
The alternative I am working on is to remove all of the files within the running container that were not caught by the trace, and then |
I think the best solution for both of us is to write out a Docker container TAR directly for |
That sounds good. I will keep you updated on my progress. |
Also note that being able to trace Docker images is something we are interested in! Tracing without installing ReproZip inside the container should be possible (but not super straightforward). I do not have time to work on it now unfortunately, as other matters are more pressing (but less fun). However should you find a reliable method to trace Docker images it is definitely something we'll want to support and distribute as part of ReproZip. |
Can you explain a little bit how one could trace inside a container without installing ReproZip? I might be able to work on this. |
The processes in the container's PID namespace also exist as processes on the host, so you can attach to them from the outside. However you would need the application to wait for ReproZip to attach before it starts. I imagine something like this:
Packing would be a different process, reading files from the original image instead of the filesystem. There might even be a way to have the ReproZip tracer itself be in a (separate) container, putting it in the same PID namespace as the container we want to trace using |
Also note that when using things like docker-machine or "docker native", the "host" is the VM, which would make this a bit annoying (unless ReproZip is also running in a container). I have most of the code for this and might take a shot at it when I find the time; of course, you are welcome to try and make sense of my code 😅 |
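The requirement mentioned above, that the application wait for the tracer to attach before it starts, can be sketched as a stop/continue handshake. This is not ReproZip code and not the elided proposal from the comment; it is a POSIX-only illustration in which `run_after_attach` and the `attach` callback are hypothetical names, and the callback stands in for a real attach step (for example one based on ptrace). In the container case, the tracer on the host would also see a different PID than the process sees inside its own PID namespace.

```python
import os
import signal
import sys


def run_after_attach(argv, attach):
    """Start argv, but only let it run once attach(pid) has returned."""
    pid = os.fork()
    if pid == 0:
        # Child: stop ourselves until the parent sends SIGCONT,
        # then replace this process with the target program.
        os.kill(os.getpid(), signal.SIGSTOP)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: block until the child has actually stopped.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child did not stop as expected")
    attach(pid)  # placeholder for the real attach step (assumption)
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

A call such as `run_after_attach([sys.executable, "-c", "pass"], print)` would print the child's PID while the child is still stopped, then let it run to completion.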
Thanks, that makes sense to me. Is the code you wrote in a branch in this project? The Docker documentation for |
Have you considered using the |
I think |
Hi @remram44 - can you share the code you have to run a trace without having reprozip installed? I'd like to have a go at it. |
The specification for CircleCI 2.0 is stored in `.circleci/config.yml` instead of in `circle.yml`.
Todo:
- Run tests. For now, images are built and pushed, but no tests are run.
- Minimize containers with neurodocker reprozip. This functionality will be updated soon. See discussion in VIDA-NYU/reprozip#274.
Hi @kaczmarj, unfortunately I have no code for this yet. This would be a change in the beginning of the tracer (instead of current |
I misunderstood you when you said to try to make sense of your code :) |
Using multi-stage builds would fix the original issue (that the
FROM debian:stretch
COPY data.tar.gz /
RUN tar xzf /data.tar.gz --strip-components 1
RUN rm /data.tar.gz
FROM debian:stretch
COPY --from=0 / /
The new pieces to the Dockerfile would be those last three lines. It will start a build from scratch and copy root over. This will copy all of the necessary files but the
If this looks OK with you @remram44, I would be happy to submit a PR. |
I will look into it. Do you think it would be possible to use a "reprounzip" docker image as the other container, so that anyone can build a Docker image from an RPZ like this? (apologies for the syntax, I have never used multi-stage builds, hopefully you get the idea)
|
yes i think that would work. bootstrapping |
Hello,
Regarding `reprounzip docker`, you can achieve smaller image sizes by copying over the untarred `DATA` directory instead of copying and then extracting `data.tgz`. The delta image size is the size of `data.tgz`. Is there a reason to have `data.tgz` inside the image?
Also, all of the `COPY` instructions can be merged into one, because they copy into the same directory (`/`). The Dockerfile could look like this:
If this looks OK, I would be more than happy to submit a PR.
Thanks,
Jakub
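The Dockerfile from the original post did not survive in this copy of the thread. As an illustration of the idea only (a single `COPY` of the already extracted `DATA` directory into `/`), here is a minimal sketch in Python, the language ReproZip is written in. The function name `render_dockerfile` and the `debian:stretch` base image in the usage line are assumptions for the example, not reprounzip-docker's actual code or output.

```python
def render_dockerfile(base_image, data_dir="DATA"):
    """Return Dockerfile text that copies an extracted data directory
    into the image root with a single COPY instruction."""
    lines = [
        "FROM {}".format(base_image),
        # With a directory as source, COPY copies its contents,
        # so everything under data_dir lands directly under /.
        "COPY {}/ /".format(data_dir),
    ]
    return "\n".join(lines) + "\n"
```

For example, `print(render_dockerfile("debian:stretch"))` emits a two-line Dockerfile. As pointed out in the replies, copying from an extracted directory may not preserve file ownership, which is one reason the archive was extracted inside the image in the first place.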