Simplify dock_from_desc? #34
👋 Thanks a lot for this comment! I'm very glad to hear you like {golem}. TBH this does reflect an internal discussion we had about what is the correct way to build a Dockerfile for a {golem} app. After that little piece of context, let me split my answer into several points.

**Why do we install all dependencies before?**

You've guessed right: basically, the way the Dockerfile is built leverages layerization. Having a series of

```
FROM rocker/r-ver
COPY app_*.tar.gz /app.tar.gz
RUN install_local(/app.tar.gz)
```

will mean that you have to reinstall the whole set of dependencies every time you come with a new version of the application. Which can happen a lot of times during dev :) On the other hand,

```
FROM rocker/r-ver
RUN remotes::install_cran(dep1)
RUN remotes::install_cran(dep2)
[...]
COPY app_*.tar.gz /app.tar.gz
RUN install_local(/app.tar.gz)
```

will only rerun from the `COPY` instruction onward, since the dependency layers above it are cached.

The downside: Docker can limit you to 127 layers (see there for more details), but it seems to be kernel dependent. And to be honest, I'm not sure there will be that many golem apps with more than 100 listed dependencies :)

I agree though that it's a dev format. So maybe we should have two flavours of Dockerfile? (one dev, one ops?), with the production format being just the `COPY` & the `install_local()`.

**Why that many `RUN`?**
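The per-dependency layout can be generated mechanically from the app package's DESCRIPTION file. A rough shell sketch of the idea (`deps_to_run_lines` is a hypothetical helper for illustration, not golem's actual implementation, which shells the install through R):

```shell
#!/bin/sh
# Sketch: emit one RUN instruction per package listed in the Imports
# field of a DESCRIPTION file, so each dependency gets its own layer.
# Hypothetical helper; simplified parsing (no continuation edge cases).
deps_to_run_lines() {
  # Grab the Imports field (inline or spanning several lines), split on
  # commas, strip version constraints like (>= 1.0.0) and whitespace.
  awk '/^Imports:/ { flag = 1; sub(/^Imports:/, "") }
       flag && /^[A-Za-z]+:/ && !/^Imports:/ { flag = 0 }
       flag { print }' "$1" |
    tr ',' '\n' |
    sed -e 's/([^)]*)//g' -e 's/[[:space:]]//g' |
    grep -v '^$' |
    while read -r pkg; do
      printf 'RUN Rscript -e '\''remotes::install_cran("%s")'\''\n' "$pkg"
    done
}
```

For a DESCRIPTION importing shiny, golem, and dplyr, this prints three `RUN Rscript -e 'remotes::install_cran("…")'` lines, one cacheable layer each.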
Thanks a lot @ColinFay for the great and complete reply, this is exactly the kind of discussion I was aiming to trigger: getting the insights and reasoning behind the approach used in {golem}, which I believe is very valuable. Let me share some more thoughts following on your points.

**Individual `RUN` instructions**
Great discussion! It may not be the perfect place, but since you started the discussion, I'll chime in to share some thoughts.

About deployment with Docker, specifically for the dev workflow, I prefer the approach of using a package management tool (before {packrat}, now {renv}) to separate package management from Docker and stay closer to a standard R project. This also allows deployment without Docker, and eases the setup of a new dev environment (for a new developer in the team, or a change of computer). I find it a more general approach:

- Without Docker, it means just restoring the project dependencies using R code.
- With Docker, it means a volume needs to be mounted for the project library, where the packages are installed once; subsequent deployments do not reinstall packages that are already there. Very quick deployment then. If the same deployment environment is used for many containers (as it could be in a testing setup), the cache mechanism of such tools accelerates the process for all deployments.

The main advantage I see to this is that pinning package versions becomes possible. We use this at work: it allows the developers, the IT teams, and the dataops who help set up the deployment to work together, bridging R and DevOps concerns.

The approach without a project package management tool can be interesting too (always testing with the latest package versions, simplicity), and it could be coupled with a project repository (a custom CRAN-like repo) that would help pin project versions of packages (this is something used at work too, with a Nexus repository, though not yet for R packages).

@ColinFay if you are curious about what I have tried so far for Docker deployment of Shiny apps using what I described, I can work on an example of how it would work with golem. Cheers.
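The {renv}-plus-mounted-volume workflow described above could be sketched roughly as follows (a hedged illustration: the image tag, paths, and run command are assumptions, and it presumes the project already has an `renv.lock`):

```dockerfile
# Hypothetical dev Dockerfile: no app dependencies are baked into the
# image; the project library lives in a volume, restored at start-up.
FROM rocker/r-ver:3.6.1
RUN Rscript -e 'install.packages("renv")'
WORKDIR /app
COPY . /app
# renv::restore() skips packages already present in the mounted library,
# so only new or changed dependencies are installed on later starts.
CMD ["Rscript", "-e", "renv::restore(); shiny::runApp('/app', port = 3838, host = '0.0.0.0')"]
```

Run with the project library mounted, e.g. `docker run -v renv-lib:/app/renv/library -p 3838:3838 myapp`: the first start installs everything, later starts reuse the volume (mounting renv's global cache directory instead is another option).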
@cderv thanks for the input! That's very enlightening. I'd be very happy to read what you have in mind for a {golem} app. I was also thinking about how we could share that kind of setup. I wonder also if this could be an option to the Dockerfile creation, something like a parameter for using {renv}.

@riccardoporreca ok got it :) Indeed the alphabetical order can make one lucky (or not), and I'm not sure how to tackle that issue for now 🤔 What I regularly do is what you're suggesting: building a "stable" Docker image with the list of dependencies that I know won't change that much, and a second one built on top of it. On one hand, if we switch to simply `install_local()` we reduce the number of layers, but that means that we need to rebuild everything during dev; that fits for ops, though. Thanks for pointing that out.

For now though, I think the next effort will be to integrate this. I think we should indeed split the Dockerfile creation into several options (at least one for dev, one for ops). The choice of the image infrastructure being specific to each project, I'm wondering what the right overall approach to this is. I can already see four combinations: for dev / for ops, and with / without environment management tools.

One other thing worth considering is the date: currently, the rocker Dockerfiles are set up to a specific date, linked to an MRAN repository. So for example if you have an older version of R but recent packages, you need to manually change the date of the Docker image. I wonder if this should be the default behaviour in the current context (deploying the app), and whether we should let the user change that date with a function parameter 🤔
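On the date question: versioned rocker images pin CRAN to an MRAN snapshot, and that snapshot can be overridden in the generated Dockerfile. A sketch (the snapshot date and the `Rprofile.site` path are illustrative assumptions, to be adjusted for the actual image):

```dockerfile
FROM rocker/r-ver:3.6.1
# Point CRAN at a different MRAN snapshot date than the image default,
# e.g. driven by a function parameter at Dockerfile-generation time.
RUN echo 'options(repos = c(CRAN = "https://mran.microsoft.com/snapshot/2019-10-15"))' \
    >> /usr/local/lib/R/etc/Rprofile.site
```

Every subsequent package installation in the image then resolves against that fixed date, which is what makes builds reproducible even as CRAN moves on.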
Hey! All Docker-related functions have moved to {dockerfiler}. We'll keep track of the issues here.
Is there a reason why we are installing (some of) the package dependencies with an explicit `remotes::install_cran()` (golem/R/add_deploy_helpers.R, line 244 in 458ace8) instead of getting them through the existing `remotes::install_local()` (golem/R/add_deploy_helpers.R, line 255 in 458ace8)? Is it to get as much work as possible done before the `COPY` instruction (golem/R/add_deploy_helpers.R, line 251 in 458ace8), to leverage layerization?

Beyond this, is there a big benefit to having individual `RUN` instructions installing each package separately, instead of a vectorized installation in a single layer? As soon as the package dependencies change, we can be arbitrarily lucky or unlucky with such intermediate layers.

Also, be it individual or vectorized, I am wondering if there is a specific reason to favor (CRAN) package installation via `remotes::install_cran()` over `install.packages()` or the neat `install2.r`.

Sorry for the many questions 😄 I find golem very interesting, and I am trying to bring up comments based on my experience / best practices, and to consolidate these at the same time. Happy to discuss and contribute!
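For comparison, the three installation styles mentioned can be sketched as Dockerfile fragments (`dep1`/`dep2` are placeholders; `install2.r` ships with littler and is preinstalled on rocker images):

```dockerfile
# install2.r (littler): concise, and --error fails the build on a
# failed install instead of only warning
RUN install2.r --error dep1 dep2

# base R: vectorized, one layer, but a failed install only warns by
# default, so the Docker build can succeed with missing packages
RUN Rscript -e 'install.packages(c("dep1", "dep2"))'

# remotes: skips packages that are already installed and up to date
RUN Rscript -e 'remotes::install_cran(c("dep1", "dep2"))'
```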