
Running renv::restore() in parallel will fail #1571

Closed
Kjir opened this issue Jul 20, 2023 · 5 comments · Fixed by #1598
Labels
bug an unexpected problem or unintended behavior restore 🪄

Comments

@Kjir

Kjir commented Jul 20, 2023

I have some Airflow tasks where I launch a Kubernetes Pod and in that Pod I use renv to install and run code from a package.
Since the task is, essentially, "copy data from A to B for date X", there can be multiple instances of this task running in parallel for different days.

If the packages are not in the cache, renv will attempt to install the missing packages in all the parallel jobs. Unfortunately, only the first job to finish installing will actually be successful; the other jobs fail with an error:

Error: target file '/renv/cache/v5/linux-ubuntu-jammy/R-4.3/x86_64-pc-linux-gnu/kofcasts/1.2.1/51a5c034ff08245557a03533120c02e9/kofcasts' already exists
 - Installing kofcasts ...                       Traceback (most recent calls last):
 12: renv::install(package_name)
 11: renv_install_impl(records)
 10: renv_install_staged(records)
  9: renv_install_default(records)
  8: handler(package, renv_install_package(record))
  7: renv_install_package(record)
  6: renv_cache_synchronize(record, linkable = linkable)
  5: renv_cache_synchronize_impl(cache, record, linkable, path)
  4: renv_cache_copy(path, cache, overwrite = TRUE)
  3: renv_file_copy(source, target, overwrite = overwrite)
  2: renv_file_copy_dir(source, target)
  1: stop(status)
 Execution halted

Does this really need to be an error condition?
Ideally it would acknowledge that the desired package has been installed in the cache in the meantime and move on.
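The behavior being requested could be sketched in shell (an illustrative model only, not renv's R internals; the function name and messages here are invented): when the copy into the cache finds the target already present, treat that as success rather than an error.

```shell
#!/bin/sh
# Illustrative sketch only -- not renv's actual code. If another parallel job
# has already populated the cache entry, skip the copy and report success.
sync_to_cache() {
  src=$1
  target=$2
  if [ -e "$target" ]; then
    # Another process won the race; the package is already cached.
    echo "cache entry already present, skipping copy"
    return 0
  fi
  cp -R "$src" "$target"
}
```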

@kevinushey
Collaborator

I believe #1598 will help resolve the issue you're seeing. Could you try testing with:

renv::install("rstudio/renv#1598")

and check whether things work better on your side?

@kevinushey kevinushey added bug an unexpected problem or unintended behavior restore 🪄 labels Jul 26, 2023
@kevinushey kevinushey added this to the 1.0.1 milestone Jul 26, 2023
@Kjir
Author

Kjir commented Jul 26, 2023

I tried and reproduced the error running this in parallel in two terminals:

$ docker run --rm -it -v /tmp/rcache:/renv -e RENV_PATHS_CACHE=/renv rocker/verse R --vanilla -e "options(repos = c(CRAN='https://packagemanager.posit.co/cran/latest')); install.packages('renv'); renv::install('tsbox')"

I then tried to use the fixed version (clearing the cache beforehand with sudo rm -rf /tmp/rcache/*):

$ docker run --rm -it -v /tmp/rcache:/renv -e RENV_PATHS_CACHE=/renv rocker/verse R --vanilla -e "options(repos = c(CRAN='https://packagemanager.posit.co/cran/latest')); install.packages('remotes'); remotes::install_github('rstudio/renv@1.0.0'); renv::install('tsbox')"

The fixed version did not throw an error.

I do wonder, however, what happens if the process dies for whatever reason while the lock is there: would it keep the installation locked until someone manually removes the lock?

@kevinushey
Collaborator

The fixed version did not throw an error.

Thank you for confirming!

I do wonder, however, what happens if the process dies for whatever reason while the lock is there: would it keep the installation locked until someone manually removes the lock?

renv tries to guard against this scenario in two ways:

  1. R sessions using renv will also spawn a 'watchdog' process, whose job is to monitor active locks used by that process. If the monitored process suddenly dies, then the watchdog will release those locks and exit.

  2. renv will consider a lock that's more than 60 seconds old to be "stale", so in the worst case a waiting process only waits that long before the lock is considered stale and it can proceed.

renv will also refresh any active locks at opportune times, just to ensure that the 60 seconds threshold above isn't hit.
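The stale-lock safeguard described above can be sketched in shell (a simplified model, not renv's implementation; the 60-second threshold matches the description, but the lock path and helper names are invented): a directory acts as the lock, and a waiting process reclaims it once its mtime exceeds the stale threshold.

```shell
#!/bin/sh
# Simplified model of the stale-lock rule described above -- not renv's code.
# A lock directory whose mtime is older than STALE_SECONDS is treated as
# stale (its owner presumably died) and reclaimed by the waiting process.
STALE_SECONDS=60

acquire_lock() {
  lock=$1
  while ! mkdir "$lock" 2>/dev/null; do
    now=$(date +%s)
    # GNU stat; on BSD/macOS this would be `stat -f %m` instead.
    mtime=$(stat -c %Y "$lock" 2>/dev/null || echo "$now")
    if [ $((now - mtime)) -ge "$STALE_SECONDS" ]; then
      rmdir "$lock" 2>/dev/null   # stale: remove and retry
    else
      sleep 1                     # live lock: wait for the owner
    fi
  done
}

refresh_lock() { touch "$1"; }    # what the watchdog would do periodically
release_lock() { rmdir "$1"; }
```

Refreshing the lock's mtime (as the watchdog does) is what keeps a long-running installation, such as a slow compilation, from tripping the stale threshold.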

@Kjir
Author

Kjir commented Jul 26, 2023

If a Docker container dies, the watchdog won't work, but the other mechanism should cover that case.

renv will also refresh any active locks at opportune times, just to ensure that the 60 seconds threshold above isn't hit.

So in case of a package with a long compilation time it would refresh the lock while compiling?

@kevinushey
Collaborator

So in case of a package with a long compilation time it would refresh the lock while compiling?

Right -- to be more specific, the watchdog process is responsible for refreshing the locks, and it'll be able to do so even if the monitored R process is busy.
