Yet another BatchtoolsExpiration: Future ('<none>') expired error #285

Open
kpj opened this issue Apr 1, 2022 · 0 comments

kpj commented Apr 1, 2022

Executing the following snippet on our LSF-powered cluster fails with the exception Error: BatchtoolsExpiration: Future ('<none>') expired (registry path [..]).

library(future)
library("future.batchtools")
library(furrr)

plan(
  batchtools_lsf,
  template = "lsf-simple.tmpl",
  resources = list(queue = "gpu.4h", walltime = 60 * 60 * 4, memory = "5000", core_num = 2)
)
future_map_dfr(1:10, function(x) { data.frame(x = x, y = x^2) })
plan(sequential)

with the following lsf-simple.tmpl:

#BSUB -J <%= job.name %>
#BSUB -o <%= log.file %>
#BSUB -q <%= resources$queue %>
#BSUB -W <%= round(resources$walltime / 60, 1) %>    # resources$walltime in seconds
#BSUB -M <%= resources$memory %>
#BSUB -R "rusage[mem=<%= resources$memory %>, ngpus_excl_p=1]"
#BSUB -n <%= resources$core_num %>

Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
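
For reference, here is a minimal base-R sketch of the values the resource-dependent directives should evaluate to for the resources passed to plan(). It does not run the real template engine; it only repeats the template's own expressions, and the variable names (walltime_minutes, directives) are illustrative.

```r
# Resources exactly as passed to plan() in the snippet above.
resources <- list(queue = "gpu.4h", walltime = 60 * 60 * 4, memory = "5000", core_num = 2)

# Same expression as the -W line of the template: seconds -> minutes.
walltime_minutes <- round(resources$walltime / 60, 1)

# Hand-written approximation of the rendered directives (illustrative only;
# job.name, log.file and uri are filled in by batchtools and are omitted here).
directives <- c(
  sprintf("#BSUB -q %s", resources$queue),
  sprintf("#BSUB -W %s", walltime_minutes),
  sprintf("#BSUB -M %s", resources$memory),
  sprintf("#BSUB -R \"rusage[mem=%s, ngpus_excl_p=1]\"", resources$memory),
  sprintf("#BSUB -n %s", resources$core_num)
)

cat(directives, sep = "\n")
```

So the requested wall time comes out as 240 minutes, which matches the 4-hour queue.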

This topic has already been discussed in various settings:

Possible solutions have been suggested for SLURM (futureverse/future.batchtools#74, #273).

I am currently trying to resolve this for LSF, and I do not think the three topics mentioned above apply, because 1) I am spawning only a very small number of jobs, 2) no error message is reported, and 3) LSF sets the job status to DONE for the jobs that expired:

[R script]
Error: BatchtoolsExpiration: Future ('<none>') expired (registry path [..]).. The last few lines of the logged output:
Sender: LSF System <[..]>
Subject: Job 212072182: <jobc6330b006f2ac3311db1511449421e23> in cluster <[..]> Done
Job <jobc6330b006f2ac3311db1511449421e23> was submitted from host <[..]> by user <[..]> in cluster <[..]> at Fri Apr  1 14:24:57 2022
[..]
$ bjobs 212072182
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
212072182  [..]    DONE  gpu.4h     [..]        [..]        *449421e23 Apr  1 14:24
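
To compare this with what batchtools itself recorded, the registry named in the error message can be opened read-only. The following is a minimal sketch, assuming the registry directory still exists after the failure. registry_path is a hypothetical placeholder (the real path is elided above), and inspect_expired is an illustrative helper, not part of batchtools or future.batchtools.

```r
library(batchtools)

# Hypothetical placeholder: substitute the registry path printed in the error message.
registry_path <- "path/to/registry"

# Illustrative helper: summarise a registry and show the logs of expired jobs.
inspect_expired <- function(path) {
  # Open read-only so that the registry on disk is not modified.
  reg <- loadRegistry(file.dir = path, writeable = FALSE, make.default = FALSE)

  # Counts of submitted / started / done / error / expired jobs.
  print(getStatus(reg = reg))

  # Jobs that batchtools considers expired: submitted, no longer known to the
  # scheduler, and without a recorded result or error.
  expired <- findExpired(reg = reg)
  print(expired)

  # Show whatever log output exists for each expired job.
  for (id in expired$job.id) {
    log_lines <- tryCatch(
      getLog(id, reg = reg),
      error = function(e) paste("No log available:", conditionMessage(e))
    )
    writeLines(log_lines)
  }

  invisible(expired)
}

# Only runs once the placeholder has been replaced with an existing directory.
if (dir.exists(registry_path)) {
  inspect_expired(registry_path)
}
```

If the job shows up as expired here while bjobs reports DONE, that would suggest the R process on the compute node finished without its result becoming visible to the submitting session, but that is a guess on my part rather than a confirmed cause.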

I'd be excited to hear your thoughts on this. Did I do something wrong? Or is there a way of fixing this?
