array.jobs supported for SLURM systems? #23

Open
jgrn307 opened this issue May 23, 2018 · 3 comments

Comments

@jgrn307

jgrn307 commented May 23, 2018

I'm trying to use future.batchtools on a SLURM system that strongly encourages array jobs, e.g. makeClusterFunctionsSlurm(..., array.jobs = TRUE), but I can't figure out how to pass that option via future.batchtools. Any ideas?

@HenrikBengtsson
Collaborator

Unfortunately, array jobs are currently not supported by the future.batchtools backends - each future is submitted as an individual job. I am aware that this is less than ideal and support for array jobs would be awesome.

However, it is not 100% clear how the concept of an array of jobs should be brought into the Future API, i.e. it's a design decision that holds us back from supporting it. The main objective is that any feature added to the Future API should work the same regardless of backend. If this is not possible, it may be added as an optional feature, but the concept of optional features is yet to be designed.

The way I can see it being introduced is via higher-level future APIs, e.g. future.apply::future_lapply(), where we know up front exactly how many futures we want to launch. That model maps nicely onto array jobs on HPC schedulers. To do this for futures in general, we need a way to group individual futures (the design issue above). For example, in the example below, how should we specify that the first two futures should be part of one array job, and the other two part of another?

a1 <- future(1)
a2 <- future(2)
b1 <- future(1)
b2 <- future(2)

Maybe it can be done as:

a1 <- future(1, lazy = TRUE)
a2 <- future(2, lazy = TRUE)
a <- c(a1, a2)

b1 <- future(1, lazy = TRUE)
b2 <- future(2, lazy = TRUE)
b <- c(b1, b2)

where the concatenation (or some other mechanism) will trigger the futures to be launched.

@jgrn307
Author

jgrn307 commented Oct 18, 2018

Would it be possible to take a "simple" approach where we set a maximum array size parameter somewhere and have future.batchtools spread what would have been individual jobs across array jobs instead? For example, given a max array size of 100, 250 iterations, and something like array = TRUE, it would create three array jobs (1:100, 101:200, 201:250).

It's just an issue of distributing the iterations across array instances instead of individual jobs.
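
The arithmetic in that example (250 iterations, max array size 100) can be sketched in base R. This is only an illustration of the splitting step: split_into_arrays() is a hypothetical helper, not something future.batchtools or batchtools provides, and it assumes iterations are assigned to array jobs in consecutive blocks.

```r
# Hypothetical helper (not part of future.batchtools or batchtools):
# split n iterations into consecutive blocks of at most max_array_size,
# one block per array job.
split_into_arrays <- function(n, max_array_size) {
  idx <- seq_len(n)
  # ceiling(idx / max_array_size) is 1 for the first block, 2 for the next, ...
  unname(split(idx, ceiling(idx / max_array_size)))
}

arrays <- split_into_arrays(250, 100)

# Number of tasks in each array job
lengths(arrays)
# 100 100 50

# First and last iteration handled by each array job
vapply(arrays, function(i) paste(range(i), collapse = ":"), character(1))
# "1:100" "101:200" "201:250"
```

Each element of arrays would then correspond to one array job, with the element's indices mapped to the array task ids.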

As I'm sure you are aware, many schedulers dislike (and often limit) large numbers of individual jobs, but with array jobs you can effectively run just as many tasks.

With batchtools, we:

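# Enable array jobs in the SLURM cluster functions
reg$cluster.functions = makeClusterFunctionsSlurm(template = slurm_template_file, array.jobs = TRUE)
walltime_hours = 1/60
# Walltime is given in seconds; chunks.as.arrayjobs submits each chunk as one array job
batchtools_resources = list(walltime = walltime_hours * 3600, memory = 8192, ncpus = 1, chunks.as.arrayjobs = TRUE)
ids = batchMap(fun = batchtools_file_function,
		args = batchtools_df_to_loop_through,
		more.args = other_args)
# Group the job ids into chunks of at most max.array.size jobs each
max.array.size = 1001
ids$chunk = chunk(x = seq(nrow(ids)), chunk.size = max.array.size)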

@HenrikBengtsson
Collaborator

So, this is a tricky design problem that is hard to solve any time soon - it needs a lot of thought and work.

But for your everyday work, are you aware of:

plan(batchtools_slurm, workers = 100L)

It will cause nbrOfWorkers() to return 100 rather than the default +Inf (= "infinite job queue"). This in turn will cause:

y <- future.apply::future_lapply(X, ...)

to chunk up X into 100 chunks to be processed by 100 futures, i.e. 100 separate SLURM jobs, regardless of whether X has 1,000 or 1,000,000 elements. These are not HPC array jobs, but this allows you to limit the number of jobs you submit to the queue.
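
To make the effect of workers = 100L concrete, here is a minimal base R sketch of what splitting X into at most 100 chunks looks like. It is an illustration only: chunk_indices() is a hypothetical stand-in, and it assumes contiguous, roughly equal-sized chunks; the chunking that future.apply actually performs may order or balance elements differently.

```r
# Hypothetical stand-in for the chunking step (not future.apply's own code):
# assign n element indices to at most `workers` contiguous chunks.
chunk_indices <- function(n, workers) {
  n_chunks <- min(n, workers)
  idx <- seq_len(n)
  # Scale each index onto 1..n_chunks so the chunks are nearly equal in size
  unname(split(idx, ceiling(idx * n_chunks / n)))
}

chunks <- chunk_indices(1000, 100)
length(chunks)           # number of chunks, i.e. futures / SLURM jobs
unique(lengths(chunks))  # elements handled by each job

# The number of jobs stays at 100 even for a much longer X
length(chunk_indices(1e6, 100))
```

With 1,000 elements each job handles 10 of them; with 1,000,000 elements each job handles 10,000, but the queue still only sees 100 jobs.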
