[help] format = "qs" seems to suppress storage = "worker" #1304
Help
Description

I'm working with some very large datasets (100 GB) in targets, so I need to be very careful that no data is duplicated, as there isn't enough memory for that. Consequently, I'm using

Here is a targets workflow that exhibits these same settings and can be used to reproduce my findings:

library(targets)
tar_option_set(
  controller = crew.cluster::crew_controller_slurm(
    workers = 1,
    script_lines = c(
      "module load R/4.4.0",
      # Stop R from overallocating memory
      "ulimit -v $(( SLURM_MEM_PER_CPU * 1024))"
    ),
    slurm_memory_gigabytes_per_cpu = 10,
    slurm_log_error = NULL,
    slurm_log_output = NULL
  ),
  format = "qs",
  # Don't send data over the network
  storage = "worker",
  retrieval = "worker",
)
list(
  # This is 7.5 Gb as determined by object.size
  tar_target(vector, numeric(1E9)),
  tar_target(size, object.size(vector))
)

When you run this, targets prints this to the console:
When you check the Slurm log file, you get:
This worries me for two reasons:
Replies: 1 comment 12 replies
I am not sure why the

library(targets)
tar_option_set(
  controller = crew::crew_controller_local(),
  format = "qs",
  storage = "worker",
  retrieval = "worker",
)
list(
  tar_target(vector, numeric(1e9)),
  tar_target(size, object.size(vector))
)

while running
b680b8f fixes the issue so that now the data sent over the network is very light in the event of a storage error. Now when you run the example in #1304 (reply in thread), the crew worker does not crash, and tar_make() prints the error message "store std::bad_alloc". The underlying error comes from qs::qsave(), which is running into the memory constraint set by unix::rlimit_as().
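
For a rough sense of why qs::qsave() can hit the limit, here is a minimal arithmetic sketch rather than a measurement. It assumes the 10 GB per-CPU setting from the Slurm example translates into an address-space limit of 10 GiB, and it ignores the small vector header and R's own baseline memory use. The limit used in the local reproduction is not shown in this thread, so the figure below is illustrative only.

```r
# Sketch only: compare the in-memory size of the target with an assumed
# address-space limit. No large allocation happens here.
n <- 1e9                      # length of numeric(1e9)
vector_bytes <- 8 * n         # each double takes 8 bytes (header ignored)
limit_bytes <- 10 * 1024^3    # ASSUMPTION: 10 GB per CPU treated as 10 GiB

vector_gib <- vector_bytes / 1024^3
headroom_gib <- (limit_bytes - vector_bytes) / 1024^3

cat(sprintf("vector:   %.2f GiB\n", vector_gib))
cat(sprintf("headroom: %.2f GiB\n", headroom_gib))

# After tar_make() reports the failure, the recorded message can be read
# from the pipeline metadata (requires an existing _targets/ data store):
# targets::tar_meta(fields = error, complete_only = TRUE)
```

Under these assumptions the vector alone occupies about three quarters of the limit, so any sizeable temporary buffer needed during serialization would plausibly be enough to trigger std::bad_alloc. Whether qs actually allocates that much depends on its settings and version, which this sketch does not model.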