Benchmark of container image size reduction strategies

Container technologies are increasingly adopted because they improve reproducibility and ease deployment. However, the growing size of container images (e.g., FMRIprep is ~15 GB) is problematic, since data transfer takes a considerable amount of time in data-intensive applications [1]. We study strategies to reduce the transfer time of container images. As a first approach, we aim to minimize container image size with Neurodocker and strace. This will help us determine whether further effort should be invested in optimizing container image transfer.

Material and Methods

Dataset

CoRR dataset

  • 1397 subjects
  • 408.4 GB
  • T1, fMRI

TODO:

  • Select a subset of subjects

Tools used:

  • Bash
  • Docker
  • Singularity
  • Dask

BIDS Apps:

  • example
  • MAGeTbrain
  • FMRIprep
  • ...
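Every BIDS App exposes the same command-line contract (input directory, output directory, analysis level), which keeps the benchmark harness uniform across pipelines. A sketch of the standard invocation using the `bids/example` app; the paths and participant label are placeholders:

```shell
# Run one BIDS App at the participant level (paths are placeholders).
# The input dataset is mounted read-only; derivatives go to /outputs.
docker run -ti --rm \
  -v /data/corr:/bids_dataset:ro \
  -v /data/outputs:/outputs \
  bids/example \
  /bids_dataset /outputs participant --participant_label 01
```

The same positional arguments apply to the other apps in the list, so swapping the image name is enough to benchmark a different pipeline.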

Approach:

  • Reduce image size with:
    • [ ] Neurodocker (complete trace): removed due to issues when building container images
    • [ ] Neurodocker (whitelist): the container fails to build with this dependency; raise an issue with Neurodocker
    • Custom Bash script using ReproZip
    • Custom Bash script using strace
  • Convert the Docker images with docker2singularity
  • Benchmark the baseline against the different reduction methods
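The strace-based reduction step above boils down to tracing the pipeline's system calls and keeping only the files it actually opened. A minimal sketch: the log excerpt below is illustrative, not from a real run; in practice it would be produced by running the pipeline inside the container under `strace -f -e trace=openat`.

```shell
# Illustrative strace excerpt (file names are made up); real logs come
# from running the pipeline under `strace -f -e trace=openat`.
cat > trace.log <<'EOF'
openat(AT_FDCWD, "/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/opt/tool/bin/run", O_RDONLY) = 4
openat(AT_FDCWD, "/missing/file", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
EOF

# Keep successful opens only, extract the path, and deduplicate:
# the result is the whitelist of files the minified image must retain.
grep 'openat' trace.log \
  | grep -v '= -1' \
  | sed -E 's/.*"([^"]+)".*/\1/' \
  | sort -u > whitelist.txt
```

Everything outside `whitelist.txt` is a candidate for deletion before the image is converted with docker2singularity.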

Infrastructure

The Dask distributed scheduler runs on a 1-core node with 7.5 GB of memory; 10 dedicated 4-core workers with 15 GB of memory each are connected to it over a 10 Gb/s network.
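Under those assumptions, such a cluster can be brought up with the dask.distributed command-line tools; hostnames are placeholders, and `--memory-limit` is per worker process, hence 15 GB split across 4 processes per node:

```shell
# On the scheduler node (1 core, 7.5GB of memory).
dask-scheduler --host scheduler-node &

# On each of the 10 worker nodes: 4 single-threaded worker processes,
# 15GB of memory in total per node (3.75GB per process).
dask-worker scheduler-node:8786 --nprocs 4 --nthreads 1 --memory-limit 3.75GB
```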

Future work

We want to explore new scheduling and data access strategies to further reduce the transfer time of container images.

Deliverables

  • Script to minify container image size
  • Minified images
  • Benchmark report on the impact of minimizing container image size
  • Interactive Gantt chart of pipeline executions