aimat-lab/PAL


Parallel Active Learning - automated, modular, and parallel active learning workflow

Parallel active learning (PAL) workflow with data and task parallelism through Message Passing Interface (MPI) and mpi4py.

Features

  • The automated workflow reduces the need for human intervention during active learning.
  • Machine learning (ML) training and inference processes are decoupled, enabling data and task parallelism across data generation, labeling, and training.
  • PAL is modular and highly adaptable: it can be extended to different tasks with various combinations of compute resources, data, and ML model types.
  • Built on MPI through its Python bindings (mpi4py), PAL is scalable and can be deployed flexibly on shared-memory systems (e.g., a laptop) and distributed-memory systems (e.g., a compute cluster).
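To make the workflow the features above describe concrete, the following pure-Python sketch runs one serial active-learning cycle: generate candidates, select the ones an ensemble is least certain about, label them with an expensive oracle, and retrain. All function names, the toy linear ensemble, and the uncertainty measure are illustrative assumptions, not part of PAL's API; in PAL these stages run as separate, parallel MPI processes.

```python
import random

def generate_candidates(n, rng):
    """Data-generation stage: propose unlabeled samples."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def predict_with_uncertainty(model, x):
    """Inference stage: ensemble mean and spread as an uncertainty proxy."""
    preds = [w * x for w in model]
    mean = sum(preds) / len(preds)
    spread = max(preds) - min(preds)
    return mean, spread

def oracle(x):
    """Labeling stage: stands in for the expensive ground-truth computation."""
    return 3.0 * x

def active_learning_cycle(model, n_candidates, n_select, rng):
    """One generate -> select -> label -> train cycle."""
    candidates = generate_candidates(n_candidates, rng)
    # Pick the candidates the ensemble disagrees on most.
    ranked = sorted(candidates,
                    key=lambda x: -predict_with_uncertainty(model, x)[1])
    selected = ranked[:n_select]
    labeled = [(x, oracle(x)) for x in selected]
    # "Training": one gradient step per label on the squared error (w*x - y)^2,
    # nudging each ensemble member toward the oracle.
    new_model = []
    for w in model:
        for x, y in labeled:
            w += 0.5 * (y - w * x) * x
        new_model.append(w)
    return new_model, labeled

rng = random.Random(0)
model = [1.0, 2.0, 4.0]  # a 3-member linear "ensemble"
model, labeled = active_learning_cycle(model, n_candidates=10, n_select=3, rng=rng)
```

After one cycle the ensemble members move toward the oracle's slope, so their disagreement shrinks; PAL iterates this loop with the stages distributed over MPI ranks instead of run serially.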

Prerequisites

  • Python >= 3.9
  • OpenMPI == 4.1
  • mpi4py >= 3.1 (built against OpenMPI)
  • NumPy / Matplotlib

Run PAL

Launch 20 MPI processes locally:

mpirun -n 20 python main.py
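A minimal sketch of how a main.py launched this way might map MPI ranks to workflow roles. The role split and function name are assumptions for illustration, not PAL's actual layout, and the mpi4py import is guarded so the sketch also runs serially without an MPI installation:

```python
def role_for_rank(rank, size):
    """Hypothetical task split: rank 0 manages the workflow, the first half
    of the remaining ranks train ML models, the rest generate/label data."""
    if rank == 0:
        return "manager"
    n_workers = size - 1
    n_trainers = max(1, n_workers // 2)
    return "trainer" if rank <= n_trainers else "generator"

try:
    from mpi4py import MPI  # real runs: mpirun -n 20 python main.py
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:  # fallback so the sketch runs without MPI installed
    rank, size = 0, 1

print(f"rank {rank}/{size} -> {role_for_rank(rank, size)}")
```

With 20 ranks this split would give one manager, nine trainers, and ten generators; each process then enters the loop for its role and exchanges data with the others via MPI messages.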

Launch 20 processes across 2 nodes for 1 hour on a compute cluster managed by Slurm (submit the script below with sbatch):

#!/bin/sh

#SBATCH --nodes=2
#SBATCH --ntasks=20
#SBATCH --ntasks-per-node=10
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1

export OMPI_MCA_coll_hcoll_enable=1
export UCX_TLS=dc,self,posix,sysv,cma

mpirun --bind-to core --map-by core --rank-by slot --report-bindings python main.py
