copyrightly/EfficientML

List of papers:

Neural Architecture Search (NAS):

  • Early NAS methods using RNN-based controllers
  • Differentiable NAS methods (a minimal sketch follows this list)
  • State of the art (used in the lab's notebook)
  • MCUNets
  • COCO datasets
  • Inverted MobileNet blocks
  • Efficiency constraints in the real world
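
The differentiable NAS entry above relaxes the discrete choice of operation into a softmax over learnable architecture parameters, so the architecture itself can be optimized by gradient descent alongside the weights. Below is a minimal PyTorch sketch of that idea (a DARTS-style mixed operation); the MixedOp class, the candidate-op set, and the channel/input sizes are illustrative assumptions, not the search space of any specific paper listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge of a differentiable-NAS cell: a weighted sum of candidate ops."""
    def __init__(self, channels: int):
        super().__init__()
        # Small, illustrative candidate set (real search spaces are larger).
        self.ops = nn.ModuleList([
            nn.Identity(),                                            # skip connection
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),  # 5x5 conv
            nn.AvgPool2d(3, stride=1, padding=1),                     # 3x3 avg pool
        ])
        # One learnable architecture logit per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)  # continuous relaxation of the choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

if __name__ == "__main__":
    mixed = MixedOp(channels=16)
    x = torch.randn(2, 16, 32, 32)
    y = mixed(x)                                # gradients flow into alpha
    print(y.shape, F.softmax(mixed.alpha, dim=0))
```

After search, the operation with the largest alpha on each edge is typically kept and the others are pruned, yielding a discrete architecture.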

TinyEngine and Parallel Processing:

Vision Transformer:

  • An image is worth 16x16 words (patch-embedding sketch after this list)
  • Segment Anything Model (SAM)
  • Segment Anything Model 2 (SAM 2)
  • EfficientViT: multi-scale linear attention
  • Flamingo: a Visual Language Model for Few-Shot Learning [Alayrac et al., 2022]
  • PaLM-E: An Embodied Multimodal Language Model [Driess et al., 2023]
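
The "image is worth 16x16 words" entry treats an image as a sequence of patch tokens: cut the image into fixed-size patches, linearly embed each patch, and feed the token sequence to a standard Transformer encoder. Below is a minimal sketch assuming a 224x224 input, 16x16 patches, and a small 192-dimensional embedding; the PatchEmbed class and every hyperparameter are illustrative, not the paper's configuration (which additionally uses a class token and position embeddings).

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patchify + linear projection, implemented as a strided convolution."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, D) token sequence

if __name__ == "__main__":
    tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
    print(tokens.shape)                      # torch.Size([1, 196, 192])
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True),
        num_layers=2,
    )
    print(encoder(tokens).shape)             # torch.Size([1, 196, 192])
```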

GAN, Video, and Point Cloud:

Diffusion Model:

Distributed Training:

  • Scaling Distributed Machine Learning with the Parameter Server [Li et al., 2014]
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  • Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
  • DeepSpeed: Extreme-scale model training for everyone
  • Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [Zheng et al., 2022]
  • Sparse Communication for Distributed Gradient Descent [Aji and Heafield, 2017]
  • Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training [Lin et al., 2017] (top-k sparsification sketch after this list)
  • Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes [Sun et al., 2019]
  • PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization [Vogels et al., 2019]
  • signSGD with Majority Vote is Communication Efficient and Fault Tolerant [Bernstein et al., 2019]
  • ATOMO: Communication-efficient Learning via Atomic Sparsification [Wang et al., 2018]
  • 1-Bit Stochastic Gradient Descent and Its Application to Data-Parallel Distributed Training of Speech DNNs [Seide et al., 2014]
  • Scalable Distributed DNN Training Using Commodity GPU Cloud Computing [Ström, 2015]
  • TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
  • Delayed Gradient Averaging: Tolerate the Communication Latency in Federated Learning [Zhu et al., 2021]
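
Several of the entries above (sparse communication, Deep Gradient Compression, PowerSGD, TernGrad, 1-bit SGD) cut communication by sending only a compressed version of each gradient and keeping the leftover error in a local residual that is added back at the next step. Below is a minimal single-process sketch of top-k sparsification with error accumulation in that spirit; the sparsify_with_residual helper, the 1% ratio, and the toy tensor sizes are illustrative assumptions, and no real communication is performed.

```python
import torch

def sparsify_with_residual(grad: torch.Tensor, residual: torch.Tensor, ratio: float = 0.01):
    """Return the sparse gradient that would be sent and the updated local residual."""
    accumulated = grad + residual                 # add back what was withheld last step
    k = max(1, int(accumulated.numel() * ratio))  # number of values to keep
    threshold = accumulated.abs().flatten().topk(k).values.min()
    mask = accumulated.abs() >= threshold         # keep only the largest-magnitude values
    sent = accumulated * mask                     # what would go over the wire
    new_residual = accumulated * (~mask)          # error kept locally for the next step
    return sent, new_residual

if __name__ == "__main__":
    grad = torch.randn(1000)
    residual = torch.zeros_like(grad)
    sent, residual = sparsify_with_residual(grad, residual)
    print((sent != 0).sum().item(), "of", grad.numel(), "values would be sent")
```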