
# Parallelism and hardware

## Contents

  1. Introduction
  2. Performance and bandwidth
  3. Model parallelism
  4. Computational complexity of transformers
  5. Efficient transformers: Inference optimizations
  6. Efficient transformers: Architecture modifications
  7. Kernel programming
  8. Accelerators
  9. Conclusion

## Introduction

Examples of GPU memory usage (source: https://arxiv.org/abs/2403.03507)

## Performance and bandwidth

Roofline plots:

Example of a Roofline plot (source: https://commons.wikimedia.org/wiki/File:Example_of_a_Roofline_model.svg)
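The roofline model bounds attainable throughput by the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch; the hardware numbers below are illustrative assumptions (roughly A100-class), not measurements:

```python
def roofline(peak_flops, mem_bw, intensity):
    """Attainable FLOP/s under the roofline model.

    peak_flops: peak compute throughput (FLOP/s)
    mem_bw:     peak memory bandwidth (bytes/s)
    intensity:  arithmetic intensity of the kernel (FLOP/byte)
    """
    return min(peak_flops, mem_bw * intensity)

# Illustrative, assumed numbers: 312 TFLOP/s peak, 2 TB/s bandwidth.
PEAK = 312e12
BW = 2.0e12
ridge = PEAK / BW  # intensity at which a kernel becomes compute-bound
```

Kernels whose arithmetic intensity falls below the ridge point (156 FLOP/byte with these numbers) are memory-bound: more FLOP/s only helps once data reuse is improved.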

## Model parallelism

Model parallelism (source: https://huggingface.co/docs/transformers/v4.17.0/en/parallelism)
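One form of model parallelism is tensor (column) parallelism: each device holds a slice of a weight matrix's columns, computes its partial output locally, and the slices are gathered. A minimal single-process sketch using plain lists to stand in for per-device shards (function names are illustrative):

```python
def matmul(A, B):
    # Naive dense matmul on nested lists: A is n x k, B is k x m.
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def column_parallel(A, B, shards=2):
    """Tensor (column) parallelism: each 'device' holds a slice of B's
    columns, computes its partial output, and the partial outputs are
    concatenated (an all-gather along the column dimension)."""
    m = len(B[0])
    cuts = [m * s // shards for s in range(shards + 1)]
    outputs = []
    for s in range(shards):
        # This device's slice of the weight matrix.
        B_shard = [row[cuts[s]:cuts[s + 1]] for row in B]
        outputs.append(matmul(A, B_shard))  # local partial result
    # Concatenate the column slices row by row.
    return [sum((out[i] for out in outputs), []) for i in range(len(A))]
```

The sharded computation reproduces the unsharded matmul exactly; in a real framework, the gather is a collective over devices rather than list concatenation.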

## Computational complexity of transformers
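Self-attention is quadratic in sequence length n but only linear in n for the projection and MLP matmuls, which are quadratic in model width d instead. A rough forward-pass FLOP count per layer, under standard assumptions (multiply-add counted as 2 FLOPs, MLP hidden size 4d, norms and softmax ignored):

```python
def transformer_layer_flops(n, d):
    """Approximate forward-pass FLOPs for one transformer layer with
    sequence length n and model width d."""
    qkv_out = 8 * n * d * d   # Q, K, V and output projections: 4 matmuls of d x d
    attn    = 4 * n * n * d   # QK^T scores plus attention-weighted values
    mlp     = 16 * n * d * d  # two matmuls with hidden size 4d
    return qkv_out + attn + mlp
```

Comparing the terms, the quadratic attention cost 4n²d overtakes the 24nd² matmul cost only once n exceeds about 6d, which is why long-context workloads motivate the attention optimizations below.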

## Efficient transformers: Inference optimizations
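A central inference optimization is the KV cache: keys and values for past tokens are stored once, so each decoding step attends over t cached entries instead of recomputing them, trading memory for compute. A minimal sketch (class and method names are illustrative, single head, no batching):

```python
import math

class KVCache:
    """Minimal KV-cache sketch: store per-step key/value vectors so that
    decoding step t attends over t cached entries."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        # Cache the new token's key and value vectors.
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q):
        # Dot-product scores of the query against every cached key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in self.keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        # Weighted sum of cached values.
        d = len(self.values[0])
        return [sum(w[t] * self.values[t][i] for t in range(len(w)))
                for i in range(d)]
```

With the cache, attention at step t costs O(t·d) rather than O(t²·d), at the price of storing O(t·d) keys and values per layer, which is why cache size dominates GPU memory at long contexts.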

## Efficient transformers: Architecture modifications

## Kernel programming

### Nvidia: CUDA

### AMD: ROCm

## Accelerators

### Nvidia

### AMD

### Intel (Habana)

### Blaize

### Cerebras

### Furiosa

### Groq

### Rebellions

### SambaNova

### Tenstorrent

Others:

  • d-Matrix
  • Etched
  • Graphcore
    • In July 2024, SoftBank Group agreed to acquire Graphcore for around $500 million; the deal is under review by the investment security unit of the UK's Business Department. [Wikipedia]
  • Lightmatter
  • MatX
  • Taalas
  • Untether AI

## Conclusion

TODO