LLM, CUDA/System, CPU offloading, Distributed training
- KAUST (King Abdullah University of Science and Technology)
- in/liangyu-wang-in
- @liangyuwang10
Pinned
- Tiny-DeepSpeed (Public): a minimalistic re-implementation of the DeepSpeed library
- Flash-Attention-Implementation (Public): an implementation of Flash-Attention (both forward and backward passes) with PyTorch, CUDA, and Triton. Python, 1 star
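For context on what that repository implements, here is a minimal sketch of the reference (naive) attention forward in plain PyTorch: it materializes the full seq-by-seq score matrix, which Flash-Attention avoids by tiling the computation. This is an illustrative baseline a Flash-Attention kernel must match numerically, not the repository's actual code; the function name is hypothetical.

```python
import torch

# Naive attention forward: builds the full (seq, seq) score matrix,
# the memory cost that Flash-Attention's tiled kernel avoids.
def attention_ref(q, k, v):
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale  # (batch, heads, seq, seq)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 16, 64)  # (batch, heads, seq_len, head_dim)
out = attention_ref(q, k, v)
print(out.shape)  # torch.Size([2, 4, 16, 64])
```

PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` computes the same result and can dispatch to a fused flash kernel, which makes it a convenient correctness check for a hand-written implementation.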
- Tiny-Megatron (Public): a minimalistic re-implementation of the Megatron library. Python, 3 stars
- MetaProfiler (Public): a lightweight, structure-agnostic, operator-level profiler for PyTorch models that leverages MetaTensor execution to simulate and benchmark individual ops without loading the full model. Python, 1 star
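A minimal sketch of the MetaTensor idea the description refers to (illustrative only, not MetaProfiler's actual API): tensors on PyTorch's "meta" device carry only shape and dtype metadata, so ops run through shape propagation without allocating or computing real data.

```python
import torch

# Meta tensors hold no data, only metadata; running an op on them
# infers the output shape without materializing any memory, which is
# what lets a profiler reason about ops without loading full weights.
x = torch.empty(8, 1024, device="meta")
w = torch.empty(1024, 4096, device="meta")
y = x @ w  # shape propagation only; no result buffer is allocated
print(y.shape, y.device)  # torch.Size([8, 4096]) meta
```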