Multi-Head Latent Attention (MLA)

[figure: MLA]
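How it works, in brief: instead of caching full per-head keys and values, MLA down-projects each token into a small shared latent vector, caches only that latent, and up-projects it back to per-head K and V at attention time. The PyTorch sketch below illustrates the idea only; it is not the ohara / train_mla.py code, the module and dimension names are made up for the example, and DeepSeek's decoupled RoPE keys and query compression are left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMLA(nn.Module):
    """Toy Multi-Head Latent Attention: cache one small latent per token
    instead of full per-head K and V."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)  # shared down-projection
        self.k_up = nn.Linear(d_latent, d_model, bias=False)     # per-head K from the latent
        self.v_up = nn.Linear(d_latent, d_model, bias=False)     # per-head V from the latent
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)            # (B, T, d_latent): the only thing that needs caching
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.shape[1]
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=(latent_cache is None))
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), latent   # return the latent as the next-step cache


# Example: prefill 16 tokens, then decode one token reusing the latent cache.
mla = SimpleMLA()
out, cache = mla(torch.randn(2, 16, 512))                      # cache: (2, 16, 64)
out, cache = mla(torch.randn(2, 1, 512), latent_cache=cache)   # cache grows to (2, 17, 64)
```

With these toy dimensions the cache costs 64 values per token per layer instead of the 2 x 512 = 1024 values needed for standard MHA keys and values.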

Quick Start

Install ohara:

pip install ohara

To train MLA:

python train_mla.py --attn_type=mla

For the baseline, use MHA:

python train_mla.py --attn_type=mha 

If you want to calculate the number of parameters and check what percentage of the KV cache you'll save, visit this link: https://joey00072.github.io/Multi-Head-Latent-Attention-MLA-/
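For a rough sense of what that calculator reports, here is a per-token, per-layer back-of-the-envelope comparison; the head count, head dimension, and latent size below are assumptions for illustration, not DeepSeek's actual configuration.

```python
# Assumed toy dimensions -- plug in your own model's numbers.
n_heads, d_head = 32, 128
d_latent = 512

mha_cache = 2 * n_heads * d_head   # K and V for every head: 8192 values per token
mla_cache = d_latent               # one shared latent:       512 values per token
print(f"KV cache saved: {(1 - mla_cache / mha_cache) * 100:.2f}%")  # 93.75%
```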

TODO

  • Write a blog post
  • Add a JAX version
  • Add GQA and MQA to the calculation (index.html)
  • Maybe distill Llama into an MLA version

About

A working implementation of DeepSeek's MLA (Multi-Head Latent Attention).
