Install ohara:

```bash
pip install ohara
```
To train MLA:

```bash
python train_mla.py --attn_type=mla
```
For the MHA baseline:

```bash
python train_mla.py --attn_type=mha
```
If you want to calculate the number of parameters and see what percentage of KV cache you'll save, visit this link: https://joey00072.github.io/Multi-Head-Latent-Attention-MLA-/
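The KV-cache saving can also be estimated with a quick back-of-envelope calculation: MHA caches full K and V for every head and layer, while MLA caches one compressed latent vector per layer. The sketch below is illustrative only; the function names, config values, and latent dimension are assumptions, not the repo's actual code or defaults.

```python
# Hedged sketch: rough KV-cache comparison, MHA vs. MLA.
# All names and config values below are illustrative assumptions.

def mha_kv_cache_per_token(n_layers: int, n_heads: int, head_dim: int) -> int:
    """Elements cached per token: K and V for every head in every layer."""
    return 2 * n_layers * n_heads * head_dim

def mla_kv_cache_per_token(n_layers: int, latent_dim: int) -> int:
    """MLA caches a single compressed latent per layer instead of full K/V."""
    return n_layers * latent_dim

# Assumed Llama-7B-like config: 32 layers, 32 heads, head_dim 128,
# and a hypothetical latent dimension of 512 for the compressed KV.
mha = mha_kv_cache_per_token(n_layers=32, n_heads=32, head_dim=128)
mla = mla_kv_cache_per_token(n_layers=32, latent_dim=512)
saving = 100 * (1 - mla / mha)
print(f"MHA: {mha} elems/token, MLA: {mla} elems/token, saved {saving:.1f}%")
```

With these assumed numbers the cache shrinks from 262,144 to 16,384 elements per token, a ~93.8% reduction; the linked page computes the exact figure for your own config.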
- Write blog post
- Add JAX version
- Add GQA and MQA to the calculation (index.html)
- Distill Llama into an MLA version (maybe)