Add Mistral Models to Flax #26809
Comments
I think this could be interesting! Feel free to open a PR and ping @sanchit-gandhi 😉
Hey @kiansierra - there's already a PR for Flax LLaMA that is pretty much ready to be merged: #24587. Feel free to check it out! But we'd love contributions for other LLMs in the library where there's only PyTorch support and not Flax 🤗 If there are particular checkpoints on the HF Hub that you see getting a lot of usage (downloads) where there's only PyTorch support but not Flax, definitely let us know here and we can get going with a PR! 🚀
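(As a rough illustration of the "check the download counts" suggestion, not something from this thread: a minimal sketch using `huggingface_hub` might look like the following. The filter string, the sort field, and the attribute names are assumptions about the API and may differ slightly between versions.)

```python
# Hypothetical sketch: list popular Mistral checkpoints on the Hub by download count
# and inspect which frameworks they are tagged with (e.g. "pytorch" vs "jax").
# The filter string and sort/direction arguments are illustrative assumptions.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(filter="mistral", sort="downloads", direction=-1, limit=10)
for m in models:
    # downloads may be missing on older huggingface_hub versions unless full=True is passed
    print(m.id, getattr(m, "downloads", None), m.tags)
```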
Thanks for the heads up @sanchit-gandhi, I'll see if there is any other model I think I can add to Flax and tag you on the next issue.
Oops, I even reviewed the PR 😅 sorry @kiansierra 🤗
@kiansierra sorry to scoop Flax Llama from you! If you want any suggestions, I think Mistral is a pretty popular model right now without a Flax port.
Hey, no worries. I think I will give Mistral a go; it seems some of the work can be ported.
Happy to see that a couple of people are interested in porting these models to Flax! I was also interested in contributing! Is there any other model that would be interesting? On a side note: I guess flash-attention only works for the PyTorch models atm(?). Is there any fundamental reason why porting the flash-attention implementation to JAX would be difficult?
Hello, I have created both Llama and Mistral models in Flax. If you want, you can use them: modelling_mistral_flax.py
Yes, Flash Attention relies on dispatching optimised CUDA kernels, which as far as I'm aware haven't been implemented in JAX. You could look into Pallas and see if someone's written Flash Attention kernels for JAX using this library: https://jax.readthedocs.io/en/latest/pallas/design.html
Indeed, there's an effort to write FlashAttention in Pallas (https://github.com/google/jax/blob/main/jax/experimental/pallas/ops/attention.py), although it's still a work in progress: jax-ml/jax#17328. @sanchit-gandhi I'd be happy to try to port another model. For example, Yarn-Mistral seems to have some traction, though it's not part of the transformers library atm. Any other suggestions are welcome!
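(For anyone who wants to try the experimental Pallas kernel linked above, a minimal sketch might look like the following. The import path and the `mha` signature are assumptions based on `jax/experimental/pallas/ops/attention.py` at the time of writing and may change between JAX releases; the kernel also targets GPUs via Triton, so it won't run on CPU.)

```python
# Hedged sketch of calling the experimental Pallas multi-head attention kernel.
# Requires a GPU with Triton support; signature and import path are assumptions.
import jax
import jax.numpy as jnp
from jax.experimental.pallas.ops import attention

batch, seq_len, num_heads, head_dim = 2, 1024, 8, 64
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(k1, (batch, seq_len, num_heads, head_dim), dtype=jnp.float16)
k = jax.random.normal(k2, (batch, seq_len, num_heads, head_dim), dtype=jnp.float16)
v = jax.random.normal(k3, (batch, seq_len, num_heads, head_dim), dtype=jnp.float16)

# segment_ids=None means one unsegmented sequence; causal=True gives decoder-style masking.
out = attention.mha(q, k, v, segment_ids=None, causal=True)
print(out.shape)  # (batch, seq_len, num_heads, head_dim)
```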
Feature request
I would like to implement the Mistral model in Flax
Motivation
I've been trying to get familiar with JAX, and as such I started migrating the Llama model. I think I am at a point where both models are quite comparable in outcome.
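(Since the motivation is about getting comparable outputs from the PyTorch and Flax implementations, a typical cross-framework equivalence check might look roughly like the sketch below. It assumes the Flax LLaMA port from #24587 is available as `FlaxLlamaForCausalLM`; the checkpoint name is purely illustrative.)

```python
# Hedged sketch of a PyTorch vs Flax logits comparison; checkpoint is illustrative.
import numpy as np
import torch
from transformers import AutoTokenizer, FlaxLlamaForCausalLM, LlamaForCausalLM

checkpoint = "openlm-research/open_llama_3b"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
pt_model = LlamaForCausalLM.from_pretrained(checkpoint)
# from_pt=True converts the PyTorch weights to Flax on the fly
fx_model = FlaxLlamaForCausalLM.from_pretrained(checkpoint, from_pt=True)

text = "Porting models to Flax"
with torch.no_grad():
    pt_logits = pt_model(**tokenizer(text, return_tensors="pt")).logits.numpy()
fx_logits = np.asarray(fx_model(**tokenizer(text, return_tensors="np")).logits)

# The two implementations should agree up to small numerical differences.
print(np.max(np.abs(pt_logits - fx_logits)))
```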
Your contribution
Yes I could submit a PR with the model implementation