
Phi-3 support #1672

Closed
Theodotus1243 opened this issue Apr 23, 2024 · 9 comments


Theodotus1243 commented Apr 23, 2024

A powerful model trained on synthetic data, with a high MMLU score.

The 4K-context-window variant should be easier to support, since it doesn't use LongRoPE.

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
https://arxiv.org/pdf/2404.14219.pdf


BBC-Esq commented Apr 24, 2024

I second this. The current Phi loader is broken, apparently because of changes Microsoft made to the model after its initial release. At any rate, adapting the Phi loader to the new Phi-3 should be easier than starting from scratch.

jncraton (Contributor) commented

For anyone else researching this, Phi-3 support has been added to the convert_hf_to_gguf.py script in llama.cpp. Perhaps something can be gleaned from there to simplify the implementation of the CT2 converter.
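
For context, one kind of tensor surgery such a converter likely has to perform (this is my assumption from the Phi-3 checkpoint layout, not taken from the actual CT2 or llama.cpp code) is un-fusing weights: Phi-3 stores the Q, K, and V projections as a single fused qkv_proj tensor, whereas a Llama-2-style layout keeps them separate. A toy-sized sketch:

```python
import numpy as np

# Toy-sized stand-in for Phi-3's fused attention projection.
# The real tensor would be (3 * hidden_size, hidden_size); here
# hidden = 4 keeps the example readable.
hidden = 4
qkv_proj = np.arange(3 * hidden * hidden, dtype=np.float32).reshape(3 * hidden, hidden)

# Split the fused weight back into separate Q, K, V matrices,
# the layout a Llama-2-style converter expects.
q_proj, k_proj, v_proj = np.split(qkv_proj, 3, axis=0)

print(q_proj.shape, k_proj.shape, v_proj.shape)
```

Phi-3 similarly fuses the MLP gate and up projections into gate_up_proj, which would need the same treatment.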

vince62s (Member) commented

No worries, it will be done; it's quite easy for the mini-4k since it uses the same architecture as Llama 2.
fyi: https://forum.opennmt.net/t/phi-3-3-8b-llama2-7b-ensemble-just-for-fun/5729


BBC-Esq commented Apr 24, 2024

Is it done yet? I've been waiting patiently for approximately two hours now. ;-)

minhthuc2502 (Collaborator) commented

Hello, I am working on it. Some unexpected problems have appeared.


BBC-Esq commented Apr 25, 2024

I'm not skilled enough to help directly by implementing the code, but if you want me to do any grunt work or research, let me know, dude. Anything to help speed up the process. Thanks!


BBC-Esq commented Apr 25, 2024

I'd like to start learning so I can eventually help. Question: how do I get the actual model architecture to start with? My understanding is that learning a model's structure, what activation functions it uses, and so on, is key to writing additional converters down the road. For example, here's a link:

https://bbycroft.net/llm

Here are some other links I've been gathering toward the goal of eventually contributing a converter, based on first trying to understand the structure of LLMs:

https://github.com/mert-kurttutan/torchview

https://github.com/lutzroeder/netron

Hugging Face sometimes (but not always) has information like this:

[attached screenshot]

Basically, is there any good starting point you'd recommend, dude? Thanks!


BBC-Esq commented Apr 25, 2024

Remember, you're dealing with an idiot who doesn't do this professionally and has never taken an LLM 101 class in college, let alone earned a doctoral degree. ;-) I don't even know what "mlp.down" or "layernorm.weight" means, for example, but I'm willing to learn.
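
To demystify names like "mlp.down_proj" and "layernorm.weight": they are just the dotted paths PyTorch assigns to a module's parameters. A toy block (my own illustrative naming, loosely echoing the Llama/Phi layer layout, not the exact Phi-3 module names) shows where they come from:

```python
import torch.nn as nn

# A toy decoder sub-block whose submodule names loosely echo the
# Llama/Phi naming scheme (illustrative only, not the real layout).
class ToyBlock(nn.Module):
    def __init__(self, d_model=16, d_ff=32):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(d_model)  # -> "input_layernorm.weight"
        self.mlp = nn.ModuleDict({
            "up_proj": nn.Linear(d_model, d_ff),      # -> "mlp.up_proj.weight"
            "down_proj": nn.Linear(d_ff, d_model),    # -> "mlp.down_proj.weight"
        })

block = ToyBlock()
names = {name for name, _ in block.named_parameters()}
for name in sorted(names):
    print(name)
```

Running `named_parameters()` on a real checkpoint (or just `print(model)` after loading it with transformers) dumps the full module tree the same way, which is exactly the map a converter has to translate.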

minhthuc2502 (Collaborator) commented

PR #1680 adds the converter for Phi-3.
