dataset #23

Open
darkacorn opened this issue Oct 16, 2024 · 1 comment

darkacorn commented Oct 16, 2024

As much as I love Phi 3.5, and I am indeed very grateful that you released the excerpt datasets, would there be a way to get the full dataset under an academic licence?

I'm trying to train an 8B model, either with Llama 3.1, the 3B Llama 3.2, or the new 8B from Mistral.

If not, the synthetic pipeline scripts would already help heaps.

@alexandreteles

Having the dataset available would be good; I've been wondering the same thing for the last two weeks.

Phi 3.5 was likely picked for its supposedly strong reasoning capabilities, but it is a very underwhelming model. I would love to see a Llama 3.2 3B finetune based on the Flow Judge dataset, and I've also thought about asking for either the dataset or the synthetic data pipeline scripts.

Thank you!


2 participants