Performance of the model on gsm8k/SVAMP/MultiArith. #22

Open
hccngu opened this issue May 4, 2023 · 1 comment


hccngu commented May 4, 2023

Thank you for your excellent project. I have conducted an evaluation of Flan-Alpaca-Base/Large/XL on the gsm8k/SVAMP/MultiArith datasets, and the evaluation results are as follows:
| Model | gsm8k | MultiArith | SVAMP |
| --- | --- | --- | --- |
| Flan-Alpaca-Base | 13.42 | 20.33 | 19.50 |
| Flan-Alpaca-Large | 14.40 | 19.83 | 17.80 |
| Flan-Alpaca-XL | 9.25 | 13.83 | 14.30 |
Overall, the larger the model, the worse its performance. What do you think is the reason for this? Also, were the test sets of the three datasets above used to train the model? If so, could the smaller models simply have overfit (memorized) the test data?
Thank you~
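
For reference, here is a minimal sketch of how such an evaluation could be run. The script behind the numbers above is not shown in this issue, so the checkpoint name, the direct use of the question as the prompt, greedy decoding, and the last-number answer extraction are all assumptions, not the actual setup:

```python
# Hedged sketch: evaluate a Flan-Alpaca checkpoint on the GSM8K test split.
import re

import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "declare-lab/flan-alpaca-base"  # also -large / -xl
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

data = load_dataset("gsm8k", "main", split="test")

def last_number(text: str) -> str:
    """Return the last number in the text, used as the predicted/gold answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else ""

correct = 0
for sample in data:
    inputs = tokenizer(sample["question"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    # GSM8K gold answers end with "#### <number>"
    gold = sample["answer"].split("####")[-1].strip()
    correct += last_number(prediction) == last_number(gold)

print(f"accuracy: {correct / len(data):.2%}")
```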

@chiayewken
Collaborator

Hi, thanks for the interesting analysis! The gsm8k and SVAMP datasets are indeed used in Flan-T5 training, but we are not sure why performance gets worse as model size increases. This definitely deserves a closer look; please let us know what you find!
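
One rough way to probe the memorization question raised above would be a simple contamination check against the fine-tuning prompts. The loading of the actual training mixture is left as a placeholder here, since it is not part of this thread; only the normalized exact-match comparison is sketched:

```python
# Hedged sketch: count GSM8K test questions that appear (after normalization)
# among the fine-tuning prompts. Loading those prompts is a placeholder.
import re

from datasets import load_dataset

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide matches."""
    return re.sub(r"\s+", " ", text.lower()).strip()

# Placeholder: replace with the actual prompts used for fine-tuning.
training_prompts = set()  # e.g. {normalize(p) for p in finetuning_prompts}

gsm8k_test = load_dataset("gsm8k", "main", split="test")
leaked = sum(normalize(s["question"]) in training_prompts for s in gsm8k_test)
print(f"{leaked} of {len(gsm8k_test)} GSM8K test questions found in training prompts")
```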
