
High perplexity and FoSS adjustment in GPT-2 kNN-LM #17

Open
lihe27341 opened this issue Dec 4, 2024 · 6 comments

@lihe27341

Hi @urialon !
Thank you for your excellent work and the resources provided in this repository.

I am currently trying to replicate the kNN-LM results for GPT-2, but the perplexity I get is higher than expected. Using the default hyperparameters, I obtained a PPL of 17.34 versus the reported 12.57.

What have I tried?

Tuning k, knn_temp, and lambda:

Adjusting these hyperparameters improved the PPL to some extent. For example, setting knn_temp to 50 resulted in a PPL of 14.84. Despite trying numerous combinations of these hyperparameters, I could not achieve the reported PPL of 12.57. Could you share the specific hyperparameter settings that were used to obtain this result? (A sketch of how I understand these parameters to enter the kNN-LM interpolation follows after this list.)

Modifying the Fraction of Saved Searches (FoSS):

In the README, I noticed that changing the FoSS value significantly impacts kNN-LM performance. However, in traditional kNN-LM implementations, a full kNN search is conducted for every token, which seems to make the FoSS value fixed. I am unsure how to adjust the FoSS value effectively in this context. Could you provide any guidance or examples on how to modify it to optimize performance?
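For reference, here is how I understand k, knn_temp, and lambda to enter the standard kNN-LM interpolation, as a minimal PyTorch sketch. The function name and arguments are purely illustrative and are not the actual API of this repository; k is simply the number of retrieved neighbors whose distances and target tokens are passed in.

```python
import torch

def knn_lm_log_probs(lm_log_probs, knn_dists, knn_token_ids, vocab_size,
                     knn_temp=1.0, lmbda=0.25):
    """Illustrative kNN-LM interpolation (not the repo's actual code).

    lm_log_probs:  (vocab_size,) log-probabilities from the base LM.
    knn_dists:     (k,) distances of the k retrieved neighbors.
    knn_token_ids: (k,) vocabulary ids of the tokens that followed each neighbor.
    """
    # knn_temp scales the softmax over negative distances:
    # a larger temperature flattens the neighbor distribution.
    knn_weights = torch.softmax(-knn_dists / knn_temp, dim=-1)

    # Accumulate the weights of neighbors that share the same target token.
    knn_probs = torch.zeros(vocab_size)
    knn_probs.scatter_add_(0, knn_token_ids, knn_weights)

    # lambda controls the interpolation between the kNN distribution
    # and the base LM: p = lambda * p_knn + (1 - lambda) * p_lm.
    lm_probs = lm_log_probs.exp()
    return torch.log(lmbda * knn_probs + (1.0 - lmbda) * lm_probs)
```

My expectation from this is that increasing knn_temp flattens the neighbor distribution and increasing lambda shifts weight from the base LM toward the retrieved neighbors, which is why my sweep focused on those two parameters.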
[attached screenshot: 微信图片_20241205002258]
Thank you!

@urialon
Collaborator

urialon commented Dec 4, 2024 via email

@lihe27341
Author

Thank you for your reply!

Yes, I have tried running with both your datastore and index, and I am able to achieve the same PPL as yours with both the base model and RetoMaton.

@urialon
Collaborator

urialon commented Dec 4, 2024 via email

@lihe27341
Author

The issue is specifically with the GPT-2 kNN-LM PPL. Without changing any hyperparameters, I get a PPL of 17.34 compared to your reported 12.57, which is significantly higher. (For the base model and RetoMaton, I do achieve results consistent with yours.)

@urialon
Collaborator

urialon commented Dec 5, 2024 via email

@lihe27341
Author

No, that’s not the case. Whether I use your datastore or create my own, I get the same results: a PPL of 17.34 compared to your reported 12.57.

[attached screenshot: 微信图片_20241205092339]
