Evaluation of word2vec models against semantic similarity datasets #1047
Conflicts: CHANGELOG.txt
Conflicts: CHANGELOG.txt gensim/models/word2vec.py
… default vector size is 100, not 200).
Conflicts: gensim/models/word2vec.py
Conflicts: CHANGELOG.txt gensim/models/word2vec.py gensim/scripts/word2vec_standalone.py
Conflicts: CHANGELOG.md README.md gensim/models/word2vec.py tutorials.md
…y judgments datasets.
Thanks @piskvorky
This is crazy.
@akutuzov thanks for the feature. Could we please add some simple unit tests for this new feature?
@tmylk what can those be? Evaluating against a toy dataset? Should it follow the same structure as testAccuracy in https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_word2vec.py#L370? Also, what should we do with the old unneeded commits in this PR? As I've said, I can probably start a new one from scratch if it is not possible to just squash them all into one on the Gensim side.
That test is not a good example. It is not a test of accuracy but a test of KeyedVectors. A good test would be a model trained on the Lee corpus being given a single pair to evaluate, like in the sanity test. There is another point. Having the small and canonical questions-words.txt in the repo helps a lot of people test the accuracy of their models. So we should add a semantic similarity dataset if it is less than 1 MB. Don't worry about commits, I will squash them.
OK, I will add a test, then.
Thanks for the PR! Merging to add it to this year's release. Tests and a dataset should be in a separate PR.
Cool, thanks!
We have long had analogy evaluation of word2vec models in Gensim (also known as analogical inference). However, another type of evaluation is widespread in the distributional semantics world: using word pairs ranked by their semantic similarity (see SimLex-999 and other datasets) and measuring the correlation of these similarities with those produced by the model.
This PR adds the self.evaluation function to perform such evaluation against arbitrary datasets.
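To illustrate the kind of evaluation described above, here is a minimal standalone sketch using only the Python standard library. It is not the code from this PR: the helper names (`evaluate_pairs`, `spearman`, and so on), the toy vectors, and the toy similarity judgments are all invented for illustration. It assumes the model score for a pair is the cosine similarity of the two word vectors, that pairs with out-of-vocabulary words are skipped, and that the final score is the Spearman rank correlation between human judgments and model similarities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_pairs(vectors, judged_pairs):
    """Correlate human similarity scores with model cosine similarities.

    Returns (spearman_rho, number_of_skipped_pairs). Pairs containing a word
    missing from `vectors` are skipped (an assumption of this sketch).
    """
    gold, predicted, skipped = [], [], 0
    for word1, word2, human_score in judged_pairs:
        if word1 not in vectors or word2 not in vectors:
            skipped += 1
            continue
        gold.append(human_score)
        predicted.append(cosine(vectors[word1], vectors[word2]))
    return spearman(gold, predicted), skipped


# Hypothetical 3-dimensional "model" and made-up human judgments (0-10 scale).
toy_vectors = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.1],
    "car": [0.0, 1.0, 0.2],
    "truck": [0.2, 0.8, 0.4],
    "banana": [0.0, 0.1, 1.0],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 2.0),
    ("dog", "banana", 1.0),
    ("cat", "unicorn", 5.0),  # "unicorn" is out of vocabulary, so it is skipped
]

rho, skipped = evaluate_pairs(toy_vectors, toy_pairs)
print(f"Spearman rho: {rho:.2f}, skipped pairs: {skipped}")
```

In this toy setup the model ranks the two least similar pairs in the opposite order to the human judgments, so the correlation is high but not perfect. A real dataset such as SimLex-999 would be read from a file of word pairs with scores instead of the hard-coded list.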