Tests for the evaluate_word_pairs function #1061
Conflicts: CHANGELOG.txt
Conflicts: CHANGELOG.txt gensim/models/word2vec.py
… default vector size is 100, not 200).
Conflicts: gensim/models/word2vec.py
Conflicts: CHANGELOG.md README.md gensim/models/word2vec.py tutorials.md
…y judgments datasets.
…y judgments datasets.
@tmylk the tests are ready.
Thanks for the tests. An oov_ratio sanity test would be great.
pearson = correlation[0][0]
spearman = correlation[1][0]
self.assertTrue(0.1 < pearson < 1.0)
self.assertTrue(0.1 < spearman < 1.0)
Could we please test for oov_ratio in correlation[2] too?
Sure, done.
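For readers following the thread, the requested check can be sketched without Gensim or a corpus. The snippet below is only an illustration: `toy_evaluate_word_pairs`, the toy vectors, and the gold scores are all hypothetical stand-ins, and the return shape `((pearson, None), (spearman, None), oov_ratio)` with `oov_ratio` as a percentage is an assumption modeled on how the snippet above indexes `correlation`. It is not the code added in this PR.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson on ranks (assumes no tied values)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0.0] * len(values)
        for rank, idx in enumerate(order, start=1):
            result[idx] = float(rank)
        return result
    return pearson_r(ranks(xs), ranks(ys))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def toy_evaluate_word_pairs(vectors, gold_pairs):
    """Hypothetical stand-in: returns ((pearson, None), (spearman, None), oov_ratio)."""
    gold, predicted, oov = [], [], 0
    for word_a, word_b, score in gold_pairs:
        if word_a not in vectors or word_b not in vectors:
            oov += 1  # pair skipped because a word is out of vocabulary
            continue
        gold.append(score)
        predicted.append(cosine(vectors[word_a], vectors[word_b]))
    oov_ratio = 100.0 * oov / (len(gold) + oov)  # percentage of skipped pairs
    return (pearson_r(gold, predicted), None), (spearman_rho(gold, predicted), None), oov_ratio

# Made-up vectors and similarity judgments, for illustration only.
vectors = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2],
    "car": [0.1, 1.0], "truck": [0.3, 0.9],
}
gold_pairs = [
    ("cat", "dog", 8.0), ("car", "truck", 9.0),
    ("cat", "car", 1.5), ("dog", "truck", 2.0),
    ("cat", "unicorn", 3.0),  # "unicorn" is out of vocabulary
]

correlation = toy_evaluate_word_pairs(vectors, gold_pairs)
pearson = correlation[0][0]
spearman = correlation[1][0]
oov = correlation[2]

# Sanity checks in the spirit of the review: correlations are positive but
# not perfect, and the OOV ratio is a plausible percentage.
assert 0.1 < pearson < 1.0
assert 0.1 < spearman < 1.0
assert 0.0 <= oov < 90.0
```

The bounds are deliberately loose. They guard against a broken return value or a call that silently skips most pairs, and they do not pin down exact correlation values.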
Thanks for the improvement!
By the way, how is it better than using https://github.com/mfaruqui/eval-word-vectors ?
It's better in that this code works directly from Gensim :)
I agree with @akutuzov. The code currently in gensim for the Pearson and Spearman coefficients is shorter. But I feel we could also include the whole dataset for evaluating word vectors given in https://github.com/mfaruqui/eval-word-vectors. It's just 205 KB and contains all the major gold standards, so it would be good to integrate them into gensim itself and have one method to evaluate word2vec models directly, right inside gensim. What do you think? The script I used to convert word2vec into the format for evaluating word vectors is actually quite small.
I am not sure it's a good idea to overload Gensim with various semantic similarity datasets included in the distribution.
Yeah, you are right. Sounds good.
Test for evaluating model against semantic similarity datasets (#1047).
Also fixes an error in the function call.