Fix SMART from TfidfModel for case when df == "n". Fix #2020 #2021
Merged
31 commits
3160dea Added Montemurro and Zanette's entropy-based keyword extraction algor… (PeteBleackley)
0550651 Improved Docstrings (PeteBleackley)
a072f93 Fixed numerical bugs due to zero frequencies (PeteBleackley)
02428fa Merge branch 'develop' into develop (menshikh-iv)
c8a3792 Coding style changes, test and tutorial (PeteBleackley)
8a8264e Trying to fix a merge conflict (PeteBleackley)
e763f3c I hate git (PeteBleackley)
195c3c0 Summarization tutorial (PeteBleackley)
5b9a3ad Fixed some failing tests (PeteBleackley)
4c2d8de Tests, demo, nan_to_num and a few last flake8 issues (PeteBleackley)
d9c290a Further flake8 issues (PeteBleackley)
8809e5a Further flake8 issues (PeteBleackley)
a97fd82 Removed Jupyter checkpoint (PeteBleackley)
0d4e31c Removed trailing whitespace (PeteBleackley)
4d18223 Trailing whitespace (PeteBleackley)
dc42cee Speed up test and add comment to explain threshold value (PeteBleackley)
fdddf02 Flake8 again (PeteBleackley)
86db65c rename vars + style fixes (menshikh-iv)
28ae7cb fix operation order (menshikh-iv)
6add3ba Update docs with Montemurro and Zanette's algorithm (PeteBleackley)
86590bb Revert "Update docs with Montemurro and Zanette's algorithm" (PeteBleackley)
eb65041 Merge remote-tracking branch 'upstream/master' into develop (PeteBleackley)
7711451 Merge remote-tracking branch 'upstream/develop' into develop (PeteBleackley)
bdc1a6d Fixed bug in TfidfModel, as described in Issue #2020 (PeteBleackley)
590b52a Fix return type (menshikh-iv)
3b64a78 Updated unit tests for TfidfModel (PeteBleackley)
3588b88 Merge branch 'develop' of https://github.com/PeteBleackley/gensim int… (PeteBleackley)
5f9aa93 Updated unit tests for TfidfModel (PeteBleackley)
fdab1e8 Changed log(x)/log(2) to log2(x) since this is clearer. Fixed the pla… (PeteBleackley)
d2f3ea8 Fixed persistence tests (PeteBleackley)
5aff83b Flake 8 (PeteBleackley)
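For readers unfamiliar with SMART notation, the sketch below illustrates what the `df == "n"` case in the PR title conventionally means: no document-frequency weighting, so the idf factor is 1 and a term's weight is just its term frequency. It also uses `log2(x)` directly, in the spirit of the commit that replaced `log(x)/log(2)`. The helper names `smart_idf` and `weigh` are hypothetical and are not taken from gensim; this is not the code changed in the PR.

```python
import math


def smart_idf(doc_freq, total_docs, scheme="t"):
    """Hypothetical helper: document-frequency factor for a SMART scheme letter."""
    if scheme == "n":
        # "n" (none): no document-frequency weighting, so the factor is 1.
        return 1.0
    if scheme == "t":
        # Plain idf, written with log2 rather than log(x) / log(2).
        return math.log2(total_docs / doc_freq)
    raise ValueError("unsupported scheme letter in this sketch: %r" % scheme)


def weigh(bow, doc_freqs, total_docs, scheme="t"):
    """Hypothetical helper: multiply each raw term count by its idf factor."""
    return [
        (term_id, count * smart_idf(doc_freqs[term_id], total_docs, scheme))
        for term_id, count in bow
    ]


# Toy data, invented for illustration only.
doc_freqs = {0: 2, 1: 4}
bow = [(0, 3), (1, 1)]
unweighted = weigh(bow, doc_freqs, total_docs=8, scheme="n")
weighted = weigh(bow, doc_freqs, total_docs=8, scheme="t")
```

With the `"n"` scheme the output weights equal the raw counts, which is the behaviour a test for this case would be expected to check.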
Please add a `self.` to `corpus` here and everywhere.
I changed `self.corpus` to `corpus` because my calculations for expected_docs are based on the corpus given at the top of test_tfidfmodel.py, and `self.corpus` (based on test_data/test_corpus.mm) doesn't give matching results.
If `self.corpus` is not used anywhere I suggest removing it from the `setUp` method.
@menshikh-iv please suggest the corpus which should be chosen.
up to you guys, it is not so important
Looks good. Just one last step: since we are not using `testcorpus.mm` (defined in the `setUp` method), can you replace it with `text`, `dictionary` and `corpus` defined at the top of the code? After that it is ready to merge @menshikh-iv
ping @PeteBleackley
I'm still using self.corpus in the persistence tests.
Please use either `self.corpus` or `corpus`, the redundancy is polluting the code.
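One way to settle the `self.corpus` versus `corpus` question raised in this thread is sketched below: keep a single module-level fixture and bind it once in `setUp`, so every test, including the persistence tests, reads the same data under one name. This is only an illustration under stated assumptions. The toy corpus, the `doc_lengths` stand-in and the test names are hypothetical and are not the contents of test_tfidfmodel.py.

```python
import unittest

# Module-level fixture, standing in for the `corpus` defined at the top of the
# test file (toy bag-of-words data invented for this sketch).
corpus = [[(0, 1), (1, 2)], [(1, 1), (2, 1)]]


def doc_lengths(bow_corpus):
    """Toy stand-in for the model under test: total term count per document."""
    return [sum(count for _, count in doc) for doc in bow_corpus]


class TestSingleFixture(unittest.TestCase):
    def setUp(self):
        # Bind the one fixture to self.corpus so no test mixes two corpora.
        self.corpus = corpus

    def test_transform(self):
        # Expected values are computed from the same fixture the test reads.
        self.assertEqual(doc_lengths(self.corpus), [3, 2])

    def test_persistence(self):
        # A persistence-style test reuses self.corpus instead of a second file.
        self.assertEqual(len(self.corpus), 2)
```

The alternative the reviewers mention, dropping `self.corpus` from `setUp` and using the module-level name everywhere, removes the redundancy equally well; the point is to have only one corpus in play.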