[MRG] Skip common English words in phrases #2979
cc6ce47 to 9e503a4
I still cannot access CircleCI (not even from an incognito window). @mpenkov can you please check what it's complaining about? Thanks.
10 random phrases that were discarded (left column) vs. newly introduced (right column) by this PR, trained on a toy corpus.
I'm not sure Still, the case against making it the silent default is:
If this previously-underappreciated feature is now liked enough to recommend more widely, I believe a better way to do that would be to update docs/tutorials/change-notes to highlight the availability of the
I agree. I'll make this optional.
@mpenkov please review & merge (after checking that CircleCI failure).
712d34d to b00b393
Merging to keep branches clean. @mpenkov review welcome as always, even if it's later.
Follow up from #2976 (comment) : add a list of English "common words" to skip during phrase construction.
These English words are an optional switch; the default doesn't change.
Backward incompatible change: this PR renamed the common_terms parameter of gensim.models.phrases.Phrases() to connector_words, with the same meaning.
Fixes #2520. Fixes #1465.
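For readers unfamiliar with the idea, here is a rough, library-free sketch of what skipping connector words during phrase construction can look like. It is not gensim's implementation. The function name candidate_phrases and the small word set are hypothetical, and the rule it encodes (a candidate may contain connector words in the middle but may not start or end with one) is my understanding of the connector_words semantics, not something stated in this thread. Scoring of candidates is left out entirely.

```python
def candidate_phrases(tokens, connector_words):
    """Collect candidate phrases from a token list (illustrative only).

    A candidate joins two consecutive non-connector words, together with
    any connector words that sit between them. Connector words never
    start or end a candidate.
    """
    candidates = []
    start = None    # most recent non-connector word, if any
    between = []    # connector words seen since `start`
    for token in tokens:
        if token in connector_words:
            # A connector is only carried along once a phrase has started.
            if start is not None:
                between.append(token)
        else:
            if start is not None:
                candidates.append((start, *between, token))
            start = token
            between = []
    return candidates


# Hypothetical, tiny stand-in for a real list of English connector words.
connectors = frozenset({"the", "of", "in", "a", "and"})
tokens = "the bank of america is in the city".split()
result = candidate_phrases(tokens, connectors)
print(result)
```

With an empty connector set the same function falls back to plain adjacent-word pairs, which mirrors the point made in the description: the word list is an optional switch, and leaving it off keeps the old behaviour.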