Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate generic preprocessing functions #3180

Merged
merged 8 commits into from
Aug 12, 2021

Conversation

rock420
Copy link
Contributor

@rock420 rock420 commented Jun 22, 2021

Fixes #3171

PR Description -

  1. All the preprocessing functions from gensim.corpora.textcorpus and gensim.corpora.lowcorpus have been moved to gensim.parsing.preprocessing module.
  2. Made behavior of both remve_stopwords function consistent.
  3. Added test cases for these preprocessing functions.
  • check for PEP8
  • successfully passed all unit tests

Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the cleanup! I left some minor style comments in the review.

gensim/corpora/textcorpus.py Outdated Show resolved Hide resolved
gensim/corpora/textcorpus.py Outdated Show resolved Hide resolved
gensim/parsing/preprocessing.py Show resolved Hide resolved
@piskvorky piskvorky added this to the 4.1.0 milestone Jun 22, 2021
@piskvorky
Copy link
Owner

@mpenkov how do you move tickets and PRs to your columns, in https://github.com/RaRe-Technologies/gensim/projects/9?

Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@mpenkov
Copy link
Collaborator

mpenkov commented Aug 12, 2021

@mpenkov how do you move tickets and PRs to your columns, in https://github.com/RaRe-Technologies/gensim/projects/9?

Sorry for the late reply. First, add the ticket to the project (see Projects on the right side of this PR screen, next to milestones, etc.) and then select a column to add to.

@mpenkov mpenkov merged commit 6129a24 into piskvorky:develop Aug 12, 2021
@mpenkov
Copy link
Collaborator

mpenkov commented Aug 12, 2021

@rock420 Thank you for cleaning this up! I've merged your changes.

@piskvorky Do we want to include this in the change log for 4.1.0? If yes, please edit the CHANGELOG.md on develop HEAD.

@piskvorky
Copy link
Owner

piskvorky commented Aug 12, 2021

@piskvorky Do we want to include this in the change log for 4.1.0? If yes, please edit the CHANGELOG.md on develop HEAD.

Yes. Let's do it as part of the release or another PR, to avoid editing develop directly:


### :warning: Removed functionality & deprecations

* [#3180](https://github.com/RaRe-Technologies/gensim/pull/3180): Move  preprocessing functions from gensim.corpora.textcorpus and gensim.corpora.lowcorpus to gensim.parsing.preprocessing, by [@rock420](https://github.com/rock420)

tbbharaj pushed a commit to tbbharaj/gensim that referenced this pull request Aug 19, 2021
* Move preprocessing functions from textcourpus module

* Move preprocessing functions from lowcorpus module

* Add test cases for preprocessing functions

* Fix styling issues

* Refactor remove_stopwords() and strip_short()

* make tests pass

* rm unused import

Co-authored-by: Michael Penkov <m@penkov.dev>
tbbharaj pushed a commit to tbbharaj/gensim that referenced this pull request Aug 19, 2021
* Move preprocessing functions from textcourpus module

* Move preprocessing functions from lowcorpus module

* Add test cases for preprocessing functions

* Fix styling issues

* Refactor remove_stopwords() and strip_short()

* make tests pass

* rm unused import

Co-authored-by: Michael Penkov <m@penkov.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make remove_stopwords() behavior more consistent
3 participants