Documentation of strip_punctuation vs strip_punctuation2 in gensim.parsing.preprocessing #2961

Closed
sciatro opened this issue Sep 28, 2020 · 4 comments · Fixed by #2965
Labels
documentation Current issue related to documentation

Comments

@sciatro
Contributor

sciatro commented Sep 28, 2020

Thanks for all the hard work on this fantastic library. I found a small quirk today, not really a bug, just a bit of a rough edge:

In gensim.parsing.preprocessing (preprocessing.py), strip_punctuation2 is defined as a plain alias: strip_punctuation2 = strip_punctuation.

In the documentation, the description of strip_punctuation2 duplicates that of strip_punctuation instead of stating that the two are the same function.

I noticed this while reading the documentation: assuming I was missing an obvious distinction, I tried to hand-diff the docs for the two functions. When I gave up and flipped to the source, it became obvious how the two functions are related.
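To make the aliasing concrete, here is a minimal sketch. The function body is a stand-in written for illustration, not gensim's actual implementation; only the final assignment mirrors the line quoted above. Because the alias binds a second name to the same function object, both names share one docstring, which is presumably why a docstring-based documentation generator renders identical text for each.

```python
import re
import string

# Stand-in for illustration only; not gensim's actual implementation.
_RE_PUNCT = re.compile("[%s]+" % re.escape(string.punctuation))


def strip_punctuation(s):
    """Replace each run of ASCII punctuation in `s` with a single space."""
    return _RE_PUNCT.sub(" ", s)


# The alias binds a second name to the very same function object.
strip_punctuation2 = strip_punctuation

print(strip_punctuation2 is strip_punctuation)                  # True
print(strip_punctuation2.__name__)                              # strip_punctuation
print(strip_punctuation2.__doc__ == strip_punctuation.__doc__)  # True
```

Since there is only one object, nothing on the alias itself records that it is an alias; the relationship is visible only in the source.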

@piskvorky added the documentation label Sep 28, 2020
@piskvorky
Owner

I have no idea why we have two identical functions there. Would you be able to check the code and see which one is used where? Maybe we can keep just one? (which would solve the problem with duplicate documentation automatically)

Thanks!

@sciatro
Contributor Author

sciatro commented Sep 28, 2020

To the uninformed eye, on a first pass, neither strip_punctuation nor strip_punctuation2 seems to be used widely within the library. Conceptually they seem to me to sit on the user-facing edge of the API, so that is not surprising.

Of the two, based on the linked results, strip_punctuation is used about twice as often as strip_punctuation2 (2:1).
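For anyone who wants to repeat the check locally rather than through search links, a rough sketch along these lines should do. The helper name count_usages is hypothetical, and the counts include definitions, imports and docstring mentions, so treat the numbers as an upper bound on real call sites.

```python
import re
from pathlib import Path


def count_usages(root, names):
    """Count whole-word occurrences of each name in .py files under `root`."""
    # \b keeps "strip_punctuation" from also matching "strip_punctuation2".
    patterns = {name: re.compile(r"\b%s\b" % re.escape(name)) for name in names}
    counts = {name: 0 for name in names}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    return counts
```

Calling count_usages("gensim", ["strip_punctuation", "strip_punctuation2"]) from the root of a local checkout (the path is an assumption) returns a dict mapping each name to its count.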

@piskvorky
Owner

Yeah looks useless. Can you open a PR to get rid of strip_punctuation2?

@sciatro
Contributor Author

sciatro commented Sep 28, 2020

I don't know how to use any of the actual GitHub functionality, but I can try to learn how to do so (assuming someone who knows doesn't delete the line first). Thx again for the fantastic library.
