Added function to filter 'n' most frequent words #718

Merged
tmylk merged 3 commits into piskvorky:develop on Jun 3, 2016

Conversation

abhinavchawla
Contributor

I noticed that the filter_extremes function keeps the 'n' most frequent words, as given in its argument. My professor asked me to remove the 50 most frequent words from the dictionary, so I had to write my own code.
I have therefore added a function that filters out the 'n' most frequent words.
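
For readers who want to see the idea in isolation, here is a minimal sketch in plain Python. It is not the code from this PR and does not use gensim: the helper name `remove_n_most_frequent`, its arguments, and the sample documents are all hypothetical, and "frequency" is assumed to mean document frequency (the number of documents a token appears in).

```python
from collections import Counter


def remove_n_most_frequent(documents, remove_n):
    """Hypothetical helper: drop the remove_n tokens with the highest document frequency."""
    # Count each token once per document it appears in.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    # Pick the remove_n tokens that occur in the most documents.
    most_frequent = {token for token, _ in doc_freq.most_common(remove_n)}

    # Rebuild every document without those tokens.
    filtered = [[t for t in doc if t not in most_frequent] for doc in documents]
    return filtered, most_frequent


# Made-up sample data, for illustration only.
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "bird"]]
filtered_docs, removed = remove_n_most_frequent(docs, remove_n=2)
```

With this sample, "the" (3 documents) and "sat" (2 documents) are removed, which is roughly what dropping frequent stop-words looks like on a real corpus.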

Contributor

tmylk commented Jun 2, 2016

Thanks! That is useful functionality to remove stop-words.
Could you please add a quick test and a line to the CHANGELOG?

@abhinavchawla
Contributor Author

I have added a quick test and edited the CHANGELOG.txt file. Kindly check :)

@tmylk tmylk merged commit 26ae1c3 into piskvorky:develop Jun 3, 2016
Contributor

tmylk commented Jun 3, 2016

@abhinavchawla Great. Thanks for the PR!

logger.info("discarding %i tokens: %s...",len(most_frequent_ids), most_frequent_words[:10])

self.filter_tokens(bad_ids=most_frequent_ids)
logger.info("resulting dictionary: %s" % self)
Owner

Use a comma to pass parameters to the logger, to avoid unnecessary string formatting for ignored events.
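
A small sketch of the difference, using only the standard `logging` module. The `CountingRepr` class and the logger name are hypothetical and exist only to make the formatting cost visible; the logger level is set to WARNING so that INFO events are ignored.

```python
import logging


class CountingRepr:
    """Hypothetical object that counts how often it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "Dictionary(...)"


logger = logging.getLogger("lazy_logging_demo")
logger.setLevel(logging.WARNING)  # INFO events are ignored by this logger

# Eager: the % operator builds the string before logger.info() is even called.
eager = CountingRepr()
logger.info("resulting dictionary: %s" % eager)

# Lazy: the argument is passed through and only formatted if the event is emitted.
lazy = CountingRepr()
logger.info("resulting dictionary: %s", lazy)
```

Here `eager.calls` ends up at 1 while `lazy.calls` stays at 0: the comma form skips the string conversion entirely when the event is discarded.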

Contributor

tmylk commented Jun 10, 2016

Added PEP8 and logger changes to develop.

@abhinavchawla
Contributor Author

Hey sorry I was kinda busy and got time now :( Thanks a lot @tmylk :) Is there anything else to do?
