nltk.data.find argument needs to be a path, not just a filename #548

Merged: 1 commit, Jan 2, 2017

Gaboose
Contributor

@Gaboose Gaboose commented Jan 2, 2017

nltk_download_corpus always downloads NLTK resources, even if they are already there.

That's because neither nltk.data.find('stopwords.zip') nor nltk.data.find('stopwords') finds the resource. Maybe the nltk_data directory tree was flat at some point in the past, but right now it gets populated like this:

- corpora
  |- stopwords
  |- stopwords.zip
  |- wordnet
  |- wordnet.zip
- sentiment
  |- vader_lexicon.zip
- tokenizers
  |- punkt
  |- punkt.zip

and so only a call like nltk.data.find('corpora/stopwords') works.
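
For illustration, here is a minimal sketch of a "download only if missing" check built on that observation. It is not the code from this PR: the helper name `ensure_resource` and its `find`/`download` parameters are hypothetical, added so the logic can be tried with stubs. It assumes only that `nltk.data.find` raises `LookupError` for a missing resource and that `nltk.download` takes the package id (the last path component).

```python
def ensure_resource(resource_path, find=None, download=None):
    """Download an NLTK resource only if it is not already installed.

    resource_path must include the category directory,
    e.g. 'corpora/stopwords' rather than just 'stopwords'.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        # Imported lazily so the sketch can also be exercised with stubs.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        find(resource_path)
        return False  # resource already present, nothing to do
    except LookupError:
        # The downloader is given the package id, i.e. the part after the
        # category directory ('stopwords' for 'corpora/stopwords').
        download(resource_path.split('/')[-1])
        return True
```

With NLTK installed, `ensure_resource('corpora/stopwords')` should skip the download once the corpus is on disk, whereas passing the bare name `'stopwords'` would miss it and trigger a download every time.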

@gunthercox
Owner

Thank you 👍

@gunthercox gunthercox merged commit 305e22b into gunthercox:master Jan 2, 2017