nltk.data.find argument needs to be a path, not just a filename #548

Gaboose · 2017-01-02T17:03:06Z

nltk_download_corpus always downloads nltk resources, even if they're already there.

That's because neither nltk.data.find('stopwords.zip') nor nltk.data.find('stopwords') finds the resource. Maybe the nltk_data directory tree was flat at some point in the past, but right now it gets populated like this:

- corpora
  |- stopwords
  |- stopwords.zip
  |- wordnet
  |- wordnet.zip
- sentiment
  |- vader_lexicon.zip
- tokenizers
  |- punkt
  |- punkt.zip

and so only a call like nltk.data.find('corpora/stopwords') works.
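
For illustration, here is a minimal sketch of the check-before-download pattern with a category-qualified path. The helper name `download_if_missing` and its `find`/`download` parameters are hypothetical and are not taken from this project's code. The sketch relies on `nltk.data.find` raising `LookupError` when a resource is missing, and it assumes the download id is the last component of the path.

```python
def download_if_missing(resource_path, find, download):
    """Download a resource only if `find` cannot locate it.

    resource_path: category-qualified path, e.g. 'corpora/stopwords'.
    find: callable that raises LookupError when the resource is absent
          (nltk.data.find behaves this way).
    download: callable that accepts a package id (e.g. nltk.download).
    Returns True if a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
        return False  # already installed, nothing to do
    except LookupError:
        # Assumption: the package id is the last path component,
        # so 'corpora/stopwords' maps to 'stopwords'.
        package_id = resource_path.rsplit('/', 1)[-1]
        download(package_id)
        return True
```

With nltk installed, this could be called as `download_if_missing('corpora/stopwords', nltk.data.find, nltk.download)`. Passing the two callables in keeps the helper testable without touching the network.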

gunthercox · 2017-01-02T20:45:13Z

Thank you 👍

nltk.data.find argument needs to be a path, not just a filename

547e6b2

gunthercox approved these changes Jan 2, 2017

gunthercox merged commit 305e22b into gunthercox:master Jan 2, 2017
