Hironsan/natural-language-preprocessings

Natural Language Pre-processing

This repository contains recipes for natural language pre-processing.

The recipes are as follows:

  • Data cleaner
  • Word normalization
  • Stopwords remover
  • Tokenizer
  • Word Vector
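
As a rough illustration of how the first four recipes might chain together, here is a minimal sketch using only the Python standard library. All function names (`clean_text`, `normalize`, `tokenize`, `remove_stopwords`, `preprocess`) are hypothetical and are not taken from this repository's code. The whitespace tokenizer is a placeholder assumption: Japanese text such as the livedoor news corpus would need a morphological analyzer instead.

```python
import re
import unicodedata


def clean_text(text):
    """Data cleaning: strip HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Word normalization: NFKC-normalize, lowercase, replace digit runs with '0'."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\d+", "0", text)


def tokenize(text):
    """Placeholder tokenizer: split on whitespace (an assumption for this sketch)."""
    return text.split()


def remove_stopwords(tokens, stopwords):
    """Stopword removal: drop every token found in the given stopword set."""
    return [token for token in tokens if token not in stopwords]


def preprocess(text, stopwords=frozenset()):
    """Run cleaning, normalization, tokenization and stopword removal in order."""
    return remove_stopwords(tokenize(normalize(clean_text(text))), stopwords)
```

For example, `preprocess("<p>Ｐｙｔｈｏｎ is ＦＵＮ in 2017</p>", {"is", "in"})` returns `['python', 'fun', '0']`: the tags are removed, the full-width letters are folded to ASCII and lowercased, the number is replaced with `0`, and the two stopwords are dropped.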

Install

To install the required modules, run:

$ pip install -r requirements.txt

Setup

First, download and extract the livedoor news corpus by running the following commands:

$ cd src/data
$ python make_dataset.py

Now you are ready for classification!

Start jupyter notebook:

$ jupyter notebook

Then open and run notebooks/document_classification.ipynb.

Good NLP Life!

Licence

MIT

Author

Hironsan