NLP-tools-in-Dash

[Package badges: plotly, sklearn, nltk, seaborn (PyPI)]

An interactive Plotly Dash tool for Natural Language Processing (NLP) that processes text data, from tokenizing and lemmatizing through to Machine Learning (ML) classification and word prediction.

About this app

NLP analysis in a single app: 11 figures, dropdown and slider analysis controls, and ML training and classification.

About dash

Here is a direct quote:

Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash ties modern UI elements like dropdowns, sliders, and graphs directly to your analytical Python code. Read our tutorial (proudly crafted ❤️ with Dash itself).

Getting Started in Python

Prerequisites and usage

Make sure that Dash, its dependencies, and the other libraries listed below are correctly installed (using pip or conda; pip is shown here):

pip install dash
pip install dash-bootstrap-components
pip install dash-loading-spinners
pip install matplotlib
pip install networkx
pip install nltk
pip install numpy 
pip install pandas
pip install seaborn
pip install wordcloud
pip install yellowbrick

Features

  • Written entirely in Python, with an interactive Plotly Dash web application
  • Load a text dataframe, parse, tokenize, lemmatize, and analyze it, then train a Naive Bayes classification model and predict word class
  • Tabbed, interactive, and visually pleasing environment that is easy to use
  • Word relationship analysis using bigram market basket analysis
  • Automatic file processing, with a dropdown for categories and sliders for how many top words (by frequency) to plot and display in basket analysis
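As a rough illustration of the tokenize-and-count part of the feature list, here is a minimal sketch using only the Python standard library. It is not the app's code: the `tokenize` and `top_words` helpers, the tiny stop-word set, and the sample headlines are all hypothetical, and lemmatization (which the app performs) is left out for brevity.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real pipeline would
# use a fuller one, for example the NLTK stop-word corpus.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def top_words(headlines, n):
    """Return the n most frequent tokens across all headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    return counts.most_common(n)


# Hypothetical sample data, not taken from the real dataset.
headlines = [
    "The benefits of sleep for a healthy life",
    "Life is short: make time for sleep",
    "Healthy habits to improve sleep",
]
print(top_words(headlines, 3))
```

The `n` argument plays the role of the "top words" slider: changing it changes how many of the most frequent words are returned for plotting.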

Algorithm steps

Panels (tabs)

  1. DATA & FREQUENCY - word frequency plots in different formats, and a datatable
  2. TREEMAP - treemap of headline length distributions
  3. WORD RELATIONSHIPS - basket analysis (network and heatmap) and the top 5 word relationships, calculated from lemmatized word co-occurrence
  4. ML (NAIVE BAYES) - detailed frequency distribution for all categories; train and predict words using multinomial Naive Bayes
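The word co-occurrence behind the WORD RELATIONSHIPS panel can be sketched as counting how often two words appear in the same headline, treating each headline as one "basket". This is a minimal standard-library sketch under that assumption; the `cooccurrence_counts` helper and the sample baskets are hypothetical and the app's own calculation may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(token_lists):
    """Count how often each unordered word pair shares a headline."""
    pairs = Counter()
    for tokens in token_lists:
        # A sorted set counts each pair once per headline and makes
        # ("a", "b") the same key as ("b", "a").
        unique = sorted(set(tokens))
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical lemmatized headlines; each inner list is one basket.
baskets = [
    ["sleep", "health", "life"],
    ["sleep", "health"],
    ["life", "time"],
]
pairs = cooccurrence_counts(baskets)

# The five strongest relationships, analogous to a "top 5" view.
for (first, second), count in pairs.most_common(5):
    print(first, second, count)
```

The resulting pair counts are the kind of data that can feed both a network graph (pairs as edges, counts as weights) and a heatmap (pairs as cells).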


Controls

How to use

  • Install Python 3.8 or newer and the packages mentioned above.

  • Run the app from the command line with the Python file name followed by the data file to use.

    python3 nlp_dash_tool.py assets/News_Category_Dataset_v3.json

  • Use the dropdown and sliders in the first panel (tab), named "DATA & FREQUENCY", to control the analysis.

  • The slider for sampling the data is set to 30% by default to give the ML algorithm enough data for training:

    self.sample_percent = 30 #percent

  • Use your command line to follow app loading and analysis results. A few printouts are intentionally added to monitor performance. The output changes as you adjust the sliders and the dropdown. It will look like this:

WELLNESS
Length of all words:  85439
FreqDist:
life     628
time     561
one      557
peopl    539
dtype: int64
...........class built
Dash is running on http://127.0.0.1:9132/

 * Serving Flask app 'nlp_dash_tool'
 * Debug mode: on
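For orientation, the command-line invocation and the 30% sampling default described in the steps above might be wired up roughly as follows. This is a sketch, not the code in `nlp_dash_tool.py`: the `load_records` and `sample_records` helpers are hypothetical, and it assumes the data file is in JSON Lines form (one JSON object per line).

```python
import json
import random
import sys


def load_records(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def sample_records(records, sample_percent, seed=0):
    """Return a random subset holding roughly sample_percent of the records."""
    k = round(len(records) * sample_percent / 100)
    return random.Random(seed).sample(records, k)


# Only runs when a data file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    records = load_records(sys.argv[1])
    subset = sample_records(records, sample_percent=30)
    print(f"Loaded {len(records)} records, kept {len(subset)}")
```

A fixed `seed` keeps the sample reproducible between runs, which makes it easier to compare results while adjusting other controls.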

And if you press "Run model" in the "ML (NAIVE BAYES)" tab, something like this shows up:

Train accuracy score: 84.41%
Test accuracy score: 80.82%

After which, if you type in a word to predict, you will see something like this:

Your input
tel

Prediction
ENTERTAINMENT

Your input
tele

Prediction
WELLNESS
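To show the idea behind the train-and-predict flow above, here is a from-scratch multinomial Naive Bayes sketch with Laplace smoothing, using only the standard library. It is not the app's implementation (which presumably relies on a library classifier); the `TinyMultinomialNB` class and the training data are hypothetical, and words unseen during training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known word.
            score = math.log(doc_count / total_docs)
            for word in tokens:
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: tokenized headlines and their categories.
docs = [["movie", "star", "film"], ["film", "award", "star"],
        ["sleep", "health", "life"], ["health", "diet", "life"]]
labels = ["ENTERTAINMENT", "ENTERTAINMENT", "WELLNESS", "WELLNESS"]

model = TinyMultinomialNB().fit(docs, labels)
correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
print(f"Train accuracy score: {100 * correct / len(docs):.2f}%")
print(model.predict(["film"]))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the `+ 1` smoothing keeps a single unseen word from zeroing out a whole class.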

Documentation

The Dash documentation contains everything you need to know about the library. It includes useful information on the core Dash components and how to use callbacks, along with examples and functioning code, and it is fully interactive. You can also use the Press & news for a complete and concise specification of the API.

More references

Contributing and Permissions

Please do not directly copy anything without my consent. Feel free to reach out to me at https://www.linkedin.com/in/mulugeta-semework-abebe/ for ways to collaborate or use some components.

License

Dash is licensed under MIT. Please view LICENSE for more details. For other packages click on corresponding links at the top of this page (first line).

Acknowledgments

Huge thanks to the following contributors on Kaggle. This app would not have been possible without their massive work!