An interactive Plotly Dash tool for Natural Language Processing (NLP) that processes text data, from tokenizing and lemmatizing all the way to Machine Learning (ML) classification and word prediction.
NLP analysis in a single app: 11 figures, dropdown and slider analysis controls, and ML training and classification.
Here is a direct quote from the Dash project:
Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash ties modern UI elements like dropdowns, sliders, and graphs directly to your analytical Python code. Read our tutorial (proudly crafted ❤️ with Dash itself).
Make sure that Dash, its dependencies, and the other libraries listed below are correctly installed (using pip or conda; pip is shown here):
pip install dash
pip install dash-bootstrap-components
pip install dash-loading-spinners
pip install matplotlib
pip install networkx
pip install nltk
pip install numpy
pip install pandas
pip install seaborn
pip install wordcloud
pip install yellowbrick
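After installing, a quick check that every package can be found may save a failed launch. The following is a minimal sketch, not part of the app: the helper name `missing_modules` is hypothetical, and the import names are assumed to be the pip names above with hyphens replaced by underscores.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (pip names use hyphens, import names use underscores).
REQUIRED_MODULES = [
    "dash",
    "dash_bootstrap_components",
    "dash_loading_spinners",
    "matplotlib",
    "networkx",
    "nltk",
    "numpy",
    "pandas",
    "seaborn",
    "wordcloud",
    "yellowbrick",
]


def missing_modules(names):
    """Return the names that cannot be located by this Python interpreter."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any name it reports can then be installed with the matching `pip install` line.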
- Written entirely in Python, with an interactive Plotly Dash web application
- Load a text dataframe, parse, tokenize, lemmatize, analyze, train a Naive Bayes classification model, and predict word class.
- Tabbed, interactive, and visually pleasing environment that is easy to use
- Support for exploring word relationships using bigram market basket analysis
- Automatic file processing, with a dropdown for categories and sliders for how many top words (by frequency) to plot and display in basket analysis.
- DATA & FREQUENCY - has word frequency plots in different formats, and a datatable
- TREEMAP - Treemap of headline length distributions
- WORD RELATIONSHIPS - Basket analysis (network and heatmap), top 5 word relationships, calculated from lemmatized word co-occurrence
- ML (NAIVE BAYES) - detailed frequency distribution for all categories, train and predict words using multinomial Naive Bayes
- DATA & FREQUENCY
- TREEMAP
- WORD RELATIONSHIPS
- ML (NAIVE BAYES)
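The word frequency and word relationship views described above rest on two simple counts: how often each word occurs, and how often two words occur next to each other. The sketch below illustrates the idea with the standard library only. It is not the app's code: the function names and sample headlines are made up, the regex tokenizer stands in for the NLTK tokenizing and lemmatizing steps, and treating a "bigram" as a pair of adjacent words is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (no lemmatizing)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(docs, top_n):
    """Return the top_n most frequent words across all documents."""
    counts = Counter(token for doc in docs for token in tokenize(doc))
    return counts.most_common(top_n)


def bigram_counts(docs, top_n):
    """Return the top_n most frequent pairs of adjacent words."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


# Made-up headlines, for illustration only.
headlines = [
    "healthy life takes time",
    "time for a healthy life",
    "one life one time",
]

print(word_frequencies(headlines, 2))
print(bigram_counts(headlines, 1))
```

Here `top_n` plays the role of the "top words" sliders: raising it widens the set of words or pairs that would be plotted.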
Controls
- Install Python 3.8 or newer and the packages mentioned above.
- Run the app from the command line with the Python file name followed by the data file to use.
python3 nlp_dash_tool.py assets/News_Category_Dataset_v3.json
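How a script might pick up that file argument can be sketched as follows. This is an illustration under assumptions, not the app's actual code: the names `parse_args` and `parse_records` are hypothetical, the data file is assumed to hold one JSON object per line, and the `headline` and `category` fields are assumed from the features described above.

```python
import argparse
import json


def parse_args(argv=None):
    """Read the data file path given after the script name."""
    parser = argparse.ArgumentParser(description="NLP Dash tool (sketch)")
    parser.add_argument("data_path", help="path to the JSON-lines data file")
    return parser.parse_args(argv)


def parse_records(lines):
    """Turn an iterable of JSON-lines text into a list of dicts, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


# Demonstration with an explicit argument list and in-memory lines,
# so the sketch runs without the real data file.
args = parse_args(["assets/News_Category_Dataset_v3.json"])
print("Would load:", args.data_path)

sample_lines = [
    '{"headline": "Ten ways to sleep better", "category": "WELLNESS"}\n',
    "\n",
    '{"headline": "Award season begins", "category": "ENTERTAINMENT"}\n',
]
records = parse_records(sample_lines)
print(len(records), "records;", records[0]["category"])
```

With a real file, the lines would come from `open(args.data_path, encoding="utf-8")` instead of the in-memory list.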
- Use the dropdown and sliders in the first panel (tab), named "DATA & FREQUENCY", to control the analysis.
- The slider for sampling the data is set to 30% by default, to give enough data for ML algorithm training.
self.sample_percent = 30 #percent
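What a percentage-based sample amounts to can be sketched with the standard library. The helper `sample_rows` is hypothetical and only illustrates the idea; the app itself may sample differently. For data held in a pandas DataFrame, `DataFrame.sample(frac=0.3)` would be the usual equivalent.

```python
import random


def sample_rows(rows, sample_percent, seed=None):
    """Return a random subset holding sample_percent percent of rows."""
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    k = round(len(rows) * sample_percent / 100)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(rows, k)


rows = list(range(1000))          # stand-in for the dataset rows
subset = sample_rows(rows, 30, seed=42)
print(len(subset), "of", len(rows), "rows kept")
```

A smaller percentage speeds up the figures but leaves fewer examples for training.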
- Use your command line to follow app loading and analysis results. A few printouts are intentionally added to monitor performance. You will see changes as you play with the sliders and the dropdown. It will look like this:
WELLNESS
Length of all words: 85439
FreqDist:
life 628
time 561
one 557
peopl 539
dtype: int64
...........class built
Dash is running on http://127.0.0.1:9132/
* Serving Flask app 'nlp_dash_tool'
* Debug mode: on
And if you press "Run model" in the "ML (NAIVE BAYES)" tab, something like this shows up:
Train accuracy score: 84.41%
Test accuracy score: 80.82%
After that, if you type in a word to predict, you will see something like this:
Your input
tel
Prediction
ENTERTAINMENT
Your input
tele
Prediction
WELLNESS
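To make the train, score, and predict steps concrete, here is a small multinomial Naive Bayes written from scratch with the standard library. It is a sketch of the general technique, not the app's model: the class name `TinyMultinomialNB`, the add-one smoothing default, and the four toy documents are all assumptions made for illustration, and the app's feature extraction may differ.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Word-count multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one class name per document."""
        self.class_doc_counts = Counter(labels)
        self.total_docs = len(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        """Log prior plus summed log likelihoods of the known tokens."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_doc_counts[label] / self.total_docs)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest log score."""
        return max(self.class_doc_counts,
                   key=lambda label: self._log_score(tokens, label))


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy training data, made up for illustration.
train_docs = [
    ["sleep", "health", "life"],
    ["yoga", "health", "stress"],
    ["film", "actor", "award"],
    ["actor", "tv", "show"],
]
train_labels = ["WELLNESS", "WELLNESS", "ENTERTAINMENT", "ENTERTAINMENT"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(f"Train accuracy score: {accuracy(model, train_docs, train_labels):.2%}")
print("Prediction for 'health':", model.predict(["health"]))
```

Predicting from a single typed word, as in the output above, means the result depends entirely on how that one word was distributed across categories in the training sample, so short or partial inputs can flip between classes.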
The Dash documentation contains everything you need to know about the library. It has useful information on the core Dash components and how to use callbacks, along with examples and functioning code, and it is fully interactive. You can also use the Press & news for a complete and concise specification of the API.
Please do not directly copy anything without my consent. Feel free to reach out to me at https://www.linkedin.com/in/mulugeta-semework-abebe/ for ways to collaborate or use some components.
Dash is licensed under MIT. Please view LICENSE for more details. For other packages, click on the corresponding links at the top of this page (first line).
Huge thanks to the following contributors on Kaggle. This app would not have been possible without their massive work!