https://warm-falls-46451.herokuapp.com/
Responding to disasters is an important task that needs to be quick and efficient. During a disaster, thousands of texts and messages flood social media and other news channels, and each of them needs attention. Based on the need expressed in a message, it is forwarded to the relevant department so that aid operations can be carried out. During disasters, response teams are usually vulnerable, and simple keyword mapping to classify a message might miss hidden nuances, yet the classification must still be robust enough to make sure a message is aid related. This project is about a deployed machine learning model that automatically classifies incoming messages.
There are 3 major components in this project:
- An ETL pipeline that extracts the data, cleans it and loads it into a postgres database.
- An ML/NLP pipeline that loads the data from the database and performs training and optimization to generate a model.
- A web app that takes new incoming messages, feeds them to the trained model, predicts the category of each message and displays it on the UI.
- ORM - SQLAlchemy
- Language - Python 3.7.9
- ML - Sklearn, Numpy, Pandas
- NLP - NLTK (wordnet, punkt, stopwords, averaged_perceptron_tagger)
- Web app - Flask
- Visualizations - Plotly
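As a rough illustration of the ETL component listed above, the sketch below parses category strings, joins them to messages by id, drops duplicate messages and loads the result into a database table. It is not the project's `process_data.py`. It assumes the categories arrive as strings such as `related-1;water-0`, and it uses the standard library's `sqlite3` with an in-memory database in place of Pandas and SQLAlchemy so that it runs without extra dependencies. The names `parse_categories`, `run_etl` and the table name `messages` are hypothetical.

```python
import sqlite3


def parse_categories(raw):
    """Turn 'related-1;water-0' into {'related': 1, 'water': 0} (assumed format)."""
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def run_etl(messages, categories, conn, table="messages"):
    """Join messages and categories on id, drop duplicates, load one wide table."""
    by_id = {row["id"]: parse_categories(row["categories"]) for row in categories}
    names = sorted({name for labels in by_id.values() for name in labels})

    # One integer column per category, next to the id and the raw message.
    columns = ["id INTEGER PRIMARY KEY", "message TEXT"]
    columns += [f'"{name}" INTEGER' for name in names]
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({", ".join(columns)})')

    seen = set()
    for row in messages:
        if row["id"] in seen or row["id"] not in by_id:
            continue  # skip duplicates and messages without labels
        seen.add(row["id"])
        labels = by_id[row["id"]]
        values = [row["id"], row["message"]] + [labels.get(n, 0) for n in names]
        placeholders = ", ".join("?" for _ in values)
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
    conn.commit()
    return names


# Toy data standing in for the two CSV files.
messages = [
    {"id": 1, "message": "We need water and food"},
    {"id": 1, "message": "We need water and food"},  # duplicate row
    {"id": 2, "message": "Roads are blocked"},
]
categories = [
    {"id": 1, "categories": "related-1;water-1;food-1"},
    {"id": 2, "categories": "related-1;water-0;food-0"},
]

conn = sqlite3.connect(":memory:")
category_names = run_etl(messages, categories, conn)
rows = conn.execute("SELECT id, water FROM messages ORDER BY id").fetchall()
```

Storing one integer column per category keeps the table directly usable as a multi-label target matrix when the training step reads it back.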
- data/process_data.py: File that takes in data, sends it through the ETL pipeline and stores it in the database.
- models/train_classifier.py: File that loads data from the database, trains the ML model and stores it in a pickle file.
- data/disaster_messages.csv and data/disaster_categories.csv: Data used to train the model, provided by FigureEight.
- run.py: Flask web app.
- templates/master.html: Main HTML file.
- templates/go.html: HTML file that displays fetched results.
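To make the training step more concrete, here is a minimal sketch of a multi-label text classifier that is fitted and then serialized with `pickle`. It is an assumption about the general shape of such a pipeline, not the contents of `models/train_classifier.py`: the choice of `TfidfVectorizer` with its built-in tokenizer (instead of NLTK tokenization and lemmatization), the random forest, the two category names and the toy messages are all illustrative.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model():
    """Text features followed by one classifier per output category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=50, random_state=0))),
    ])


# Toy training set; columns are the hypothetical categories [water, medical_help].
train_messages = [
    "we need drinking water",
    "no clean water in the village",
    "my father needs a doctor",
    "injured people need medical help",
    "water and medicine are running out",
    "the weather is fine today",
]
train_labels = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 1],
    [0, 0],
])

model = build_model()
model.fit(train_messages, train_labels)

# Serialize and restore, as a stand-in for writing and reading a .pkl file.
restored = pickle.loads(pickle.dumps(model))

new_messages = ["please send drinking water"]
prediction = restored.predict(new_messages)  # array of shape (1, 2)
```

Wrapping the estimator in `MultiOutputClassifier` lets a single message receive several categories at once, which matches the one-column-per-category layout of the training data.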
- Clone the repository by executing
  git clone https://github.com/siddarthaThentu/Disaster-Response-Pipeline.git
- Run the following commands in the project's root directory to set up your database and model.
  - To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  - To run the ML pipeline that trains the classifier and saves it:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app.
  python run.py
- Go to http://0.0.0.0:3001/
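For readers curious how the web app hands a message to the model, the following is a minimal Flask sketch under stated assumptions. The `/go` route name is a guess based on `templates/go.html`, the `KeywordModel` class is a hypothetical stand-in for the unpickled classifier, and the response is JSON rather than a rendered template so that the example stays self-contained. It exercises the route with Flask's test client instead of starting a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical category names; the real ones come from the training data.
CATEGORY_NAMES = ["water", "medical_help"]


class KeywordModel:
    """Hypothetical stand-in for the trained classifier loaded from a pickle file."""

    def predict(self, texts):
        return [
            [int("water" in text.lower()), int("doctor" in text.lower())]
            for text in texts
        ]


model = KeywordModel()


@app.route("/go")
def go():
    # Read the user's message from the query string and classify it.
    query = request.args.get("query", "")
    flags = model.predict([query])[0]
    classification = {name: int(flag) for name, flag in zip(CATEGORY_NAMES, flags)}
    return jsonify(query=query, classification=classification)


# Exercise the route in-process, without binding to a port.
with app.test_client() as client:
    response = client.get("/go", query_string={"query": "We need water"})
    payload = response.get_json()
```

In a deployed app the last step would instead be a call such as `app.run(host="0.0.0.0", port=3001)`, which matches the address given above.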
- Udacity: The project was developed as part of Udacity's Data Science Nanodegree Program.
- FigureEight: For providing the datasets used to train the model.
- The training data, as seen on the homepage, looks skewed. Weighted training or capturing more data should help handle the data bias.
- Data or concept drift may occur in the future as the language used in text messages changes.
- Code performance could be improved by identifying bottlenecks.
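As one possible take on the weighted training mentioned in the first point, the sketch below computes inverse-frequency class weights for a single category column. The formula `n_samples / (n_classes * count)` is the same heuristic scikit-learn applies for `class_weight="balanced"`. The function name `balanced_class_weights` and the toy label list are hypothetical, and this is not something the project currently implements.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A skewed category: nine negative messages for every positive one.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)
# The rare class gets weight 10 / (2 * 1) = 5.0,
# the common class gets 10 / (2 * 9), roughly 0.56.
```

With scikit-learn estimators that accept it, passing `class_weight="balanced"` applies this weighting during training without computing it by hand.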