Live demo: https://vasilios.io
For the full implementation details and write-up, please refer to report.pdf
Vasilios.io is a dynamic, interactive NLP dashboard that provides insights into software engineering and data science articles. Designed to explore trends and reader interests in these fields, it presents exploratory data analysis and text-mining visualizations. Users can uncover patterns, popular topics, and the article types that resonate most with audiences, making it a useful tool for understanding content trends in tech and data science.
High-Level Architecture of the system:
- Data Collection (in a separate repo, to be published soon):
- An Airflow orchestrator schedules and runs the tasks in this stage
- The first step scrapes the publicly available archives of the most popular tech publishers on Medium.com
- The next step transforms and cleans the data to fit the PostgreSQL schema
- The cleaned data is then published to a RabbitMQ message queue, with every row stored as a JSON message (see the producer/worker sketch after this list)
- A database worker fetches messages from the queue and inserts the rows into PostgreSQL
- Data Persistence:
- Data from the collection step is saved in a PostgreSQL database
- Backend System:
- The backend system is written in Python using the Django framework and includes:
- RESTful API endpoints that accept GET requests from the frontend (see the endpoint sketch after this list)
- Data analysis that queries the PostgreSQL database and processes the results with data analysis and NLP libraries such as NumPy, Pandas, and NLTK
- Testing with unit tests, integration tests, and mocks run against sample data
- Web App / Frontend:
- The frontend is written in HTML, CSS, and JavaScript. JavaScript fetches data from the backend API endpoints, and the charts are rendered with Chart.js
- Production Environment:
- The app is containerized with Docker, with Gunicorn serving as the WSGI HTTP server and Nginx acting as a reverse proxy that handles incoming web traffic and forwards it to the application
- The app is deployed on AWS running the following services:
- AWS Elastic IP for static IP address
- AWS EC2 running web app in Docker container
- AWS ECR where the Docker images are stored
- AWS RDS for PostgreSQL hosting the database
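The data-collection pipeline lives in a separate repo, but the RabbitMQ hand-off described above can be sketched in a few lines with the pika client. Everything below is illustrative, not the actual implementation: the queue name "articles" and the insert helper are assumptions.

```python
import json
import pika

def insert_into_postgres(row):
    """Placeholder: the real worker would run an INSERT via psycopg2."""
    ...

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="articles", durable=True)  # hypothetical queue name

def publish_row(row):
    """Producer side: publish one cleaned article row as a persistent JSON message."""
    channel.basic_publish(
        exchange="",
        routing_key="articles",
        body=json.dumps(row),
        properties=pika.BasicProperties(delivery_mode=2),  # persist across broker restarts
    )

def handle_message(ch, method, properties, body):
    """Worker side: insert the row into PostgreSQL, then acknowledge the message."""
    insert_into_postgres(json.loads(body))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="articles", on_message_callback=handle_message)
channel.start_consuming()
```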
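On the backend side, each chart is typically served by a small GET endpoint that queries PostgreSQL through the Django ORM and shapes the result with Pandas. A minimal sketch, assuming a hypothetical Article model with tags and claps fields (not the project's actual schema):

```python
# views.py (sketch)
import pandas as pd
from django.http import JsonResponse
from .models import Article  # hypothetical model

def top_tags(request):
    """GET endpoint: return the ten most frequent article tags for Chart.js."""
    rows = list(Article.objects.values("tags", "claps"))
    counts = pd.DataFrame(rows)["tags"].value_counts().head(10)
    return JsonResponse({"labels": counts.index.tolist(), "data": counts.tolist()})
```

The frontend then fetches this JSON and passes the labels and data arrays straight into a Chart.js dataset.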
Please make sure you have PostgreSQL installed on your local computer before you start the following steps.
You can set up the database by following the Airflow steps in report.pdf or by restoring the provided backup.dump.
Set up your Postgres Database:
pg_restore -U <your_username> -h localhost -d new_database_name -v /path/to/backup.dump
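Note that pg_restore restores into an existing database, so create the target database first if it does not exist yet (assuming the standard PostgreSQL client tools):
createdb -U <your_username> -h localhost new_database_name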
Navigate into the project's directory:
cd nlp_dashboard/nlp_dashboard
Comment out or delete the following lines in settings.py, as they are for production only:
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
CSRF_TRUSTED_ORIGINS = os.environ.get("CSRF_TRUSTED_ORIGINS").split(" ")
Create a .env file and include the following information:
DEBUG=1 # leave as is
SECRET_KEY=insert-a-secret-key-here-can-be-anything # leave as is or change to your preference
DJANGO_ALLOWED_HOSTS=localhost,0.0.0.0,127.0.0.1 # leave as is
SQL_ENGINE=django.db.backends.postgresql # leave as is
SQL_NAME=db-name-you-set-up-earlier
SQL_USER=your-postgres-username
SQL_PASSWORD=your-postgres-password
SQL_HOST=localhost # leave as is
SQL_PORT=5432 # leave as is
DATABASE=postgres # leave as is
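For reference, settings.py presumably consumes these variables via os.environ, roughly along these lines (a sketch, not the project's exact code):

```python
# settings.py (sketch) -- maps the .env values above onto Django's DATABASES setting
import os

DATABASES = {
    "default": {
        "ENGINE": os.environ.get("SQL_ENGINE", "django.db.backends.sqlite3"),
        "NAME": os.environ.get("SQL_NAME"),
        "USER": os.environ.get("SQL_USER"),
        "PASSWORD": os.environ.get("SQL_PASSWORD"),
        "HOST": os.environ.get("SQL_HOST", "localhost"),
        "PORT": os.environ.get("SQL_PORT", "5432"),
    }
}
```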
Create and activate your virtual environment, then install the dependencies:
python -m venv venv_name
source venv_name/bin/activate
pip install -r requirements.txt
# inspect the existing database and copy the generated models into models.py
python manage.py inspectdb
python manage.py makemigrations
python manage.py migrate
python manage.py runserver
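Once the server starts, the dashboard should be reachable at http://127.0.0.1:8000/ (Django's default development address).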
Note: The live website was previously called teas.cafe