
An NLP text mining dashboard for exploring current trends and extracting the most-used keywords from software engineering and data science articles. Tech stack: Django, Python, PostgreSQL, HTML/CSS, JavaScript, Docker, AWS


NLP Text Mining Dashboard Web App


Live demo: https://vasilios.io

For the full implementation and report, please refer to report.pdf

Vasilios.io is a dynamic, interactive NLP dashboard that provides insights into software engineering and data science articles. Designed to explore trends and reader interests in these fields, the dashboard presents exploratory data analysis and text mining visualizations. Users can uncover patterns, popular topics, and the article types that resonate most with audiences, making it a valuable tool for understanding content trends in tech and data science.

Architecture

High-Level Architecture of the system:

  • Data Collection: (in a separate repo, to be published soon)
    • An Airflow orchestrator schedules and runs the tasks in this stage
    • The first step scrapes the publicly available archives of the most popular tech publishers on Medium.com
    • The next step transforms and cleans the data to fit the PostgreSQL schema
    • The cleaned data is then published to a RabbitMQ message queue, which stores each row as a JSON message
    • The Database Worker fetches the messages from the queue and inserts them into PostgreSQL
  • Data Persistence:
    • Data is saved from the data collection step into a PostgreSQL database
  • Backend System:
    • The backend system is written in Python using the Django framework, which includes:
      • RESTful API endpoints that accept GET requests from the frontend
      • Data analysis that queries the PostgreSQL database and processes the data with data analysis and NLP libraries such as NumPy, Pandas, and NLTK
      • Testing, including unit tests, integration tests, and mocks for sample data
  • Web App Basic Form / Frontend:
    • The frontend is written in HTML, CSS, and JavaScript. JavaScript fetches data from the backend API endpoints, and charts are rendered with Chart.js
  • Production Environment:
    • The app is containerized with Docker, with Gunicorn serving as the WSGI HTTP server and Nginx acting as a reverse proxy that handles incoming web traffic and forwards it to the application
    • The app is deployed on AWS running the following services:
      • AWS Elastic IP for static IP address
      • AWS EC2 running web app in Docker container
      • AWS ECR where the Docker images are stored
      • AWS RDS hosting the PostgreSQL database
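The transform-and-queue steps above can be sketched roughly like this (the field names are illustrative assumptions, not the project's actual schema):

```python
import json

def clean_row(raw: dict) -> dict:
    """Normalize one scraped article record to fit a PostgreSQL schema.
    Field names here are illustrative, not the real schema."""
    return {
        "title": (raw.get("title") or "").strip(),
        "claps": int(raw.get("claps") or 0),
        "tags": [t.strip().lower() for t in raw.get("tags") or []],
    }

def to_message(raw: dict) -> bytes:
    """Serialize a cleaned row as the JSON message body pushed onto the queue."""
    return json.dumps(clean_row(raw)).encode("utf-8")
```

The Database Worker then decodes each JSON message and inserts the row into PostgreSQL.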
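As a rough illustration of the backend's keyword analysis, the idea is to tokenize article text, drop stopwords, and count frequencies (the real app uses NLTK and Pandas; this stdlib-only sketch just shows the shape of the computation):

```python
import re
from collections import Counter

# A tiny illustrative stopword set; the real app would use NLTK's list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def top_keywords(texts, n=5):
    """Return the n most frequent non-stopword tokens across article texts."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(n)
```

An endpoint would run a query against PostgreSQL, feed the article bodies through a function like this, and return the counts as JSON for Chart.js to plot.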


Project Demo

Dashboard Homepage

Text Mining Page

LDA Analysis



Getting Started (Non-Production Environment)

Please make sure you have Postgres installed on your local machine before you start the following steps.

You can set up the database by following the Airflow steps in report.pdf or by using the provided backup.dump.

Set up your Postgres Database:

pg_restore -U <your_username> -h localhost -d new_database_name -v /path/to/backup.dump
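Note that pg_restore restores into an existing database, so if new_database_name does not exist yet, create it first (assuming a standard local Postgres install):

```shell
createdb -U <your_username> -h localhost new_database_name
```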

Navigate into the project's directory:

cd nlp_dashboard/nlp_dashboard

Comment out or delete the following lines in settings.py, as they are for production only:

DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

CSRF_TRUSTED_ORIGINS = os.environ.get("CSRF_TRUSTED_ORIGINS").split(" ")

Create a .env file and include the following information:

DEBUG=1 # leave as is
SECRET_KEY='insert-a-secret-key-here-can-be-anything' # you can leave as is or change to your preference
DJANGO_ALLOWED_HOSTS=localhost,0.0.0.0,127.0.0.1 # leave as is

SQL_ENGINE='django.db.backends.postgresql' # leave as is
SQL_NAME='db-name-you-set-up-earlier'
SQL_USER='your-postgres-username'
SQL_PASSWORD='your-postgres-password'
SQL_HOST='localhost' # leave as is
SQL_PORT='5432' # leave as is
DATABASE='postgres' # leave as is
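For context, settings.py typically consumes these variables along the lines below (a sketch of the usual Django pattern; the project's actual file may differ):

```python
import os

# Build the Django database config from the environment; the .env values
# above are expected to be loaded before Django starts.
DATABASES = {
    "default": {
        "ENGINE": os.environ.get("SQL_ENGINE", "django.db.backends.postgresql"),
        "NAME": os.environ.get("SQL_NAME", ""),
        "USER": os.environ.get("SQL_USER", ""),
        "PASSWORD": os.environ.get("SQL_PASSWORD", ""),
        "HOST": os.environ.get("SQL_HOST", "localhost"),
        "PORT": os.environ.get("SQL_PORT", "5432"),
    }
}
```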

Create and activate your virtual environment, install the dependencies, and run the app:

python -m venv venv_name
source venv_name/bin/activate  # activate before installing so packages land in the venv
pip install -r requirements.txt

# inspect the database, copy the models over to models.py
python manage.py inspectdb
python manage.py makemigrations
python manage.py migrate
python manage.py runserver

Note: The live website was previously called teas.cafe
