(FYI: the free Render PostgreSQL database has expired) A SPA that takes a website URL as input, scrapes its content, and classifies visitors based on their interests or industry. The goal is to dynamically generate questions and multiple-choice options that help categorize users visiting the site.

behi22/WebScraper

WebScraper

Web Scraper for Visitor Classification.

General Information

A Single Page Application that takes a website URL as input, scrapes its content, and classifies visitors based on their interests or industry. The goal is to dynamically generate questions and multiple-choice options that help categorize users visiting the site.
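The scraping step described above can be illustrated with a minimal sketch that pulls the visible text out of a page's HTML. This is not the repository's code: the names `VisibleTextExtractor` and `extract_text` are hypothetical, and the sketch assumes the HTML has already been downloaded, so it only needs Python's standard library.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one space-joined string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The extracted text could then be passed to whatever step generates the questions.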

Technologies Used

  • npm - 8.15.0
  • React.js - 18.3.1
  • Redux - 9.1.2
  • antd - 5.22.2
  • HTML5
  • CSS
  • Babel
  • Axios
  • AJAX
  • Git - 2.38.1.windows.1
  • GitHub
  • Linux
  • WSL
  • Python
  • Flask
  • PostgreSQL
  • Vercel
  • Redis
  • Render

Screenshots

[Screenshot of the application]

Usage

The app should have the following features:

  • Frontend - A neat, user-friendly, component-based frontend built with React and deployed on Vercel.
  • Backend API - A Python-based API that implements web scraping, data extraction, and AI-based content generation, deployed on Render.
  • Storage - A PostgreSQL database for storage, hosted on Render.
  • Caching - Redis for caching, hosted on Redis Cloud.
  • Integration - Effective integration of the frontend and backend components.
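One way the caching feature could fit in front of scraping and question generation is the cache-aside pattern, sketched below. This is an illustration under assumptions, not the project's implementation: `get_questions`, the `questions:<url>` key format, and the one-hour expiry are all hypothetical, and `InMemoryCache` is a stand-in so the example runs without a Redis server. A redis-py client could be passed in its place, since the sketch only relies on `get(name)` and `set(name, value, ex=seconds)`.

```python
import json
from typing import Callable, Optional


class InMemoryCache:
    """Stand-in exposing the two redis-py client methods used below."""

    def __init__(self):
        self._data = {}

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        # `ex` (expiry in seconds) is accepted for compatibility but ignored here.
        self._data[name] = value
        return True


def get_questions(url: str, cache, generate: Callable[[str], list],
                  ttl_seconds: int = 3600) -> list:
    """Return cached questions for `url`, generating and caching them on a miss."""
    key = f"questions:{url}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        # json.loads accepts both str and bytes, so a Redis reply works too.
        return json.loads(cached)
    questions = generate(url)
    cache.set(key, json.dumps(questions), ex=ttl_seconds)
    return questions
```

With this arrangement, repeated requests for the same URL skip the scraping and generation work until the cached entry expires.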

Project Status

Project is: Semi-Complete (Demo)

Room for Improvement

  • As noted in the comments in Home.js, the answers to each question are not currently submitted anywhere, and this logic could be developed further.

  • The question-generation script in App.py is still very primitive. With more time and resources, it could be developed further to generate more meaningful questions.

  • The Missing Answers StyledParagraph in Home.js remains visible after the user submits partial answers and then changes the URL. Resolving this needs further debugging.
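As a rough idea of what a simple question generator might look like, the sketch below turns the most frequent words of scraped text into one multiple-choice question. It does not reproduce the script in App.py: the function names, the stop-word list, and the question wording are all hypothetical, and plain word counting stands in for any AI-based generation.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"and", "the", "for", "with", "that", "this", "are", "you", "our"}


def top_keywords(text: str, limit: int = 4) -> list:
    """Return the `limit` most frequent non-stop-words of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


def build_question(text: str, limit: int = 4) -> dict:
    """Build one multiple-choice question whose options are the top keywords."""
    options = [word.capitalize() for word in top_keywords(text, limit)]
    options.append("None of these")  # fallback for visitors who match no topic
    return {
        "question": "Which of these topics brought you to this site?",
        "options": options,
    }
```

For example, `build_question("Garden tools, garden seeds, and more garden tools.", limit=2)` yields the options `Garden`, `Tools`, and `None of these`.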

Acknowledgements

  • Many thanks to Brave Career for including me in their Software Engineer assessment project.

Contact

Created by Behbod Babai. Feel free to contact me by email: behibabai@gmail.com
