Repository dedicated to data science and machine learning projects (e.g. ML models, notebooks, etc.)

eduardocsilva/data-science

Data Science and Machine Learning Projects

For more information about each project, check each project's README or the respective synopsis below.

1. Projects

1.1. Book Recommendation System

This project includes two book recommendation systems. The first is content-based: it recommends books according to the cosine similarity between each book's content and characteristics. The second uses collaborative filtering, a technique that, unlike the first, requires users to rate the books they have read, indicating whether they liked them.

Keywords: Machine Learning, Market Segmentation, Cosine Similarity, Content Based Recommendation, Collaborative Filtering.

Technologies: Python, Jupyter, Scikit-learn, Pandas, etc.
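
As a rough illustration of the content-based approach described above, the sketch below ranks books by the cosine similarity of their bag-of-words description vectors. It is a minimal standard-library example, not the project's actual code: the catalog, the titles and the helper names (`bag_of_words`, `cosine_similarity`, `recommend`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, catalog, top_n=2):
    """Rank every other book by its similarity to the given title."""
    target = bag_of_words(catalog[title])
    scores = [
        (other, cosine_similarity(target, bag_of_words(description)))
        for other, description in catalog.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical catalog: titles and descriptions are invented for illustration.
CATALOG = {
    "Space Frontier": "science fiction space adventure with aliens",
    "Galactic War": "science fiction war in space",
    "Garden Basics": "gardening guide for growing vegetables",
}

if __name__ == "__main__":
    for other, score in recommend("Space Frontier", CATALOG):
        print(f"{other}: {score:.2f}")
```

Books that share more vocabulary with the query book score closer to 1, while books with no words in common score 0. A collaborative filtering variant could reuse the same similarity function on user rating vectors instead of description vectors.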


1.2. Coffee Shop Chat Bot

A simple chat bot that simulates the owner of a coffee shop. You can greet him, ask what's for sale and even ask him for a joke.

The chat bot is backed by a neural network that uses Natural Language Processing to detect the topic of your sentence (e.g. greeting, asking what's for sale) and then selects an appropriate response.

Keywords: Chat Bot, Conversational System, Machine Learning, Natural Language Processing, Neural Network.

Technologies: Python, Jupyter, PyTorch, NLTK, etc.
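
The detect-then-respond flow described above can be sketched as follows. To keep the example dependency-free, the neural network is replaced by a plain token-overlap score, so this only illustrates the surrounding pipeline, not the project's model. The intent tags, patterns, responses and function names are all hypothetical.

```python
import random
import re

# Hypothetical intents: tags, patterns and responses are invented for illustration.
INTENTS = {
    "greeting": {
        "patterns": ["hello", "hi there", "good morning"],
        "responses": ["Hello! Welcome to the coffee shop."],
    },
    "menu": {
        "patterns": ["what do you sell", "what is for sale", "show me the menu"],
        "responses": ["We have espresso, cappuccino and croissants."],
    },
    "joke": {
        "patterns": ["tell me a joke", "make me laugh"],
        "responses": ["Why did the coffee file a police report? It got mugged."],
    },
}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def detect_intent(sentence):
    """Stand-in for the neural network: score each intent by token overlap."""
    tokens = set(tokenize(sentence))
    best_tag, best_score = None, 0
    for tag, intent in INTENTS.items():
        for pattern in intent["patterns"]:
            score = len(tokens & set(tokenize(pattern)))
            if score > best_score:
                best_tag, best_score = tag, score
    return best_tag


def reply(sentence):
    """Pick a response for the detected intent, or a fallback if none matched."""
    tag = detect_intent(sentence)
    if tag is None:
        return "Sorry, I did not understand that."
    return random.choice(INTENTS[tag]["responses"])


if __name__ == "__main__":
    print(reply("Can you tell me a joke?"))
```

In a trained version, `detect_intent` would instead encode the sentence (for example as a bag-of-words vector) and return the tag with the highest predicted probability, while `reply` would stay the same.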


1.3. Formula 1 Encyclopedia

A small project that served as an introduction to the Streamlit library, offering a simple front-end for browsing Formula 1 data (e.g. drivers, teams, circuits, races).

At the moment, only driver stats are browsable, but in the future, more data can be added, bringing this project closer to what its name suggests.

Keywords: Data Exploration and Analysis, Web Application.

Technologies: Python, Streamlit, Pandas, etc.


1.4. Guitar Model Classification

This project served as an introduction to TensorFlow and its JavaScript implementation, allowing the trained model to be integrated into a React.js front-end application that can be consulted in the following repository and website.

The resulting deep neural network processes an image and classifies the guitar it contains, with a relatively low accuracy of 60%. This could be improved by collecting more images for each guitar model and by further tuning the network's architecture and training process.

Later on, a PyTorch implementation was developed, which reached a much improved accuracy of 97%, although very little was changed in the implementation.

Keywords: Machine Learning, Classification, Neural Network, Deep Learning, TensorFlow.

Technologies: Python, Jupyter, TensorFlow, Matplotlib, etc.


1.5. Marketing Campaign Analysis

(...)


1.6. Name Classification

This project served as an introduction to PyTorch and RNNs (Recurrent Neural Networks). The resulting deep neural network classifies names according to their country of origin (e.g. the surname "Coelho" originated in Portugal). Furthermore, this project shows how powerful RNNs can be when applied to Natural Language Processing.

Keywords: Deep Learning, Natural Language Processing, PyTorch, Recurrent Neural Networks, Classification.

Technologies: Python, Jupyter, PyTorch, Scikit-learn, Pandas, Matplotlib, etc.
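
To show how a recurrent network reads a name one character at a time, here is a minimal forward pass of an Elman-style RNN written with the standard library only. It is a sketch under several assumptions: the weights are random and untrained (so the output probabilities carry no meaning), bias terms are omitted, and the country list, hidden size and class name `TinyRNN` are hypothetical. The project itself uses PyTorch.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase
# Hypothetical label set, invented for illustration.
COUNTRIES = ["Portugal", "Germany", "Japan"]
HIDDEN_SIZE = 8


def one_hot(char):
    """Encode a lowercase letter as a one-hot vector over the alphabet."""
    vec = [0.0] * len(ALPHABET)
    vec[ALPHABET.index(char)] = 1.0
    return vec


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Untrained Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_xh = random_matrix(HIDDEN_SIZE, len(ALPHABET), rng)
        self.w_hh = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng)
        self.w_hy = random_matrix(len(COUNTRIES), HIDDEN_SIZE, rng)

    def forward(self, name):
        """Feed the name character by character and classify the final state."""
        hidden = [0.0] * HIDDEN_SIZE
        for char in name.lower():
            if char not in ALPHABET:
                continue  # skip spaces, hyphens and accented letters
            from_input = matvec(self.w_xh, one_hot(char))
            from_hidden = matvec(self.w_hh, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
        return softmax(matvec(self.w_hy, hidden))


if __name__ == "__main__":
    probabilities = TinyRNN().forward("Coelho")
    for country, p in zip(COUNTRIES, probabilities):
        print(f"{country}: {p:.3f}")
```

Because the hidden state is carried from one character to the next, the same letters in a different order produce a different final state, which is what lets an RNN pick up on spelling patterns once it has been trained.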


1.7. User Review Classification

This project contains two user review classification models, the first for product categories (e.g. books, technology) and the second for the review's sentiment (e.g. positive, neutral or negative).

The classification is achieved through natural language processing techniques such as text tokenization and word occurrence/frequency analysis.

Keywords: Machine Learning, Classification, Natural Language Processing, Text Tokenization, Word Occurrence/Frequency, Bag of Words.

Technologies: Python, Scikit-learn, Pandas, Seaborn, Matplotlib, etc.
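
The sketch below illustrates how tokenization and word counts can drive a review classifier. It assumes a multinomial Naive Bayes model with Laplace smoothing, which is one common choice for bag-of-words features. The source does not state which classifier the project uses, and the training reviews and the `NaiveBayes` class are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a review into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (assumed model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of reviews
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                # Laplace smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training reviews, invented for illustration.
REVIEWS = [
    ("great product, works perfectly", "positive"),
    ("I love it, great quality", "positive"),
    ("terrible quality, broke quickly", "negative"),
    ("awful product, I hate it", "negative"),
]

if __name__ == "__main__":
    texts, labels = zip(*REVIEWS)
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("great quality, love it"))
```

The same class works for the product category task by swapping the sentiment labels for category labels such as "books" or "technology".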

2. Future Projects

  • Average Life Expectancy as a function of the Average Sleep Time
  • Churn Predictor
  • Formula 1 Race Winner Predictor
  • Marketing Campaign Performance Analysis
  • Real/Fake News Classifier
  • Rock Paper Scissors (w/ Computer Vision)
