Stock Pulse: Machine Learning Tool to Predict Stock Trends

Group: PG-27
Dataset: Huge Stock Market Dataset
Project Timeline: August 2024 – December 2024

Project Overview

"Stock Pulse" is a machine learning project that predicts stock market trends from the historical data in the Huge Stock Market Dataset. The goal is to develop and evaluate multiple machine learning models, such as Random Forest, SVM, Multiple Linear Regression, and Neural Networks, for stock price prediction. The project is structured not only to maximize predictive performance but also to demonstrate a deep understanding of the data, the modeling techniques, and the impact of features on model outcomes.

This README serves as a reference for our project progress, technical decisions, and evaluation methodology.

Dataset Description

The Huge Stock Market Dataset contains historical time-series data for about 8.4K companies from 2009 to 2017, spanning all U.S.-based stocks and ETFs traded on the NYSE, NASDAQ, and NYSE MKT exchanges. The dataset is in CSV format and includes the following features:

Date: Date of the record
Open: Opening price for the stock on that day
High: Highest price during the day
Low: Lowest price during the day
Close: Closing price for the stock on that day
Volume: Number of shares traded
OpenInt: Open interest (relevant for options trading)

The dataset is reliable, having been sourced from financial exchanges, and its prices have been adjusted for dividends and splits.
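As a rough illustration of the format described above, the sketch below loads one ticker's CSV with pandas and runs a few basic data-quality checks of the kind we expect to use during EDA. The sample rows are made up for the example, and the helper names `load_prices` and `basic_checks` are hypothetical; only the column names come from the dataset description.

```python
import io

import pandas as pd

# Illustrative rows only (made-up values), using the column names listed above.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,OpenInt
2017-01-03,10.00,10.50,9.80,10.20,1000,0
2017-01-04,10.20,10.60,10.10,10.40,1200,0
2017-01-05,10.40,10.45,9.90,,900,0
2017-01-06,10.00,10.30,9.95,10.25,1100,0
"""


def load_prices(source):
    """Read one ticker's CSV, parse the dates, and return rows sorted by date."""
    df = pd.read_csv(source, parse_dates=["Date"])
    return df.sort_values("Date").set_index("Date")


def basic_checks(df):
    """Return simple data-quality counts that are useful during EDA."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_dates": int(df.index.duplicated().sum()),
        # On a valid record, High should never be below Low.
        "high_below_low": int((df["High"] < df["Low"]).sum()),
    }


prices = load_prices(io.StringIO(SAMPLE_CSV))
report = basic_checks(prices)
print(report)
```

In practice `source` would be the path to a per-ticker file from the Kaggle download instead of the in-memory sample.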

Project Motivation

The stock market is highly volatile and complex, making accurate predictions extremely valuable for investors and financial institutions. Developing a machine learning model to predict future stock movements can improve investment decisions, risk management, and financial planning.

By accurately forecasting stock prices, this project could benefit traders, investors, and financial analysts, enabling them to make more informed decisions and improve portfolio performance.

Problem Statement

The objective of "Stock Pulse" is to answer the following key questions:

How accurately can we predict future stock prices using historical time-series data?
What are the most influential factors in predicting stock market movements?
How do different machine learning models compare in terms of predictive accuracy?

Our aim is to explore a variety of machine learning approaches and assess their performance in stock price prediction.

General Approach

The project will follow a systematic approach to answer the above questions:

Exploratory Data Analysis (EDA): We will first explore the dataset to understand trends, relationships, and any potential issues (e.g., missing data, outliers).
Feature Engineering: We plan to create new features such as moving averages (e.g., 10-day, 50-day) and technical indicators like the relative strength index (RSI) and Bollinger Bands.
Model Selection: Multiple machine learning models will be evaluated:
- Baseline Models: Linear Regression and Decision Trees for initial predictions.
- Advanced Models: Random Forest, Support Vector Machines (SVM), Neural Networks (LSTM for time-series modeling).
Model Training and Tuning: We will train each model and tune its hyperparameters, comparing candidate configurations on the evaluation metrics listed in the next step.
Evaluation: Performance will be evaluated using metrics like Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-squared to assess prediction accuracy.
Error Analysis and Interpretability: We will conduct error analysis to understand model weaknesses and improve feature selection and model tuning.
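The feature engineering step above can be sketched with pandas rolling windows. This is a minimal sketch, not our final implementation: the function name `add_indicators`, the 14-day RSI window, the 20-day Bollinger window with a band width of 2 standard deviations, and the use of simple (unsmoothed) rolling means for RSI are all assumptions chosen for illustration. The price series is synthetic.

```python
import numpy as np
import pandas as pd


def add_indicators(df, short=10, long=50, rsi_window=14, bb_window=20, bb_k=2.0):
    """Return a copy of df with moving averages, RSI, and Bollinger Bands added.

    All window lengths are illustrative defaults, not tuned values.
    """
    out = df.copy()
    close = out["Close"]

    # Simple moving averages over the short and long windows.
    out[f"SMA_{short}"] = close.rolling(short).mean()
    out[f"SMA_{long}"] = close.rolling(long).mean()

    # RSI, simple-average variant: ratio of average gain to average loss.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(rsi_window).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_window).mean()
    rs = gain / loss
    out["RSI"] = 100 - 100 / (1 + rs)

    # Bollinger Bands: rolling mean plus/minus k rolling (sample) standard deviations.
    mid = close.rolling(bb_window).mean()
    std = close.rolling(bb_window).std()
    out["BB_upper"] = mid + bb_k * std
    out["BB_lower"] = mid - bb_k * std
    return out


# Synthetic random-walk prices, for demonstration only.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-01-02", periods=120)
close = pd.Series(100 + rng.normal(0, 1, size=120).cumsum(), index=dates)
demo = pd.DataFrame({"Close": close})

features = add_indicators(demo)
print(features.dropna().head())
```

Each feature at a given date uses only prices up to and including that date, so pairing it with a target such as the next day's close (`Close.shift(-1)`) avoids leaking future information into the inputs.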

Evaluation Criteria

We are following the course rubric, which emphasizes analysis over pure performance. Here is how we will measure success:

Content (70% of the 20% total grade)

Originality (4%): We will develop original features and explore creative approaches to improving stock price prediction accuracy.
Relevance (4%): The project is deeply connected to core machine learning concepts, including time-series modeling, feature engineering, and model evaluation.
Related Work (4%): We will review academic research on stock market prediction and explain how our approach compares to prior work.
Technical Justification (10%): Each model will be justified in terms of its suitability for time-series stock price prediction. We'll explain why certain models perform better than others.
Implementation (20%): We will implement multiple models and ensure correct model training, including hyperparameter tuning.
Model Evaluation (18%): We will use both macroscopic (dataset-wide metrics like RMSE, Accuracy) and microscopic (error analysis) approaches to evaluate models.
Results Interpretation (10%): We will explain why certain models perform well and propose improvements for future iterations.
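To make the macroscopic and microscopic evaluation concrete, here is a small sketch using scikit-learn. It assumes a chronological train/test split (no shuffling, so the test period follows the training period), a Linear Regression baseline on three lagged closes, and a naive "tomorrow equals today" reference. The data is a synthetic random walk, and `chronological_split` and `regression_report` are hypothetical helper names, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def chronological_split(X, y, test_fraction=0.2):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], X[cut:], y[:cut], y[cut:]


def regression_report(y_true, y_pred):
    """Dataset-wide (macroscopic) metrics: RMSE, MAE, and R-squared."""
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Synthetic random-walk closes, for demonstration only.
rng = np.random.default_rng(42)
close = 100 + rng.normal(0, 1, size=300).cumsum()

# Features: the previous 3 closes. Target: the next close.
lags = 3
X = np.column_stack([close[i:len(close) - lags + i] for i in range(lags)])
y = close[lags:]

X_train, X_test, y_train, y_test = chronological_split(X, y)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

macro = regression_report(y_test, pred)
# Naive reference: predict that the next close equals the most recent close.
naive = regression_report(y_test, X_test[:, -1])

# Microscopic view: the test points with the largest absolute errors.
residuals = y_test - pred
worst = np.argsort(np.abs(residuals))[-5:][::-1]

print("model:", macro)
print("naive:", naive)
print("largest-error test indices:", worst)
```

Comparing each model against the naive reference helps show whether it has learned anything beyond persistence, and the largest-error indices give a starting point for inspecting individual failures.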

Presentation (30% of the 20% total grade)

Quality / Organization (10%): Our final presentation will be well-organized, with a clear timeline, and produced to a high standard.
Clarity / Understanding (10%): We will ensure that our technical approach and evaluation are described in detail, with visuals to support comprehension.
Visual Component (6%): The presentation will feature clean, readable, and colorful slides with charts, graphs, and other visuals to highlight key points.
Oral Component (4%): Our presentation will be polished, well-paced, and delivered confidently.

Timeline and Role Assignment

The project work is scheduled across Weeks 6 to 13:

Timeline	Task	Assigned Members
Weeks 6-8	Data Preprocessing, Exploration	Members 1 & 2
Weeks 9-12	Model Selection, Initial Training	Members 3 & 4
Weeks 9-12	Hyperparameter Tuning, Testing	Members 5 & 6
Week 13	Final Presentation, Results Interpretation	All Members

Resources

To carry out the project, we will use the following resources:

Python Libraries: pandas, NumPy, scikit-learn, TensorFlow/Keras
Kaggle Dataset: Huge Stock Market Dataset
Project Guide: CS3244 Project Quick Start Guide

Repository: AdamChoong0095/NUS-Stock-Data