The goal of this project is to classify song lyrics into emotional categories: Happy, Sad, Angry, and Relaxed. This classification aims to enhance music recommendation systems by aligning songs with the listener's mood, potentially offering insights into mental health. With advancements in NLP, this project explores the use of AI to create mood-based recommendations that analyze lyrical content.
We utilized the MoodyLyricQ dataset, containing 2000 songs evenly distributed across four mood categories. Since the dataset lacked lyrics due to copyright constraints, we used the Genius API to retrieve song lyrics based on title and artist name.
- Structured a dataset with song IDs, preprocessed lyrics, and mood labels.
- Split the data into 80% training and 20% validation sets.
- Fine-tuned GPT-2 for sequence classification using Cross Entropy Loss, Adam optimizer, and a StepLR scheduler.
- Employed gradient accumulation, mixed precision, and a batch size of 4 across 20 epochs.
- Preprocessed the data to include system prompts, mood labels, and lyrical text.
- Generated training, validation, and test sets in JSONL format.
- Fine-tuned GPT-3.5 on OpenAI with default hyperparameters and evaluated its performance using the test dataset.
- BERT outperformed GPT-2: BERT’s bidirectional context capture helped in better mood classification by understanding nuanced sentiments in lyrics.
- Minimal Pre-processing: Preserving the raw text improved performance, as stemming/lemmatization removed important emotional nuances.
- Misclassification of "Relaxed": The model often confused "Relaxed" with "Happy," likely due to the subtle differences in valence and arousal between the two emotions.