Note: This is a work-in-progress repository for a Thesis I and II project. The final structure and content may evolve as the thesis progresses.
The directory is structured as follows:
- Data Preprocessing:
  - `scraper.ipynb`: Scrapes the `.xlsx` file which contains the dataset [1] (a minimal loading sketch follows this section).
  - `preprocessor.ipynb`: Preprocesses the scraped dataset.
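Loading the scraped spreadsheet most likely reduces to a single pandas call. Below is a minimal sketch, assuming pandas with an Excel engine such as openpyxl; the file name is a hypothetical placeholder, not the notebook's actual path.

```python
import pandas as pd

# Hypothetical file name; the scraper's real output path may differ.
df = pd.read_excel("age_suitability_dataset.xlsx")
print(df.head())  # inspect the first rows; the actual columns are defined by [1]
```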
- Feature Extraction:
  - `text_stream_BERT.ipynb`: Extracts text features using BERT (see the sketch after this list).
  - `audio_stream_VIT.ipynb`: Extracts audio features using ViT.
  - `visual_stream_VIT.ipynb`: Extracts visual features using ViT.
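Both streams plausibly reduce to pretrained Hugging Face encoders. The sketch below assumes the `bert-base-uncased` and `google/vit-base-patch16-224-in21k` checkpoints and `[CLS]` pooling; the notebooks' actual checkpoints, pooling, and batching may differ.

```python
import torch
from PIL import Image
from transformers import BertModel, BertTokenizer, ViTImageProcessor, ViTModel

# Assumed checkpoints; the notebooks may use different ones.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

@torch.no_grad()
def extract_text_features(transcript: str) -> torch.Tensor:
    """Return BERT's [CLS] embedding for one trailer transcript."""
    inputs = tokenizer(transcript, truncation=True, max_length=512,
                       return_tensors="pt")
    return bert(**inputs).last_hidden_state[:, 0, :]  # shape (1, 768)

@torch.no_grad()
def extract_image_features(image: Image.Image) -> torch.Tensor:
    """Return ViT's [CLS] embedding for one frame, or for a log-mel
    spectrogram rendered as an image (how the audio stream could reuse ViT)."""
    inputs = processor(images=image, return_tensors="pt")
    return vit(**inputs).last_hidden_state[:, 0, :]  # shape (1, 768)
```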
- File Extraction:
  - `frameExtraction.ipynb`: Extracts frames from the trailer videos (a media-extraction sketch follows this list).
  - `logmelExtraction.ipynb`: Extracts log-mel spectrograms from the trailer audio.
  - `textExtraction.ipynb`: Extracts the trailer text.
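These notebooks presumably pull the raw inputs the feature extractors consume. A hedged sketch, assuming OpenCV for frame sampling and librosa for log-mel spectrograms; the stride, sampling rate, and mel-band count are illustrative, not the notebooks' actual settings.

```python
import cv2
import librosa
import numpy as np

def extract_frames(video_path: str, every_n: int = 30) -> list[np.ndarray]:
    """Keep one frame every `every_n` frames, converted to RGB."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames

def extract_logmel(audio_path: str, n_mels: int = 128) -> np.ndarray:
    """Compute a log-scaled mel spectrogram of the trailer audio."""
    y, sr = librosa.load(audio_path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```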
- Fusion and Classification:
  - Modules:
    - `cross_attention.py` (a fusion sketch in the spirit of [3] follows this list)
    - `dataloader.py`
    - `classifier.py`
    - `linear_transformation.py`
    - `output_max.py`
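To illustrate the fusion step, here is a minimal cross-modal attention sketch in the spirit of [3], where one modality's sequence queries another's through `torch.nn.MultiheadAttention`. The dimensions, head count, class count, and pooling are assumptions for illustration, not the repository's actual `cross_attention.py`.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """One modality (query) attends over another (context), then classifies."""

    def __init__(self, dim: int = 768, heads: int = 8, num_classes: int = 4):
        # num_classes = 4 assumes MPAA-style ratings; the real label set may differ.
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, query_feats: torch.Tensor,
                context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (batch, seq_q, dim), e.g. the text stream
        # context_feats: (batch, seq_k, dim), e.g. the visual or audio stream
        fused, _ = self.attn(query_feats, context_feats, context_feats)
        pooled = fused.mean(dim=1)      # average over the query sequence
        return self.classifier(pooled)  # age-rating logits

# Example: fuse a 10-token text stream with a 20-frame visual stream.
logits = CrossAttentionFusion()(torch.randn(2, 10, 768), torch.randn(2, 20, 768))
```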
- Evaluation and Validation:
  - `trainer.py`
  - `evaluation.py`
  - `cross_validation.py` (a cross-validation sketch follows this list)
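For the validation side, a stratified k-fold loop is a common shape for `cross_validation.py`. The sketch below uses scikit-learn and, to stay self-contained, substitutes a logistic-regression linear probe for the project's real per-fold training in `trainer.py`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def cross_validate(features: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Average validation accuracy over k stratified folds."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in skf.split(features, labels):
        # A linear probe stands in for the real training loop in trainer.py.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[val_idx], labels[val_idx]))
    return float(np.mean(scores))

# Example with random 768-dim features and 4 hypothetical rating classes.
X, y = np.random.randn(100, 768), np.random.randint(0, 4, size=100)
print(cross_validate(X, y))
```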
References:

[1] Shafaei, M., Smailis, C., Kakadiaris, I. A., & Solorio, T. (2021). A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers. arXiv preprint arXiv:2101.11704.

[2] Arevalo, J., Solorio, T., Montes-y-Gómez, M., & González, F. A. (2017). Gated Multimodal Units for Information Fusion. arXiv preprint arXiv:1702.01992. https://arxiv.org/abs/1702.01992

[3] Xie, B., Sidulova, M., & Park, C. H. (2021). Robust multimodal emotion recognition from conversation with transformer-based crossmodality fusion. Sensors, 21(14), 4913.
Authors:
- Kyle Andre Castro
- Carl Mitzchel Padua
- Edjin Jerney Payumo
- Nathaniel David Samonte