2023F-Machine Learning Term Project: Topic Modelling on Amazon Review Data
- Data loader (Preprocessed)
- Document embedding
- Sentence Transformer
- Dimension Reduction (UMAP)
- Clustering (HDBSCAN)
- Topic extraction (LDA, c-TF-IDF from BERTopic)
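The c-TF-IDF step in the list above can be illustrated with a minimal, dependency-free sketch. It assumes the commonly cited formulation, in which the weight of term t in cluster c is tf(t, c) * log(1 + A / f(t)), where A is the average number of words per cluster and f(t) is the frequency of t across all clusters. The function names `c_tf_idf` and `top_terms` and the toy reviews are hypothetical and are not taken from Final.ipynb; the BERTopic library's own implementation may normalize term frequencies differently.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_cluster):
    """Score terms per cluster (hypothetical helper, not from Final.ipynb).

    docs_per_cluster maps a cluster id to a list of tokenized documents.
    All documents in a cluster are treated as one merged document.
    """
    # Raw term counts per cluster.
    tf = {
        cluster: Counter(tok for doc in docs for tok in doc)
        for cluster, docs in docs_per_cluster.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # Average number of words per cluster (A in the formula).
    avg_words = sum(total.values()) / len(tf)
    return {
        cluster: {
            term: count * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
        for cluster, counts in tf.items()
    }


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        cluster: [
            term
            for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        ]
        for cluster, s in scores.items()
    }


# Toy input: two clusters of already tokenized reviews (made-up data).
clusters = {
    0: [["great", "lipstick", "color"], ["lipstick", "lasts"]],
    1: [["hotel", "room", "clean"], ["great", "hotel"]],
}
scores = c_tf_idf(clusters)
print(top_terms(scores, k=1))
```

In this toy input, "great" appears in both clusters, so it is weighted lower than terms that occur in only one cluster.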
./
├─ Final.ipynb ────────────────── All code for this report
├─ result ─────────────────────── Directory for storing experiment results
└─ dataset
   ├─ beauty.csv ──────────────── Main dataset with 30,000 beauty reviews
   └─ whole.csv ───────────────── 100,000 movie, beauty, and hotel reviews
Before running Final.ipynb, set the workspace path to match your environment.
e.g. /content/drive/MyDrive/Final_submit
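As a rough sketch of the path setup, the workspace root can be defined once and the other locations derived from it, following the directory layout shown above. The names `workspace_paths` and `missing_inputs` are hypothetical helpers for illustration; the notebook may configure its path differently.

```python
from pathlib import Path


def workspace_paths(workspace):
    """Build the expected file locations under a workspace root (hypothetical helper)."""
    root = Path(workspace)
    return {
        "notebook": root / "Final.ipynb",
        "beauty": root / "dataset" / "beauty.csv",
        "whole": root / "dataset" / "whole.csv",
        "result": root / "result",
    }


def missing_inputs(paths):
    """Return the names of required input files that do not exist yet."""
    required = ("notebook", "beauty", "whole")
    return [name for name in required if not paths[name].exists()]


# Example path from this README; change it to match your environment.
WORKSPACE = "/content/drive/MyDrive/Final_submit"
paths = workspace_paths(WORKSPACE)
print("Missing inputs:", missing_inputs(paths))
```

Checking for missing inputs before running the rest of the notebook makes a wrong workspace path visible immediately instead of at the first file read.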