Idea: Use prompt engineering and sentence-transformer embeddings of short review sentences to identify (1) important latent product attributes and (2) consumer sentiment at the attribute level for key brands/products
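As a rough illustration of the embedding-and-grouping step, the sketch below splits a review into short sentences and groups similar sentences into candidate attributes. It is only a sketch under stated assumptions: `toy_embed` is a bag-of-words stand-in for a real sentence-transformer model, and the function names, stopword list, and 0.25 threshold are hypothetical choices made for illustration, not part of this project.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a project setting).
STOPWORDS = {"the", "is", "a", "an", "and", "it", "of", "to"}


def split_sentences(review):
    """Split a review into short sentences on ., ! and ? boundaries."""
    parts = re.split(r"[.!?]+", review)
    return [p.strip() for p in parts if p.strip()]


def toy_embed(sentence):
    """Stand-in for a sentence-transformer: a bag-of-words count vector."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(sentences, threshold=0.25):
    """Put each sentence in the first cluster whose seed sentence is similar enough."""
    clusters = []  # list of (seed_vector, member_sentences)
    for sentence in sentences:
        vec = toy_embed(sentence)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(sentence)
                break
        else:
            # No existing cluster matched, so this sentence seeds a new one.
            clusters.append((vec, [sentence]))
    return [members for _, members in clusters]


# Made-up review text, for illustration only.
review = "The battery lasts all day. The screen is too dim! Battery life is great."
clusters = greedy_cluster(split_sentences(review))
print(clusters)
```

On this made-up review, the two battery sentences land in one group and the screen sentence in another. In a real pipeline the stand-in embedding would be replaced by model embeddings, and each group would then be labeled and scored for sentiment via prompts.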
(1) Stanford SNAP - Amazon Consumer Reviews: Loaded in the Git repo
https://snap.stanford.edu/data/web-Amazon.html
(2) IRI Academic Dataset:
To be loaded. The paper describing the dataset is linked below.
https://www.dropbox.com/s/lvdjlc67uvo94nr/The%20IRI%20Marketing%20Data%20Set.pdf
Both datasets are licensed only for research purposes. Please be careful not to distribute these beyond this project.
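For reference, the SNAP Amazon review files store each review as a block of `key: value` lines (for example `product/productId`, `review/score`, `review/text`) separated by blank lines. A minimal parsing sketch under that assumption follows. `parse_reviews` is a hypothetical helper and the sample records are made up for illustration; the field layout should be checked against the file actually loaded in the repo.

```python
import io


def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines separated by blank lines."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current review block.
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    # Emit the last block if the input does not end with a blank line.
    if record:
        yield record


# Made-up sample in the assumed layout (not real data).
SAMPLE = """product/productId: X000000001
review/score: 5.0
review/text: The battery lasts all day.

product/productId: X000000002
review/score: 2.0
review/text: The screen is too dim.
"""

records = list(parse_reviews(io.StringIO(SAMPLE)))
for r in records:
    print(r["product/productId"], r["review/score"])
```

Because `parse_reviews` accepts any iterable of lines, it can also be fed a file handle opened in text mode, so a large file is streamed rather than read into memory at once.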
Note that a sizable body of NLP research on this problem predates the use of LLMs. However, those approaches did not scale because they relied on costly manual labeling.
(1) J. McAuley and J. Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text," RecSys, 2013.
http://i.stanford.edu/~julian/pdfs/recsys13.pdf
(2) Bajari et al., "Machine Learning Methods for Demand Estimation"
https://faculty.washington.edu/bajari/published/RYAN_manuscript.pdf
(3) Netzer, Feldman, Goldenberg, Fresko. "Mine Your Own Business: Market-Structure Surveillance Through Text Mining," 2012.
https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/4468/Netzer_Feldman_Goldenberg_Fresko_2012.pdf
(4) Lee and Bradlow, "Automated Marketing Research Using Online Consumer Reviews," 2011.
https://journals.sagepub.com/doi/abs/10.1509/jmkr.48.5.881
(1) Validate results against survey data
(2) Refine prompts and clustering approach