DrSquare/ConsumerReviewNLP

Automated Market Intelligence

This is the Git repository for Automated Market Intelligence (based on Amazon consumer reviews).

Team Members: Minha Hwang, Michael Trusov

Idea: Use prompt engineering and short-sentence transformer embeddings to find (1) important (latent) product attributes and (2) attribute-level consumer sentiment for key brands/products.
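As a rough illustration of the attribute-discovery half of this idea, the sketch below splits a review into short sentences, embeds each one, and groups similar sentences so that each group can be read as a candidate attribute. It is only a sketch under stated assumptions: `toy_embed` is a bag-of-words stand-in and not a transformer embedding, the greedy grouping and the `threshold` value are illustrative choices, and none of the function names come from this repository.

```python
import math
import re
from collections import Counter


def split_sentences(review):
    """Split a review into short sentences on '.', '!' and '?' boundaries."""
    parts = re.split(r"[.!?]+", review)
    return [p.strip() for p in parts if p.strip()]


def toy_embed(sentence):
    """Stand-in embedding: sparse bag-of-words counts (NOT a transformer)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sentences(sentences, threshold=0.3):
    """Greedy grouping: attach each sentence to the first group whose
    seed sentence is similar enough, otherwise start a new group."""
    groups = []  # list of (seed_vector, member_sentences)
    for sentence in sentences:
        vector = toy_embed(sentence)
        for seed, members in groups:
            if cosine(seed, vector) >= threshold:
                members.append(sentence)
                break
        else:
            groups.append((vector, [sentence]))
    return [members for _, members in groups]


review = "The battery life is great. Shipping was slow! Battery life could be better."
for group in group_sentences(split_sentences(review)):
    print(group)
```

In a fuller pipeline the toy embedding would presumably be replaced by a sentence-transformer model and the greedy pass by a proper clustering method; attribute-level sentiment would be a separate step applied to each group.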

1. Dataset:

(1) Stanford SNAP - Amazon Consumer Review: Loaded in Git Repo
https://snap.stanford.edu/data/web-Amazon.html

(2) IRI Academic Dataset:
To be loaded. The dataset description paper is linked below.
https://www.dropbox.com/s/lvdjlc67uvo94nr/The%20IRI%20Marketing%20Data%20Set.pdf

Both datasets are licensed only for research purposes. Please be careful not to distribute these beyond this project.
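For the SNAP reviews in (1), a minimal loading sketch might look like the following. It assumes the file is a gzip-compressed text file in which each review is a block of `key: value` lines (for example `product/productId`, `review/score`, `review/text`) separated by blank lines; check this against the file actually downloaded. The function names and the `latin-1` encoding are illustrative assumptions, not part of this repository.

```python
import gzip


def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines.

    Assumes records are separated by blank lines.
    """
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current record, if any.
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    if record:
        yield record


def read_review_file(path):
    """Lazily parse a gzip-compressed review file (encoding is an assumption)."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        yield from parse_reviews(handle)


sample = [
    "product/productId: B000EXAMPLE\n",
    "review/score: 4.0\n",
    "review/text: Works well: no complaints.\n",
    "\n",
    "product/productId: B000OTHER\n",
    "review/score: 2.0\n",
]
for review in parse_reviews(sample):
    print(review["product/productId"], review["review/score"])
```

Parsing lazily with a generator keeps memory use flat, which matters because the full review file is large.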

2. Key Reference:

Note that a sizable body of NLP research on this problem predates the application of LLMs. However, those efforts did not scale because they depended on costly manual labeling.

(1) J. McAuley and J. Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text," RecSys, 2013. http://i.stanford.edu/~julian/pdfs/recsys13.pdf

(2) Bajari, "Machine Learning Methods for Demand Estimation"
https://faculty.washington.edu/bajari/published/RYAN_manuscript.pdf

(3) Netzer, Feldman, Goldenberg, Fresko. "Mine Your Own Business: Market-Structure Surveillance Through Text Mining," 2012 https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/4468/Netzer_Feldman_Goldenberg_Fresko_2012.pdf

(4) Lee and Bradlow, "Automated Marketing Research Using Online Consumer Reviews," 2011 https://journals.sagepub.com/doi/abs/10.1509/jmkr.48.5.881

3. Next steps:

(1) Validate results against survey data
(2) Refine prompts and clustering approach