Publicly available data found in my courses and books
- Consumer Complaint Database; https://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data
- City of Atlanta Crime Data; https://www.atlantapd.org/i-want-to/crime-data-downloads
- Simulated crime data for Chennai, India, based on comparable U.S. cities, including Atlanta, Los Angeles, and Chicago.
- Best applications: geospatial analysis & clustering;
- 499,365 records;
- IncidntNum, Category, Descript, DayOfWeek, Date, Time, PdDistrict, Resolution, Address, X, Y, Location, PdId
- Headlines from ABC News.
- Best applications: topic analysis;
- 1,000,000 records;
- publish_date, headline_text
- Factors leading to low birth weight.
- Best applications: logistic regression;
- 690 records;
- ID, BIRTH, SMOKE, RACE, AGE, LWT, BWT, RESP
- Presidential debate between President Obama and Governor Romney.
- Best applications: topic analysis, sentiment analysis;
- 2,487 records;
- person, tot, time, role, dialogue
- 1984 House of Representatives voting record.
- Best applications: logistic regression;
- 435 records;
- issue_A through issue_B, party
- Large file for cash loans and revolving loans with 117 independent variables.
- Best applications: logistic regression;
- 48,745 records with 120 variables;
- NAME_CONTRACT_TYPE = Cash Loans, Revolving Loans
- Complaint data for several banking product groups.
- Best applications: natural language processing (NLP), bag-of-words classifier model;
- 1,000,000 records;
- complaint_id, product_group, text
- Cleaned complaint data for several banking product groups.
- Best applications: natural language processing (NLP), bag-of-words classifier model;
- 1,000,000 records;
- complaint_id, product_group, text
- Product reviews for numerous products.
- Best applications: sentiment analysis;
- 120,000 records;
- Id, ProductId, UserId, ProfileName, HelpfulnessNumerator, HelpfulnessDenominator, Score, Time, Summary, Text
- Survey responses for cyberbullying.
- Best applications: logistic regression;
- 2,080 records;
- 11 packets;
- File Name, Is Cyberbullying Present?
- Tweets about ISIS Fanboys.
- Best applications: sentiment analysis;
- 115,600 records;
- name, username, tweetid, date, time, tweets
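
The column lists above can double as a quick sanity check after downloading a file. The following is a minimal sketch, not part of the original materials: the dictionary keys (`birth_weight`, `abc_headlines`) and the `check_header` helper are hypothetical names chosen for illustration, and the in-memory sample stands in for a real CSV file whose name and location are not given here.

```python
import csv
import io

# Column names copied from the list above; the dictionary keys are
# hypothetical labels, not official dataset or file names.
EXPECTED_COLUMNS = {
    "birth_weight": ["ID", "BIRTH", "SMOKE", "RACE", "AGE", "LWT", "BWT", "RESP"],
    "abc_headlines": ["publish_date", "headline_text"],
}


def check_header(stream, expected):
    """Compare the first row of a CSV stream with the expected column names.

    Returns a (missing, extra) pair of lists.
    """
    header = next(csv.reader(stream), [])
    header = [name.strip() for name in header]
    missing = [col for col in expected if col not in header]
    extra = [col for col in header if col not in expected]
    return missing, extra


# In-memory stand-in for a downloaded file; the data row is a placeholder.
sample = io.StringIO("publish_date,headline_text\nplaceholder,placeholder\n")
missing, extra = check_header(sample, EXPECTED_COLUMNS["abc_headlines"])
print("missing:", missing, "extra:", extra)
```

With a real file, the same helper could be called on `open(path, newline="")`, where `path` is wherever the download was saved. An empty `missing` list suggests the file matches the columns documented here.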