Publicly available data found in my courses and books

stricje1/Data

Data

Sources

chennai_crimes

  • Simulated crime data for Chennai, India, based on comparable US cities, including Atlanta, Los Angeles, and Chicago.
  • Best applications: geospatial analysis & clustering;
  • 499,365 records;
  • IncidntNum, Category, Descript, DayOfWeek, Date, Time, PdDistrict, Resolution, Address, X, Y, Location, PdId
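
As a rough first step toward the geospatial clustering mentioned above, incidents can be binned into a coordinate grid. The sketch below uses only the Python standard library and assumes that `X` and `Y` hold numeric coordinates; the sample rows are made up for illustration, and the `cell_size` value is an arbitrary choice.

```python
import csv
import io
import math
from collections import Counter

# Made-up rows for illustration only; they are not taken from chennai_crimes.
SAMPLE = """Category,X,Y
THEFT,80.271,13.085
ASSAULT,80.275,13.082
THEFT,80.221,13.041
THEFT,,13.050
"""

def grid_counts(rows, cell_size=0.01):
    """Count incidents per grid cell, skipping rows without usable coordinates."""
    counts = Counter()
    for row in rows:
        try:
            x, y = float(row["X"]), float(row["Y"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinate
        # Integer cell indices avoid floating-point keys.
        counts[(math.floor(x / cell_size), math.floor(y / cell_size))] += 1
    return counts

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = grid_counts(rows)
for cell, n in counts.most_common():
    print(cell, n)
```

To run this against the real data, replace `io.StringIO(SAMPLE)` with an open file handle; the exact file name and extension in the repository are not stated here, so check them first.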

abcnews-date-text

  • Headlines from ABC News.
  • Best applications: topic analysis;
  • 1,000,000 records;
  • publish_date, headline_text
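
Before fitting a topic model, it can help to look at headline volume per year and the most frequent terms. This is a minimal sketch that assumes `publish_date` is stored as `YYYYMMDD`; the sample headlines and the tiny stop-word set are made up for illustration.

```python
import csv
import io
import re
from collections import Counter

# Made-up rows for illustration only; the YYYYMMDD date format is an assumption.
SAMPLE = """publish_date,headline_text
20030219,council votes on water plan
20030220,water restrictions start today
20040101,new year fireworks draw crowds
"""

STOPWORDS = {"on", "the", "a", "of", "in", "to"}  # illustrative, not exhaustive

def headlines_per_year(rows):
    """Count headlines by the first four characters of publish_date."""
    return Counter(row["publish_date"][:4] for row in rows)

def top_terms(rows, n=10):
    """Return the n most common lowercase word tokens, excluding stop words."""
    terms = Counter()
    for row in rows:
        tokens = re.findall(r"[a-z']+", row["headline_text"].lower())
        terms.update(t for t in tokens if t not in STOPWORDS)
    return terms.most_common(n)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
per_year = headlines_per_year(rows)
print(per_year)
print(top_terms(rows, n=1))
```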

clslowbwt_mod

  • Factors leading to low birth weight.
  • Best applications: logistic regression;
  • 690 records;
  • ID, BIRTH, SMOKE, RACE, AGE, LWT, BWT, RESP

obama_romney

  • Presidential debate between President Obama and Governor Romney.
  • Best applications: topic analysis, sentiment analysis;
  • 2,487 records;
  • person, tot, time, role, dialogue
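
A quick sanity check before topic or sentiment analysis is to total the words spoken by each person. The sketch below assumes that `dialogue` holds the spoken text and `person` the speaker; the sample rows, including the values in `tot`, `time`, and `role`, are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are not taken from obama_romney.
SAMPLE = """person,tot,time,role,dialogue
OBAMA,1.1,time 1,candidate,Thank you very much.
ROMNEY,2.1,time 1,candidate,"Thank you, and good evening."
OBAMA,3.1,time 1,candidate,Let me respond.
"""

def words_by_speaker(rows):
    """Sum whitespace-separated word counts of dialogue per speaker."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["person"]] += len(row["dialogue"].split())
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = words_by_speaker(rows)
print(totals)
```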

HouseVotes84

  • 1984 House of Representatives voting record.
  • Best applications: logistic regression;
  • 435 records;
  • issue_A through issue_B, party

application test

  • Large file for cash loans and revolving loans with 117 independent variables.
  • Best applications: logistic regression;
  • 48,745 records with 120 variables;
  • NAME_CONTRACT_TYPE = Cash Loans, Revolving Loans

case_study

  • Complaint data for several banking product groups.
  • Best applications: natural language processing (NLP), bag-of-words classifier model;
  • 1,000,000 records;
  • complaint_id, product_group, text

Complaints_Dataset

  • Cleaned complaint data for several banking product groups.
  • Best applications: natural language processing (NLP), bag-of-words classifier model;
  • 1,000,000 records;
  • complaint_id, product_group, text

reviews

  • Product reviews for numerous products.
  • Best applications: sentiment analysis;
  • 120,000 records;
  • Id, ProductId, UserId, ProfileName, HelpfulnessNumerator, HelpfulnessDenominator, Score, Time, Summary, Text
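
For sentiment analysis, one common starting point is to derive a coarse label from `Score` and a helpfulness ratio from the two helpfulness columns. The sketch below assumes `Score` is on a 1 to 5 scale and uses illustrative cut-offs (4 and above positive, 2 and below negative); the sample rows are made up.

```python
import csv
import io

# Made-up rows for illustration only; the 1-5 Score scale is an assumption.
SAMPLE = """Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
1,P1,U1,alice,3,4,5,1300000000,Great,Loved it
2,P1,U2,bob,0,0,1,1300000500,Bad,Did not like it
3,P2,U3,carol,1,2,3,1300001000,Okay,It was fine
"""

def helpfulness(row):
    """Return numerator / denominator, or None when nobody voted."""
    den = int(row["HelpfulnessDenominator"])
    return int(row["HelpfulnessNumerator"]) / den if den else None

def sentiment_label(score):
    """Map a numeric score to a coarse label using illustrative cut-offs."""
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return "neutral"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labeled = [(r["Id"], sentiment_label(int(r["Score"])), helpfulness(r)) for r in rows]
for item in labeled:
    print(item)
```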

Bully Tracer Consensus

  • Survey responses for cyberbullying.
  • Best applications: logistic regression;
  • 2,080 records;
  • 11 packets;
  • File Name, Is Cyberbullying Present?

AboutIsis_Clean

  • Tweets about ISIS Fanboys.
  • Best applications: sentiment analysis;
  • 115,600 records;
  • name, username, tweetid, date, time, tweets
