Skip to content

Latest commit

 

History

History
36 lines (22 loc) · 1.09 KB

README.md

File metadata and controls

36 lines (22 loc) · 1.09 KB

TwitterGenderPredictor

JT Wolohan

jwolohan@indiana.edu

Description

This is a Python implementation of Sap et al.'s gender prediction algorithm for Twitter. The algorithm should be 90% accurate given a large sample of users and a reasonable amount of data for each user.

Sap, M., Park, G., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., ... & Schwartz, H. A. (2014). Developing age and gender predictive lexica over social media. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1146-1151).

Use

  1. Clone the repository.
  2. Import SapGenderPrediction.
  3. Initiate a GndrPrdct class object.
  4. Call the predict_gender method on a string collection of tweets.

Predictions are returned as integers. 0 is a prediction of male, 1 is a prediction of female.

Example

# Step 2
from SapGenderPrediction import GndrPrdct

# Step 3
Classifier  = GndrPrdct()
tweets = ["This is a tweet.", "I'm another tweet!", "Hey, @realDonaldTrump, I'm yet another tweet!"]

# Step 4
Classifier.predict_gender(" ".join(tweets))