This script generates a dataset for classification of twitter interests. It has the ability to use multiple twitter API keys to speed up the process.
Output:
1)A CSV file of accountName / tweets / interest
2)A processed CSV file of the above.
tweepy
- Edit the dicOfAccounts() function with inputs where Key = interest, Values = List of twitter accounts to mine
eg, dic['News'] = ['cnn','bbc','nytimes']. 'cnn','bbc','nytimes' will be mined and their tweets will be tagged as 'news'.
2)Add your twitter API test keys. uncomment out accesstokenlist and add all the keys you have.
- (Optional) Run verifyTwitterAccounts to verify the twitter accounts given.
Upon running, Youll get a twitterInterests.csv in your python folder
Important: If you want to recreate the CSV, treat it as first time running. else, COMMENT OUT makeCSV() in the last few lines.