A collection of Reddit submissions from various computer science subreddits, gathered to find trends in the computer science domain.
Contains the Python data collection script, a general metadata file, and sample generated datasets.
- Install the following programs/plugins
- Any text editor
- A bash interface
- Install git (sudo apt-get install git)
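- Python: https://www.python.org/ (these instructions were written for Python 2.7; install it from the website or your package manager, since Python itself is not installed through pip)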
- PRAW: https://praw.readthedocs.io (pip install praw)
- Clone this repository (git clone https://github.com/IanGross/RedditDataCollection/)
- Create a Reddit account and set up a Script App with OAuth2
- Instructions here: https://github.com/reddit/reddit/wiki/OAuth2
- Copy the values for user_agent, client_id and client_secret of your application
- Open the Reddit_Data_Collection.py file and insert the values in the appropriate fields of "reddit = praw.Reddit()" (located right after the import statements).
- Optional
- Go to https://www.unixtimestamp.com/ and get UTC start and end values. Insert those values under start_utc and end_utc
- Add and/or remove subreddit names in the subreddit_list list
- Run the file with: python Reddit_Data_Collection.py
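
The configuration steps above can be sketched roughly as follows. This is only an illustration, not the contents of Reddit_Data_Collection.py: the helper names to_utc_timestamp and make_reddit_client, the January 2018 window, and the two subreddit names are assumptions chosen for the example. The names start_utc, end_utc and subreddit_list match the fields mentioned above, and the timestamps are computed locally instead of being copied from the website.

```python
import calendar
from datetime import datetime


def to_utc_timestamp(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    # timegm interprets the time tuple as UTC, unlike time.mktime (local time).
    return calendar.timegm(datetime(year, month, day).timetuple())


# Assumed example window: all of January 2018.
start_utc = to_utc_timestamp(2018, 1, 1)
end_utc = to_utc_timestamp(2018, 2, 1)

# Placeholder subreddit names; replace with the ones you want to collect.
subreddit_list = ["compsci", "programming"]


def make_reddit_client(client_id, client_secret, user_agent):
    """Create a PRAW client from the Script App credentials (hypothetical helper)."""
    import praw  # imported here so the rest of the sketch runs without PRAW
    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
```

Calling make_reddit_client with the three values copied from your Script App would give a client equivalent to the "reddit = praw.Reddit()" line in the script. Keep the real credentials out of any commits you push.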