Paragraph-level data

Sentence-level data

review_dataset

1. Get the list of paper IDs along with the URLs of their revisions:

       python spider.py

2. Download the PDFs from those URLs:

       python downloader.py

3. Compile the dataset and save it as JSON:

       python dataset.py
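The three steps above must run in order, because each one consumes what the previous step produced. Below is a minimal sketch of a runner for them. It assumes that `spider.py`, `downloader.py` and `dataset.py` sit in the current working directory and take no arguments, which matches the commands shown but is not confirmed by the repository. The helper names `build_commands` and `run_pipeline` are hypothetical and are not part of the project.

```python
import subprocess
import sys

# The three stages of review_dataset, in the order they must run.
# Assumption: the scripts live in the current directory and need no arguments.
STEPS = ["spider.py", "downloader.py", "dataset.py"]


def build_commands(python=sys.executable):
    """Return one command line per stage, using the given interpreter."""
    return [[python, script] for script in STEPS]


def run_pipeline(dry_run=False):
    """Run each stage in order (hypothetical helper).

    With check=True, subprocess.run raises CalledProcessError as soon as a
    stage exits with a non-zero status. That failure stops the pipeline, so
    later stages never run on missing or partial output. With dry_run=True
    the commands are printed instead of executed.
    """
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` prints the three commands without executing anything. `run_pipeline()` executes them for real. The sketch uses `sys.executable` rather than a bare `python` so that every stage runs under the same interpreter and virtual environment as the caller.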