LeeJunHyun/arxiv_crawler

This repo is a slightly modified version of https://github.com/karpathy/arxiv-sanity-preserver.

Crawling and Organizing arXiv papers

  1. Build the virtual environment:

conda env create -f arxiv-env.yml

  2. Set your search keywords (a list) in utils.py. The crawler searches for articles whose abstracts contain the keywords; a sketch of how the phrases expand into a query follows the snippet below.

class Config(object):
    search_list = ["graph convolution","graph neural network"]
    # search query : [graph] AND [convolution] OR [graph] AND [neural] AND [network]
    save_pdf_by_title = True
    # if save_pdf_by_title = False, PDFs are named by arXiv id, e.g. 1805.07857v2.pdf
    save_pdf_by_months = True
    # if save_pdf_by_months = False, files are organized by year only, without month subfolders
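
The comment above shows how each phrase in search_list expands: the words within a phrase are ANDed, and the phrases are ORed together. As a minimal sketch of that expansion (build_query is a hypothetical helper, not a function in this repo, and the abs: prefix assumes the arXiv API's abstract-field search):

def build_query(search_list):
    # Join the words of each phrase with AND, then join the phrases with OR.
    # "abs:" restricts each term to the abstract, per step 2.
    groups = []
    for phrase in search_list:
        terms = " AND ".join("abs:" + word for word in phrase.split())
        groups.append("(" + terms + ")")
    return " OR ".join(groups)

# build_query(["graph convolution", "graph neural network"]) returns:
# '(abs:graph AND abs:convolution) OR (abs:graph AND abs:neural AND abs:network)'
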
  3. Run source crawling.sh, which executes the following:

conda activate arxiv-env

python fetch_papers.py --max-index=1000  # maximum number of papers
python download_pdfs.py

# repeat the fetch/download pass
python fetch_papers.py --max-index=1000
python download_pdfs.py

conda deactivate
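
For a sense of what one fetch-and-download pass involves, here is a minimal self-contained sketch against the public arXiv Atom API (http://export.arxiv.org/api/query). fetch_and_download and its parameters are illustrative assumptions, not the repo's actual code; the real logic lives in fetch_papers.py and download_pdfs.py:

import os
import re
import urllib.parse
import urllib.request

import feedparser  # pip install feedparser

def fetch_and_download(query, max_index=1000, out_dir="pdf",
                       save_pdf_by_title=True, save_pdf_by_months=True):
    # Hypothetical sketch: page through the arXiv API 100 results at a time.
    base = "http://export.arxiv.org/api/query?search_query={}&start={}&max_results=100"
    for start in range(0, max_index, 100):
        feed = feedparser.parse(base.format(urllib.parse.quote(query), start))
        for entry in feed.entries:
            arxiv_id = entry.id.split("/abs/")[-1]  # e.g. "1805.07857v2"
            year, month = entry.published[:4], entry.published[5:7]
            # save_pdf_by_months = False would organize by year only (step 2).
            folder = (os.path.join(out_dir, year, month) if save_pdf_by_months
                      else os.path.join(out_dir, year))
            os.makedirs(folder, exist_ok=True)
            # Name files by sanitized title, or fall back to the arXiv id.
            if save_pdf_by_title:
                filename = re.sub(r'[\\/:*?"<>|\n]', "_", entry.title) + ".pdf"
            else:
                filename = arxiv_id + ".pdf"
            pdf_url = "https://arxiv.org/pdf/" + arxiv_id
            urllib.request.urlretrieve(pdf_url, os.path.join(folder, filename))

The real scripts presumably also persist fetched metadata between runs and skip PDFs that already exist; the sketch omits such bookkeeping.
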
  4. (Optional) Add the downloaded papers to your Mendeley library.

You can write a short review in a note and organize papers with tags, then search your library based on those notes and tags.
