biorxiv-retreiver

biorxiv-retriever is a resilient wrapper around the Biorxiv API. It consists of two main classes: BiorxivDataGenerator and BiorxivRetriever. BiorxivDataGenerator uses resilient HTTP requests to build a dataset of the preprints available in Biorxiv. BiorxivRetriever is an API wrapper that can call any of the services supported by the Biorxiv API.
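
To illustrate what "resilient HTTP requests" can mean in practice, here is a minimal sketch of retrying a failing call with exponential backoff. It is not taken from this repository: the helper name `with_retries` and its parameters are hypothetical, and the actual classes may handle failures differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Hypothetical helper for illustration only. Network-level failures
    (URLError, ConnectionError and timeouts are all OSError subclasses)
    trigger a retry after an exponentially growing delay.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Any zero-argument callable that performs the request can be passed as `fetch`, for example a small function wrapping `urllib.request.urlopen`. The `sleep` parameter exists only so the waiting can be replaced in tests.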

Installing biorxiv-retriever

Clone the repository and set up a Python virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt

Using biorxiv-retriever from the CLI

From the repository root, you can print the CLI help for each command:

    # To use BiorxivRetriever
    python -m src.cli.search.search --help
    # To use BiorxivDataGenerator
    python -m src.cli.create_data.create_data --help

Examples on using BiorxivRetriever

Use the details service of the Biorxiv API to find all papers posted between 1 May 2022 and the current date:

    python -m src.cli.search.search details biorxiv \
            --start_date=2022-05-01

The same query as in the previous example, but with data from Medrxiv:

    python -m src.cli.search.search details medrxiv \
            --start_date=2022-05-01
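
For orientation, the two commands above query the details service, which, as far as the public Biorxiv API documentation describes it, is addressed as `details/[server]/[start]/[end]/[cursor]` and returns results in pages under a `collection` key. The sketch below shows how such a request could be built and paginated. The function names are hypothetical, and the stop condition (an empty page) is an assumption, not necessarily what src.cli.search does internally.

```python
import json
import urllib.request
from datetime import date
from typing import Callable, Iterator, Optional

API_ROOT = "https://api.biorxiv.org"


def details_url(server: str, start_date: str,
                end_date: Optional[str] = None, cursor: int = 0) -> str:
    """Build a details URL; end_date defaults to today, like the CLI example."""
    end_date = end_date or date.today().isoformat()
    return f"{API_ROOT}/details/{server}/{start_date}/{end_date}/{cursor}"


def fetch_json(url: str) -> dict:
    """Download and decode one page of results."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_details(server: str, start_date: str,
                 end_date: Optional[str] = None,
                 fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record, advancing the cursor one page at a time."""
    cursor = 0
    while True:
        page = fetch(details_url(server, start_date, end_date, cursor))
        records = page.get("collection", [])
        if not records:
            # Assumption: an empty page means there is nothing left to read.
            return
        yield from records
        cursor += len(records)
```

Calling `iter_details("medrxiv", "2022-05-01")` would correspond to the second command. The `fetch` argument is injectable so the paging logic can be exercised without network access.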

Search for details of articles by publisher. In this case, the publisher with the DOI prefix 10.15252:

    python -m src.cli.search.search publisher biorxiv \
            --prefix=10.15252 \
            --start_date=2021-05-01

Show the monthly summary of content statistics for Biorxiv:

    python -m src.cli.search.search sum biorxiv \
            --interval=m

Examples on using BiorxivDataGenerator

Get all the available metadata in biorxiv since 4th May 2022 <(-_-)> may the force be with you.

    python -m src.cli.create_data.create_data biorxiv \
          --start_date=2022-05-04 \
          --email=your.email@company.acme

Same as above for Medrxiv.

    python -m src.cli.create_data.create_data medrxiv \
          --start_date=2022-05-04 \
          --email=your.email@company.acme

Retrieve all the metadata available since 4 May 2022, together with the source XML text:

    python -m src.cli.create_data.create_data biorxiv \
          --start_date=2022-05-04 \
          --email=your.email@company.acme \
          --xml=True

Using biorxiv-retriever as a Python module

The functionality of biorxiv-retriever can also be imported as regular Python modules. The last CLI example above can be run from a Python script as follows:

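    from src.dataset_generator import BiorxivDataGenerator

    # Same options as the last CLI example: metadata since 2022-05-04 plus source XML
    data = BiorxivDataGenerator(start_date='2022-05-04',
                                email='your.email@company.acme',
                                xml=True)
    # Calling the instance starts the retrieval
    data()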

If you only want the metadata first and prefer to download the source XML files at a later stage, use the BiorxivDataGenerator.dl_source_xml method. It accepts the path to the JSON file containing the generated metadata and downloads the corresponding source files.
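
To make the two-step idea concrete, the sketch below reads a saved metadata file and collects the XML location of each preprint. It rests on assumptions that this README does not confirm: that the JSON file holds a list of records, and that each record carries `doi` and `jatsxml` fields as in responses from the Biorxiv details service. The function name is hypothetical and this is not the implementation of dl_source_xml.

```python
import json
from pathlib import Path
from typing import Dict


def xml_urls_from_metadata(metadata_path: str) -> Dict[str, str]:
    """Map each DOI to its source XML URL.

    Assumes a JSON list of records with 'doi' and 'jatsxml' keys;
    records missing either field are skipped.
    """
    records = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    return {
        record["doi"]: record["jatsxml"]
        for record in records
        if record.get("doi") and record.get("jatsxml")
    }
```

The resulting mapping could then be downloaded in a separate run, which is the kind of deferred step that dl_source_xml is described as providing.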
