vgenot/BibHelioTech

BibHelioTech

BibHelioTech project description

BibHelioTech is a program that recognises temporal expressions and entities (satellites, instruments, regions) in scientific articles in the field of heliophysics.
It was developed for IRAP (Institut de Recherche en Astrophysique et Planétologie, CNRS) in Toulouse.
Its main purpose is to retrieve this information, which is not currently available in digital form, and to allow its visualisation in AMDA (http://amda.irap.omp.eu/).

Installation guide

STEP 1: install all dependencies
   In your shell, run: pip install -r requirements.txt
   Don't forget to install the SUTime Java dependencies; more details at: https://pypi.org/project/sutime/
   Put the "english.sutime.txt" file in the SUTime install directory, under jars/stanford-corenlp-4.0.0-models.jar/edu/stanford/nlp/models/sutime/

STEP 2: Tesseract 5 installation (Ubuntu example)
   sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
   sudo apt update
   sudo apt install -y tesseract-ocr
   tesseract --version

STEP 3: GROBID installation
   Install GROBID under ../
   Follow the installation instructions at: https://grobid.readthedocs.io/en/latest/Install-Grobid/
   Make sure JVM 8 is the default Java version!
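
Once GROBID is installed and its service has been started, it can be useful to confirm that the server responds before processing any articles. The following is a minimal sketch, not part of BibHelioTech itself: the function name grobid_is_alive is hypothetical, and the base URL assumes a local server on GROBID's default port 8070 (adjust it if your setup differs). It queries the /api/isalive health-check endpoint of the GROBID REST API.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if a GROBID server answers its health-check endpoint.

    base_url is an assumption about the local setup (GROBID's default port).
    """
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            # A running GROBID service replies with the text "true".
            return response.status == 200 and body.lower() == "true"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or malformed URL.
        return False
```

Calling grobid_is_alive() returns False when the server is unreachable, so it can serve as a guard before launching the processing.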

STEP 4: GROBID Python client installation
   Install the GROBID Python client under ../
   Follow the installation instructions at: https://github.com/kermitt2/grobid_client_python
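
After the four steps above, a quick check can report which prerequisites are still missing. This is a rough sketch under stated assumptions, not a script shipped with BibHelioTech: the function name find_missing_tools is hypothetical, the executables "java" and "tesseract" are assumed to be on the PATH, and "sutime" is assumed to be the importable module name of the SUTime package. Add further names (for example the GROBID client module) as needed for your installation.

```python
import importlib.util
import shutil


def find_missing_tools(executables=("java", "tesseract"), modules=("sutime",)):
    """Return the required executables and Python modules that cannot be found."""
    missing = []
    for name in executables:
        # shutil.which returns None when the command is not on the PATH.
        if shutil.which(name) is None:
            missing.append(name)
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

An empty list means every listed executable and module was found. The check does not verify versions, so the Java and Tesseract versions still need to be confirmed separately.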

User guide

Put heliophysics articles in PDF format under BibHelio_Tech/DATA/Papers.
Then run "MAIN.py".

Optionally, if you want AMDA catalogues per satellite,
run "SATS_catalogue_generator.py".
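
The two steps above can be sketched as a small wrapper. This is only an illustration under assumptions: the helper names list_papers and run_pipeline are hypothetical, both scripts are assumed to take no command-line arguments and to be run from the directory that contains them, and the papers path is assumed to resolve from the current working directory.

```python
import subprocess
import sys
from pathlib import Path


def list_papers(papers_dir="BibHelio_Tech/DATA/Papers"):
    """Return the PDF files found directly under papers_dir, sorted by path."""
    folder = Path(papers_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def run_pipeline(papers_dir="BibHelio_Tech/DATA/Papers", with_catalogues=False):
    """Run MAIN.py, then optionally SATS_catalogue_generator.py."""
    if not list_papers(papers_dir):
        raise FileNotFoundError(f"No PDF files found under {papers_dir}")
    # Assumption: the scripts need no arguments and sit in the working directory.
    subprocess.run([sys.executable, "MAIN.py"], check=True)
    if with_catalogues:
        subprocess.run([sys.executable, "SATS_catalogue_generator.py"], check=True)
```

Calling run_pipeline(with_catalogues=True) would run both scripts in sequence and stop with an error if either one fails.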

License

If you use or contribute to BibHelio_Tech, you agree to use it or share your contribution following this license.

Authors

[Axel Dablanc]: axel.alain.dablanc@gmail.com
[Vincent Génot]: vincent.genot@irap.omp.eu
