GROBID extension for identifying and normalizing physical quantities.

lfoppiano/grobid-quantities

grobid-quantities

Work in progress.

The goal of this GROBID module is to recognize any expression of measurement in textual documents (e.g. pressure, temperature, etc.), to parse and normalize it, and finally to convert the measurement into SI units. We focus our work on technical and scientific articles (text, XML and PDF input) and patents (text and XML input).

GROBID Quantity Demo
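
To make the final step concrete, here is a minimal Python sketch of what converting a parsed measurement into SI units involves. It is an illustration only: the `TO_SI` table and the `to_si` function are hypothetical names introduced here, and the module itself does not work this way internally.

```python
# Illustrative only: a small hand-written table, not the module's unit handling.
# Each entry maps a unit symbol to (SI unit, scale, offset), such that
# si_value = value * scale + offset.
TO_SI = {
    "km": ("m", 1000.0, 0.0),
    "cm": ("m", 0.01, 0.0),
    "g": ("kg", 0.001, 0.0),
    "bar": ("Pa", 100000.0, 0.0),
    "°C": ("K", 1.0, 273.15),
}


def to_si(value, unit):
    """Return the measurement as a (value, unit) pair in the matching SI unit."""
    if unit not in TO_SI:
        raise ValueError(f"unknown unit: {unit}")
    si_unit, scale, offset = TO_SI[unit]
    return value * scale + offset, si_unit


if __name__ == "__main__":
    print(to_si(2.5, "km"))   # a length, expressed in metres
    print(to_si(25.0, "°C"))  # a temperature, expressed in kelvin
```

A lookup table like this only covers simple linear conversions; compound units (e.g. km/h) need a proper unit algebra.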

As part of this task we support the recognition of different value representations: numerical, alphabetical, exponential and date/time expressions.

Grobid Quantity Demo
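
As a rough illustration of why these representations need separate handling, the Python sketch below parses three of them (numerical, alphabetical and exponential) into a float. The `NUMBER_WORDS` table, the `POWER_OF_TEN` pattern and the `parse_value` function are hypothetical and rule-based; they are not the module's approach, and date/time expressions are left out for brevity.

```python
import re

# Hypothetical word list for illustration; real coverage would be far larger.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "twenty": 20,
}

# Matches forms such as "3 x 10^5" or "1.2 × 10^-3".
POWER_OF_TEN = re.compile(
    r"^([-+]?\d+(?:\.\d+)?)\s*[x×]\s*10\s*\^\s*([-+]?\d+)$"
)


def parse_value(text):
    """Return (float value, representation label) for a value expression."""
    cleaned = text.strip().lower()
    if cleaned in NUMBER_WORDS:
        return float(NUMBER_WORDS[cleaned]), "alphabetical"
    match = POWER_OF_TEN.match(cleaned)
    if match:
        base = float(match.group(1))
        exponent = int(match.group(2))
        return base * 10.0 ** exponent, "exponential"
    # Falls through to plain numbers; raises ValueError on anything else.
    return float(cleaned), "numerical"


if __name__ == "__main__":
    for sample in ["42.5", "twenty", "3 x 10^5"]:
        print(sample, "->", parse_value(sample))
```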

Finally, we support the identification of the "quantified" substance related to the measure, e.g. silicon nitride powder in the following example:

GROBID Quantity Demo

As with the other GROBID models, the module relies only on machine learning and uses linear-chain CRF. The normalisation of quantities is handled by the Java library Units of measurement.

Online demo

Grobid-quantities can be tested with the online demo running on GPU offered by Huggingface Spaces: https://lfoppiano-grobid-quantities.hf.space/

Latest version

The latest released version of grobid-quantities is 0.8.0. The current development version is 0.8.1-SNAPSHOT. Important: to upgrade please check here.

Documentation

All information on how to set up Grobid-quantities, including how to use the REST API and how to run training and evaluation, is in the documentation here.

Acknowledgement

This project has been created and developed by science-miner since 2015, with additional support by Inria, in Paris (France) and the National Institute for Materials Science, in Tsukuba (Japan).

How to cite

If you want to cite this work, please refer to the GitHub project, optionally with the Software Heritage project-level permanent identifier:

grobid-quantities (2015-2024) <https://github.com/kermitt2/grobid-quantities>, swh:1:dir:dbf9ee55889563779a09b16f9c451165ba62b6d7

Here's a BibTeX entry using the Software Heritage project-level permanent identifier:

@misc{grobid-quantities,
    title = {grobid-quantities},
    howpublished = {\url{https://github.com/kermitt2/grobid-quantities}},
    publisher = {GitHub},
    year = {2015--2024},
    archivePrefix = {swh},
    eprint = {1:dir:dbf9ee55889563779a09b16f9c451165ba62b6d7}
}

License

GROBID and grobid-quantities are distributed under the Apache 2.0 license.

The documentation is distributed under the CC-0 license and the annotated data under the CC-BY license.

If you contribute to grobid-quantities, you agree to share your contribution following these licenses.

Contact: Patrice Lopez (patrice.lopez@science-miner.com), Luca Foppiano (luca@foppiano.org)