update documentation
lfoppiano committed Feb 4, 2024
1 parent e5e4f01 commit 81b2342
Showing 3 changed files with 83 additions and 50 deletions.
34 changes: 2 additions & 32 deletions README.md
@@ -37,39 +37,9 @@ Spaces: https://lfoppiano-grobid-quantities.hf.space/

## Latest version

The latest released version of grobid-quantities
is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is
0.7.4-SNAPSHOT.
The latest released version of grobid-quantities is [0.7.3](https://github.com/kermitt2/grobid-quantities/releases/tag/v0.7.3). The current development version is 0.7.4-SNAPSHOT.
**Important**: to upgrade please check [here](https://grobid-quantities.readthedocs.io/gettingStarted.html#upgrade).

### Update from 0.7.2 to 0.7.3

#### Grobid models
In version 0.7.3 we have updated the DeLFT models. The DL models must be updated by running `./gradlew copyModels`.

#### JDK Update
The version 0.7.3 enable the support for running with JDK > 11. We recommend to run it with JDK 17.
Running grobid-quantities with gradle (`./gradlew clean run`) is already supported in the `build.gradle`.
Running grobid-quantities via the JAR file requires an additional parameter to set the java.path:
- Linux: `-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep`
- Mac (arm): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
- Mac (intel): `-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED`
With `MY_VIRTUAL_ENV` I use `/Users/lfoppiano/anaconda3/envs/jep`


### Update from 0.7.1 to 0.7.2

In version 0.7.2 we have updated the DeLFT models.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.7.0 to 0.7.1

In version 0.7.1 a new version of DeLFT using Tensorflow 2.x is used.
The DL models must be updated by running `./gradlew copyModels`.

### Update from 0.6.0 to 0.7.0

In version 0.7.0 the models have been updated, therefore is required to run a `./gradlew copyModels` to have properly
results especially for what concern the unit normalisation.

## Documentation

Expand Down
49 changes: 36 additions & 13 deletions doc/evaluation-scores.rst
@@ -1,8 +1,34 @@
.. topic:: Evaluation scores

*****************
Evaluation scores
*****************
**********
Evaluation
**********

---------------------
End-to-end evaluation
---------------------

The end-to-end evaluation was performed with the `MeasEval dataset <https://github.com/harperco/MeasEval>`_ (SemEval-2021 Task 8).
The scores in the following table are the micro average.
MeasEval was annotated to allow approximated entities, which are not supported in grobid-quantities.

+---------------------------+----------------+-----------+--------+---------+---------+
| Type (Ref)                | Matching method| Precision | Recall | F1-score| Support |
+===========================+================+===========+========+=========+=========+
| Quantities (QUANT)        | strict         | 53.05     | 54.74  | 53.88   | 1165    |
+---------------------------+----------------+-----------+--------+---------+---------+
| Quantities (QUANT)        | soft           | 64.64     | 66.70  | 65.65   | 1165    |
+---------------------------+----------------+-----------+--------+---------+---------+
| Quantified substance (ME) | strict         | 14.03     | 9.78   | 11.53   | 613     |
+---------------------------+----------------+-----------+--------+---------+---------+
| Quantified substance (ME) | soft           | 21.53     | 15.02  | 17.69   | 613     |
+---------------------------+----------------+-----------+--------+---------+---------+

Note: support for the ME (Measured Entity) type is still experimental in grobid-quantities.
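
As a quick sanity check, the F1-scores in the table above can be recomputed from the reported precision and recall as their harmonic mean. The sketch below is not part of grobid-quantities: the names ``f1_score`` and ``check_rows`` and the tolerance value are assumptions made for illustration. Small differences are expected because the published precision and recall are already rounded to two decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (type, matching method, precision, recall, reported F1), copied from the table above.
ROWS = [
    ("QUANT", "strict", 53.05, 54.74, 53.88),
    ("QUANT", "soft", 64.64, 66.70, 65.65),
    ("ME", "strict", 14.03, 9.78, 11.53),
    ("ME", "soft", 21.53, 15.02, 17.69),
]


def check_rows(rows, tolerance=0.02):
    """Return the rows whose recomputed F1 differs from the reported F1 by more than `tolerance`."""
    return [row for row in rows if abs(f1_score(row[2], row[3]) - row[4]) > tolerance]


if __name__ == "__main__":
    for name, method, precision, recall, reported in ROWS:
        recomputed = f1_score(precision, recall)
        print(f"{name:5s} {method:6s} recomputed={recomputed:.2f} reported={reported:.2f}")
```

An empty result from ``check_rows(ROWS)`` means every reported F1-score is consistent with its precision and recall within the chosen tolerance.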

-------------------------------------------------------
Machine Learning Named Entities Recognition Evaluation
-------------------------------------------------------

The scores (P: Precision, R: Recall, F1: F1-score) for all the models are computed either with 10-fold cross-validation or on a holdout dataset.
The holdout dataset of grobid-quantities is composed of the following examples:
@@ -18,14 +44,14 @@ The models are organised as follow:
- BERT_CRF is a BERT-based model obtained by fine-tuning a SciBERT encoder. As in the other models, the activation function is a CRF layer.


=======================

Results from 27/10/2022
=======================
~~~~~~~~~~~~~~~~~~~~~~~

The evaluation was performed on the holdout dataset from the grobid-quantities dataset.
Average values are computed as micro averages.

----------

Quantities
----------

@@ -79,7 +105,6 @@ Quantities
+------------------+--------------+--------+---------+-------------------------+--------+---------+


-----
Units
-----

@@ -113,7 +138,7 @@ Units were evaluated using UNISCOR dataset. For more information check the secti
| All (micro avg)  | 70.19        | 60.88  | 65.20   | 73.03                   | 65.31  | 68.94   |
+------------------+--------------+--------+---------+-------------------------+--------+---------+

------

Values
------

@@ -150,9 +175,9 @@ Values
| All (micro avg) | 98.90      | 99.17  | 99.03    | 98.86                   | 99.25   | 99.05    |
+-----------------+------------+--------+----------+-------------------------+---------+----------+

================

Previous results
================
~~~~~~~~~~~~~~~~

The scores of this evaluation were obtained using n-fold cross-validation. The metrics are the micro average of n=10 folds.

@@ -163,7 +188,7 @@ Evaluation notes:
- The `CRF` model was evaluated on 30/04/2020.
- The `BidLSTM_CRF_FEATURES` model was evaluated on 28/11/2021.

----------

Quantities
----------

@@ -191,7 +216,6 @@ Quantities
| All (micro avg)     | 88.96      | 85.40  | 87.14    | 87.23                | 89.00  | 88.10    |
+---------------------+------------+--------+----------+----------------------+--------+----------+

-----
Units
-----

@@ -212,7 +236,6 @@ CRF was updated on the 10/02/2021
+------------------+------------+--------+----------+-----------+-------+-----------+


------
Values
------

Expand Down
50 changes: 45 additions & 5 deletions doc/gettingStarted.rst
@@ -7,25 +7,65 @@
.. _latest discussion: https://github.com/kermitt2/grobid/issues/1014



###############
Getting started
===============
###############

Before you start
~~~~~~~~~~~~~~~~
.. warning:: Grobid and grobid-quantities are `not compatible with Windows`_ and limited on Apple M1. While Windows users can easily use Grobid and grobid-quantities through docker containers, the support for grobid on ARM is under development, see the `latest discussion`_.

.. warning:: Since grobid-quantities 0.7.3 (using grobid 0.7.3), we extended support to JDK versions after 11. This requires specifying ``java.library.path`` explicitly. *All these issues are avoided by using Docker containers*.


Upgrade
~~~~~~~

0.7.2 to 0.7.3
==============

Grobid models
-------------

In version 0.7.3, we have updated the DeLFT models. The DL models must be updated by running ``./gradlew copyModels``.

JDK Update
-----------

Version 0.7.3 adds support for running with JDK versions after 11. We recommend running it with JDK 17.
Running grobid-quantities with gradle (``./gradlew clean run``) is already supported in the ``build.gradle``.
Running grobid-quantities via the JAR file requires an additional parameter to set ``java.library.path``:

- Linux: ``-Djava.library.path=../grobid-home/lib/lin-64:../grobid-home/lib/lin-64/jep``
- Mac (arm): ``-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac_arm-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED``
- Mac (intel): ``-Djava.library.path=.:/usr/lib/java:../grobid-home/lib/mac-64:{MY_VIRTUAL_ENV}/jep/lib:{MY_VIRTUAL_ENV}/jep/lib/python3.9/site-packages/jep --add-opens java.base/java.lang=ALL-UNNAMED``

Here ``MY_VIRTUAL_ENV`` is the path to the Python virtual environment, for example ``/Users/lfoppiano/anaconda3/envs/jep``.
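
The platform-specific options listed above can be assembled programmatically, which avoids copy-paste mistakes in the long path strings. The following is a minimal sketch, not a tool shipped with grobid-quantities: the function name ``jvm_options``, the platform keys and the ``{venv}`` placeholder (standing in for ``{MY_VIRTUAL_ENV}``) are assumptions for illustration, while the directory values are copied from the list above.

```python
# Library directories per platform, copied from the list above.
# "{venv}" stands in for the {MY_VIRTUAL_ENV} placeholder (hypothetical naming).
LIBRARY_PATHS = {
    "linux": [
        "../grobid-home/lib/lin-64",
        "../grobid-home/lib/lin-64/jep",
    ],
    "mac_arm": [
        ".",
        "/usr/lib/java",
        "../grobid-home/lib/mac_arm-64",
        "{venv}/jep/lib",
        "{venv}/jep/lib/python3.9/site-packages/jep",
    ],
    "mac_intel": [
        ".",
        "/usr/lib/java",
        "../grobid-home/lib/mac-64",
        "{venv}/jep/lib",
        "{venv}/jep/lib/python3.9/site-packages/jep",
    ],
}

# Extra flag that the list above gives for both Mac variants only.
ADD_OPENS = ["--add-opens", "java.base/java.lang=ALL-UNNAMED"]


def jvm_options(platform: str, venv: str = "") -> list[str]:
    """Build the JVM options for `platform` ("linux", "mac_arm" or "mac_intel")."""
    templates = LIBRARY_PATHS[platform]
    if any("{venv}" in template for template in templates) and not venv:
        raise ValueError(f"platform {platform!r} requires the virtual environment path")
    paths = [template.replace("{venv}", venv) for template in templates]
    options = ["-Djava.library.path=" + ":".join(paths)]
    if platform != "linux":
        options += ADD_OPENS
    return options


if __name__ == "__main__":
    print(" ".join(jvm_options("linux")))
```

JVM options such as these must appear before ``-jar`` on the ``java`` command line; anything placed after the JAR name is passed to the application instead.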

0.7.1 to 0.7.2
==============

In version 0.7.2, we have updated the DeLFT models.
The DL models must be updated by running ``./gradlew copyModels``.

0.7.0 to 0.7.1
==============

Version 0.7.1 uses a new version of DeLFT based on TensorFlow 2.x.
The DL models must be updated by running ``./gradlew copyModels``.

0.6.0 to 0.7.0
==============

In version 0.7.0, the models have been updated; running ``./gradlew copyModels`` is therefore required to obtain correct
results, especially for unit normalization.


Install and build
~~~~~~~~~~~~~~~~~

Docker containers
-----------------
The simplest way to run grobid-quantities is via docker containers.

The Grobid-quantities repository provides a configuration file for docker: `resources/config/config-docker.yml`, which should work out of the box, although we recommend to **check the configuration** (e.g., to enable modules using deep learning).
The grobid-quantities repository provides a configuration file for docker: ``resources/config/config-docker.yml``, which should work out of the box, although we recommend **checking the configuration** (e.g., to enable modules using deep learning).

To run the container use:
::
Expand Down
