Fix some tooltip markup
rsgoncalves committed Jun 5, 2024
1 parent ae99a2a commit c6384b4
Showing 1 changed file with 12 additions and 16 deletions.
28 changes: 12 additions & 16 deletions README.md
@@ -159,12 +159,9 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
`target_ontology`—Path, URL or name of 'target' ontology to map the source terms to

> [!TIP]
-> Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names.
->
-> Similarly, when the target ontology has been cached, enter the name used upon caching.
+> Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names. Similarly, when the target ontology has been cached, enter the name used upon caching.
-> [!NOTE]
-> When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.
+When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.

`base_iris`—Map only to ontology terms whose IRIs start with one of the strings given in this tuple

@@ -190,7 +187,7 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp

`use_cache`—Use the cache for the ontology

-`term_type`—Specifies whether to map to ontology classes, properties or any of the two. Possible values are ['class', 'property', 'any']
+`term_type`—Specifies whether to map to ontology classes, properties or both. Possible values are `class, property, any`

`incl_unmapped`—Include unmapped terms in the output. If a term has been tagged 'Ignore' or has less than the `min_score`, it is included in the output data frame

@@ -223,9 +220,11 @@ text2term.clear_cache(ontology_acronym='')
If no arguments are specified, the entire cache will be cleared. Otherwise, only the ontology with the given acronym will be cleared.
Finally, `cache_exists(ontology_acronym='')` is a simple function that returns `True` if the given acronym exists in the cache, and `False` otherwise.

-**_Notes_**
-- The `cache_ontology` function returns an object that can be used to directly call the `map_terms` function, as well as `clear_cache` and `cache_exists`. These have the same arguments, except `ontology_target` is no longer specified and there is no `use_cache` option, since it is always True.
-- While ontology URLs can be repeatedly used, acronyms must be distinct in a given environment.
+> [!TIP]
+> The `cache_ontology` function returns an object that can be used to directly call the `map_terms` function, as well as `clear_cache` and `cache_exists`. These have the same arguments, except `ontology_target` is no longer specified and there is no `use_cache` option, since it is always True.
+> [!NOTE]
+> While ontology URLs can be repeatedly used, acronyms must be distinct in a given environment.
</details>

@@ -286,7 +285,7 @@ To display a help message with descriptions of tool arguments do:

`-csv CSV_INPUT` Indicates a CSV format input—follow with the name of the column containing terms to map, optionally followed by the name of the column containing identifiers for the terms (eg 'my terms,my term ids').

-`-sep SEPARATOR`, Specifies the cell separator to be used when reading a table
+`-sep SEPARATOR` Specifies the cell separator to be used when reading a table

`-top TOP_MAPPINGS` Maximum number of top-ranked mappings returned per source term.

@@ -309,22 +308,19 @@ To display a help message with descriptions of tool arguments do:

## Supported Mappers

-The mapping score associated with each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":
+The mapping score of each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":

**TF-IDF-based mapper**&mdash;[TF-IDF](https://en.wikipedia.org/wiki/Tf–idf) is a statistical measure often used in information retrieval that measures how important a word is to a document in a corpus of documents. We first generate TF-IDF-based vectors of the source terms and of labels and synonyms of ontology terms. Then we compute the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors to determine how similar a source term is to a target term (label or synonym).
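
The TF-IDF and cosine similarity steps described above can be illustrated with a minimal, standard-library-only sketch. This is not text2term's implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names (`fit_idf`, `tfidf_vector`, `rank_labels`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every token seen."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Build a sparse TF-IDF vector (dict); tokens unseen during fitting are dropped."""
    counts = Counter(tokenize(text))
    return {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[tok] for tok, weight in a.items() if tok in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_labels(source_term, labels):
    """Score one source term against candidate labels, best match first."""
    idf = fit_idf(labels)
    source_vec = tfidf_vector(source_term, idf)
    scored = [(label, cosine_similarity(source_vec, tfidf_vector(label, idf)))
              for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical labels, standing in for ontology term labels and synonyms.
    for label, score in rank_labels("asthma", ["asthma", "childhood asthma", "heart disease"]):
        print(f"{label}: {score:.3f}")
```

In this toy setup an exact label match scores 1, a label sharing some tokens scores between 0 and 1, and a label with no shared tokens scores 0.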

**BioPortal Web API-based mapper**&mdash;uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository.

-> [!IMPORTANT]
-> Make sure to specify the target ontology name(s) as they appear in BioPortal
> [!WARNING]
-> there are no confidence scores associated with BioPortal annotations, so we decided to set the mapping score of all mappings to 1
+> There are no scores associated with BioPortal annotations, so the score of all mappings is always 1
**Zooma Web API-based mapper**&mdash;uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository.

> [!IMPORTANT]
-> Make sure to specify the target ontology name(s) as they appear in OLS
+> When using the BioPortal or Zooma interfaces, make sure to specify the target ontology name(s) as they appear in BioPortal or OLS, respectively
**Syntactic distance-based mappers**&mdash;text2term provides support for commonly used and popular syntactic (edit) distance metrics: Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances and [rapidfuzz](https://pypi.org/project/rapidfuzz/) to compute all others.
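
To make the idea of an edit distance concrete, here is a small pure-Python sketch of the Levenshtein distance and one common way to turn it into a 0-to-1 similarity. It is illustrative only: text2term delegates this work to rapidfuzz, and the normalization by the longer string's length shown here is an assumption, not a statement about how the library scores mappings.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Map the distance to [0, 1], where 1.0 means identical strings (assumed scheme)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(normalized_similarity("asthma", "asthma"))
```

Jaro, Jaro-Winkler, Jaccard and Indel follow the same pattern of computing a string-level distance and converting it into a similarity, but each defines the distance differently.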
