From c6384b4eaa95371fde61a4a5e9b89e1477bd151f Mon Sep 17 00:00:00 2001
From: Rafael Goncalves
Date: Wed, 5 Jun 2024 12:43:31 -0400
Subject: [PATCH] Fix some tooltip markup

---
 README.md | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 3f273dc..bd850d4 100644
--- a/README.md
+++ b/README.md
@@ -159,12 +159,9 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
 `target_ontology`—Path, URL or name of 'target' ontology to map the source terms to
 
 > [!TIP]
-> Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names.
->
-> Similarly, when the target ontology has been cached, enter the name used upon caching.
+> Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names. Similarly, when the target ontology has been cached, enter the name used upon caching.
 
-> [!NOTE]
-> When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.
+When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.
 
 `base_iris`—Map only to ontology terms whose IRIs start with one of the strings given in this tuple
 
@@ -190,7 +187,7 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
 
 `use_cache`—Use the cache for the ontology
 
-`term_type`—Specifies whether to map to ontology classes, properties or any of the two. Possible values are ['class', 'property', 'any']
+`term_type`—Specifies whether to map to ontology classes, properties or both. Possible values are `class, property, any`
 
 `incl_unmapped`—Include unmapped terms in the output.
 If a term has been tagged 'Ignore' or has less than the `min_score`, it is included in the output data frame
@@ -223,9 +220,11 @@ text2term.clear_cache(ontology_acronym='')
 If no arguments are specified, the entire cache will be cleared. Otherwise, only the ontology with the given acronym will be cleared.
 Finally, `cache_exists(ontology_acronym='')` is a simple function that returns `True` if the given acronym exists in the cache, and `False` otherwise.
 
-**_Notes_**
-- The `cache_ontology` function returns an object that can be used to directly call the `map_terms` function, as well as `clear_cache` and `cache_exists`. These have the same arguments, except `ontology_target` is no longer specified and there is no `use_cache` option, since it is always True.
-- While ontology URLs can be repeatedly used, acronyms must be distinct in a given environment.
+> [!TIP]
+> The `cache_ontology` function returns an object that can be used to directly call the `map_terms` function, as well as `clear_cache` and `cache_exists`. These have the same arguments, except `ontology_target` is no longer specified and there is no `use_cache` option, since it is always True.
+
+> [!NOTE]
+> While ontology URLs can be repeatedly used, acronyms must be distinct in a given environment.
 
 
 
@@ -286,7 +285,7 @@ To display a help message with descriptions of tool arguments do:
 
 `-csv CSV_INPUT` Indicates a CSV format input—follow with the name of the column containing terms to map, optionally followed by the name of the column containing identifiers for the terms (eg 'my terms,my term ids').
 
-`-sep SEPARATOR`, Specifies the cell separator to be used when reading a table
+`-sep SEPARATOR` Specifies the cell separator to be used when reading a table
 
 `-top TOP_MAPPINGS` Maximum number of top-ranked mappings returned per source term.
 
@@ -309,22 +308,19 @@ To display a help message with descriptions of tool arguments do:
 
 ## Supported Mappers
 
-The mapping score associated with each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":
+The mapping score of each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":
 
 **TF-IDF-based mapper**—[TF-IDF](https://en.wikipedia.org/wiki/Tf–idf) is a statistical measure often used in information retrieval that measures how important a word is to a document in a corpus of documents. We first generate TF-IDF-based vectors of the source terms and of labels and synonyms of ontology terms. Then we compute the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors to determine how similar a source term is to a target term (label or synonym).
 
 **BioPortal Web API-based mapper**—uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository.
 
-> [!IMPORTANT]
-> Make sure to specify the target ontology name(s) as they appear in BioPortal
-
 > [!WARNING]
-> there are no confidence scores associated with BioPortal annotations, so we decided to set the mapping score of all mappings to 1
+> There are no scores associated with BioPortal annotations, so the score of all mappings is always 1
 
 **Zooma Web API-based mapper**—uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository.
 
 > [!IMPORTANT]
-> Make sure to specify the target ontology name(s) as they appear in OLS
+> When using the BioPortal or Zooma interfaces, make sure to specify the target ontology name(s) as they appear in BioPortal or OLS, respectively
 
 **Syntactic distance-based mappers**—text2term provides support for commonly used and popular syntactic (edit) distance metrics: Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances and [rapidfuzz](https://pypi.org/project/rapidfuzz/) to compute all others.