[out there] processing taxon identifiers via NLP #44

gottacatchenall · 2021-08-21T20:46:25Z

A "moonshot" idea I had for this library would be implementing rudimentary natural-language-processing (NLP) methods for processing taxon identifiers.

As an example, if the input contains ["A. p. aciculatus", "ponderosa pine", "Agelaius phoeniceus", "A. phoeniceus californicus", "red winged blackbird", "Agelaius xanthomus", "Pinus ponderosa", "P. ponderosa"], we would want a cleaning function to return the NCBI ids associated with the coarsest-resolution identifier for each group of related names, e.g. ["Agelaius", "Pinus ponderosa"].
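A rough Python sketch of the "coarsest resolution" step, to make the expected output concrete. The `LINEAGES` table is hand-written and hypothetical: it assumes abbreviations and vernacular names have already been resolved to a (genus, species, subspecies) lineage, which is the hard NLP part and is not attempted here. `common_prefix` and `coarsest_labels` are illustrative names, not part of any existing library.

```python
from collections import defaultdict

# Hypothetical, hand-written lookup: input string -> (genus, species, subspecies).
# A real implementation would resolve these against the NCBI taxonomy instead.
LINEAGES = {
    "A. p. aciculatus": ("Agelaius", "phoeniceus", "aciculatus"),
    "ponderosa pine": ("Pinus", "ponderosa"),
    "Agelaius phoeniceus": ("Agelaius", "phoeniceus"),
    "A. phoeniceus californicus": ("Agelaius", "phoeniceus", "californicus"),
    "red winged blackbird": ("Agelaius", "phoeniceus"),
    "Agelaius xanthomus": ("Agelaius", "xanthomus"),
    "Pinus ponderosa": ("Pinus", "ponderosa"),
    "P. ponderosa": ("Pinus", "ponderosa"),
}


def common_prefix(lineages):
    """Return the ranks shared by every lineage, starting from the genus."""
    prefix = []
    for ranks in zip(*lineages):  # zip stops at the shortest lineage
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def coarsest_labels(names):
    """Group names by genus and keep the deepest rank they all agree on."""
    by_genus = defaultdict(list)
    for name in names:
        lineage = LINEAGES[name]
        by_genus[lineage[0]].append(lineage)
    return [" ".join(common_prefix(group)) for _, group in sorted(by_genus.items())]


names = list(LINEAGES)
print(coarsest_labels(names))  # → ['Agelaius', 'Pinus ponderosa']
```

Grouping by genus is a simplifying assumption; names from different genera in the same family would need a deeper lineage to collapse further.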

Clearly, a false positive here could be analysis-breaking, so reporting some degree of confidence in
each resolved species label would also be necessary.
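One possible shape for the confidence reporting, sketched in Python with the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure. The `REFERENCE` list, the `resolve` function, and the 0.8 threshold are all assumptions made for illustration; a real version would match against the NCBI taxonomy and would probably want a better-calibrated score.

```python
from difflib import SequenceMatcher

# Hypothetical reference names; a real version would query the NCBI taxonomy.
REFERENCE = ["Agelaius phoeniceus", "Agelaius xanthomus", "Pinus ponderosa"]


def resolve(label, reference=REFERENCE, threshold=0.8):
    """Return (best_match, score); best_match is None when score < threshold."""
    scored = [
        (SequenceMatcher(None, label.lower(), ref.lower()).ratio(), ref)
        for ref in reference
    ]
    score, best = max(scored)
    return (best if score >= threshold else None, score)


# A misspelled scientific name still resolves, with a score below 1.0.
print(resolve("Agelaius phoenicius"))

# A vernacular name is rejected: plain string similarity cannot link it
# to a scientific name, so the caller sees None plus a low score.
print(resolve("red winged blackbird"))
```

Returning the score alongside the match lets the caller decide how strict to be, and returning `None` below the threshold makes an unresolved label explicit rather than silently guessing.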

Just something to ruminate on


gottacatchenall · 2022-02-01T21:17:13Z

https://github.com/nleguillarme/taxonerd could be relevant

gottacatchenall added the enhancement and need-triage labels Aug 21, 2021

gottacatchenall assigned tpoisot and unassigned tpoisot Aug 21, 2021

tpoisot closed this as completed Feb 21, 2023
