[out there] processing taxon identifiers via NLP #44

Closed
gottacatchenall opened this issue Aug 21, 2021 · 1 comment
Labels
enhancement New feature or request need-triage

Comments

@gottacatchenall
Member

gottacatchenall commented Aug 21, 2021

A "moonshot" idea I had for this library would be implementing rudimentary natural-language-processing (NLP) methods for processing taxon identifiers.

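As an example, if the input contains ["A. p. aciculatus", "ponderosa pine", "Agelaius phoeniceus", "A. phoeniceus californicus", "red winged blackbird", "Agelaius xanthomus", "Pinus ponderosa", "P. ponderosa"], we would want a cleaning function to return the NCBI ids at the coarsest resolution shared by each group of names, e.g. ["Agelaius", "Pinus ponderosa"]. The blackbird names span two species (A. phoeniceus and A. xanthomus), so they only agree at the genus level, while the pine names all refer to the same species.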

Clearly a false positive here could be analysis-breaking, so reporting some degree of confidence in each resolved species label would also be necessary.
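
To make the idea a bit more concrete, here is a rough Python sketch of what such a cleaning step might look like. Everything in it is an assumption for illustration only: the `REFERENCE` table is a hypothetical stand-in for a real NCBI lookup, the function names (`resolve`, `clean`, etc.) are made up, names are grouped by genus, and plain string similarity from `difflib` stands in for actual NLP. The confidence score is just the similarity ratio (or 1 divided by the number of candidates when an abbreviation is ambiguous), not a calibrated probability.

```python
from difflib import SequenceMatcher

# Hypothetical toy reference: lower-cased name -> lineage (genus, species[, subspecies]).
# A real version would query the NCBI taxonomy instead of a hard-coded dict.
REFERENCE = {
    "agelaius phoeniceus": ("Agelaius", "phoeniceus"),
    "agelaius phoeniceus aciculatus": ("Agelaius", "phoeniceus", "aciculatus"),
    "agelaius phoeniceus californicus": ("Agelaius", "phoeniceus", "californicus"),
    "agelaius xanthomus": ("Agelaius", "xanthomus"),
    "red-winged blackbird": ("Agelaius", "phoeniceus"),
    "pinus ponderosa": ("Pinus", "ponderosa"),
    "ponderosa pine": ("Pinus", "ponderosa"),
}


def common_prefix(lineages):
    """Longest shared leading part of several lineages (the coarsest common rank)."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def matches_abbreviation(query, candidate):
    """True if every query token equals the candidate token or is a dotted prefix of it ("a.")."""
    q, c = query.split(), candidate.split()
    return len(q) == len(c) and all(
        ct.startswith(qt[:-1]) if qt.endswith(".") else qt == ct
        for qt, ct in zip(q, c)
    )


def resolve(name):
    """Return (lineage, confidence) for a single raw identifier."""
    query = name.lower().strip()
    hits = [key for key in REFERENCE if matches_abbreviation(query, key)]
    if hits:
        # Ambiguous abbreviations fall back to what all candidates share.
        return common_prefix([REFERENCE[key] for key in hits]), 1.0 / len(hits)
    # No exact or abbreviated match: use the most similar known name.
    best = max(REFERENCE, key=lambda key: SequenceMatcher(None, query, key).ratio())
    return REFERENCE[best], SequenceMatcher(None, query, best).ratio()


def clean(names, min_confidence=0.8):
    """Map each genus-level group to its coarsest shared name and its lowest confidence."""
    groups = {}
    for name in names:
        lineage, confidence = resolve(name)
        if lineage and confidence >= min_confidence:
            groups.setdefault(lineage[0], []).append((lineage, confidence))
    return {
        " ".join(common_prefix([lin for lin, _ in members])): min(c for _, c in members)
        for members in groups.values()
    }


names = ["A. p. aciculatus", "ponderosa pine", "Agelaius phoeniceus",
         "A. phoeniceus californicus", "red winged blackbird",
         "Agelaius xanthomus", "Pinus ponderosa", "P. ponderosa"]
print(clean(names))
```

On the example input this returns the two labels "Agelaius" and "Pinus ponderosa", with the Agelaius group scoring slightly below 1 because "red winged blackbird" only matches the reference name approximately. Taking the minimum confidence per group is a deliberately conservative choice; anything below `min_confidence` is dropped rather than guessed at.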

Just something to ruminate on

@gottacatchenall gottacatchenall added enhancement New feature or request need-triage labels Aug 21, 2021
@gottacatchenall gottacatchenall assigned tpoisot and unassigned tpoisot Aug 21, 2021
@gottacatchenall
Member Author

https://github.com/nleguillarme/taxonerd could be relevant

@tpoisot tpoisot closed this as completed Feb 21, 2023