Feedback from Bob Mesibov #245

Closed
dimus opened this issue Sep 12, 2023 · 7 comments

Comments

@dimus
Member

dimus commented Sep 12, 2023

(1) One problem is that gnparser adds quotes when I use the TSV output option. Originals in the Naturalis Mollusca list, followed by the gnparser output:

"""Glyptothauma"" cf ankasana" | """""""Glyptothauma"""" cf ankasana"""
"""Glyptothauma"" cf. ankasana" | """""""Glyptothauma"""" cf. ankasana"""
"""Glyptothauma"" cf. ankasana de Winter, 1996" | """""""Glyptothauma"""" cf. ankasana de Winter, 1996"""
"""Glyptothauma"" sp. 2" | """""""Glyptothauma"""" sp. 2"""
"Sepietta oweniana (D""Orbigny, 1839-1841)" | """Sepietta oweniana (D""""Orbigny, 1839-1841)"""
"Sepiola atlantica D""Orbigny, 1839-1842" | """Sepiola atlantica D""""Orbigny, 1839-1842"""
"""Triphora"" osclausum Rolán & Fernández-Garcés, 1995" | """""""Triphora"""" osclausum Rolán & Fernández-Garcés, 1995"""
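The pattern in these pairs is consistent with a field that was already CSV-quoted being treated as literal text and quoted a second time. This is only a sketch using Python's standard `csv` module to reproduce the doubling; it is an assumption about the cause, not a description of gnparser's actual code, and `csv_quote` is a hypothetical helper name.

```python
import csv
import io

def csv_quote(field: str) -> str:
    """Encode one field with CSV quoting rules (embedded quotes are doubled)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow([field])
    return buf.getvalue().rstrip("\n")

raw = '"Glyptothauma" cf ankasana'  # the name as a person would read it
once = csv_quote(raw)               # the form stored in the source list
twice = csv_quote(once)             # quoted form taken as literal text and quoted again

print(once)   # """Glyptothauma"" cf ankasana"
print(twice)  # """""""Glyptothauma"""" cf ankasana"""
```

If that is what happens, decoding the input field once before parsing (or writing TSV without re-quoting) would keep the original form.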

(2) Another issue is that "D'Orbigny" in the original is "D’Orbigny" in the gnparser output. Why change UTF-8 27 to e2 80 99?
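For reference, the two byte sequences are the UTF-8 encodings of U+0027 (APOSTROPHE) and U+2019 (RIGHT SINGLE QUOTATION MARK). The short sketch below shows the bytes and one possible way to compare names while ignoring the apostrophe style; `same_ignoring_apostrophe_style` is a hypothetical helper, not part of gnparser.

```python
apostrophe = "'"   # U+0027 APOSTROPHE
curly = "\u2019"   # U+2019 RIGHT SINGLE QUOTATION MARK

print(apostrophe.encode("utf-8").hex(" "))  # 27
print(curly.encode("utf-8").hex(" "))       # e2 80 99

def same_ignoring_apostrophe_style(a: str, b: str) -> bool:
    """Compare two strings after mapping U+2019 back to U+0027."""
    return a.replace("\u2019", "'") == b.replace("\u2019", "'")

print(same_ignoring_apostrophe_style("D'Orbigny", "D\u2019Orbigny"))  # True
```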

(3) regex says reject, gnparser says OK (regex_yes_gnparser_no file)

Please see the attached file. A lot of these end with "cf/CF" or "ms/MS".

(4) regex says OK, gnparser rejects (regex_OK_gnparser_no file)

Please see the attached file. It looks like gnparser doesn't like "Genus (Subgenus)", which I would have thought OK, and worries about "Author in Author, Year". Note also that the Dutch-persons at Naturalis have used "Von dem Busch" rather than "von dem Busch".

regex_OK_gnparser_no.txt
regex_yes_gnparser_OK.txt

@dimus
Member Author

dimus commented Sep 13, 2023

Hm, Genus (Subgenus) should work according to these tests:

https://github.com/gnames/gnparser/blob/master/testdata/test_data.md#combination-of-two-uninomials


Name: Aaleniella (Danocythere)

Canonical: Aaleniella subgen. Danocythere


Name: Cordia (Adans.) Kuntze sect. Salimori

Canonical: Cordia sect. Salimori


Name: Calathus (Lindrothius) KURNAKOV 1961

Canonical: Calathus subgen. Lindrothius


Can you add examples that show your cases?

@dimus
Member Author

dimus commented Sep 13, 2023

Can you please show examples of the worries about "Author in Author, Year"?

@dimus
Member Author

dimus commented Sep 13, 2023

Looks like I need to add "dem" as an author word: Von dem Busch. I'll check whether "dem" ever occurs as a specific epithet.

@Mesibov

Mesibov commented Sep 15, 2023

@dimus, sorry, I wasn't paying attention to this issue. The "Genus (Subgenus)" and "Author in Author, Year" cases I was thinking of can be found in https://github.com/gnames/gnames/files/12587991/regex_OK_gnparser_no.txt. Both forms throw up a quality rating of 2.

Please also note that in "Eutrochatella babei (Arango y Molina, 1876)", the "y" is part of the author's surname, so the quality 2 indicator "Spanish 'y' is used instead of '&'" does not apply.

@dimus
Member Author

dimus commented Sep 20, 2023

Thank you, @Mesibov, for the explanation. I do think that "y" should decrease the quality, because there are many other languages that people could use for the word "and", and allowing them all would create a mess. So I decided to limit the "and" words to "and" and "&". I personally would prefer "et" though :)

I am not sure what to do if "y" is part of the author's name. I guess I need to add exceptions and hardcode such authors into gnparser.
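One way the exception idea could look, as a rough sketch only: keep a list of known compound surnames and flag "y" as a separator only when it appears outside them. The names `COMPOUND_SURNAMES` and `y_is_separator` are hypothetical, and this is not how gnparser is implemented.

```python
import re

# Hypothetical exception list: surnames in which "y" is part of the name.
COMPOUND_SURNAMES = {"Arango y Molina"}

def y_is_separator(authorship: str) -> bool:
    """Return True if a standalone "y" appears outside known compound surnames."""
    remaining = authorship
    for surname in COMPOUND_SURNAMES:
        remaining = remaining.replace(surname, "")
    return re.search(r"\by\b", remaining) is not None

print(y_is_separator("Arango y Molina, 1876"))  # False
print(y_is_separator("Smith y Jones, 1900"))    # True
```

The obvious cost of this design is that the list has to be maintained by hand as new compound surnames turn up.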

Added #251

@dimus
Member Author

dimus commented Sep 20, 2023

For "Genus (Subgenus)" and "Author in Author", the quality was decreased after a discussion with Paddy Patterson about these two issues. For botanical names "Author in Author" is actually valid, so I am on the fence about it. For "Genus (Subgenus)" I can double-check with the ICZN folks.

dimus added a commit that referenced this issue Sep 25, 2023
@dimus dimus closed this as completed in 190627b Sep 26, 2023
@dimus dimus reopened this Sep 26, 2023
@dimus
Member Author

dimus commented Sep 26, 2023

I tried to address most of the problems in v1.7.5.
