Epithets starting with non are not parsed correctly #211
Hold on. There is something odd here. Hyacinthoides non-scripta was reported as one of these cases, but the current version of the online parser (v1.5.5) already resolves it correctly (quality 1). However, the other names @tobymarsden mentions now get quality 4 (unparsed tails).
For these specific names I guess we need a look-ahead for '-'. There is a broader situation where names like "Aus bus (non Linnaeus)" would benefit from a properly parsed "non", but that can be addressed in a separate issue.
@dimus, considering the absence of lookarounds in Go's regexp package, this is ugly but appears to work:
Have I missed anything?
Yes, let's try it this way. It looks like lookahead is not included for performance reasons.
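Go's regexp package uses RE2 syntax, which has no lookahead, so a pattern such as `non(?!-)` fails to compile. A minimal sketch of the usual workaround follows, assuming only a yes/no answer is needed: instead of asserting what follows "non", the pattern consumes one following character that is neither a hyphen nor a word character, or requires end of input. The names stopNon and hasStopNon are hypothetical; this is not gnparser's actual grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// stopNon is a hypothetical emulation of the negative lookahead `\bnon(?!-)`.
// RE2 cannot assert "not followed by '-'", so the pattern instead consumes
// one character that is neither '-' nor a word character, or matches at the
// end of input. Note that \b and \w are ASCII-only in RE2.
var stopNon = regexp.MustCompile(`\bnon(?:[^-\w]|$)`)

// hasStopNon reports whether name contains "non" as a standalone word that
// is not the start of a hyphenated epithet such as "non-scripta".
func hasStopNon(name string) bool {
	return stopNon.MatchString(name)
}

func main() {
	for _, name := range []string{
		"Hyacinthoides non-scripta",
		"Xiphipops fisheri (non Snyder, 1904)",
		"Aus bus (non Linnaeus)",
	} {
		fmt.Printf("%-40s stopword=%v\n", name, hasStopNon(name))
	}
}
```

Because the emulated lookahead consumes the following character, the reported match span is one character longer than "non"; that is harmless for a boolean check but has to be trimmed if match positions are used to cut the name.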
Currently, names such as Hyacinthoides non-scripta have to be special-cased because non is a stopword. There are also a number of these names that are not currently handled:
The most conservative way of handling this would be to change the non stopword into non\s. This would retain the current behavior for inputs such as Xiphipops fisheri (non Snyder, 1904) but allow epithets starting with non- to be parsed.
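To illustrate the effect of the proposed non\s stopword, here is a minimal sketch that cuts a name at the stopword and returns the head and the leftover tail. It assumes a regex-based stopword check, which is a simplification: gnparser's real stopword handling lives in its grammar, and stopwordNon and splitAtStopword are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stopwordNon is a hypothetical stand-in for the proposed `non\s` stopword:
// "non" must begin a word (start of input or whitespace, optionally followed
// by "(") and must be followed by whitespace, so "non-scripta" is not cut.
var stopwordNon = regexp.MustCompile(`(?:^|\s)\(?non\s`)

// splitAtStopword returns the part of name before the stopword and the
// remaining tail; the tail is empty when no stopword is found.
func splitAtStopword(name string) (head, tail string) {
	loc := stopwordNon.FindStringIndex(name)
	if loc == nil {
		return name, ""
	}
	return strings.TrimSpace(name[:loc[0]]), strings.TrimSpace(name[loc[0]:])
}

func main() {
	for _, name := range []string{
		"Hyacinthoides non-scripta",
		"Xiphipops fisheri (non Snyder, 1904)",
	} {
		head, tail := splitAtStopword(name)
		fmt.Printf("head=%q tail=%q\n", head, tail)
	}
}
```

Under this sketch, Hyacinthoides non-scripta passes through whole, while Xiphipops fisheri (non Snyder, 1904) is split into the head "Xiphipops fisheri" and the tail "(non Snyder, 1904)", which mirrors the current behavior the proposal aims to keep.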