Why ’ is converted to '? #247

Closed
dimus opened this issue Sep 12, 2023 · 2 comments

Comments

@dimus
Member

dimus commented Sep 12, 2023

From #245

Another issue is that "D'Orbigny" in the original is "D’Orbigny" in the gnparser output. Why change the UTF-8 byte 27 to the byte sequence e2 80 99?
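For reference, the two characters differ at the byte level: the ASCII apostrophe (U+0027) is the single UTF-8 byte 27, while the right single quotation mark (U+2019) is the three-byte sequence e2 80 99. A quick Python check of those encodings:

```python
# ASCII apostrophe (U+0027) vs. right single quotation mark (U+2019)
ascii_apostrophe = "'"
curly_apostrophe = "\u2019"

# Encode each character to UTF-8 and keep the raw bytes
ascii_bytes = ascii_apostrophe.encode("utf-8")
curly_bytes = curly_apostrophe.encode("utf-8")

# bytes.hex(sep) prints the bytes as space-separated hex pairs
print(ascii_bytes.hex(" "))  # 27
print(curly_bytes.hex(" "))  # e2 80 99
```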

@dimus
Member Author

dimus commented Sep 12, 2023

I do try to normalize/simplify characters if it does not change semantic meaning. My impression is that ' and ’ are used interchangeably for authors in scientific names, and I picked ' because it is ASCII, meaning it will generate fewer problems for people with weird default encodings.
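A minimal sketch of this kind of normalization, assuming a simple character-to-character mapping. This is not gnparser's actual implementation, and the exact set of apostrophe-like characters folded here (U+2019, U+2018, U+02BC) is an assumption for illustration:

```python
# Hypothetical mapping of typographic apostrophes to the ASCII apostrophe (U+0027).
# Which characters gnparser really folds is an assumption here.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
}
_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(author: str) -> str:
    """Replace typographic apostrophes with the ASCII apostrophe."""
    return author.translate(_TABLE)


print(normalize_apostrophes("D\u2019Orbigny"))  # D'Orbigny
```

Because the mapping only touches apostrophe variants, an author string that already uses ' passes through unchanged.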

The original spelling of the authorship is preserved in the JSON output, in the "verbatim" field:

"authorship": {
  "verbatim": "B.D’Orbigny",
  "normalized": "B. D' Orbigny",
  "authors": [
    "B. D' Orbigny"
  ],
  "originalAuth": {
    "authors": [
      "B. D' Orbigny"
    ]
  }
},

It might make sense to keep the verbatim authorship in the CSV/TSV output as well; let me think about it a bit.
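One way that could look, sketched under assumptions: it reads the "authorship" object from the JSON output shown above and writes a TSV row carrying both the normalized and the verbatim spelling. The column names "Authorship" and "VerbatimAuthorship" are hypothetical and are not gnparser's actual CSV/TSV format:

```python
import csv
import io
import json

# The authorship object from the JSON output above, wrapped in a parent object.
raw = """
{
  "authorship": {
    "verbatim": "B.D\u2019Orbigny",
    "normalized": "B. D' Orbigny"
  }
}
"""
authorship = json.loads(raw)["authorship"]

# Write a header plus one TSV row with both spellings.
# Column names are hypothetical, for illustration only.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["Authorship", "VerbatimAuthorship"])
writer.writerow([authorship["normalized"], authorship["verbatim"]])
print(buf.getvalue())
```

Keeping the verbatim column alongside the normalized one would let users see exactly which apostrophe the source used without giving up the ASCII-friendly form.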

dimus changed the title Why "’" is converted to "'"? to Why ’ is converted to '? Sep 12, 2023
dimus transferred this issue from gnames/gnames Sep 12, 2023
@Mesibov

Mesibov commented Sep 15, 2023

@dimus, I've rechecked the original dataset and found that the compilers used both characters:
3 records Acteocina candei (D’Orbigny, 1841)
37 records Acteocina candei (D'Orbigny, 1842)

gnparser converted both to an apostrophe in Author, which is OK. I was looking at "D’Orbigny" in the verbatim field and thinking I had input "D'Orbigny", so that was my mistake; all is well. In my pseudo-duplicate search the results are fine:

Acteocina candei (D’Orbigny, 1841) [3]
Acteocina candei (D'Orbigny, 1842) [37]
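A minimal sketch of why these stay as two groups, assuming a pseudo-duplicate key built from the full name string with the typographic apostrophe folded to ASCII (a hypothetical keying rule, not the actual search used here). Folding makes the author spellings identical, but the differing years (1841 vs 1842) still keep the records apart. Unlike the listing above, the printed keys show the folded apostrophe:

```python
from collections import Counter

# Record counts as reported above: 3 with U+2019, 37 with ASCII apostrophe.
records = (
    ["Acteocina candei (D\u2019Orbigny, 1841)"] * 3
    + ["Acteocina candei (D'Orbigny, 1842)"] * 37
)


def fold_apostrophes(s: str) -> str:
    # Hypothetical key normalization: treat U+2019 as the ASCII apostrophe.
    return s.replace("\u2019", "'")


# Group by the folded string; only the years still differ.
groups = Counter(fold_apostrophes(r) for r in records)
for name, count in sorted(groups.items()):
    print(f"{name} [{count}]")
```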

dimus closed this as completed Sep 20, 2023