Support graft-chimeras #194

Merged
merged 2 commits into from
Oct 22, 2021


tobymarsden

I'm trying to get gnparser to parse all names in Kew's Plants of the World Online.

I bumped into a parsing failure when dealing with graft-chimeras, e.g.

+ Crataegomespilus
Cytisus purpureus + Laburnum anagyroides
Crataegus + Mespilus

This PR parses these names successfully without any impact on existing test cases, e.g.

{
  "parsed": true,
  "quality": 2,
  "qualityWarnings": [
    {
      "quality": 2,
      "warning": "Named graft-chimera"
    }
  ],
  "verbatim": "+ Crataegomespilus",
  "normalized": "+ Crataegomespilus",
  "canonical": {
    "stemmed": "Crataegomespilus",
    "simple": "Crataegomespilus",
    "full": "+ Crataegomespilus"
  },
  "cardinality": 1,
  "hybrid": "NAMED_GRAFT_CHIMERA",
  "details": {
    "uninomial": {
      "uninomial": "Crataegomespilus"
    }
  },
  "words": [
    {
      "verbatim": "+",
      "normalized": "+",
      "wordType": "GRAFT_CHIMERA_CHAR",
      "start": 0,
      "end": 1
    },
    {
      "verbatim": "Crataegomespilus",
      "normalized": "Crataegomespilus",
      "wordType": "UNINOMIAL",
      "start": 2,
      "end": 18
    }
  ],
  "id": "408e8fc7-fa27-53a6-9eff-37cb779724e4",
  "parserVersion": "test_version"
}

and

{
  "parsed": true,
  "quality": 2,
  "qualityWarnings": [
    {
      "quality": 2,
      "warning": "Graft-chimera formula"
    }
  ],
  "verbatim": "Cytisus purpureus + Laburnum anagyroides",
  "normalized": "Cytisus purpureus + Laburnum anagyroides",
  "canonical": {
    "stemmed": "Cytisus purpure + Laburnum anagyroid",
    "simple": "Cytisus purpureus + Laburnum anagyroides",
    "full": "Cytisus purpureus + Laburnum anagyroides"
  },
  "cardinality": 0,
  "hybrid": "GRAFT_CHIMERA_FORMULA",
  "details": {
    "graftChimeraFormula": [
      {
        "species": {
          "genus": "Cytisus",
          "species": "purpureus"
        }
      },
      {
        "species": {
          "genus": "Laburnum",
          "species": "anagyroides"
        }
      }
    ]
  },
  "words": [
    {
      "verbatim": "Cytisus",
      "normalized": "Cytisus",
      "wordType": "GENUS",
      "start": 0,
      "end": 7
    },
    {
      "verbatim": "purpureus",
      "normalized": "purpureus",
      "wordType": "SPECIES",
      "start": 8,
      "end": 17
    },
    {
      "verbatim": "+",
      "normalized": "+",
      "wordType": "GRAFT_CHIMERA_CHAR",
      "start": 18,
      "end": 19
    },
    {
      "verbatim": "Laburnum",
      "normalized": "Laburnum",
      "wordType": "GENUS",
      "start": 20,
      "end": 28
    },
    {
      "verbatim": "anagyroides",
      "normalized": "anagyroides",
      "wordType": "SPECIES",
      "start": 29,
      "end": 40
    }
  ],
  "id": "a8f8ace8-ba1a-5371-b9d5-73efce81d52c",
  "parserVersion": "test_version"
}
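The `start` and `end` values in the `words` arrays above appear to be half-open offsets into the `verbatim` string. As a quick sanity check, here is a minimal Python sketch that slices the verbatim name with those offsets and reports any word whose slice does not match. The helper name `check_word_offsets` is hypothetical, and treating the offsets as character indexes is an assumption (for these ASCII names, byte and character offsets coincide).

```python
def check_word_offsets(verbatim, words):
    """Return the words whose [start, end) slice differs from their text.

    Hypothetical helper; assumes offsets are character indexes into verbatim.
    """
    return [w for w in words if verbatim[w["start"]:w["end"]] != w["verbatim"]]


# Offsets copied from the graft-chimera formula output above.
verbatim = "Cytisus purpureus + Laburnum anagyroides"
words = [
    {"verbatim": "Cytisus", "wordType": "GENUS", "start": 0, "end": 7},
    {"verbatim": "purpureus", "wordType": "SPECIES", "start": 8, "end": 17},
    {"verbatim": "+", "wordType": "GRAFT_CHIMERA_CHAR", "start": 18, "end": 19},
    {"verbatim": "Laburnum", "wordType": "GENUS", "start": 20, "end": 28},
    {"verbatim": "anagyroides", "wordType": "SPECIES", "start": 29, "end": 40},
]

mismatches = check_word_offsets(verbatim, words)
print(mismatches)  # an empty list means every offset pair matches its word
```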

I've reused the hybrid flag to make consumption of the JSON output easier; notwithstanding that these aren't true botanical hybrids, it seems reasonable to use the term in the broadest sense given that it's a string value with more details anyway.
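To illustrate what reusing the `hybrid` field means for a consumer of the JSON, here is a rough Python sketch that branches on its string value. Only the field names and the two values `NAMED_GRAFT_CHIMERA` and `GRAFT_CHIMERA_FORMULA` come from the output above; the helpers `classify` and `formula_parts` and the labels they return are hypothetical, and the sample records are trimmed to the fields being read.

```python
import json

# Values of "hybrid" taken from the two JSON outputs in this PR description.
GRAFT_CHIMERA_KINDS = {"NAMED_GRAFT_CHIMERA", "GRAFT_CHIMERA_FORMULA"}


def classify(raw):
    """Hypothetical helper: label one parsed-name JSON record."""
    record = json.loads(raw)
    if not record.get("parsed", False):
        return "unparsed"
    hybrid = record.get("hybrid")
    if hybrid in GRAFT_CHIMERA_KINDS:
        return "graft-chimera"
    if hybrid:
        return "hybrid"
    return "plain"


def formula_parts(raw):
    """Hypothetical helper: 'Genus species' for each graft-chimera formula element."""
    record = json.loads(raw)
    parts = []
    for element in record.get("details", {}).get("graftChimeraFormula", []):
        species = element.get("species", {})
        parts.append(f'{species.get("genus", "")} {species.get("species", "")}'.strip())
    return parts


# Trimmed copies of the two outputs above (only the fields used here).
named = json.dumps({
    "parsed": True,
    "hybrid": "NAMED_GRAFT_CHIMERA",
    "details": {"uninomial": {"uninomial": "Crataegomespilus"}},
})
formula = json.dumps({
    "parsed": True,
    "hybrid": "GRAFT_CHIMERA_FORMULA",
    "details": {"graftChimeraFormula": [
        {"species": {"genus": "Cytisus", "species": "purpureus"}},
        {"species": {"genus": "Laburnum", "species": "anagyroides"}},
    ]},
})

print(classify(named), formula_parts(named))
print(classify(formula), formula_parts(formula))
```

A consumer that already switches on `hybrid` only needs to recognise two more string values, which is the convenience the shared field is meant to provide.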

I had to adjust the stemmer, so I added some stemmer-specific tests.

The PR duplicates much of the HybridFormula code as the syntax is so close; I've another branch which refactors things to reuse the HybridFormula objects, but there was no performance benefit and the code is harder to follow (for me, anyway). If you prefer that approach, though, I can submit a PR from that branch instead.

Member

@dimus dimus left a comment

Very nice patch @tobymarsden, thank you. Let me talk to our botanists and meditate on it over the weekend.

@dimus
Member

dimus commented Oct 5, 2021

> I've reused the hybrid flag to make consumption of the JSON output easier; notwithstanding that these aren't true botanical hybrids, it seems reasonable to use the term in the broadest sense given that it's a string value with more details anyway.

I think it is OK, because gnparser in general uses very 'broad' semantics in other parts; for example, the virus flag includes everything that is not cellular. I think v1 of GNparser is about practicality and covering its domain, and v2 might become more scientifically accurate in its definitions.

@tobymarsden
Author

@dimus Awesome, thanks for looking at this!

@dimus
Member

dimus commented Oct 10, 2021

@tobymarsden I asked around and looked at the codes. It seems that graft-chimeras are completely in the realm of the cultivar code, so it would be logical to parse them only when the cultivar flag is on. Can you make this change in your PR and make them 'visible' only if the cultivar flag is used? I think their tests should also be in the cultivar test file.

@dimus
Member

dimus commented Oct 10, 2021

I think if people go through names that are supposed to be in an ICN context, the parser should break on graft-chimera names.

@tobymarsden
Author

@dimus Makes perfect sense. I'll try to find some time this week to make the changes to the PR.

@dimus
Member

dimus commented Oct 11, 2021

sounds great @tobymarsden

@tobymarsden
Author

@dimus The graft-chimera support is now contingent on the -C flag, and parsing breaks on graft-chimeras without it.

I've updated the tests so the parsed graft-chimeras are in the cultivars file, and the main test file shows "parsed":false for these names.
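For anyone processing a large list with the default settings, one practical consequence is that graft-chimeras now show up as `"parsed": false`. A minimal Python sketch for collecting such names so they can be re-run with -C might look like the following. It assumes one JSON record per line, the helper `graft_chimera_candidates` is hypothetical, the sample records are made up in the shape of the `parsed` and `verbatim` fields, and the standalone `+` check is a crude heuristic rather than gnparser's grammar.

```python
import json


def graft_chimera_candidates(json_lines):
    """Return verbatim names that failed to parse and contain a standalone '+'.

    Hypothetical helper; assumes each line holds one JSON record.
    """
    candidates = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("parsed", False):
            continue  # already parsed, nothing to retry
        verbatim = record.get("verbatim", "")
        if "+" in verbatim.split():
            candidates.append(verbatim)
    return candidates


# Made-up sample records shaped like output produced without the -C flag.
output_without_flag = [
    '{"parsed": false, "verbatim": "+ Crataegomespilus"}',
    '{"parsed": false, "verbatim": "Crataegus + Mespilus"}',
    '{"parsed": true, "verbatim": "Cytisus purpureus"}',
]
print(graft_chimera_candidates(output_without_flag))
```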

@dimus
Member

dimus commented Oct 22, 2021

@tobymarsden perfect! Trying it now...

@dimus
Member

dimus commented Oct 22, 2021

It all looks good to me, @tobymarsden, great work, merging...

@dimus dimus merged commit 11a8fdd into gnames:master Oct 22, 2021