Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion to allow single quotes in C template values #761

Merged
merged 3 commits into from
Jan 21, 2021

Conversation

beckyjackson
Copy link
Contributor

@beckyjackson beckyjackson commented Nov 4, 2020

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

Currently, when a term label contains a single quote, we need to use CURIEs instead of labels in the parent column, or else the term cannot be parsed.

For example, the label:

Streptococcus sp. 'group B'

Results in:

MANCHESTER PARSE ERROR the expression ''Streptococcus sp. 'group B'''  at row 1976, column 2 
in table "src/ontology/templates/taxon.tsv" cannot be parsed: encountered unknown 'Streptococcus sp. '

Typically, I would just use NCBITaxon:1319 instead and the template would be OK, but now that we are using labels with VALVE, we need to be able to reference all labels in the "Parent" columns or else certain functions fail.

This is a proposed fix that first attempts to use a QuotedEntityChecker to retrieve a class by label. If a class cannot be found, then the ManchesterOWLSyntaxClassExpressionParser is used to try to parse an expression. My one concern is that for large templates with a bunch of class expressions, this double-step may be less efficient. At the same time, it may actually be quicker in some cases where only labels are used. The relevant lines are here:

expr = null;
if (checker != null) {
// If we have a checker, try to get the class by label
// This allows class labels with single quotes
expr = checker.getOWLClass(content);
}
if (expr == null) {
// If not found by label, try to parse
expr = parser.parse(content);
}

@beckyjackson
Copy link
Contributor Author

We still run into problems when using a label with a single quote in expressions...

MANCHESTER PARSE ERROR the expression ''in taxon' some 'Theiler's encephalomyelitis virus'' at row 494,
column 6 in table "src/ontology/templates/protein.tsv" cannot be parsed: encountered unknown 'Theiler'

I can work around this in VALVE since it's not a tree, but it's still annoying

@jamesaoverton
Copy link
Member

I guess ROBOT should be escaping before insertion: Theiler\'s encephalomyelitis virus.

These changes have a fair chance of to breaking somebody's stuff, somewhere. I'll want to unit test them thoroughly, then test against a bunch of real ontologies. If we see significant breakage, then I'll want to hide this improved string handling behind an option.

@beckyjackson
Copy link
Contributor Author

But you can plug in expressions into templates, e.g. I can provide this string as a value in a cell (not the substituted template string):

part_of some 'test one'

If we escape single quotes before passing it to the parser, this parsing will fail (the unit tests fail):

testTemplateCSV(org.obolibrary.robot.TemplateTest): template#MANCHESTER PARSE ERROR
the expression 'part_of some 'test one'' at row 3, column 8 in table "/template.csv" cannot be parsed:
encountered unknown 'test

I don't think there's a good way to know if something is an expression vs. a label, unless we run it through the QuotedEntityChecker. Thoughts?

@jamesaoverton
Copy link
Member

Yes, using the QuotedEntityChecker is the right way to determine whether a cell contains a label. I didn't think I was disagreeing, just mentioning one case. I guess we need a list of cases.

@beckyjackson
Copy link
Contributor Author

Oh I misunderstood, sorry!

@jamesaoverton jamesaoverton self-requested a review December 18, 2020 15:30
Copy link
Member

@jamesaoverton jamesaoverton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to move forward with this, but I need to be convinced that it won't break things. @beckyjackson Please try this on OBI, MRO, ONTIE, and maybe other projects and report on the results.

@beckyjackson
Copy link
Contributor Author

I think ONTIE is the only one I can think of that uses labels with single quotes in template patterns. I tested it on ONTIE (using the labels instead of the CURIEs) and it worked exactly as expected.

I also built MRO and OBI using the JAR from Jenkins and it completed just fine!

@jamesaoverton
Copy link
Member

Ok. Please add a few unit tests, and then we'll merge this.

@jamesaoverton jamesaoverton added this to the v1.8.0 milestone Jan 20, 2021
@beckyjackson
Copy link
Contributor Author

Test added!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants