Add check for malformed IRI in IOHelper.createIRI #882

Merged
merged 3 commits into from
Aug 3, 2021

@beckyjackson beckyjackson commented Jun 18, 2021

Resolves #880

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

This is a bit more sane than trying to do a regex match for a valid URL. If the output IRI is just a CURIE, this will fail (e.g., undefined prefix when running template). If the output IRI has a space in it, this will also fail (previously, a space was the only thing we checked for).
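A minimal sketch of the kind of check described above, not the merged ROBOT code: the class and method names are hypothetical, and the explicit space check is kept alongside `new URL(...)` as an assumption about how the two checks might be combined.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not the actual IOHelper implementation.
final class IriCheckSketch {

  /** Returns true if the expanded IRI string has no space and parses as a URL. */
  static boolean looksLikeValidIri(String expanded) {
    // Reject null input and anything containing a space (the earlier check).
    if (expanded == null || expanded.contains(" ")) {
      return false;
    }
    try {
      // An unexpanded CURIE such as "ex:Thing" has no known protocol handler,
      // so the constructor throws MalformedURLException.
      new URL(expanded);
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(looksLikeValidIri("http://example.com/obo/EX_0000001"));
    System.out.println(looksLikeValidIri("ex:Thing"));
    System.out.println(looksLikeValidIri("http://example.com/has space"));
  }
}
```

In this sketch a caller would treat a `false` result as "malformed IRI" and fail, rather than silently producing a bad IRI.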

@matentzn
Contributor

This is what I suggest as well. Just consider the rare cases where this will fail:

  1. Ontology IRIs, which are URNs by default when saved with Protege.
  2. Strictly speaking, it is allowed to use URNs instead of URLs for IRIs, since IRIs are extensions of URIs, but I think this is sane for 99.95% of all cases.

@beckyjackson
Contributor Author

beckyjackson commented Jun 21, 2021 via email

@beckyjackson
Contributor Author

beckyjackson commented Jun 21, 2021

We could also add a rule to the method: if the IRI starts with urn: and doesn't have a space in it, it should be assumed valid. We could implement a full check for URNs, but I haven't seen them used in the wild, so I don't know if it's worth it (https://stackoverflow.com/a/5492927). I did check the regex from that answer against a bunch of URNs and it seems to work as expected (e.g., it accepts urn:isbn:0451450523 but rejects urn:123456).
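The two URN options above could be sketched roughly as follows. The class and method names are hypothetical, and the pattern is an assumption modelled on the RFC 2141 URN syntax (`urn:<NID>:<NSS>`), not necessarily the exact regex from the linked answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ROBOT.
final class UrnCheckSketch {

  // Assumed pattern: "urn:", a namespace identifier of 1-32 characters,
  // a colon, then a non-empty namespace-specific string.
  private static final Pattern URN =
      Pattern.compile(
          "urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\\-.:=@;$_!*'%/?#]+",
          Pattern.CASE_INSENSITIVE);

  /** Lenient option: anything starting with "urn:" and containing no space. */
  static boolean startsWithUrnAndHasNoSpace(String iri) {
    return iri != null && iri.startsWith("urn:") && !iri.contains(" ");
  }

  /** Stricter option: the whole string must match the assumed URN pattern. */
  static boolean matchesUrnPattern(String iri) {
    return iri != null && URN.matcher(iri).matches();
  }

  public static void main(String[] args) {
    System.out.println(matchesUrnPattern("urn:isbn:0451450523"));
    System.out.println(matchesUrnPattern("urn:123456"));
    System.out.println(startsWithUrnAndHasNoSpace("urn:123456"));
  }
}
```

The difference shows up on `urn:123456`: the lenient prefix check accepts it, while the pattern rejects it because there is no second colon separating a namespace identifier from the rest.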

@matentzn
Contributor

I would kinda tend towards adding URN checking support, but defer to your better judgement. A question that I never quite answered to myself: what is the advantage of using try: new URL(...) compared to using a regular expression for matching URLs?

@beckyjackson
Contributor Author

Here's another option: we could use the UrlValidator from commons-validator, but that requires adding an additional dependency. That validator is stricter than new URL(...): passing just http:// to new URL(...) succeeds, even though http:// is not a valid IRI. new URL(...) will still catch most issues. Maybe this is important enough to add another dependency, though.

I have concerns about just using regex to match a URL because there are so many variations on what a "good" regex pattern for a URL is. I just prefer to use existing and tested tools.
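The http:// case mentioned above can be illustrated with a small sketch. The names are hypothetical, and the extra host check is only an assumption about one way to tighten the check without a new dependency; it is not what commons-validator does internally.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of ROBOT or commons-validator.
final class HostCheckSketch {

  /** True if new URL(...) accepts the string. */
  static boolean parsesAsUrl(String s) {
    try {
      new URL(s);
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  /** True if new URL(...) accepts the string and the parsed host is non-empty. */
  static boolean parsesWithHost(String s) {
    try {
      String host = new URL(s).getHost();
      return host != null && !host.isEmpty();
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(parsesAsUrl("http://"));
    System.out.println(parsesWithHost("http://"));
    System.out.println(parsesWithHost("http://example.com/x"));
  }
}
```

Here `parsesAsUrl("http://")` returns true because the constructor only parses the string, while `parsesWithHost("http://")` returns false because the parsed host is empty.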

@matentzn
Contributor

ok totally fine with new URL(...)!

@beckyjackson beckyjackson merged commit 5689f6f into master Aug 3, 2021
@beckyjackson beckyjackson deleted the template-iris branch August 3, 2021 18:04
Successfully merging this pull request may close these issues.

ROBOT should fail hard if CURIE prefix is unknown