Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validate operation #691

Merged
merged 120 commits into from
Aug 14, 2020
Merged

Add validate operation #691

merged 120 commits into from
Aug 14, 2020

Conversation

beckyjackson
Copy link
Contributor

@beckyjackson beckyjackson commented May 12, 2020

Resolves #636

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

Michael Cuffaro added 30 commits June 16, 2019 09:15
…rgument. The eventual implementation will either get its input through the local filesystem or through ImmPort (either by direct download or through an API). But likely it will be in JSON format.
…to have been checked at the time this method is called
…nd parsing a template row from the csv data
…c-label", "sc-iri" rules, etc. just "sc"). For a given cell, we now determine whether it is a label or iri on the fly.
…Operation class, and change naming convention: all static members of ValidateOperation (methods and variables) are named using underscore convention, while local method-internal variables are named using camelCase
@beckyjackson
Copy link
Contributor Author

I've made some updates to how validate works. I'm happy to rework some of these if they don't sound good, but I think these changes work a bit better with our standard Makefile workflows...

  • By default, validate will fail if there are any validation failures in any table (exit code 1). This is so validate can be used in Makefile workflows and stop the workflow on error. I put in a new --no-fail option so this can be overridden.
  • I added tooltips to the HTML tables, which requires the Bootstrap JS. Bootstrap JS is only included in the HTML file if there is at least one comment. To this extent, I changed --standalone to default to false so that users will always get the CSS and JS unless they say otherwise.
  • I added in "not" rules (not-subclass-of, not-superclass-of, not-instance-of). I originally tried subclass-of not X but this did not properly validate.
  • validate will now print an error message at the end on any validation failures, and print the names of the table(s) that failed with the path to the output (unless no output format):
VALIDATION FAILED - the following table(s) had one or more rule violation:
- immune_exposures.csv (results/immune_exposures.txt)
  • You can choose to print all validation failure messages with --silent false (messages will always be printed if there is no output format)

@beckyjackson
Copy link
Contributor Author

Also, this has a fix to template - the rule rows were being parsed even though they don't have an ID/LABEL.

@cmungall
Copy link
Contributor

Perhaps this should be called something like validate-table or validate-template-data? validate sounds like it's validation of the ontology itself, which is covered by report.

Or maybe this should be a flag in the template command, if I understand the intent correctly.

I had a look at the comments on the DL-query ticket
#387 (comment)

and at validate.md in this PR. I thought I grokked the overall idea, but maybe I'm being dense, I think I need a gentler introduction as to what the overall use case is here, sorry

@matentzn
Copy link
Contributor

Wait, I thought that the tables contain validation rules that are checked against the -i input ontology? So my worry was more whether this validate here clearly delineates from report. (is it that report is for making generic structural checks, and validate being a kind of unit testing framework?) The way I always understood this was more like a kind of shacl/shex thing for TBoxes: ensure that a phenotype is not subclass of a disease, but a subclass of say dependent continuant. And so on. Can we get a few examples/use cases explained in a non-technical way to understand how we will use it?

@jamesaoverton
Copy link
Member

Yes, we can update the documentation.

This validate command accepts a table, then returns the table with problem cells highlighted. So it's is quite different from report in behaviour and implementation. It can run on ROBOT templates (parts of the ontology) or just tables of data, so it's distinct from template as well.

@cmungall
Copy link
Contributor

cmungall commented Jun 29, 2020 via email

@beckyjackson
Copy link
Contributor Author

As a side note, I updated the docs to try and clarify the input/output files. I also put details on the HTML --standalone option, and added a new --write-all option. The --write-all true writes all tables to the output directory. When --write-all false (default), only tables with failed validations are written to the output directory. I think this will be easier for using in Makefile workflows where people only want to see what has a problem.

docs/validate.md Outdated Show resolved Hide resolved
@jamesaoverton
Copy link
Member

There's a new file robot-core/test.txt that should not be there.

@jamesaoverton
Copy link
Member

Are the two robot-core/src/test/resources/nucleus.* files being used? They're new in this PR, but I didn't seem them used in this PR.

@jamesaoverton
Copy link
Member

In the immune_exposures.csv example files, I don't understand why the IDs use underscores, e.g. NCBITaxon_11103. They should be proper CURIEs. This might be a holdover from a long time ago, but it should be fixed before merging. And there should not be any special code anywhere for pretending that these are actually CURIEs -- they are wrong and they should fail.

In the immune_exposures.html output file, if the input value is an ID/CURIE, then the HTML cell should show that same value. The HTML cells values should always match the input values, except that we add links when we can.

I would like immune_exposures.owl cleaned up to remove everything in the "mike" namespace, which was just for development. I think that the only object property we actually need is IDO:0000664 'has material basis in'. Most of the other classes are in ONTIE. This is much closer to what we want: https://github.com/jamesaoverton/immune-exposure-validation

@beckyjackson
Copy link
Contributor Author

I removed those couple of files that are not used anywhere. I also cleaned up the immune exposures test files.

And there should not be any special code anywhere for pretending that these are actually CURIEs -- they are wrong and they should fail.

It looks like the parser from OWLAPI is actually OK with these. I changed them in the file, but do we want to add a piece of code that confirms that CURIEs are valid? Otherwise it won't fail on CURIEs that just have an underscore.

The HTML cells values should always match the input values, except that we add links when we can.

Agreed and updated, except for anonymous expressions. The HTML renderer handles these and I think if we want to keep using the exact values, we would have to write a new renderer, or at least rework something... what do you think?

@jamesaoverton
Copy link
Member

This has performance problems, and I want to add an alternative format for specifying the rules. But it's blocking other PRs, so I'm going to merge it, and we'll continue to improve it.

As to whether it belongs in ROBOT: This works hand-in-hand with template, which is 50%+ of what I use ROBOT for. It also reuses outputs code for export and other commands. So I am confident that it belongs.

As to whether the name validate is too general: The focus is currently on validating a target table given DL query rules and an input ontology then output a table of results, but I think there's lots of room to extend this starting point to validate a target graph or OWL file, against SHACL rules or something.

@jamesaoverton jamesaoverton merged commit 2fc3ac5 into master Aug 14, 2020
@jamesaoverton jamesaoverton deleted the add_validate_operation branch June 16, 2022 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revise validate to use export output code
4 participants