Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we Close the shapes in the schema? #197

Open
goodb opened this issue Jan 24, 2020 · 19 comments
Open

Should we Close the shapes in the schema? #197

goodb opened this issue Jan 24, 2020 · 19 comments
Assignees

Comments

@goodb
Copy link
Contributor

goodb commented Jan 24, 2020

Right now, our schema definitions allow for the addition of relations not specified in the schema for our Shapes. Here is an example (will disappear http://noctua-dev.berkeleybop.org/editor/graph/gomodel:5e28fcb400000000 )

Screen Shot 2020-01-24 at 10 47 51 AM

This is an example @vanaukenk added as a positive test for a new <AnatomicalStructureDevelopment> shape which allows for the 'results_in_development_of' relation to an AnatomicalEntity. That shape is not present in the schema that is being used to test that file in the picture and the Tissue Development node is validating just fine as a Biological Process. This is because the shapes are, by default, open and allow for new relations. (See explanation of closed versus open).

If we leave them open, it is important to get the ShapeMap updated as the new shapes come in, otherwise models may appear to validate correctly when they should not be. e.g. here, the Tissue Development node should be tested against the AnatomicalStructureDevelopment Shape when all of the changes are complete.

Is it desirable to allow new relations, not in the spec for a shape, or do we want to close them such that a new relation would result in a schema validation error?
@vanaukenk @balhoff @cmungall @pgaudet @ukemi @kltm

@vanaukenk
Copy link
Contributor

@goodb I think we would want to close the shapes, right?

I also have a list of SPARQL queries that I was going to add to the ShapeMap as part of this work, e.g.
SPARQL 'SELECT ?x WHERE { ?x a/http://www.w3.org/2000/01/rdf-schema#subClassOf http://purl.obolibrary.org/obo/GO_0048856 }' @ http://purl.obolibrary.org/obo/go/shapes/AnatomicalStructureDevelopment,

Are you referring to this or something else?

@goodb
Copy link
Contributor Author

goodb commented Jan 24, 2020

Yes, I was referring to the SPARQL queries that define the ShapeMap.

It does seem like the way we are using the shapes suggests a closed view. Lets see if there are any other thoughts as this is a pretty big change and then if no opposition go for it en masse next week.

@ericprud
Copy link
Collaborator

This tension is not unusual as different use cases imply different needs. We have a start on an API doc which could include parameters to not enforce open or closed, though of course the first step is to implement the switch somewhere and gain experience.

Sometimes you want to close a shape because you want to make sure you've covered the properties or be able to detect misspelled predicates but you don't see an issue if someone wants to loosen that up in their environment. There are, however, some shapes that really demand CLOSEDness, such as rdf:Collections:

<#Paper> { :authors @<#AuthorList> }
<#AuthorList> rdf:nil OR CLOSED {
  rdf:first @<#Author> ;
  rdf:rest <#AuthorList>
}

Any guidance for me?

@goodb
Copy link
Contributor Author

goodb commented Jan 24, 2020

@ericprud guidance with respect to your API doc? Parameters to not enforce something that is in the other main sources of documentation, like that book I referenced, seems like maybe a bad idea?
I don't see anything wrong with adding the CLOSED constraint on as needed on a per-shape basis.

To clarify our context, in case that is useful to you, we have an editor that currently allows users to add arbitrary relations among the nodes in a model. We are migrating towards a more controlled system that uses the shex schema to govern what relations are allowed between different node types.

We also have models that come in from external sources and we want to ensure that they fit in with the rest.

@vanaukenk
Copy link
Contributor

Discussion on 2020-01-29 gocam specifications call:
We will opt for closing the shapes and then seeing how often curators try to add new relations that result in invalid models. We'll need to work out a good workflow for evaluating and responding to these cases.
@goodb will close the shapes and we'll see how it goes.

@goodb goodb self-assigned this Jan 31, 2020
@goodb
Copy link
Contributor Author

goodb commented Feb 4, 2020

@ericprud moving this question over from the go_shapes room which seems to have become vacant from lack of use.

We have a structure in our schema https://github.com/geneontology/go-shapes/blob/master/shapes/go-cam-shapes.shex in which all shapes extend a basic shape <GoCamEntity> that extends another basic shape <ProvenanceAnnotated>. These base shapes contain metadata (e.g. date etc.) that all nodes in our models can have. Per discussion above, we are considering closing the higher level shapes e.g. <CellularComponent>. How would you define a closed shape that allowed for the relations in the shapes that it extends? The only way I have come up with so far is to explicitly enumerate all of the relation types allowed by the parent shapes in the EXTRA definition, like this:
<CellularComponent> @<GoCamEntity> AND CLOSED EXTRA a contributor: date: provided_by: rdfs:comment xref: rdfs:label exact_match: {...
which seems inelegant..
Is there another pattern that would allow us to automatically bring in the predicates from the 'parent' shapes when a child shape is declared to be closed? Or perhaps I am not thinking about this the right way?

@balhoff
Copy link
Member

balhoff commented Feb 6, 2020

@goodb here is a reduced example to paste into https://rawgit.com/shexSpec/shex.js/master/packages/shex-webapp/doc/shex-simple.html

ShEx:

BASE   <http://purl.obolibrary.org/obo/go/shapes/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX contributor: <http://purl.org/dc/elements/1.1/contributor>
PREFIX provided_by: <http://purl.org/pav/providedBy>
PREFIX date: <http://purl.org/dc/elements/1.1/date>
PREFIX xref: <http://www.geneontology.org/formats/oboInOwl#hasDbXref>
PREFIX exact_match: <http://www.w3.org/2004/02/skos/core#exactMatch>
PREFIX source: <http://purl.org/dc/elements/1.1/source>
PREFIX evidence: <http://geneontology.org/lego/evidence>
PREFIX with: <http://geneontology.org/lego/evidence-with>
PREFIX GoCellularComponent: <http://purl.obolibrary.org/obo/GO_0005575>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
PREFIX has_part: <http://purl.obolibrary.org/obo/BFO_0000051>
PREFIX occurs_in: <http://purl.obolibrary.org/obo/BFO_0000066>
PREFIX adjacent_to: <http://purl.obolibrary.org/obo/RO_0002220>
PREFIX overlaps: <http://purl.obolibrary.org/obo/RO_0002131>


<ProvenanceAnnotated> {
  contributor: xsd:string *; #TODO would be better as an IRI
  date: xsd:string *; #TODO can we make this an xsd:date?
  provided_by: xsd:string *; #TODO would be better as an IRI
  rdfs:comment xsd:string *;
}

<GoCamEntity> IRI @<ProvenanceAnnotated> AND EXTRA a {
  a [owl:NamedIndividual] * // rdfs:comment  "Every entity we care about is a named individual";
  xref: . *;
  rdfs:label . {0,1};
  exact_match: . *;
} // rdfs:comment  "Default allowable metadata for GO-CAM entities"

<CellularComponentClass> IRI AND EXTRA rdfs:subClassOf {
  rdfs:subClassOf [ GoCellularComponent: ];
}

<CellularComponent> @<GoCamEntity> AND EXTRA a  CLOSED {
  a @<CellularComponentClass> {1};
  part_of: . {0,1};
} // rdfs:comment  "a cellular component"

Data:

BASE   <http://purl.obolibrary.org/obo/go/shapes/test/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX contributor: <http://purl.org/dc/elements/1.1/contributor>
PREFIX GoCellularComponent: <http://purl.obolibrary.org/obo/GO_0005575>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>

<A> part_of: <B> .
<A> rdf:type <CC> .
<A> contributor: "Jim" .

<CC> rdfs:subClassOf GoCellularComponent: .

Result:

<test/A>@!obo:go\/shapes\/CellularComponent
  validating http://purl.obolibrary.org/obo/go/shapes/test/A as http://purl.obolibrary.org/obo/go/shapes/CellularComponent:
      ClosedShapeError: unexpected: {
        http://purl.obolibrary.org/obo/go/shapes/test/A http://purl.org/dc/elements/1.1/contributor "Jim" .
      }

@goodb
Copy link
Contributor Author

goodb commented Feb 6, 2020

Option. We could use the shape map to identify the shapes that we want to view as closed. We could then automatically add the 'closed' definition along with all of the EXTRA properties when the schema is loaded into our validator (by introspecting the shape and its references).

Downside is that we would become tightly coupled to that implementation. e.g. it would be harder to swap out the validator service and use something else (e.g. I could see us wanting to do validation at the javascript level at some point and then we would have to re-implement the same thing).

Similarly, we could add code to our validator service that would sniff for extra, undefined relationships on these nodes without using the shex validation code. Same problem of course.

@goodb
Copy link
Contributor Author

goodb commented Feb 7, 2020

@kltm I have a potential solution here but I'm stuck at a javascript error involving the Violations response. Can you tell me what you are expecting to see for n.types() below from NoctuaEditor.js ? Its crashing gracelessly with the error: TypeError: null is not an object (evaluating 'n.types')

                 G(a.violations(), function(e) {
                        var t = e.node_id(),
                            n = a.get_node(t),
                            r = "",
                            i = n.types();
                        x.each(i, function(e) {
                            r += U(e)
                        }), "" === r && (r = t), e.explanations() && 0 < e.explanations().length ? G(e.explanations(), function(e) {
                            e.shape && (c[e.shape] || (c[e.shape] = []), c[e.shape].push(r))
                        }) : (c[["unknown"]] || (c[["unknown"]] = []), c.unknown.push(r))
                    });

thanks..

@kltm
Copy link
Member

kltm commented Feb 7, 2020

It would be helpful to see the stack trace here--compiled code is a little nasty to work with. I'm guessing it's from around here:

https://github.com/berkeleybop/bbop-graph-noctua/blob/master/lib/edit.js#L239

Violations are things in a graph, so a is a graph.
You are getting a node n, with t (a string)... maybe a node was not actually gotten and you're running .types() on a null? Or something?

It might be easier to debug that with the response JSON from minerva and just making a test:
https://github.com/berkeleybop/bbop-graph-noctua/tree/master/tests
https://github.com/berkeleybop/bbop-graph-noctua/blob/cc126170af38f0e962de56d3f9e1b865cab3757a/tests/response-gomodel-5d88482400000052-2019-09-25.json
https://github.com/berkeleybop/bbop-graph-noctua/blob/a1123b0b4a5732f2a5358975e4c3f5997b873583/tests/noctua-basic.tests.js#L980

@goodb
Copy link
Contributor Author

goodb commented Feb 7, 2020

@kltm here is an example of a problematic json response from the server. The client hits that null and hangs. Struggling to find what is causing it in that json.

problem.json.txt

@kltm
Copy link
Member

kltm commented Feb 7, 2020

@goodb Thanks--I can now replicate locally. I'm digging in...

@kltm
Copy link
Member

kltm commented Feb 7, 2020

@goodb This is pretty straightforward: the node ids given in violations[].node are not consistent with the node ids in individuals.

The important thing to remember is that node ids in the response are pretty much CURIEs and there is no automatic handling on the client--the IDs given are used strings. You may also have an issue in waiting with constraints[].object.

@goodb
Copy link
Contributor Author

goodb commented Feb 7, 2020

@kltm confirming. I see full URIs like http://model.geneontology.org/R-HSA-1971475/R-HSA-9638064 in shex-validation.violations.node and curiefied ids like gomodel:R-HSA-1971475/R-HSA-9638064 in the individuals.id slot.

Will fix that..

What is the issue in waiting with the constraints object ?

@kltm
Copy link
Member

kltm commented Feb 7, 2020

constraints[].object also has a URL and not CURIE. As I believe that that id refers to an object that should be addressable to the client, the fact that it is a URL and not a CURIE means that it could not be found.

@goodb
Copy link
Contributor Author

goodb commented Feb 9, 2020

I implemented a SPARQL-based CLOSE in the minerva shex-validator. It very rapidly exposed some missing parts of the schema. e.g. annotation properties are present for nearly every node for X and Y coordinates.

geneontology/minerva#274

Will PR in some additions like these as I find them.

@goodb
Copy link
Contributor Author

goodb commented Feb 9, 2020

Hit another consequence of closing the shapes. When we have a shape that extends another shape, e.g. <TransporterActivity> @<MolecularFunction> and both <TransporterActivity> and <MolecularFunction> are part of the query map, we end up with errors on the higher level type.

Here <TransporterActivity> allows additional properties like transports_or_maintains_localization_of: that, under the closed view, are not allowed in the parent definition for <MolecularFunction>

Nodes that are transporter activities are also molecular functions. So we end up validating correctly against transporter activity and failing on molecular function.

My initial instinct here is to set it up so that the validation is only run against the most specific matching shape from the map. So here, these nodes would only be tested against the transporter activity shape.

@ericprud
Copy link
Collaborator

ericprud commented Feb 10, 2020

@goodb ,

... Is there another pattern that would allow us to automatically bring in the predicates from the 'parent' shapes when a child shape is declared to be closed? Or perhaps I am not thinking about this the right way?

i mocked up a CLOSED version of that subset of your schema using EXTENDS: https://tinyurl.com/w8j6yby . We need to lean on Iovka to finish her EXTENDS implementation. (I suspect she'd get a paper out of it as well, as it's very useful for these sorts of models.)

@goodb
Copy link
Contributor Author

goodb commented Feb 11, 2020

For the record here, for any thread followers.

Until we have an implementation that handles the EXTENDS construct (see iovka/shex-java#7 ), I don't think we want to add the CLOSED parameter to our shapes. This would require a large and confusing addition of EXTRA parameters. (One possible option there is to add them automatically in a kind of compilation step that happens when the schema is loaded, but not doing that now.)

But, I have implemented some procedural code checks that have a similar result for the interim. @kltm and others looking at JSON coming back from a minerva validation call will see Violations objects structured just like they have been up till now. When a property is used that is not part of the shape being tested for a given node (a closed violation), the 'intended range shapes' field will just contain owl:Nothing to indicate that there can be no allowed use of that property on that node. e.g. if you added a has_participant relation with a molecular function node as its domain, you would get a JSON message like the following. (BTW this is unintentionally helpful in debugging inconsistent OWL models...).

[
   {
      "shape": "obo:go/shapes/MolecularFunction",
      "constraints": [
         {
            "object": "http://model.geneontology.org/R-HSA-1971475/R-HSA-1971475",
            "property": "RO:0000057",
            "node_types": [
               "GO:0008194",
               "GO:0016757",
               "GO:0016758",
               "GO:0003824",
               "GO:0015020",
               "BFO:0000003",
               "BFO:0000015",
               "owl:Thing",
               "GO:0016740",
               "GO:0015018",
               "GO:0003674"
            ],
            "object_types": [
               "GO:0043170",
               "GO:0008150",
               "GO:0006807",
               "GO:0008152",
               "BFO:0000003",
               "BFO:0000015",
               "GO:1901564",
               "GO:1901135",
               "GO:0006022",
               "owl:Thing",
               "GO:0071704",
               "GO:0030203"
            ],
            "matched_range_shapes": [
               "obo:go/shapes/ProvenanceAnnotated",
               "obo:go/shapes/BiologicalProcess",
               "obo:go/shapes/GoCamEntity"
            ],
            "intended-range-shapes": [
               "owl:Nothing"
            ]
         }
      ]
   }
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants