Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enh/attributes in schema validation #822

Merged
merged 9 commits into from
Oct 13, 2022
Merged

Conversation

MBueschelberger
Copy link
Member

@MBueschelberger MBueschelberger commented Sep 26, 2022

Simple enhancement for checking of data properties (attributes) during schema validation in SimPhoNy < v4.

Closes #821

@MBueschelberger MBueschelberger changed the base branch from master to dev September 26, 2022 10:42
@kysrpex
Copy link
Contributor

kysrpex commented Sep 26, 2022

Thanks for providing your own solution, I will review it within the next few days.

@MBueschelberger
Copy link
Member Author

@kysrpex: the CI/testing currently fails because https://xmlns.com/foaf/spec/index.rdf for pulling the foaf-ontology is down. We might rerun it later again.

Copy link
Contributor

@kysrpex kysrpex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting the PR. It is good.

I requested changes only due to two minor details: a typo and deleting a TODO (see reasons on review comments). I provided the changes as "code review suggestions" so I think you can just click a button below the suggestion to apply the changes directly on GitHub.

osp/core/utils/schema_validation.py Outdated Show resolved Hide resolved
osp/core/utils/schema_validation.py Outdated Show resolved Hide resolved
tests/test_utils.py Show resolved Hide resolved
osp/core/utils/schema_validation.py Outdated Show resolved Hide resolved
@kysrpex
Copy link
Contributor

kysrpex commented Oct 10, 2022

Thank you for submitting the PR. It is good.

I requested changes only due to two minor details: a typo and deleting a TODO (see reasons on review comments). I provided the changes as "code review suggestions" so I think you can just click a button below the suggestion to apply the changes directly on GitHub.

@MBueschelberger I will take care of the failing CI once you commit the requested changes.

@kysrpex kysrpex linked an issue Oct 12, 2022 that may be closed by this pull request
MBueschelberger and others added 5 commits October 13, 2022 10:46
Co-authored-by: José Manuel Domínguez <43052541+kysrpex@users.noreply.github.com>
Co-authored-by: José Manuel Domínguez <43052541+kysrpex@users.noreply.github.com>
Co-authored-by: José Manuel Domínguez <43052541+kysrpex@users.noreply.github.com>
@kysrpex kysrpex merged commit 231d72c into dev Oct 13, 2022
@kysrpex kysrpex deleted the enh/attributes-in-schema-validation branch October 13, 2022 09:08
@kysrpex kysrpex restored the enh/attributes-in-schema-validation branch October 13, 2022 09:08
@kysrpex kysrpex deleted the enh/attributes-in-schema-validation branch October 13, 2022 09:09
@MBueschelberger
Copy link
Member Author

@kysrpex:

I was also asked for the functionality to check the actual values of the Cuds.

For my opion this makes absolutely sense if you want to write a unittest for your function which is generating Cuds with actual data.

However, my suggestion would be that this might also overwrite the cardinality with a value-keyword in the case of data properties:

Restriction to length

    city.Neighborhood:
        city.name:
            STRING:
                value: 1+

Following a similar pattern as the cardinality, this value might e.g. in case of a string restrict the length (see example above) or the actual value:

Restriction of value

    city.Neighborhood:
        city.name:
            STRING:
                value: Stühlinger

In the case of floats, bools and ints, there is no restriction of the length needed, but rather a statement towards the data range:

Restriction of range

    city.Citizen:
        city.age:
            INTEGER:
                value: 0+

... or the actual value:

Restriction of value

    city.Citizen:
        city.age:
            INTEGER:
                value: 29
        city.name:
            STRING:
                value: Matthias

In case of vectors, length, data range and value restriction might be a bit more complicated. But I think this is not a huge priority to be implemented since we are ontologizing in the EMMO-way, without any vectors as data properties.

Since you already closed this PR and deleted the branch, I can come up with a new branch with this idea.

What do you think?

@kysrpex
Copy link
Contributor

kysrpex commented Oct 13, 2022

@kysrpex:

I was also asked for the functionality to check the actual values of the Cuds.

For my opion this makes absolutely sense if you want to write a unittest for your function which is generating Cuds with actual data.

However, my suggestion would be that this might also overwrite the cardinality with a value-keyword in the case of data properties:

Restriction to length

    city.Neighborhood:
        city.name:
            STRING:
                value: 1+

Following a similar pattern as the cardinality, this value might e.g. in case of a string restrict the length (see example above) or the actual value:

Restriction of value

    city.Neighborhood:
        city.name:
            STRING:
                value: Stühlinger

In the case of floats, bools and ints, there is no restriction of the length needed, but rather a statement towards the data range:

Restriction of range

    city.Citizen:
        city.age:
            INTEGER:
                value: 0+

... or the actual value:

Restriction of value

    city.Citizen:
        city.age:
            INTEGER:
                value: 29
        city.name:
            STRING:
                value: Matthias

In case of vectors, length, data range and value restriction might be a bit more complicated. But I think this is not a huge priority to be implemented since we are ontologizing in the EMMO-way, without any vectors as data properties.

Since you already closed this PR and deleted the branch, I can come up with a new branch with this idea.

What do you think?

Sorry I was not expecting a new message 😅

If you open a new PR and provide a working implementation for SimPhoNy v3 for what you are mentioning, I will definitely not decline to merge it. But before you put your hands on the keyboard, I again strongly advise you to carefully check if either SHACL or OTTR solve your problem.

I still did not carefully check, but it is my intention. I prefer to stick to something that exists (as far as I know there are validators for SHACL), especially if it is open-source, rather than to reinvent the wheel. The point I want to make here is that if you implement now a solution for SimPhoNy v3, unless it is better than SHACL or OTTR, I will not keep this solution in v4 or future versions. But unfortunately I do not know if those two can satisfy our needs right now or not.

@MBueschelberger
Copy link
Member Author

Hi @kysrpex, thanks for the hint on SHACL and OTTR.

This functionality for the schema validation was originally requested for a unit test in a small use case, not for a proper pipeline in a larger framework.

On the one hand, I totally agree that SHACL offers good stability+flexiblity against our current validation-algorithm in order to make a proper analysis of our CUDS.

But on the other hand, I think that a proper integration would require a parser which is able to convert the yml-schema into the node-shapes of shacl, which might require some work.

For me, it is currently not clear in how far this might fit into the timeline of the usecase, since there is enough other work to do (as usual). On the other hand, this is of course a major benefit for SimPhoNy v3 and v4, as you already evaluated.

I think I have to discuss this internally before deciding which way to go for the time given.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add atttributes (=data properties) to schema validation for CUDS
2 participants