If statement will not run for valueUrl #150

Open
sytzevh opened this issue Jan 20, 2023 · 5 comments

@sytzevh
Contributor

sytzevh commented Jan 20, 2023

The exercise on the if-statement wiki page doesn't give the expected results. I added the prefix "sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#" to the JSON schema for cow_person_example.csv and replaced the "male" column with the following code:

{
    "name": "male",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
    "dc:description": "The state of being male or female",
    "titles": ["male"],
    "propertyUrl": "sdmx-code:sex",
    "valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
},

The valueUrl does not seem to accept the if statement, while a "csvw:value" with the same code does run without issues.

@rijpma rijpma added the bug label Jan 27, 2023
@wxwilcke
Contributor

wxwilcke commented Jan 27, 2023

This might be related to #148.

I took a brief look. The code responsible for processing the valueUrl spans lines 587 to 625. The expand_url function is then called on the value, which in turn calls the render_pattern function to convert the value using the Jinja2 backend. Something goes wrong here, but I'm not yet sure what. I'll look into it some more later.
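As a sanity check on the template itself, the pattern from the report can be rendered with Jinja2 directly, outside COW. This is only a minimal sketch: it does not call COW's expand_url or render_pattern, and it assumes the CSV cell is passed in as a string under the name `male`.

```python
from jinja2 import Template

# Template copied from the valueUrl in the report; the prefix is left unexpanded.
pattern = "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
template = Template(pattern)

# CSV cells are assumed to arrive as strings, so the comparison is against '0', not 0.
female = template.render(male="0")
male = template.render(male="1")

print(female)  # sdmx-code:sex-F
print(male)    # sdmx-code:sex-M
```

If this renders as expected, the template syntax is unlikely to be the cause, which points at how the pattern reaches the Jinja2 backend instead.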

@rijpma
Member

rijpma commented Jan 27, 2023

Tried to replicate and I get the same issue: runs fine with csvw:value, but no triples are generated when using valueUrl. I don't recall issues with this previously.

@rijpma
Member

rijpma commented Jan 27, 2023

Hi @wxwilcke, though it could be that the output was always missing and we never noticed...

@rijpma
Member

rijpma commented Jan 27, 2023

Could this potentially be solved relatively easily by moving a line like https://github.com/CLARIAH/COW/blob/base/src/converter/csvw.py#L629 ? Or is there a lot more complexity to that?

@wxwilcke
Contributor

wxwilcke commented Jan 30, 2023

After a lot of testing, I discovered that recent versions of the rdflib json-ld parser won't process URIs that contain white space. This would normally be a good thing, but for some reason COW reads the metadata.json file as a json-ld file. As a result, the jinja pattern in the valueUrl gets ignored and is replaced by the base URI:

>>> import rdflib
>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.load('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=N9cadb1c623b84947975324b58c3ce06b (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
... 
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef('https://example.com/id/'))
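To see which columns would be affected, the metadata can be read as plain JSON and scanned for valueUrl values that contain white space. The sketch below is an illustration, not part of COW: the helper name `find_whitespace_value_urls` is hypothetical, the `tableSchema`/`columns` layout is assumed from the CSVW convention, and the `Country` column is made up for contrast.

```python
import json


def find_whitespace_value_urls(metadata):
    """Return (column name, valueUrl) pairs whose valueUrl contains white space."""
    # Assumed layout: {"tableSchema": {"columns": [ {...}, ... ]}}
    columns = metadata.get("tableSchema", {}).get("columns", [])
    flagged = []
    for column in columns:
        value_url = column.get("valueUrl")
        if isinstance(value_url, str) and any(ch.isspace() for ch in value_url):
            flagged.append((column.get("name"), value_url))
    return flagged


# Inline stand-in for a metadata file; only the "male" column comes from the report.
sample = json.loads("""
{
  "tableSchema": {
    "columns": [
      {
        "name": "male",
        "propertyUrl": "sdmx-code:sex",
        "valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
      },
      {
        "name": "Country",
        "valueUrl": "country/{Country}"
      }
    ]
  }
}
""")

flagged = find_whitespace_value_urls(sample)
for name, value_url in flagged:
    print(f"{name}: {value_url}")
```

Only the `male` column is reported here, since its valueUrl is the only one with spaces inside the pattern.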

Ideally, COW would be rewritten to read the file as plain json, but this would require quite a bit of work. Instead, I fixed the issue by adding some code that replaces underscores ('_') with white space. However, the metadata file now has to be updated by replacing all white space in the valueUrl value with underscores:

{
    "name": "male",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
    "dc:description": "The state of being male or female",
    "titles": ["male"],
    "propertyUrl": "sdmx-code:sex",
    "valueUrl": "sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"
},

This allows the jinja patterns to be read by the json-ld parser:

>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.load('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=Nac44aa43f124410492df49bdc00fa9ad (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
... 
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef("http://purl.org/linked-data/sdmx/2009/code#{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"))
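The underscore-to-space step described above could look roughly like the following. This is a sketch under assumptions and not the code on the branch: the function name `restore_whitespace` is hypothetical, and the choice to restrict the replacement to text inside Jinja tags is mine.

```python
import re

# Matches Jinja statement tags ({% ... %}) and expression tags ({{ ... }}).
JINJA_TAG = re.compile(r"\{%.*?%\}|\{\{.*?\}\}")


def restore_whitespace(value_url):
    """Turn underscores back into spaces, but only inside Jinja tags."""
    return JINJA_TAG.sub(lambda m: m.group(0).replace("_", " "), value_url)


encoded = "sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"
restored = restore_whitespace(encoded)
print(restored)  # sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}

# Caveat: an underscore that is part of a column name inside a tag is also replaced.
print(restore_whitespace("id/{{_birth_year_}}"))  # id/{{ birth year }}
```

The second call shows why other patterns need testing: a blanket replacement cannot tell a placeholder underscore from one that belongs to a column name.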

I've uploaded the fix as branch issue148. @rijpma @sytzevh could you try the fix please? Could you also check that it doesn't break other jinja patterns? Instead of installing it using pip, clone the branch and call csvw_tool.py directly:

git clone https://github.com/CLARIAH/COW.git
cd COW
git checkout issue148
python ./src/csvw_tool.py build cow_person_example.csv
python ./src/csvw_tool.py convert cow_person_example.csv

@wxwilcke wxwilcke self-assigned this Jan 30, 2023