Dynamic resource count per UUID entry based on file size and configurable bucket size #7
Made bucket size configurable
Overall this looks good. I have one issue with the extraction of resources, for which I have added a suggestion so that the original pipeline would still produce the same result.
If you have another suggestion, feel free to add :)
There are many options for us both to get what we want.
EventSource/index.ts
Outdated
for (const subject of time_subjects) {
    // add observation to resource
    let quads = store.getQuads(subject, null, null, null);

    // add featureOfInterest to resource
    const feats = store.getQuads(subject, 'http://www.w3.org/ns/sosa/hasFeatureOfInterest', null, null);
    feats.forEach((interest) => {
        quads = quads.concat(
            store.getQuads(interest.object, null, null, null)
        );
    });

    // add result to resource
    const results = store.getQuads(subject, 'http://www.w3.org/ns/sosa/hasResult', null, null);
    results.forEach((res) => {
        quads = quads.concat(
            store.getQuads(res.object, null, null, null)
        );
    });

    // add observed property (the location) to resource
    const location = store.getQuads(subject, 'http://www.w3.org/ns/sosa/observedProperty', null, null);
    location.forEach((loc) => {
        quads = quads.concat(
            store.getQuads(loc.object, null, null, null)
        );
    });

    // add sensor to resource
    const sensor = store.getQuads(subject, 'http://www.w3.org/ns/sosa/madeBySensor', null, null);
    sensor.forEach((sens) => {
        // we don't want to show all the observations the sensor made in every resource, only the one that matters
        quads.push(store.getQuads(sens.object, 'http://www.w3.org/ns/sosa/madeObservation', subject, null)[0]);
        // take all sensor quads and filter out the madeObservation quads
        const all_sens = store.getQuads(sens.object, null, null, null);
        const diff = all_sens.filter(x => x.predicate.value !== 'http://www.w3.org/ns/sosa/madeObservation');
        quads = quads.concat(diff);

        // add platform to resource
        const platform = store.getQuads(sens.object, 'http://www.w3.org/ns/sosa/isHostedBy', null, null);
        platform.forEach((plat) => {
            quads = quads.concat(
                store.getQuads(plat.object, null, null, null)
            );
        });
    });

    resources.push(quads);
}
I see that you do not need this part of the index.ts script. However, the original pipeline needs it to extract the full resource of the location model.
To be more concrete: without that bit of code, I would receive only this per resource:
<http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> dct:isVersionOf ex:location ;
rdf:type sosa:Observation ;
sosa:hasFeatureOfInterest <https://data.knows.idlab.ugent.be/person/woslabbi/#me> ;
sosa:hasResult <http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> ;
sosa:hasSimpleResult "POINT(3.621189000 50.962510000)"^^geo:wktLiteral ;
sosa:madeBySensor <http://sensor.be> ;
sosa:observedProperty <http://location.example.com/location> ;
sosa:resultTime "2022-08-07T08:14:04Z"^^xsd:dateTime .
<http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> rdf:type sosa:Result ;
wgs:elevation "7.1" ;
wgs:latitude "50.962510000" ;
wgs:longitude "3.621189000" ;
<https://w3id.org/transportmode#transportMode> <https://w3id.org/transportmode#Walking> .
With this piece of code, each resource contains more information:
<http://device.be> rdf:type sosa:Platform ;
sosa:hosts <http://sensor.be> .
<http://location.example.com/location> rdf:type sosa:observedProperty ;
rdfs:comment "The Geographic location observed by a sensor."@en ;
rdfs:label "Location"@en .
<http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> dct:isVersionOf ex:location ;
rdf:type sosa:Observation ;
sosa:hasFeatureOfInterest <https://data.knows.idlab.ugent.be/person/woslabbi/#me> ;
sosa:hasResult <http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> ;
sosa:hasSimpleResult "POINT(3.621189000 50.962510000)"^^geo:wktLiteral ;
sosa:madeBySensor <http://sensor.be> ;
sosa:observedProperty <http://location.example.com/location> ;
sosa:resultTime "2022-08-07T08:14:04Z"^^xsd:dateTime .
<http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> rdf:type sosa:Result ;
wgs:elevation "7.1" ;
wgs:latitude "50.962510000" ;
wgs:longitude "3.621189000" ;
<https://w3id.org/transportmode#transportMode> <https://w3id.org/transportmode#Walking> .
<http://sensor.be> rdf:type sosa:Sensor ;
sosa:isHostedBy <http://device.be> ;
sosa:madeObservation <http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> ;
sosa:observes <http://location.example.com/location> .
<https://data.knows.idlab.ugent.be/person/woslabbi/#me> rdf:type sosa:FeatureOfInterest .
As a suggestion, the above code could be placed in a utility function, extractLocationResource.
The default behaviour would still be to call that function.
In your case, you are only interested in samples per subject, so you can extract the resource on a per-subject basis (which could also be made configurable).
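A minimal sketch of that suggestion, under stated assumptions: the `Term`, `Quad`, and `ArrayStore` types are stand-ins so the snippet runs without dependencies (the real code uses the store from index.ts), and `ResourceExtractor`, `extractSubjectResource`, and `extractResources` are hypothetical names. Only `extractLocationResource` comes from the comment above, and its body here is abridged to the three direct links.

```typescript
// Stand-in types (assumptions); the real pipeline uses an RDF quad store.
interface Term { value: string }
interface Quad { subject: Term; predicate: Term; object: Term }

// Tiny in-memory store; null acts as a wildcard, as in the loop above.
class ArrayStore {
  private readonly quads: Quad[];
  constructor(quads: Quad[]) { this.quads = quads; }
  getQuads(subject: string | null, predicate: string | null, object: string | null, _graph: null): Quad[] {
    return this.quads.filter(q =>
      (subject === null || q.subject.value === subject) &&
      (predicate === null || q.predicate.value === predicate) &&
      (object === null || q.object.value === object));
  }
}

const quad = (s: string, p: string, o: string): Quad =>
  ({ subject: { value: s }, predicate: { value: p }, object: { value: o } });

const SOSA = 'http://www.w3.org/ns/sosa/';

// Hypothetical extractor signature: one subject in, one resource (quad list) out.
type ResourceExtractor = (store: ArrayStore, subject: string) => Quad[];

// Subject-based extraction: only the quads of the subject itself.
const extractSubjectResource: ResourceExtractor = (store, subject) =>
  store.getQuads(subject, null, null, null);

// Abridged location-model extraction: the subject plus the quads of its
// feature of interest, result, and observed property. The sensor and
// platform handling from the original loop is omitted for brevity.
const extractLocationResource: ResourceExtractor = (store, subject) => {
  let quads = store.getQuads(subject, null, null, null);
  for (const name of ['hasFeatureOfInterest', 'hasResult', 'observedProperty']) {
    for (const link of store.getQuads(subject, SOSA + name, null, null)) {
      quads = quads.concat(store.getQuads(link.object.value, null, null, null));
    }
  }
  return quads;
};

// The location extractor stays the default, so the original pipeline is unchanged.
function extractResources(
  store: ArrayStore,
  subjects: string[],
  extract: ResourceExtractor = extractLocationResource,
): Quad[][] {
  return subjects.map(subject => extract(store, subject));
}
```

Calling `extractResources(store, subjects)` keeps the current behaviour, while passing `extractSubjectResource` as the third argument gives the per-subject samples.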
I find it strange that the recursive implementation didn't add the triples with http://sensor.be (and, recursively, http://device.be and http://location.example.com/location) as a subject. I'll play around with it some more to find out how this happened and see if I can get it to handle this properly as well. However, the triple with subject https://data.knows.idlab.ugent.be/person/woslabbi/#me would indeed not be added with this approach, so a separate function (called by default when no additional arguments are present) would indeed be required.
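For reference, a recursive extraction of this kind can be sketched as follows. This is not the implementation from the pull request, only an illustration under assumptions: `Term`, `Quad`, `ArrayStore`, `extractRecursive`, `subjectsOf`, and the `backlinkPredicates` parameter are all hypothetical. Objects are followed as subjects, a visited set stops cycles such as sensor and platform pointing at each other, and quads with a back-link predicate (for example sosa:madeObservation) are kept only when they point at a subject that is already part of the resource.

```typescript
// Stand-in types (assumptions); termType mirrors the RDF/JS distinction
// between IRIs and literals so that literals are never followed.
interface Term { value: string; termType: 'NamedNode' | 'Literal' }
interface Quad { subject: Term; predicate: Term; object: Term }

class ArrayStore {
  private readonly quads: Quad[];
  constructor(quads: Quad[]) { this.quads = quads; }
  // Returns all quads with the given subject IRI.
  bySubject(subject: string): Quad[] {
    return this.quads.filter(q => q.subject.value === subject);
  }
}

const iri = (value: string): Term => ({ value, termType: 'NamedNode' });
const lit = (value: string): Term => ({ value, termType: 'Literal' });

// Collects every quad reachable from `subject` by following objects as subjects.
function extractRecursive(
  store: ArrayStore,
  subject: string,
  backlinkPredicates: Set<string> = new Set<string>(),
): Quad[] {
  const visited = new Set<string>();
  const result: Quad[] = [];
  const visit = (current: string): void => {
    if (visited.has(current)) return; // guards against cycles
    visited.add(current);
    for (const q of store.bySubject(current)) {
      if (backlinkPredicates.has(q.predicate.value)) {
        // Keep a back-link only if it points into the resource; never follow it.
        if (visited.has(q.object.value)) result.push(q);
        continue;
      }
      result.push(q);
      if (q.object.termType !== 'Literal') visit(q.object.value);
    }
  };
  visit(subject);
  return result;
}

// Lists the distinct subjects contained in one resource.
function subjectsOf(resource: Quad[]): Set<string> {
  return new Set(resource.map(q => q.subject.value));
}
```

Without the back-link rule, following sosa:madeObservation from the sensor would pull every other observation of that sensor into the resource, which the original loop explicitly avoids.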
I have created a fix for another (related) issue, but I am unable to replicate your specific results. When I create a TTL file from the example data you gave above, the data is parsed into a single resource, which contains the following subjects (and all their data):
Set(6) {
'http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z',
'http://location.example.com/location',
'https://data.knows.idlab.ugent.be/person/woslabbi/#me',
'http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z',
'http://sensor.be',
'http://device.be'
}
Could you maybe provide an example .ttl (or .nt) file so I can see if that helps in replicating the issue?
PS: I see the subject 'https://data.knows.idlab.ugent.be/person/woslabbi/#me' is referenced in the original measurement as well, so a separate function might not be required after all (if I can figure out how it went wrong on your end).
Nice Work
With these changes, it should be possible to group multiple resources into a single UUID, based on a target file size per resource. The bucket size is now more dynamic as well.
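One way such grouping could work is sketched below. It is an illustration only, not the logic from this pull request: the function names `resourcesPerEntry` and `chunk`, the use of the average serialized size, and the byte units are all assumptions.

```typescript
// Hypothetical: derive how many resources fit in one UUID entry from the
// average serialized size of a resource and a target size per entry.
function resourcesPerEntry(resourceSizes: number[], targetBytes: number): number {
  if (resourceSizes.length === 0) return 1;
  const total = resourceSizes.reduce((sum, size) => sum + size, 0);
  const average = total / resourceSizes.length;
  if (average <= 0) return 1; // avoid dividing by zero for empty resources
  // Always place at least one resource in an entry.
  return Math.max(1, Math.floor(targetBytes / average));
}

// Splits items into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}
```

With five resources of 100 bytes each and a target of 250 bytes, `resourcesPerEntry` yields 2, so `chunk` produces groups of two, two, and one resource.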