Dynamic resource count per UUID entry based on file size and configurable bucket size #7

Merged
merged 3 commits into from
Oct 12, 2022


TomWindels
Contributor

With these changes, it should be possible to group multiple resources into a single UUID, based on a target file size per resource. The bucket size is now more dynamic as well.
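As a rough illustration of size-based grouping (not the code from this PR), the sketch below packs resources into groups until a target size would be exceeded. The names `SizedResource`, `groupBySize` and `targetBytes` are hypothetical, and the sizes are assumed to be measured on the serialized resource.

```typescript
// Hypothetical model: a resource with an identifier and its serialized size.
interface SizedResource {
  id: string;
  sizeInBytes: number; // assumed to be measured after serialization
}

// Packs consecutive resources into groups whose total size stays at or below
// targetBytes. A single resource larger than the target gets its own group.
function groupBySize(resources: SizedResource[], targetBytes: number): SizedResource[][] {
  if (targetBytes <= 0) {
    throw new RangeError("targetBytes must be positive");
  }
  const groups: SizedResource[][] = [];
  let current: SizedResource[] = [];
  let currentSize = 0;

  for (const resource of resources) {
    // Close the current group when this resource would push it over the target.
    if (current.length > 0 && currentSize + resource.sizeInBytes > targetBytes) {
      groups.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(resource);
    currentSize += resource.sizeInBytes;
  }
  if (current.length > 0) {
    groups.push(current);
  }
  return groups;
}

const exampleGroups = groupBySize(
  [
    { id: "a", sizeInBytes: 400 },
    { id: "b", sizeInBytes: 400 },
    { id: "c", sizeInBytes: 400 },
    { id: "d", sizeInBytes: 1500 },
  ],
  1000
);
// Prints [["a","b"],["c"],["d"]]
console.log(JSON.stringify(exampleGroups.map((g) => g.map((r) => r.id))));
```

Each resulting group could then be stored under one UUID entry; how the actual implementation measures size and assigns UUIDs is not shown here.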

Owner

@woutslabbinck woutslabbinck left a comment


Overall seems good. I have one issue about the extraction of resources, for which I have added a suggestion so that the original pipeline would still produce the same result.

If you have another suggestion, feel free to add :)
There are many options for us both to get what we want.

Comment on lines 48 to 96

for (const subject of time_subjects) {
    // add observation to resource
    let quads = store.getQuads(subject, null, null, null);

    // add featureOfInterest to resource
    const feats = store.getQuads(subject, 'http://www.w3.org/ns/sosa/hasFeatureOfInterest', null, null);
    feats.forEach((interest) => {
        quads = quads.concat(
            store.getQuads(interest.object, null, null, null)
        );
    });

    // add result to resource
    const results = store.getQuads(subject, 'http://www.w3.org/ns/sosa/hasResult', null, null);
    results.forEach((res) => {
        quads = quads.concat(
            store.getQuads(res.object, null, null, null)
        );
    });

    // add observed property (the location) to resource
    const location = store.getQuads(subject, 'http://www.w3.org/ns/sosa/observedProperty', null, null);
    location.forEach((loc) => {
        quads = quads.concat(
            store.getQuads(loc.object, null, null, null)
        );
    });

    // add sensor to resource
    const sensor = store.getQuads(subject, 'http://www.w3.org/ns/sosa/madeBySensor', null, null);
    sensor.forEach((sens) => {
        // we don't want to show every observation the sensor made in every resource, only the one that matters
        quads.push(store.getQuads(sens.object, 'http://www.w3.org/ns/sosa/madeObservation', subject, null)[0]);
        // take all quads of the sensor and filter out the madeObservation quads
        const all_sens = store.getQuads(sens.object, null, null, null);
        const diff = all_sens.filter(x => x.predicate.value !== 'http://www.w3.org/ns/sosa/madeObservation');
        quads = quads.concat(diff);

        // add platform to resource
        const platform = store.getQuads(sens.object, 'http://www.w3.org/ns/sosa/isHostedBy', null, null);
        platform.forEach((plat) => {
            quads = quads.concat(
                store.getQuads(plat.object, null, null, null)
            );
        });
    });

    resources.push(quads)
Owner


I see this part is something you do not need from the index.ts script. However, for the original pipeline it is necessary for the extraction of the full resource of the location model.

To be more concrete:
Without that bit of code, I would receive this per resource:

<http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> dct:isVersionOf ex:location ;
    rdf:type sosa:Observation ;
    sosa:hasFeatureOfInterest <https://data.knows.idlab.ugent.be/person/woslabbi/#me> ;
    sosa:hasResult <http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> ;
    sosa:hasSimpleResult "POINT(3.621189000 50.962510000)"^^geo:wktLiteral ;
    sosa:madeBySensor <http://sensor.be> ;
    sosa:observedProperty <http://location.example.com/location> ;
    sosa:resultTime "2022-08-07T08:14:04Z"^^xsd:dateTime .

<http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> rdf:type sosa:Result ;
    wgs:elevation "7.1" ;
    wgs:latitude "50.962510000" ;
    wgs:longitude "3.621189000" ;
    <https://w3id.org/transportmode#transportMode> <https://w3id.org/transportmode#Walking> .

While with this piece of code I receive more information:

<http://device.be> rdf:type sosa:Platform ;
    sosa:hosts <http://sensor.be> .

<http://location.example.com/location> rdf:type sosa:observedProperty ;
    rdfs:comment "The Geographic location observed by a sensor."@en ;
    rdfs:label "Location"@en .

<http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> dct:isVersionOf ex:location ;
    rdf:type sosa:Observation ;
    sosa:hasFeatureOfInterest <https://data.knows.idlab.ugent.be/person/woslabbi/#me> ;
    sosa:hasResult <http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> ;
    sosa:hasSimpleResult "POINT(3.621189000 50.962510000)"^^geo:wktLiteral ;
    sosa:madeBySensor <http://sensor.be> ;
    sosa:observedProperty <http://location.example.com/location> ;
    sosa:resultTime "2022-08-07T08:14:04Z"^^xsd:dateTime .

<http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z> rdf:type sosa:Result ;
    wgs:elevation "7.1" ;
    wgs:latitude "50.962510000" ;
    wgs:longitude "3.621189000" ;
    <https://w3id.org/transportmode#transportMode> <https://w3id.org/transportmode#Walking> .

<http://sensor.be> rdf:type sosa:Sensor ;
    sosa:isHostedBy <http://device.be> ;
    sosa:madeObservation <http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z> ;
    sosa:observes <http://location.example.com/location> .

<https://data.knows.idlab.ugent.be/person/woslabbi/#me> rdf:type sosa:FeatureOfInterest .

As a suggestion, the above code could be placed in a utility function extractLocationResource.
The default behaviour would still be to call that function.
In your case, you are only interested in samples per subject, so you can extract the resource on a per-subject basis (which can also be made configurable).
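A minimal sketch of how that split could look, assuming a simplified in-memory quad model rather than the actual store API. Only the name extractLocationResource comes from the suggestion above; `Quad`, `Extractor`, `extractSubjectOnly` and `extractResources` are hypothetical names used for illustration.

```typescript
// Simplified stand-in for an RDF quad; terms are plain strings here.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
}

const SOSA = "http://www.w3.org/ns/sosa/";

// An extractor builds one resource (a list of quads) for one subject.
type Extractor = (quads: Quad[], subject: string) => Quad[];

// Per-subject extraction: only the quads that have the given subject.
const extractSubjectOnly: Extractor = (quads, subject) =>
  quads.filter((q) => q.subject === subject);

// Default extraction: the observation plus the entities it links to.
const extractLocationResource: Extractor = (quads, subject) => {
  const bySubject = (s: string): Quad[] => quads.filter((q) => q.subject === s);
  const linked = (s: string, localName: string): string[] =>
    bySubject(s)
      .filter((q) => q.predicate === SOSA + localName)
      .map((q) => q.object);

  let result = bySubject(subject);
  for (const localName of ["hasFeatureOfInterest", "hasResult", "observedProperty"]) {
    for (const target of linked(subject, localName)) {
      result = result.concat(bySubject(target));
    }
  }
  for (const sensor of linked(subject, "madeBySensor")) {
    // Keep the sensor's quads, but only the madeObservation link to this subject.
    result = result.concat(
      bySubject(sensor).filter(
        (q) => q.predicate !== SOSA + "madeObservation" || q.object === subject
      )
    );
    for (const platform of linked(sensor, "isHostedBy")) {
      result = result.concat(bySubject(platform));
    }
  }
  return result;
};

// The extractor is configurable; the full location resource is the default.
function extractResources(
  quads: Quad[],
  subjects: string[],
  extractor: Extractor = extractLocationResource
): Quad[][] {
  return subjects.map((subject) => extractor(quads, subject));
}
```

With this shape, the original pipeline keeps calling `extractResources(quads, subjects)`, while a caller interested only in per-subject samples passes `extractSubjectOnly` as the third argument.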

Contributor Author


I find it strange that the recursive implementation didn't add the triples with http://sensor.be (and, recursively, http://device.be and http://location.example.com/location) as a subject. I'll play around with it some more to find out how this happened and see if I can resolve it properly. However, the triple with subject https://data.knows.idlab.ugent.be/person/woslabbi/#me would indeed not be added with this approach, so a separate function (called by default when no additional arguments are present) would be required.

Contributor Author

@TomWindels TomWindels Oct 12, 2022


I have created a fix for another (related) issue, but I am unable to replicate your specific results. Using the data you gave as an example above and creating a TTL file from it, the data gets parsed into a single resource, which contains the following subjects (and all their data):
Set(6) {
  'http://location.example.com/tracks/observation/2022-08-07T08%3A14%3A04Z',
  'http://location.example.com/location',
  'https://data.knows.idlab.ugent.be/person/woslabbi/#me',
  'http://location.example.com/tracks/observation/result/2022-08-07T08%3A14%3A04Z',
  'http://sensor.be',
  'http://device.be'
}
Could you maybe provide an example .ttl (or .nt) file so I can see if that helps in replicating the issue?
PS: I see the subject 'https://data.knows.idlab.ugent.be/person/woslabbi/#me' is referenced in the original measurement as well, so a separate function might not be required after all (if I can figure out how it went wrong on your end).
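For readers following along, a recursive extraction of this kind can be sketched as a graph traversal: start at the observation and keep following objects that also occur as subjects. This is an assumed reconstruction, not the code from the PR; `Triple`, `reachableSubjects` and the shortened example data are hypothetical, and terms are modelled as plain strings.

```typescript
// Simplified stand-in for an RDF triple; terms are plain strings here.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Collects every subject reachable from `start` by following objects
// that themselves occur as a subject somewhere in the data.
function reachableSubjects(triples: Triple[], start: string): Set<string> {
  const bySubject = new Map<string, Triple[]>();
  for (const t of triples) {
    const list = bySubject.get(t.subject) ?? [];
    list.push(t);
    bySubject.set(t.subject, list);
  }

  const visited = new Set<string>();
  const stack: string[] = [start];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    // Skip terms already handled and terms that never occur as a subject
    // (literals, classes, IRIs without data of their own).
    if (visited.has(current) || !bySubject.has(current)) {
      continue;
    }
    visited.add(current);
    for (const t of bySubject.get(current) as Triple[]) {
      stack.push(t.object);
    }
  }
  return visited;
}

// Shortened version of the example data from this thread.
const exampleTriples: Triple[] = [
  { subject: "obs", predicate: "dct:isVersionOf", object: "ex:location" },
  { subject: "obs", predicate: "sosa:hasFeatureOfInterest", object: "me" },
  { subject: "obs", predicate: "sosa:hasResult", object: "result" },
  { subject: "obs", predicate: "sosa:madeBySensor", object: "sensor" },
  { subject: "obs", predicate: "sosa:observedProperty", object: "location" },
  { subject: "result", predicate: "wgs:latitude", object: "50.962510000" },
  { subject: "sensor", predicate: "sosa:isHostedBy", object: "device" },
  { subject: "sensor", predicate: "sosa:madeObservation", object: "obs" },
  { subject: "sensor", predicate: "sosa:observes", object: "location" },
  { subject: "device", predicate: "sosa:hosts", object: "sensor" },
  { subject: "location", predicate: "rdfs:label", object: "Location" },
  { subject: "me", predicate: "rdf:type", object: "sosa:FeatureOfInterest" },
];

// Prints 6: obs, me, result, sensor, location and device are all reached.
console.log(reachableSubjects(exampleTriples, "obs").size);
```

One caveat of a plain traversal like this: if the sensor lists several observations through sosa:madeObservation, all of them are pulled into the same resource, which is the case the hand-written loop earlier in the thread filters out.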

@woutslabbinck woutslabbinck linked an issue Oct 12, 2022 that may be closed by this pull request
Owner

@woutslabbinck woutslabbinck left a comment


Nice Work

@woutslabbinck woutslabbinck merged commit b849465 into woutslabbinck:main Oct 12, 2022
Development

Successfully merging this pull request may close these issues.

Make bucketSize configurable
2 participants