Finding Vocabularies that Datasets Use
- One click data dump makes it easy to get all RDF from a Prizms node.
- util/p-and-c.sh lists all properties and classes used in an RDF file.
- Linked Open Vocabularies and prefix.cc maintain collections of vocabulary namespaces.
- VoID can be used to describe which vocabularies a dataset uses.
- How csv2rdf4lod asserts void:vocabulary during conversion.
- How csv2rdf4lod-automation discovers void:vocabulary after conversion...
  - ... thanks to Linked Open Vocabularies.
  - ... thanks to prefix.cc.
Every time the Java implementation of csv2rdf4lod asserts a triple, it also records the property it used and, for rdf:type triples, the class it asserted. The class edu.rpi.tw.data.csv.valuehandlers.DefaultValueHandler maintains two Set<URI>s for this, assertedPredicates and assertedClasses. Example Sesame code looks like:
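// Add the triple to the Sesame repository...
repositoryConnection.add(valueR, additionalPredicate, templateFiller.tryExpand(additionalObject));
// ...and remember which predicate was used.
super.assertedPredicates.add(additionalPredicate);
if( additionalPredicate.equals(RDF.TYPE) ) {
   // For rdf:type triples the object is a class, so remember it too.
   super.assertedClasses.add((URI)templateFiller.tryExpand(additionalObject));
}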
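To show the bookkeeping pattern without a Sesame repository, here is a minimal sketch that uses plain strings for URIs and a List in place of the repository. The class AssertionTracker and its add method are hypothetical; they mirror the assertedPredicates/assertedClasses idea rather than reproduce DefaultValueHandler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for DefaultValueHandler's bookkeeping.
// URIs are plain strings and a List stands in for the Sesame repository.
class AssertionTracker {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    final List<String[]> triples = new ArrayList<>();       // each entry is {subject, predicate, object}
    final Set<String> assertedPredicates = new TreeSet<>();
    final Set<String> assertedClasses = new TreeSet<>();

    // Stores the triple, then records its predicate and, for rdf:type, its class.
    void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
        assertedPredicates.add(predicate);
        if (predicate.equals(RDF_TYPE)) {
            assertedClasses.add(object);
        }
    }

    public static void main(String[] args) {
        AssertionTracker tracker = new AssertionTracker();
        String row = "http://example.org/row/1";
        tracker.add(row, RDF_TYPE, "http://example.org/vocab/Company");
        tracker.add(row, "http://purl.org/dc/terms/title", "\"Row 1\"");
        tracker.add(row, "http://purl.org/dc/terms/identifier", "\"1\"");
        System.out.println("predicates: " + tracker.assertedPredicates);
        System.out.println("classes:    " + tracker.assertedClasses);
    }
}
```

Because both collections are Sets, each property or class is recorded once no matter how many rows use it, which is all a per-dataset description needs.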
The Java instrumentation in DefaultValueHandler results in the following metadata in automatic/*.void.ttl:
<http://lofd.tw.rpi.edu/source/datahub-io/dataset/corpwatch/version/2013-Apr-24/conversion/enhancement/1>
a void:Dataset;
conversion:uses_predicate void:inDataset , ov:csvRow ,
ov:subjectDiscriminator , dcterms:isReferencedBy , e1:no_sic ,
rdfs:label , e1:most_recent , e1:year , corpwatch_vocab:company_name ,
dcterms:title , e1:source_type , foaf:page , e1:top_parent_id , e1:min_year ,
e1:num_children , dcterms:isPartOf , skos:broader , con:preferredURI ,
e1:max_year , dcterms:identifier , e1:num_parents , prov:alternateOf ,
datahub-io_vocab:bestLocation ;
void:vocabulary <http://rdfs.org/ns/void#> ,
<http://open.vocab.org/terms/> , <http://purl.org/dc/terms/> ,
<http://lofd.tw.rpi.edu/source/datahub-io/dataset/corpwatch/vocab/enhancement/1/> ,
<http://www.w3.org/2000/01/rdf-schema#> ,
<http://lofd.tw.rpi.edu/source/datahub-io/dataset/corpwatch/vocab/> ,
<http://xmlns.com/foaf/0.1/> , <http://www.w3.org/2004/02/skos/core#> ,
<http://www.w3.org/2000/10/swap/pim/contact#> ,
<http://www.w3.org/ns/prov#> ,
<http://lofd.tw.rpi.edu/source/datahub-io/vocab/> .
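Each void:vocabulary value above is the namespace of one or more conversion:uses_predicate terms. One plausible way to derive them, assumed here rather than taken from csv2rdf4lod, is to cut each term URI just after its last '#' or '/'. The class VocabularyNamespaces and its methods are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: derives vocabulary namespaces from term URIs with a
// "cut after the last '#' or '/'" heuristic (an assumption, not csv2rdf4lod's code).
class VocabularyNamespaces {

    // Returns the namespace of a term URI, or null if it contains no '#' or '/'.
    static String namespaceOf(String termUri) {
        int cut = Math.max(termUri.lastIndexOf('#'), termUri.lastIndexOf('/'));
        return cut < 0 ? null : termUri.substring(0, cut + 1);
    }

    // Collects the distinct namespaces of all given terms, in sorted order.
    static Set<String> vocabularies(Iterable<String> termUris) {
        Set<String> namespaces = new TreeSet<>();
        for (String uri : termUris) {
            String ns = namespaceOf(uri);
            if (ns != null) {
                namespaces.add(ns);
            }
        }
        return namespaces;
    }

    public static void main(String[] args) {
        List<String> terms = List.of(
            "http://rdfs.org/ns/void#inDataset",
            "http://open.vocab.org/terms/csvRow",
            "http://purl.org/dc/terms/title",
            "http://purl.org/dc/terms/identifier",
            "http://www.w3.org/2004/02/skos/core#broader");
        vocabularies(terms).forEach(System.out::println);
    }
}
```

The heuristic fits every namespace listed above, but it would mis-split vocabularies whose terms contain further '/' characters after the namespace, which is one reason to cross-check against a curated namespace list such as LOV.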
LOV updates http://lov.okfn.org/dataset/lov/lov.rdf daily. We can get the vocabulary namespaces by applying the query:
prefix vann: <http://purl.org/vocab/vann/>
prefix voaf: <http://purl.org/vocommons/voaf#>
select distinct ?namespace
where {
   ?vocab a voaf:Vocabulary;
          vann:preferredNamespaceUri ?namespace .
}
tdbquery --loc=manual/lov.rdf.tdb --query=../../src/preferredNamespaceUri.rq --results csv | grep -v "^namespace" | sort -u
produces:
http://aims.fao.org/aos/geopolitical.owl#
http://cms-wg.sti2.org/ns/minimal-service-model#
http://commontag.org/ns#
http://contextus.net/ontology/ontomedia/core/expression#
http://contextus.net/ontology/ontomedia/ext/common/trait#
http://courseware.rkbexplorer.com/ontologies/courseware#
http://creativecommons.org/ns#
http://d-nb.info/standards/elementset/agrelon.owl#
http://d-nb.info/standards/elementset/gnd#
http://data.archiveshub.ac.uk/def/
http://data.lirmm.fr/ontologies/food#
...
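With that namespace list, a term found in a dataset can be attributed to an LOV vocabulary by picking the longest LOV namespace that the term URI starts with. The sketch below assumes this longest-prefix rule (the page does not say how the matching is done) and also mimics the `grep -v "^namespace" | sort -u` post-processing of the tdbquery CSV output. LovMatcher and its methods are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical sketch: attribute term URIs to LOV namespaces by longest prefix.
class LovMatcher {

    // Mimics `grep -v "^namespace" | sort -u` on tdbquery's CSV output:
    // trims each line, drops the header row and blank lines, de-duplicates, and sorts.
    static TreeSet<String> parseNamespaces(List<String> csvLines) {
        TreeSet<String> namespaces = new TreeSet<>();
        for (String line : csvLines) {
            String ns = line.trim();
            if (!ns.isEmpty() && !ns.startsWith("namespace")) {
                namespaces.add(ns);
            }
        }
        return namespaces;
    }

    // Returns the longest namespace that the term URI starts with, if any.
    static Optional<String> vocabularyOf(String termUri, Iterable<String> namespaces) {
        String best = null;
        for (String ns : namespaces) {
            if (termUri.startsWith(ns) && (best == null || ns.length() > best.length())) {
                best = ns;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> csv = List.of("namespace", "http://purl.org/dc/terms/",
            "http://xmlns.com/foaf/0.1/", "http://purl.org/dc/terms/");
        TreeSet<String> lov = parseNamespaces(csv);
        System.out.println(lov);
        System.out.println(vocabularyOf("http://xmlns.com/foaf/0.1/primaryTopic", lov).orElse("(not in LOV)"));
        System.out.println(vocabularyOf("http://example.org/unknown#term", lov).orElse("(not in LOV)"));
    }
}
```

Preferring the longest match matters when one listed namespace is a prefix of another, for example a broad `http://purl.org/dc/` entry next to `http://purl.org/dc/terms/`.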
lofd@lofd:~/prizms/lofd/data/source/us/cr-full-dump/version/latest$ p-and-c.sh publish/lofd-tw-rpi-edu.nt.gz | sort -u
lists all properties and classes used in the One click data dump. The output looks like:
http://inference-web.org/2.0/pml-justification.owl#hasAntecedentList
http://inference-web.org/2.0/pml-justification.owl#hasConclusion
http://inference-web.org/2.0/pml-justification.owl#hasIndex
http://inference-web.org/2.0/pml-justification.owl#hasInferenceEngine
http://inference-web.org/2.0/pml-justification.owl#hasInferenceRule
http://inference-web.org/2.0/pml-justification.owl#hasSourceUsage
http://inference-web.org/2.0/pml-justification.owl#InferenceStep
http://inference-web.org/2.0/pml-justification.owl#isConsequentOf
http://inference-web.org/2.0/pml-justification.owl#NodeSet
http://inference-web.org/2.0/pml-provenance.owl#hasFormat
...
http://openprovenance.org/ontology#cause
http://openprovenance.org/ontology#effect
http://openprovenance.org/ontology#endTime
http://openprovenance.org/ontology#WasControlledBy
http://open.vocab.org/terms/csvCol
http://open.vocab.org/terms/csvHeader
http://open.vocab.org/terms/csvRow
http://provenanceweb.org/ns/pml#TranslationActivity
http://purl.org/dc/terms/author
http://purl.org/dc/terms/contributor
http://purl.org/dc/terms/created
http://purl.org/dc/terms/creator
...
http://www.w3.org/ns/prov#wasGeneratedBy
http://www.w3.org/ns/prov#wasQuotedFrom
http://xmlns.com/foaf/0.1/accountName
http://xmlns.com/foaf/0.1/Agent
http://xmlns.com/foaf/0.1/isPrimaryTopicOf
http://xmlns.com/foaf/0.1/OnlineAccount
http://xmlns.com/foaf/0.1/primaryTopic
http://xmlns.com/foaf/0.1/topic
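The internals of p-and-c.sh are not shown here. As a sketch of the same idea under stated assumptions, the hypothetical class PropertiesAndClasses below reads N-Triples (gunzipping files whose names end in .gz, like publish/lofd-tw-rpi-edu.nt.gz) and prints every predicate plus every IRI object of rdf:type, sorted and de-duplicated. Its line parser is deliberately naive: it assumes terms are separated by whitespace, as most N-Triples writers emit them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch of what p-and-c.sh reports; not its actual implementation.
class PropertiesAndClasses {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Collects predicate IRIs and rdf:type object IRIs from N-Triples text.
    // Subjects and predicates are whitespace-free tokens in N-Triples, so a
    // three-way whitespace split isolates them; the object is the next token.
    static Set<String> scan(Reader in) {
        Set<String> terms = new TreeSet<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue;  // blank or comment
                String[] parts = line.split("\\s+", 3);
                if (parts.length < 3 || !parts[1].startsWith("<")) continue;
                String predicate = iri(parts[1]);
                terms.add(predicate);
                String object = parts[2].split("\\s+")[0];
                if (predicate.equals(RDF_TYPE) && object.startsWith("<")) {
                    terms.add(iri(object));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    // Returns the text between '<' and the first '>' (tolerates a trailing '.').
    static String iri(String token) {
        return token.substring(1, token.indexOf('>'));
    }

    // Opens a file as UTF-8 text, gunzipping it if the name ends in ".gz".
    static Reader open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        InputStream in = path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Reader in = args.length > 0 ? open(Path.of(args[0])) : new StringReader(
            "<http://ex.org/a> <" + RDF_TYPE + "> <http://xmlns.com/foaf/0.1/Agent> .\n"
            + "<http://ex.org/a> <http://xmlns.com/foaf/0.1/accountName> \"a b\" .\n");
        try (in) {
            scan(in).forEach(System.out::println);
        }
    }
}
```

Like `p-and-c.sh ... | sort -u`, the result mixes properties and classes in one sorted list; a real tool would use a proper N-Triples parser rather than whitespace splitting.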
This update to bin/secondary/cr-linksets.sh implements the discovery of void:vocabulary after conversion. cr-linksets.sh is invoked by cr-cron.sh (see Automation on the twc-healthdata project).