RiverBench/dataset-dbpedia-live

Dataset: dbpedia-live (development version)

DBpedia Live was a real-time service that monitored edits on Wikipedia and published a stream of changes to the DBpedia knowledge graph. This dataset contains only the "added" triples in the stream; it does not include deletions or other types of changes. Only one month (January 2014) is covered at the moment, but the dataset can easily be expanded in the future; the service itself stopped functioning in 2021. The stream's elements are irregular in size, depending on the volume of traffic on Wikipedia at a given moment and how well the DBpedia Live service was able to cope with it. See also the paper.

The dataset was extensively cleaned to fix or remove bad IRIs, bad Unicode, and invalid literals.

This README is a snapshot of documentation for the latest development version of the dataset. Full documentation for all versions can be found on the website.

General information

Technical metadata

  • Has stream type usage:
    • RDF stream type usage (1)
      • Type: RDF stream type usage (stax:RdfStreamTypeUsage)
      • Comment: The dataset can be viewed as a stream of graphs corresponding to batches of updates of Wikipedia articles. (en)
      • Has stream type: RDF graph stream (stax:graphStream)
    • RDF stream type usage (2)
  • Has stream element count: 166,204
  • Has stream element split:
    • Type: Stream elements split by time (rb:TimeStreamElementSplit)
    • Comment: Each element corresponds to a batch of recent changes from Wikipedia. The size of the batch may have been influenced by the traffic on Wikipedia, the load on the system, and other factors, so the element sizes are irregular. (en)
    • Has temporal property: http://dbpedia.org/ontology/wikiPageExtracted
  • Uses vocabulary: http://dbpedia.org/ontology/
  • Conforms to W3C RDF 1.1 specification: yes
  • Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
  • Uses generalized triples: no
  • Uses generalized RDF datasets: no
  • Uses RDF-star: no
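
As an illustration of the time-based split described above, the following is a minimal sketch of reading the temporal property from one stream element. It assumes the element has already been parsed into (subject, predicate, object) string tuples and that the `wikiPageExtracted` values are ISO 8601 timestamps. The helper name `element_timestamps` and the sample triples are hypothetical and are not taken from the dataset.

```python
from datetime import datetime

# Temporal property named in the dataset metadata.
WIKI_PAGE_EXTRACTED = "http://dbpedia.org/ontology/wikiPageExtracted"


def element_timestamps(element):
    """Return the sorted extraction timestamps found in one stream element.

    `element` is an iterable of (subject, predicate, object) string tuples.
    This representation is an assumption made for the sketch.
    """
    stamps = [
        datetime.fromisoformat(obj)
        for _subj, pred, obj in element
        if pred == WIKI_PAGE_EXTRACTED
    ]
    return sorted(stamps)


# Hypothetical stream element: one batch of "added" triples.
sample_element = [
    ("http://dbpedia.org/resource/Example_A", WIKI_PAGE_EXTRACTED,
     "2014-01-01T00:00:05+00:00"),
    ("http://dbpedia.org/resource/Example_A",
     "http://www.w3.org/2000/01/rdf-schema#label", "Example A"),
    ("http://dbpedia.org/resource/Example_B", WIKI_PAGE_EXTRACTED,
     "2014-01-01T00:00:02+00:00"),
]

stamps = element_timestamps(sample_element)
# Earliest and latest extraction time within the batch.
print(stamps[0].isoformat(), stamps[-1].isoformat())
```

Because element sizes are irregular, a single batch may span anywhere from one to many extraction timestamps, which is why the sketch returns all of them instead of a single value.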

Distributions

Full stream distribution

Full Jelly distribution

Full flat distribution

100K elements stream distribution

100K elements Jelly distribution

100K elements flat distribution

10K elements stream distribution

10K elements Jelly distribution

10K elements flat distribution