Dataset: dbpedia-live (development version)

DBpedia Live was a real-time service that monitored edits on Wikipedia and published a stream of changes to the DBpedia knowledge graph. This dataset contains only the "added" triples from that stream; deletions and other types of changes are not included. It currently covers only one month (January 2014), but it can easily be expanded in the future; the service itself stopped functioning in 2021. The stream's elements are irregular in size, depending on the volume of traffic on Wikipedia at a given moment and on how well the DBpedia Live service was able to cope with it. See also the paper.

The dataset was extensively cleaned to fix or remove bad IRIs, bad Unicode, and invalid literals.

This README is a snapshot of documentation for the latest development version of the dataset. Full documentation for all versions can be found on the website.

General information

Technical metadata

  • Has stream type usage:
    • RDF stream type usage (1)
      • Type: RDF stream type usage (stax:RdfStreamTypeUsage)
      • Comment: The dataset can be viewed as a stream of graphs corresponding to batches of updates of Wikipedia articles. (en)
      • Has stream type: RDF graph stream (stax:graphStream)
    • RDF stream type usage (2)
  • Has stream element count: 166,204
  • Has stream element split:
    • Type: Stream elements split by time (rb:TimeStreamElementSplit)
    • Comment: Each element corresponds to a batch of recent changes from Wikipedia. The size of the batch may have been influenced by the traffic on Wikipedia, the load on the system, and other factors, so the element sizes are irregular. (en)
    • Has temporal property: http://dbpedia.org/ontology/wikiPageExtracted
  • Uses vocabulary: http://dbpedia.org/ontology/
  • Conforms to W3C RDF 1.1 specification: yes
  • Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
  • Uses generalized triples: no
  • Uses generalized RDF datasets: no
  • Uses RDF-star: no
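
The time-based split described above can be illustrated with a short Python sketch that reads the temporal property from a single stream element. This is only an illustration under stated assumptions, not part of the dataset tooling: the element is assumed to be already parsed into `(subject, predicate, object)` string tuples, the literal is assumed to be an ISO 8601 date-time, and the helper name `element_timestamp` is hypothetical. Choosing the latest value when an element carries several is also an assumption made here for simplicity.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# A triple as plain strings: (subject, predicate, object).
# Assumption: the element was already parsed from its serialization.
Triple = Tuple[str, str, str]

# Temporal property listed in the technical metadata.
TEMPORAL_PROPERTY = "http://dbpedia.org/ontology/wikiPageExtracted"


def element_timestamp(element: Iterable[Triple]) -> Optional[datetime]:
    """Return the latest wikiPageExtracted value found in one stream element.

    Hypothetical helper: returns None when the element has no such triple.
    """
    stamps = []
    for _subject, predicate, obj in element:
        if predicate == TEMPORAL_PROPERTY:
            # Normalize a trailing "Z" so fromisoformat accepts it
            # on Python versions older than 3.11.
            stamps.append(datetime.fromisoformat(obj.replace("Z", "+00:00")))
    return max(stamps) if stamps else None


# Made-up example element (not taken from the dataset).
example_element = [
    ("http://dbpedia.org/resource/A", TEMPORAL_PROPERTY, "2014-01-01T00:00:05Z"),
    ("http://dbpedia.org/resource/A", "http://dbpedia.org/ontology/abstract", "text"),
    ("http://dbpedia.org/resource/B", TEMPORAL_PROPERTY, "2014-01-01T00:00:09Z"),
]
print(element_timestamp(example_element))
```

Because element sizes are irregular, a helper like this is mainly useful for replaying the stream at its original pace or for checking how much time separates consecutive elements.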

Distributions

Full stream distribution

Full Jelly distribution

Full flat distribution

100K elements stream distribution

100K elements Jelly distribution

100K elements flat distribution

10K elements stream distribution

10K elements Jelly distribution

10K elements flat distribution