Aggregating metadata from portals across Belgium
Various local, regional and federal portals ...
- powered by different software products
- publishing metadata in different languages
- using different themes and keywords
- Scrapers harvest metadata from various portals
- Enhancers clean and transform the metadata
- Upload tool sends the metadata to data.gov.be
- Metadata is also published on GitHub
+++
In a perfect world, everyone would use the same format to exchange data. The world isn't perfect.
+++
- Written in Java, open source
- Command-line, runs almost everywhere
- Custom parser per scraped site
- Luckily some portals are standardising on DCAT-AP
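The "custom parser per scraped site" design can be sketched as follows. This is a minimal illustration with hypothetical portal names, not the actual data.gov.be scraper code: each portal gets its own parser, but all parsers produce the same intermediate output for the enhancers.

```java
import java.util.Map;

// Sketch of a parser-per-site registry (names and behaviour illustrative).
public class ScraperRegistry {

    interface Scraper {
        String harvest(String rawPage);
    }

    // One entry per portal; real parsers would walk HTML, CSV, JSON ...
    static final Map<String, Scraper> SCRAPERS = Map.of(
        "portal-a", raw -> "triples parsed from portal A",
        "portal-b", raw -> "triples parsed from portal B"
    );

    public static String scrape(String site, String rawPage) {
        Scraper s = SCRAPERS.get(site);
        if (s == null) {
            throw new IllegalArgumentException("no parser for " + site);
        }
        return s.harvest(rawPage);
    }
}
```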
+++
- Cleaning and enriching the scraped metadata
- Various SPARQL queries for small corrections
- Mapping keywords and themes to EU ODP themes
- Using (manually created) SKOS files
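A correction query of this kind might look like the one below. It is an illustrative SPARQL update, not one of the actual queries: the keyword is a made-up example, while the theme URI follows the EU Publications Office data-theme authority table.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Map datasets tagged with a local (Dutch) keyword to the matching
# EU ODP theme. Keyword and mapping are illustrative.
INSERT { ?dataset dcat:theme
             <http://publications.europa.eu/resource/authority/data-theme/ENVI> }
WHERE  { ?dataset a dcat:Dataset ;
                  dcat:keyword "milieu"@nl }
```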
+++
- Translates DCAT-AP into Drupal JSON
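The JSON sent to Drupal might look roughly like this. Field names are illustrative of a Drupal 7 REST payload, not the exact data.gov.be schema:

```json
{
  "type": "dataset",
  "title": "Air quality measurements",
  "language": "en",
  "body": {
    "value": "Hourly measurements per monitoring station"
  },
  "field_link": {
    "url": "https://example.org/files/air-quality.csv"
  }
}
```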
+++
- Drupal 7 website, based on OpenFed
- Additional modules: RestWS and RestWS i18n
- Allows updates via JSON REST API
- Services module too heavy / overkill
- Wasn't fully DCAT-AP ready in 2015
- We already have 100+ Drupal websites
- Doesn't solve harvesting metadata from non-CKAN sites
- The code of the EU ODP wasn't available yet in 2015
- Maintenance: many components, overkill for our purposes
- The data is sent to the ODP anyway
+++
- Metadata exchange format (RDF)
- Titles, descriptions, download links ...
- Application Profile of W3C DCAT
- Promoted by JoinUp.eu / European Commission
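A minimal DCAT-AP record, in Turtle, could look like this (URIs and values are examples, not real data.gov.be entries):

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://example.org/dataset/air-quality>
    a dcat:Dataset ;
    dcterms:title "Air quality measurements"@en ;
    dcterms:description "Hourly measurements per monitoring station"@en ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <https://example.org/files/air-quality.csv>
    ] .
```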
+++
Questions?