Test serialization of GND RDF-XML to compact JSON-LD #1
Comments
To test the quality of the JSON-LD output, take a look at entities with geo coordinates (which are added via a bnode), for example http://d-nb.info/gnd/4074335-4 (ttl). See the issue at lobid/lodmill#503.
First results, for http://d-nb.info/gnd/2047974-8/about/lds:
For http://d-nb.info/gnd/4074335-4/about/lds:
So the geo stuff is in there. However, we will need some pre- and post-processing to get the expected results.
Pre-processing / Reasoning
In 1.0, we added some inferencing to get more general properties. I suggest doing similar things here:
Having done 1.) and 2.), the result would look like this:
{
"@graph" : [ {
"@id" : "_:t1",
"@type" : "http://www.opengis.net/ont/sf#Point",
"http://www.opengis.net/ont/geosparql#asWKT" : [ {
"@type" : "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value" : "Point ( -000.125740 +051.508530 )"
} ]
}, {
"@id" : "http://d-nb.info/gnd/4074335-4",
"@type" : [ "http://d-nb.info/standards/elementset/gnd#TerritorialCorporateBodyOrAdministrativeUnit", "http://d-nb.info/standards/elementset/gnd#PlaceOrGeographicName", "http://d-nb.info/standards/elementset/gnd#AuthorityResource" ],
"http://d-nb.info/standards/elementset/dnb#deprecatedUri" : [ "http://d-nb.info/gnd/1005809-6" ],
"http://d-nb.info/standards/elementset/gnd#definition" : [ {
"@language" : "de",
"@value" : "Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
} ],
"http://d-nb.info/standards/elementset/gnd#geographicAreaCode" : [ {
"@id" : "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-GB"
} ],
"http://d-nb.info/standards/elementset/gnd#gndIdentifier" : [ "4074335-4" ],
"homepage" : [ {
"@id" : "http://www.london.gov.uk"
} ],
"http://d-nb.info/standards/elementset/gnd#oldAuthorityNumber" : [ "(DE-588)1005809-6", "(DE-588b)1005809-6", "(DE-588c)4074335-4" ],
"http://d-nb.info/standards/elementset/gnd#preferredName" : [ "London" ],
"http://d-nb.info/standards/elementset/gnd#relatedDdcWithDegreeOfDeterminacy4" : [ {
"@id" : "http://dewey.info/class/2--421/"
} ],
"http://d-nb.info/standards/elementset/gnd#variantName" : [ "Londinum", "Londra", "Lundonia", "Augusta Trinobantum", "Westminster", "Lundun", "Landan", "Londyn", "Londres", "Londen", "London (Great Britain)", "Londinium" ],
"http://www.opengis.net/ont/geosparql#hasGeometry" : [ {
"@id" : "_:t1"
} ],
"http://www.w3.org/2002/07/owl#sameAs" : [ {
"@id" : "http://d-nb.info/gnd/1005809-6"
}, {
"@id" : "http://sws.geonames.org/2643743"
} ]
} ]
}
Context & Framing
The result of framing the above output (based on the to-be-added Furthermore, the
I just found out that I already created a context for the 2.0 GND API, see #1. (We should probably delete this repo as soon as we have moved the issue over here.) This context is also missing some things (e.g. the geo properties), see http://tinyurl.com/y8z3f3rl.
Another option would be direct transformation from MARC-XML to JSON, like in lobid-organisations. We could adapt the existing mappings for the RDF conversion:
Re. the framing output from http://tinyurl.com/ychm4t92, I just noticed that blank nodes get an
"hasGeometry": {
"@id": "_:b0",
"@type": "http://www.opengis.net/ont/sf#Point",
"asWKT": "Point ( -000.125740 +051.508530 )"
}
We should get rid of them. This has already been addressed in the JSON-LD Framing spec 1.1 ("pruneBlankNodeIdentifiers") but is currently only implemented in the Ruby library, see json-ld/json-ld.org#293.
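As a stopgap until the framing library supports pruning, blank node identifiers could be stripped from the framed output in a post-processing step. The following is only a minimal sketch, not part of the existing code: `strip_blank_node_ids` is a hypothetical helper, and it is cruder than the spec's "pruneBlankNodeIdentifiers" because it removes every `@id` starting with `_:`. That is only safe when the blank node is embedded in place, as in the `hasGeometry` example, and not referenced from elsewhere in the document.

```python
from typing import Any


def strip_blank_node_ids(node: Any) -> Any:
    """Return a copy of a JSON-LD structure without blank node "@id" entries."""
    if isinstance(node, list):
        return [strip_blank_node_ids(item) for item in node]
    if isinstance(node, dict):
        return {
            key: strip_blank_node_ids(value)
            for key, value in node.items()
            # Blank node identifiers start with "_:"; real IRIs are kept.
            if not (key == "@id" and isinstance(value, str) and value.startswith("_:"))
        }
    return node


# Framed snippet from this thread, with the subject IRI added for context.
framed = {
    "@id": "http://d-nb.info/gnd/4074335-4",
    "hasGeometry": {
        "@id": "_:b0",
        "@type": "http://www.opengis.net/ont/sf#Point",
        "asWKT": "Point ( -000.125740 +051.508530 )",
    },
}
cleaned = strip_blank_node_ids(framed)
```

The function builds a new structure and leaves its input untouched, so the original framed document stays available for debugging.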
Input: http://d-nb.info/gnd/4074335-4/about/lds Output:
@acka47 Except for the points you already mentioned (missing keys in context, blank node IDs) this looks OK. Did I understand correctly: the idea is to add the
Yes, this already looks quite good. And yes, as in 1.0 we should add type Furthermore, we should have a type from the second level of the GND ontology attached to each resource. We will need this for faceting. The GND ontology has three levels in its type hierarchy (except for Person, where a fourth one is added). See the overview of the GND class hierarchy at https://wiki1.hbz-nrw.de/x/CIeW. In the concrete example, Regarding the name properties, we should only use
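Attaching a second-level type for faceting could be done with a simple lookup during transformation. This is a rough sketch under assumptions: `SECOND_LEVEL` and `add_second_level_types` are hypothetical names, and the mapping holds only one illustrative entry taken from the types in the London output earlier in this thread. A real mapping would have to be derived from the GND class hierarchy.

```python
from typing import Dict, List

GND = "http://d-nb.info/standards/elementset/gnd#"

# Hypothetical, hand-maintained excerpt: specific class -> second-level class.
# Only one entry is shown; the full table would come from the GND ontology.
SECOND_LEVEL: Dict[str, str] = {
    GND + "TerritorialCorporateBodyOrAdministrativeUnit": GND + "PlaceOrGeographicName",
}


def add_second_level_types(types: List[str]) -> List[str]:
    """Return the types plus any second-level class that is not yet present."""
    result = list(types)
    for type_iri in types:
        parent = SECOND_LEVEL.get(type_iri)
        if parent is not None and parent not in result:
            result.append(parent)
    return result


london_types = [GND + "TerritorialCorporateBodyOrAdministrativeUnit"]
with_facet_type = add_second_level_types(london_types)
```

Types without an entry in the lookup pass through unchanged, so the step can be applied to every record regardless of its class.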
Deployed current state to: http://test.lobid.org/authorities
Our London example: http://test.lobid.org/authorities/4074335-4.json
@acka47 The context is used directly from GitHub, so you can edit on GitHub to test context tweaks: https://github.com/hbz/lobid-authorities/blob/master/conf/context.jsonld (Context content is from https://gist.githubusercontent.com/acka47/98035a3f215c783bdc00/raw/5699ab4e89b5e7ab896ac69442c84fcf7f50ad66/gnd-context_20160126.jsonld)
Before working on the details (2nd level superclasses, rename fields, remove blank node IDs), I suggest we continue with testing the actual indexing of this format in Elasticsearch. I'd suggest we resolve this issue, and open new issues for the things I mentioned above. Assigning @acka47 for functional review.
I just noticed that the language isn't indicated as we do in other lobid services:
"definition":[
{
"@language":"de",
"@value":"Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
}
]
We would rather have
"definition":[
{
"de":"Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
}
]
I updated the context accordingly, but we will also have to take this into account during transformation.
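The transformation step for language-tagged values might look roughly like the sketch below. It is an illustration only: `to_language_map` is a hypothetical helper, not existing lobid code, and it assumes that values arrive as a list of `@language`/`@value` objects as in the "definition" example above.

```python
from typing import Any, List


def to_language_map(values: List[Any]) -> List[Any]:
    """Rewrite {"@language": "de", "@value": "..."} objects as {"de": "..."}."""
    converted = []
    for value in values:
        if isinstance(value, dict) and "@language" in value and "@value" in value:
            converted.append({value["@language"]: value["@value"]})
        else:
            # Values without a language tag are passed through unchanged.
            converted.append(value)
    return converted


# Definition text shortened for the example.
definition = [
    {
        "@language": "de",
        "@value": "Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland",
    }
]
converted_definition = to_language_map(definition)
```

The same function could be applied to any other property that turns out to carry language tags, such as biographicalOrHistoricalInformation.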
Looks fine already, so nothing more to do (I also adjusted the context for biographicalOrHistoricalInformation). We will have to find out which other properties use language tags.
+1 Did some adjustments to the context and I am satisfied for now. Will open issues for the other things.
I don't think we need a separate beta/prod system yet, context is used from GitHub, so nothing to deploy, closing this. Opened #5 for indexing.
Both dumps and updates (via OAI) are available as RDF-XML, so that would be a suitable source format:
http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login
http://www.dnb.de/DE/Service/DigitaleDienste/OAI/oai_node.html (see "Formate")
We should test serializing that RDF-XML as compact JSON-LD using the entityfacts context:
http://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld
http://hub.culturegraph.org/entityfacts/118540238
If the result looks good, this might be the format to index in Elasticsearch. We might have to do some preprocessing to make sure the values always have the same type (see footnote 1 in http://blog.lobid.org/2017/06/08/lobid-api-why-how.html about compact JSON-LD serialization in Elasticsearch).
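To find out where such preprocessing is needed, a small check over sample documents could report fields whose values are sometimes plain scalars and sometimes objects, since Elasticsearch cannot index both shapes under one field mapping. The sketch below is an assumption about how that check might be done: `value_kinds` and `mixed_fields` are hypothetical helpers, and the sample documents are made up for illustration. Single values and arrays are treated alike because Elasticsearch has no separate array type.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def value_kinds(docs: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Collect, per top-level field, the kinds of values seen across documents."""
    kinds: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            # Treat a single value like a one-element array.
            items = value if isinstance(value, list) else [value]
            for item in items:
                kinds[field].add("object" if isinstance(item, dict) else "scalar")
    return dict(kinds)


def mixed_fields(docs: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted names of fields that mix scalars and objects."""
    return sorted(field for field, kinds in value_kinds(docs).items() if len(kinds) > 1)


# Made-up sample documents; only the second "homepage" value is a plain string.
sample_docs = [
    {"preferredName": "London", "homepage": {"@id": "http://www.london.gov.uk"}},
    {"preferredName": ["Example"], "homepage": "http://example.org"},
]
conflicts = mixed_fields(sample_docs)
```

Fields reported here are the candidates for normalization before indexing, for instance by always emitting the object form.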