Multi lingual dataset support #6
Comments
I agree. At OGI there are multi-lingual datasets. We may consider using JSON-LD (@language) to express the language used |
I'm no longer sure we can use JSON-LD, but am curious about the requirements for multiple languages. For example, would a multilingual client want to list all labels in all languages? Or should it only ever get back a single requested (or default) language? You could imagine changing the language at the outermost field for the whole subtree, e.g.:
{
  datasets(language:"fr") {
    title
    dimensions {
      values {
        label
      }
    }
  }
}
Obviously we could also let you query for what languages are currently in the system, e.g.
{
  languages {
    country_code
  }
}
Other alternatives are to expand every string field into two sub fields of |
I think a single requested or default language is enough, so something like datasets(language:"fr") {..} is OK. It is preferable to get the available languages for a specific dataset rather than for the whole system, because different datasets may have different available languages. e.g.
|
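To make the per-dataset point concrete, here is a minimal Python sketch, not CubiQL code. It assumes labels arrive as (value, language tag) pairs grouped by dataset; the function name `available_languages` and the sample dataset names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A label is modelled as (value, language tag); None means an untagged literal.
Label = Tuple[str, Optional[str]]


def available_languages(labels_by_dataset: Dict[str, List[Label]]) -> Dict[str, List[str]]:
    """Return the sorted set of language tags found on each dataset's labels."""
    return {
        dataset: sorted({lang for _value, lang in labels if lang is not None})
        for dataset, labels in labels_by_dataset.items()
    }


# Hypothetical sample data: two datasets with different language coverage.
sample = {
    "unemployment": [("Unemployment", "en"), ("Ανεργία", "gr")],
    "population": [("Population", "en")],
}
languages = available_languages(sample)
```

With data like this, a system-wide language list would report both tags even though the second dataset only has English labels, which is the mismatch described above.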
👍 |
Some issues related to the language:
|
Specifically the current problem with language strings is that they cause exceptions during schema generation by failing the following spec (from issue #53):
@zeginis I think it would be desirable to keep the GraphQL schema simple here and avoid having to represent multiple languages in the schema at this stage, i.e. we should avoid doing things like this for every label/title:
{
  title {
    title # the real title string
    language
  }
}
i.e. I think I'd rather keep the schema for labels flat like this:
{
  title
}
This will probably mean, in the case of multiple languages, setting a default to use everywhere throughout the API; we could potentially allow toggling the default at the top of the query. @zeginis Does that sound like an acceptable compromise? The limitation is that within a single request you won't be able to see things like the title for a dataset in both English and Greek. |
It is ok to define the language at the top of the query and thus get results only in one language |
One other question @zeginis, would it be acceptable to not let you set this at the top of the query; but to supply it as a configuration option to the server itself? i.e. no schema representation at all? |
@RickMoynihan this solution is not applicable at OGI, since we will have cubes from many pilots on the same server, with labels in different languages, e.g. Greek and English. So it is preferable to define the language at the top of the query. Any idea how to do this? |
It's not currently supported; if you're asking about how I think it should be implemented, then I'd suggest:
i.e. we would probably have to change it to do this, so
{
  cubiql(lang_preference: "gr") {
    datasets {
      title
      description
    }
  }
}
In terms of implementation I don't think there is a good way to express this priority on labels in SPARQL that is both performant and simple enough. So I think the best way to implement this is to make sure we implement all these queries as
Is something like this what you were thinking of implementing? |
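As an illustration of doing the label prioritisation in application code rather than in SPARQL, here is a minimal Python sketch. It is not the CubiQL implementation: the `LangString` type, the `pick_label` function and the priority order (exact match on the preferred tag, then untagged strings, then a deterministic fallback) are all assumptions made for the example.

```python
from typing import NamedTuple, Optional, Sequence


class LangString(NamedTuple):
    """An RDF-style literal: a string plus an optional language tag."""
    value: str
    lang: Optional[str] = None  # None models a plain, untagged literal


def pick_label(labels: Sequence[LangString],
               lang_preference: Optional[str] = None) -> Optional[str]:
    """Pick exactly one string so a flat schema field like `title` can be filled.

    Assumed priority: preferred language, then untagged, then anything else.
    Ties are broken by tag and value so the result is never random.
    """
    if not labels:
        return None

    def rank(label: LangString) -> int:
        if lang_preference is not None and label.lang == lang_preference:
            return 0
        if label.lang is None:
            return 1
        return 2

    best = min(labels, key=lambda l: (rank(l), l.lang or "", l.value))
    return best.value


# Hypothetical labels for one dataset title.
titles = [LangString("Unemployment", "en"), LangString("Ανεργία", "gr")]
preferred = pick_label(titles, "gr")
fallback = pick_label(titles, "fr")
```

The deterministic tie-break matters here: without it, a dataset with no label in the requested language could come back with a different string on each request.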
Yes, this is what I was thinking of implementing. I realize that it is not as simple as I expected. Do you think there is a way to temporarily overcome the exceptions (#88) caused by the language tags even if we do not fully support filtering by language? |
That's a good question @zeginis. I suspect it's a pretty trivial fix to make that specific error go away, as it's probably not much more than calling
However, there's still the expectation that there's only ONE value for a lot of these fields. So this would likely only really work for string properties with a cardinality of 1: to retain the schema you'll need to pick just one string, and then you're into the territory of the above suggestion.
I could be wrong, but I'm not sure this hacky solution is worth doing, because you either need to implement the prioritisation logic above, or return a random string (unacceptable, as datasets would render with mixed languages), or hack your data so you only ever have one string for these fields (either an
The only counter-argument I can see to this (in support of implementing the
Practically speaking, though, I'm not sure this correctness argument holds much weight, as you'll still need to hack your data to guarantee it works... it's just that the hack is a tiny bit less hacky. |
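The cardinality concern can be shown with a small Python sketch. This is an illustration only, not the project's code: literals are modelled as (value, language tag) pairs, and `strip_language_tag` and `single_value` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

# A literal is modelled as (value, language tag); None means untagged.
Literal = Tuple[str, Optional[str]]


def strip_language_tag(literal: Literal) -> str:
    """The 'quick fix': drop the language tag and keep only the string."""
    value, _lang = literal
    return value


def single_value(literals: List[Literal]) -> str:
    """Fill a field the schema expects to hold exactly one string."""
    values = [strip_language_tag(literal) for literal in literals]
    if len(values) != 1:
        # Stripping tags does not reduce the number of values, so a
        # multilingual field still violates the one-value expectation.
        raise ValueError(f"expected exactly one value, got {len(values)}")
    return values[0]


# Works when the data holds a single string for the field.
only_title = single_value([("Unemployment", "en")])
```

Calling `single_value` with both an English and a Greek title raises `ValueError`, so stripping tags alone only helps when the data already holds one string per field.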
@RickMoynihan any update on this? Are you going to fix this, or should we go ahead with the "quick fix" option, i.e. call str on the language-tagged string before returning it? |
Issue #6 - Move all dataset queries under a new top-level cubiql query used to set global parameters on the contained queries. Datasets are now fields on the qb type returned from the cubiql query.
RDF supports lang strings, and there's a possibility of multi-lingual datasets.
We may want to add support for this as part of OGI.