Cannot get dimension values for reference area and period #55
Comments
For example:
|
This is problematic because dimension values are enums and some datasets contain many thousands of areas etc... So there are problems making this scale for the RootSchema as it is multiplied by each dataset. |
OK, we've been discussing this some more, and I thought I'd write up a few notes here, with a bit more detail than we went into on the call. Basically the current implementation is correct in this behaviour because of a sensible compromise. Essentially, because there are many tens of thousands of areas, we represent areas as a string type rather than as an enum. It's also worth noting that our current approach to schema generation is that types are generated from the DSD for each dataset; so you'll see that for genders on scotland every dataset with a gender dimension has its own GraphQL schema type for it. This is worth highlighting as it means that if we were to generate enum types for areas with the existing model, the schema would explode to an almost unusable size, as every dataset would have its own copy of the areas. I've opened a new issue #60 about sharing distinct schema types across datasets to look into this. |
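As a rough illustration of the scaling concern above, here is a minimal Python sketch comparing how many area enum values the schema would declare with one enum type per dataset versus a single shared type (the idea behind #60). The dataset and area counts are made-up assumptions, and `enum_values_in_schema` is a hypothetical helper, not part of CubiQL.

```python
# Hypothetical figures, chosen only to illustrate the scaling argument.
N_DATASETS = 200
N_AREAS = 40_000


def enum_values_in_schema(n_datasets: int, n_areas: int, shared: bool) -> int:
    """Count the area enum values the schema would have to declare."""
    # Per-dataset types: every dataset repeats the full list of areas.
    # Shared type: the list is declared once and referenced by each dataset.
    return n_areas if shared else n_datasets * n_areas


per_dataset = enum_values_in_schema(N_DATASETS, N_AREAS, shared=False)
shared = enum_values_in_schema(N_DATASETS, N_AREAS, shared=True)
print(per_dataset, shared)
```

With these assumed numbers the per-dataset model declares 200 times as many enum values as the shared one, which is the growth the comment above is worried about.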
I see the point that having an area enum per dataset is impractical - sharing codelists across datasets as per issue 60 sounds promising. I don't really understand your comment above about areas being a 'string type' - i.e. I'm not sure what that means to a user in terms of (a) discovering the possible values and (b) selecting observations by fixing the refArea dimension to a particular value (or list of possible values). Could you elaborate on that? Or is that question not relevant if we can solve issue 60 re codelist/enum re-use and treat refArea as an enum? |
Sure, if the type of something is If the type is an Enum, all the possible values are enumerated in the schema / type information. However, your question has made me realise that we somewhat have our wires crossed. As GraphQL has its own reflective capabilities via To explain a bit more, a query like this graphql
{
  "name": "gender",
  "description": "Gender",
  "type": {
    "name": "dataset_births_gender_type",
    "kind": "ENUM",
    "enumValues": [
      {
        "description": null,
        "name": "MALE"
      },
      {
        "description": null,
        "name": "FEMALE"
      },
      {
        "description": null,
        "name": "ALL"
      }
    ],
    "description": null
  }
The So to answer For refPeriods we could do the same, but would need a good way to turn them into valid GraphQL enum syntax. I think part of #40 would need to be to define something like a So I think solving #60 and #40 will effectively let us solve this. Does that make sense and answer your question @BillSwirrl? |
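On turning refPeriod values into valid GraphQL enum syntax: GraphQL names must match `[_A-Za-z][_0-9A-Za-z]*`, so a label such as a year cannot be used as-is. The following is only a sketch of one possible mapping; `to_enum_name` is a hypothetical helper and the leading-underscore convention is an assumption, not what CubiQL or #40 actually specify.

```python
import re

# Pattern for a valid GraphQL name, as defined in the GraphQL specification.
GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def to_enum_name(label: str) -> str:
    """Turn an arbitrary label into a valid GraphQL enum value name."""
    # Replace every run of characters outside [0-9A-Za-z] with one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").upper()
    # Names may not start with a digit (or be empty), so prefix an underscore.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


for label in ["Male", "2015", "2015-Q1", "2014/2015"]:
    print(label, "->", to_enum_name(label))
```

Note that a mapping like this is lossy: two different labels can end up with the same enum name, so a real implementation would also need to detect collisions and keep a way back from the enum name to the original value.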
I should also point out that for If the type of dimension value is just a However in answering |
OK, the proposal for this is that we support querying both styles of dimension values like this:
{
  dataset_births {
    dimensions {
      ... on Dimension {
        uri
        values {
          uri
          label
        }
      }
      ... on EnumDimension {
        enum_name
        values {
          enum_name
        }
      }
    }
  }
}
With types/interfaces looking something like this (includes basic ideas for refArea):
interface Resource {
  uri: ID!
  label: String!
}
interface Dimension {
  uri: ID!
  label: String!
}
interface DimensionValue {
  uri: ID!
  label: String!
}
type DefaultDimension implements Dimension {
  uri: ID!
  label: String!
  values: [DimensionValue]
}
type DefaultDimensionValue implements DimensionValue {
  uri: ID!
  label: String!
}
type EnumDimensionValue implements DimensionValue {
  uri: ID!
  label: String!
  enum_name: String!
}
type EnumDimension implements Dimension {
  uri: ID!
  label: String!
  values: [EnumDimensionValue]
}
type HierarchicalValue implements DimensionValue { # i.e. could be a RefAreaValue
  uri: ID!
  label: String!
  children: [DimensionValue] # NOTE you can't have recursive datatypes in graphql :-( but we could potentially improve later by generating more specific types for each area level etc...
}
type HierarchicalDimension implements Dimension { # i.e. could be a RefAreaDimension
  uri: ID!
  label: String!
  values: [HierarchicalValue]
}
NOTE: for this part of the schema that the basic "out of the box" |
@RickMoynihan this looks good. Just a question. Why should we hardcode the The only difference I see is I think it is better to generalize this to |
Agree on the generalisation aspect; I had the same thoughts when writing the example, but chose to describe it concretely to try and make it clearer. I will edit the snippet above and rename them to what you suggest, though. I should also say I think my proposal is still pretty minimal in functionality, and I think the limitations of the model above for HierarchicalDimensions and the lack of recursive datatypes might not be good enough for what we actually want. I think solving these problems essentially involves us abstracting over Types themselves with a CubiQL notion of Kinds, i.e. CubiQL would recursively create all the types necessary to represent each level of hierarchy in the GraphQL schema, essentially working around GraphQL's lack of recursive data types. So in CubiQL We could probably get something like the schema in my comment above working in a week or two; but I think doing something more complete will require a lot more detailed spec work to figure out the limitations/vocabs/schemas we require. |
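To make the "one type per level of hierarchy" idea above a bit more concrete, here is a minimal Python sketch that emits a GraphQL type for each level, with each level's `children` field pointing at the type of the level below. The function name `hierarchy_types`, the `...Value` naming scheme and the example level names are all assumptions for illustration; this is not how CubiQL generates its schema.

```python
def hierarchy_types(levels: list[str]) -> str:
    """Emit one GraphQL object type per hierarchy level, top level first."""
    blocks = []
    for i, level in enumerate(levels):
        fields = ["  uri: ID!", "  label: String!"]
        # Every level except the last refers to the type generated
        # for the next level down, so no type refers to itself.
        if i + 1 < len(levels):
            fields.append(f"  children: [{levels[i + 1]}Value]")
        body = "\n".join(fields)
        blocks.append(
            f"type {level}Value implements DimensionValue {{\n{body}\n}}"
        )
    return "\n\n".join(blocks)


# Hypothetical level names, top of the hierarchy first.
print(hierarchy_types(["Country", "Region", "District"]))
```

The trade-off is that the schema grows with the depth of every hierarchy, and a fixed chain of levels like this cannot express the level-skipping or alternative hierarchies that a real geography may need.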
What we have found in our 'features of interest' approach in PublishMyData is that a strict hierarchy for geographical data (or organisations etc.) is too restrictive, because we might want different hierarchies for different datasets. Also, even for a single tree-structure hierarchy, it can be convenient to jump levels in the tree. A common requirement is to get all the data zones in a council area, and it's useful to get those directly. So we want:
council area --> data zone
Not:
council area --> ward --> data zone
(note those are two different hierarchical relationships, as intermediate zones don't nest inside wards in Scottish geography).
Also, when we start mixing in data about hospitals or schools or job centres, we might want to know which hospitals are in a council area.
The approach we've taken in PublishMyData is that a feature of interest (area, organisation, etc.) can be a The 'within' relationship could be generalised to other kinds of relationships between items in codelists, for example a medical treatment might be 'offeredBy' a hospital.
I think this basic data model is generic enough that it could work with everyone's data, so it could be appropriate to use in CubiQL. But we'll need to define and document the specific triples we expect, and data publishers will have to augment their codelists with the collection and relationship data. |
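A small Python sketch of the 'within' idea described above: each feature records directly which other features it lies within, so data zones (or hospitals) in a council area can be fetched in one step without walking a fixed tree. All identifiers, the collection names and the `features_within` helper are hypothetical; the actual triples would still need to be defined and documented as the comment says.

```python
# Hypothetical 'within' pairs: (feature, feature it lies within).
WITHIN = {
    ("dz-1", "ward-a"), ("dz-2", "ward-a"), ("dz-3", "ward-b"),
    ("dz-1", "council-x"), ("dz-2", "council-x"), ("dz-3", "council-x"),
    ("ward-a", "council-x"), ("ward-b", "council-x"),
    ("hospital-1", "council-x"),
}

# Hypothetical collection membership for each feature.
COLLECTION = {
    "dz-1": "data-zones", "dz-2": "data-zones", "dz-3": "data-zones",
    "ward-a": "wards", "ward-b": "wards",
    "hospital-1": "hospitals",
    "council-x": "council-areas",
}


def features_within(container: str, collection: str) -> list[str]:
    """Return features of one collection that lie directly within a container."""
    return sorted(
        feature
        for feature, parent in WITHIN
        if parent == container and COLLECTION.get(feature) == collection
    )


print(features_within("council-x", "data-zones"))
print(features_within("council-x", "hospitals"))
```

Because each 'within' pair is stored explicitly, jumping levels is just another lookup; the cost is that publishers have to supply those extra pairs alongside their codelists.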
The main motivation for this has been fixed; but we should consider further refinements as part of a new issue #81. |
The result contains an empty list for those two dimensions