Support new Aggregation DataStore #804
Comments
I think this would be a great addition to Elide. Just some questions/comments below. Minor document fixup: I think the JPQL model should be using
It is unclear to me how this would work for JSON-API. If I am fetching an object, I can't necessarily invoke the One way I can think of is by possibly encoding this information in the id parameter. This feels like a pretty difficult-to-use interface to me, though.
Should this be a separate design/write-up?
Since these tables are read-only, the The discovery aspect of this document is interesting in that it's a small shift from what we have today. Previously, Elide had a uniform policy on how to interact with data (exception: certain computed attributes) and this was defined by which API-format you used (GraphQL vs. JSON-API). This adds some additional power and I wonder if the discovery mechanism needs to be very bold on how it's called out. Just a thought: no real action item here.
To clarify: as part of this work we would implement the bridgeable store on any official store which does not yet support it, correct?
Fixed.
Correct
Yes. We don't see this as an MVP feature for the
👍
Can you elaborate on what you mean here?
Correct.
I think we need a section on date handling. Specifically how do you
@jkusa Since there is no guarantee the table even has a time dimension, I think the grain of the table should be baked into the table itself. This means there would be separate tables for hour, day, month, etc. for a fact table with a time grain. Does Navi need to know the time grain of the table?
It looks to me like SQL already supports SUM, COUNT, etc. Are we talking about having Elide just support this syntax? Because I am not sure why we need a new
I'm just calling out that calling this API will now deviate out of the box: the id structure, reading/writing data, etc. While it's still well-formed (and conforms to the remaining specs), it's a restricted set of operations. While the distinction feels quite apparent to me, I wonder if new users will struggle with this if we don't document it thoroughly.
That is correct; however, aggregating across various time grains is a basic feature in any analytics/reporting tool. The datastore can be aware of the presence of time-based columns. This could be done with an annotation such as In the case there is a Date/Time field present, a user should be able
At a minimum, it needs to know if there is a Date/Time field present in the table. An optimization would be to communicate to Navi what time grains are supported. For instance, if the table is pre-aggregated at the day level, don't allow the option for hourly grouping and filtering.
How would this work if a set of filter values no longer returns any rows from the data source?
In case it's not well understood, time grain is a computed dimension on a dateTime attribute. Which means if we did have a special
We definitely need some sort of field separator so that explicitly null values can be baked into the key as empty strings and there's no ambiguity about whether FooBar is Foo, Bar, '' or '', Foo, Bar or Foo, '', Bar.
How do null joins work now?
Overview
Elide would benefit from a new Aggregation DataStore that can support:
- Group By clauses over model attributes and relationships
- Having clauses which filter on aggregated attributes.
Models managed by this store would have the following restrictions:
- toOne relationships to enforce a star (or snowflake) schema.
One goal of such a DataStore is to add seamless integration between Elide and Navi.
Example Model
Here is an example model:
Example API Call
JSON-API lacks some of the flexibility of GraphQL (attributes with parameters). While most features in this RFC will work with JSON-API, some aspects will only work with GraphQL.
GraphQL
Request
Response
JSON-API
JPQL Generation
The example API queries would generate JPQL similar to:
SELECT MAX(models_PlayerStats.highScore), models_PlayerStats.country.id FROM models_PlayerStats GROUP BY models_PlayerStats.country.id HAVING SUM(models_PlayerStats.highScore) > 300
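To make the shape of the generated query concrete, here is a minimal sketch of how such a JPQL string might be assembled from the projected metric expressions, the retrieved dimension columns, and an optional filter on a metric. The class `AggregationQuerySketch` and its `build` method are hypothetical names chosen for illustration; they are not part of Elide or of this proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not an Elide class. */
final class AggregationQuerySketch {

    /**
     * Assembles a JPQL string from already-rendered fragments.
     *
     * @param entity      entity name used in the FROM clause
     * @param metricExprs aggregated metric expressions, e.g. "MAX(e.highScore)"
     * @param dimensions  dimension columns the client retrieved
     * @param havingExpr  filter on an aggregated metric, or null if none
     */
    static String build(String entity, List<String> metricExprs,
                        List<String> dimensions, String havingExpr) {
        // Metrics are projected first, followed by the dimensions.
        List<String> projections = new ArrayList<>(metricExprs);
        projections.addAll(dimensions);

        StringBuilder jpql = new StringBuilder("SELECT ")
                .append(String.join(", ", projections))
                .append(" FROM ").append(entity);

        // A GROUP BY is emitted only when at least one dimension is retrieved.
        if (!dimensions.isEmpty()) {
            jpql.append(" GROUP BY ").append(String.join(", ", dimensions));
        }
        // A HAVING is emitted only when the client filters on a metric.
        if (havingExpr != null && !havingExpr.isEmpty()) {
            jpql.append(" HAVING ").append(havingExpr);
        }
        return jpql.toString();
    }
}
```

Called with the entity `models_PlayerStats`, the metric `MAX(models_PlayerStats.highScore)`, the dimension `models_PlayerStats.country.id`, and the filter `SUM(models_PlayerStats.highScore) > 300`, this sketch yields the query shown above. A real implementation would bind filter values as parameters rather than concatenating them.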
Model Bindings
The intent is that the JPA Models will be bound to either existing SQL tables or views that are created to project the desired fact tables.
Metric Definition
Metric columns are any column annotated with a new DataStore annotation MetricAggregation. This annotation takes two arguments:
Client Supplied Aggregation Function
For GraphQL only, we can support arguments for the metric attributes allowing the client to pick the aggregation function:
Supported aggregations can be specified in the model:
Metric classes can be registered with the AggregationDataStore. They are responsible for generating a JPQL query fragment that wraps the column name.
The return string is actually a template where the text '{}' is substituted with the annotated column name.
Dimension Definition
Dimension columns are any relationship or alternatively any attribute without a MetricAggregation annotation.
Date Grain Specification
The AggregationDataStore will inject a grain attribute argument into the GraphQL schema for any:
Time grains will be implementations of JPQLWrapper that are used to wrap the date columns with custom JPQL. Given that date truncation will be SQL-dialect specific, the wrappers should leverage the FUNCTION capability in JPQL.
Group By
The data store will generate a Group By clause for any dimension column the client retrieves in the query. If no dimension columns are retrieved, no Group By clause is generated.
Dimension Attribute Grouping
Grouping by a dimension attribute (country.region) will not be supported initially. However, it could be supported by a new GraphQL parameter at the relationship level (groupBy):
The groupBy parameter can support a '.' separated path through the entity relationship graph (provided that all relationships in the path are to-one relationships).
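The to-one constraint on a groupBy path could be checked along the following lines. This is a sketch under assumptions: the class `GroupByPathSketch`, its in-memory relationship registry, and the convention that the last path segment names an attribute are all hypothetical, and a real implementation would consult the store's entity metadata instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validator for '.' separated groupBy paths; not an Elide class. */
final class GroupByPathSketch {

    /** Hypothetical descriptor: the target entity and whether the relationship is to-one. */
    static final class Relationship {
        final String target;
        final boolean toOne;

        Relationship(String target, boolean toOne) {
            this.target = target;
            this.toOne = toOne;
        }
    }

    // entity name -> (relationship name -> descriptor)
    private final Map<String, Map<String, Relationship>> relationships = new HashMap<>();

    void addRelationship(String entity, String name, String target, boolean toOne) {
        relationships.computeIfAbsent(entity, k -> new HashMap<>())
                .put(name, new Relationship(target, toOne));
    }

    /**
     * Returns true when every segment before the last one is a known to-one
     * relationship. The last segment is assumed to be the grouped attribute.
     */
    boolean isValidPath(String rootEntity, String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        String[] segments = path.split("\\.");
        String current = rootEntity;
        for (int i = 0; i < segments.length - 1; i++) {
            Map<String, Relationship> forEntity = relationships.get(current);
            Relationship rel = forEntity == null ? null : forEntity.get(segments[i]);
            if (rel == null || !rel.toOne) {
                return false; // unknown or to-many relationship in the path
            }
            current = rel.target;
        }
        return true;
    }
}
```

With a to-one `continent` relationship and a to-many `players` relationship registered on a hypothetical `Country` entity, the path `continent.name` is accepted and `players.name` is rejected.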
Having
The data store will generate a Having clause for any client supplied filter for a metric column.
Id Generation
A row identifier will be generated that is unique per query row.
It will not be possible to fetch data through this ID.
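The RFC does not say how the identifier is built. One possible approach, following the field-separator suggestion raised in the discussion, is to join the row's dimension values with a separator and encode nulls as empty strings. The sketch below assumes that approach; `RowIdSketch`, the `|` separator, and the backslash escaping are illustrative choices, not part of the proposal.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical row id builder; the scheme shown is an assumption. */
final class RowIdSketch {

    private static final String SEPARATOR = "|";

    /**
     * Joins dimension values into a single key. Nulls become empty strings,
     * and any backslash or separator inside a value is escaped so that the
     * field boundaries stay unambiguous.
     */
    static String rowId(List<String> dimensionValues) {
        return dimensionValues.stream()
                .map(v -> v == null
                        ? ""
                        : v.replace("\\", "\\\\").replace(SEPARATOR, "\\" + SEPARATOR))
                .collect(Collectors.joining(SEPARATOR));
    }
}
```

With this scheme the three cases from the discussion map to distinct keys: (Foo, Bar, null) gives `Foo|Bar|`, (null, Foo, Bar) gives `|Foo|Bar`, and (Foo, null, Bar) gives `Foo||Bar`. A null and an empty string still produce the same key, and uniqueness within a result set relies on each grouped row having a distinct combination of dimension values.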
Metadata Discovery
Metadata will be discovered through tables defined by this issue.