-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Expressions] Unifying datatable
, kibana_datatable
, and lens_multitable
#68457
Comments
Pinging @elastic/kibana-app-arch (Team:AppArch) |
i agree. we should standardize the meta information added to the columns and then have all functions use just a single correct me if i am wrong, but also, so currently we have this on
what do you think of changing it into:
we should then also update the utilities like:
|
@ppisljar If I understand your comment, you are talking about changing the shape of both
|
Maybe a more specific question for @ppisljar or @streamich: Would it be reasonable for Lens to write an expression that sets each table to a variable, something like:
|
yes @wylieconlon i think that could be reasonable. another option as you mentioned would be using a type that supports multiple tables. and yes, i want to change formats of all the datatables, or more specifically:
so the goal of this issue should be to figure out a minimal set of required meta information we need to attach to the datatable to address the needs in canvas, lens and visualize |
I think this is a nice approach @wylieconlon . It's basically turning the expression into an imperative language. Each Another option (which I think is relatively similar but maybe makes the syntax a little cleaner) is to leverage the "context" passed along by keeping a "multitable" data structure around and simply providing an If we don't do anything to the tables, we can just pass them via expression (like it's happening right now in Lens):
The difference becomes obvious if tables are altered along the way. With the variables approach it would look like this:
With the multi table which can be passed along, the variables can be omitted:
I might miss something, but IMHO a multitable data structure and an
|
@flash1293 I finally got a chance to read this and respond to it. While I agree about many of the criteria you've listed, I don't share the preference to keep using a multitable, because i think that there is a "cleaner" way to pass multiple tables into expression functions. I do share your interest in keeping a clean syntax using expression chaining, so here is my updated proposal:
Putting this together into the same examples as you've given:
I am proposing that we modify all of the existing table manipulation functions to support reading and writing from variables, and to update the Lens expression functions to accept variable names instead of passing via context. The main benefit of the approach I'm proposing here is that we don't need to write a new set of functions like |
This seems like it's trading off the overhead of maintaining a separate multi table data structure and utility functions with pulling the variable logic into all table related functions. I don't have a strong preference in either direction - the maintenance effort seems roughly identical to me. Maybe someone from the @elastic/kibana-canvas team wants to weigh in whether extending the table manipulation functions with variable getters like this is something that fits in the greater scope. In short: Sounds good to me as well. |
regarding variables: no function should be allowed to read/write variables. only expression parser can do that (special var_set and var functions). that doesn't negate anything in the above suggestion however, only syntax would change a bit, but imo in a better (more flexible) way:
however i think we will need a multitable type. i am working on the |
@ppisljar If this is just about |
The main reason I wanted to avoid a multitable is because I want to avoid an explosion of table manipulation functions: it is important to me that we keep the standard library of table manipulation functions small, without creating a separate function for |
@wylieconlon Why do you think there would be an explosion of functions? Using something like |
@flash1293 the part I liked about @ppisljar's proposal is that we would have an automatic converter for multitables: this would hopefully make it so all existing expressions written by Canvas users will continue to work. Basically I think the function you've proposed makes it so that our users need to know more about our internals than I think they should. Also @ppisljar you've asked for a set of requirements for the metadata: we already had a separate issue for it here #65262 |
@wylieconlon I like the auto-converting approach and we should go forward with it but we still would need something like I don't have a strong opinion here because I think the various approaches are very close to each other in terms of usability. Can we commit to one? |
auto converting will only get you so far, i agree with wylie that multitable can be a problem for example (lets assume for a moment esdsl returns multitable):
any table processing will degrade that multitable into single table so xy chart no longer gets all the tables. what about going in this direction: so no multitable but lense accepting tables as arguments (instead or next to input) actually, how exactly do you use multitable in lens today ? something like also do you need multitable for your other chart types like pie ? @wylieconlon is getting aggs and docs together so bad that we should not support it for any case ? and what i am most worried about that not supporting it will most likely mean we still query for it but just ignore the results, unless additional steps are taken to make sure that's not the case. |
This sounds like Wylies approach to me and I'm fine with it. Restructuring lens shouldn't be too complicated.
Yes, exactly.
Not right now, only xy chart is supporting multiple layers.
@ppisljar I assume this question is directed at me because I brought up that point. It's not a big deal to get aggs and docs together, but as it's not best practice anyway I don't see a good reason we have to support it if it is causing an overhead elsewhere (like the introduction of a whole separate data structure). If a user absolutely wants to do this, they can do two separate
In my POC the esdsl function actually looked like this: |
i prefer to leave this really a
|
@ppisljar +1 for solving that as part of the |
Have spent some time talking this over with @ppisljar. We have a couple updates: First, there's a PR open for the Second, we have a proposed datatable structure that we think might solve everyone's needs... but please leave your thoughts and help us poke holes in it 🙂 This was our strategy with dealing with metadata:
More details in the comments below... interface Datatable {
meta: DatatableMeta;
columns: DatatableColumn[];
rows: DatatableRow[];
}
interface DatatableMeta {
// Name of expression function which retrieved data
type: string;
// Kibana index pattern ID, ES index pattern, or table name
source: string;
}
interface DatatableRow {
// `key` is a `DatatableColumn.name`
[key: string]: unknown;
}
interface DatatableColumn {
name: string; // this is basically the ID
meta: DatatableColumnMeta;
}
interface DatatableColumnMeta {
// the agg type or boolean, string, etc
type: string;
// name of the field from the data that this column represents
field: string;
// arbitrary params which are function-specific but must be serializable
params: SerializableState | {};
}
// Helper for getting a field format
type getFieldFormatForDatatableColumn = (datatable: Datatable, column: string | number) => FieldFormat; Examplesesaggsconst esaggsTable: Datatable = { meta: { type: 'esaggs', source: '90943e30-9a47-11e8-b64d-95841ca0b247', }, columns: [ { name: 'response.keyword: Descending', meta: { type: 'count', field: 'response.keyword', // for esaggs, params are basically the serialized AggConfig params: { size: 5, order: 'desc', orderBy: '1', otherBucket: false, missingBucket: false }, } }, { name: 'Average bytes', meta: { type: 'avg', field: 'bytes', params: {}, } }, ], rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }], }; essqlconst essql: Datatable = { meta: { type: 'essql', source: 'kibana_sample_data_logs', }, columns: [ { name: 'response.keyword: Descending', meta: { type: 'string', field: 'response.keyword', params: {}, } }, { name: 'Average bytes', meta: { type: 'number', field: 'bytes', params: {}, } }, ], rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }], }; esdslconst esdsl: Datatable = { meta: { type: 'esdsl', source: 'kibana_sample_data_logs', }, columns: [ { name: 'response.keyword: Descending', meta: { type: 'string', field: 'response.keyword', params: {}, } }, { name: 'Average bytes', meta: { type: 'number', field: 'bytes', params: {}, } }, ], rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }], }; One notable change to point out is that we didn't put an Another idea here (which would presumably be a much larger change) would be to use nested arrays as our rows, instead of the arrays of objects with IDs we currently have: interface Datatable {
meta: DatatableMeta;
columns: DatatableColumn[];
- rows: DatatableRow[];
+ rows: unknown[][];
} |
Thanks for the great summary, @lukeelmers ! I got one question, one nit and one concern. The question: the column The nit: About the The concern: For the Lens integration we currently rely on predictable column naming ( Using just the predictable ids like Seems like Visualize has the same problem here, how are you planning to map these? My suggestion would be making the interface DatatableColumn {
name: string; // ID like col-1-1
meta: DatatableColumnMeta;
}
interface DatatableColumnMeta {
// ...
// human readable label if the datasource can provide it
label?: string;
} |
Thanks @flash1293 -- these are all great points.
That's a fair question & I agree the meaning is a bit loosely defined. I think we were starting from thinking about how I think your proposal of making this a finite list makes sense. We could make this work with enum KBN_FIELD_TYPES {
_SOURCE = '_source',
ATTACHMENT = 'attachment',
BOOLEAN = 'boolean',
DATE = 'date',
GEO_POINT = 'geo_point',
GEO_SHAPE = 'geo_shape',
IP = 'ip',
MURMUR3 = 'murmur3',
NUMBER = 'number',
STRING = 'string',
UNKNOWN = 'unknown',
CONFLICT = 'conflict',
OBJECT = 'object',
NESTED = 'nested',
HISTOGRAM = 'histogram',
}
That's a good point; I hadn't considered how field may not always be applicable. The original intent was to have these meta keys be as predictable as possible, i.e. they are all required, but perhaps making them optional is more sensible in this case.
Thanks for pointing this out; we had assumed that nobody was relying on predictable naming like this, which is why we excluded the old way of doing column IDs to simplify. However we can definitely add the Revised interfaces, lmk what you think: interface Datatable {
meta: DatatableMeta;
columns: DatatableColumn[];
rows: DatatableRow[];
}
interface DatatableMeta {
type: string; // expression function which retrieved data
source: string; // index pattern ID, ES index pattern, or table name
}
interface DatatableRow {
[key: string]: unknown; // `key` is a `DatatableColumn['id']`
}
interface DatatableColumn {
id: string; // predictable ID like `col-1-1`
name: string; // human-readable name
meta: DatatableColumnMeta;
}
interface DatatableColumnMeta {
type: 'unknown' | 'string' | 'object' | 'date' | 'ip' | etc; // union of KBN_FIELD_TYPES values
field?: string; // if column data represents a field, the name goes here
params: SerializableState | {}; // arbitrary params which are function-specific
} |
Hi @lukeelmers , the revised interfaces look great to me, thanks. I don't know how difficult it is for canvas to provide a label along with an id, so it would also be fine with making |
in the worst case we can always go with name=id |
@ppisljar Currently in Lens we rely on both name and id (human readable name and predictable Edit: I think I misunderstood what your comment is about, if you mean to set name to the same value as id if there is no separate human readable label, then please disregard. |
started looking into essql and it seems its not possible to provide index nor field names .... |
I think it's very hard to do that reliably considering how much you can do with a grammar like SQL (even within the tight boundaries of ESSQL). When looking into this in my space time I imagined to have a separate argument to the function (or maybe a separate expression function) to manually specify the fields to make filters work. This means:
We could put a step in between those two, but I'm not sure how valuable it is:
|
I think we have a good idea how to unify The latest state of the discussion is @wylieconlon s proposal:
As this has an impact on the "standard library" of expression functions, I would like to bring it up again. In short, besides getting passed in the table to work with via context, all table manipulation functions would get a separate |
i think we should rather go with something like this:
which doesn't allow changing all datatable manipulation functions. I am worried about that due to
|
actually, what you should do in your above example is probably:
(simpler syntax, but achieves the same as my first suggestion). so at the moment i don't see a need for introducing this |
@wylieconlon Do you see any problems with @ppisljar s suggestion? To me it looks like this is a nice middle ground which still gives us everything we need without doing "weird" changes to the base functions |
@wylieconlon @ppisljar @flash1293 Canvas exports workpads in JSON format. I don't see how we can change from EDIT: Chatted with @flash1293 -- as long as these changes don't break old workpads and allow Canvas users to use tables as they have throughout 7.x, I don't see an issue. I'll do a sanity-check with the rest of @elastic/kibana-canvas to be certain. |
Yes, this looks like it will give us everything we need. It also extends to future uses such as joins, which I would imagine looking like this:
The main change here is that we are expanding the use of variables, and having the Lens render function ignore the context and instead require variable names. This keeps the scope of the change small for other apps like Canvas. |
I branched out the removal of Lens multi table into #74643 @lukeelmers can we close this dicussion issue or will you use it to track changes in the table format? |
agreed on this interface:
|
as table can be result of multiple functions (when merging tables etc) we shouldn't have any metadata on the table itself but rather all the relevant info should be kept on the column. the new proposal:
|
For Lens this would work as well. There might be some strange edge cases, but it covers a lot of legit cases we would have run into anyway at some point. |
Lens, Canvas, and Visualize all have different table implementations, which makes it difficult to define a standard library of table manipulation functions. For example, I would like to use the
mapColumn
function in Lens, but I can't due to its signature.Comparison of the different tables:
essql
type
property on each columnesaggs
, contains more information like formformatHint
andindexPatternId
, as well asaggConfigParams
- these are all used by Lenskibana_datatable
s in this structure, which lets us combine data into one visualizationRecord<string, KibanaDatatable>
representing LayersTo highlight some of the differences that might be fixable:
lens_multitable
approach can be replaced by a different approach. Could we use Variables instead?So as a starting point of discussion, I think there are two potential solutions to this:
a) Combine
datatable
andkibana_datatable
into a single representation. Lens can use some solution to combine all the tables that doesn't involve a custom wrapper.b) Change the signature of the table manipulation functions to accept any input and provide the same type of output
Would appreciate feedback from @ppisljar and @lukeelmers
The text was updated successfully, but these errors were encountered: