Add support for multiple data types and schemas in Kamelets #1980
Maybe we can do something like this:

```yaml
steps:
  - marshal: "{{format}}"
```

This should be a reference to an in/out schema; the operator can then create the properties to configure the data format via properties.
I like this idea and would add the possibility for:
Yes, this would avoid having to write the implementation at the runtime side, also leaving room for the user to implement custom transformations in the flow.
Yeah, these are concerns we need to address now as well, and I think it's a good time to deprecate the current `types` field. For the "schema in header" idea, I think it's a good approach for sources. We can make sure the operator passes the location of the schema in a configuration property and, in case the schema is inline, also mounts it as a file in the pod, so that the header can always be a URL. The Kamelet runtime may also bind that property into a header. The destination (or an intermediate step) can then use that URL to do stuff. Wdyt @lburgazzoli?
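As a rough illustration of the runtime side of that idea, a Kamelet flow could bind such a property into a header. A minimal sketch in Camel YAML DSL, assuming a hypothetical property name (`kamelet.schema.url`) and header name; neither is an actual Camel K contract:

```yaml
steps:
  # the operator would inject kamelet.schema.url, pointing at the
  # (possibly pod-mounted) schema file or a remote location
  - setHeader:
      name: "Camel-Schema-Url"
      constant: "{{kamelet.schema.url}}"
```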
Let me have the layman in me try summarising the discussion (at least for my brain to wrap this up 🤯). Is my understanding correct?
Maybe we can use "schemes" instead.
I think we can improve data formats in general; as an example, we can define a specific schema like:

```yaml
avro:
  media-type: application/avro
  schema:
    # the avro schema, inline or by reference
  data-format:
    # optional, if not provided use the scheme id
    id: "avro"
    properties:
      class-name: org.apache.camel.xxx.MyClass
      compute-schema: true|false
      # ...
  dependencies:
    - camel-avro
    - mvn:org.acme/my-artifact/1.0.0
```
Looks correct :)
+1 then 😄!
Thinking a little bit more, I wonder if this new schema/data-format thing is something we can define as a dedicated custom resource (which we can eventually embed in the Kamelet), but it could also be something we could use for the dynamic computation of schemas. Eventually, schema registries can watch and duck-type those resources to automatically load them.
Interesting ideas. I like the concept and I think we should understand if/how it is possible to externalize such data formats. If the data formats are an external entity, they could be reusable and keep the Kamelet definition cleaner.
Yeah, good idea to have support for multiple data types, especially since it's common in Kafka land to have Avro, JSON, etc. types. For Kamelets it would also be good if we could generate documentation (AsciiDoc files) to use for the website / Kamelet repository. In that documentation we can then easily grab the data types and prominently show in the docs what types are supported.
Btw, do we have any thoughts on schema-less Kamelets? For example, if you just use a Kamelet to route data from one messaging system to another between queues, and don't really want/need to specify any schema, as the data is just "raw".
Yep, a schema is not always required and, to be honest, for Camel it may not even be needed (except for some components like Kafka), so it is mainly tooling-related information.
Let's do another iteration on this... I'm thinking about your comments and I like the idea of having stuff also as CRs. I remember some brainstorming with @lburgazzoli about how dynamic schemas may work in this model. The idea was to let Kamelets define their schemas, if known in advance, but also let KameletBindings redefine them, if needed. DataFormats are generic in Camel, but when talking about connectors (a.k.a. Kamelets), I think it's better for the Kamelet to enumerate all the possible data formats it supports. E.g. @davsclaus was talking about sources that can only produce certain types. I also see that we're talking about formats and schemas as if they were the same thing, but even if they are related (i.e. dataFormat + Kamelet [+ Binding Properties] may imply a Schema), maybe we can do a better job in treating them as separate entities. I think the following model may be good for the in-Kamelet specification of a "format":

```yaml
kind: Kamelet
apiVersion: camel.apache.org/v1alpha1
metadata:
  name: chuck-source
  # ...
spec:
  definition:
    properties:
      format:
        title: Format
        type: string
        enum:
          - JSON
          - Avro
        default: JSON
  # ...
  formats:
    - name: JSON
      # optional, useful in case of in/out Kamelets
      scope: out
      schema:
        mediaType: "application/json"
        data: # the JSON schema inline
        url: # alternative link to the schema
        ref: # alternative Kubernetes reference to the schema (see below)
          name: # ...
      # the source produces JSON by default, no libs or transformations needed
    - name: Avro
      schema:
        type: avro-schema
        mediaType: "application/avro"
        data: # the avro schema inline
        url: # alternative link to the schema
        ref: # alternative Kubernetes reference to the schema (see below)
          name: # ...
      dataFormat:
        # optional, but if not provided "no format" is assumed
        id: "avro"
        properties: # only if "id" is present
          class-name: org.apache.camel.xxx.MyClass
          compute-schema: true|false
          # ...
      dependencies:
        - camel:jackson
        - camel:avro
        - mvn:org.acme/my-artifact/1.0.0
```
You can notice the `ref` alternative in the schema specification: a schema can also be defined in a dedicated custom resource:

```yaml
kind: Schema
apiVersion: camel.apache.org/v1alpha1
metadata:
  name: my-avro-schema
spec:
  type: avro-schema
  mediaType: application/avro
  data: # the avro schema inline
  url: # alternative URL reference
  # no, ref is forbidden here
```

The structure is almost the same as the inline version. The binding can use the predefined schema:
```yaml
kind: KameletBinding
apiVersion: camel.apache.org/v1alpha1
metadata:
  name: chuck-to-channel
spec:
  source:
    kind: Kamelet
    apiVersion: camel.apache.org/v1alpha1
    name: chuck-source
    properties:
      # may have been omitted, since it's the default
      format: JSON
  sink:
    # ...
```

The binding above will produce objects in JSON format with the inline definition of the schema. The one below uses a custom schema:
```yaml
kind: KameletBinding
apiVersion: camel.apache.org/v1alpha1
metadata:
  name: chuck-to-channel
spec:
  source:
    kind: Kamelet
    apiVersion: camel.apache.org/v1alpha1
    name: chuck-source
    properties:
      # since there's no inline format named "my-avro", it refers to the external one
      format: Avro
      schema:
        # since it's a source, we assume this is the schema of the output
        ref:
          name: my-avro-schema
        # or alternatively also inline
        data: # ...
        url: # ...
  sink:
    # ...
```

This mechanism may also be used in cases where the schema can be computed dynamically before running the integration: an external entity saves the schema in a CR and references it in the KameletBinding. Using the Schema CR to sync external entities (like registries) is possible too, but we should think more about that because of edge cases: sometimes the schema is known only at runtime, and sometimes it varies from message to message. In those cases, it's the integration itself that needs to update the registries, and it would probably be cleaner if the integration always updates the registry.
I think we could also have a case where we want the data format to automatically compute the schema, i.e. from a POJO, so basically a format without the schema.
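In the model sketched above, that could simply be a `formats` entry with no `schema` block, relying on the proposed `compute-schema` flag (a sketch reusing the fields from the earlier example):

```yaml
formats:
  - name: Avro
    dataFormat:
      id: "avro"
      properties:
        class-name: org.apache.camel.xxx.MyClass
        # no schema block: the data format derives the schema from the POJO
        compute-schema: true
    dependencies:
      - camel:avro
```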
Yep, we don't need to publish each schema up-front, but for pre-computed schemas (either because they are known in advance or because they are computed before running the integration), we should store them as CRs so others can eventually consume them.
I guess there may be some confusion from a user PoV, as you can define multiple in and multiple out schemas: how do we validate that? Having an in/out formats separation would allow us to define such semantics and validation at the CRD level, as in the sketch below. This may also work the other way around: if an external tool creates a CR with the schema, then Camel K can consume it without the need to generate it. But I agree, this is low priority.
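For illustration, such a split might look like this (a sketch only; the field layout is hypothetical, not part of the proposal above):

```yaml
spec:
  formats:
    in:
      - name: JSON
        schema:
          mediaType: "application/json"
    out:
      - name: Avro
        schema:
          mediaType: "application/avro"
```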
Yeah, the schema associated inline with the format was intended to be optional, present only if known in advance. When we think about sources, I think there's no confusion: the user chooses a format and gets its schema. The problem arises when you think about sinks: a Telegram sink may accept an image, a video, a text, or a structured JSON.
- Introduce data type converters
- Add data type processor to auto convert exchange message from/to given data type
- Let user choose which data type to use (via Kamelet property)
- Add data type registry and annotation based loader to find data type implementations by component scheme and name

Relates to CAMEL-18698 and apache/camel-k#1980
- Enable service discovery for data type converter resources in order to enable the factory finder mechanism when resolving data type implementations
- Add proper quarkus-maven-plugin build time properties for quarkus.camel.* properties
- Fix the way camel-quarkus build time properties are set (set properties on the quarkus-maven-plugin instead of using generic Maven system properties)
- Explicitly add quarkus.camel.service.discovery.include-patterns for data type converter resources in order to enable lazy loading of Kamelets data type implementations

Relates to apache#1980
- Adds input/output/error data type spec to the Kamelet CRD. The data type specifications provide additional information to the user, such as what kind of input is required to use a Kamelet and what output is produced by it.
- The data type specifications can be used by tooling and data type converters to improve the overall usability of Kamelets
- Deprecate the former types field and the EventTypeSpec
- Support data type references in KameletBinding that automatically add a data type action Kamelet to the resulting integration template flow
- Allow the user to specify the data types for output/input on Kamelet references in a binding
- Camel K automatically adds the respective steps (using the data-type-action Kamelet) in order to apply the data type conversion logic
- Update YAKS 0.14.3
Seeking help on improving the Kamelet model before going all in on the Kamelet catalog effort.
Currently the model expects that one declares the default input/output of a Kamelet in the `spec` -> `types` -> `in`/`out` field, like the sketch below.
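The original example was lost in extraction; a minimal sketch of that structure, with an illustrative media type:

```yaml
spec:
  # ...
  types:
    out:
      mediaType: application/json
```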
The same holds for Kamelets that consume an input, but the property is named `in`. The meaning of those types is simply stated: `out` is what the Kamelet produces, `in` is what it consumes.
That unfortunately has some drawbacks, one of which is that a Kamelet must have a single data type as output (for sources) and/or a single data type for input.
Many implementations of Kamelets that produce JSON data, in fact, have had a route snippet like the following in the flow part so far:
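The snippet itself is missing from the extracted text; it is presumably the usual JSON marshalling step, along these lines:

```yaml
steps:
  # force the output into JSON, regardless of what the consumer wants
  - marshal:
      json: {}
```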
So e.g., if we go full steam with the Kamelet catalog and add support for them in camel-kafka-connector, I expect we'll soon have a `salesforce-source-json` and a `salesforce-source-avro` to overcome this limitation. But it's not ideal. I think we should allow a Kamelet to have a default input/output format without forcing users to use that one: they may have choices.
I was thinking of something like this:
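The original example is not in the extracted text; a hedged reconstruction from the surrounding description, assuming the existing `types`/`out` structure plus the `dataFormat` field discussed next (the field placement is an assumption):

```yaml
spec:
  types:
    out:
      mediaType: application/json
      # hypothetical: tells the operator to add camel:jackson
      # and a marshalling step automatically
      dataFormat: json
```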
The `dataFormat` option tells the operator to automatically add `camel:jackson` and the marshalling step when the Kamelet is used in a KameletBinding. For `in`, this translates into adding the unmarshalling to a specific (optional) class.
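The translated snippet is also elided; a sketch of the kind of step the operator might inject, in Camel YAML DSL, with a hypothetical target class:

```yaml
steps:
  - unmarshal:
      json:
        library: Jackson
        # optional target class, if configured on the Kamelet
        unmarshal-type-name: org.acme.MyPayload
```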
In case we want this behavior to be common to KameletBinding and standard integrations, it would be better implemented at the runtime level.
Now the question is how to deal with the case of multiple input/output data types.
A possibility would be to add another level of description, as in the sketch below:
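The example is elided in the extracted text; judging from the `formats` model that emerges later in the thread, it was presumably something along these lines (a sketch using the fields proposed in the discussion):

```yaml
spec:
  formats:
    - name: JSON
      schema:
        mediaType: "application/json"
      # default, no extra dependencies needed
    - name: Avro
      schema:
        mediaType: "application/avro"
      dataFormat:
        id: "avro"
      dependencies:
        - camel:avro
```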
That would break the current schema a bit, but it would provide more options in the future.
Having the possibility to choose, a user can specify the `format` option in a KameletBinding (which we're going to reserve, like we did for `id`) to select an input/output format that is different from the default (maybe including `none`, to obtain the original data in advanced use cases). In case this should also work in a standard integration, we may use the following syntax:
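The proposed syntax is missing from the extracted text; one plausible shape, assuming the format is passed as a plain Kamelet property on the endpoint URI (an assumption, not the confirmed proposal):

```yaml
- from:
    uri: "kamelet:salesforce-source?format=avro"
    steps:
      - to: "log:info"
```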
From the operator side, the required libraries for Avro will be added, but the runtime should enhance the route with a loader/customizer.
Wdyt @lburgazzoli, @astefanutti, @davsclaus, @squakez?