feat: allow named types in unions #469
The Avro specification defines:
> Unions may not contain more than one schema with the same type, except for the named types record, fixed and enum

See https://avro.apache.org/docs/1.11.1/specification/#unions. This means that if a `record` type has a name, it is valid to mix multiple of them in a union, unwrapped.
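To make the quoted rule concrete, here is a minimal sketch that checks it on plain schema objects. `unionBranchKey` and `validateUnion` are hypothetical helpers written for this illustration, not part of avsc, and the sketch deliberately ignores namespace inheritance from enclosing types and logical types.

```javascript
// Sketch only: checks the union rule from the Avro specification on plain schemas.
// Both helpers are hypothetical and are not avsc APIs.
const NAMED_TYPES = new Set(['record', 'fixed', 'enum']);

// Returns the key under which a branch must be unique within a union.
function unionBranchKey(schema) {
  if (typeof schema === 'string') {
    return schema; // Primitive name or reference to a named type, e.g. 'null'.
  }
  if (Array.isArray(schema)) {
    throw new Error('unions may not immediately contain other unions');
  }
  if (NAMED_TYPES.has(schema.type)) {
    // Named types are told apart by their full name, not by their kind.
    const needsNamespace = schema.namespace && !schema.name.includes('.');
    return needsNamespace ? `${schema.namespace}.${schema.name}` : schema.name;
  }
  return schema.type; // 'array', 'map', or a primitive written as an object.
}

// Throws on a duplicate branch, otherwise returns the list of branch keys.
function validateUnion(branches) {
  const seen = new Set();
  for (const branch of branches) {
    const key = unionBranchKey(branch);
    if (seen.has(key)) {
      throw new Error(`duplicate union branch: ${key}`);
    }
    seen.add(key);
  }
  return [...seen];
}

const foo = {type: 'record', name: 'test.Foo', fields: [{name: 'id', type: 'string'}]};
const bar = {type: 'record', name: 'test.Bar', fields: [{name: 'id', type: 'string'}]};
const branchKeys = validateUnion(['null', foo, bar]); // Two distinct records: accepted.
```

Two array branches, by contrast, would be rejected because both map to the key `array`.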
Closing this for now; I think it needs an additional fix in the bucket selection for line 1261 in 7cb76a6.
For my current project's very specific use case (Avro schemas generated from an OpenAPI definition, with the data then queried via a generated client, so the class names match), this version works:
function getValueBucket(val) {
if (val === null) {
return 'null';
}
var bucket = typeof val;
if (bucket === 'object') {
// Could be bytes, fixed, array, map, or record.
if (Array.isArray(val)) {
return 'array';
} else if (Buffer.isBuffer(val)) {
return 'buffer';
}
if (val.constructor.name) {
return 'model.' + val.constructor.name;
};
}
return bucket;
}
However, this check:
if (val.constructor.name) {
  return 'model.' + val.constructor.name;
};
doesn't transfer to raw JSON. The current namespace would need to be passed into the Generally, there is a discriminator (an attribute named
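The "doesn't transfer to raw JSON" point can be reproduced with a small sketch. The bucket function is copied from the comment above so the snippet runs on its own; the `Person` class is a hypothetical stand-in for a class emitted by a client generator.

```javascript
// Same bucket logic as in the comment above, reproduced so this runs standalone.
function getValueBucket(val) {
  if (val === null) {
    return 'null';
  }
  const bucket = typeof val;
  if (bucket === 'object') {
    if (Array.isArray(val)) {
      return 'array';
    } else if (Buffer.isBuffer(val)) {
      return 'buffer';
    }
    if (val.constructor.name) {
      return 'model.' + val.constructor.name;
    }
  }
  return bucket;
}

// Hypothetical generated model class (an assumption for illustration).
class Person {
  constructor(id) {
    this.id = id;
  }
}

const instance = new Person('abc');
// A JSON round trip keeps the fields but drops the prototype.
const rawJson = JSON.parse(JSON.stringify(instance));

const instanceBucket = getValueBucket(instance); // 'model.Person'
const rawJsonBucket = getValueBucket(rawJson); // 'model.Object'
```

After the round trip the value is a plain object, so the constructor name no longer identifies the record.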
Reopened this with the proper implementation respecting the namespace and inferred arrays holding objects. Would be great to get some feedback on this and see if this is the right direction. If yes, will add tests of course. All current tests pass with this change.
Unions of named types are already supported using wrapped unions. Can you share some context on your use-case to understand why they aren't a good fit? Note also that you can already force an unwrapped representation for arbitrary unions using logical types (see here, and the linked comment).
Of course. I am using this API from Affinity CRM (OpenAPI spec at the top or a current extract here) to generate a TypeScript client with openapi-generator/typescript. You can see the implementation of it here. Our current Affinity CRM instance has tens of thousands of items in it of the I am using openapi-generator/avro-schema to generate an Avro schema for the same data. I then read the data in batches of 100 from the API and stream them directly through I want the data in Snowflake to be:
I also want to:
Does that make sense?
Yes, I am currently using
Thank you for the context. This change will not help you since you are uploading Avro-encoded files to Snowflake. All unions have the same Avro encoding, independent of their in-memory representation. You will have to check on the Snowflake side to see how they represent the data. With respect to the logic, a major concern is that it only works with values with a named constructor. No other values currently have this requirement.
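As an illustration of why the in-memory representation does not affect the encoded bytes, here is a hand-rolled sketch of the Avro binary encoding for the union `['null', test.Foo]`, where `test.Foo` has a single string field `id`. The helper names are hypothetical and the encoder only covers this one schema; it is not how avsc is implemented.

```javascript
// Zigzag plus variable-length encoding, as Avro uses for int and long values.
// Only intended for small integers in this sketch.
function encodeLong(n) {
  let zz = n >= 0 ? n * 2 : -n * 2 - 1;
  const bytes = [];
  while (zz >= 0x80) {
    bytes.push((zz % 0x80) | 0x80);
    zz = Math.floor(zz / 0x80);
  }
  bytes.push(zz);
  return Buffer.from(bytes);
}

// Strings are a length (as a long) followed by UTF-8 bytes.
function encodeString(s) {
  const data = Buffer.from(s, 'utf8');
  return Buffer.concat([encodeLong(data.length), data]);
}

// A union is the zero-based branch index followed by the branch value.
// Branch 0 is 'null' (no payload), branch 1 is the record with field `id`.
function encodeUnion(branchIndex, record) {
  const parts = [encodeLong(branchIndex)];
  if (record !== null) {
    parts.push(encodeString(record.id));
  }
  return Buffer.concat(parts);
}

const unwrapped = {id: 'abc'};
const wrapped = {'test.Foo': {id: 'abc'}};

// Both representations select branch 1 and carry the same record.
const fromUnwrapped = encodeUnion(1, unwrapped);
const fromWrapped = encodeUnion(1, wrapped['test.Foo']);
```

The wrapper key only helps the library pick the branch in memory; it never reaches the output buffer.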
with
Yes, hence the guard.
It's not sufficient unfortunately. It would break many users of unwrapped unions, for example the common nullable case:
const type = avro.Type.forSchema([
  'null',
  {type: 'record', name: 'test.Foo', fields: [{name: 'id', type: 'string'}]},
]);
type.toBuffer({id: 'abc'}); // Not valid anymore
(We should add a test for this.)
Another issue is that the library should be self-contained: any constructors required for use with unions should be included in--or generated by--the library. Record constructors are already generated but meant to be opt-in. They also aren't compatible with this implementation, which means that certain values don't roundtrip anymore:
const val = type.fromBuffer(Buffer.from('0206616263', 'hex')); // Generated record instance
type.toBuffer(val); // Throws
For this feature to be worth adding, I think it needs to allow arbitrary record representations. Even if we assume the use of constructors to disambiguate branches, there is no obvious single way to name them. Another user would be justified in requesting that their own records, named slightly differently, also be supported. Here is one way we could achieve this:
This single function gives users full flexibility to decide how to disambiguate unions. In your case, it could be a simple constructor check using the naming convention from the PR. Other users would be free to use their own constructor naming convention, or something else entirely. WDYT?
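A rough sketch of what such a user-supplied disambiguation function might look like follows. Everything here is an assumption for illustration: `createBranchResolver`, `byConstructor`, and the `Foo`/`Bar` classes are hypothetical and do not reflect the option name or signature that avsc ended up with.

```javascript
// Hypothetical helper: wraps a user function that names the branch for a value
// and turns the returned name into a branch index.
function createBranchResolver(branchNames, selectBranch) {
  return function resolve(val) {
    const name = selectBranch(val, branchNames);
    const index = branchNames.indexOf(name);
    if (index < 0) {
      throw new Error(`no union branch for value: ${JSON.stringify(val)}`);
    }
    return index;
  };
}

// Hypothetical model classes standing in for generated client classes.
class Foo {
  constructor(id) {
    this.id = id;
  }
}
class Bar {
  constructor(id) {
    this.id = id;
  }
}

// Example user function: constructor check using the 'model.' naming convention,
// with a fallback so plain objects still work when only one record branch exists.
function byConstructor(val, branchNames) {
  if (val === null) {
    return 'null';
  }
  const ctorName = val.constructor && val.constructor.name;
  const candidate = 'model.' + ctorName;
  if (branchNames.includes(candidate)) {
    return candidate;
  }
  const records = branchNames.filter((name) => name !== 'null');
  return records.length === 1 ? records[0] : undefined;
}

const resolve = createBranchResolver(['null', 'model.Foo', 'model.Bar'], byConstructor);
const nullableResolve = createBranchResolver(['null', 'test.Foo'], byConstructor);
```

With this shape, the common nullable case keeps accepting plain objects, while an ambiguous plain object in a multi-record union is rejected instead of being silently mis-encoded.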
Okay, I get your point, thank you for elaborating. Reading the constructor name is definitely not a great implementation option anyway.
I like it. This is what we would use instead of the result from
I can do this when I open the pull request for the above?
I believe the above implementation suggestion can also solve #225 (comment) - you can simply return the discriminator of the union to ensure there is no wrapping needed. (cc @hath995 - might be a bit late for you possibly)
That's right.
A separate tiny PR adding the test would be best.
This reverts commit 3d9efd5.
@mtth PTAL
I wasn't going to - given it's not a fix, I think it could reasonably be expected to upgrade if the feature is needed? Do you think we need to?
Thanks @joscha!
I was checking in case you needed it to be backported, to use before
@mtth are you able to publish a new 6.x alpha version which includes this change by any chance, please?
@joscha - done: (Consider pinning, there will be breaking changes in later alpha releases.)
Was broken via mtth#469
Was broken via #469. Without this fix, the error message is `undefined`.
The Avro specification defines:
> Unions may not contain more than one schema with the same type, except for the named types record, fixed and enum

See https://avro.apache.org/docs/1.11.1/specification/#unions. This means that if a `record` type has a name, it is valid to mix multiple of them in a union, unwrapped. The same change should possibly be made for enums as well.