Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Option to Differentiate Between Optional and Required Message Properties When Using the json_types Option #1062

Closed
anthonysessa opened this issue Jan 28, 2025 · 13 comments

Comments

@anthonysessa
Copy link

anthonysessa commented Jan 28, 2025

In certain scenarios, we prefer leveraging typings (e.g., the UserJson type) to reduce payload size when interacting with protobufs from the client. Including a single large schema can increase package sizes by over 20 KB—an avoidable overhead if the focus is limited to build-time type checking combined with type guards or zod schemas built around these types.

Currently, when generating JSON types, all fields are treated as optional. Would there be any resistance to introducing an option that preserves the distinction between optional and required fields as defined in the original protobuf schema? This approach would align with how the Message generic maintains optional vs. non-optional definitions when generating types.

Somewhat related to: #1058

@anthonysessa anthonysessa changed the title Option to differentiate between optional and required message properties when generating with json_types option. Add Option to Differentiate Between Optional and Required Message Properties When Using the json_types Option Jan 28, 2025
@rossipedia
Copy link

Ohmygosh please. I didn't notice that json_types effectively produces Partial<T> types, and it's proving to be quite the headache in our converstion to V2. At least for proto3 sources.

IMO, given the following proto:

syntax = "proto3";
package example;
message User {
  string first_name = 1;
}

Converting that message using toJson() should return the equivalent of the type:

type User { 
  firstName: string;
};

Marking every field as optional (using the ?) doesn't (IMO) accurately reflect the contract, especially when it comes to output messages:

const user = await client.getUser(id);
const userJson = toJson(UserSchema, user);

// this is typed as:   { firstName?: string | undefined }
// when it should be   { firstName: string } 
const n = userJson.firstName; 

For input messages, fromJson() should probably mark its own type parameter as Partial<T>, so that the type definition itself of UserJson can describe presence/absence.

If a field is marked optional in the proto3 definition, then it should be suffixed with a ? in the generated typedefs.

@timostamm
Copy link
Member

@rossipedia, this is not possible, unfortunately. From the manual:

The JSON type matches exactly what toJson() will emit with standard serialization options

If you serialize the message example.User to JSON, the result is {}, not {"firstName": ""}. So the generated type wouldn't match the actual output.

We did not come up with this, it's how Protobuf serializes to JSON in every implementation. The format is documented here.

The function fromJson() doesn't use the generated JSON type, because the JSON format requires that parsers accept many more variants, for example null, and strings for integers.

@timostamm
Copy link
Member

@anthonysessa, with "required" fields, do you mean proto2 required fields such as:

syntax = "proto2";

message User {
  required string first_name = 1;
}

or proto3 fields without the optional label, such as:

syntax = "proto3";

message User {
  string first_name = 1;
}

?

@anthonysessa
Copy link
Author

anthonysessa commented Feb 10, 2025

This is what I meant.

proto3 fields without the optional label...

I think its valid given the non json_types code generation makes the optional label an optional type, and if there is no optional label, it makes it a required type.

@anthonysessa
Copy link
Author

After further reviewing the documentation, it’s clear that we need to adhere to the standard toJson and fromJson methods. My initial suggestion to use json_types may not be the right approach.

That said, it’s crucial to ensure that optional and non-optional typings are accurately generated from the protobuf definitions. The typings for oneOf fields, in particular, do not align well with client-side implementations.

Ideally, we should be able to simply import type { ProtoType } from 'generated-ts-library' and trust that it represents the "raw" protobuf structure—without the Message generic interface or the oneOf case/value type. This approach would keep package and web app artifact sizes unaffected.

@timostamm
Copy link
Member

Correct, we have to stick to ProtoJSON. Otherwise, protobuf-es wouldn't be compatible with other Protobuf implementations.

That said, it’s crucial to ensure that optional and non-optional typings are accurately generated from the protobuf definitions.

This works out for proto3 where optional is the exception, but not for proto2 and editions, where all fields are typically optional. It doesn't seem like a good idea to require users to handle undefined for every single property access. Also see Field presence and default values in the manual.

The typings for oneOf fields, in particular, do not align well with client-side implementations.

Can you elaborate? What do you mean with client-side implementation, and how would you expect oneof to be represented?

Ideally, we should be able to simply import type { ProtoType } from 'generated-ts-library' and trust that it represents the "raw" protobuf structure—without the Message generic interface or the oneOf case/value type. This approach would keep package and web app artifact sizes unaffected.

By "raw structure", you mean the JavaScript Object Notation? That would be ideal, but since ProtoJSON doesn't emit zero values by default, and since JSON can't serialize NaN, Infinity, and 64-bit integers, the JSON representation can be quite lacking.

Related: Connect documentation on SSR.

@rossipedia
Copy link

What about this blurb here, can we get an option at least to provide default values for fields?

When generating JSON-encoded output from a protocol buffer, if a protobuf field has the default value and if the field doesn’t support field presence, it will be omitted from the output by default. An implementation may provide options to include fields with default values in the output

@timostamm
Copy link
Member

The option already exists, see the manual here.

It would be fantastic if this was the default behavior for ProtoJSON, instead of an optional feature.

@rossipedia
Copy link

Ok but if I do that, then the returned type is no longer the *Json type that I get when I use json_types: true, instead it's just the opaque JsonValue

@timostamm
Copy link
Member

Closing this. The generated JSON type has to match the ProtoJSON format, even if this format isn't ideal. The potential for runtime errors is clearly worse.

Because it isn't feasible to generate types for every permutation of ProtoJSON options, we need to pick one set of options to generate types for, and the only logical choice is to pick the default options that are supported in every Protobuf implementation.

We are aware that this is not ideal. One alternative could be to define an alternative to ProtoJSON. It would use alwaysEmitImplicit: true, not provide any options that vary the output, and not allow variance when parsing (e.g. not accept string and null for a Protobuf int32, which a conformant ProtoJSON parser is required to accept).

This would allow for a more idiomatic mapping to a TypeScript type. It would still have rough edges around Protobuf float (number | "NaN" | "Infinity" | "-Infinity" because JSON only supports finite numbers), bytes (string with base64-encoded binary data because JSON does not support Uint8Array), 64-bit integers (string because JSON does not support bigint), and I'm not sure if TypeScript can reasonably type oneof as a flat structure yet.

I'm not sure how useful an alternate JSON format would be without support in other implementations, but the best chance is to explore the idea by building it.

@timostamm
Copy link
Member

Just in case it wasn't clear when I linked to Field presence and default values and the Connect documentation on SSR:
You can use something like devalue on any proto3 message.

syntax = "proto3";
package example;

message User {
  string first_name = 1;
  bool active = 2;
  bytes data = 3;
  uint64 large_number = 4;
  double double = 5;
}
import { create } from "@bufbuild/protobuf";
import { UserSchema } from "./gen/example_pb.js";
import * as devalue from "devalue";
import * as assert from "node:assert";

const user = create(UserSchema, {
  firstName: "bob",
  active: true,
  data: new Uint8Array(16),
  largeNumber: 2n**52n,
  double: -Infinity,
});
console.log(user);

const str = devalue.stringify(user);
const user2 = devalue.parse(str);
assert.deepStrictEqual(user, user2);

@anthonysessa
Copy link
Author

anthonysessa commented Feb 22, 2025

The current issue arises from how the generated code handles oneof fields in Protocol Buffers. Instead of directly reflecting the simple structure defined in the proto file, the code generation wraps these fields in a nested object using a case: value pattern. This approach makes the resulting TypeScript types less intuitive and more cumbersome—especially when integrating with libraries like Zod that adhere to standardized schema definitions (see [Standard Schema](https://standardschema.dev/)) for validating network requests and responses.

Including the create method on the client-side makes the web client artifact very large given it includes downstream implementation that is not required on the client to maintain strict typings.

Client applications need a way to have simplistic typings that are strict and respect the optional protobuf modifier. For example, we have data that must propogate to our data-lake for compliance reasons that originates from the client. The buf library provides the non-json_types typings which respect optional vs non-optional but has nested case: value typings which are hard to work with because the nested nature is unintuitive from a proto->TypeScript data structure perspective. The json_types makes everything optional so a consumer must rely on manually defining schema validations (w/ a library like zod) that are cannot be directly tied to the protobuf codegen which is very error prone.

The json_types generated code gets us closer to a simplistic/intuitive typing for the client, but all fields are optional.


Example: Proto Definition and Generated Code

Consider this proto definition:

syntax = "proto3";

import "google/protobuf/timestamp.proto";

message Pet {
  string name = 1;
  optional int32 age = 2;
  PetType pet_type = 3;
}

message PetType {
  oneof type {
    Cat cat = 1;
    Dog dog = 2;
  }
}

message Cat {
  string breed = 1;
}

message Dog {
  string breed = 1;
}

When compiled with the json_types flag, the generated TypeScript code creates a nested case: value structure for oneof fields. Additionally the json_types generation makes all fields optional:

// @generated by protoc-gen-es v2.2.3 with parameter "json_types=true"
// @generated from file com/clearme/protobuf/shared/v1/example_proto.proto (syntax proto3)
/* eslint-disable */

import type { GenFile, GenMessage } from "@bufbuild/protobuf/codegenv1";
import type { Message } from "@bufbuild/protobuf";

/**
 * Describes the file com/clearme/protobuf/shared/v1/example_proto.proto.
 */
export declare const file_com_clearme_protobuf_shared_v1_example_proto: GenFile;

/**
 * @generated from message Pet
 */
export declare type Pet = Message<"Pet"> & {
  /**
   * @generated from field: string name = 1;
   */
  name: string;

  /**
   * @generated from field: optional int32 age = 2;
   */
  age?: number;

  /**
   * @generated from field: PetType pet_type = 3;
   */
  petType?: PetType;
};

/**
 * @generated from message Pet
 */
export declare type PetJson = {
  /**
   * @generated from field: string name = 1;
   */
  name?: string;

  /**
   * @generated from field: optional int32 age = 2;
   */
  age?: number;

  /**
   * @generated from field: PetType pet_type = 3;
   */
  petType?: PetTypeJson;
};

/**
 * Describes the message Pet.
 * Use `create(PetSchema)` to create a new message.
 */
export declare const PetSchema: GenMessage<Pet, PetJson>;

/**
 * @generated from message PetType
 */
export declare type PetType = Message<"PetType"> & {
  /**
   * @generated from oneof PetType.type
   */
  type: {
    /**
     * @generated from field: Cat cat = 1;
     */
    value: Cat;
    case: "cat";
  } | {
    /**
     * @generated from field: Dog dog = 2;
     */
    value: Dog;
    case: "dog";
  } | { case: undefined; value?: undefined };
};

/**
 * @generated from message PetType
 */
export declare type PetTypeJson = {
  /**
   * @generated from field: Cat cat = 1;
   */
  cat?: CatJson;

  /**
   * @generated from field: Dog dog = 2;
   */
  dog?: DogJson;
};

/**
 * Describes the message PetType.
 * Use `create(PetTypeSchema)` to create a new message.
 */
export declare const PetTypeSchema: GenMessage<PetType, PetTypeJson>;

/**
 * @generated from message Cat
 */
export declare type Cat = Message<"Cat"> & {
  /**
   * @generated from field: string breed = 1;
   */
  breed: string;
};

/**
 * @generated from message Cat
 */
export declare type CatJson = {
  /**
   * @generated from field: string breed = 1;
   */
  breed?: string;
};

/**
 * Describes the message Cat.
 * Use `create(CatSchema)` to create a new message.
 */
export declare const CatSchema: GenMessage<Cat, CatJson>;

/**
 * @generated from message Dog
 */
export declare type Dog = Message<"Dog"> & {
  /**
   * @generated from field: string breed = 1;
   */
  breed: string;
};

/**
 * @generated from message Dog
 */
export declare type DogJson = {
  /**
   * @generated from field: string breed = 1;
   */
  breed?: string;
};

/**
 * Describes the message Dog.
 * Use `create(DogSchema)` to create a new message.
 */
export declare const DogSchema: GenMessage<Dog, DogJson>;

What is interesting is that the non-json_types generated code does respect optional vs non-optional fields, so the request to have a json_types style version of the generated code, that respects optional vs non-optional types without the nested case: value structure.


What Would Be Helpful

A more straightforward typing—similar to what ts-proto produces—would look like this:

export interface Pet {
  name: string;
  age?: number | undefined;
  petType: PetType | undefined;
}

export interface PetType {
  cat?: Cat | undefined;
  dog?: Dog | undefined;
}

export interface Cat {
  breed: string;
}

export interface Dog {
  breed: string;
}

This version removes the extra case: value wrapper and directly represents the optional oneOf nature of the cat and dog fields. It respects non-optional fields better and is easier to work with, particularly in scenarios where schema validation (e.g., using Zod) is important.


Why This Matters

  • Developer Experience: The case: value pattern complicates the code, making it less intuitive and harder to work with, especially when validating data.
  • Client-Side Efficiency: Using a simpler type structure allows you to import types with a statement like import type { Pet } from 'generated-client'. This approach strictly types your protobuf events without increasing the bundle size of your web application.
  • Alternative Approaches: There are different ways to handle oneof fields, such as generating a union type directly rather than using a nested mapping. This alternative could lead to cleaner, more maintainable code.

In summary, the goal is to generate TypeScript types that are as close as possible to the original proto definitions—avoiding unnecessary nesting—thereby ensuring more intuitive and efficient usage on the client side, especially when working with modern validation libraries.

@jcready
Copy link

jcready commented Feb 22, 2025

export interface PetType {
  cat?: Cat | undefined;
  dog?: Dog | undefined;
}

This type doesn't enforce that only one can be set. This passes type checking:

const a: PetType = { cat: { breed: '' }, dog: { breed: '' } };

There are different ways to handle oneof fields, such as generating a union type directly rather than using a nested mapping.

I don't believe there is actually a way to produce a type which both restricts the object to only have zero or one of the fields set. I assume you're talking about a type like:

type PetType =
  | { cat: Cat; dog?: never }
  | { cat?: never; dog: Dog }
  | { cat?: never; dog?: never };

But as far as I can tell that doesn't actually work correctly either:

declare var a: PetType;
a.cat = { breed: '' };
a.dog = { breed: '' };

There are no type errors generated for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants