Consider adding lenient() option for string inputs #804

Closed
nfantone opened this issue Dec 1, 2021 · 12 comments
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@nfantone
nfantone commented Dec 1, 2021

I was in the process of migrating my existing yup schemas to zod and found that the main sticking point is handling the parsing/validation of request path and query string parameters. Since they are typically considered raw strings and the parsing is left to the application level, zod doesn't really provide a great DX in these scenarios. Especially when compared to how yup does it out-of-the-box.

// My existing `yup` schema
const schema = yup.object({ params: yup.object({ id: yup.number().required() }) })

// GET /user/42
// req -> { params: { id: '42' } }
schema.validateSync(req) // { params: { id: 42 } } -> Passes

// `zod` equivalent
const schema = z.object({ params: z.object({ id: z.number() }) })

// GET /user/42
// req -> { params: { id: '42' } }

schema.parse(req) // -> Fails

/*
Uncaught:
[
  {
    "code": "invalid_type",
    "expected": "number",
    "received": "string",
    "path": [
      "params",
      "id"
    ],
    "message": "Expected number, received string"
  }
]
*/

Which leads me to write custom preprocess() validators, along with custom error messages, for each expected type, every time. Here's an example for validating numerical strings.

const DEFAULT_ZOD_NUMERICAL_PARAMS = Object.freeze({
  errorMap: (issue, _ctx) => {
    const message =
      issue.code === z.ZodIssueCode.invalid_type
        ? `Expected ${issue.expected}, received ${issue.received === 'string' ? `'${_ctx.data}'` : issue.received}`
        : _ctx.defaultError

    return { message }
  },
})

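// `isNilOrEmpty` is not defined in this snippet; it is assumed to be a helper
// that returns true for null, undefined, and '' (e.g. from a utility library).
function numerical(params) {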
  return z.preprocess(value => {
    const num = isNilOrEmpty(value) ? NaN : Number(value)
    return Number.isNaN(num) ? value : num
  }, z.number(params ?? DEFAULT_ZOD_NUMERICAL_PARAMS))
}

export default numerical

I know this has been discussed before, but having something close to a .lenient() parsing option, allowing for values to be internally coerced would be great.

z.lenient(z.number()).parse('42') 
// 42

IMHO, this is such a common scenario when dealing with serialized data, that it only makes sense for a library such as this to support it without extra hassle. In addition, while the .preprocess() method above works, it transfers the responsibility of the parsing to the user, which is arguably the main use case of zod.

@scotttrinh
Collaborator

Thanks for the write up and the thoughtful suggestion @nfantone !

Speaking for myself, I do not view Zod's main use case as parsing JSON or query strings but as safely turning an unknown to a T. Assuming that the input data is of any particular serialized type (query strings, JSON, form data, protobuf, csv, etc) is a level of abstraction above what I feel Zod should be focused on. However, all of those use cases are important and making it easier to build the right thing for each use case is definitely something I think we should try to make easier.

Your proposal (lenient) is ergonomic, but I'm not sure that we want to hard-code this into core since there are so many ways you might want to coerce the input. A library (or a module in your own application) seems like a good place to have wrapped versions of most types that attempt to coerce their input from strings (or whatever else: Date, null -> 0, false -> 0, etc). This is the approach that I've personally taken and it gives us the flexibility that we absolutely require when doing these kinds of transformations.

Definitely open to continue discussing this and trying to support making this as easy as possible.

FWIW, this is our numeric string schema: CodeSandbox

const numericString = z
  .string()
  .refine((s) => {
    const n = Number(s);

    return Number.isFinite(n) && !Number.isNaN(n);
  })
  .transform(Number);

@nfantone
Author

nfantone commented Dec 1, 2021

Hi @scotttrinh. Thanks for your reply! You raise good points. Let me see if I can expand on them inline.

Speaking for myself, I do not view Zod's main use case as parsing JSON or query strings but as safely turning an unknown to a T.

Well, at the risk of sounding a bit cheeky, saying that while having your core/main library function be named .parse is a tough sell 😛.

In all seriousness, I get what you mean here - but I really didn't want to circumscribe the uses of zod to "parsing query strings". More like parsing arbitrary typed data. It just so happens that, this being JavaScript, you can expect plenty of very valid use cases dealing with web servers where that data is pretty much always represented as either strings or byte streams.

A library (or a module in your own application) seems like a good place to have wrapped versions of most types that attempt to coerce their input from strings

It would be a good place, absolutely. A better place? zod. Or maybe a companion library? Obviously, your mileage may vary, but to me, without this concept of being able to coerce string values without effort, because I always end up writing boilerplate code, zod comes in second when deciding which validation library to use for most projects.

FWIW, this is our numeric string schema

Thanks for sharing that! That's really good. Doesn't quite fit my needs (i.e., the semantics of the lenient() approach), though:

  • doesn't work with numbers;
  • parses empty/blank strings as 0;
  • default error message doesn't provide any hints on what is actually expected (reads "Invalid input").
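
For context on the second bullet: JavaScript's `Number` converts empty and whitespace-only strings to `0`, so a finite-number check on its own lets them through. The plain-JavaScript sketch below (no Zod involved) reproduces the predicate from the schema above next to a stricter variant. The function names `isNumericString` and `isNonBlankNumericString` are hypothetical and used only for illustration.

```javascript
// Same check as the refine() predicate in the numeric string schema above.
// `isNumericString` is a hypothetical name, not part of Zod.
function isNumericString(s) {
  const n = Number(s);
  return Number.isFinite(n); // false for NaN, Infinity and -Infinity
}

// Hypothetical stricter variant: also rejects empty and whitespace-only input.
function isNonBlankNumericString(s) {
  return typeof s === 'string' && s.trim() !== '' && Number.isFinite(Number(s));
}

console.log(Number(''));                      // 0
console.log(Number('   '));                   // 0
console.log(isNumericString(''));             // true
console.log(isNumericString('abc'));          // false
console.log(isNonBlankNumericString('   '));  // false
console.log(isNonBlankNumericString('42'));   // true
```

Whether a blank string should count as `0` or as an error is exactly the point the two sides of this thread disagree on, so neither function is "the" correct one.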

Of course, you can go ahead and try to fix those things.

z.number().or(
  z.preprocess(
    value => (isNil(value) ? value : String(value).trim()),
    z
      .string()
      .min(1, 'Expected number or numeric string, received empty string')
      .refine(
        s => {
          const n = Number(s)
          return Number.isFinite(n) && !Number.isNaN(n)
        },
        value => ({ message: `Expected number or numeric string, received '${value}'` })
      )
      .transform(Number)
  )
)

But I guess this kinda further proves the point I am trying to raise.

@scotttrinh
Collaborator

Well, at the risk of sounding a bit cheeky, saying that while having your core/main library function be named .parse is a tough sell 😛.

😆

More like parsing arbitrary typed data. It just so happens that, this being JavaScript, you can expect plenty of very valid use cases dealing with web servers where that data is pretty much always represented as either strings or byte streams.

Yeah, I 100% agree with you that dealing with serialized data is something you have to do often. I guess my feeling is that Zod is an abstraction below that as a runtime type system. In a similar way to TypeScript, a z.number should be a number and if it's not, you should have to be explicit about that. TypeScript doesn't let you be loose about it, and I feel pretty strongly that Zod should be as explicit as possible about what it is doing and not offer coercion or transformation as a hidden or implicit effect. Now, I know you're proposing something explicit (lenient) here, but what lenient does seems pretty sensitive to case-by-case variation.

It would be a good place, absolutely. A better place? zod. Or maybe a companion library? Obviously, your mileage may vary, but to me, without this concept of being able to coerce string values without effort, because I always end up writing boilerplate code, zod comes in second when deciding which validation library to use for most projects.

Totally fair! If Zod isn't the style of runtime type system that works for you, I think it's totally fine to say something like yup is a better fit. As it is, I appreciate that Zod is very close to TypeScript and doesn't come with any (many?) assumptions about your use cases so you can adapt it to your domain. Which brings me to the point you make here:

Doesn't quite fit my needs (i.e., the semantics of the lenient() approach), though:
But I guess this kinda further proves the point I am trying to raise.

On the contrary, I think that points to the point I'm trying to make: For us, we want to be absolutely certain that if something is a numeric string, it's a numeric string, not a date or null or something else that can be coercible to a number. And parsing empty strings as 0 is precisely what coercion should do, in my opinion, but as you've pointed out, you feel differently (it should return an error?). So, if we bring your (or my) opinion into core about how to convert between types, we risk increasing our maintenance burden, introducing more complexity (bugs), and still only properly serving a portion of users.

I think this is why I feel like building a library (or having an internal module/library) is really the best way forward here: It allows Zod to focus on just representing and narrowing TypeScript types at runtime while providing affordances to do transformations/refinements/coercions as needed. This approach seems to be successful for io-ts which similarly has a "core" and libraries that you can opt into depending on your needs. Superstruct has taken a similar core vs library approach as well. I absolutely support the idea of an ecosystem building up around Zod that supports these use cases in opinionated and ergonomic ways that align to the values and opinions of those library maintainers.


As a separate note, here is another wrinkle in the proposed solution:

And think of all of the other types beyond the primitives. What is a lenient(intersection)? Or do we need to distinguish between primitives and complex types?

@nfantone
Author

nfantone commented Dec 1, 2021

Ok, so I'm 100% behind everything you are saying here, in principle 👍🏼.

Except (there's always a "but"), I have a small issue with the implicit implication that the original use case I provided for something like lenient() is "opinionated" and/or can be dismissed as being "sensitive to case-by-case variation". I think it's fair to say that no library will cover every use case. That's a given. But I frankly can't remember the last time I had to work on a node web service and did not need to parse/validate string data. It's so quintessential to web development that, to me, it feels like a glaring omission on zod's part. Whether this logic should be housed within zod core or not, is not really the point, I believe. The argument is more centered around developer experience.

Since you brought up the topic of libraries, after seeing what other libraries closer to zod exist in the ecosystem, I think it's safe to say that a very important function of zod is to provide safe typings and parsing to web APIs (tRPC, json-schema-to-zod, etc.). With that in mind, I can't help but wonder: why is it not straightforward to express the type for something like GET /users/:id, with id being a number, with a zod schema?

And think of all of the other types beyond the primitives. What is a lenient(intersection)? Or do we need to distinguish between primitives and complex types?

I admit I didn't go deep into the implementation details of my proposal. But I suppose that lenient() should provide a "best effort" approach at coercing your input value (think JSON.parse). The next parser in the pipeline shouldn't really matter.

@scotttrinh
Collaborator

I have a small issue with the implicit implication that the original use case I provided for something like lenient() is "opinionated" and/or can be dismissed as being "sensitive to case-by-case variation". I think it's fair to say that no library will cover every use case. That's a given. But I frankly can't remember the last time I had to work on a node web service and did not need to parse/validate string data.

Right, but as your example pointed out, you think the parser should accept numbers also, and if the string is empty it should throw an error. I think that's perfectly valid, but that's not at all how I would want something similar to act. I think that's what I'm trying to say when I say that I think each developer (or team) needs to make decisions about how, when, and in what way serialized data is converted, and that providing functionality that picks a way is necessarily opinionated. I don't mean to be dismissive: I think your approach is a good one that makes sense for some use cases!

The argument is more centered around developer experience.

I 100% agree, and a lot of people have brought up other such use cases: especially forms. In each case, you might want to make different decisions about how to cast. For another example, pg converts some "serialized" data already for you, but leaves some types as strings since they can be round-trip lossy without BigInt. Making it easy to write the layer on top of Zod that is appropriate for each team and use case is absolutely a part of what I see as Zod's responsibility (preprocess, transform, refine, etc). Providing implementations for each use case is something I wouldn't want to see Zod take on, and as the link you posted in your first comment attests to, I don't believe @colinhacks wants there to be multiple ways to transform data for certain cases like number -> string, etc.

Since you brought up the topic of libraries, after seeing what other libraries closer to zod exist in the ecosystem, I think it's safe to say that a very important function of zod is to provide safe typings and parsing to web APIs (tRPC, json-schema-to-zod, etc.). With that in mind, I can't help but wonder: why is it not straightforward to express the type for something like GET /users/:id, with id being a number, with a zod schema?

From my perspective, the schema for that (z.string().transform(Number)) is perfectly straightforward. And if you're writing a library like tRPC maybe you have some more robust schemas that check for NaN and undefined and return some appropriate error, but I think that kind of logic belongs in that library rather than in Zod. I think it makes sense that Zod treats types in a similar way to TypeScript (TypeScript treats that "type" as string also) while giving the affordances for transformation such that the input type and the output type might be different.


I don't mean to come across as confrontational, and I very much appreciate your perspective and thoughtful answers and suggestions here. My hope is that users with the right vantage point based on their expertise and opinions can provide the layer that you feel we're missing, and I very much agree with you that the ecosystem is missing these sorts of developer-friendly and use-case specific libraries. I am also frustrated that I have to write these transforms by hand, but even if we provided your specific solution, I would still write them by hand since they do not align with my team's specific viewpoint on the proper way to specify these schemas for the multitude of use cases we have (json, query strings, form data, database data, etc.). I hope my comments help to situate my opinion (and that's all this is: my opinion!) about the direction I'd like to see Zod take and don't dissuade you from continuing to advocate for your own perspective.

@nfantone
Author

nfantone commented Dec 2, 2021

You're not coming across as confrontational - quite the contrary. Don't stress about it! And many thanks for taking the time to reply thoughtfully.

And again, I do agree with your points, even if we don't see things exactly the same colour. Perhaps I'm expecting things from zod that it's just not meant to be providing. That's on me and it's absolutely fine.

The one comment I would like to address is:

From my perspective, the schema for that (z.string().transform(Number)) is perfectly straightforward.

I would like to challenge that. IMHO, there are (several) issues with your suggestion. These are the ones I can think of off the top of my head.

  • Unlike z.number(), produces NaN for most inputs.
  • Unlike z.number(), produces unexpected results for some inputs (e.g., z.string().transform(Number).parse(' ') // 0) **.
  • The error messages arising from failing to parse that are completely useless/misleading.
  • Can't chain extra number validators, such as gt, int, etc.
  • ...and more importantly, semantics and documentation. yup.number() explicitly states intention and unequivocally conveys information about the expected input type. z.string().transform(Number) doesn't.

So, no - sadly, I don't think it's straightforward in zod (or useful, at all, in my perspective, for that matter). Like I mentioned before: yes - you can work around (some) of these limitations, currently. The natural question is: why?


** I appreciate your comments above on how this is "my own use case" and it points to zod covering "team needs". I get it. But I assure you, most teams in the world (I don't feel confident about saying "all", but it should be pretty darn close) would not expect curl 'https://my.company.io/api/users/%20%20%20' to gracefully and willingly fetch user 0. Making this the default behaviour makes little sense.

@RichiCoder1

I ran into this for a rather weird use case where we're currently using a system that "stringifies" all the properties passed in. So we get a correctly shaped object, but all the booleans/numbers/etc... are stringified. It's another edge case I'll admit, but it is a case.

Of the above the most "significant" issue personally is the inability to chain. So even if I wanted to write my own "type" I end up having to go through contortions to make it look like a normal type.

@nfantone
Author

So, if we bring your (or my) opinion into core about how to convert between types, we risk increasing our maintenance burden, introducing more complexity (bugs), and still only properly serving a portion of users.

This is true for every design decision on every project, ever. Any and all additions to an existing stack bear forward a certain opinion, dismissing (voluntarily or not) others. I frankly don't see this being an argument against implementing new stuff.

Also, I'm not convinced that the fact that there might be "different opinions" on how to convert data, prevents us from giving users the option. There are also different opinions on how to parse functions, objects and every other type out there and still zod provides ways to handle those. Like any other library on npm, zod is already very opinionated.

const z = require('zod')

typeof null // 'object'
z.object().parse(null) // 'Expected object, received null' <--- Opinion 👀 

@markandrus

I'd like to share what I've been experimenting with while working on a project to convert the outputs of openapi-typescript to Zod in order to implement a strongly typed REST API server.

Parsing JSON bodies is straightforward. But, as already mentioned in the thread, parsing path, query, and header parameters is trickier because, although the Open API spec and Zod schemas may specify a parameter to be boolean-valued, numeric, etc., these parameters always arrive as strings.

I've taken the approach of preprocessing my Zod schemas (ZodObjects) for handling these parameters. I do this by preprocessing each ZodObject property:

  1. Detect whether the property could be boolean-valued or numeric. This requires recursing through ZodLazy and ZodUnion, looking for instances of ZodBoolean and ZodNumber. (I may need to handle other types, too, but I haven't gotten there yet.)
  2. Wrap the property's schema in z.preprocess:
    1. If the property could be boolean-valued, arrives as a string, and its trimmed lowercase representation equals "true" or "false", return the corresponding boolean.
    2. If the property could be numeric, arrives as a non-empty string, and the result of parsing it as a Number is neither NaN nor Infinity, return the number.
    3. Otherwise, return the unparsed input.
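
The coercion in step 2 can be sketched in plain JavaScript roughly as follows. This is an illustration of the rules as described, not the actual implementation: `coerceParam` is a hypothetical name, and the `allowBoolean` / `allowNumber` flags stand in for the result of step 1 (walking the schema), which is not shown. In the approach described above, a function like this would be what gets passed to z.preprocess.

```javascript
// Hypothetical helper following the three sub-steps above.
// `allowBoolean` / `allowNumber` are assumed to come from inspecting the schema.
function coerceParam(value, { allowBoolean = false, allowNumber = false } = {}) {
  // Only strings are candidates for coercion; anything else passes through.
  if (typeof value !== 'string') return value;

  // 2.1: booleans take precedence; compare the trimmed, lowercased text.
  if (allowBoolean) {
    const lowered = value.trim().toLowerCase();
    if (lowered === 'true') return true;
    if (lowered === 'false') return false;
  }

  // 2.2: non-empty strings that parse to a finite number become numbers.
  if (allowNumber && value !== '') {
    const n = Number(value);
    if (Number.isFinite(n)) return n; // excludes NaN and +/-Infinity
  }

  // 2.3: otherwise hand the unparsed input to the wrapped schema.
  return value;
}

console.log(coerceParam(' TRUE ', { allowBoolean: true })); // true
console.log(coerceParam('42', { allowNumber: true }));      // 42
console.log(coerceParam('abc', { allowNumber: true }));     // abc
console.log(coerceParam('', { allowNumber: true }) === ''); // true
```

Note that a whitespace-only string is non-empty, so under rule 2.2 as written it would become 0; trimming before the emptiness check is one of the decisions each use case has to make for itself.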

I give precedence to boolean values and then numeric values before falling back to whatever the ZodSchema is looking for (which hopefully can be parsed from a string). Although it's a bit annoying to do this, I am forced to take some decisions that may not be applicable to other use cases. I think I generally agree with @scotttrinh's comment here: #804 (comment)

@stale
stale bot commented Apr 28, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 28, 2022
@stale stale bot closed this as completed May 5, 2022
@alavkx
alavkx commented Jul 28, 2022

FWIW I'm encountering similar struggles when attempting to work with number inputs, using zod as a react-hook-form resolver. It is.....pretty challenging to figure out.
https://codesandbox.io/s/stupefied-moser-0fpq94?file=/src/App.tsx

Given...

  • a strongly typed endpoint
  • a matching zod schema (consider tRPC)
  • a form design for HTTP PATCH; a partial update
  • the need to represent NO CHANGE as an empty input
  • HTML's native behavior to represent EMPTY as empty string ('')

How do you represent a number input?

@ryanhaticus
Copy link

Please see coercion: https://github.com/colinhacks/zod#coercion-for-primitives

7 participants