Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(payload-schemas): add custom RegEx validation for 'email' #677

Closed

Conversation

oscard0m
Copy link
Member

@oscard0m oscard0m commented Jul 20, 2022

Description

  • Added custom validation format for emails in ajv
  • Added a new schema example to validate emails with special characters are validated
  • Regenerate index.json

Context

Current JSON Schema validation for emails does not support emails with this format.

EDIT:

- Current JSON Schema validation for `emails` does not support emails with this format.
+ emails aren't compliant with the RFC, not because of anything related to the JSON schema itself.
+ RFC 5322 Section 3.4 defines valid email addresses here: https://datatracker.ietf.org/doc/html/rfc5322#section-3.4

Example of not supported format:
41898282+github-actions[bot]@users.noreply.github.com

Fixes #453

@oscard0m
Copy link
Member Author

@timrogers @wolfy1339 before fixing the Regex I references in a code comment, I wanted to validate with you this solution proposal.

What do you think?

@oscard0m oscard0m added the Type: Bug Something isn't working as documented label Jul 20, 2022
@oscard0m oscard0m requested review from wolfy1339 and timrogers July 20, 2022 16:01
@wolfy1339 wolfy1339 added Type: Maintenance Any dependency, housekeeping, and clean up Issue or PR and removed Type: Bug Something isn't working as documented labels Jul 20, 2022
@wolfy1339
Copy link
Member

wolfy1339 commented Jul 20, 2022

You don't need to specify it that way, you can simply pass in a regex for validation directly in the schema itself using the pattern property

https://json-schema.org/understanding-json-schema/reference/string.html#pattern

@wolfy1339
Copy link
Member

The current JSON Schema specification does not cover e-mails with this format.

It's because the emails aren't compliant with the RFC, not because of anything related to the JSON schema itself.

RFC 5322 Section 3.4 defines valid email addresses

I just want that to be obvious, so we are all on the same page and don't blame the wrong people.


Also, don't forget to re-generate the index.json file

@timrogers
Copy link
Contributor

It's because the emails aren't compliant with the RFC, not because of anything related to the JSON schema itself.

RFC 5322 Section 3.4 defines valid email addresses

I agree on this. Personally, I think we should just remove format: "email" and say it is a string, rather than changing how emails are validated here.

From the JSON Schema standard's perspective, it is a string as it doesn't conform to the email RFC.

@wolfy1339
Copy link
Member

Here's a modified version of AJV's regex for email validation that works with the emails we have

^[a-z0-9!#$%&'*+/=?^_`{|}~\-\[\]]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~\-\[\]]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$

@oscard0m
Copy link
Member Author

oscard0m commented Jul 20, 2022

Here's a modified version of AJV's regex for email validation that works with the emails we have

^[a-z0-9!#$%&'*+/=?^_`{|}~\-\[\]]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~\-\[\]]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$

Could you guide me on which part of the code are we supposed to add this pattern property? I'm a bit lost right now. @wolfy1339

I agree on this. Personally, I think we should just remove format: "email" and say it is a string, rather than changing how emails are validated here.

From the JSON Schema standard's perspective, it is a string as it doesn't conform to the email RFC.

@timrogers wolfy already got a regex to cover our use-case, I would prefer to use that so we don't allow non-valid e-mails and add extend the regex to cover our use-case. Any downsides of using it?

@wolfy1339
Copy link
Member

wolfy1339 commented Jul 20, 2022

Could you guide me on which part of the code are we supposed to add this pattern property? I'm a bit lost right now.
@wolfy1339

"oneOf": [{ "type": "string", "format": "email" }, { "type": "null" }]

Instead of having "format": "email", you would use pattern with the regex
https://json-schema.org/understanding-json-schema/reference/string.html#pattern

@timrogers
Copy link
Contributor

timrogers commented Jul 20, 2022

@timrogers wolfy already got a regex to cover our use-case, I would prefer to use that so we don't allow non-valid e-mails and add extend the regex to cover our use-case. Any downsides of using it?

The downside, in my opinion, is that people might be using these schemas in another language with another JSON Schema validation tool.

If someone uses another tool, it won't know that we have "redefined" format: "email" to use our own custom regex, so the other tool will consider webhooks to be invalid according to the schema.

@wolfy1339
Copy link
Member

The downside, in my opinion, is that people might be using these schemas in another language with another JSON Schema validation tool.

If someone uses another tool, it won't know that we have "redefined" format: "email" to use our own custom regex, so the other tool will consider webhooks to be invalid according to the schema.

Totally agree.

Which is why I suggested the pattern key directly in the schema

@timrogers
Copy link
Contributor

Totally agree.

Which is why I suggested the pattern key directly in the schema

Ah, makes sense - I missed "in the schema" in your comment above. I think that would be the best way of doing it: don't use the email format and define our own pattern instead.

@oscard0m oscard0m force-pushed the add-custom-regex-validation-for-emails-in-ajv branch from 67a26a0 to 401c14c Compare July 20, 2022 19:59
@oscard0m
Copy link
Member Author

oscard0m commented Jul 20, 2022

I had to double escape some characters (due to JSON but the RegExp is not matching after escaping). Any clue on what I'm doing wrong @wolfy1339 ?


Should we migrate the rest of schemas with email to this pattern?

@oscard0m oscard0m force-pushed the add-custom-regex-validation-for-emails-in-ajv branch from 401c14c to 561fb88 Compare July 20, 2022 20:15
@wolfy1339
Copy link
Member

I had to double escape some characters (due to JSON but the RegExp is not matching after escaping). Any clue on what I'm doing wrong @wolfy1339 ?

The double escapes are changing the meaning of the RegExp. The backslashes are escapes themselves for characters in the RegExp, and by escaping them, you're changing their meaning to be a backslash.

That's all I can think of

@G-Rath
Copy link
Member

G-Rath commented Jul 20, 2022

Honestly I don't think we should be messing with a pattern regexp like that, as it's a good way to cause more trouble (the size of it alone is huge even before you think about all the subpatterns).

If that field is meant to hold valid emails then they should be valid according to the spec; otherwise they are a string and should not be validated further than that. The third option I'd be happy with is matching whatever validation that git itself uses which I would hope is either nothing or RFC 5322 but I wouldn't be surprised if it wasn't as email validation is notorious for being done incorrectly by everyone...

@oscard0m
Copy link
Member Author

I had to double escape some characters (due to JSON but the RegExp is not matching after escaping). Any clue on what I'm doing wrong @wolfy1339 ?

The double escapes are changing the meaning of the RegExp. The backslashes are escapes themselves for characters in the RegExp, and by escaping them, you're changing their meaning to be a backslash.

That's all I can think of

This is the issue I'm getting, invalid JSON basically:
image


Honestly I don't think we should be messing with a pattern regexp like that, as it's a good way to cause more trouble (the size of it alone is huge even before you think about all the subpatterns).

If that field is meant to hold valid emails then they should be valid according to the spec; otherwise they are a string and should not be validated further than that. The third option I'd be happy with is matching whatever validation that git itself uses which I would hope is either nothing or RFC 5322 but I wouldn't be surprised if it wasn't as email validation is notorious for being done incorrectly by everyone...

I can agree the RegEx is very long and hard to follow but basically, we are extending the one we were already using with ajv-format.

Your point of not following a standard is interesting but at least there are some basic patterns of an email address we should be covering, right?

@G-Rath
Copy link
Member

G-Rath commented Jul 20, 2022

Your point of not following a standard is interesting but at least there are some basic patterns of an email address we should be covering, right?

My point is we should be matching whatever is using that field because that's what actually matters or otherwise should not be validating it at all - anything else means that we are being incorrect.

Sometimes that is fine because the fix can be straightforward, but in this case it's a honking big RegExp that we've sourced from somewhere else i.e. consider if someone else comes along and presents another email that is considered invalid by this pattern but works with git - we'll need to safely modify the pattern again to cover that in addition to what we currently consider valid.

(also consider the time you've already spent on trying to get this working)

@oscard0m
Copy link
Member Author

oscard0m commented Jul 20, 2022

I see your point @G-Rath. It's very valid IMO.

👍🏽 thumbs up for just removing the format for email: #678

🚀 for using our own RegEx pattern: #677


@wolfy1339 @G-Rath @timrogers @gr2m @nickfloyd

@wolfy1339
Copy link
Member

If we are to remove it, we should also document it somewhere

@oscard0m
Copy link
Member Author

Created associated PR here already in case we end up deciding on this solution: #678

I am still checking why this Regex is not working as a personal challenge 🤓 and pushing the working solution.

@oscard0m oscard0m force-pushed the add-custom-regex-validation-for-emails-in-ajv branch from 561fb88 to 623d3bc Compare July 21, 2022 08:12
@oscard0m oscard0m force-pushed the add-custom-regex-validation-for-emails-in-ajv branch from 623d3bc to 3cb527b Compare July 21, 2022 08:37
@oscard0m oscard0m marked this pull request as ready for review July 21, 2022 08:37
@oscard0m
Copy link
Member Author

Created associated PR here already in case we end up deciding on this solution: #678

I am still checking why this Regex is not working as a personal challenge 🤓 and pushing the working solution.

Found the issue, we were missing the case insensitive flag. JSON Schemas (from what I found) do not support flags like \i for RegEx so I had to emulate that by adding A-Za-z everywhere 🤮

@oscard0m oscard0m added Type: Bug Something isn't working as documented and removed Type: Maintenance Any dependency, housekeeping, and clean up Issue or PR labels Jul 21, 2022
@wolfy1339
Copy link
Member

For reasons listed above, i think it's best to leave this issue alone.

As GitHub has published it's own OpenAPI spec for webhooks, efforts will be concentrated on the current community spec and transitioning to the official spec

@wolfy1339 wolfy1339 closed this Oct 26, 2022
@wolfy1339 wolfy1339 deleted the add-custom-regex-validation-for-emails-in-ajv branch October 26, 2022 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Something isn't working as documented
Projects
None yet
Development

Successfully merging this pull request may close these issues.

false-positives when validating webhooks payloads due to email addresses
4 participants