Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib): add punycode encoding functions #672

Merged
merged 9 commits into from
Feb 7, 2024

Conversation

esensar
Copy link
Contributor

@esensar esensar commented Feb 5, 2024

This adds encode_punycode and decode_punycode functions. It also adds tests to confirm parse_url function behavior when it comes to punycode.

Fixes: #659

This adds `encode_punycode` and `decode_punycode` functions.
It also adds tests to confirm `parse_url` function behavior when it comes to punycode.

Fixes: vectordotdev#659
@esensar
Copy link
Contributor Author

esensar commented Feb 5, 2024

This adds punycode related functions. I have also realised that parse_url already handles that, so I have added a few tests to confirm it. The newly added crate for handling punycode was already a transitive dependency of the project (because of the use of url crate, and idna crate which handles punycode is a direct dependency of url and maintained in the same repository https://github.com/servo/rust-url).

@johnhtodd
Copy link

johnhtodd commented Feb 5, 2024

I haven't tested this patch, so forgive me if this lazy request could be answered that way: What happens if a string not containing UTF-8 characters is run through the encode_punycode process? Or a non-punycode string is run through decode_punycode?

If the strings remain un-mangled through both directions if they do not contain UTF-8, then the ability to detect if a string is UTF-8 or not is un-necessary since simple string evaluations can make that determination.

@esensar
Copy link
Contributor Author

esensar commented Feb 5, 2024

Right, I should have added some tests and examples to demonstrate that behavior. To answer your question, it will only encode whatever requires encoding, so for that reason the example (www.café.com) only encodes the café part.

I will add tests and examples with fully ASCII domains to make this more clear.

@johnhtodd
Copy link

johnhtodd commented Feb 5, 2024

Thanks - more test cases are always good. Very happy to see this feature. I suspect it will have a better chance of approval if it also had documentation in the main docs repository as part of the PR.

Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @esensar, this looks good.

Can you also add a VRL test like this that exercises both encode and decode?

@jszwedko
Copy link
Member

jszwedko commented Feb 6, 2024

Thank you @esensar, this looks good.

Can you also add a VRL test like this that exercises both encode and decode?

Additionally, could we add these new functions to the benchmarks in benches/stdlib.rs?

@esensar
Copy link
Contributor Author

esensar commented Feb 7, 2024

Thank you @esensar, this looks good.
Can you also add a VRL test like this that exercises both encode and decode?

Additionally, could we add these new functions to the benchmarks in benches/stdlib.rs?

Yeah, I have added them now.

I have added both cases when there is nothing to encode/decode and with IDN strings, to confirm that cases when no encoding is needed are faster (an order of magnitude faster on my machine).

@esensar
Copy link
Contributor Author

esensar commented Feb 7, 2024

I have added VRL tests as well, demonstrating encoding, decoding and combination with parse_url as well. I realized that I have made the function infallible accidentally, so I have updated that too.

Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the tests!

src/stdlib/decode_punycode.rs Outdated Show resolved Hide resolved
src/stdlib/encode_punycode.rs Outdated Show resolved Hide resolved
@pront pront enabled auto-merge February 7, 2024 14:58
@pront pront added this pull request to the merge queue Feb 7, 2024
Merged via the queue into vectordotdev:main with commit 114bb3c Feb 7, 2024
11 checks passed
@esensar esensar deleted the feature/punycode_parse branch February 7, 2024 15:28
github-merge-queue bot pushed a commit to vectordotdev/vector that referenced this pull request Feb 9, 2024
* docs(vrl): add documentation for punycode encoding functions

Related: vectordotdev/vrl#672

* Allow IDN and punycode in spellchecker

* Change IDN allow entry into lowercase

* chore: expose component test utils (#19826)

* chore(deps): Bump VRL to 0.11.0 (#19827)

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* chore(ci): Bump aws-actions/configure-aws-credentials from 4.0.1 to 4.0.2 (#19823)

chore(ci): Bump aws-actions/configure-aws-credentials

Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.1 to 4.0.2.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](aws-actions/configure-aws-credentials@v4.0.1...v4.0.2)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump the prost group with 1 update (#19830)

Bumps the prost group with 1 update: [prost-reflect](https://github.com/andrewhickman/prost-reflect).


Updates `prost-reflect` from 0.12.0 to 0.13.0
- [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
- [Commits](andrewhickman/prost-reflect@0.12.0...0.13.0)

---
updated-dependencies:
- dependency-name: prost-reflect
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prost
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update punycode encoding to be fallible in docs

* Add failure reasons for punycode encoding

* Fix typo in decode_punycode docs

* Simplify error descriptions for punycode_encoding

* Fix formatting of punycode_encoding cue files

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit to vectordotdev/vector that referenced this pull request Feb 9, 2024
* docs(vrl): add documentation for punycode encoding functions

Related: vectordotdev/vrl#672

* Allow IDN and punycode in spellchecker

* Change IDN allow entry into lowercase

* chore: expose component test utils (#19826)

* chore(deps): Bump VRL to 0.11.0 (#19827)

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* chore(ci): Bump aws-actions/configure-aws-credentials from 4.0.1 to 4.0.2 (#19823)

chore(ci): Bump aws-actions/configure-aws-credentials

Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.1 to 4.0.2.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](aws-actions/configure-aws-credentials@v4.0.1...v4.0.2)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump the prost group with 1 update (#19830)

Bumps the prost group with 1 update: [prost-reflect](https://github.com/andrewhickman/prost-reflect).


Updates `prost-reflect` from 0.12.0 to 0.13.0
- [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
- [Commits](andrewhickman/prost-reflect@0.12.0...0.13.0)

---
updated-dependencies:
- dependency-name: prost-reflect
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prost
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update punycode encoding to be fallible in docs

* Add failure reasons for punycode encoding

* Fix typo in decode_punycode docs

* Simplify error descriptions for punycode_encoding

* Fix formatting of punycode_encoding cue files

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit to vectordotdev/vector that referenced this pull request Feb 9, 2024
* docs(vrl): add documentation for punycode encoding functions

Related: vectordotdev/vrl#672

* Allow IDN and punycode in spellchecker

* Change IDN allow entry into lowercase

* chore: expose component test utils (#19826)

* chore(deps): Bump VRL to 0.11.0 (#19827)

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* chore(ci): Bump aws-actions/configure-aws-credentials from 4.0.1 to 4.0.2 (#19823)

chore(ci): Bump aws-actions/configure-aws-credentials

Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.1 to 4.0.2.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](aws-actions/configure-aws-credentials@v4.0.1...v4.0.2)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump the prost group with 1 update (#19830)

Bumps the prost group with 1 update: [prost-reflect](https://github.com/andrewhickman/prost-reflect).


Updates `prost-reflect` from 0.12.0 to 0.13.0
- [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
- [Commits](andrewhickman/prost-reflect@0.12.0...0.13.0)

---
updated-dependencies:
- dependency-name: prost-reflect
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prost
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update punycode encoding to be fallible in docs

* Add failure reasons for punycode encoding

* Fix typo in decode_punycode docs

* Simplify error descriptions for punycode_encoding

* Fix formatting of punycode_encoding cue files

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
AndrooTheChen pushed a commit to discord/vector that referenced this pull request Sep 23, 2024
…otdev#19794)

* docs(vrl): add documentation for punycode encoding functions

Related: vectordotdev/vrl#672

* Allow IDN and punycode in spellchecker

* Change IDN allow entry into lowercase

* chore: expose component test utils (vectordotdev#19826)

* chore(deps): Bump VRL to 0.11.0 (vectordotdev#19827)

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* chore(ci): Bump aws-actions/configure-aws-credentials from 4.0.1 to 4.0.2 (vectordotdev#19823)

chore(ci): Bump aws-actions/configure-aws-credentials

Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.0.1 to 4.0.2.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](aws-actions/configure-aws-credentials@v4.0.1...v4.0.2)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump the prost group with 1 update (vectordotdev#19830)

Bumps the prost group with 1 update: [prost-reflect](https://github.com/andrewhickman/prost-reflect).


Updates `prost-reflect` from 0.12.0 to 0.13.0
- [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
- [Commits](andrewhickman/prost-reflect@0.12.0...0.13.0)

---
updated-dependencies:
- dependency-name: prost-reflect
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prost
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update punycode encoding to be fallible in docs

* Add failure reasons for punycode encoding

* Fix typo in decode_punycode docs

* Simplify error descriptions for punycode_encoding

* Fix formatting of punycode_encoding cue files

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: detect, parse punycode for DNS IDN objects
4 participants