feat(jmespath): add lexer component #2214

Merged
3 commits merged on Mar 19, 2024


@dreamorosi (Contributor) commented Mar 12, 2024

Description of your changes

This PR includes the implementation of the Lexer component, part of the JMESPath utility.

As discussed in the linked issue, the purpose of a lexer is to break down the input JMESPath expression into smaller meaningful units (tokens). These tokens represent the building blocks of the JMESPath language and are defined in the language grammar (#2192).

The Lexer's main method (public *tokenize()) is implemented as a generator. This pattern allows the lexer to walk the expression iteratively and yield (i.e., hand back to the caller) one token at a time as it goes. While not a direct equivalent, the closest pattern to describe this implementation would be a recursive function that maintains an external state (similar to a reducer).
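The generator pattern, and its resemblance to a recursive function that threads state from call to call, can be illustrated with a deliberately tiny example. This is a sketch under assumptions, not the PR's code: the CharToken type and both function names are hypothetical, and the "tokens" here are simply the non-space characters of the input. Both versions produce the same tokens; the generator keeps its position internally between next() calls, while the recursive version passes that state explicitly.

```typescript
// Hypothetical, minimal token: one non-space character and its index.
type CharToken = { value: string; start: number };

// Generator version: `position` lives inside the generator and survives
// between calls to next(), so tokens are produced lazily, one at a time.
function* tokenizeChars(expression: string): Generator<CharToken> {
  let position = 0;
  while (position < expression.length) {
    const ch = expression.charAt(position);
    if (ch !== ' ') {
      yield { value: ch, start: position };
    }
    position++;
  }
}

// Recursive version: the same walk, but the state (position and tokens
// collected so far) is passed explicitly to each call, reducer-style.
function tokenizeCharsRecursive(
  expression: string,
  position = 0,
  acc: CharToken[] = []
): CharToken[] {
  if (position >= expression.length) return acc;
  const ch = expression.charAt(position);
  const next = ch !== ' ' ? [...acc, { value: ch, start: position }] : acc;
  return tokenizeCharsRecursive(expression, position + 1, next);
}

// The caller pulls tokens on demand from the generator.
const lazy = tokenizeChars(' a.b');
console.log(lazy.next().value); // { value: 'a', start: 1 }
```

Because the generator is lazy, a consumer such as a parser can pull one token, act on it, and only then request the next one, without the whole token list being built up front.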

At each step, the lexer inspects the current character and, depending on its type (whitespace, identifier character, simple token, and so on), performs the corresponding action.
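As a sketch of this per-character decision, the character that starts the next token can be mapped to a class that determines the lexer's action. The class names, the classify function, and the overall structure are hypothetical, not taken from the PR. The simple-token names mirror those of the reference Python jmespath lexer, which is an assumption about naming; the whitespace and identifier-start rules follow the JMESPath grammar.

```typescript
// Hypothetical classification of the character at the current position.
type CharClass = 'whitespace' | 'identifierStart' | 'simple' | 'unknown';

// Single-character ("simple") tokens. Names mirror the reference Python
// jmespath lexer; treat them as an assumption, not this PR's identifiers.
const SIMPLE_TOKENS = new Map<string, string>([
  ['.', 'dot'],
  ['*', 'star'],
  [']', 'rbracket'],
  [',', 'comma'],
  [':', 'colon'],
  ['@', 'current'],
  ['(', 'lparen'],
  [')', 'rparen'],
  ['{', 'lbrace'],
  ['}', 'rbrace'],
]);

// Whitespace allowed by the JMESPath grammar: space, tab, LF, CR.
const WHITESPACE = new Set([' ', '\t', '\n', '\r']);

// Unquoted identifiers start with an ASCII letter or an underscore.
const isIdentifierStart = (ch: string): boolean => /^[A-Za-z_]$/.test(ch);

// Decide which action applies to the character that starts the next token.
function classify(ch: string): CharClass {
  if (WHITESPACE.has(ch)) return 'whitespace'; // skip it
  if (isIdentifierStart(ch)) return 'identifierStart'; // scan a full identifier
  if (SIMPLE_TOKENS.has(ch)) return 'simple'; // emit a one-character token
  return 'unknown'; // digits, quotes, comparison operators, etc.
}
```

For example, classify(' ') returns 'whitespace', classify('f') returns 'identifierStart', and classify('.') returns 'simple'. Characters classified as 'unknown' here (digits, quotes, comparison operators, and so on) would be routed to further branches in a complete lexer.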

To illustrate how the lexer works, let's use the expression " foo.bar" as an example (the single leading space is intentional).

The lexer starts at position 0 and examines the characters in the order they appear:

  • position: 0 - Since the first character is whitespace, the lexer advances without emitting a token (source).
  • position: 1 - Next, the lexer encounters an identifier character (f). At this point it needs to determine how long the identifier is, so it advances the position until it finds a non-identifier character, i.e. anything other than a letter, digit, or underscore (source). In this example it advances to position 4 and emits foo as a single token.
  • position: 4 - The next character is a dot (.), which the lexer treats as a simple token: a single character that maps directly to a token, so the lexer emits it and moves on to the next position.
  • position: 5 - Next, the lexer encounters another identifier character (b), so, as at position 1, it advances until it finds a non-identifier character. In this case it reaches the end of the expression instead, and bar becomes the last token.
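The walkthrough above can be sketched as a small generator that handles only the three cases it covers: whitespace, unquoted identifiers, and the dot. The Token interface, the token type names, and the [start, end) offset convention are assumptions made for this illustration, not the PR's actual types; any other character simply raises an error.

```typescript
// Hypothetical token shape: type, text, and [start, end) character offsets.
interface Token {
  type: 'unquoted_identifier' | 'dot';
  value: string;
  start: number;
  end: number;
}

const isIdentifierStart = (ch: string): boolean => /^[A-Za-z_]$/.test(ch);
const isIdentifierChar = (ch: string): boolean => /^[A-Za-z0-9_]$/.test(ch);

function* tokenize(expression: string): Generator<Token> {
  let position = 0;
  while (position < expression.length) {
    const ch = expression.charAt(position);
    if (ch === ' ') {
      // Whitespace: advance with no further action.
      position++;
    } else if (isIdentifierStart(ch)) {
      // Identifier: advance until a non-identifier character or the end.
      // charAt() returns '' past the end, which fails isIdentifierChar.
      const start = position;
      while (isIdentifierChar(expression.charAt(position))) {
        position++;
      }
      yield {
        type: 'unquoted_identifier',
        value: expression.slice(start, position),
        start,
        end: position,
      };
    } else if (ch === '.') {
      // Simple token: emit it and advance by exactly one character.
      yield { type: 'dot', value: ch, start: position, end: position + 1 };
      position++;
    } else {
      throw new Error(`Unexpected character '${ch}' at position ${position}`);
    }
  }
}

for (const token of tokenize(' foo.bar')) {
  console.log(token);
}
```

Running it on " foo.bar" yields foo (positions 1 to 4), the dot at position 4, and bar (positions 5 to 8), matching the steps in the list.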

This is a relatively simple example, but hopefully it helps clarify the flow of the processing. Simple tokens are handled inline in the public *tokenize() method, while tokens whose processing requires more involved logic are handled by dedicated methods.
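The split between inline handling and dedicated methods might look roughly like the following sketch. Everything here is hypothetical: the SketchLexer class, the private helper names, the token names, and the choice of numbers as an example of a token that needs a dedicated method. In particular, [ is treated as a simple token only for brevity; a complete JMESPath lexer also has to recognize multi-character forms such as [] and [?.

```typescript
// Hypothetical token shape used only by this sketch.
interface Token {
  type: string;
  value: string | number;
  start: number;
  end: number;
}

// One-character tokens handled inline in tokenize(). '[' is simplified here;
// a full lexer must also recognize forms such as '[]' and '[?'.
const SIMPLE_TOKENS = new Map<string, string>([
  ['.', 'dot'],
  ['[', 'lbracket'],
  [']', 'rbracket'],
  [',', 'comma'],
]);

class SketchLexer {
  private expression = '';
  private position = 0;

  public *tokenize(expression: string): Generator<Token> {
    this.expression = expression;
    this.position = 0;
    while (this.position < expression.length) {
      const ch = expression.charAt(this.position);
      const simpleType = SIMPLE_TOKENS.get(ch);
      if (simpleType !== undefined) {
        // Simple token: inlined, since it is always exactly one character.
        yield { type: simpleType, value: ch, start: this.position, end: this.position + 1 };
        this.position++;
      } else if (ch === ' ') {
        this.position++;
      } else if (/^[A-Za-z_]$/.test(ch)) {
        yield this.consumeIdentifier();
      } else if (ch === '-' || /^[0-9]$/.test(ch)) {
        yield this.consumeNumber();
      } else {
        throw new Error(`Unexpected character '${ch}' at position ${this.position}`);
      }
    }
  }

  // Dedicated method: scan letters, digits, and underscores.
  private consumeIdentifier(): Token {
    const start = this.position;
    while (/^[A-Za-z0-9_]$/.test(this.expression.charAt(this.position))) {
      this.position++;
    }
    const value = this.expression.slice(start, this.position);
    return { type: 'unquoted_identifier', value, start, end: this.position };
  }

  // Dedicated method: an optional leading '-' followed by one or more digits.
  private consumeNumber(): Token {
    const start = this.position;
    if (this.expression.charAt(this.position) === '-') {
      this.position++;
    }
    const digitsStart = this.position;
    while (/^[0-9]$/.test(this.expression.charAt(this.position))) {
      this.position++;
    }
    if (this.position === digitsStart) {
      throw new Error(`Expected a digit at position ${this.position}`);
    }
    const text = this.expression.slice(start, this.position);
    return { type: 'number', value: Number.parseInt(text, 10), start, end: this.position };
  }
}
```

For example, tokenizing foo[-1] with this sketch yields an identifier, lbracket, a number token with value -1, and rbracket. Keeping the multi-character cases in their own methods keeps the main loop short: each branch either emits a one-character token directly or delegates to a method that advances the position past the whole token.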

Related issues, RFCs

Issue number: #2205

Checklist

  • My changes meet the tenets criteria
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in areas that should be flagged with a TODO, or hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my change is effective and works
  • The PR title follows the conventional commit semantics

Breaking change checklist

Is it a breaking change?: NO

  • I have documented the migration process
  • I have added, implemented necessary warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@dreamorosi dreamorosi self-assigned this Mar 12, 2024
@dreamorosi dreamorosi requested review from a team as code owners March 12, 2024 11:44
@pull-request-size pull-request-size bot added the size/L PRs between 100-499 LOC label Mar 12, 2024
@github-actions github-actions bot added the feature PRs that introduce new features or minor changes label Mar 12, 2024
@dreamorosi dreamorosi linked an issue Mar 12, 2024 that may be closed by this pull request
@dreamorosi dreamorosi marked this pull request as draft March 12, 2024 16:27
@dreamorosi dreamorosi marked this pull request as ready for review March 12, 2024 17:13

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@am29d am29d merged commit 006ebcf into main Mar 19, 2024
12 checks passed
@am29d am29d deleted the feat/jmespath_lexer branch March 19, 2024 08:14
Linked issue: Feature request: JMESPath lexer