Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse paths and identifiers only once #840

Merged
merged 2 commits into from
Aug 2, 2023
Merged

Parse paths and identifiers only once #840

merged 2 commits into from
Aug 2, 2023

Conversation

Kijewski
Copy link
Contributor

In the old implementation each variable in an expression would be parsed up to three times:

  • Try to parse a path because it contains a leading double colon, or infix double colons.
  • Try to parse it as path again by scanning for an identifier that contains an upper case character.
  • Fall back to scanning for any identifier.

This PR turns all three steps into one, without the need for backtracking.

Copy link
Owner

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Did you benchmark the effect?


// If `identifier()` fails then just return the original error
Err(err)
(None, name, []) if !name.contains(char::is_uppercase) => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What tests break if we leave out this arm? Maybe move the comment to be directly before this arm, which seems to be the main thing that's affected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This arm tells variable names (no upper case characters) and struct names (at least one upper case character) apart. I guess every test will fail without it. :)

askama_derive/src/parser/expr.rs Outdated Show resolved Hide resolved
askama_derive/src/parser/expr.rs Outdated Show resolved Hide resolved
@Kijewski
Copy link
Contributor Author

Kijewski commented Jul 21, 2023

As benchmark I used cd testing; find -name '*.rs' -exec touch '{}' '+'; time cargo b --tests (best out of three runs):

       real     user    sys
old  1.623s  10.442s  2.235s
new  1.473s  10.483s  2.272s

I cannot say why the real time got so much better, if the user time got worse? All CPU cores only have around 33°C, so they should all be utilized in my test without thermal throttling. I guess I should try to find a better benchmark, but I guess without #834 we cannot properly benchmark the parser. :-/ I'll try to review your PR this weekend!

@djc
Copy link
Owner

djc commented Aug 2, 2023

(Feel free to just merge this after rebasing.)

Kijewski and others added 2 commits August 2, 2023 23:49
In the old implementation each variable in an expression would be parsed
up to three times:

* Try to parse a path because it contains a leading double colon, or
  infix double colons.
* Try to parse it as path again by scanning for an identifier that
  contains an upper case character.
* Fall back to scanning for any identifier.

This PR turns all three steps into one, without the need for
backtracking.
@Kijewski
Copy link
Contributor Author

Kijewski commented Aug 2, 2023

Not a huge difference, but it's a little better:

100 bytes               time:   [11.475 µs 11.594 µs 11.694 µs]
                        change: [-6.8066% -4.5471% -2.2145%] (p = 0.00 < 0.05)
                        Performance has improved.

1000 bytes              time:   [62.269 µs 62.421 µs 62.555 µs]
                        change: [-4.8183% -4.1854% -3.5724%] (p = 0.00 < 0.05)
                        Performance has improved.

10000 bytes             time:   [536.05 µs 536.83 µs 537.69 µs]
                        change: [-3.2005% -2.9935% -2.7835%] (p = 0.00 < 0.05)
                        Performance has improved.

100000 bytes            time:   [4.9944 ms 4.9996 ms 5.0053 ms]
                        change: [-2.8939% -2.7114% -2.5330%] (p = 0.00 < 0.05)
                        Performance has improved.

@Kijewski Kijewski merged commit d4fbad1 into djc:main Aug 2, 2023
18 checks passed
@Kijewski Kijewski deleted the pr-parse-var-once branch August 2, 2023 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants