-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster AST Construction #21398
Closed
Closed
Faster AST Construction #21398
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
b634d6c
update the tokenizer to use 32 bit indexes - faster tokenizing and pa…
RetroDev256 1554f24
update Ast.zig to take advantage of having token end indexes avaliable
RetroDev256 3a00772
parse with `appendAssumeCapacity` instead of `assume`, and clear up m…
RetroDev256 2fa362a
fix CI failure, implement helpful suggestions
RetroDev256 b87a3e5
raise calls to ensureUnusedCapacity out of append, and into outer loops
RetroDev256 0002897
fix CI and speed things up a little :)
RetroDev256 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has this changed? Storing only the start offset of the token is a very intentional design choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the tokenSlice function in Ast.zig could avoid running the tokenizer if I stored the end index. Would you like me to revert that change?
That change alone is responsible for the higher memory usage, but also cut off a few percent on the wall time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please. This tradeoff was made intentionally to save memory at the potential expense of a small amount of performance. I say "potential" because tokenization is fast enough that it could be faster to re-tokenize this one then than to get more cache misses from storing this data in memory. See Andrew's talk on DOD for details.
If the performance impact is more significant than expected perhaps we can revisit this. If you'd like to quantify the performance difference this gives on ast-check, that'd be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the benchmark on Sema.zig with the changes to Ast.zig:
Here is the benchmark on Sema.zig without the changes to Ast.zig:
The results honestly make me wonder if anything else I did was just pure noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the difference between these two runs is purely the
pub const TokenList = std.MultiArrayList(Token);
change (alongside the tokenSlice change), without any of the other changes in this PR thrown in.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that based on these findings, I should close this PR if we want the memory savings.