
Faster AST Construction #21398

Closed · RetroDev256 wants to merge 6 commits from the fastest-ast branch
Conversation

RetroDev256
Copy link
Contributor

@RetroDev256 RetroDev256 commented Sep 13, 2024

  • Faster on the majority of source files
  • Doesn't make the source code too obscure

With alternate changes, I could consistently get ~10% speed improvements on large files, but they had two big downsides: smaller files took a lot longer to parse, and the Parse.zig source code became really hard to maintain.

With this set of tweaks, a typical source file is parsed ~3-5% faster, but very small files seem to be parsed slightly slower. Note that "slightly slower" is a percentage difference: small files are inherently less complicated, so the same percentage difference costs (or gains) far less absolute time on a small file than on a large one, and if you look at the benchmarks, the standard deviation makes it pretty difficult to tell whether something was "slow" or "fast".

This does constitute a breaking change to std.zig.Tokenizer.Token. My reasoning is that the language already limits a file to at most 2^32 bytes, so the standard library should not bother letting us tokenize up to 2^64 bytes of source that it could never parse anyway. As part of the breaking change, I got rid of that annoying 'loc' field, which personally I don't think was all that necessary.
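
A rough sketch of the shape of this breaking change (the "before" reflects std.zig.Tokenizer.Token as it stood at the time; the exact "after" field set is an assumption pieced together from the discussion below):

    // Before: offsets are usize and live behind a nested `loc` field.
    pub const Token = struct {
        tag: Tag,
        loc: Loc,

        pub const Loc = struct {
            start: usize,
            end: usize,
        };
    };

    // After (assumed shape): `loc` is gone and offsets shrink to u32,
    // since the language already caps a source file at 2^32 bytes.
    pub const ByteOffset = u32;

    pub const Token = struct {
        tag: Tag,
        start: ByteOffset,
        end: ByteOffset,
    };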

Optimizations in the Zig compiler are of course really enjoyable to try out, but do note that AST construction takes an insignificant amount of time in the grand scheme of things.

Benchmarking ast-check on Sema.zig:

[benchmark screenshot: bench Sema]

Benchmarking ast-check on print_air.zig:

[benchmark screenshot: bench print_air]

Benchmarking ast-check on the zig init main.zig:

[benchmark screenshot: bench zig_init_main]

@Rexicon226 (Contributor) left a comment:

Just a quick glance.

Three review threads on lib/std/zig/Ast.zig (outdated, resolved).
@RetroDev256 (Contributor, Author) commented:

For the recent changes, perf on the zig init main.zig and print_air.zig remains very slightly improved, but now I am consistently seeing this benchmark for Sema.zig:

[benchmark screenshot: Sema.zig]

Quoted diff context in lib/std/zig/Ast.zig:

        tag: Token.Tag,
        start: ByteOffset,
    });
    pub const TokenList = std.MultiArrayList(Token);

A Member commented:

Why has this changed? Storing only the start offset of the token is a very intentional design choice.

@RetroDev256 (Contributor, Author) replied on Sep 13, 2024:

I found that the tokenSlice function in Ast.zig could avoid running the tokenizer if I stored the end index. Would you like me to revert that change?

That change alone is responsible for the higher memory usage, but it also cut a few percent off the wall time.
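
For context, a minimal sketch of the tradeoff being described here, simplified from Ast.tokenSlice; the start+end variant and its `.end` field are hypothetical, mirroring this PR's approach:

    // Start-only layout: tokens with a fixed lexeme are returned directly;
    // everything else is re-tokenized from its start offset to find its end.
    pub fn tokenSlice(tree: Ast, token_index: TokenIndex) []const u8 {
        const start = tree.tokens.items(.start)[token_index];
        const tag = tree.tokens.items(.tag)[token_index];
        if (tag.lexeme()) |lexeme| return lexeme;

        var tokenizer: std.zig.Tokenizer = .{ .buffer = tree.source, .index = start };
        const token = tokenizer.next();
        return tree.source[token.loc.start..token.loc.end];
    }

    // Start+end layout (sketch): no re-tokenization, at the cost of one
    // extra u32 stored per token.
    pub fn tokenSliceStored(tree: Ast, token_index: TokenIndex) []const u8 {
        const start = tree.tokens.items(.start)[token_index];
        const end = tree.tokens.items(.end)[token_index];
        return tree.source[start..end];
    }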

A Member replied:

Yes, please. This tradeoff was made intentionally to save memory at the potential expense of a small amount of performance. I say "potential" because tokenization is fast enough that it could be faster to re-tokenize the one token than to take the extra cache misses from storing this data in memory. See Andrew's talk on DOD for details.

If the performance impact is more significant than expected, perhaps we can revisit this. If you'd like to quantify the performance difference this gives on ast-check, that'd be appreciated.
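
To put a rough number on that tradeoff, a tiny sketch (the token count is made up purely for illustration):

    const std = @import("std");

    test "extra memory from storing an end offset per token" {
        // In a MultiArrayList, each field lives in its own array, so adding
        // an `end: u32` field costs exactly 4 bytes per stored token.
        const token_count: usize = 1_000_000; // hypothetical count for a large file
        const extra_bytes = token_count * @sizeOf(u32);
        // Roughly 4 MB of extra token data competing for cache space.
        try std.testing.expectEqual(@as(usize, 4_000_000), extra_bytes);
    }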

@RetroDev256 (Contributor, Author) replied on Sep 13, 2024:

Here is the benchmark on Sema.zig with the changes to Ast.zig:

[benchmark screenshot]

Here is the benchmark on Sema.zig without the changes to Ast.zig:

[benchmark screenshot]

The results honestly make me wonder if anything else I did was just pure noise.

@RetroDev256 (Contributor, Author) replied on Sep 13, 2024:

[benchmark screenshot]
And the difference between these two runs is purely the pub const TokenList = std.MultiArrayList(Token); change (alongside the tokenSlice change), without any of the other changes in this PR thrown in.

@RetroDev256 (Contributor, Author) replied:

Based on these findings, I feel I should close this PR if we want to keep the memory savings.

@RetroDev256 (Contributor, Author) commented:

In any case, the changes aren't playing well with translate-c/ast.zig, and I don't think this change is significant enough to warrant a ton of code shuffling.

@RetroDev256 deleted the fastest-ast branch on September 18, 2024 at 01:14.