Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reserve capacity before extending Punctuated #1471

Closed
wants to merge 9 commits into from

Conversation

mo8it
Copy link

@mo8it mo8it commented Jun 23, 2023

While extending Punctuated from an iterator, use size_hint to reserve capacity before pushing to avoid repeated allocations.

I did also avoid unneeded allocations of a new Box.

There are also small optimizations like removing the check of nomore in the while loop and only checking if we did receive an End.

I did mention in a TODO comment that the behavior of extend from an iterator of pairs is not consistent with the behavior of extend from an iterator of items. You can also see it in the added test. I would recommend not adding a punctuation if the iterator is empty, but since that could be a braking change, I did not do it.

This is my first contribution and I did not find a CONTRIBUTING file. Therefore, please tell me if something is odd :)

Sorry for the typo in the branch name :/

@mo8it mo8it marked this pull request as draft June 24, 2023 13:37
@mo8it mo8it marked this pull request as ready for review June 24, 2023 17:32
@mo8it
Copy link
Author

mo8it commented Jun 24, 2023

I have done the following benchmark:

cargo b -r && hyperfine -N "BINARY":

for _ in 0..10_000 {
    std::hint::black_box(Punctuated::<u64, syn::token::Comma>::from_iter(0..4096));
    std::hint::black_box(Punctuated::<u64, syn::token::Comma>::from_iter(
        (0..4096).map(|i| syn::punctuated::Pair::Punctuated(i, Default::default())),
    ));
}

Results:

master branch:

Time (mean ± σ):     630.4 ms ±   9.4 ms    [User: 628.7 ms, System: 0.7 ms]
Range (min … max):   617.5 ms … 642.1 ms    10 runs

This branch:

Time (mean ± σ):     257.0 ms ±   4.3 ms    [User: 256.0 ms, System: 0.5 ms]
Range (min … max):   253.3 ms … 265.7 ms    11 runs

The speedup for this benchmark is about 2.5x :D

I did get the same speedup for more 1_000_000 loops and 0..16 in both iterators.

@mo8it
Copy link
Author

mo8it commented Jun 24, 2023

I think that the assertions in push_punct and push_value should be debug_assert! instead of assert!. People using them are opting in for better performance, otherwise they would just use push.

@mo8it
Copy link
Author

mo8it commented Jun 24, 2023

While waiting for a review, I did just see that I could recycle Box in push too!

Here is a benchmark:

for _ in 0..1_000_000 {
    let mut p: Punctuated<_, syn::token::Comma> = Punctuated::new();
    for i in 0..16 {
        p.push(i);
    }
}

master branch:

Time (mean ± σ):     240.9 ms ±   1.7 ms    [User: 240.0 ms, System: 0.4 ms]
Range (min … max):   239.2 ms … 245.3 ms    12 runs

This branch:

Time (mean ± σ):     124.2 ms ±   2.2 ms    [User: 123.5 ms, System: 0.4 ms]
Range (min … max):   121.1 ms … 127.1 ms    23 runs

Speedup about 1.9x

@mo8it
Copy link
Author

mo8it commented Jun 25, 2023

Actually, why is last boxed in the first place?

@mo8it
Copy link
Author

mo8it commented Jul 24, 2023

@dtolnay Hey, is there something wrong with the PR?

I know that it ended up a bit messy. I could create a new one, but I need some feedback first :)

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty skeptical that the added complexity of all this is worthwhile. Is there a real use case that is sensitive to release-mode Punctuated::extend performance? Could you give detail on what led you to pursue this particular change?

Thanks anyway for the PR!

@mo8it
Copy link
Author

mo8it commented Jul 24, 2023

@dtolnay I did read some of syn's source code while working on typed-builder and did find out that it can be optimized to avoid allocations.

I did just remove some of unsafe to reduce the complexity. I did declare that function unsafe because I wanted to use unwrap_unchecked, but this is not compatible with the current MSRV.

Currently, unsafe is only used once to avoid the comparison of the length and capacity before pushing because we do the allocation. I can remove it if you prefer that and just use Vec::push.

More complexity could be avoided if last is not boxed. Could you please explain to me why it is boxed? Probably to avoid having huge objects in the stack, but is this really the case here?

I think that 2.5x speedup is worth it, but this is only my opinion.

I wanted to add reserve and with_capacity later to further optimize pushing. But that would be another PR.

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absent of a real use case where this change would make a difference, I prefer to keep the original straightforward code.

@dtolnay dtolnay closed this Jul 24, 2023
@arendjr
Copy link

arendjr commented Jul 24, 2023

As a heavy user of typed-builder and also the author of fp-bindgen, both of which depend heavily on syn, I am very much interested in work that speeds up compilation time of syn-using macros.

Not sure how much this PR is going to help on our overall build times of course, but I might give it a shot soon to see if this PR improves build times for us.

@arendjr
Copy link

arendjr commented Jul 25, 2023

I tried recompiling our crates that heavily use both typed-builder and fp-bindgen, both with syn 2.0.20 and with this branch, but I have to admit the results appear to be a wash. Sometimes one was faster, sometimes the other. In any case they were within margin of error from each other.

@dtolnay
Copy link
Owner

dtolnay commented Jul 27, 2023

As you can probably tell, I am not surprised to hear that. A couple dozen memory allocations max per macro call is not a perceptible amount. If I had to guess, you are looking at an effect on the order of 0.0001% (1 microsecond per second spent on compilation). Any positive effect is reversed by 2 orders of magnitude by putting this amount of extra code into syn. The number of extra memory allocations from this new code during compiling syn would be 2 orders of magnitude bigger than it could possibly save downstream.

@mo8it
Copy link
Author

mo8it commented Jul 27, 2023

@dtolnay Alright, thank you a lot for this explanation.

I am still interested in knowing why last is boxed, just out of curiosity.

Would it be fine for you if I would at least add with_capacity and reserve? They don't make the code more complex, just 2 simple additional methods. I am fine with it if you say that you don't want it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants