
Speedup ContextInstances if there are null instances #38761

Merged
1 commit merged into quarkusio:main on Feb 20, 2024

Conversation

franz1981
Contributor

This is an alternative to #38737 that needs no cutoff values, always performs fast-path unlocked checks where possible, and allocates the ReentrantLock lazily.

In order to reduce the stack map, it introduces smaller methods to remove/lazily allocate locks, which are reused whenever possible.
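
For illustration only, here is a minimal sketch of the fast-path/lazy-lock pattern described above; the class and method names are hypothetical and do not reflect the actual generated ContextInstances code:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Illustrative sketch: a single slot with an unlocked fast path for reads and a
// ReentrantLock that is allocated lazily on the first slow-path access and reused afterwards.
final class LazyLockedSlot<T> {

    private volatile T value;                                                   // fast-path reads never lock
    private final AtomicReference<ReentrantLock> lockRef = new AtomicReference<>();

    T get() {
        return value;                                                           // unlocked fast path
    }

    T computeIfAbsent(Supplier<T> supplier) {
        T existing = value;                                                     // unlocked check first
        if (existing != null) {
            return existing;
        }
        ReentrantLock lock = lazyLock();
        lock.lock();
        try {
            existing = value;                                                   // re-check under the lock
            if (existing == null) {
                existing = supplier.get();
                value = existing;
            }
            return existing;
        } finally {
            lock.unlock();
        }
    }

    // Smaller helper that lazily allocates the lock and reuses whichever instance wins the race.
    private ReentrantLock lazyLock() {
        ReentrantLock lock = lockRef.get();
        if (lock == null) {
            lock = new ReentrantLock();
            if (!lockRef.compareAndSet(null, lock)) {
                lock = lockRef.get();                                           // another thread installed it first
            }
        }
        return lock;
    }
}
```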

@franz1981
Contributor Author

@Ladicek @mkouba

@quarkus-bot quarkus-bot bot added the area/arc Issue related to ARC (dependency injection) label Feb 13, 2024
@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch 2 times, most recently from fb75964 to 46da21d Compare February 13, 2024 15:47
@franz1981 franz1981 marked this pull request as ready for review February 13, 2024 15:50
@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from 46da21d to a25c127 Compare February 13, 2024 15:55
@franz1981
Contributor Author

@mkouba I've added 92bc2e5 and updated the Java code snippet to explain what it is doing: it shouldn't be a big deal, but it seems more correct to return only the value that was read, instead of performing an unguarded volatile read there.
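
As a hedged illustration of that point (the class and field names below are made up, not the PR's actual snippet): returning the local that was already read avoids a second, unguarded volatile read that might observe a different value than the one just checked.

```java
final class VolatileReadExample<T> {

    private volatile T value;

    // Second, unguarded volatile read: the returned value may differ from the one
    // that was just null-checked if a concurrent writer changes the field in between.
    T returnFieldAgain() {
        if (value != null) {
            return value;
        }
        return null;
    }

    // Single volatile read: the caller gets exactly the value that was checked.
    T returnValueRead() {
        T read = value;
        return read;
    }
}
```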

@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from 92bc2e5 to 8aa4ea2 Compare February 13, 2024 17:43


@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from 8aa4ea2 to be0d26d Compare February 14, 2024 12:22


Contributor

@mkouba mkouba left a comment


I've added a few comments but it looks really good!

I will try to run our microbenchmarks to see if it makes any difference...

@franz1981
Contributor Author

Thanks @mkouba, yep. In case the micro doesn't show any impact, I suggest adding one where the context instance is allocated on the fly, has several instance fields, uses a parameter to decide how many of them to compute, and includes a remove at the end of the usage life-cycle; having such a benchmark will help capture the actual possible behaviours of request scoped beans, which seem to be allocated, computed, and removed in the hot path.

@mkouba
Contributor

mkouba commented Feb 15, 2024

So I ran the microbenchmarks and the only benchmark that shows a significant improvement is the RequestContextActivationBenchmark, which merely tests request context activation/deactivation with no bean operations; that's expected due to the lazy init of the Lock. The throughput is doubled!

The ContextProviderBenchmark that tests ArcContextProvider (SmallRye Context Propagation integration) shows a decent improvement too (because it does not do much more than RequestContextActivationBenchmark). So this PR could slightly improve the performance in use cases where SR Context Propagation is used extensively.

However, the RequestContextBenchmark, which activates the context, invokes client proxies of 5 @RequestScoped beans and finally terminates the context, shows no significant change. And I think that it's expected too because this PR does not improve the ContextInstances#computeIfAbsent() method which is critical for this benchmark. @franz1981 is the use case you describe in the previous comment different?

@franz1981
Contributor Author

franz1981 commented Feb 15, 2024

@mkouba

@franz1981 is the use case you describe in the previous comment different?

Exactly, and I'm happy that you had already crafted other benchmarks which capture the use case of the changes sent here (which is fairly realistic for the request scoped case). Thanks again for checking!

RequestContextBenchmark could probably have been more relevant if the benchmark had a parameter to decide how many of these beans are computed, e.g. for a total of 5 beans, just 1 (or 2, 3, 4, 5) is computed. It would impact the removal as well, given that this PR removes only what was actually computed.

Just for reference

[call-stack screenshot]

this was the call-stack of a compute case where 1 of 5 beans was computed (hence saving 3 * 4 = 12 atomic volatile ops in the hot path)

@mkouba
Contributor

mkouba commented Feb 16, 2024

Probably RequestContextBenchmark, could have been relevant if the benchmark had a parameter to decide how many of these beans are computed eg for a total of 5 beans, just 1(,2,3,4,5) is computed. It will impact the removal as well, given that this pr remove only what is actually computed, too.

So I've tried to adapt the RequestContextBenchmark in a way that:

  1. The app contains 10 request scoped beans
  2. We always test 15 client proxy invocations but in three variants:
    1. 1 of the 10 beans is used (i.e. instantiated in ContextInstances.computeIfAbsent()),
    2. 3 of the 10 beans are used, and
    3. 5 of the 10 beans are used.

And this PR indeed increased the throughput in the first scenario (1/10) by ~ 40%, in the second (3/10) by ~ 15% and the results of the third are more or less the same.

In other words, the more request scoped beans exist in the app and the fewer beans are actually used in the benchmark, the better the throughput with this PR. And since it's very likely that only a fraction of all request scoped beans is used in a single request of a "real world" app, I think that it's really a great improvement. Good job!
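
For illustration, a toy JMH sketch of that benchmark shape (this is not the actual Quarkus RequestContextBenchmark; the class and the stand-in "bean" slots are purely illustrative): 10 slots stand in for request scoped beans, 15 accesses per "request", and a parameter controls how many distinct slots are actually populated.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Toy reconstruction of the adapted benchmark shape, not the real RequestContextBenchmark.
@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class RequestContextShapeBenchmark {

    private static final int BEANS = 10;        // request scoped beans in the app
    private static final int INVOCATIONS = 15;  // client proxy invocations per request

    @Param({"1", "3", "5"})
    int used;                                   // how many distinct beans get instantiated

    @Benchmark
    public int request() {
        // "Activate" the request context: one empty slot per known bean.
        Object[] instances = new Object[BEANS];
        int sum = 0;
        for (int i = 0; i < INVOCATIONS; i++) {
            int bean = i % used;
            // computeIfAbsent-like access: only `used` slots are ever populated.
            if (instances[bean] == null) {
                instances[bean] = new Object();
            }
            sum += instances[bean].hashCode();
        }
        // "Terminate" the context: only populated slots would need destroy/removal work.
        for (int i = 0; i < BEANS; i++) {
            instances[i] = null;
        }
        return sum;
    }
}
```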

@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from be0d26d to 0608ddd Compare February 16, 2024 09:35
@franz1981
Contributor Author

franz1981 commented Feb 16, 2024

I'm adding a note here to help our future selves; in order to "fix" the existing impl and this PR in the case where a compute is started after a remove (they don't even need to be concurrent, and regardless of whether the remove found any value to remove before):

  1. compute always has to check for any existing invalidation before starting
  2. after the compute has happened, in case it has stored any value, it should check the invalidation status again, and if it is set, it should loop removing until everything is null

The last loop is necessary because other concurrent computes can end up "recomputing" the value again, so it's important for them to keep trying to remove the value until it is stably null.

This mechanism prevents @PreDestroy from NOT being called when the remove reads a null volatile instance value and the compute sets it right afterwards, without cleaning it up.

@mkouba @Ladicek
I know it's crazy, but I can easily forget this concurrent stuff, so let me brain dump it here.
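
A minimal sketch of that mechanism, with purely illustrative names (this is not the actual ContextInstances implementation), assuming a single slot guarded by an invalidation flag:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of "check invalidation before and after compute".
final class InvalidatableSlot<T> {

    private final AtomicReference<T> instance = new AtomicReference<>();
    private final AtomicBoolean invalidated = new AtomicBoolean();

    T computeIfAbsent(Supplier<T> supplier) {
        // 1. never start a compute against an already invalidated context
        if (invalidated.get()) {
            return null;
        }
        T existing = instance.get();
        if (existing != null) {
            return existing;
        }
        T created = supplier.get();
        if (!instance.compareAndSet(null, created)) {
            return instance.get();                 // another compute won the race
        }
        // 2. re-check invalidation: a remove may have run while we were computing
        if (invalidated.get()) {
            // 3. keep removing until the slot is stably null, because other in-flight
            //    computes may briefly re-install a value
            T leaked;
            while ((leaked = instance.getAndSet(null)) != null) {
                destroy(leaked);                   // so the @PreDestroy-style cleanup still happens
            }
            return null;
        }
        return created;
    }

    void invalidate() {
        invalidated.set(true);
        T existing = instance.getAndSet(null);
        if (existing != null) {
            destroy(existing);
        }
    }

    private void destroy(T value) {
        // placeholder for the container's destroy/@PreDestroy callback
    }
}
```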


@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from 0608ddd to c256a0c Compare February 16, 2024 13:48


@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from c256a0c to a86cddb Compare February 17, 2024 10:23
@franz1981
Contributor Author

Ok @geoand, this should be a very good one, once merged. I just need to make sure the test failures are all flaky ones.


@franz1981
Contributor Author

https://github.com/franz1981/quarkus/actions/runs/7941041918 seems to show that the failure reported here is real or another flaky test

@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from a86cddb to ab94de7 Compare February 17, 2024 17:09


@geoand
Contributor

geoand commented Feb 19, 2024

@mkouba this looks good, WDYT?


@franz1981 franz1981 force-pushed the main_lazy_ctx_instances branch from b0b28a4 to 9905b94 Compare February 20, 2024 08:55
@franz1981
Contributor Author

Done @mkouba and @geoand, waiting for the CI again, although I've just changed a comment and rebased.

@mkouba mkouba added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Feb 20, 2024
Contributor

@Ladicek Ladicek left a comment


getAllPresent() will no longer return a consistent snapshot, but that's not guaranteed for ComputingCacheContextInstances either, so it should be fine. Nice work!


quarkus-bot bot commented Feb 20, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 9905b94.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

@geoand geoand merged commit da7f89a into quarkusio:main Feb 20, 2024
49 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.9 - main milestone Feb 20, 2024
@quarkus-bot quarkus-bot bot removed the triage/waiting-for-ci Ready to merge when CI successfully finishes label Feb 20, 2024
Labels
area/arc Issue related to ARC (dependency injection), triage/flaky-test
4 participants