Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing race condition between end-operation and future completion. #7705

Merged
merged 1 commit into from
Sep 3, 2024

Conversation

sdedic
Copy link
Member

@sdedic sdedic commented Aug 29, 2024

@mbien mentioned test failures in #7655, I've experienced them too in #7665 (action run) and #7697 (action run).

I've "sort of" traced/guessed it to finalization of a reload: in the original code, the Future completes (which means it unblocks the test thread), then the reload formally ends (pending reload is removed). The test may be able to observe the still-running reload (nearing its end) during its tearDown. The order was chosen so that events are fired only after Future completion, so potential observers see the new state as completed/ready.

The PR splits the "formal termination" of a pending reload from event delivery, so the sequence is now:

  • end the reload, potentially prepare the next one in the queue
  • complete the reload Future
  • deliver events and proceed with the next reload
    This sequence is actually better / more predictable for clients.

@sdedic sdedic added Gradle [ci] enable "build tools" tests Maven [ci] enable "build tools" tests labels Aug 29, 2024
@sdedic sdedic added this to the NB24 milestone Aug 29, 2024
@sdedic sdedic requested a review from mbien August 29, 2024 07:50
@sdedic sdedic self-assigned this Aug 29, 2024
@sdedic
Copy link
Member Author

sdedic commented Aug 29, 2024

Still does not work ;) so I'll play with it ... will ping after it is reviewable.

@sdedic sdedic marked this pull request as draft August 29, 2024 09:02
@apache apache locked and limited conversation to collaborators Aug 29, 2024
@apache apache unlocked this conversation Aug 29, 2024
@sdedic sdedic marked this pull request as ready for review August 29, 2024 14:05
@sdedic
Copy link
Member Author

sdedic commented Aug 29, 2024

@mbien seems better now. The 2nd failure was actually a bad test diagnostics ... but while fixing I made sure that the events created during a reload are delivered before next reload starts.

@sdedic
Copy link
Member Author

sdedic commented Sep 1, 2024

@mbien ping

Copy link
Member

@mbien mbien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran maven and gradle tests 20 times in sequence using a custom workflow on my clone and got only one failure in the gradle step
https://github.com/mbien/netbeans/actions/runs/10656862483 gradle runs
https://github.com/mbien/netbeans/actions/runs/10659082827 maven runs (still running but it looks good so far)

so the fix seems to work I believe. Some comments inline.

2024-09-01T20:57:07.1157732Z     [junit] Testcase: testGeneratedSources(org.netbeans.modules.gradle.java.classpath.GradleSourcesImplTest):	FAILED
2024-09-01T20:57:07.1159927Z     [junit] expected:<2> but was:<0>
2024-09-01T20:57:07.1160727Z     [junit] junit.framework.AssertionFailedError: expected:<2> but was:<0>
2024-09-01T20:57:07.1162431Z     [junit] 	at org.netbeans.modules.gradle.java.classpath.GradleSourcesImplTest.testGeneratedSources(GradleSourcesImplTest.java:55)
2024-09-01T20:57:07.1164429Z     [junit] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-09-01T20:57:07.1166016Z     [junit] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2024-09-01T20:57:07.1167834Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-09-01T20:57:07.1169325Z     [junit] 	at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:83)
2024-09-01T20:57:07.1170432Z     [junit] 	at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:488)
2024-09-01T20:57:07.1171555Z     [junit] 	at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:409)
2024-09-01T20:57:07.1172530Z     [junit] 	at java.base/java.lang.Thread.run(Thread.java:840)

Comment on lines 849 to 900
CompletableFuture<ProjectState> f = CompletableFuture.runAsync(() -> nextReloader.initRound(), loader).
thenCompose((v) -> nextReloader.start(floader));
f.whenComplete((a, b) -> {
CompletableFuture<ProjectState> f = CompletableFuture.runAsync(() -> fNextReloader.initRound(), loader).
thenCompose((v) -> fNextReloader.start(floader));
// run this cleanup in the dispatcher thread
f.whenCompleteAsync((a, b) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a is the state and b an exception. Please give things names, this makes it more difficult to review on github than necessary.

simplified the block a bit

            CompletableFuture.runAsync(() -> fNextReloader.initRound(), loader)
                             .thenCompose((v) -> fNextReloader.start(floader))
                             .whenCompleteAsync((projectState, throwable) -> { // run this cleanup in the dispatcher thread

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but separating the 1st part into a temporary turned out to be much nicer when debugging ;)

@mbien
Copy link
Member

mbien commented Sep 2, 2024

I pressed the wrong button, wanted to comment and not request changes - most of it are just suggestions.

@mbien
Copy link
Member

mbien commented Sep 2, 2024

I suppose we can link #7263 to this PR too?

@sdedic sdedic force-pushed the project/fix-tests-endoperation branch from b80cde2 to 9dcefae Compare September 2, 2024 13:12
@sdedic
Copy link
Member Author

sdedic commented Sep 2, 2024

rebased / squashed.

@sdedic sdedic requested a review from mbien September 2, 2024 13:12
@mbien mbien linked an issue Sep 2, 2024 that may be closed by this pull request
Comment on lines +764 to +777
private Collection<IdentityHolder> collectReleases(ProjectOperations op) {
Collection<IdentityHolder> releases = op.releases;
// this is copied from postCleanup, but can be done in batch without
// checking the project operation is in progress for each reference. These are already removed
// from stateIdentity, so just check they did not obtain another one:
releases.removeIf(expired -> {
ProjectStateData d = expired.state.get();
if (d == null) {
return true;
}
IdentityHolder h = stateIdentity.get(d);
return h != null && h != expired;
});
op.releases = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving L777 to L766 would make this a little bit safer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO not important as entire call executes under synchronized(this) which also guards other accesses to op.releases. Maybe optically safer = adjacent to taking out the value.

@sdedic sdedic merged commit 68010b0 into apache:master Sep 3, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Gradle [ci] enable "build tools" tests Maven [ci] enable "build tools" tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

maven ExecutionEnvHelperTest test is causing sporadic CI timeouts
2 participants