Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FISH-6588 Embedded EJB Container NPE Fix #6007

Conversation

Pandrex247
Copy link
Member

@Pandrex247 Pandrex247 commented Oct 28, 2022

Description

Bug fix for the PayaraExecutorService encountering NPEs during use in an Embedded EJB Container.

With ApplicationLoaderService now starting at run level 15 rather tha 10, this meant the ALL_APPLICATIONS_LOADED event was firing later than before, leading to the thread pools in the PayaraExecutorService still being null during the "half-restarts" that occur when using the Embedded EJB Container.

These "half-restarts" only re-initialise some services, not all, which was leading to the thread pools in PayaraExecutorService being nulled from receiving the SERVER_SHUTDOWN event, but never being reinitialised.

Since the SERVER_START event doesn't get fired until after run-level 10 (our NPE happens at run-level 10), I've created a new event to get fired after run-level 1.

Important Info

Blockers

None

Testing

New tests

None

Testing Performed

Ran the ejb30/lite/appexception test suite from the Jakarta EE 8 TCK.
Pending: full run of Jakarta EE 8 TCK - TCKSuite#594 (in queue)

Testing Environment

WSL, Zulu 8u345.

Documentation

N/A

Notes for Reviewers

The ejb30/lite/appexception is actually broken in two ways right now: by the boot sequencing changes I'm fixing in this PR, and by FISH-6495 (Luis testing his own fix for that now).

Problem and solution investigation, including debug logging of the boot/shutdown sequence of various events and services, has been added to the same Confluence page as before: Start-up, Post-Boot, Deployment, and Post-Deploy#PayaraExecutorService-NPE

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
@Pandrex247
Copy link
Member Author

Jenkins test please

Copy link
Contributor

@pdudits pdudits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HazelcastCore has similar event listener. Would you please have a look if either of initialization events is better fit there as well?

@Pandrex247
Copy link
Member Author

Pandrex247 commented Oct 28, 2022

Just from eyeballing it...

HazelcastCore starts at run-level 10, it wouldn't be initialised in time to receive the POST_SERVER_INIT event outside of any similar "half-restart" scenarios.

Regarding responding to the ALL_APPLICATIONS_LOADED event to perform its bootstrap, yes it will now be initialising itself at run-level 10 but bootstrapping at run-level 15. From looking at the history of the class, this deferred bootstrap appears to be intentional. Hazelcast reacting to the ALL_APPLICATIONS_STOPPED and ALL_APPLICATIONS_LOADED events rather than SERVER_SHUTDOWN and SERVER_STARTUP appears to have been done as a fix for EJB Timers.

Hazelcast will perform its bootstrap in reaction to essentially anything asking for it, so this deferred bootstrap is "non-validating" so to speak. All of the following methods will perform the bootstrap if it hasn't been done yet: getInstance, getCachingProvider, isLite, getUUID, and setEnabled.

A quick look through the public methods of this class would lead me to believe the only way to access Hazelcast during startup before it has completed bootstrap is if a run-level 14 or earlier service attempts to use any of the following methods: getAttribute, setAttribute, getAttributes, getPort, getMemberName, getMemberGroup, isEnabled.
A few of those methods look like they may cause an NPE for accessing theInstance variable before it's initialised during bootstrapHazelcast.

Technically I think the window to get these NPEs was always there, but with my boot sequencing changes I've widened the window to where it's more feasible you could actually hit them.

Separately, I'm not sure this event reaction condition would ever now evaluate to true now, but the event gets sent during bootstrapHazelcast so while it's potentially now (or always was) dead code it's inconsequential.

        else if(event.is(EventTypes.SERVER_STARTUP) && isEnabled() && booted) {
            // send this event only after all Startup services have been initialized
            events.send(new Event(HazelcastEvents.HAZELCAST_BOOTSTRAP_COMPLETE));
        }

So TLDR: New event isn't helpful to Hazelcast. NPEs might now be possible.

@Pandrex247 Pandrex247 merged commit 274da07 into payara:master Nov 1, 2022
@Pandrex247 Pandrex247 deleted the FISH-6588-PayaraExecutorService-Fix-No-Revert branch November 1, 2022 09:06
Pandrex247 added a commit to Pandrex247/Payara that referenced this pull request Dec 5, 2022
…raExecutorService-Fix-No-Revert"

This reverts commit 274da07, reversing
changes made to d2b7a54.
Pandrex247 added a commit to Pandrex247/Payara that referenced this pull request Dec 5, 2022
…raExecutorService-Fix-No-Revert"

This reverts commit 274da07, reversing
changes made to d2b7a54.
Pandrex247 added a commit to Pandrex247/Payara that referenced this pull request Dec 5, 2022
…raExecutorService-Fix-No-Revert"

This reverts commit 274da07, reversing
changes made to d2b7a54.
Pandrex247 added a commit to Pandrex247/Payara that referenced this pull request Dec 5, 2022
…orService-Fix-No-Revert

FISH-6588 Embedded EJB Container NPE Fix
Pandrex247 added a commit to Pandrex247/Payara that referenced this pull request Dec 5, 2022
…raExecutorService-Fix-No-Revert"

This reverts commit 274da07, reversing
changes made to d2b7a54.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants