Fix concurrency bug in TransactionScoped beans initialization #29159

imperatorx · 2022-11-09T19:32:46Z

Introduced locking to prevent double, triple, etc. initialization of TransactionScoped beans when they are accessed from different threads concurrently (think virtual threads and StructuredTaskScope forks).

ReentrantLock is used instead of synchronized blocks to make the locking virtual-thread friendly. The computeIfAbsent usage of the ConcurrentHashMap in TransactionContextState would also be problematic (it uses synchronized blocks internally) when bean creation does IO, like establishing a TCP connection, because that would pin the virtual thread's carrier thread.
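A minimal sketch of the idea, assuming a plain map of instances keyed by a bean identifier. The class and method names are hypothetical and this is not the PR's actual code. It shows a lock-free lookup on the fast path and a ReentrantLock (instead of ConcurrentHashMap.computeIfAbsent) around creation, so a factory that blocks does not run inside a monitor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch; not the Quarkus TransactionContext implementation.
public class LockedInstanceStore {

    private final Map<String, Object> instances = new ConcurrentHashMap<>();
    // One lock for all keys, for simplicity.
    private final ReentrantLock lock = new ReentrantLock();

    // Returns the instance for the key, creating it at most once.
    // The factory must not return null (ConcurrentHashMap rejects null values).
    public Object getOrCreate(String key, Supplier<Object> factory) {
        Object existing = instances.get(key); // lock-free fast path
        if (existing != null) {
            return existing;
        }
        lock.lock(); // a blocking factory runs under this lock, not inside a monitor
        try {
            existing = instances.get(key); // re-check: another thread may have won
            if (existing == null) {
                existing = factory.get();
                instances.put(key, existing);
            }
            return existing;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        LockedInstanceStore store = new LockedInstanceStore();
        AtomicInteger created = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> store.getOrCreate("bean", () -> {
                created.incrementAndGet();
                return new Object();
            }));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("created=" + created.get()); // prints created=1
    }
}
```

Using a single lock serializes the creation of different beans; that is assumed acceptable here because each bean is created only once, and later lookups never touch the lock.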

gastaldi · 2022-11-10T13:17:45Z

/cc @mmusgrov

gsmet · 2022-11-10T13:34:33Z

/cc @Sanne too

mmusgrov

The changes look OK, but it'd be good if someone with CDI experience (mine is in transactions) could take a look, maybe @mkouba or @manovotn.

But we need a test for this, because the MP CP TCK `@Ignore`s the test [1] for this feature (the SmallRye implementation did pass the test before it was ignored). Can a similar test be added to this PR?

[1] https://github.com/eclipse/microprofile-context-propagation/blob/master/tck/src/main/java/org/eclipse/microprofile/context/tck/cdi/JTACDITest.java#L250

mmusgrov · 2022-11-22T12:56:56Z

Do we have a feel for the performance overhead added by the extra locking? Is the performance hit only on the first request for a contextual instance (jakarta.enterprise.context.spi.Context.get(...))?

manovotn · 2022-11-22T14:47:02Z

Do we have a feel for the performance overhead added by the extra locking? Is the performance hit only on the first request for a contextual instance (jakarta.enterprise.context.spi.Context.get(...))?

As per the CDI documentation, Context#get() is called whenever you need to obtain a bean instance. It either returns an existing one or creates a new one.
How and when you choose to perform locking is up to the context implementation. Ideally, you only want to lock when you find out there is no instance and you are about to create a new one and store it in the context before returning it.

imperatorx · 2022-11-22T14:51:04Z

Do we have a feel for the performance overhead added by the extra locking? Is the performance hit only on the first request for a contextual instance (jakarta.enterprise.context.spi.Context.get(...))?

As the PR uses double-checked locking both for the TransactionContextState and for the instance inside the TransactionContextState, only the first invocation will hit the locks; subsequent invocations will use the same path as before.

Slight correction: the first lock (TransactionContextState) will only be taken once per transaction if uncontested, and the second once per bean if uncontested.

Sanne · 2022-11-22T15:56:39Z

Hi @imperatorx, thanks - it looks good at a high level, but there are some details I'd like to clarify.

You seem to have also wrapped the access to the TransactionSynchronizationRegistry within the lock (the invocation of transactionSynchronizationRegistry.get(), which is a LazyValue).

LazyValue already provides concurrent initialization safeguards, but it relies on synchronization.
Is that something that should be fixed as well? There are more accesses to it beyond the block you've just protected; in general, I wonder how any of this code could work safely with virtual threads.

imperatorx · 2022-11-22T19:07:07Z

I updated the PR:

Added a test.
Removed the LazyValue.get() call from the locked section, as @Sanne suggested.
Removed the double-checked locking from the first lock, since uncontended locking (the regular single-threaded use case) uses a simple compare-and-set operation (see NonfairSync.initialTryLock in ReentrantLock), whereas the two getResource calls do a lot of work. So IMHO one compare-and-set is better performance-wise than an additional getResource call (which calls TransactionManager.getTransaction(), accesses a synchronized hashmap, etc.).
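The resulting two-level scheme might look roughly like the sketch below. Assumptions: a map keyed by a transaction object stands in for the registry's getResource/putResource calls, State stands in for TransactionContextState, and all names are hypothetical. The state lookup always takes the lock (no double-check), while per-bean creation keeps the double-checked fast path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of per-transaction state plus per-bean instances.
public class TwoLevelContextSketch {

    // Stand-in for TransactionContextState: one per transaction.
    static final class State {
        final Map<String, Object> beans = new ConcurrentHashMap<>();
        final ReentrantLock lock = new ReentrantLock();

        // Double-checked: after the first creation, the lock is never taken.
        Object getOrCreate(String bean, Supplier<Object> factory) {
            Object instance = beans.get(bean);
            if (instance != null) {
                return instance;
            }
            lock.lock();
            try {
                instance = beans.get(bean);
                if (instance == null) {
                    instance = factory.get();
                    beans.put(bean, instance);
                }
                return instance;
            } finally {
                lock.unlock();
            }
        }
    }

    // Stand-in for the transaction-scoped resource map.
    private final Map<Object, State> statesByTx = new ConcurrentHashMap<>();
    private final ReentrantLock stateLock = new ReentrantLock();

    // No double-check: an uncontended ReentrantLock acquire is a single CAS,
    // assumed cheaper than an extra resource lookup.
    State stateFor(Object tx) {
        stateLock.lock();
        try {
            State state = statesByTx.get(tx);
            if (state == null) {
                state = new State();
                statesByTx.put(tx, state);
            }
            return state;
        } finally {
            stateLock.unlock();
        }
    }

    public static void main(String[] args) {
        TwoLevelContextSketch ctx = new TwoLevelContextSketch();
        AtomicInteger created = new AtomicInteger();
        Supplier<Object> factory = () -> { created.incrementAndGet(); return new Object(); };
        Object tx1 = new Object();
        Object tx2 = new Object();
        Object a = ctx.stateFor(tx1).getOrCreate("bean", factory);
        Object b = ctx.stateFor(tx1).getOrCreate("bean", factory);
        Object c = ctx.stateFor(tx2).getOrCreate("bean", factory);
        // Same transaction shares the instance; another transaction gets its own.
        System.out.println((a == b) + " " + (a == c) + " created=" + created.get()); // prints true false created=2
    }
}
```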

As for refactoring LazyValue: synchronized blocks work well with virtual threads if no blocking is done inside them (IO, native calls, sleep, etc.), and the TransactionContext use case has none of that; it just returns a new instance. If other LazyValue use cases do blocking work while creating the new instance, the synchronized block should be changed to a ReentrantLock.
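If a LazyValue-style holder ever needs to do blocking work in its supplier, the suggested switch could look like the following sketch. LockingLazyValue is a hypothetical class, not Quarkus's LazyValue, and it assumes the supplier never returns null, because null is used as the "not yet initialized" marker.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical ReentrantLock-based lazy holder (not io.quarkus LazyValue).
public final class LockingLazyValue<T> {

    private final Supplier<T> supplier;
    private final ReentrantLock lock = new ReentrantLock();
    private volatile T value; // volatile: safe publication for the lock-free read

    public LockingLazyValue(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        T v = value;
        if (v != null) {
            return v; // fast path: already initialized
        }
        // Waiters park instead of blocking on a monitor, and a supplier that
        // blocks on IO does not hold a monitor while it runs.
        lock.lock();
        try {
            v = value;
            if (v == null) {
                v = supplier.get();
                value = v;
            }
            return v;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LockingLazyValue<String> lazy = new LockingLazyValue<>(() -> { calls[0]++; return "resource"; });
        System.out.println(lazy.get() + " " + lazy.get() + " calls=" + calls[0]); // prints resource resource calls=1
    }
}
```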

Sanne · 2022-11-22T19:47:07Z

Excellent, thanks 👍 and I agree with the decision regarding LazyValue; I asked because in the previous revision of the patch it seemed like you intentionally included that in the critical section.

Could you squash the changes together and avoid the import javax.transaction.*; please? It's a very minor nitpick, but we prefer listing all individual imports.

Sanne

Suggested two minor changes; however, even in its current state I think it's good.

mmusgrov

Thanks for adding the test and for answering my performance question.

manovotn · 2022-11-24T12:01:21Z

@mmusgrov FYI, the very same problem is in Narayana codebase, see https://github.com/jbosstm/narayana/blob/master/ArjunaJTA/cdi/classes/com/arjuna/ats/jta/cdi/TransactionContext.java

manovotn · 2022-11-24T12:07:15Z

@mmusgrov Also how does this solution link into the discussion in #29157 (comment)?
Does Narayana support that or not? This PR makes sure there is no bean creation race, but IMO it doesn't answer the MP CP question, and we still lack a test that would verify it either way.

imperatorx · 2022-11-24T12:12:58Z

@mmusgrov Also how does this solution link into the discussion in #29157 (comment)? Does Narayana support that or not? This PR makes sure there is no bean creation race, but IMO it doesn't answer the MP CP question, and we still lack a test that would verify it either way.

No, sadly Narayana does not fully support this workload: if an exception is thrown inside a child virtual thread, unwinding the context propagation causes the transaction on the parent thread to be deactivated (or rather, not re-activated). I'll look into that if I have time; the culprit seems to be https://github.com/smallrye/smallrye-context-propagation/blob/main/jta/src/main/java/io/smallrye/context/jta/context/propagation/JtaContextProvider.java#L65

mmusgrov · 2022-11-24T15:13:59Z

@mmusgrov FYI, the very same problem is in Narayana codebase, see https://github.com/jbosstm/narayana/blob/master/ArjunaJTA/cdi/classes/com/arjuna/ats/jta/cdi/TransactionContext.java

That code was added to support the wildfly-mp-reactive-feature-pack (https://github.com/wildfly-extras/wildfly-mp-reactive-feature-pack) which we did at the request of the WildFly team. I'll check with Kabir to see if there is any mechanism to get some concurrency testing (with respect to transaction propagation) added to the feature pack.

And, if that isn't possible then I'll investigate if it's feasible to add the test to the narayana testsuite.

manovotn · 2022-11-24T15:47:52Z

That code was added to support the wildfly-mp-reactive-feature-pack (https://github.com/wildfly-extras/wildfly-mp-reactive-feature-pack) which we did at the request of the WildFly team.

The Context impl class must have been present (in one form or another) ever since JTA wanted to integrate with CDI and have its own custom @TransactionScoped.

mmusgrov · 2022-11-24T16:47:51Z

@manovotn Yes, but isn't this bug only present when the code is used with MP CP and parallel threads share the same transaction? We aren't testing that in Narayana. The code was requested by WildFly as an incubator for reactive messaging, reactive streams operators and context propagation, and it turns out it is not used by WildFly after all; I'm checking to see who else might use it.

So if nobody is using it, I'd prefer to either send users to the Quarkus JTA extension for the functionality or ask them to raise an RFE.

And if nobody is using the feature in Narayana and we remove it, would we still need to protect the Context? (I don't see any other code protecting the CDI Context in this way.)

That said, I will still look into how easy it would be for us to add MP CP testing to our own testsuite, even if there is every chance it never gets used!

manovotn · 2022-11-24T17:48:29Z

I was simply pointing out that it might make sense to keep the TransactionContext code/behavior identical (or as nearly identical as possible) between both codebases. Whether you do so is ultimately up to you :)

mmusgrov · 2022-11-25T09:13:47Z

OK, it may make sense to keep them aligned; thanks for pointing it out. I will report back if/when we decide to include SmallRye testing in the Narayana CDI module.

quarkus-bot bot added the area/narayana Transactions / Narayana label Nov 9, 2022

mmusgrov suggested changes Nov 22, 2022

Sanne requested changes Nov 22, 2022

imperatorx force-pushed the fix-29157 branch from 61cb2ec to e27eddd on November 22, 2022 20:03

Fix concurrency issues in TransactionScope quarkusio#29157

700950c

imperatorx force-pushed the fix-29157 branch from e27eddd to 700950c on November 22, 2022 20:08

Sanne approved these changes Nov 22, 2022

Sanne added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Nov 22, 2022

Sanne mentioned this pull request Nov 23, 2022

Fix flaky test in reactive-messaging-hibernate-orm #29433

Merged

mmusgrov approved these changes Nov 24, 2022

gsmet merged commit 701b620 into quarkusio:main Nov 24, 2022

quarkus-bot bot added this to the 2.15 - main milestone Nov 24, 2022

quarkus-bot bot added kind/bugfix and removed triage/waiting-for-ci Ready to merge when CI successfully finishes labels Nov 24, 2022

gsmet changed the title from "Fix concurrency bug" to "Fix concurrency bug in TransactionScoped beans initialization" on Nov 24, 2022
