Request Scope not active in Fault Tolerance handling #2632
Comments
Unfortunately, based on the description, this looks like a race condition. It may take a while to reproduce and fix this one.
@bunto76 Is there any new code that you can share? Or should this, in theory, be reproducible with one of the existing Helidon tests that we created for the other related issues?
In theory, it should be reproducible with the tests for the related issues. I tried to recreate it with a simple example but couldn't; however, I hadn't yet tried a simple example with a Qualifier. I'll give that a try as well, since the application uses a Qualifier.
@spericas I tested a simple example with a Qualifier, using Retry, Fallback and Circuit Breaker, but couldn't reproduce the problem that way. I can reproduce it in our service integration tests and will continue investigating those to see if I can identify anything.
@bunto76 We now have this test, based on your earlier findings, running in our pipeline: https://github.com/oracle/helidon/tree/master/tests/functional/request-scope Perhaps you can take a look and let us know whether it is representative of the issue you are seeing. We run this test on every commit. Is there anything else relevant about your Jenkins setup? Could this be due to the pipeline machines being heavily loaded?
As a quick test, I tried running the test in the link above repeatedly (using
Using concurrent requests in the tests does not result in errors either. See:
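Because the failure is intermittent, a harness that runs the same attempt many times from several threads and counts failures can help make it show up. This is a minimal sketch; `StressRunner` and `countFailures` are hypothetical names, not part of the Helidon test suite, and in a real test the attempt would presumably issue a request against the service under test rather than run a local lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: run the same attempt `runs` times on a pool of
// `threads` threads and return how many attempts threw an exception.
final class StressRunner {
    static int countFailures(Callable<?> attempt, int runs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Object>> tasks = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                tasks.add(() -> attempt.call());
            }
            int failures = 0;
            // invokeAll blocks until every task has completed, normally or not.
            for (Future<Object> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    failures++; // this attempt threw
                }
            }
            return failures;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for attempts", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting failures, rather than stopping at the first one, gives a rough failure rate, which is useful when a problem only appears in a fraction of runs.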
I can reproduce this locally on my own machine, so it is not specific to the Jenkins environment. The stack trace above is from a local failure. It happens in more than one fault tolerance scenario; I have seen it with both Retry and Circuit Breaker. The failing case is accessing the request scope during fault tolerance handling, for example when a Retry is invoked after a failure. It doesn't seem like the request scope tests linked above cover that, do they?
I tested this with 2.2.1 to see whether any changes in that release resolve the issue, but I still see exactly the same problem. I will keep trying to create a simple code example outside the service where we see the problem.
As discussed, I've attached a pom.xml and an executable jar. Call the endpoint GET /images:
After hours and hours of analysis, I can report back on the multiple conditions that are required to reproduce this problem:
All three conditions above (the perfect trifecta) are necessary to produce the problem. For example, a workaround would be to explicitly set the default delay/jitter in
There is special code in the
Helidon was already handling request scope migration for HK2 and CDI, but this code only works correctly when there is a single injection manager in play. The prototype code, now in a new helper class, is available here:
Because of all the conditions that need to be met to reproduce the problem, it has been challenging to write a test in Helidon. However, a new test is necessary to ensure that future changes in Jersey do not affect this type of Helidon application. That's the next step.
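The general idea behind request scope migration can be modeled with plain JDK types: state bound to the request thread is captured before work is handed to another thread, then re-installed on that thread for the duration of the task. The sketch below assumes a single `ThreadLocal<String>` as a stand-in for the request context; `ScopePropagation` and `wrap` are hypothetical and are not the Helidon helper class mentioned in the comment.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of request scope migration across threads.
// Not Helidon, HK2 or CDI code.
final class ScopePropagation {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static void set(String value) { CONTEXT.set(value); }
    static String get() { return CONTEXT.get(); }
    static void clear() { CONTEXT.remove(); }

    // Captures the caller's context when wrap() is called, installs it on the
    // thread that later runs the task, then restores that thread's prior state.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = CONTEXT.get();
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    // Reports what a pool thread sees without the wrapper, with it, and after
    // the wrapped task has finished.
    static String demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> read = ScopePropagation::get;
        set("tenant-a");
        try {
            String plain = pool.submit(read).get();
            String wrapped = pool.submit(wrap(read)).get();
            String afterwards = pool.submit(read).get();
            return plain + "," + wrapped + "," + afterwards;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            clear();
            pool.shutdown();
        }
    }
}
```

`demo()` returns `"null,tenant-a,null"`: the unwrapped task sees nothing, the wrapped task sees the captured value, and the pool thread is left clean afterwards, which matters because pool threads are reused across requests.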
Helidon functional test that shows the problem:
…2856)
* Run concurrent requests for each test.
* New test.
* New helper class to manage request scope logic across threads. Update of thread local variable in Jersey to make sure correct InjectionManager is used in non-request threads.
* Updated test to show the problem described in #2632 using multiple applications.
* Fixed copyright.
Signed-off-by: Santiago Pericasgeertsen <santiago.pericasgeertsen@oracle.com>
PR #2856
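The commit above mentions making sure the correct `InjectionManager` is used in non-request threads when multiple applications are involved. The following hypothetical sketch contrasts a single process-wide "current manager" with one bound to the thread that runs each task. `Manager` and `ManagerSelection` are illustrative stand-ins, not Jersey or Helidon types, and the sketch does not reproduce Jersey's actual code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a per-application injection manager.
final class Manager {
    final String application;

    Manager(String application) {
        this.application = application;
    }
}

final class ManagerSelection {
    // Anti-pattern: one process-wide "current" manager. With two applications,
    // whichever registered last is returned for every task.
    private static volatile Manager global;

    // Alternative: bind the manager of the originating request to the thread
    // that runs the task, only for the duration of that task.
    private static final ThreadLocal<Manager> CURRENT = new ThreadLocal<>();

    static void register(Manager manager) {
        global = manager;
    }

    static String currentApplication() {
        return CURRENT.get().application;
    }

    static <T> Callable<T> boundTo(Manager owner, Callable<T> task) {
        return () -> {
            CURRENT.set(owner);
            try {
                return task.call();
            } finally {
                CURRENT.remove(); // do not leak the binding to the next task
            }
        };
    }

    // A task that belongs to app-a runs on a pool thread while app-b is also deployed.
    static String demo() {
        Manager appA = new Manager("app-a");
        Manager appB = new Manager("app-b");
        register(appA);
        register(appB); // the second application overwrites the first
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> viaGlobal = () -> global.application;
            Callable<String> viaThread = ManagerSelection::currentApplication;
            String fromGlobal = pool.submit(viaGlobal).get();
            String fromThread = pool.submit(boundTo(appA, viaThread)).get();
            return fromGlobal + "," + fromThread;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

`demo()` returns `"app-b,app-a"`: the global lookup hands app-b's manager to a task that belongs to app-a, while the thread-bound lookup picks the right one. With a single application both approaches agree, which is consistent with the comment that the earlier code only worked with one injection manager in play.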
Environment Details
Problem Description
This is an intermittent problem that I can only reproduce locally about 10% of the time, and only by repeatedly re-running the tests. It causes periodic failures in the project's Jenkins build, which is what highlighted the issue. During Fault Tolerance handling (specifically Retry and Fallback), the Request Scope is not active or not available while the retry or fallback is executed. I can't say whether it is exclusive to those scenarios, but those are the scenarios used in this code.
The TenantContext is a RequestScope object, as can be seen in the stack trace above:
But maybe more relevant:
Steps to reproduce
I was unable to reproduce this problem in a simple application, although I have seen it more regularly in the service shown in the stack trace above. It occurs during integration tests that specifically exercise the service's fault tolerance strategies, producing intermittent failures with the stack trace above.
The issue occurs during a Retry or Fallback in the service when it accesses a Request Scope object.
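A deterministic simplification of the reported failure mode: if the request context is bound to the request thread, a retry or fallback executed on a different thread cannot see it unless the context is propagated. `TenantScope` and `RetryDemo` are hypothetical (the tenant id only echoes the `TenantContext` mentioned above), the exception message is illustrative rather than the real one, and the sketch does not model the race that makes the actual problem intermittent.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of a thread-bound request scope holding a tenant id.
// Not Helidon, Weld or Jersey code.
final class TenantScope {
    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    static void activate(String tenant) {
        TENANT.set(tenant);
    }

    static void deactivate() {
        TENANT.remove();
    }

    // Accessing the scope while it is inactive fails (illustrative message).
    static String tenant() {
        String value = TENANT.get();
        if (value == null) {
            throw new IllegalStateException("Request scope not active");
        }
        return value;
    }
}

final class RetryDemo {
    // The first access runs on the "request" thread; the retry is handed to a
    // pool thread without propagating the scope.
    static String attemptThenRetry() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TenantScope.activate("tenant-a");
        try {
            String first = TenantScope.tenant(); // scope is active on this thread
            String retry = CompletableFuture.supplyAsync(() -> {
                try {
                    return TenantScope.tenant();
                } catch (IllegalStateException e) {
                    return "failed: " + e.getMessage();
                }
            }, pool).get();
            return first + " / " + retry;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            TenantScope.deactivate();
            pool.shutdown();
        }
    }
}
```

`attemptThenRetry()` returns `"tenant-a / failed: Request scope not active"`: the same lookup that succeeds on the request thread fails on the retry thread.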