Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(process worker): fix the crashing of pod #1119

Conversation

dhiren-singh-007
Copy link
Contributor

Description

Handled the system exception and added the Critical/Fatal log type so that we can monitor the logs for criticality.

Why

It was causing issues for other processes as well .
Eg if a OSP customer sets the wrong callback address or if it is down then it returns the referenced class of SystemException and then it crashed the pod , which stops all other processes.

Issue

#1114

Checklist

Please delete options that are not relevant.

  • I have followed the contributing guidelines
  • I have performed a self-review of my own code
  • I have successfully tested my changes locally
  • I have checked that new and existing tests pass locally with my changes

@dhiren-singh-007 dhiren-singh-007 requested review from ntruchsess and Phil91 and removed request for ntruchsess October 25, 2024 13:10
Copy link
Member

@Phil91 Phil91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the changes in the framework dir since nothing changed in the framework logic

…calls.

terminate processworker only when running out of memory.
update framework version
@ntruchsess ntruchsess force-pushed the bugifx/1114-fix-process-worker-crash branch from c9ebf6f to b62273b Compare October 29, 2024 11:53
Copy link

@ntruchsess
Copy link
Contributor

ntruchsess commented Oct 29, 2024

@dhiren-singh-007 @Phil91 : I think it makes sense to force a restart of the pod when it runs out of memory - everything else should be logged as critical. I also changed the way SocketException and IOException are handled within the remote api calls. Those wouln't throw this kind of systemexception but ServiceException flagged as RecoverOptions.NETWORK (being mapped to RecoverOptions.INFRASTRUCTURE which is most external system calls treat as recoverable skipping the process-step to be retried later). Nevertheless we might want to evaluate whether this is the right strategy for all external system calls - e.g. when calling a url that is provided by the customer it might be better to set the respective process-step to ERROR while retrying the same call with the next process-worker run would be suitable for calls to 'known as good' urls where it's reasonable to assume the network-related issue is temporary

@ntruchsess ntruchsess merged commit a6936a5 into eclipse-tractusx:main Oct 29, 2024
11 checks passed
@ntruchsess ntruchsess added this to the Release 25.03 milestone Oct 29, 2024
@dhiren-singh-007 dhiren-singh-007 deleted the bugifx/1114-fix-process-worker-crash branch October 29, 2024 15:26
dhiren-singh-007 added a commit to Cofinity-X/entry-portal-backend that referenced this pull request Nov 8, 2024
* terminate processworker only when running out of memory.
* handle SocketException and IOException explicitly in external system calls.
* update framework version
* update process worker test

---------

Co-authored-by: Norbert Truchsess <norbert.truchsess@t-online.de>
Phil91 pushed a commit that referenced this pull request Nov 22, 2024
* terminate processworker only when running out of memory.
* handle SocketException and IOException explicitly in external system calls.
* update framework version
* update process worker test

---------

Co-authored-by: Norbert Truchsess <norbert.truchsess@t-online.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: USER READY
Development

Successfully merging this pull request may close these issues.

3 participants