Change launch backend script to handle errors gracefully #3334
What problem does this PR solve?
The `launch_backend_service.sh` script runs both the task executors and the backend server in infinite loops. When any of these processes fails, the script restarts it indefinitely and does not handle termination signals properly, so it ignores even interrupts. The result is a stream of repeated error messages and a script that is difficult to exit gracefully.
Type of change
Explanation of Modifications
- `trap`: The `trap cleanup SIGINT SIGTERM` line ensures that the `cleanup` function is invoked when a `SIGINT` or `SIGTERM` signal is received.
- The `cleanup` function sets the `STOP` flag to `true`, iterates through all child process IDs stored in the `PIDS` array, and uses `kill` to send each process a signal so that it terminates gracefully.
- A `MAX_RETRIES` variable limits the number of restart attempts for both `task_executor.py` and `ragflow_server.py`.
- When that limit is reached, the script calls the `cleanup` function to terminate all processes and exit the script.
- `PIDS` array: When the background processes are launched (`task_exe` and `run_server`), their process IDs (PIDs) are stored in the `PIDS` array.
- This allows the `cleanup` function to terminate all child processes effectively when needed.
- When the `cleanup` function is called, it iterates over all child PIDs and sends a termination signal (`kill`) to each, ensuring that all subprocesses are stopped before the script exits.
- Additional `echo` statements provide clearer logs about the state of each process, including attempts, successes, failures, and retries.
- If `ragflow_server.py` or a `task_executor.py` process exits with a success code (0), the loop breaks, preventing unnecessary retries.
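
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `STOP`, `PIDS`, `MAX_RETRIES`, and `cleanup` follow the names in the description, while `run_with_retries`, `supervise`, `RETRY_DELAY`, and the retry limit of 3 are assumptions made for the example. The stand-in commands `true` and `false` replace the real Python services so the sketch can run anywhere.

```shell
#!/usr/bin/env bash
# Sketch of the shutdown and retry pattern; not the PR's actual code.

STOP=false      # becomes true once shutdown has been requested
PIDS=()         # PIDs of background children started by this script
MAX_RETRIES=3   # assumed value for illustration

# Signal every tracked child that is still alive, then exit.
cleanup() {
  echo "Shutting down..."
  STOP=true
  for pid in "${PIDS[@]}"; do
    if kill -0 "$pid" 2>/dev/null; then
      echo "Stopping process $pid"
      kill "$pid"   # kill sends SIGTERM by default
    fi
  done
  exit 0
}

# Run cleanup when the script receives Ctrl-C (SIGINT) or SIGTERM.
trap cleanup SIGINT SIGTERM

# Hypothetical helper: run a command until it succeeds, shutdown is
# requested, or MAX_RETRIES attempts have been made.
# Returns 0 on success and 1 otherwise.
run_with_retries() {
  local name=$1
  shift
  local attempt=0 code=0
  while [ "$STOP" = false ] && [ "$attempt" -lt "$MAX_RETRIES" ]; do
    attempt=$((attempt + 1))
    echo "Starting $name (attempt $attempt of $MAX_RETRIES)"
    "$@"
    code=$?
    if [ "$code" -eq 0 ]; then
      echo "$name exited successfully"
      return 0   # success: stop retrying
    fi
    echo "$name failed with exit code $code" >&2
    sleep "${RETRY_DELAY:-1}"
  done
  echo "$name did not succeed after $attempt attempt(s)" >&2
  return 1
}

# Hypothetical wrapper: give up and shut down once retries are exhausted.
# When run in a background subshell, cleanup sees the copy of PIDS taken
# at fork time, and its exit ends only that subshell.
supervise() {
  run_with_retries "$@" || cleanup
}

# Demo with harmless stand-in commands.
RETRY_DELAY=0
supervise "ok-demo" true &
PIDS+=("$!")
supervise "failing-demo" false &
PIDS+=("$!")

# The wait builtin is interrupted by trapped signals, so cleanup can run
# promptly while the parent is waiting on its children.
wait
```

Recording each PID immediately after the `&` launch matters here: `cleanup` can only signal the processes it knows about, so any child started without being added to `PIDS` would be left running after the script exits.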