Change launch backend script to handle errors gracefully (infiniflow#3334)

### What problem does this PR solve?

The `launch_backend_service.sh` script runs both the task executors and the backend server inside infinite loops. When any of these processes fails, the script restarts it indefinitely and never handles termination signals. As a result, the script even ignores interrupts, keeps printing error messages, and is difficult to exit gracefully.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Explanation of Modifications

1. **Signal Trapping with `trap`:**
   - The `trap cleanup SIGINT SIGTERM` line ensures that the `cleanup` function is invoked whenever a `SIGINT` or `SIGTERM` signal is received.
   - The `cleanup` function sets the `STOP` flag to `true`, iterates through all child process IDs stored in the `PIDS` array, and uses `kill` to send each process a termination signal so it shuts down gracefully.
2. **Retry Limits:**
   - Introduced a `MAX_RETRIES` variable to limit the number of restart attempts for both `task_executor.py` and `ragflow_server.py`.
   - The loops now check whether the retry count has reached the maximum. If it has, they invoke the `cleanup` function to terminate all processes and exit the script.
3. **Process Tracking with `PIDS` Array:**
   - After each background process (`task_exe` and `run_server`) is launched, its process ID (PID) is stored in the `PIDS` array.
   - This allows the `cleanup` function to terminate all child processes when needed.
4. **Graceful Shutdown:**
   - When the `cleanup` function is called, it iterates over all child PIDs and sends a termination signal (`kill`) to each, ensuring that all subprocesses are stopped before the script exits.
5. **Logging Enhancements:**
   - Added `echo` statements that log the state of each process more clearly, including attempts, successes, failures, and retries.
6. **Exit on Successful Completion:**
   - If `ragflow_server.py` or a `task_executor.py` process exits with a success code (0), the loop breaks, preventing unnecessary retries.

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
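The signal-handling pieces (items 1, 3 and 4) can be sketched in a few lines of bash. This is a minimal illustration, not the code from `launch_backend_service.sh`: `STOP`, `PIDS` and `cleanup` follow the names in the description, while `start_background` is a hypothetical helper, and `cleanup` is assumed to exit with status 0.

```bash
#!/usr/bin/env bash
# Sketch of the trap / PIDS / cleanup pattern (items 1, 3 and 4).
# STOP, PIDS and cleanup mirror the PR description; start_background is a
# hypothetical helper, not a function from launch_backend_service.sh.

STOP=false
PIDS=()

# Stop every tracked child, then exit the launcher.
cleanup() {
    echo "Termination signal received. Shutting down..."
    STOP=true
    local pid
    for pid in "${PIDS[@]}"; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        if kill -0 "$pid" 2>/dev/null; then
            echo "Killing process $pid"
            kill "$pid"   # kill's default signal is SIGTERM
        fi
    done
    exit 0   # assumption: a requested shutdown exits with status 0
}

# Ctrl-C (SIGINT) or `kill <launcher-pid>` (SIGTERM) now runs cleanup
# instead of being swallowed by a restart loop.
trap cleanup SIGINT SIGTERM

# Launch "$@" in the background and record its PID for cleanup.
start_background() {
    "$@" &
    PIDS+=("$!")
    echo "Started '$*' with PID $!"
}
```

A launcher built this way would wrap each launch (for example `start_background run_server`) and then block on `wait`; bash interrupts `wait` when a trapped signal arrives and runs `cleanup` immediately afterwards. One caveat worth checking: when the backgrounded command is a shell function, `$!` is the PID of the subshell running it, so `kill` stops that subshell but may leave a Python process it is waiting on still running. Signaling the whole process group is one alternative.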
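The retry behavior from items 2, 5 and 6 can be sketched as a bounded loop. Again, this is an illustration rather than the script's actual code: `run_with_retries` and `RETRY_DELAY` (a pause between restarts) are hypothetical, and the default of 3 for `MAX_RETRIES` is an arbitrary assumption. Unlike the described script, the sketch returns a non-zero status when the retry budget is exhausted instead of calling `cleanup` itself, which keeps it self-contained.

```bash
#!/usr/bin/env bash
# Sketch of a bounded restart loop (items 2, 5 and 6).
# MAX_RETRIES and STOP mirror the PR description; run_with_retries and
# RETRY_DELAY are hypothetical names chosen for this illustration.

MAX_RETRIES=${MAX_RETRIES:-3}   # assumption: default of 3 attempts
RETRY_DELAY=${RETRY_DELAY:-1}   # assumption: seconds to pause between attempts
STOP=false

# Run "$@" until it exits 0, STOP becomes true, or MAX_RETRIES attempts fail.
# Returns 0 on success and 1 once the loop ends without a successful run.
run_with_retries() {
    local retry_count=0
    local exit_code
    while [ "$STOP" = false ] && [ "$retry_count" -lt "$MAX_RETRIES" ]; do
        echo "Starting '$*' (attempt $((retry_count + 1))/$MAX_RETRIES)"
        "$@"
        exit_code=$?
        if [ "$exit_code" -eq 0 ]; then
            # Item 6: a clean exit ends the loop; no further restarts.
            echo "'$*' exited successfully."
            return 0
        fi
        retry_count=$((retry_count + 1))
        echo "'$*' failed with exit code $exit_code." >&2
        # Pause before the next attempt, but not after the last one.
        if [ "$retry_count" -lt "$MAX_RETRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    echo "'$*' did not succeed after $retry_count attempt(s); giving up." >&2
    return 1
}
```

To match the behavior described in item 2, a caller could write `run_with_retries some_command || cleanup`, where `some_command` is a placeholder for the executor or server invocation.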