
Change launch backend script to handle errors gracefully #3334

Merged (2 commits) on Nov 12, 2024

Conversation

sananbintahir
Contributor

What problem does this PR solve?

The launch_backend_service.sh script runs both the task executors and the backend server in infinite loops. When any of these processes fails, the script restarts it again and again, and it does not handle termination signals. As a result, the script ignores even interrupts, keeps printing the same error messages, and is difficult to stop gracefully.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Explanation of Modifications

  1. Signal Trapping with trap:
    • The trap cleanup SIGINT SIGTERM line ensures that when a SIGINT or SIGTERM signal is received, the cleanup function is invoked.
    • The cleanup function sets the STOP flag to true, iterates through all child process IDs stored in the PIDS array, and calls kill on each one to terminate it gracefully.
  2. Retry Limits:
    • Introduced a MAX_RETRIES variable to limit the number of restart attempts for both task_executor.py and ragflow_server.py.
    • The loops now check if the retry count has reached the maximum limit. If so, they invoke the cleanup function to terminate all processes and exit the script.
  3. Process Tracking with PIDS Array:
    • After each background process (task_exe and run_server) is launched, its process ID (PID) is stored in the PIDS array.
    • This allows the cleanup function to terminate all child processes effectively when needed.
  4. Graceful Shutdown:
    • When the cleanup function is called, it iterates over all child PIDs and sends a termination signal (kill) to each, ensuring that all subprocesses are stopped before the script exits.
  5. Logging Enhancements:
    • Added echo statements to provide clearer logs about the state of each process, including attempts, successes, failures, and retries.
  6. Exit on Successful Completion:
    • If ragflow_server.py or a task_executor.py process exits with a success code (0), the loop breaks, preventing unnecessary retries.
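
The modifications listed above can be sketched roughly as follows. This is a minimal illustration, not the merged script: MAX_RETRIES, STOP, PIDS and cleanup are named in the description, while run_with_retries, launch and RETRY_DELAY are hypothetical names introduced here, and the supervised commands in the trailing comments are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only. MAX_RETRIES, STOP, PIDS and cleanup come from the PR description;
# run_with_retries, launch and RETRY_DELAY are hypothetical helpers.

MAX_RETRIES=${MAX_RETRIES:-3}   # upper bound on start attempts per process
RETRY_DELAY=${RETRY_DELAY:-2}   # seconds to pause between attempts (assumed)
STOP=false                      # flag checked by the retry loop
PIDS=()                         # PIDs of background jobs started by launch

cleanup() {
    echo "Termination requested, stopping child processes..."
    STOP=true
    for pid in "${PIDS[@]}"; do
        # kill sends SIGTERM by default; errors for already-gone PIDs are hidden
        kill "$pid" 2>/dev/null
    done
    exit 0
}

# Run cleanup when the script receives SIGINT (Ctrl-C) or SIGTERM.
trap cleanup SIGINT SIGTERM

# run_with_retries NAME COMMAND [ARGS...]
# Runs COMMAND until it exits with 0 or MAX_RETRIES attempts have been made.
run_with_retries() {
    local name=$1
    shift
    local attempt=0
    local code=0
    while [ "$STOP" = false ] && [ "$attempt" -lt "$MAX_RETRIES" ]; do
        attempt=$((attempt + 1))
        echo "Starting $name (attempt $attempt of $MAX_RETRIES)"
        "$@"
        code=$?
        if [ "$code" -eq 0 ]; then
            echo "$name exited successfully"
            return 0
        fi
        echo "$name failed with exit code $code"
        sleep "$RETRY_DELAY"
    done
    # The description says the real loops call cleanup at this point;
    # this sketch only reports the failure and returns non-zero.
    echo "$name did not succeed within $MAX_RETRIES attempts, giving up"
    return 1
}

# launch NAME COMMAND [ARGS...]
# Starts a retry loop in the background and records its PID for cleanup.
launch() {
    run_with_retries "$@" &
    PIDS+=("$!")
}

# Hypothetical usage (commands and arguments are placeholders):
#   launch task_executor python task_executor.py
#   launch ragflow_server python ragflow_server.py
#   wait
```

Each PID recorded by launch belongs to the background subshell running a retry loop, so in this sketch kill reaches that loop rather than the command it happens to be running at that moment. A fuller version might forward the signal to the running command as well.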

@KevinHuSh KevinHuSh requested a review from yuzhichang November 11, 2024 11:33
@KevinHuSh KevinHuSh merged commit 62a9afd into infiniflow:main Nov 12, 2024
2 checks passed
jhaiq pushed a commit to jhaiq/ragflow that referenced this pull request Nov 30, 2024
…3334)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>