Adds a check to the SSH command used to kill child PIDs of a defunct driver instance on a different head node. The check prevents invalid kill commands and the CI failures they cause.
Resolves #2798
CoryMartin-NOAA pushed a commit to CoryMartin-NOAA/global-workflow that referenced this issue on Aug 7, 2024
…A-EMC#2799)
Resolves NOAA-EMC#2798
What is wrong?
Occasionally, a PR fails and the Bash CI attempts to kill the driver via SSH on a different head node. In some circumstances, however, the driver has already crashed. In those cases, the script (ci/scripts/driver.sh) still tries to find and kill all child processes but cannot find the parent process, resulting in a failure.
Also, if a PR directory is manually deleted, the Bash CI fails to write to its log file, causing a failure.
What should have happened?
The driver.sh script should first check if the PID is still running before attempting to kill it. Also, the same script should check for the existence of the PR directory and create it if it does not exist.
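The two checks described above could look roughly like the following. This is a minimal sketch, not the actual contents of ci/scripts/driver.sh: the function names (`pid_is_running`, `kill_driver_tree`, `ensure_pr_dir`) and their arguments are hypothetical, and it assumes a Linux host where `pkill` is available.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from driver.sh.

# Return 0 if a process with the given PID exists and can be signalled.
# Signal 0 performs the existence/permission check without sending anything.
pid_is_running() {
  local pid="$1"
  [[ "${pid}" =~ ^[0-9]+$ ]] || return 1
  kill -0 "${pid}" 2>/dev/null
}

# Kill the children of the driver and then the driver itself,
# but only when the driver PID is still alive.
kill_driver_tree() {
  local driver_pid="$1"
  if ! pid_is_running "${driver_pid}"; then
    echo "Driver PID ${driver_pid} is not running; nothing to kill"
    return 0
  fi
  # pkill exits non-zero when no child matches, which is not an error here.
  pkill -TERM -P "${driver_pid}" || true
  kill -TERM "${driver_pid}" 2>/dev/null || true
}

# Create the PR directory (and any missing parents) before logging into it.
ensure_pr_dir() {
  local pr_dir="$1"
  mkdir -p "${pr_dir}"
}
```

Because `mkdir -p` succeeds when the directory already exists, `ensure_pr_dir` can be called unconditionally before every log write.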
What machines are impacted?
WCOSS2
Steps to reproduce
Rather difficult to reproduce.
Additional information
Found while testing #2791
Do you have a proposed solution?
Add a check to the SSH command that verifies the driver.sh PID is still running before attempting to kill it.
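One way the proposed check might be folded into the SSH command is sketched below. It is an illustration under assumptions, not the change made in the linked PR: `build_remote_kill_cmd` and `kill_remote_driver` are hypothetical names, and the host and PID are assumed to be available from wherever the CI records them.

```bash
#!/usr/bin/env bash
# Sketch only: function names and arguments are hypothetical.

# Build the command string to run on the remote head node. The kill
# commands run only when `ps -p` confirms the driver PID still exists.
build_remote_kill_cmd() {
  local driver_pid="$1"
  # Refuse anything that is not a plain number before embedding it.
  [[ "${driver_pid}" =~ ^[0-9]+$ ]] || return 1
  printf 'if ps -p %s >/dev/null 2>&1; then pkill -TERM -P %s; kill -TERM %s; else echo "driver PID %s not running"; fi' \
    "${driver_pid}" "${driver_pid}" "${driver_pid}" "${driver_pid}"
}

# Run the guarded kill on the head node that hosts the driver.
kill_remote_driver() {
  local driver_host="$1" driver_pid="$2" remote_cmd
  remote_cmd=$(build_remote_kill_cmd "${driver_pid}") || return 1
  # A dead driver is not treated as a CI failure.
  ssh "${driver_host}" "${remote_cmd}" || true
}
```

Keeping the guard inside the remote command means the existence check and the kill happen on the same node, so no second SSH round trip is needed.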