This repository has been archived by the owner on Nov 29, 2017. It is now read-only.

Sometimes the "beam" processes can't be stopped completely when stopping the rabbit_node_ng job #170

Open
gu-bin opened this issue Mar 30, 2016 · 1 comment
gu-bin commented Mar 30, 2016

When running "bosh deploy" to upgrade the rabbit_node_ng job, bosh first stops all the jobs on the VM and then unmounts the persistent disk. Sometimes, however, stopping the jobs does not stop the "beam" processes (created by the rabbit_node job), so the persistent disk is still in use by "beam" and cannot be unmounted, which causes the bosh deploy to fail. To recover, we have to log in to the rabbit_node_ng VM, kill all the beam processes, and re-run bosh deploy. Because of this problem, we can't run bosh deploy fully automatically without manual intervention.

We investigated and found that the problem is caused by some warden processes (the parent processes of the beam processes) that are not stopped by stopping the rabbit_node job (/var/vcap/jobs/rabbit_node_ng/bin/rabbit_node_ctl stop). Normally, when bosh stops the rabbit_node job, the warden processes (such as wshd: 19gdipma38k) are killed, and the beam processes are killed along with them. But sometimes the warden processes fail to be killed, so the beam processes stay alive and keep the persistent disk in use.

One possible fix is to extend the "/var/vcap/jobs/rabbit_node_ng/bin/rabbit_node_ctl stop" command so that, after "kill_and_wait $PIDFILE 60", it checks whether any warden processes are still alive and, if so, kills them.
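A rough sketch of what that cleanup step could look like. This is a hypothetical helper, not the actual ctl script: the function name, the `wshd:` and `beam` patterns, and the SIGTERM-then-SIGKILL timing are assumptions based on the process names mentioned above.

```shell
#!/bin/bash
# Hypothetical cleanup helper for rabbit_node_ctl's stop case.
# Kills any processes whose command line matches $1 and that
# survived the normal kill_and_wait shutdown.

kill_leftover_processes() {
  local pattern="$1"
  local pids

  # Find survivors by command-line pattern (pgrep exits non-zero
  # when nothing matches, so swallow that with || true).
  pids=$(pgrep -f "$pattern" || true)
  [ -z "$pids" ] && return 0

  echo "Leftover processes matching '$pattern': $pids; sending SIGTERM"
  kill $pids 2>/dev/null || true
  sleep 5

  # Escalate to SIGKILL for anything still alive.
  pids=$(pgrep -f "$pattern" || true)
  if [ -n "$pids" ]; then
    echo "Escalating to SIGKILL for: $pids"
    kill -9 $pids 2>/dev/null || true
  fi
}

# In the ctl script's stop case, after `kill_and_wait $PIDFILE 60`,
# something like:
#   kill_leftover_processes 'wshd:'
#   kill_leftover_processes 'beam'
```

With the leftover warden (and beam) processes gone, nothing should hold the persistent disk open, and the unmount during `bosh deploy` should succeed without manual intervention.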

gu-bin commented Apr 22, 2016

Is there any update?
/cc @maximilien

@maximilien added the bug label on Apr 23, 2016