Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telegraf seems to send SIGKILL on script timeout instead of first SIGINT then SIGKILL #2526

Closed
jayjayeos opened this issue Mar 10, 2017 · 4 comments · Fixed by #6302
Closed
Assignees
Labels
bug unexpected problem or unintended behavior
Milestone

Comments

@jayjayeos
Copy link

Bug report

On timeout in a script executed by the inputs.exec plugin (and possibly others) Telegraf does not first send SIGTERM (SIGHUP, SIGINT, SIGQUIT) and then SIGKILL but directly sends SIGKILL.

While this kills the script, any cleanup that the script needs to do does not get a chance to run.

This leads to orphaned processes when a script has started child processes.

Relevant telegraf.conf:

[[inputs.exec]]
commands = [
"/etc/telegraf/sqlscripts/oracle_metrics.sh"
]
interval = "1m"
timeout = "5s"
data_format = "influx"

System info:

Oracle Linux Server release 6.3
Linux xxx.tld 2.6.39-400.17.1.el6uek.x86_64 #1 SMP Fri Feb 22 18:16:18 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
Telegraf v1.2.0 (git: release-1.2 b2c1d98)

Steps to reproduce:

  1. configure Telegraf to run a script with inputs.exec Plugin
  2. have script call other process or script that reaches timout or hangs
  3. watch script die and init inherit the started process from the script

Expected behavior:

script gets sent SIGTERM (SIGHUP, SIGINT, SIGQUIT), cleans up after itself then terminates itself.

Actual behavior:

script is killed hard without chance to run signal handler.

Additional info:

Have a look at how SIGKILL works and on best practices on terminating processes.
TL;DR: don't indiscriminately use SIGKILL

http://stackoverflow.com/questions/395877/are-child-processes-created-with-fork-automatically-killed-when-the-parent-is
http://stackoverflow.com/questions/690415/in-what-order-should-i-send-signals-to-gracefully-shutdown-processes
ftp://ftp.gnu.org/old-gnu/Manuals/glibc-2.2.3/html_chapter/libc_24.html#SEC472

Proposal:

Send a non-fatal signal first, then after a grace period (of, say, 5 seconds) send SIGKILL.

Current behavior:

script dies, init inherits childrend processes.

Desired behavior:

script cleans up after itself and commits suicide.

Use case: [Why is this important (helps with prioritizing requests)]

it fu**s up servers running Telegraf by racking up hundreds of orphaned processes.

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Mar 10, 2017
@danielnelson danielnelson added this to the 1.3.0 milestone Mar 10, 2017
@danielnelson
Copy link
Contributor

@jayjayeos Thanks for opening an issue, are you able to work on a PR?

@danielnelson
Copy link
Contributor

We also should investigate if we should signal the process group or only the parent.

@danielnelson danielnelson modified the milestones: 1.4.0, 1.5.0 Aug 14, 2017
@danielnelson danielnelson modified the milestones: 1.5.0, 1.6.0 Nov 29, 2017
@danielnelson danielnelson modified the milestones: 1.6.0, 1.7.0 Jan 27, 2018
@danielnelson danielnelson modified the milestones: 1.7.0, 1.8.0 Jun 3, 2018
@russorat russorat modified the milestones: 1.8.0, 1.9.0 Sep 4, 2018
@danielnelson danielnelson modified the milestones: 1.9.0, 1.10 Oct 29, 2018
@jorioux
Copy link

jorioux commented Nov 29, 2018

I have this issue also:

[inputs.exec]: Error in plugin: exec: signal: killed for command '/home/telegraf/some_script.ps1'

@danielnelson
Copy link
Contributor

On Unix systems we should send SIGTERM first, but is there a signal on Windows to request shutdown? Based on golang/go#6720 it doesn't seem possible.

@russorat russorat modified the milestones: 1.10.0, 1.11.0 Jan 14, 2019
@danielnelson danielnelson self-assigned this May 2, 2019
@danielnelson danielnelson removed this from the 1.11.0 milestone May 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants