
Telegraf establishing multiple connections after connectivity lost with server #2778

Closed
diego-maravankin opened this issue May 9, 2017 · 5 comments


@diego-maravankin

I am currently experiencing an issue with Telegraf: it opens multiple connections to the InfluxDB server, severely affecting the engine's performance (one Telegraf instance reached 700+ simultaneous connections).

Relevant telegraf.conf:

No relevant telegraf.conf, most of it is default. The full config and additional config files are attached.

System info:

Telegraf version

$ telegraf --version
Telegraf v1.2.0 (git: release-1.2 b2c1d98cff5a58d5ded3e74741c6bc32d6e789ee)

Operating System

$ cat /etc/debian_version 
8.6
$ uname -a
Linux sonargtd006 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux

Engine version

InfluxDB shell version: 1.2.0

Steps to reproduce:

  1. Have a working instance of telegraf, with one established connection to the server.
  2. Remove access to the server; in my case that was achieved by cutting the client's internet connection (the network interface should keep its link).
  3. Monitor telegraf's network connections (I used watch "lsof -p $(pgrep telegraf)").
  4. Restore access to the server; eventually Telegraf sends multiple SYN requests to the server, and all of them become established connections.
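The monitoring in the steps above boils down to counting established sockets in `lsof -p` output. A minimal sketch of that counting step; the `count_established` helper name and the sample data are illustrative, not from the issue:

```shell
#!/bin/sh
# count_established: count "(ESTABLISHED)" TCP lines from `lsof -p <pid>` output on stdin.
count_established() {
    grep -c '(ESTABLISHED)'
}

# Sample lines in the shape of the lsof output attached below (two connections):
sample='telegraf 12379 telegraf 6u IPv4 1270461 0t0 TCP host:43812->10.0.0.1:7074 (ESTABLISHED)
telegraf 12379 telegraf 7u IPv4 1271138 0t0 TCP host:43815->10.0.0.1:7074 (ESTABLISHED)'

printf '%s\n' "$sample" | count_established
```

On a live host one would run something like `watch 'lsof -p $(pgrep telegraf) | grep -c ESTABLISHED'` instead of feeding canned text; a healthy instance should report a count of 1.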

Expected behavior:

Only one active connection to the server

Actual behavior:

Multiple connections to the server

Additional info:

Open files and connections by telegraf process

$ lsof -p $(pgrep telegraf)
COMMAND    PID     USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
telegraf 12379 telegraf  cwd    DIR                8,1     4096       2 /
telegraf 12379 telegraf  rtd    DIR                8,1     4096       2 /
telegraf 12379 telegraf  txt    REG                8,1 32548479 1192538 /usr/bin/telegraf
telegraf 12379 telegraf  mem    REG                8,1  1738176  262630 /lib/x86_64-linux-gnu/libc-2.19.so
telegraf 12379 telegraf  mem    REG                8,1   137384  262703 /lib/x86_64-linux-gnu/libpthread-2.19.so
telegraf 12379 telegraf  mem    REG                8,1   140928  262558 /lib/x86_64-linux-gnu/ld-2.19.so
telegraf 12379 telegraf    0r   CHR                1,3      0t0       7 /dev/null
telegraf 12379 telegraf    1u  unix 0xffff88003611b080      0t0 1242524 socket
telegraf 12379 telegraf    2u  unix 0xffff88003611b080      0t0 1242524 socket
telegraf 12379 telegraf    3r   CHR                1,9      0t0      12 /dev/urandom
telegraf 12379 telegraf    4r   REG               0,15     2304   11892 /run/utmp
telegraf 12379 telegraf    5u  0000                0,9        0    6603 anon_inode
telegraf 12379 telegraf    6u  IPv4            1270461      0t0     TCP hostname:43812->XXX.XXX.XXX.XXX:7074 (ESTABLISHED)
telegraf 12379 telegraf    7u  IPv4            1271138      0t0     TCP hostname:43815->XXX.XXX.XXX.XXX:7074 (ESTABLISHED)

additional-config.conf.txt
telegraf.conf.txt
strace_telegraf.txt.7z.zip

strace_telegraf is the output of strace captured during the issue. It is compressed with 7z because of the better compression ratio; I had to rename it to .zip in order to upload it. The issue starts at line 388324.

@diego-maravankin
Author

I updated the client to Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff). I am testing whether I can reproduce the issue on this version; otherwise I will close the issue.

@danielnelson
Contributor

We are going to release 1.3 soon, could you test with the latest release candidate? #2733 (comment)

@diego-maravankin
Author

@danielnelson after testing 1.2.1 I will test 1.3 rc3, but since this is a live system I'd rather wait for the stable release than put the RC into production.

@danielnelson
Contributor

Sounds good, I expect the final release sometime next week.

@danielnelson
Contributor

Please reopen if you are still having the issue after testing with 1.3.
