node_exporter can't work on Centos 7 #697

Closed
xiaojiaqi opened this issue Oct 13, 2017 · 47 comments · Fixed by #728

@xiaojiaqi

Host operating system: output of uname -a

[aaa@localhost ~]$ uname -a

Linux localhost.localdomain 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.15.0 (branch: HEAD, revision: 6e2053c)
build user: root@168089f37ad9
build date: 20171006-11:33:58
go version: go1.9.1

node_exporter command line flags

sudo ./node_exporter --log.level="debug" --no-collector.zfs

Are you running node_exporter in Docker?

No. (It didn't work in Docker either.)

What did you do that produced an error?

[aaa@localhost ~]$ sudo ./node_exporter --log.level="debug" --no-collector.zfs
INFO[0000] Starting node_exporter (version=0.15.0, branch=HEAD, revision=6e2053c557f96efb63aef3691f15335a70baaffd) source="node_exporter.go:43"
INFO[0000] Build context (go=go1.9.1, user=root@168089f37ad9, date=20171006-11:33:58) source="node_exporter.go:44"
INFO[0000] No directory specified, see --collector.textfile.directory source="textfile.go:57"
INFO[0000] Enabled collectors: source="node_exporter.go:50"
INFO[0000] - filesystem source="node_exporter.go:52"
INFO[0000] - vmstat source="node_exporter.go:52"
INFO[0000] - edac source="node_exporter.go:52"
INFO[0000] - hwmon source="node_exporter.go:52"
INFO[0000] - infiniband source="node_exporter.go:52"
INFO[0000] - meminfo source="node_exporter.go:52"
INFO[0000] - textfile source="node_exporter.go:52"
INFO[0000] - cpu source="node_exporter.go:52"
INFO[0000] - entropy source="node_exporter.go:52"
INFO[0000] - arp source="node_exporter.go:52"
INFO[0000] - sockstat source="node_exporter.go:52"
INFO[0000] - loadavg source="node_exporter.go:52"
INFO[0000] - netdev source="node_exporter.go:52"
INFO[0000] - wifi source="node_exporter.go:52"
INFO[0000] - timex source="node_exporter.go:52"
INFO[0000] - xfs source="node_exporter.go:52"
INFO[0000] - netstat source="node_exporter.go:52"
INFO[0000] - diskstats source="node_exporter.go:52"
INFO[0000] - mdadm source="node_exporter.go:52"
INFO[0000] - time source="node_exporter.go:52"
INFO[0000] - conntrack source="node_exporter.go:52"
INFO[0000] - filefd source="node_exporter.go:52"
INFO[0000] - ipvs source="node_exporter.go:52"
INFO[0000] - stat source="node_exporter.go:52"
INFO[0000] - uname source="node_exporter.go:52"
INFO[0000] - bcache source="node_exporter.go:52"
INFO[0000] Listening on :9100 source="node_exporter.go:76"
DEBU[0005] OK: bcache collector succeeded after 0.000072s. source="collector.go:126"
DEBU[0005] CPU "/sys/bus/cpu/devices/cpu0" is missing cpufreq source="cpu_linux.go:114"
DEBU[0005] CPU "/sys/bus/cpu/devices/cpu0" is missing thermal_throttle source="cpu_linux.go:135"
DEBU[0005] Package "/sys/bus/node/devices/node0" CPU "0" is missing package_throttle source="cpu_linux.go:166"
DEBU[0005] OK: cpu collector succeeded after 0.000471s. source="collector.go:126"
DEBU[0005] Ignoring mount point: /sys source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /proc source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /dev source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/kernel/security source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /dev/shm source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /dev/pts source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/systemd source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/pstore source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/devices source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/cpu,cpuacct source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/memory source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/perf_event source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/freezer source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/net_cls source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/cpuset source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/hugetlb source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/fs/cgroup/blkio source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/kernel/config source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /proc/sys/fs/binfmt_misc source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /sys/kernel/debug source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /dev/mqueue source="filesystem_linux.go:42"
DEBU[0005] Ignoring mount point: /dev/hugepages source="filesystem_linux.go:42"
DEBU[0005] OK: filesystem collector succeeded after 0.000990s. source="collector.go:126"
DEBU[0005] OK: edac collector succeeded after 0.000022s. source="collector.go:126"
DEBU[0005] Unable to detect InfiniBand devices source="infiniband_linux.go:110"
DEBU[0005] OK: infiniband collector succeeded after 0.000081s. source="collector.go:126"
DEBU[0005] Set node_mem: map[string]float64{"WritebackTmp":0, "HugePages_Rsvd":0, "DirectMap4k":6.2849024e+07, "DirectMap2M":2.084569088e+09, "Inactive_file":1.22441728e+08, "SwapTotal":2.147479552e+09, "KernelStack":2.12992e+06, "PageTables":4.52608e+06, "VmallocTotal":3.5184372087808e+13, "Buffers":970752, "Unevictable":0, "HugePages_Total":0, "Dirty":8192, "CommitLimit":3.112349696e+09, "Active_file":3.760128e+07, "AnonPages":5.5844864e+07, "SUnreclaim":1.570816e+07, "Committed_AS":4.00420864e+08, "MemAvailable":1.667551232e+09, "Cached":1.6797696e+08, "Mlocked":0, "HugePages_Surp":0, "MemTotal":1.929740288e+09, "MemFree":1.611333632e+09, "SwapFree":2.147479552e+09, "Shmem":8.904704e+06, "SReclaimable":2.299904e+07, "Bounce":0, "AnonHugePages":8.388608e+06, "Active":9.3741056e+07, "Inactive":1.3099008e+08, "HugePages_Free":0, "Hugepagesize":2.097152e+06, "Writeback":0, "VmallocChunk":3.5184201691136e+13, "Inactive_anon":8.548352e+06, "Mapped":4.3884544e+07, "Slab":3.87072e+07, "NFS_Unstable":0, "VmallocUsed":1.61316864e+08, "HardwareCorrupted":0, "SwapCached":0, "Active_anon":5.6139776e+07} source="meminfo.go:48"
DEBU[0005] OK: meminfo collector succeeded after 0.000517s. source="collector.go:126"
DEBU[0005] OK: textfile collector succeeded after 0.000000s. source="collector.go:126"
DEBU[0005] OK: entropy collector succeeded after 0.000061s. source="collector.go:126"
DEBU[0005] OK: arp collector succeeded after 0.000098s. source="collector.go:126"
DEBU[0005] OK: sockstat collector succeeded after 0.000149s. source="collector.go:126"
DEBU[0005] return load 0: 0.000000 source="loadavg.go:51"
DEBU[0005] return load 1: 0.010000 source="loadavg.go:51"
DEBU[0005] return load 2: 0.050000 source="loadavg.go:51"
DEBU[0005] OK: loadavg collector succeeded after 0.000116s. source="collector.go:126"
DEBU[0005] OK: netdev collector succeeded after 0.000479s. source="collector.go:126"
DEBU[0005] OK: wifi collector succeeded after 0.000247s. source="collector.go:126"
DEBU[0005] OK: timex collector succeeded after 0.000024s. source="collector.go:126"
DEBU[0005] OK: xfs collector succeeded after 0.000157s. source="collector.go:126"
DEBU[0005] OK: netstat collector succeeded after 0.001702s. source="collector.go:126"
DEBU[0005] Ignoring device: fd0 source="diskstats_linux.go:175"
DEBU[0005] Ignoring device: sda1 source="diskstats_linux.go:175"
DEBU[0005] Ignoring device: sda2 source="diskstats_linux.go:175"
DEBU[0005] OK: diskstats collector succeeded after 0.000333s. source="collector.go:126"
DEBU[0005] OK: mdadm collector succeeded after 0.000084s. source="collector.go:126"
DEBU[0005] Return time: 1507873748.996584 source="time.go:47"
DEBU[0005] OK: time collector succeeded after 0.000041s. source="collector.go:126"
DEBU[0005] OK: conntrack collector succeeded after 0.000086s. source="collector.go:126"
DEBU[0005] OK: filefd collector succeeded after 0.000064s. source="collector.go:126"
DEBU[0005] ipvs collector metrics are not available for this system source="ipvs_linux.go:113"
DEBU[0005] OK: ipvs collector succeeded after 0.000099s. source="collector.go:126"
DEBU[0005] OK: stat collector succeeded after 0.000185s. source="collector.go:126"
DEBU[0005] OK: uname collector succeeded after 0.000049s. source="collector.go:126"
DEBU[0005] OK: vmstat collector succeeded after 0.008549s. source="collector.go:126"

Testing with curl:
[aaa@localhost ~]$ curl http://10.29.101.101:9100/metrics --max-time 10 -kvv

* About to connect() to 10.29.101.101 port 9100 (#0)
*   Trying 10.29.101.101...
* Connected to 10.29.101.101 (10.29.101.101) port 9100 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.29.101.101:9100
> Accept: */*
>
* Operation timed out after 10001 milliseconds with 0 out of -1 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 out of -1 bytes received

What did you expect to see?

The metrics at http://10.29.101.101:9100/metrics. The same setup works well on Ubuntu 16.04.

What did you see instead?

No output at all on CentOS 7. It seems to hang at some step. Would you please take a look?

@SuperQ
Member

SuperQ commented Oct 13, 2017

This sounds like a firewall is dropping packets. Do you get the same issue with curl http://localhost:9100/metrics?
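A firewall that drops packets and a server that accepts the connection but never answers both end in the same curl error 28, so it helps to record whether the TCP handshake completed. A minimal sketch, assuming a POSIX shell with awk and curl: the helper name classify_scrape and its output labels are hypothetical. It relies only on curl's exit codes (7 = failed to connect, 28 = timeout) and the %{time_connect} write-out variable, which, as far as we know, stays at zero when no connection was made.

```shell
#!/bin/sh
# Hypothetical helper: classify one scrape attempt.
#   $1 = curl exit status
#   $2 = what curl printed for -w '%{time_connect}'
# curl exit codes used: 0 = success, 7 = failed to connect, 28 = timeout.
classify_scrape() {
  case "$1" in
    0) echo "ok" ;;
    7) echo "connect-failed" ;;  # e.g. nothing listening, or a REJECT rule
    28)
      # time_connect stays 0 when the TCP handshake never completed.
      if awk -v t="$2" 'BEGIN { exit !(t > 0) }'; then
        echo "connected-no-response"  # server accepted, then never replied
      else
        echo "connect-timeout"        # SYNs probably dropped, e.g. a DROP rule
      fi
      ;;
    *) echo "other:$1" ;;
  esac
}

# Usage against a live exporter:
#   t=$(curl -s -o /dev/null --connect-timeout 3 --max-time 10 \
#         -w '%{time_connect}' http://localhost:9100/metrics)
#   classify_scrape "$?" "$t"
```

In the verbose curl runs in this report, curl prints "Connected to ..." before timing out, which corresponds to the connected-no-response case and points away from a firewall.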

@xiaojiaqi
Author

Good idea, so I tested again. Here is what I did; please correct me if I'm wrong.

1. Check the firewall and confirm that port 9100 is open

[aaa@localhost ~]$ sudo systemctl disable firewalld
[aaa@localhost ~]$ sudo systemctl stop firewalld
[aaa@localhost ~]$ ss -atnl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 128 :::9100 :::*
LISTEN 0 128 :::22 :::*
LISTEN 0 100 ::1:25 :::*
[aaa@localhost ~]$ curl http://localhost:9100/metrics
^C
[aaa@localhost ~]$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16780032: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:b0:bc:94 brd ff:ff:ff:ff:ff:ff
inet 10.29.101.101/24 brd 10.29.101.255 scope global eno16780032
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:feb0:bc94/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:2b:4c:34:fb brd ff:ff:ff:ff:ff:ff
inet 10.2.40.1/24 scope global docker0
valid_lft forever preferred_lft forever
[aaa@localhost ~]$ curl http://localhost:9100/metrics
^C
[aaa@localhost ~]$ curl http://localhost:9100/metrics --max-time 10 -kvv

* About to connect() to localhost port 9100 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9100 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9100
> Accept: */*
>
* Operation timed out after 10001 milliseconds with 0 out of -1 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 out of -1 bytes received

2. Check the node_exporter log

[aaa@localhost ~]$ sudo ./node_exporter --log.level="debug"
INFO[0000] Starting node_exporter (version=0.15.0, branch=HEAD, revision=6e2053c557f96efb63aef3691f15335a70baaffd) source="node_exporter.go:43"
INFO[0000] Build context (go=go1.9.1, user=root@168089f37ad9, date=20171006-11:33:58) source="node_exporter.go:44"
INFO[0000] No directory specified, see --collector.textfile.directory source="textfile.go:57"
INFO[0000] Enabled collectors: source="node_exporter.go:50"
INFO[0000] - conntrack source="node_exporter.go:52"
INFO[0000] - vmstat source="node_exporter.go:52"
INFO[0000] - cpu source="node_exporter.go:52"
INFO[0000] - netstat source="node_exporter.go:52"
INFO[0000] - textfile source="node_exporter.go:52"
INFO[0000] - stat source="node_exporter.go:52"
INFO[0000] - zfs source="node_exporter.go:52"
INFO[0000] - meminfo source="node_exporter.go:52"
INFO[0000] - time source="node_exporter.go:52"
INFO[0000] - filefd source="node_exporter.go:52"
INFO[0000] - xfs source="node_exporter.go:52"
INFO[0000] - sockstat source="node_exporter.go:52"
INFO[0000] - loadavg source="node_exporter.go:52"
INFO[0000] - infiniband source="node_exporter.go:52"
INFO[0000] - ipvs source="node_exporter.go:52"
INFO[0000] - uname source="node_exporter.go:52"
INFO[0000] - filesystem source="node_exporter.go:52"
INFO[0000] - entropy source="node_exporter.go:52"
INFO[0000] - timex source="node_exporter.go:52"
INFO[0000] - mdadm source="node_exporter.go:52"
INFO[0000] - wifi source="node_exporter.go:52"
INFO[0000] - arp source="node_exporter.go:52"
INFO[0000] - bcache source="node_exporter.go:52"
INFO[0000] - netdev source="node_exporter.go:52"
INFO[0000] - diskstats source="node_exporter.go:52"
INFO[0000] - hwmon source="node_exporter.go:52"
INFO[0000] - edac source="node_exporter.go:52"
INFO[0000] Listening on :9100 source="node_exporter.go:76"
DEBU[0003] Ignoring mount point: /sys source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /proc source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /dev source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/kernel/security source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /dev/shm source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /dev/pts source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/systemd source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/pstore source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/devices source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/cpu,cpuacct source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/memory source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/perf_event source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/freezer source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/net_cls source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/cpuset source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/hugetlb source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/fs/cgroup/blkio source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/kernel/config source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /proc/sys/fs/binfmt_misc source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /sys/kernel/debug source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /dev/mqueue source="filesystem_linux.go:42"
DEBU[0003] Ignoring mount point: /dev/hugepages source="filesystem_linux.go:42"
DEBU[0003] OK: filesystem collector succeeded after 0.001161s. source="collector.go:126"
DEBU[0003] OK: bcache collector succeeded after 0.000054s. source="collector.go:126"
DEBU[0003] OK: timex collector succeeded after 0.000028s. source="collector.go:126"
DEBU[0003] OK: mdadm collector succeeded after 0.000057s. source="collector.go:126"
DEBU[0003] OK: wifi collector succeeded after 0.000459s. source="collector.go:126"
DEBU[0003] OK: arp collector succeeded after 0.000112s. source="collector.go:126"
DEBU[0003] Ignoring device: fd0 source="diskstats_linux.go:175"
DEBU[0003] Ignoring device: sda1 source="diskstats_linux.go:175"
DEBU[0003] Ignoring device: sda2 source="diskstats_linux.go:175"
DEBU[0003] OK: diskstats collector succeeded after 0.000323s. source="collector.go:126"
DEBU[0003] OK: netdev collector succeeded after 0.000433s. source="collector.go:126"
DEBU[0003] OK: edac collector succeeded after 0.000016s. source="collector.go:126"
DEBU[0003] CPU "/sys/bus/cpu/devices/cpu0" is missing cpufreq source="cpu_linux.go:114"
DEBU[0003] CPU "/sys/bus/cpu/devices/cpu0" is missing thermal_throttle source="cpu_linux.go:135"
DEBU[0003] Package "/sys/bus/node/devices/node0" CPU "0" is missing package_throttle source="cpu_linux.go:166"
DEBU[0003] OK: cpu collector succeeded after 0.000420s. source="collector.go:126"
DEBU[0003] OK: conntrack collector succeeded after 0.000083s. source="collector.go:126"
DEBU[0003] OK: vmstat collector succeeded after 0.000436s. source="collector.go:126"
DEBU[0003] OK: textfile collector succeeded after 0.000000s. source="collector.go:126"
DEBU[0003] OK: netstat collector succeeded after 0.001631s. source="collector.go:126"
DEBU[0003] OK: stat collector succeeded after 0.000162s. source="collector.go:126"
DEBU[0003] Cannot open "/proc/spl/kstat/zfs/dmu_tx" for reading. Is the kernel module loaded? source="zfs_linux.go:32"
DEBU[0003] ZFS / ZFS statistics are not available source="zfs.go:62"
DEBU[0003] OK: zfs collector succeeded after 0.000096s. source="collector.go:126"
DEBU[0003] Set node_mem: map[string]float64{"MemTotal":1.929740288e+09, "Inactive":1.29236992e+08, "Inactive_anon":8.527872e+06, "Active_file":3.9387136e+07, "Writeback":0, "Shmem":8.904704e+06, "Slab":3.9018496e+07, "DirectMap4k":6.2849024e+07, "SwapCached":0, "AnonPages":5.752832e+07, "SUnreclaim":1.5929344e+07, "NFS_Unstable":0, "HugePages_Surp":0, "SwapFree":2.147479552e+09, "Dirty":8192, "SReclaimable":2.3089152e+07, "CommitLimit":3.112349696e+09, "VmallocTotal":3.5184372087808e+13, "HardwareCorrupted":0, "HugePages_Total":0, "DirectMap2M":2.084569088e+09, "Buffers":970752, "WritebackTmp":0, "Committed_AS":4.11389952e+08, "VmallocUsed":1.61316864e+08, "VmallocChunk":3.5184201691136e+13, "Hugepagesize":2.097152e+06, "MemAvailable":1.665536e+09, "Cached":1.68030208e+08, "Active_anon":5.7892864e+07, "HugePages_Free":0, "HugePages_Rsvd":0, "MemFree":1.609220096e+09, "Active":9.728e+07, "Unevictable":0, "Mlocked":0, "SwapTotal":2.147479552e+09, "Mapped":4.4453888e+07, "PageTables":4.644864e+06, "Inactive_file":1.2070912e+08, "KernelStack":2.12992e+06, "AnonHugePages":8.388608e+06, "Bounce":0} source="meminfo.go:48"
DEBU[0003] OK: meminfo collector succeeded after 0.000504s. source="collector.go:126"
DEBU[0003] Return time: 1507888826.291955 source="time.go:47"
DEBU[0003] OK: time collector succeeded after 0.000050s. source="collector.go:126"
DEBU[0003] OK: filefd collector succeeded after 0.000046s. source="collector.go:126"
DEBU[0003] OK: xfs collector succeeded after 0.000267s. source="collector.go:126"
DEBU[0003] OK: sockstat collector succeeded after 0.000145s. source="collector.go:126"
DEBU[0003] Unable to detect InfiniBand devices source="infiniband_linux.go:110"
DEBU[0003] OK: infiniband collector succeeded after 0.000059s. source="collector.go:126"
DEBU[0003] return load 0: 0.340000 source="loadavg.go:51"
DEBU[0003] return load 1: 0.090000 source="loadavg.go:51"
DEBU[0003] return load 2: 0.070000 source="loadavg.go:51"
DEBU[0003] OK: loadavg collector succeeded after 0.000134s. source="collector.go:126"
DEBU[0003] OK: entropy collector succeeded after 0.000030s. source="collector.go:126"
DEBU[0003] ipvs collector metrics are not available for this system source="ipvs_linux.go:113"
DEBU[0003] OK: ipvs collector succeeded after 0.000060s. source="collector.go:126"
DEBU[0003] OK: uname collector succeeded after 0.000025s. source="collector.go:126"

3. Check iptables as well

[aaa@localhost ~]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[aaa@localhost ~]$ curl http://localhost:9100/metrics --max-time 10 -kvv

* About to connect() to localhost port 9100 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9100 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9100
> Accept: */*
>
* Operation timed out after 10001 milliseconds with 0 out of -1 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 out of -1 bytes received

From the node_exporter output, I think the process receives the request and collects the metrics, but never sends the response. Please take a look.
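That reading fits how a scrape is put together. As far as we understand the 0.15 code, node_exporter runs all enabled collectors concurrently, and the /metrics response is only written once every one of them has returned. A single collector that blocks therefore holds back the whole response while the others still log OK. The toy model below imitates that wait-for-all pattern with made-up collect_* shell functions; it is an illustration, not node_exporter code.

```shell
#!/bin/sh
# Toy model (not node_exporter code): run every "collector" in the
# background, log OK as each one returns, and only emit the response
# after all of them have finished, much like a Go sync.WaitGroup.
scrape() {
  for name in "$@"; do
    # Each hypothetical collector is a shell function named collect_<name>.
    ( "collect_$name" && echo "OK: $name collector succeeded" ) &
  done
  wait                 # blocks until the slowest collector returns
  echo "response sent"
}

# Hypothetical collectors: two fast ones and one that is merely slow.
# Replace the sleep with something that never returns (e.g. reading a
# FIFO nobody writes to) and "response sent" is never printed, even
# though the fast collectors still log OK.
collect_cpu()     { :; }
collect_meminfo() { :; }
collect_slow()    { sleep 1; }
```

Running scrape cpu meminfo slow prints the two fast OK lines first, then the slow one, and only then "response sent".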

@xiaojiaqi
Author

I don't know the root cause, but it works on another CentOS 7 machine, so I'm closing this.

xiaojiaqi reopened this Oct 16, 2017
@xiaojiaqi
Author

Some updates:

I think this is a node_exporter bug on CentOS 7, so I'm reopening it. Here is what I did: curl gets a full response from http://127.0.0.1:9100/ and http://10.29.101.105:9100/, but it blocks on http://127.0.0.1:9100/metrics and http://10.29.101.105:9100/metrics. So I think the network is fine.

Session log:

[aaa@localhost node_exporter-0.15.0.linux-amd64]$ curl http://127.0.0.1:9100/ -m 10
<title>Node Exporter</title>
Node Exporter
Metrics
[aaa@localhost node_exporter-0.15.0.linux-amd64]$ curl http://10.29.101.105:9100/ -m 10
<title>Node Exporter</title>
Node Exporter
Metrics
[aaa@localhost node_exporter-0.15.0.linux-amd64]$ curl http://10.29.101.105:9100/metrics -m 10
curl: (28) Operation timed out after 10001 milliseconds with 0 out of -1 bytes received
[aaa@localhost node_exporter-0.15.0.linux-amd64]$ curl http://127.0.0.1:9100/metrics -m 10
curl: (28) Operation timed out after 10001 milliseconds with 0 out of -1 bytes received
[aaa@localhost node_exporter-0.15.0.linux-amd64]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy DROP)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain DOCKER (0 references)
target prot opt source destination

Chain DOCKER-ISOLATION (0 references)
target prot opt source destination

Chain DOCKER-USER (0 references)
target prot opt source destination

@xiaojiaqi
Author

When I roll back to https://github.com/prometheus/node_exporter/releases/download/v0.14.0/node_exporter-0.14.0.linux-amd64.tar.gz on the same box, it works well.

@SuperQ
Member

SuperQ commented Oct 16, 2017

Based on the log you included, the only collector not reporting OK was hwmon. I would try running it with --no-collector.hwmon to test.
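Spotting the missing collector by eye gets tedious with two dozen of them. A minimal sketch, assuming the log format shown in this issue (one "- <name>" line per enabled collector at startup and one "OK: <name> collector succeeded" line per finished collector): it prints every enabled collector that has no OK line, which also catches collectors that logged an error instead. The function name is hypothetical, and other log formats (for example logfmt output with time= and msg= fields) would need different field matching.

```shell
#!/bin/sh
# Hypothetical helper: read a node_exporter debug log on stdin and print,
# one per line, every enabled collector that never logged
# "OK: <name> collector succeeded". Assumes the text format in this issue.
missing_collectors() {
  awk '
    # Startup listing, e.g.:  INFO[0000] - hwmon source="node_exporter.go:52"
    $2 == "-" { enabled[$3] = 1; next }
    # Finished collector, e.g.: DEBU[0005] OK: cpu collector succeeded after ...
    $2 == "OK:" && $4 == "collector" && $5 == "succeeded" { ok[$3] = 1 }
    # Enabled but never reported OK (still running, or logged an error).
    END { for (name in enabled) if (!(name in ok)) print name }
  ' | sort
}
```

For example, missing_collectors < node_exporter.log; on the second debug log in this issue it should print only hwmon.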

@matthiasr
Contributor

And if that fixes the issue, run node exporter with only this collector and strace it during the /metrics call. This should yield a file that it's trying to read – can you cat that file or does that hang too?
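For the second half of that suggestion, each candidate sysfs file can be read under a time limit so that a blocking read is reported instead of freezing the shell. A sketch, assuming GNU coreutils timeout, which exits with status 124 when it had to stop the command. The helper name, the PROBE_TIMEOUT variable and the /sys/class/hwmon glob in the usage comment are assumptions, and the exact set of files the hwmon collector reads may differ. A read stuck in uninterruptible kernel sleep can ignore the signal, in which case timeout may hang as well.

```shell
#!/bin/sh
# Hypothetical helper: read each file given as an argument, allowing at most
# PROBE_TIMEOUT seconds (default 2) per file. Prints "HANG <file>" when the
# read had to be stopped and "ERR <file>" when it failed (e.g. EIO or a
# permission error); files that read normally print nothing.
probe_files() {
  for f in "$@"; do
    [ -e "$f" ] || continue   # glob that matched nothing
    [ -d "$f" ] && continue   # only read files, not directories
    timeout "${PROBE_TIMEOUT:-2}" cat "$f" >/dev/null 2>&1
    case $? in
      0) ;;
      124) echo "HANG $f" ;;  # GNU timeout: the command timed out
      *) echo "ERR $f" ;;
    esac
  done
}

# Usage on the affected host (run as root; the glob is an assumption):
#   probe_files /sys/class/hwmon/hwmon*/* /sys/class/hwmon/hwmon*/device/*
```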

If this happens a lot we might want to disable hwmon by default?

@matthiasr
Contributor

(but so far, this is the only report)

@xiaojiaqi
Author

xiaojiaqi commented Oct 17, 2017

Here is some strace output for debugging.

[aaa@localhost node_exporter-0.15.0.linux-amd64]$ strace ./node_exporter 
execve("./node_exporter", ["./node_exporter"], [/* 23 vars */]) = 0
uname({sysname="Linux", nodename="localhost.localdomain", ...}) = 0
brk(NULL)                               = 0x224f000
brk(0x22501c0)                          = 0x22501c0
arch_prctl(ARCH_SET_FS, 0x224f880)      = 0
set_tid_address(0x224fb50)              = 9917
set_robust_list(0x224fb60, 24)          = 0
rt_sigaction(SIGRTMIN, {0x8e7da0, [], SA_RESTORER|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x8e7e30, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
readlink("/proc/self/exe", "/home/aaa/node1.5/node_exporter-"..., 4096) = 64
brk(0x22711c0)                          = 0x22711c0
brk(0x2272000)                          = 0x2272000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
sched_getaffinity(0, 8192, [0 ...])     = 640
mmap(0xc000000000, 65536, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
munmap(0xc000000000, 65536)             = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0101e2e000
mmap(0xc420000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc420000000
mmap(0xc41fff8000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41fff8000
mmap(0xc000000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0101e1e000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0101e0e000
rt_sigprocmask(SIG_SETMASK, NULL, [], 8) = 0
sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
sigaltstack({ss_sp=0xc420002000, ss_flags=0, ss_size=32768}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
gettid()                                = 9917
rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGHUP, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGILL, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGTRAP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTRAP, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGBUS, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGFPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGFPE, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSEGV, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGUSR2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR2, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGALRM, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGTERM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGSTKFLT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSTKFLT, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGURG, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGURG, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGXCPU, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXCPU, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGXFSZ, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXFSZ, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGVTALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGVTALRM, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGPROF, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPROF, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGWINCH, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGWINCH, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGIO, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGPWR, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPWR, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSYS, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRTMIN, NULL, {0x8e7da0, [], SA_RESTORER|SA_SIGINFO, 0x8e7490}, 8) = 0
rt_sigaction(SIGRTMIN, NULL, {0x8e7da0, [], SA_RESTORER|SA_SIGINFO, 0x8e7490}, 8) = 0
rt_sigaction(SIGRTMIN, {0x8e7da0, [], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_1, NULL, {0x8e7e30, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x8e7490}, 8) = 0
rt_sigaction(SIGRT_1, NULL, {0x8e7e30, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x8e7490}, 8) = 0
rt_sigaction(SIGRT_1, {0x8e7e30, [], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_2, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_3, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_3, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_4, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_4, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_5, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_5, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_6, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_6, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_7, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_7, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_8, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_8, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_9, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_9, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_10, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_10, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_11, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_11, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_12, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_12, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_13, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_13, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_14, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_14, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_15, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_15, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_16, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_16, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_17, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_17, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_18, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_18, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_19, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_19, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_20, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_20, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_21, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_21, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_22, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_22, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_23, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_23, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_24, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_24, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_25, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_25, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_26, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_26, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_27, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_27, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_28, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_28, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_29, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_29, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_30, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_30, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_31, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_31, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigaction(SIGRT_32, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_32, {0x45e160, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x8e7490}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f010160d000
mprotect(0x7f010160d000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f0101e0ce70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0101e0d9d0, tls=0x7f0101e0d700, child_tidptr=0x7f0101e0d9d0) = 9918
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f0100e0c000
mprotect(0x7f0100e0c000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f010160be70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f010160c9d0, tls=0x7f010160c700, child_tidptr=0x7f010160c9d0) = 9919
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x104a990, FUTEX_WAIT, 0, NULL)   = 0
readlinkat(AT_FDCWD, "/proc/self/exe", "/home/aaa/node1.5/node_exporter-"..., 128) = 64
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0100dcc000
openat(AT_FDCWD, "/proc/sys/net/core/somaxconn", O_RDONLY|O_CLOEXEC) = 3
epoll_create1(EPOLL_CLOEXEC)            = 4
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487408, u64=139642286182256}}) = 0
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(3, "128\n", 4096)                  = 4
read(3, "", 4092)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc42004dc0c) = 0
close(3)                                = 0
mmap(0xc420100000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc420100000
mmap(0xc41fff0000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41fff0000
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
getpid()                                = 9917
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 3
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487408, u64=139642286182256}}) = 0
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(3, "cpu  94407 81 49904 40361911 158"..., 4096) = 787
read(3, "", 3309)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc42004d2dc) = 0
close(3)                                = 0
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
futex(0xc42002e810, FUTEX_WAKE, 1)      = 1
ioctl(2, TCGETS, {B9600 opost isig icanon echo ...}) = 0
write(2, "\33[36mINFO\33[0m[0000] Starting nod"..., 163INFO[0000] Starting node_exporter (version=0.15.0, branch=HEAD, revision=6e2053c557f96efb63aef3691f15335a70baaffd)  source="node_exporter.go:43"
) = 163
write(2, "\33[36mINFO\33[0m[0000] Build contex"..., 134INFO[0000] Build context (go=go1.9.1, user=root@168089f37ad9, date=20171006-11:33:58)  source="node_exporter.go:44"
) = 134
write(2, "\33[36mINFO\33[0m[0000] No directory"..., 113INFO[0000] No directory specified, see --collector.textfile.directory  source="textfile.go:57"
) = 113
stat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
write(2, "\33[36mINFO\33[0m[0000] Enabled coll"..., 104INFO[0000] Enabled collectors:                           source="node_exporter.go:50"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - loadavg  "..., 104INFO[0000]  - loadavg                                    source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - edac     "..., 104INFO[0000]  - edac                                       source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - infiniban"..., 104INFO[0000]  - infiniband                                 source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - netstat  "..., 104INFO[0000]  - netstat                                    source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - hwmon    "..., 104INFO[0000]  - hwmon                                      source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - diskstats"..., 104INFO[0000]  - diskstats                                  source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - textfile "..., 104INFO[0000]  - textfile                                   source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - uname    "..., 104INFO[0000]  - uname                                      source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - entropy  "..., 104INFO[0000]  - entropy                                    source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - filesyste"..., 104INFO[0000]  - filesystem                                 source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - xfs      "..., 104INFO[0000]  - xfs                                        source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - stat     "..., 104INFO[0000]  - stat                                       source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - arp      "..., 104INFO[0000]  - arp                                        source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - ipvs     "..., 104INFO[0000]  - ipvs                                       source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - meminfo  "..., 104INFO[0000]  - meminfo                                    source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - zfs      "..., 104INFO[0000]  - zfs                                        source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - filefd   "..., 104INFO[0000]  - filefd                                     source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - netdev   "..., 104INFO[0000]  - netdev                                     source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - mdadm    "..., 104INFO[0000]  - mdadm                                      source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - time     "..., 104INFO[0000]  - time                                       source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - bcache   "..., 104INFO[0000]  - bcache                                     source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - conntrack"..., 104INFO[0000]  - conntrack                                  source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - vmstat   "..., 104INFO[0000]  - vmstat                                     source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - sockstat "..., 104INFO[0000]  - sockstat                                   source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - timex    "..., 104INFO[0000]  - timex                                      source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - cpu      "..., 104INFO[0000]  - cpu                                        source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000]  - wifi     "..., 104INFO[0000]  - wifi                                       source="node_exporter.go:52"
) = 104
write(2, "\33[36mINFO\33[0m[0000] Listening on"..., 104INFO[0000] Listening on :9100                            source="node_exporter.go:76"
) = 104
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
close(3)                                = 0
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 5
setsockopt(5, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
bind(5, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
close(5)                                = 0
close(3)                                = 0
socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET6, sin6_port=htons(9100), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(3, 128)                          = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487408, u64=139642286182256}}) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(9100), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
accept4(3, 0xc42014b928, 0xc42014b91c, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(4, [], 128, 0)               = 0
epoll_wait(4, [{EPOLLIN, {u32=14487408, u64=139642286182256}}], 128, -1) = 1
futex(0x104a1b8, FUTEX_WAKE, 1)         = 1
accept4(3, {sa_family=AF_INET6, sin6_port=htons(34105), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 5
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487216, u64=139642286182064}}) = 0
getsockname(5, {sa_family=AF_INET6, sin6_port=htons(9100), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPINTVL, [180], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPIDLE, [180], 4) = 0
accept4(3, 0xc42014b928, 0xc42014b91c, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(5, "GET /metrics HTTP/1.1\r\nUser-Agen"..., 4096) = 85
openat(AT_FDCWD, "/proc/sys/fs/file-nr", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(6, "1088\t0\t185114\n", 512)       = 14
read(6, "", 1522)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42003daac) = 0
close(6)                                = 0
read(5, 0xc4201b2731, 1)                = -1 EAGAIN (Resource temporarily unavailable)
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc/9917", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/9917/stat", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004f474) = -1 EPERM (Operation not permitted)
read(6, "9917 (node_exporter) R 9915 9915"..., 512) = 333
read(6, "", 1203)                       = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(6, "cpu  94412 81 49910 40362352 158"..., 4096) = 787
read(6, "", 3309)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004efbc) = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/9917/fd", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004f6cc) = -1 EPERM (Operation not permitted)
getdents64(6, /* 9 entries */, 4096)    = 216
getdents64(6, /* 0 entries */, 4096)    = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/9917/limits", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004f5f4) = -1 EPERM (Operation not permitted)
read(6, "Limit                     Soft L"..., 4096) = 1323
mmap(0xc420200000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc420200000
mmap(0xc41ffe8000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41ffe8000
read(6, "", 2773)                       = 0
close(6)                                = 0
epoll_wait(4, [{EPOLLOUT, {u32=14487216, u64=139642286182064}}], 128, 0) = 1
openat(AT_FDCWD, "/proc/sys/net/netfilter/nf_conntrack_count", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(6, "27\n", 512)                    = 3
read(6, "", 1533)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42003bbd4) = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/sys/net/netfilter/nf_conntrack_max", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(6, "65536\n", 512)                 = 6
read(6, "", 1530)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42003bbd4) = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/vmstat", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(6, "nr_free_pages 327479\nnr_alloc_ba"..., 4096) = 2316
read(6, "", 4096)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42003bbc4) = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/net/sockstat", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(6, "sockets: used 219\nTCP: inuse 5 o"..., 4096) = 132
read(6, "", 3964)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42003ba8c) = 0
close(6)                                = 0
adjtimex({modes=0, offset=0, freq=0, maxerror=16000000, esterror=16000000, status=STA_UNSYNC, constant=2, precision=1, tolerance=32768000, time={1508207660, 900507}, tick=10000, ppsfreq=0, jitter=0, shift=0, stabil=0, jitcnt=0, calcnt=0, errcnt=0, stbcnt=0, tai=0}) = 5 (TIME_ERROR)
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = 0
fcntl(6, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(6, "cpu  94412 81 49910 40362352 158"..., 4096) = 787
read(6, "", 3309)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004b26c) = 0
close(6)                                = 0
stat("/sys/bus/cpu/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/bus/cpu/devices", O_RDONLY|O_CLOEXEC) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14487024, u64=139642286181872}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc42004b71c) = -1 EPERM (Operation not permitted)
getdents64(6, /* 3 entries */, 4096)    = 72
getdents64(6, /* 0 entries */, 4096)    = 0
close(6)                                = 0
stat("/sys/bus/cpu/devices/cpu0/cpufreq", 0xc42017d488) = -1 ENOENT (No such file or directory)
stat("/sys/bus/cpu/devices/cpu0/thermal_throttle", 0xc42017d558) = -1 ENOENT (No such file or directory)
ERRO[0004] Error on statfs() system call for "/var/lib/docker/overlay/332d554a8338d480294003470c34daddbda9200e1c5e90d147b27d461a798040/merged": permission denied  source="filesystem_linux.go:57"
ERRO[0004] Error on statfs() system call for "/var/lib/docker/containers/bf46847e19811fdc47845143ba001c46f737c80ecb9afc647eedad82cc21c6e1/shm": permission denied  source="filesystem_linux.go:57"
ERRO[0004] Error on statfs() system call for "net:[4026532447]": no such file or directory  source="filesystem_linux.go:57"
futex(0x104a1b8, FUTEX_WAKE, 1)         = 1
stat("/sys/bus/node/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/bus/node/devices", O_RDONLY|O_CLOEXEC) = 7
epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14486832, u64=139642286181680}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 7, 0xc42004b71c) = -1 EPERM (Operation not permitted)
getdents64(7, /* 3 entries */, 4096)    = 80
getdents64(7, /* 0 entries */, 4096)    = 0
close(7)                                = 0
stat("/sys/bus/node/devices/node0/cpulist", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/sys/bus/node/devices/node0/cpulist", O_RDONLY|O_CLOEXEC) = 7
epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14486832, u64=139642286181680}}) = 0
fcntl(7, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
fstat(7, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(7, "0\n", 4608)                    = 2
read(7, "", 4606)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 7, 0xc42004b88c) = 0
close(7)                                = 0
stat("/sys/bus/node/devices/node0/cpu0/thermal_throttle/package_throttle_count", 0xc4202d7ca8) = -1 ENOENT (No such file or directory)
futex(0x104a990, FUTEX_WAIT, 0, NULLERRO[0013] Error on statfs() system call for "/var/lib/docker/overlay/332d554a8338d480294003470c34daddbda9200e1c5e90d147b27d461a798040/merged": permission denied  source="filesystem_linux.go:57"
ERRO[0013] Error on statfs() system call for "/var/lib/docker/containers/bf46847e19811fdc47845143ba001c46f737c80ecb9afc647eedad82cc21c6e1/shm": permission denied  source="filesystem_linux.go:57"
ERRO[0013] Error on statfs() system call for "net:[4026532447]": no such file or directory  source="filesystem_linux.go:57"
)   = 0
futex(0x104a7b0, FUTEX_WAKE, 1)         = 1
openat(AT_FDCWD, "/sys/class/hwmon", O_RDONLY|O_CLOEXEC) = 10
epoll_ctl(4, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14486256, u64=139642286181104}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 10, 0xc42003dae4) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/proc/diskstats", O_RDONLY|O_CLOEXEC) = 11
epoll_ctl(4, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14486256, u64=139642286181104}}) = 0
fcntl(11, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(11, "   2       0 fd0 0 0 0 0 0 0 0 0"..., 4096) = 489
openat(AT_FDCWD, "/proc/net/ip_vs_stats", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/meminfo", O_RDONLY|O_CLOEXEC) = 12
epoll_ctl(4, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14486064, u64=139642286180912}}) = 0
fcntl(12, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
uname({sysname="Linux", nodename="localhost.localdomain", ...}) = 0
openat(AT_FDCWD, "/proc/sys/kernel/random/entropy_avail", O_RDONLY|O_CLOEXEC) = 13
epoll_ctl(4, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485872, u64=139642286180720}}) = 0
fcntl(13, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(13, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
fstat(13, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(13, "229\n", 512)                  = 4
read(13, "", 1532)                      = 0
epoll_ctl(4, EPOLL_CTL_DEL, 13, 0xc42003fbd4) = 0
close(13)                               = 0
openat(AT_FDCWD, "/proc/mounts", O_RDONLY|O_CLOEXEC) = 13
epoll_ctl(4, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485872, u64=139642286180720}}) = 0
fcntl(13, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(13, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
stat("/sys/fs/xfs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/fs/xfs", O_RDONLY|O_CLOEXEC) = 14
epoll_ctl(4, EPOLL_CTL_ADD, 14, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485680, u64=139642286180528}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 14, 0xc4204bb85c) = -1 EPERM (Operation not permitted)
getdents64(14, /* 5 entries */, 4096)   = 120
getdents64(14, /* 0 entries */, 4096)   = 0
close(14)                               = 0
stat("/sys/fs/xfs/dm-0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/fs/xfs/dm-0", O_RDONLY|O_CLOEXEC) = 14
epoll_ctl(4, EPOLL_CTL_ADD, 14, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485680, u64=139642286180528}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 14, 0xc4204bb90c) = -1 EPERM (Operation not permitted)
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 15
epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485680, u64=139642286180528}}) = 0
fcntl(15, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(15, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(15, "cpu  94435 81 49922 40364665 158"..., 4096) = 787
openat(AT_FDCWD, "/proc/net/arp", O_RDONLY|O_CLOEXEC) = 16
epoll_ctl(4, EPOLL_CTL_ADD, 16, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485488, u64=139642286180336}}) = 0
fcntl(16, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(16, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
openat(AT_FDCWD, "/proc/spl/kstat/zfs/arcstats", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/sys/fs/file-nr", O_RDONLY|O_CLOEXEC) = 17
epoll_ctl(4, EPOLL_CTL_ADD, 17, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485296, u64=139642286180144}}) = 0
fcntl(17, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(17, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(17, "1088\t0\t185114\n", 512)      = 14
read(17, "", 1522)                      = 0
epoll_ctl(4, EPOLL_CTL_DEL, 17, 0xc420039aac) = 0
close(17)                               = 0
openat(AT_FDCWD, "/proc/net/dev", O_RDONLY|O_CLOEXEC) = 17
epoll_ctl(4, EPOLL_CTL_ADD, 17, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485296, u64=139642286180144}}) = 0
fcntl(17, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(17, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(17, "Inter-|   Receive               "..., 4096) = 710
openat(AT_FDCWD, "/proc/mdstat", O_RDONLY|O_CLOEXEC) = 18
epoll_ctl(4, EPOLL_CTL_ADD, 18, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14485104, u64=139642286179952}}) = 0
fcntl(18, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(18, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 19
epoll_ctl(4, EPOLL_CTL_ADD, 19, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14484912, u64=139642286179760}}) = 0
fcntl(19, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(19, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
stat("/sys/fs/bcache", 0xc4204c41d8)    = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/vmstat", O_RDONLY|O_CLOEXEC) = 20
epoll_ctl(4, EPOLL_CTL_ADD, 20, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14484720, u64=139642286179568}}) = 0
fcntl(20, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(20, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
read(20, "nr_free_pages 327080\nnr_alloc_ba"..., 4096) = 2317
openat(AT_FDCWD, "/proc/net/sockstat", O_RDONLY|O_CLOEXEC) = 21
epoll_ctl(4, EPOLL_CTL_ADD, 21, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=14484528, u64=139642286179376}}) = 0
fcntl(21, F_GETFL)                      = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(21, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
adjtimex({modes=0, offset=0, freq=0, maxerror=16000000, esterror=16000000, status=STA_UNSYNC, constant=2, precision=1, tolerance=32768000, time={1508207684, 438802}, tick=10000, ppsfreq=0, jitter=0, shift=0, stabil=0, jitcnt=0, calcnt=0, errcnt=0, stbcnt=0, tai=0}) = 5 (TIME_ERROR)
socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) = 22
bind(22, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(22, {sa_family=AF_NETLINK, pid=9917, groups=00000000}, [12]) = 0
sendmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{" \0\0\0\20\0\1\0\217\233 \252\275&\0\0\3\1\0\0\f\0\2\0nl80211\0", 32}], msg_controllen=0, msg_flags=0}, 0) = 32
recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\220\7\0\0\20\0\0\0\217\233 \252\275&\0\0\1\2\0\0\f\0\2\0nl80211\0"..., 4096}], msg_controllen=0, msg_flags=0}, MSG_PEEK) = 1936
recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\220\7\0\0\20\0\0\0\217\233 \252\275&\0\0\1\2\0\0\f\0\2\0nl80211\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 1936
stat("/sys/devices/system/edac/mc", 0xc4204c42a8) = -1 ENOENT (No such file or directory)
epoll_wait(4, [{EPOLLIN|EPOLLOUT, {u32=14486256, u64=139642286181104}}, {EPOLLIN|EPOLLOUT, {u32=14486064, u64=139642286180912}}, {EPOLLIN, {u32=14485872, u64=139642286180720}}, {EPOLLIN|EPOLLOUT, {u32=14485680, u64=139642286180528}}, {EPOLLIN|EPOLLOUT, {u32=14485488, u64=139642286180336}}, {EPOLLIN|EPOLLOUT, {u32=14485296, u64=139642286180144}}, {EPOLLIN, {u32=14485104, u64=139642286179952}}, {EPOLLIN|EPOLLOUT, {u32=14484912, u64=139642286179760}}, {EPOLLIN|EPOLLOUT, {u32=14484720, u64=139642286179568}}, {EPOLLIN|EPOLLOUT, {u32=14484528, u64=139642286179376}}], 128, 0) = 10
epoll_wait(4, ERRO[0028] Error on statfs() system call for "/var/lib/docker/overlay/332d554a8338d480294003470c34daddbda9200e1c5e90d147b27d461a798040/merged": permission denied  source="filesystem_linux.go:57"
ERRO[0028] Error on statfs() system call for "/var/lib/docker/containers/bf46847e19811fdc47845143ba001c46f737c80ecb9afc647eedad82cc21c6e1/shm": permission denied  source="filesystem_linux.go:57"
ERRO[0028] Error on statfs() system call for "net:[4026532447]": no such file or directory  source="filesystem_linux.go:57"
[{EPOLLIN|EPOLLOUT, {u32=14484336, u64=139642286179184}}], 128, -1) = 1
futex(0x104a990, FUTEX_WAIT, 0, NULLERRO[0043] Error on statfs() system call for "/var/lib/docker/overlay/332d554a8338d480294003470c34daddbda9200e1c5e90d147b27d461a798040/merged": permission denied  source="filesystem_linux.go:57"
ERRO[0043] Error on statfs() system call for "/var/lib/docker/containers/bf46847e19811fdc47845143ba001c46f737c80ecb9afc647eedad82cc21c6e1/shm": permission denied  source="filesystem_linux.go:57"
ERRO[0043] Error on statfs() system call for "net:[4026532447]": no such file or directory  source="filesystem_linux.go:57"
ERRO[0058] Error on statfs() system call for "/var/lib/docker/overlay/332d554a8338d480294003470c34daddbda9200e1c5e90d147b27d461a798040/merged": permission denied  source="filesystem_linux.go:57"
ERRO[0058] Error on statfs() system call for "/var/lib/docker/containers/bf46847e19811fdc47845143ba001c46f737c80ecb9afc647eedad82cc21c6e1/shm": permission denied  source="filesystem_linux.go:57"
ERRO[0058] Error on statfs() system call for "net:[4026532447]": no such file or directory  source="filesystem_linux.go:57"

@SuperQ I think you are correct; when I run ./node_exporter --no-collector.hwmon, it works well.

@SuperQ
Member

SuperQ commented Oct 17, 2017

Please try running and then scraping the node_exporter with every collector disabled except the hwmon collector, and attach the log and trace output from this command.

strace -o node_exporter.trace ./node_exporter --log.level="debug" \
  --no-collector.arp \
  --no-collector.bcache \
  --no-collector.conntrack \
  --no-collector.cpu \
  --no-collector.diskstats \
  --no-collector.edac \
  --no-collector.entropy \
  --no-collector.filefd \
  --no-collector.filesystem \
  --no-collector.infiniband \
  --no-collector.ipvs \
  --no-collector.loadavg \
  --no-collector.mdadm \
  --no-collector.meminfo \
  --no-collector.netdev \
  --no-collector.netstat \
  --no-collector.sockstat \
  --no-collector.stat \
  --no-collector.textfile \
  --no-collector.time \
  --no-collector.timex \
  --no-collector.uname \
  --no-collector.vmstat \
  --no-collector.wifi \
  --no-collector.xfs \
  --no-collector.zfs

@DanielNeedles

DanielNeedles commented Oct 18, 2017

Lol. Wish I had seen this. My thread tonight is here:
https://groups.google.com/forum/#!topic/prometheus-users/JbCwbMrzt9E
See the trace files at the bottom of this post.

What I saw:
With 0.15 the problem occurred even if I only hit CPU or MEMUSAGE. The same setup running under 0.14 worked just fine. When I refreshed the browser, the request for metrics somehow dropped off and the user agent changed from Prometheus to the Firefox browser. Here's the snippet.

WORKING 0.14.0
read(6, "GET /metrics?collect%5B%5D=meminfo HTTP/1.1\r\nHost: 127.0.0.1:9100\r\nUser-Agent: Prometheus/1.8.0\r\nAccept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3,/;q=0.1\r\nX-Prometheus-Scrape-Timeout-Seconds: 10.000000\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n", 4096) = 336

BROKEN 0.15.0
read(5, "GET /metrics HTTP/1.1\r\nHost: 127.0.0.1:9100\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\nCache-Control: max-age=0\r\n\r\n", 4096) = 347

My test case was

Working
Run under strace -s4096 node 0.14
Run Prometheus
Via Firefox hit http://127.0.0.1:9100/metrics
This works.

Clear Test
Kill Node and Prometheus, and keep Firefox up.

Broken
Run under strace -s4096 node 0.15
Run Prometheus
Refresh the link to Firefox http://127.0.0.1:9100/metrics
This sits and spins

Trace Files For Both Working and Broken Testcase.
NodeExporterBug.zip

@SuperQ
Member

SuperQ commented Oct 18, 2017

@DanielNeedles Please try the command I listed above to trace only the hwmon collector. FYI, using the collect[] parameter is not supported, and is ignored by 0.14.0 and previous versions. This feature is for advanced use cases and not recommended for normal operation.

@DanielNeedles

DanielNeedles commented Oct 18, 2017

@SuperQ Thanks so much! I apologize in advance for my failure to find it and RTFM, but can you point me to a URL or man page on node_exporter besides the source code? I know 0.14.0 is collecting metrics for memory and CPU, because I am graphing them, so clearly I don't understand what you mean by "collect[]" not being supported. Is there a more in-depth description of these parameters and of what node_exporter actually does? Thanks! There probably should be something in the README if this is the command folks need to use on RHEL7/CentOS7. 8-) Attached is the resulting trace.
node_exporter-0.15.0.linux-amd64-debug-nocollector.zip

@SuperQ
Member

SuperQ commented Oct 18, 2017

@DanielNeedles We added a new feature to the node_exporter in 0.15.0 to allow selective collection. Specifying this param is optional, and usually unnecessary. We typically recommend collecting all metrics from targets at the same interval, but there are some specific advanced use cases where it's handy to collect a small number of things with different frequencies. I think we need to clarify this in the README.
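For reference, the collect[] filter is just a repeated query parameter. The Go sketch below builds such a URL with the standard net/url package (metricsURL is a name made up for illustration) and yields the same encoding that appears in the 0.14.0 trace earlier in this thread, collect%5B%5D=meminfo.

```go
package main

import (
	"fmt"
	"net/url"
)

// metricsURL builds a node_exporter scrape URL that requests only the named
// collectors via repeated collect[] query parameters. With no collectors,
// the base URL is returned unchanged (i.e. collect everything).
func metricsURL(base string, collectors ...string) string {
	q := url.Values{}
	for _, c := range collectors {
		q.Add("collect[]", c)
	}
	if len(q) == 0 {
		return base
	}
	// Encode escapes the brackets, producing collect%5B%5D=...
	return base + "?" + q.Encode()
}

func main() {
	fmt.Println(metricsURL("http://127.0.0.1:9100/metrics", "meminfo"))
}
```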

@SuperQ
Member

SuperQ commented Oct 18, 2017

You included only the log output of the exporter, but not the strace contents. Also, it would help to set the log level to debug, so that I can get the timing info for your curl test command to see where the hwmon collector is failing.

@DanielNeedles

DanielNeedles commented Oct 18, 2017

@SuperQ My bad. I missed that you already had the -o option. Adding that file.

This is the command I used for your reference:
[dneedles@localhost node_exporter-0.15.0.linux-amd64]$ strace -o node_exporter.trace ./node_exporter --log.level="debug" \
  --no-collector.arp \
  --no-collector.bcache \
  --no-collector.conntrack \
  --no-collector.cpu \
  --no-collector.diskstats \
  --no-collector.edac \
  --no-collector.entropy \
  --no-collector.filefd \
  --no-collector.filesystem \
  --no-collector.infiniband \
  --no-collector.ipvs \
  --no-collector.loadavg \
  --no-collector.mdadm \
  --no-collector.meminfo \
  --no-collector.netdev \
  --no-collector.netstat \
  --no-collector.sockstat \
  --no-collector.stat \
  --no-collector.textfile \
  --no-collector.time \
  --no-collector.timex \
  --no-collector.uname \
  --no-collector.vmstat \
  --no-collector.wifi \
  --no-collector.xfs \
  --no-collector.zfs 2> /tmp/node_exporter-0.15.0.linux-amd64-debug-nocollector.trc

There was no STDOUT to the console, only extra STDERR which all went to the file I sent.

node_exporter-0.15.0.linux-amd64-debug-nocollector.zip

@stephan-vollmer

I have encountered the same issue with Ubuntu 14.04 LTS:

uname -a
Linux obcs690 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

node_exporter 0.14.0 worked fine; with node_exporter 0.15.0 I need to disable hwmon. I noticed it because Prometheus was unable to get metrics, and the host eventually ran out of memory because the node_exporter process used all of it.

@SuperQ
Member

SuperQ commented Oct 19, 2017

@stephan-vollmer Could you also run the same procedure above, so we can try to trace the root cause of what is going on with these older kernels?

@DanielNeedles

@SuperQ Did the new trace I provided help at all? Or do you need something else?

@SuperQ
Member

SuperQ commented Oct 19, 2017

Yes, a little. It was attempting to open temp1_input and got an error. We may not be catching the error correctly, or there is some other problem. If you look in the trace, you could try running cat on the last file it tries to open. I will see about adding more debug logging to this collector.

@stephan-vollmer

@SuperQ Here is the trace file I created.
node_exporter.trace.zip

@lindhor

lindhor commented Oct 27, 2017

I also have problems with running 0.15.0 on CentOS (in Docker), in my case using the filesystem collector. I need to exclude the filesystems "net.*" to avoid loads of errors like time="2017-10-27T15:12:29Z" level=error msg="Error on statfs() system call for \"net:[4026532475]\": no such file or directory" source="filesystem_linux.go:57".

My kernel is Linux prometheus-node1-roli 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux in both the container and host.

@DanielNeedles

@lindhor My workaround was to use 0.14.0 for now.

@hrak

hrak commented Oct 30, 2017

I'm seeing similar behavior on Ubuntu 14.04 x86_64 (on VMware) with node_exporter 0.15.0. A curl call to /metrics just hangs after the GET has been sent. Disabling the hwmon collector (or downgrading to node_exporter-0.14.0) solves the problem. I can provide more data if needed.

strace_node_exporter-0.15.0.txt

@SuperQ
Member

SuperQ commented Oct 30, 2017

@hrak That helps, again, I'm seeing an EAGAIN trying to open a temp1_input file. At least we're seeing something consistent.

I don't see any obvious places where we're trying to read a file and don't catch errors.

@SuperQ
Member

SuperQ commented Nov 1, 2017

I've built a test node_exporter binary with more verbose debug logging. I will set up some test machines with older distros to try to reproduce the bug.

@SuperQ
Member

SuperQ commented Nov 2, 2017

I've done several hours of testing on a laptop (ThinkPad x230) running CentOS 7 with both the production and test binaries and have not been able to reproduce the error. 😞

If anyone is interested in testing, please try building from the superq/hwmon_test branch. It includes a bunch of extra debug text, and only enables hwmon by default.

@DanielNeedles

DanielNeedles commented Nov 2, 2017 via email

@hrak

hrak commented Nov 3, 2017

@SuperQ I just tried your branch, and this is what I'm getting:

INFO[0000] Starting node_exporter (version=, branch=, revision=)  source="node_exporter.go:78"
INFO[0000] Build context (go=go1.9.2, user=, date=)      source="node_exporter.go:79"
INFO[0000] Enabled collectors:                           source="node_exporter.go:86"
INFO[0000]  - hwmon                                      source="node_exporter.go:88"
INFO[0000] Listening on :9100                            source="node_exporter.go:103"
DEBU[0010] collect query: []                             source="node_exporter.go:35"
DEBU[0010] START: hwmon collector starting.              source="collector.go:130"
DEBU[0010] Reading directory "/sys/class/hwmon"          source="hwmon_linux.go:403"
DEBU[0010] Reading directory "/sys/class/hwmon/hwmon0"   source="hwmon_linux.go:106"
DEBU[0010] Reading file "/sys/class/hwmon/hwmon0/power"  source="hwmon_linux.go:64"
DEBU[0010] Reading directory "/sys/class/hwmon/hwmon0/device"  source="hwmon_linux.go:106"
DEBU[0010] Reading file "/sys/class/hwmon/hwmon0/device/power"  source="hwmon_linux.go:64"
DEBU[0010] Reading file "/sys/class/hwmon/hwmon0/device/temp1_crit"  source="hwmon_linux.go:64"
DEBU[0010] Reading file "/sys/class/hwmon/hwmon0/device/temp1_crit_alarm"  source="hwmon_linux.go:64"
DEBU[0010] Reading file "/sys/class/hwmon/hwmon0/device/temp1_input"  source="hwmon_linux.go:64"
DEBU[0015] collect query: []                             source="node_exporter.go:35"
DEBU[0015] START: hwmon collector starting.              source="collector.go:130"
DEBU[0015] Reading directory "/sys/class/hwmon"          source="hwmon_linux.go:403"
DEBU[0015] Reading directory "/sys/class/hwmon/hwmon0"   source="hwmon_linux.go:106"
DEBU[0015] Reading file "/sys/class/hwmon/hwmon0/power"  source="hwmon_linux.go:64"
DEBU[0015] Reading directory "/sys/class/hwmon/hwmon0/device"  source="hwmon_linux.go:106"
DEBU[0015] Reading file "/sys/class/hwmon/hwmon0/device/power"  source="hwmon_linux.go:64"
DEBU[0015] Reading file "/sys/class/hwmon/hwmon0/device/temp1_crit"  source="hwmon_linux.go:64"
DEBU[0015] Reading file "/sys/class/hwmon/hwmon0/device/temp1_crit_alarm"  source="hwmon_linux.go:64"
DEBU[0015] Reading file "/sys/class/hwmon/hwmon0/device/temp1_input"  source="hwmon_linux.go:64"

What I also noticed is that I get 'Resource temporarily unavailable' when trying to read /sys/class/hwmon/hwmon0/device/temp1_input. The other files read fine:

root@lc2-apigw1:~# ls /sys/class/hwmon/hwmon0/device/temp1_input
/sys/class/hwmon/hwmon0/device/temp1_input
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/device/temp1_input
cat: /sys/class/hwmon/hwmon0/device/temp1_input: Resource temporarily unavailable
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/device/temp1_crit_alarm
0
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/device/temp1_crit
100000

node_exporter_hwmon_strace.txt

@SuperQ
Member

SuperQ commented Nov 3, 2017

@hrak Thanks, I think what we might have to do is replace the ioutil with a different file reader in order to deal with the Resource temporarily unavailable issues. It seems like it is hanging and retrying forever, rather than returning the error to the collector.

Can you give me some details on what your hardware is, or what hwmon0 is?

ls -l /sys/class/hwmon/hwmon0
cat /sys/class/hwmon/hwmon0/name

@hrak

hrak commented Nov 3, 2017

@SuperQ This is on Ubuntu 14.04 x86_64 on VMware. Actually, I'm seeing a pattern of this only happening on Ubuntu 14.04 in our VMware environment. Bare metal is fine.

root@lc2-apigw1:~# ls -l /sys/class/hwmon/hwmon0
lrwxrwxrwx 1 root root 0 Oct 12 05:53 /sys/class/hwmon/hwmon0 -> ../../devices/platform/coretemp.0/hwmon/hwmon0
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/name
cat: /sys/class/hwmon/hwmon0/name: No such file or directory
root@lc2-apigw1:~# ls -l /sys/class/hwmon/hwmon0/
total 0
lrwxrwxrwx 1 root root    0 Oct 12 05:53 device -> ../../../coretemp.0
drwxr-xr-x 2 root root    0 Oct 12 05:53 power
lrwxrwxrwx 1 root root    0 Oct 12 05:53 subsystem -> ../../../../../class/hwmon
-rw-r--r-- 1 root root 4096 Oct 12 05:53 uevent
root@lc2-apigw1:~# uname -a
Linux lc2-apigw1 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@lc2-apigw1:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty

@SuperQ
Member

SuperQ commented Nov 3, 2017

I'm assuming that VMware is trying to present a CPU that Ubuntu thinks has coretemp sensors, but this isn't actually a valid combination. It should be safe enough to unload and blacklist the coretemp module as a workaround.

I'm still trying to figure out if bufio scanner is going to be non-blocking.

@hrak

hrak commented Nov 3, 2017

@SuperQ I just found out that installing kernel 4.4.0 (linux-generic-lts-xenial) on Ubuntu 14.04 also fixes the issue. So it seems the stock 3.13 kernel of Ubuntu 14.04 doesn't play well with hwmon on VMware. It still wouldn't hurt to look into a non-blocking read, though 👍

@SuperQ
Member

SuperQ commented Nov 3, 2017

@hrak Good news there. Does the kernel load/report hwmon coretemp metrics correctly?

@hrak

hrak commented Nov 3, 2017

It's reporting bogus values from the looks of things, but I guess that makes sense in a VM?
This is on a VM with 2 cores.

root@lc2-apigw1:~# ls -l /sys/class/hwmon/hwmon0
lrwxrwxrwx 1 root root 0 Nov  3 10:03 /sys/class/hwmon/hwmon0 -> ../../devices/platform/coretemp.0/hwmon/hwmon0
root@lc2-apigw1:~# ls -l /sys/class/hwmon/hwmon0/
total 0
lrwxrwxrwx 1 root root    0 Nov  3 10:03 device -> ../../../coretemp.0
-r--r--r-- 1 root root 4096 Nov  3 10:03 name
drwxr-xr-x 2 root root    0 Nov  3 10:03 power
lrwxrwxrwx 1 root root    0 Nov  3 10:03 subsystem -> ../../../../../class/hwmon
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp1_crit
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp1_crit_alarm
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp1_input
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp1_label
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp1_max
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp2_crit
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp2_crit_alarm
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp2_input
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp2_label
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp2_max
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp3_crit
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp3_crit_alarm
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp3_input
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp3_label
-r--r--r-- 1 root root 4096 Nov  3 10:03 temp3_max
-rw-r--r-- 1 root root 4096 Nov  3 10:03 uevent
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/temp1_input
100000
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/temp2_input
100000
root@lc2-apigw1:~# cat /sys/class/hwmon/hwmon0/temp3_input
100000
# HELP node_hwmon_chip_names Annotation metric for human-readable chip names
# TYPE node_hwmon_chip_names gauge
node_hwmon_chip_names{chip="platform_coretemp_0",chip_name="coretemp"} 1
# HELP node_hwmon_sensor_label Label for given chip and sensor
# TYPE node_hwmon_sensor_label gauge
node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_0",sensor="temp2"} 1
node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_1",sensor="temp3"} 1
node_hwmon_sensor_label{chip="platform_coretemp_0",label="physical_id_0",sensor="temp1"} 1
# HELP node_hwmon_temp_celsius Hardware monitor for temperature (input)
# TYPE node_hwmon_temp_celsius gauge
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 100
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp2"} 100
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp3"} 100
# HELP node_hwmon_temp_crit_alarm_celsius Hardware monitor for temperature (crit_alarm)
# TYPE node_hwmon_temp_crit_alarm_celsius gauge
node_hwmon_temp_crit_alarm_celsius{chip="platform_coretemp_0",sensor="temp1"} 0
node_hwmon_temp_crit_alarm_celsius{chip="platform_coretemp_0",sensor="temp2"} 0
node_hwmon_temp_crit_alarm_celsius{chip="platform_coretemp_0",sensor="temp3"} 0
# HELP node_hwmon_temp_crit_celsius Hardware monitor for temperature (crit)
# TYPE node_hwmon_temp_crit_celsius gauge
node_hwmon_temp_crit_celsius{chip="platform_coretemp_0",sensor="temp1"} 100
node_hwmon_temp_crit_celsius{chip="platform_coretemp_0",sensor="temp2"} 100
node_hwmon_temp_crit_celsius{chip="platform_coretemp_0",sensor="temp3"} 100
# HELP node_hwmon_temp_max_celsius Hardware monitor for temperature (max)
# TYPE node_hwmon_temp_max_celsius gauge
node_hwmon_temp_max_celsius{chip="platform_coretemp_0",sensor="temp1"} 100
node_hwmon_temp_max_celsius{chip="platform_coretemp_0",sensor="temp2"} 100
node_hwmon_temp_max_celsius{chip="platform_coretemp_0",sensor="temp3"} 100

I still wonder what changed between 0.14.0 and 0.15.0 that broke this on 3.13 kernels though. 0.14.0 worked fine on these VMs.

@DanielNeedles

@SuperQ Sorry, I've been completely slammed. But I wanted to note that, as in @hrak's case, the Linux system that failed was on VMware as well.

@lucasvel

lucasvel commented Nov 6, 2017

Got this same issue running on RHEL7 and fixed with the --no-collector.hwmon flag.

One side note: of the 4 nodes I'm running, 3 are on Kubernetes 1.8.0 and 1 is on 1.8.1.
The one on 1.8.1 was running OK with hwmon enabled.

@SuperQ
Member

SuperQ commented Nov 6, 2017

@lucasvel Can you report on what kind of platform these are on? VMware?

Can you report the output of ls -l /sys/class/hwmon/.

@lucasvel

lucasvel commented Nov 6, 2017

@SuperQ Yes, they are on VMware.

# ls -l /sys/class/hwmon/
total 0
lrwxrwxrwx. 1 root root 0 Nov  6 18:24 hwmon0 -> ../../devices/platform/coretemp.0/hwmon/hwmon0
lrwxrwxrwx. 1 root root 0 Nov  6 18:24 hwmon1 -> ../../devices/platform/coretemp.2/hwmon/hwmon1
lrwxrwxrwx. 1 root root 0 Nov  6 18:24 hwmon2 -> ../../devices/platform/coretemp.4/hwmon/hwmon2
lrwxrwxrwx. 1 root root 0 Nov  6 18:24 hwmon3 -> ../../devices/platform/coretemp.6/hwmon/hwmon3

@SuperQ
Member

SuperQ commented Nov 6, 2017

Yes, it seems like the coretemp kernel module is broken on CentOS 7 and older Ubuntu systems when running on VMware. It would be best to unload and blacklist this kernel module.

@lucasvel

lucasvel commented Nov 6, 2017

@SuperQ I can confirm that. I just unloaded the coretemp module, removed the --no-collector.hwmon flag, and the node-exporter is running fine now.

Thanks!

@SuperQ
Member

SuperQ commented Nov 7, 2017

We have published v0.15.1 that contains the workaround for broken hwmon data. Please post if it improves things.

@lucasvel

lucasvel commented Nov 7, 2017

@SuperQ just tested the v0.15.1 and works like a charm without the need to disable the coretemp kernel module.

Thank you!

@DanielNeedles

DanielNeedles commented Nov 7, 2017 via email

@hrak

hrak commented Nov 8, 2017

@SuperQ verified 0.15.1 working here as well. Thanks!

@nsaud01

nsaud01 commented Jan 8, 2018

I'm still getting: Error on statfs() system call for "/rootfs/sys/kernel/debug/tracing": permission denied source="filesystem_linux.go:57"

Anything rootfs-related is giving permission errors. This wasn't an issue in v0.14. My docker run command is:

docker run -d -p 9100:9100 \
  -v "/proc:/host/proc" \
  -v "/sys:/host/sys" \
  -v "/:/rootfs" \
  --net="host" \
  quay.io/prometheus/node-exporter:v0.15.2 \
    --path.procfs /host/proc \
    --path.sysfs /host/sys \
    --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"

Is there something I'm missing?
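One thing worth checking in the command above: the path in the error starts with /rootfs, but the ignored-mount-points pattern only matches paths that begin directly with /sys, /proc, /dev, /host or /etc. Since node_exporter is written in Go, Go's regexp package is a reasonable stand-in for testing the pattern. The sketch compares the flag's pattern with a hypothetical variant that also allows a /rootfs/ prefix; whether that alone accounts for every permission error is not established in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// flagPattern is the value passed to --collector.filesystem.ignored-mount-points above.
var flagPattern = regexp.MustCompile(`^/(sys|proc|dev|host|etc)($|/)`)

// rootfsPattern is a hypothetical variant that also accepts a /rootfs/ prefix,
// while still not matching /rootfs itself.
var rootfsPattern = regexp.MustCompile(`^/(rootfs/)?(sys|proc|dev|host|etc)($|/)`)

func main() {
	for _, mp := range []string{
		"/sys/kernel/debug/tracing",
		"/rootfs/sys/kernel/debug/tracing", // path from the error message
	} {
		fmt.Printf("%s: flag=%v rootfs=%v\n", mp,
			flagPattern.MatchString(mp), rootfsPattern.MatchString(mp))
	}
}
```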

@ojle

ojle commented Jan 22, 2018

@lindhor I've tried several things but no luck; I'm still getting that error. CentOS 7.3 / node_exporter 0.15.2
Can you give details about the startup/ignore flags that you used?
