
Panic with mountstats at debian wheezy and weird magic with nfs at jessie #571

Closed
Savemech opened this issue Apr 24, 2017 · 9 comments

@Savemech

Host operating system: output of uname -a

Linux somewheezy 3.2.46 x86_64 GNU/Linux
and
Linux somejessie 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

node_exporter version: output of node_exporter -version

node_exporter, version 0.14.0 (branch: master, revision: 840ba5d)
build user: root@bb6d0678e7f3
build date: 20170321-12:12:54
go version: go1.7.5

Are you running node_exporter in Docker?

nope

What did you do that produced an error?

/opt/prometheus/bin/node_exporter -collectors.enabled conntrack,diskstats,entropy,filefd,filesystem,hwmon,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,textfile,time,uname,vmstat,mountstats -collector.textfile.directory=/var/lib/node_exporter/textfile_collector/

nfsstat -c

Client rpc stats:
calls      retrans    authrefrsh
403452466   0          403452466

Client nfs v3:
null         getattr      setattr      lookup       access       readlink     
0         0% 265795759 65% 236       0% 16420     0% 137606953 34% 0         0% 
read         write        create       mkdir        symlink      mknod        
11371     0% 453       0% 119       0% 33        0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
32        0% 0         0% 41        0% 0         0% 0         0% 19439     0% 
fsstat       fsinfo       pathconf     commit       
1478      0% 4         0% 2         0% 125       0% 
mount | grep nfs
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
10.0.0.1:/var/lib/somedata on /var/lib/datatooo type nfs (rw,nodev,noatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.110.220.1,mountvers=3,mountport=28425,mountproto=udp,local_lock=none,addr=10.110.220.1)
10.0.0.1:/var/lib/somedata2 on /var/lib/datatooo2 type nfs (rw,nodev,noatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.1,mountvers=3,mountport=28425,mountproto=udp,local_lock=none,addr=10.0.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)

There is even more information in /proc/self/mountstats:

device rpc_pipefs mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
device 10.0.0.1:/var/lib/somedata mounted on /var/lib/somedata with fstype nfs statvers=1.0
        opts:   rw,vers=3,rsize=262144,wsize=262144,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.1,mountvers=3,mountport=28425,mountproto=udp,local_lock=none
        age:    8808327
        caps:   caps=0x3fcf,wtmult=4096,dtsize=4096,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 265795001 2349571869 19 693 205255006 16534 2587819800 18504 0 11364 18501 470907411 77191 142 205196647 224 0 205196631 0 93 18504 16 0 0 0 0 0 
        bytes:  62141776651 75366231 0 0 6920924 75366231 12250 18501 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 0 1 9821 0 0 403452500 403452499 1 659008617 0
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 265795003 265795003 0 28706036148 29769040336 658810 30616798 33640200
             SETATTR: 236 236 0 33412 33984 0 49 53
              LOOKUP: 16416 16416 0 1976280 3866016 47 2368 2629
              ACCESS: 137606535 137606535 0 15412058000 16512784080 286476 15698876 17193886
            READLINK: 0 0 0 0 0 0 0 0
                READ: 11371 11371 0 1364520 8399092 86 25064 25308
               WRITE: 453 453 0 75424552 61608 524 508 1053
              CREATE: 119 119 0 17048 31892 0 49 51
               MKDIR: 33 33 0 4856 8844 0 4 5
             SYMLINK: 0 0 0 0 0 0 0 0
               MKNOD: 0 0 0 0 0 0 0 0
              REMOVE: 32 32 0 4288 4608 20 7 28
               RMDIR: 0 0 0 0 0 0 0 0
              RENAME: 41 41 0 7440 10660 0 4 5
                LINK: 0 0 0 0 0 0 0 0
             READDIR: 0 0 0 0 0 0 0 0
         READDIRPLUS: 19069 19069 0 2667772 19453800 39 3767 3960
              FSSTAT: 739 739 0 73900 62076 10 198 228
              FSINFO: 2 2 0 192 160 0 0 0
            PATHCONF: 1 1 0 96 56 0 0 0
              COMMIT: 125 125 0 15000 16000 1 13 16

device 10.0.0.1:/var/lib/somedata22 mounted on /var/lib/somedata22 with fstype nfs statvers=1.0
        opts:   rw,vers=3,rsize=262144,wsize=262144,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.1,mountvers=3,mountport=28425,mountproto=udp,local_lock=none
        age:    8808327
        caps:   caps=0x3fcf,wtmult=4096,dtsize=4096,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 754 5358 7 8 548 0 5399 0 0 0 0 0 732 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
        bytes:  0 0 0 0 0 0 0 0 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 0 1 9821 0 0 403452500 403452499 1 659008617 0
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 756 756 0 82584 84672 3 133 144
             SETATTR: 0 0 0 0 0 0 0 0
              LOOKUP: 4 4 0 480 944 0 0 0
              ACCESS: 418 418 0 48656 50160 0 41 45
            READLINK: 0 0 0 0 0 0 0 0
                READ: 0 0 0 0 0 0 0 0
               WRITE: 0 0 0 0 0 0 0 0
              CREATE: 0 0 0 0 0 0 0 0
               MKDIR: 0 0 0 0 0 0 0 0
             SYMLINK: 0 0 0 0 0 0 0 0
               MKNOD: 0 0 0 0 0 0 0 0
              REMOVE: 0 0 0 0 0 0 0 0
               RMDIR: 0 0 0 0 0 0 0 0
              RENAME: 0 0 0 0 0 0 0 0
                LINK: 0 0 0 0 0 0 0 0
             READDIR: 0 0 0 0 0 0 0 0
         READDIRPLUS: 370 370 0 50392 448148 0 39 43
              FSSTAT: 739 739 0 73900 62076 2 108 126
              FSINFO: 2 2 0 192 160 0 0 0
            PATHCONF: 1 1 0 96 56 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device nfsd mounted on /proc/fs/nfsd with fstype nfsd
device fusectl mounted on /sys/fs/fuse/connections with fstype fusectl

^ These NFS mount points are actually identical; the differences above are just my clumsy attempt at scrubbing sensitive data.

What did you expect to see?

A working node_exporter.

What did you see instead?

/opt/prometheus/bin/node_exporter -collectors.enabled conntrack,diskstats,entropy,filefd,filesystem,hwmon,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,textfile,time,uname,vmstat,mountstats -collector.textfile.directory=/var/lib/node_exporter/textfile_collector/ -log.level debug
INFO[0000] Starting node_exporter (version=0.14.0, branch=master, revision=840ba5dcc71a084a3bc63cb6063003c1f94435a6)  source="node_exporter.go:140"
INFO[0000] Build context (go=go1.7.5, user=root@bb6d0678e7f3, date=20170321-12:12:54)  source="node_exporter.go:141"
INFO[0000] Enabled collectors:                           source="node_exporter.go:160"
INFO[0000]  - stat                                       source="node_exporter.go:162"
INFO[0000]  - textfile                                   source="node_exporter.go:162"
INFO[0000]  - conntrack                                  source="node_exporter.go:162"
INFO[0000]  - filefd                                     source="node_exporter.go:162"
INFO[0000]  - loadavg                                    source="node_exporter.go:162"
INFO[0000]  - mdadm                                      source="node_exporter.go:162"
INFO[0000]  - sockstat                                   source="node_exporter.go:162"
INFO[0000]  - time                                       source="node_exporter.go:162"
INFO[0000]  - vmstat                                     source="node_exporter.go:162"
INFO[0000]  - mountstats                                 source="node_exporter.go:162"
INFO[0000]  - diskstats                                  source="node_exporter.go:162"
INFO[0000]  - entropy                                    source="node_exporter.go:162"
INFO[0000]  - hwmon                                      source="node_exporter.go:162"
INFO[0000]  - meminfo                                    source="node_exporter.go:162"
INFO[0000]  - netdev                                     source="node_exporter.go:162"
INFO[0000]  - filesystem                                 source="node_exporter.go:162"
INFO[0000]  - netstat                                    source="node_exporter.go:162"
INFO[0000]  - uname                                      source="node_exporter.go:162"
INFO[0000] Listening on :9100                            source="node_exporter.go:186"
DEBU[0004] OK: uname collector succeeded after 0.000141s.  source="node_exporter.go:98"
DEBU[0004] OK: sockstat collector succeeded after 0.000425s.  source="node_exporter.go:98"
DEBU[0004] Return time: 1493041316.000000                source="time.go:47"
DEBU[0004] OK: time collector succeeded after 0.000642s.  source="node_exporter.go:98"
DEBU[0004] Set node_mem: map[string]float64{"MemTotal":3.381118976e+10, "Unevictable":5.124096e+06, "HugePages_Rsvd":0, "DirectMap2M":3.405774848e+10, "Inactive":7.191379968e+09, "VmallocUsed":3.17784064e+08, "VmallocChunk":3.5184033210368e+13, "DirectMap4k":3.01989888e+08, "AnonPages":1.5124672512e+10, "SReclaimable":3.01338624e+09, "WritebackTmp":0, "Committed_AS":2.1645664256e+10, "SwapCached":3.9936e+06, "Writeback":20480, "SUnreclaim":4.17558528e+08, "NFS_Unstable":0, "Bounce":0, "CommitLimit":4.6357581824e+10, "VmallocTotal":3.5184372087808e+13, "Active_anon":1.4895927296e+10, "Inactive_file":6.945275904e+09, "Slab":3.430944768e+09, "PageTables":4.8771072e+07, "HardwareCorrupted":0, "HugePages_Surp":0, "MemFree":8.70387712e+08, "Mlocked":5.124096e+06, "SwapFree":2.9396918272e+10, "KernelStack":1.052672e+07, "HugePages_Total":0, "Hugepagesize":2.097152e+06, "Buffers":5.10275584e+08, "Dirty":143360, "Mapped":3.25206016e+08, "Shmem":1.5917056e+07, "SwapTotal":2.9451988992e+10, "AnonHugePages":0, "HugePages_Free":0, "Cached":1.3549133824e+10, "Active":2.1991354368e+10, "Inactive_anon":2.46104064e+08, "Active_file":7.095427072e+09}  source="meminfo.go:48"
panic: runtime error: index out of range

goroutine 46 [running]:
panic(0x8b1120, 0xc42000e120)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs.parseNFSTransportStats(0xc42036e020, 0xa, 0xa, 0xc4202d6061, 0x3, 0x39, 0x8, 0x64)
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs/mountstats.go:548 +0x341
github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs.parseMountStatsNFS(0xc42023ec48, 0xc4202d6061, 0x3, 0x9, 0x1, 0x0)
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs/mountstats.go:350 +0x3bf
github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs.parseMountStats(0xb7bdc0, 0xc42027c000, 0xc42027c000, 0x0, 0x0, 0x1, 0xc420268040)
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs/mountstats.go:250 +0x360
github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs.Proc.MountStats(0x40a2, 0x9385f8, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/prometheus/procfs/proc.go:204 +0x157
github.com/prometheus/node_exporter/collector.(*mountStatsCollector).Update(0xc4201c21c0, 0xc42013b5c0, 0xba4e80, 0x0)
        /go/src/github.com/prometheus/node_exporter/collector/mountstats_linux.go:495 +0x6a
main.execute(0x7fff66f0aba3, 0xa, 0xb7a780, 0xc4201c21c0, 0xc42013b5c0)
        /go/src/github.com/prometheus/node_exporter/node_exporter.go:90 +0x9a
main.NodeCollector.Collect.func1(0xc42013b5c0, 0xc4201d3210, 0x7fff66f0aba3, 0xa, 0xb7a780, 0xc4201c21c0)
        /go/src/github.com/prometheus/node_exporter/node_exporter.go:70 +0x53
created by main.NodeCollector.Collect
        /go/src/github.com/prometheus/node_exporter/node_exporter.go:72 +0x120

Meanwhile, on Debian jessie, with this:

/opt/prometheus/bin/node_exporter -collectors.enabled conntrack,diskstats,entropy,filefd,filesystem,hwmon,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,textfile,time,uname,vmstat,mountstats,nfs -collector.textfile.directory=/var/lib/node_exporter/textfile_collector/

it starts fine but silently fails to collect or expose any "node_mountstats_nfs_age_seconds_total" data. With debug enabled, I did see

node_scrape_collector_duration_seconds{collector="mountstats"} 0.00012238600000000002
node_scrape_collector_success{collector="mountstats"} 1

and the nfs collector does not seem to be working either:

curl localhost:9100/metrics | grep -i nfs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 97662  100 97662    0     0  11.0M      0 --:--:-- --:--:-- --:--:-- 11.6M
# HELP node_memory_NFS_Unstable Memory information field NFS_Unstable.
# TYPE node_memory_NFS_Unstable gauge
node_memory_NFS_Unstable 0
# HELP node_nfs_net_connections Number of connections at the network layer.
# TYPE node_nfs_net_connections counter
node_nfs_net_connections{protocol="tcp"} 0
# HELP node_nfs_net_reads Number of reads at the network layer.
# TYPE node_nfs_net_reads counter
node_nfs_net_reads{protocol="tcp"} 0
node_nfs_net_reads{protocol="udp"} 0
# HELP node_nfs_rpc_authentication_refreshes Number of RPC authentication refreshes performed.
# TYPE node_nfs_rpc_authentication_refreshes counter
node_nfs_rpc_authentication_refreshes 0
# HELP node_nfs_rpc_operations Number of RPCs performed.
# TYPE node_nfs_rpc_operations counter
node_nfs_rpc_operations 0
# HELP node_nfs_rpc_retransmissions Number of RPC transmissions performed.
# TYPE node_nfs_rpc_retransmissions counter
node_nfs_rpc_retransmissions 0
node_scrape_collector_duration_seconds{collector="nfs"} 6.355e-05
node_scrape_collector_success{collector="nfs"} 1

On the "broken" wheezy VM:

curl localhost:9100/metrics | grep -i nfs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 99940  100 99940    0     0  9083k      0 --:--:-- --:--:-- --:--:-- 9759k
node_filesystem_avail{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 2.1221605376e+10
node_filesystem_avail{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 2.1200896e+10
node_filesystem_avail{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
node_filesystem_files{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 2.097152e+07
node_filesystem_files{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 2.097152e+07
node_filesystem_files{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
node_filesystem_files_free{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 2.0906611e+07
node_filesystem_files_free{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 2.0970505e+07
node_filesystem_files_free{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
node_filesystem_free{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 2.1221605376e+10
node_filesystem_free{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 2.1200896e+10
node_filesystem_free{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
node_filesystem_readonly{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 0
node_filesystem_readonly{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 0
node_filesystem_readonly{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
node_filesystem_size{device="10.0.0.1:/var/lib/somedata2/data2",fstype="nfs",mountpoint="/var/lib/somedata2/data2"} 2.146435072e+10
node_filesystem_size{device="10.0.0.1:/var/lib/somedata22",fstype="nfs",mountpoint="/var/lib/somedata22"} 2.146435072e+10
node_filesystem_size{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/var/lib/nfs/rpc_pipefs"} 0
# HELP node_memory_NFS_Unstable Memory information field NFS_Unstable.
# TYPE node_memory_NFS_Unstable gauge
node_memory_NFS_Unstable 0

Should I perhaps open two separate issues, describing the wheezy and jessie problems separately?

@SuperQ (Member) commented Apr 24, 2017

Could you attach a file with the raw contents of procfs with the tab characters preserved? I want to compare with the current fixture data we have.

Also, what kernel versions are involved here?

@SuperQ self-assigned this Apr 24, 2017
@SuperQ (Member) commented Apr 24, 2017

Oh, sorry, I see the versions at the top now.

@grobie (Member) commented Apr 24, 2017

@mdlayher That's an issue in the NFS parsing code in procfs.

@mdlayher (Contributor) commented:

Yep. A gist with the full, unmodified output would be great. Probably a different version of the stats that I'm not accounting for.

@grobie (Member) commented Apr 24, 2017

@mdlayher Looking at https://github.com/prometheus/procfs/blob/332f6238064950a97bc3ed3f5421f6418537e707/mountstats.go#L511-L552, I bet he has a file using stats version 1.0 (statvers=1.0), while this function requires version 1.1.
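
For illustration, here is a minimal sketch of that failure mode (a simplified stand-in for the procfs parser, not the actual implementation): a statvers=1.0 "xprt: tcp" line carries 10 numeric fields, a statvers=1.1 line carries 13, so reading a v1.1-only position on v1.0 input blows up with "index out of range".

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // Simplified stand-in for the transport-stats parsing in procfs (illustration only).
    // A statvers=1.0 "xprt: tcp" line has 10 numeric fields; statvers=1.1 has 13.
    func parseXprtTCP(fields []string) {
        ns := make([]uint64, 0, 13) // capacity 13, but the length follows the input
        for _, f := range fields {
            n, _ := strconv.ParseUint(f, 10, 64)
            ns = append(ns, n)
        }
        // Reading a v1.1-only position panics when the input was a v1.0 line.
        fmt.Println(ns[10]) // panic: runtime error: index out of range
    }

    func main() {
        // The 10-field xprt line from the wheezy host's statvers=1.0 mountstats.
        v10 := "0 1 9821 0 0 403452500 403452499 1 659008617 0"
        parseXprtTCP(strings.Fields(v10))
    }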

@grobie
Copy link
Member

grobie commented Apr 24, 2017

@mdlayher IMHO this could be fixed by changing line 527 to ns := make([]uint64, fieldTransport11Len) (initializing a slice with 13 elements, not 0).
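
A sketch of that idea, assuming it is acceptable for the fields a v1.0 line lacks to simply read as zero (this is not necessarily the exact patch that was merged): parse whatever fields are present, then pad the slice up to the v1.1 length before indexing.

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    const fieldTransport11Len = 13 // numeric fields on a statvers=1.1 "xprt: tcp" line

    // parseTransportFields parses the numeric fields of an xprt line and pads the
    // result to the statvers=1.1 length, so v1.1-only indices are safe for v1.0 input.
    func parseTransportFields(fields []string) ([]uint64, error) {
        ns := make([]uint64, 0, fieldTransport11Len)
        for _, f := range fields {
            n, err := strconv.ParseUint(f, 10, 64)
            if err != nil {
                return nil, err
            }
            ns = append(ns, n)
        }
        for len(ns) < fieldTransport11Len {
            ns = append(ns, 0) // fields absent from a v1.0 line default to zero
        }
        return ns, nil
    }

    func main() {
        v10 := "0 1 9821 0 0 403452500 403452499 1 659008617 0" // statvers=1.0: 10 fields
        ns, err := parseTransportFields(strings.Fields(v10))
        if err != nil {
            panic(err)
        }
        fmt.Println(len(ns), ns[10], ns[11], ns[12]) // 13 0 0 0 -- no panic
    }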

@SuperQ (Member) commented Apr 24, 2017

Well, we still need to bump the vendoring, but that's easy.

Thanks @mdlayher / @grobie! 🌮 🎉 🌮

@SuperQ (Member) commented Apr 25, 2017

@Savemech, if you could build and test from master to verify, that would be great.

@Savemech (Author) commented:

@SuperQ It now works perfectly fine; many thanks, guys!
