[bug report] smartctl input plugin reports incorrect device type #15723

Closed
TTTPOB opened this issue Aug 9, 2024 · 0 comments · Fixed by #15724
Labels
bug unexpected problem or unintended behavior

Comments

Contributor

TTTPOB commented Aug 9, 2024

Relevant telegraf.conf

[[inputs.smartctl]]
  use_sudo = true

# not relevant to the problem, but shown here so the Prometheus-style metrics below are not a surprise
[[outputs.prometheus_client]]
  ## Address to listen on.
  ##   ex:
  ##     listen = ":9273"
  ##     listen = "vsock://:9273"
  listen = ":9273"

Logs from Telegraf

Not relevant to the problem, but included anyway:

❯ sudo telegraf --config /etc/telegraf/telegraf.conf --debug
WARN[0000]log.go:244 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null. 
2024-08-09T06:23:57Z I! Loading config: /etc/telegraf/telegraf.conf
2024-08-09T06:23:57Z I! Starting Telegraf 1.31.2 brought to you by InfluxData the makers of InfluxDB
2024-08-09T06:23:57Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-08-09T06:23:57Z I! Loaded inputs: cpu disk diskio kernel mem processes smartctl swap system
2024-08-09T06:23:57Z I! Loaded aggregators: 
2024-08-09T06:23:57Z I! Loaded processors: 
2024-08-09T06:23:57Z I! Loaded secretstores: 
2024-08-09T06:23:57Z I! Loaded outputs: prometheus_client
2024-08-09T06:23:57Z I! Tags enabled: host=nas2-dorm
2024-08-09T06:23:57Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"nas2-dorm", Flush Interval:10s
2024-08-09T06:23:57Z D! [agent] Initializing plugins
2024-08-09T06:23:57Z D! [agent] Connecting outputs
2024-08-09T06:23:57Z D! [agent] Attempting connection to [outputs.prometheus_client]
2024-08-09T06:23:57Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics
2024-08-09T06:23:57Z D! [agent] Successfully connected to outputs.prometheus_client
2024-08-09T06:23:57Z D! [agent] Starting service inputs
## I HIT CTRL C HERE
^C2024-08-09T06:24:02Z D! [agent] Stopping service inputs
2024-08-09T06:24:02Z D! [agent] Input channel closed
2024-08-09T06:24:02Z I! [agent] Hang on, flushing any cached metrics before shutdown
2024-08-09T06:24:02Z D! [outputs.prometheus_client]  Wrote batch of 51 metrics in 4.121802ms
2024-08-09T06:24:02Z D! [outputs.prometheus_client]  Buffer fullness: 0 / 10000 metrics
2024-08-09T06:24:02Z I! [agent] Stopping running outputs
2024-08-09T06:24:02Z D! [agent] Stopped Successfully

System info

Telegraf 1.31.2 (git: HEAD@854ac83b), debian 12

Docker

No response

Steps to reproduce

  1. Add the smartctl input plugin to the config.
  2. Run the telegraf service.
  3. See the incorrect device type in the exposed metrics.

Expected behavior

I have six disks attached: one is NVMe and the other five are SATA.
I expect to get:

# HELP smartctl_temperature Telegraf collected metric
# TYPE smartctl_temperature untyped
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sda",serial="BTTV33240xxx",type="ata"} 34
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdb",serial="WD-WX32D80xxxS",type="ata"} 34
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdc",serial="WD-WX32D8xxxF",type="ata"} 34
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdd",serial="WD-WX32D80DxxxR",type="ata"} 34
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sde",serial="WD-WX32D80Exxx",type="ata"} 34
smartctl_temperature{firmware="10604103",host="nas2-dorm",model="KXG60ZNV256G NVMe TOSHIBA 256GB",name="/dev/nvme0",serial="39UF72xxx",type="nvme"} 50

Actual behavior

Instead I got:

# HELP smartctl_temperature Telegraf collected metric
# TYPE smartctl_temperature untyped
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sda",serial="BTTV33240DHCxxx",type="scsi"} 0
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdb",serial="WD-WX32Dxxx",type="scsi"} 0
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdc",serial="WD-WX32DxxxF",type="scsi"} 0
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sdd",serial="WD-WX32Dxxx",type="scsi"} 0
smartctl_temperature{firmware="",host="nas2-dorm",model="",name="/dev/sde",serial="WD-WX32D8xxxxx",type="scsi"} 0
smartctl_temperature{firmware="10604103",host="nas2-dorm",model="KXG60ZNV256G NVMe TOSHIBA 256GB",name="/dev/nvme0",serial="39UF727xxx",type="nvme"} 50

Note the incorrect device type (scsi instead of ata) and the incorrect temperature (0 instead of 34) for the five SATA disks.

Additional info

As instructed in the smartctl input plugin folder, I checked:

> sudo smartctl --json --scan
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      3
    ],
    "svn_revision": "5338",
    "platform_info": "x86_64-linux-6.1.0-21-amd64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--json",
      "--scan"
    ],
    "exit_status": 0
  },
  "devices": [
    {
      "name": "/dev/sda",
      "info_name": "/dev/sda",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/sdb",
      "info_name": "/dev/sdb",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/sdc",
      "info_name": "/dev/sdc",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/sdd",
      "info_name": "/dev/sdd",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/sde",
      "info_name": "/dev/sde",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/nvme0",
      "info_name": "/dev/nvme0",
      "type": "nvme",
      "protocol": "NVMe"
    }
  ]
}

and

❯ sudo smartctl --json --all /dev/sda --device scsi
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      3
    ],
    "svn_revision": "5338",
    "platform_info": "x86_64-linux-6.1.0-21-amd64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--json",
      "--all",
      "/dev/sda",
      "--device",
      "scsi"
    ],
    "exit_status": 4
  },
  "local_time": {
    "time_t": 1723184929,
    "asctime": "Fri Aug  9 14:28:49 2024 CST"
  },
  "device": {
    "name": "/dev/sda",
    "info_name": "/dev/sda",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "user_capacity": {
    "blocks": 390721968,
    "bytes": 200049647616
  },
  "logical_block_size": 512,
  "scsi_lb_provisioning": {
    "name": "fully provisioned",
    "value": 0,
    "management_enabled": {
      "name": "LBPME",
      "value": -1
    },
    "read_zeros": {
      "name": "LBPRZ",
      "value": 0
    }
  },
  "rotation_rate": 0,
  "form_factor": {
    "scsi_value": 3,
    "name": "2.5 inches"
  },
  "logical_unit_id": "0x55cd2e404b43xxxx",
  "serial_number": "BTTV33240DHxxxx",
  "device_type": {
    "scsi_terminology": "Peripheral Device Type [PDT]",
    "scsi_value": 0,
    "name": "disk"
  },
  "smart_support": {
    "available": false
  },
  "temperature": {
    "current": 0
  },
  "scsi_temperature": {
    "drive_trip": 0
  }
}
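
For reference, the exit_status of 4 in the output above is a bitmask. A minimal Python sketch of decoding it, assuming the bit meanings listed in the EXIT STATUS section of the smartctl man page (decode_exit_status and the shortened descriptions are illustrative names, not part of smartctl or Telegraf):

```python
# Bit meanings paraphrased from the smartctl man page (EXIT STATUS section).
EXIT_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or other ATA command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(status):
    """Return the description of every bit set in a smartctl exit status."""
    return [text for bit, text in EXIT_STATUS_BITS.items() if status & (1 << bit)]


# Trimmed copy of the /dev/sda report shown above.
report = {
    "smartctl": {"exit_status": 4},
    "device": {"name": "/dev/sda", "type": "scsi"},
    "smart_support": {"available": False},
    "temperature": {"current": 0},
}

problems = decode_exit_status(report["smartctl"]["exit_status"])
smart_available = report["smart_support"]["available"]
print(report["device"]["name"], problems, "SMART available:", smart_available)
```

Only bit 2 is set here. Together with smart_support.available being false, that is consistent with the disk being queried with the wrong device type, although it does not prove it on its own.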

Indeed, the smartctl output is wrong. A quick Google search turned up https://www.smartmontools.org/ticket/811; this old ticket suggests using --scan-open instead of --scan to get the right device type.

Here is the output; note that the device type is correct now.

sudo smartctl --json --scan-open
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      3
    ],
    "svn_revision": "5338",
    "platform_info": "x86_64-linux-6.1.0-21-amd64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--json",
      "--scan-open"
    ],
    "exit_status": 0
  },
  "devices": [
    {
      "name": "/dev/sda",
      "info_name": "/dev/sda [SAT]",
      "type": "sat",
      "protocol": "ATA"
    },
    {
      "name": "/dev/sdb",
      "info_name": "/dev/sdb [SAT]",
      "type": "sat",
      "protocol": "ATA"
    },
    {
      "name": "/dev/sdc",
      "info_name": "/dev/sdc [SAT]",
      "type": "sat",
      "protocol": "ATA"
    },
    {
      "name": "/dev/sdd",
      "info_name": "/dev/sdd [SAT]",
      "type": "sat",
      "protocol": "ATA"
    },
    {
      "name": "/dev/sde",
      "info_name": "/dev/sde [SAT]",
      "type": "sat",
      "protocol": "ATA"
    },
    {
      "name": "/dev/nvme0",
      "info_name": "/dev/nvme0",
      "type": "nvme",
      "protocol": "NVMe"
    }
  ]
}
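
To illustrate why the scan result matters: the per-device call shown earlier passes the scanned type through as --device. A minimal Python sketch of that flow, assuming the per-device command is built directly from the scan output (this is an assumption about the plugin's behavior, not confirmed here; build_commands is a hypothetical helper and the JSON is trimmed from the outputs above):

```python
import json

# Trimmed from the --scan and --scan-open outputs above.
SCAN = '{"devices": [{"name": "/dev/sda", "type": "scsi"}, {"name": "/dev/nvme0", "type": "nvme"}]}'
SCAN_OPEN = '{"devices": [{"name": "/dev/sda", "type": "sat"}, {"name": "/dev/nvme0", "type": "nvme"}]}'


def build_commands(scan_json):
    """Turn smartctl scan output into one argv list per device."""
    devices = json.loads(scan_json)["devices"]
    return [
        ["smartctl", "--json", "--all", dev["name"], "--device", dev["type"]]
        for dev in devices
    ]


for argv in build_commands(SCAN):
    print("--scan:     ", " ".join(argv))
for argv in build_commands(SCAN_OPEN):
    print("--scan-open:", " ".join(argv))
```

With the --scan input, /dev/sda is queried with --device scsi, as in the failing call above. With the --scan-open input it is queried with --device sat. The NVMe command is identical in both cases.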
@TTTPOB TTTPOB added the bug unexpected problem or unintended behavior label Aug 9, 2024