
Linux system measurement is split into two lines #2444

Closed
markuskont opened this issue Feb 20, 2017 · 4 comments

@markuskont

markuskont commented Feb 20, 2017

Bug report

Hey. For some reason, the system measurement (load1, load5, load15, uptime, etc.) is sent by Telegraf as two distinct lines:

system,host=TICKAlerta load1=0,load5=0.03,load15=0.03,n_users=1i,n_cpus=2i 1487579590000000000
system,host=TICKAlerta uptime_format=" 0:13",uptime=807i 1487579590000000000

This essentially breaks stream processing of the system measurement in Kapacitor, because the relevant field is missing 50% of the time:

E! error evaluating expression for level CRITICAL: no field or tag exists for load1

This can be verified by looking at the data points as seen by Kapacitor:

{
  "Name": "system",
  "Database": "telegraf",
  "RetentionPolicy": "default",
  "Group": "host=host1.ex",
  "Dimensions": {
    "ByName": false,
    "TagNames": [
      "host"
    ]
  },
  "Tags": {
    "environment": "offsite",
    "host": "host1.ex",
    "osname": "Ubuntu",
    "virtual": "physical"
  },
  "Fields": {
    "load1": 0,
    "load15": 0.05,
    "load5": 0.01,
    "n_cpus": 4,
    "n_users": 0
  },
  "Time": "2017-02-14T12:38:40Z"
}
{
  "Name": "system",
  "Database": "telegraf",
  "RetentionPolicy": "default",
  "Group": "host=host1.ex",
  "Dimensions": {
    "ByName": false,
    "TagNames": [
      "host"
    ]
  },
  "Tags": {
    "environment": "offsite",
    "host": "host1.ex",
    "osname": "Ubuntu",
    "virtual": "physical"
  },
  "Fields": {
    "uptime": 5278035,
    "uptime_format": "61 days,  2:07"
  },
  "Time": "2017-02-14T12:38:40Z"
}
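To make the failure mode concrete, here is a minimal Go sketch (not Kapacitor's code) that evaluates a field reference against each incoming point independently, the way a stream task sees them. The `Point` type and `lookupField` helper are hypothetical; the error text only mirrors the log line quoted above.

```go
package main

import "fmt"

// Point is a hypothetical stand-in for one data point as a stream consumer sees it.
type Point struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
	Time   int64
}

// lookupField mimics evaluating a bare name against a single point:
// fields are checked first, then tags, and a missing name is an error.
func lookupField(p Point, name string) (interface{}, error) {
	if v, ok := p.Fields[name]; ok {
		return v, nil
	}
	if v, ok := p.Tags[name]; ok {
		return v, nil
	}
	return nil, fmt.Errorf("no field or tag exists for %s", name)
}

func main() {
	tags := map[string]string{"host": "TICKAlerta"}
	const ts = int64(1487579590000000000)
	// The two lines Telegraf emitted for one collection interval.
	points := []Point{
		{"system", tags, map[string]interface{}{"load1": 0.0, "load5": 0.03, "load15": 0.03, "n_users": int64(1), "n_cpus": int64(2)}, ts},
		{"system", tags, map[string]interface{}{"uptime_format": " 0:13", "uptime": int64(807)}, ts},
	}
	failures := 0
	for _, p := range points {
		if _, err := lookupField(p, "load1"); err != nil {
			fmt.Println("E!", err)
			failures++
		}
	}
	fmt.Printf("%d of %d points failed\n", failures, len(points))
}
```

Because each point is evaluated on its own, the uptime half of every interval fails the lookup, which matches the "missing 50% of the time" observation.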

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true

  metric_batch_size = 1000

  metric_buffer_limit = 10000

  collection_jitter = "0s"

  flush_interval = "10s"
  flush_jitter = "0s"

  precision = ""

  debug = false
  quiet = false
  logfile = ""

  hostname = ""
  omit_hostname = false

[[outputs.file]]
  files = ["stdout", "/tmp/metrics.out"]

  data_format = "influx"

[[inputs.system]]
  # no configuration

System info:

As far as I know, this is present in Telegraf versions 1.0, 1.1, and 1.2. Tested on Ubuntu and Debian LTS versions (precise, trusty, xenial, jessie).

Steps to reproduce:

Telegraf

Use the included telegraf config file.

telegraf --config telegraf.conf --debug
cat /tmp/metrics.out

Kapacitor

var warn_threshold = 4
var crit_threshold = 10

var period = 1h
var every = 1m

var data = stream
  |from()
    .database('telegraf')
    .retentionPolicy('default')
    .measurement('system')
    .groupBy('host')
  |log()
  |window()
    .period(period)
    .every(every)
  |last('load1')
    .as('stat')

Then look for the error in the Kapacitor log:

grep load1 /var/log/kapacitor/kapacitor.log

Expected behavior:

A single line for the system measurement.

Actual behavior:

Two distinct lines for the system measurement.

@sparrc
Contributor

sparrc commented Feb 20, 2017

This is not the only place where metrics can be split across multiple lines. This is valid InfluxDB line protocol and it makes no difference to the database, so I'm going to close this as "won't fix".
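Why it makes no difference to the database: InfluxDB treats points with the same measurement, tag set, and timestamp as one point, and the stored field set becomes the union of the writes, with the later write winning on a conflicting field. A simplified Go sketch of that merge applied to the two lines from the report follows. `splitUnquoted`, `parseLine`, and `mergePoints` are hypothetical helpers, and the parser deliberately ignores line-protocol escaping; it only handles double-quoted string values such as `uptime_format`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitUnquoted splits s on sep, ignoring separators inside double quotes.
// Simplified: backslash escapes from the full line-protocol spec are not handled.
func splitUnquoted(s string, sep rune) []string {
	var parts []string
	var cur strings.Builder
	inQuotes := false
	for _, r := range s {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			cur.WriteRune(r)
		case r == sep && !inQuotes:
			parts = append(parts, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	return append(parts, cur.String())
}

// parseLine assumes a well-formed "series fields timestamp" line and returns
// the series key (measurement plus tags), raw field values, and the timestamp.
func parseLine(line string) (series string, fields map[string]string, ts string) {
	sections := splitUnquoted(line, ' ')
	series, ts = sections[0], sections[2]
	fields = map[string]string{}
	for _, kv := range splitUnquoted(sections[1], ',') {
		i := strings.IndexByte(kv, '=')
		fields[kv[:i]] = kv[i+1:]
	}
	return series, fields, ts
}

// mergePoints unions the fields of lines sharing series key and timestamp;
// a later line wins on conflicting field names.
func mergePoints(lines []string) map[string]map[string]string {
	merged := map[string]map[string]string{}
	for _, line := range lines {
		series, fields, ts := parseLine(line)
		key := series + " " + ts
		if merged[key] == nil {
			merged[key] = map[string]string{}
		}
		for k, v := range fields {
			merged[key][k] = v
		}
	}
	return merged
}

func main() {
	lines := []string{
		`system,host=TICKAlerta load1=0,load5=0.03,load15=0.03,n_users=1i,n_cpus=2i 1487579590000000000`,
		`system,host=TICKAlerta uptime_format=" 0:13",uptime=807i 1487579590000000000`,
	}
	for key, fields := range mergePoints(lines) {
		names := make([]string, 0, len(fields))
		for k := range fields {
			names = append(names, k)
		}
		sort.Strings(names)
		fmt.Println(key, names)
	}
}
```

Running it prints a single key with all seven fields, which is what a query against the stored data sees.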

The reason for the split is that the metrics are of two different types, which can be represented and processed differently. The types are also exposed on the Prometheus exporter endpoint.
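A sketch of why per-metric types force a split: if a metric carries one type for all of its fields, fields of different types have to travel as separate metrics. Line protocol has no notion of type, so both metrics serialize to the same series and timestamp, while a Prometheus-style exposition does use the type. This assumes the load fields are a gauge and uptime is a counter; the `Metric` type and both render functions are hypothetical, not Telegraf's API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ValueType is a hypothetical per-metric type tag.
type ValueType string

const (
	Gauge   ValueType = "gauge"
	Counter ValueType = "counter"
)

// Metric carries a single type for all of its fields, so fields of different
// types must live in separate Metric values.
type Metric struct {
	Name   string
	Tags   string // pre-rendered tag set, e.g. "host=TICKAlerta"
	Type   ValueType
	Fields map[string]string // pre-rendered line-protocol values
	Time   int64
}

func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// toLineProtocol drops the type: both metrics become lines of the same series.
func toLineProtocol(m Metric) string {
	parts := make([]string, 0, len(m.Fields))
	for _, k := range sortedKeys(m.Fields) {
		parts = append(parts, k+"="+m.Fields[k])
	}
	return fmt.Sprintf("%s,%s %s %d", m.Name, m.Tags, strings.Join(parts, ","), m.Time)
}

// toTypeLines renders Prometheus-style TYPE annotations, which need the type.
func toTypeLines(m Metric) []string {
	var out []string
	for _, k := range sortedKeys(m.Fields) {
		out = append(out, fmt.Sprintf("# TYPE %s_%s %s", m.Name, k, m.Type))
	}
	return out
}

func main() {
	const ts = 1487579590000000000
	metrics := []Metric{
		{"system", "host=TICKAlerta", Gauge, map[string]string{"load1": "0", "load5": "0.03", "load15": "0.03"}, ts},
		{"system", "host=TICKAlerta", Counter, map[string]string{"uptime": "807i"}, ts},
	}
	for _, m := range metrics {
		fmt.Println(toLineProtocol(m))
	}
	for _, m := range metrics {
		for _, l := range toTypeLines(m) {
			fmt.Println(l)
		}
	}
}
```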

@desa
Contributor

desa commented Feb 22, 2017

@sparrc this issue needs to be fixed somewhere. Since the metrics are split into two lines, InfluxDB forwards them to Kapacitor as two separate points, and therefore it's possible to write a TICKscript that will perpetually fail.

I've opened an issue on Kapacitor for the time being.

@sparrc
Contributor

sparrc commented Feb 22, 2017

IMO it should be fixed in Kapacitor then.

Metrics can be split for any number of reasons, including by the UDP client itself in order to split metrics with many fields into multiple packets. Telegraf doesn't have any guarantee that all fields for a single measurement will arrive on a single line-protocol line and the InfluxDB client doesn't guarantee that either.
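To illustrate how a client might split one wide point into several lines, here is a hypothetical Go splitter (not the InfluxDB client's code) that starts a new line whenever the next field would push the line past a byte limit. Every line repeats the series key and timestamp, which is what lets the database merge them again. The 64-byte limit is chosen only to force a split on the system fields.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPoint renders one point as one or more line-protocol lines, starting a
// new line when adding the next field would exceed maxLen bytes. A single
// field that alone exceeds maxLen still gets its own (oversized) line.
func splitPoint(series string, fields []string, ts int64, maxLen int) []string {
	suffix := fmt.Sprintf(" %d", ts)
	overhead := len(series) + 1 + len(suffix) // "series " + fields + " ts"
	var lines []string
	var cur []string
	curLen := 0 // bytes used by the comma-joined fields in cur
	flush := func() {
		lines = append(lines, series+" "+strings.Join(cur, ",")+suffix)
		cur, curLen = nil, 0
	}
	for _, f := range fields {
		if len(cur) > 0 && overhead+curLen+1+len(f) > maxLen {
			flush()
		}
		if len(cur) > 0 {
			curLen++ // comma separator
		}
		cur = append(cur, f)
		curLen += len(f)
	}
	if len(cur) > 0 {
		flush()
	}
	return lines
}

func main() {
	fields := []string{"load1=0", "load5=0.03", "load15=0.03", "n_users=1i", "n_cpus=2i"}
	for _, l := range splitPoint("system,host=TICKAlerta", fields, 1487579590000000000, 64) {
		fmt.Printf("%d bytes: %s\n", len(l), l)
	}
}
```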

IMO this is a good design decision from InfluxDB's perspective and I think Kapacitor should do the same.
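One way a stream consumer could handle split points the way InfluxDB does is to merge them before evaluating expressions. Below is a minimal Go sketch of such a buffer; `Merger`, `NewMerger`, `Add`, and `Flush` are hypothetical names, not a Kapacitor API, and the design assumes points of one series arrive in timestamp order.

```go
package main

import "fmt"

// Point is a hypothetical stream point; Series is measurement plus tag set.
type Point struct {
	Series string
	Time   int64
	Fields map[string]interface{}
}

// Merger buffers the latest point per series and unions the fields of points
// that share series and timestamp. A buffered point is emitted only when a
// point with a different timestamp arrives for that series, or on Flush.
type Merger struct {
	pending map[string]*Point
	emit    func(Point)
}

func NewMerger(emit func(Point)) *Merger {
	return &Merger{pending: map[string]*Point{}, emit: emit}
}

// Add merges p into the buffered point for its series, or emits the buffered
// point and starts a new one if the timestamp changed.
func (m *Merger) Add(p Point) {
	if cur, ok := m.pending[p.Series]; ok {
		if cur.Time == p.Time {
			for k, v := range p.Fields {
				cur.Fields[k] = v
			}
			return
		}
		m.emit(*cur)
	}
	// Copy the fields so later merges never mutate the caller's map.
	fields := make(map[string]interface{}, len(p.Fields))
	for k, v := range p.Fields {
		fields[k] = v
	}
	m.pending[p.Series] = &Point{Series: p.Series, Time: p.Time, Fields: fields}
}

// Flush emits every buffered point, e.g. on shutdown or from a timer.
func (m *Merger) Flush() {
	for s, p := range m.pending {
		m.emit(*p)
		delete(m.pending, s)
	}
}

func main() {
	m := NewMerger(func(p Point) {
		fmt.Println(p.Series, p.Time, p.Fields["load1"], p.Fields["uptime"])
	})
	s := "system,host=TICKAlerta"
	m.Add(Point{s, 1487579590000000000, map[string]interface{}{"load1": 0.0, "load5": 0.03}})
	m.Add(Point{s, 1487579590000000000, map[string]interface{}{"uptime": int64(807)}})
	// The next interval's point releases the merged point for the previous one.
	m.Add(Point{s, 1487579600000000000, map[string]interface{}{"load1": 0.1}})
	m.Flush()
}
```

The trade-off is latency: a merged point is released one interval late (or at the next flush), in exchange for never evaluating an expression against half of a point.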

@desa
Contributor

desa commented Feb 22, 2017

Yeah I think I agree.
