Add host inventory metrics to system module #20415

Merged
kaiyan-sheng merged 18 commits into elastic:master from add_host_data_system on Sep 21, 2020

Conversation

@kaiyan-sheng (Contributor) commented Aug 3, 2020

What does this PR do?

This PR adds the proposed host common fields to the system module:

  • host.id
  • host.name
  • host.cpu.pct
  • host.network.in.bytes
  • host.network.in.packets
  • host.network.out.bytes
  • host.network.out.packets
  • host.disk.read.bytes
  • host.disk.write.bytes

For network and disk metrics, instead of reporting one event per interface, host.network.* and host.disk.* aggregate the values across all interfaces (or, for disk, all devices).
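
The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the module's code: the ifaceCounters type, its field names, and sumInterfaces are hypothetical names chosen for the example.

```go
package main

import "fmt"

// ifaceCounters is a hypothetical per-interface sample. The field names are
// illustrative and are not the types used by the system module.
type ifaceCounters struct {
	Name       string
	BytesIn    uint64
	BytesOut   uint64
	PacketsIn  uint64
	PacketsOut uint64
}

// sumInterfaces adds up the counters of every interface into one host total.
func sumInterfaces(ifaces []ifaceCounters) ifaceCounters {
	total := ifaceCounters{Name: "host"}
	for _, i := range ifaces {
		total.BytesIn += i.BytesIn
		total.BytesOut += i.BytesOut
		total.PacketsIn += i.PacketsIn
		total.PacketsOut += i.PacketsOut
	}
	return total
}

func main() {
	samples := []ifaceCounters{
		{Name: "en0", BytesIn: 1000, BytesOut: 400, PacketsIn: 10, PacketsOut: 4},
		{Name: "lo0", BytesIn: 200, BytesOut: 200, PacketsIn: 2, PacketsOut: 2},
	}
	// One aggregated record instead of one record per interface.
	fmt.Printf("%+v\n", sumInterfaces(samples))
}
```

The same shape would apply to disk read and write bytes, summed over devices instead of interfaces.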

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Update the system module config system.yml with:

- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - network
  cpu.metrics: [normalized_percentages]

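Then start Metricbeat with ./metricbeat -e. The host common fields listed above should appear in the events. Note that the host.disk.* fields come from the diskio metricset (see the disk example below), so add diskio to the metricsets list to see them.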

Event Example

CPU metric:

{
  "_source": {
    "event": {
      "duration": 88523,
      "dataset": "system.cpu",
      "module": "system"
    },
    "metricset": {
      "name": "cpu",
      "period": 10000
    },
    "service": {
      "type": "system"
    },
    "system": {
      "cpu": {
        "user": {
          "pct": 0.7653,
          "norm": {
            "pct": 0.0638
          }
        },
        "steal": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "cores": 12,
        "system": {
          "pct": 0.2961,
          "norm": {
            "pct": 0.0247
          }
        },
        "iowait": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "nice": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "total": {
          "norm": {
            "pct": 0.0884
          },
          "pct": 1.0614
        },
        "idle": {
          "pct": 10.9386,
          "norm": {
            "pct": 0.9116
          }
        },
        "irq": {
          "norm": {
            "pct": 0
          },
          "pct": 0
        },
        "softirq": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        }
      }
    },
    "host": {
      "cpu": {
        "pct": 0.0884
      },
      "name": "KaiyanMacBookPro",
      "hostname": "KaiyanMacBookPro",
      "architecture": "x86_64"
    }
  }
}
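
In the CPU event above, host.cpu.pct carries the same value as system.cpu.total.norm.pct, and the norm values appear to be the raw percentages divided by the core count (1.0614 / 12 is about 0.0884). The sketch below checks that arithmetic. The normalizePct and closeTo helpers are hypothetical and only illustrate the relationship; they are not taken from the module.

```go
package main

import (
	"fmt"
	"math"
)

// normalizePct divides a summed CPU percentage by the core count.
// Assumption: this mirrors how the *.norm.pct values in the example relate
// to the *.pct values; it is not copied from the module's code.
func normalizePct(pct float64, cores int) (float64, error) {
	if cores <= 0 {
		return 0, fmt.Errorf("cores must be positive, got %d", cores)
	}
	return pct / float64(cores), nil
}

// closeTo reports whether a and b differ by at most tol.
func closeTo(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	// system.cpu.total.pct and system.cpu.cores from the example event.
	norm, err := normalizePct(1.0614, 12)
	if err != nil {
		panic(err)
	}
	// The example reports 0.0884 for total.norm.pct and host.cpu.pct.
	fmt.Println(closeTo(norm, 0.0884, 0.0001))
}
```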

Network metrics:

{
  "_source": {
    "metricset": {
      "name": "network",
      "period": 60000
    },
    "service": {
      "type": "system"
    },
    "host.network.out.bytes": 26123693254,
    "host.network.out.packets": 134348248,
    "event": {
      "dataset": "system.network",
      "module": "system",
      "duration": 6234971
    },
    "host.network.in.bytes": 74833783286,
    "host.network.in.packets": 139823676,
    "host": {
      "name": "KaiyanMacBookPro",
      "hostname": "KaiyanMacBookPro",
      "architecture": "x86_64",
      "id": "9C7FAB7B-29D1-5926-8E84-158A9CA3E25D"
    }
  }
}

Disk metrics:

{
  "_source": {
    "host.disk.read.bytes": 535049839616,
    "host.disk.write.bytes": 782890594304,
    "event": {
      "dataset": "system.diskio",
      "module": "system",
      "duration": 606341
    },
    "metricset": {
      "name": "diskio",
      "period": 10000
    },
    "service": {
      "type": "system"
    },
    "host": {
      "name": "KaiyanMacBookPro",
      "architecture": "x86_64",
      "hostname": "KaiyanMacBookPro"
    }
  }
}

Related issues

@botelastic (bot) added the needs_team label Aug 3, 2020
@kaiyan-sheng self-assigned this Aug 3, 2020
@kaiyan-sheng added the in progress, needs_backport, and Team:Platforms labels Aug 3, 2020
@elasticmachine (Collaborator)

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic (bot) removed the needs_team label Aug 3, 2020
@elasticmachine (Collaborator) commented Aug 4, 2020

💚 Build Succeeded


Build stats

  • Build Cause: [Pull request #20415 updated]

  • Start Time: 2020-09-21T16:18:54.458+0000

  • Duration: 64 min 55 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 3212
  • Skipped: 706
  • Total: 3918

@kaiyan-sheng added the review label and removed the in progress label Aug 10, 2020
@jsoriano (Member) left a comment

Could you update the data.json files?

metricbeat/module/system/cpu/cpu.go (resolved)
metricbeat/module/system/diskio/diskio.go (outdated, resolved)
metricbeat/module/system/network/network.go (outdated, resolved)
@fearful-symmetry (Contributor)

The only thing I'm kinda worried about here is how conceptually similar metrics (for example, network.in) are turned into gauges from counters. That seems like a potential point of confusion, but if we're treating the host.* fields as basically artifacts for other components of the stack, I don't think it'll be a huge issue.

@@ -95,6 +95,7 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) error {
  event.Put("softirq.norm.pct", normalizedPct.SoftIRQ)
  event.Put("steal.norm.pct", normalizedPct.Steal)
  event.Put("total.norm.pct", normalizedPct.Total)
+ hostFields.Put("host.cpu.pct", normalizedPct.Total)
Contributor

All the other metricsets this PR adds host data to just construct a MapStr in Event(). I wonder if we want to do that here instead, for the sake of consistency?
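
For readers unfamiliar with the dotted-key Put calls in the diff, the sketch below shows one way such a call can build nested fields. The put helper and the plain map type are hypothetical stand-ins written for this example; they are not libbeat's MapStr implementation, and "example-host" is a made-up value.

```go
package main

import (
	"fmt"
	"strings"
)

// put stores value under a dotted key, creating intermediate maps as needed.
// It is a hypothetical stand-in for illustration, not libbeat's MapStr.Put.
func put(m map[string]interface{}, key string, value interface{}) {
	parts := strings.Split(key, ".")
	for _, p := range parts[:len(parts)-1] {
		child, ok := m[p].(map[string]interface{})
		if !ok {
			// Missing or non-map entry: start a fresh nested map.
			child = map[string]interface{}{}
			m[p] = child
		}
		m = child
	}
	m[parts[len(parts)-1]] = value
}

func main() {
	hostFields := map[string]interface{}{}
	put(hostFields, "host.cpu.pct", 0.0884)
	put(hostFields, "host.name", "example-host")
	fmt.Println(hostFields)
}
```

Building the whole host map in one place, as the comment suggests, would replace the individual put calls with a single literal.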

@kaiyan-sheng (Contributor, Author)

@fearful-symmetry Yes, these metrics will be common across all hosts/VMs, such as AWS EC2 and Azure compute VMs. Those sources don't report counters, so to keep everything consistent, and for display in the Metrics UI, we decided to use gauges for the host.* fields. We will definitely keep all the counters in the system module, because they are the raw data and are more accurate when data loss occurs.

@fearful-symmetry (Contributor)

Looks Good. Also +1 on updating the data.json files.

@kaiyan-sheng (Contributor, Author)

@fearful-symmetry Thank you! Will do (maybe early next week). I will ping you and Jaime for review again after I update the data.json files.

@kaiyan-sheng (Contributor, Author) left a comment

data.json files updated. The disk and network host metrics do not show up in data.json. That's because they are gauges calculated from counters, so more than one sample is needed before a value can be computed.
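
The point about needing more than one sample can be illustrated with a small sketch. This is an assumption about the general counter-to-gauge technique (the delta between consecutive samples), not the code in this PR; gaugeTracker and Update are hypothetical names.

```go
package main

import "fmt"

// gaugeTracker is a hypothetical helper that turns a monotonically
// increasing counter into a per-period gauge (the delta between samples).
type gaugeTracker struct {
	prev    uint64
	hasPrev bool
}

// Update records the latest counter value and returns the delta since the
// previous sample. ok is false when no gauge can be computed: on the first
// sample, or when the counter went backwards (for example after a reset).
func (g *gaugeTracker) Update(current uint64) (delta uint64, ok bool) {
	prev, hadPrev := g.prev, g.hasPrev
	g.prev, g.hasPrev = current, true
	if !hadPrev || current < prev {
		return 0, false
	}
	return current - prev, true
}

func main() {
	var tracker gaugeTracker
	for _, counter := range []uint64{1000, 1500, 1800} {
		if d, ok := tracker.Update(counter); ok {
			fmt.Println("gauge:", d)
		} else {
			fmt.Println("no gauge yet")
		}
	}
}
```

A single fetch, as used to generate data.json, only ever reaches the "no gauge yet" branch, which is consistent with the fields being absent there.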

@fearful-symmetry (Contributor) left a comment

LGTM.

@kaiyan-sheng added the test-plan label Sep 21, 2020
@kaiyan-sheng merged commit 4a27562 into elastic:master Sep 21, 2020
@kaiyan-sheng deleted the add_host_data_system branch September 21, 2020 21:34
@kaiyan-sheng added the v7.10.0 label and removed the needs_backport label Sep 21, 2020
kaiyan-sheng added a commit that referenced this pull request Sep 22, 2020
* Add host inventory metrics to system module
* Update data.json files

(cherry picked from commit 4a27562)
v1v added a commit to v1v/beats that referenced this pull request Sep 24, 2020
…ne-2.0-arm

* upstream/master: (29 commits)
  Fix librpm installation in auditbeat build (elastic#21239)
  Fix prometheus default config (elastic#21253)
  Fix dev guide test command (elastic#21254)
  Move aws lambda metricset to GA (elastic#21255)
  [Docs] Typo in table syntax (elastic#20227)
  [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160)
  Add recursive split to httpjson (elastic#21214)
  [DOCS] Add beat specific start widgets (elastic#21217)
  Fix timestamp handling in remote_write (elastic#21166)
  Fix aws, azure and googlecloud compute dashboards (elastic#21098)
  Add acceptable event log keys to winlog (elastic#21205)
  Add elastic-agent to gitignore (elastic#21219)
  Add cloudfoundry tags to events (elastic#21177)
  [Ingest Manager] Agent includes pgp file (elastic#19480)
  Add compatibility note about ingress-controller-v0.34.1 (elastic#21209)
  [Ingest Manager] Support for UPGRADE_ACTION (elastic#21002)
  Fix libbeat.output.*.bytes metrics of Elasticsearch output (elastic#21197)
  [packaging] use docker.elastic.co/ubi8/ubi-minimal (elastic#21154)
  Add host inventory metrics to system module (elastic#20415)
  [Filebeat][Azure Module] Fixing event.outcome from result_type issue (elastic#20998)
  ...
@andresrc added the test-plan-added label Oct 3, 2020
Labels: review, Team:Platforms, test-plan, test-plan-added, v7.10.0

5 participants