[Stack Monitoring] compatibility for agent data streams #119112

neptunian · 2021-11-18T21:39:59Z

Updates queries to include metrics-{dataset}-{namespace} for use with Agent and integrations. These changes don't require testing the actual packages. The main goal is to review the query changes and make sure existing functionality does not break, because nearly every query has been modified. If you find issues loading the app when using the Agent with packages, please add them to the separate per-package issues listed here: https://github.com/elastic/observability-dev/issues/1660

- Removes monitoring.ui.metricbeat.index and all references to it.
- Adds data stream index patterns alongside the existing legacy pattern, and removes metricbeat-*, in a new function called getNewIndexPatterns (to replace getIndexPatterns).
- Adds the constant keyword data_stream.dataset to the filter when targeting a particular dataset.
- Comments out all the "-mb" functional and API integration tests.
- Creates the index pattern where the query is built/called instead of passing it down from the API handler.
- Does not modify the filebeat-* patterns to include logs-*. This will be done in a separate PR.
- Does not handle migration from legacy to Metricbeat monitoring. The UI will still ask you to use Metricbeat and check for it. I opened a separate issue: [Stack Monitoring] user stories for migration to Agent #120414
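
To make the index pattern change concrete, here is a minimal sketch. It is not the getNewIndexPatterns implementation from this PR: the helper name `combinePatterns` and its parameters are hypothetical. It only assumes, based on the example queries in this description, that with CCS enabled every legacy and data stream pattern is listed once with a `*:` remote prefix and once without.

```typescript
// Hypothetical helper for illustration only; not the code in this PR.
// Assumption: with CCS enabled, every pattern is listed with a "*:" remote
// prefix first, followed by the same patterns for the local cluster.
function combinePatterns(
  legacyPatterns: string[],
  dataStreamPattern: string,
  ccsEnabled: boolean
): string {
  const local = [...legacyPatterns, dataStreamPattern];
  if (!ccsEnabled) {
    return local.join(',');
  }
  const remote = local.map((pattern) => `*:${pattern}`);
  return [...remote, ...local].join(',');
}

const clusterStatsIndex = combinePatterns(
  ['.monitoring-es-6-*', '.monitoring-es-7-*'],
  'metrics-elasticsearch.cluster_stats-*',
  true
);
console.log(clusterStatsIndex);
```

With these inputs the result should match the "index" value of the new cluster stats call shown in this description.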

Test

- Manual test with legacy internal collection (.monitoring): turn on internal monitoring and everything should work as normal. Ideally set up all possible modules (ES, Logstash, APM, Beats, Kibana) and set up CCR and ML jobs. @matschaffer used the Docker testing cluster with some tweaks to make it use internal collection: https://github.com/elastic/observability-dev/pull/1831/files
- Manual test with standalone Metricbeat collection and xpack.enabled: true (.monitoring indices with -mb in the name). This is the default behavior when spinning up the Docker testing cluster, but I don't think this is possible until we resolve the mapping issue.
- Manually test that alerts are being created correctly, though reviewing the query changes I made in the code would be most helpful.
- Functional tests should pass.

Example query change:

Cluster stats call before

{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,.monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*,*:metricbeat-*",
  "size": 10000,
  "ignore_unavailable": true,
  "filter_path": [
    "hits.hits._index",
    "hits.hits._source.cluster_uuid",
    "hits.hits._source.elasticsearch.cluster.id",
    "hits.hits._source.cluster_name",
    "hits.hits._source.elasticsearch.cluster.name",
    "hits.hits._source.version",
    "hits.hits._source.elasticsearch.version",
    "hits.hits._source.elasticsearch.cluster.node.version",
    "hits.hits._source.license.status",
    "hits.hits._source.elasticsearch.cluster.stats.license.status",
    "hits.hits._source.license.type",
    "hits.hits._source.elasticsearch.cluster.stats.license.type",
    "hits.hits._source.license.issue_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.issue_date",
    "hits.hits._source.license.expiry_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date",
    "hits.hits._source.license.expiry_date_in_millis",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date_in_millis",
    "hits.hits._source.cluster_stats",
    "hits.hits._source.elasticsearch.cluster.stats",
    "hits.hits._source.cluster_state",
    "hits.hits._source.elasticsearch.cluster.stats.state",
    "hits.hits._source.cluster_settings.cluster.metadata.display_name"
  ],
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "term": {
                    "type": "cluster_stats"
                  }
                },
                {
                  "term": {
                    "metricset.name": "cluster_stats"
                  }
                }
              ]
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638370491606,
                "lte": 1638371391606
              }
            }
          }
        ]
      }
    },
    "collapse": {
      "field": "cluster_uuid"
    },
    "sort": {
      "timestamp": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  }
}

cluster stats call now

{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.cluster_stats-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.cluster_stats-*",
  "size": 10000,
  "ignore_unavailable": true,
  "filter_path": [
    "hits.hits._index",
    "hits.hits._source.cluster_uuid",
    "hits.hits._source.elasticsearch.cluster.id",
    "hits.hits._source.cluster_name",
    "hits.hits._source.elasticsearch.cluster.name",
    "hits.hits._source.version",
    "hits.hits._source.elasticsearch.version",
    "hits.hits._source.elasticsearch.cluster.node.version",
    "hits.hits._source.license.status",
    "hits.hits._source.elasticsearch.cluster.stats.license.status",
    "hits.hits._source.license.type",
    "hits.hits._source.elasticsearch.cluster.stats.license.type",
    "hits.hits._source.license.issue_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.issue_date",
    "hits.hits._source.license.expiry_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date",
    "hits.hits._source.license.expiry_date_in_millis",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date_in_millis",
    "hits.hits._source.cluster_stats",
    "hits.hits._source.elasticsearch.cluster.stats",
    "hits.hits._source.cluster_state",
    "hits.hits._source.elasticsearch.cluster.stats.state",
    "hits.hits._source.cluster_settings.cluster.metadata.display_name"
  ],
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "term": {
                    "data_stream.dataset": "elasticsearch.cluster_stats"
                  }
                },
                {
                  "term": {
                    "type": "cluster_stats"
                  }
                }
              ]
            }
          },
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638375953258,
                "lte": 1638376853258
              }
            }
          }
        ]
      }
    },
    "collapse": {
      "field": "cluster_uuid"
    },
    "sort": {
      "timestamp": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  }
}
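
The dataset clause in the query above can be sketched as follows. This is an illustration under assumptions, not the createQuery code from this PR: `datasetFilter`, `TermFilter`, and `ShouldFilter` are hypothetical names, and the shape is copied from the `should` clause shown in the example.

```typescript
// Hypothetical helper for illustration only; not the createQuery code in this PR.
interface TermFilter {
  term: Record<string, string>;
}

interface ShouldFilter {
  bool: { should: TermFilter[] };
}

// Builds one clause that accepts a document if it carries either the
// data stream dataset name or the legacy "type" value.
function datasetFilter(moduleType: string, dataset: string): ShouldFilter {
  return {
    bool: {
      should: [
        { term: { 'data_stream.dataset': `${moduleType}.${dataset}` } },
        { term: { type: dataset } },
      ],
    },
  };
}

console.log(JSON.stringify(datasetFilter('elasticsearch', 'cluster_stats'), null, 2));
```

Putting both terms in a single `should` clause lets one query cover Agent data streams and legacy documents at the same time.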

metrics aggregation query before

{
  "index": ".monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*",
  "size": 0,
  "ignore_unavailable": true,
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638371010418,
                "lte": 1638371910418
              }
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          }
        ]
      }
    },
    "aggs": {
      "check": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "10s"
        },
        "aggs": {
          "metric": {
            "max": {
              "field": "node_stats.indices.segments.count"
            }
          }
        }
      }
    }
  }
}

metrics aggregation query after

{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.*-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.*-*",
  "size": 0,
  "ignore_unavailable": true,
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638373535167,
                "lte": 1638374435167
              }
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          }
        ]
      }
    },
    "aggs": {
      "check": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "10s"
        },
        "aggs": {
          "metric": {
            "max": {
              "field": "node_stats.indices.segments.count"
            }
          }
        }
      }
    }
  }
}

{
  "index": ".monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*,*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metricbeat-*",
  "filter_path": [
    "aggregations"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "cluster_uuid": [
                "v9ecplXfT5aUQdCJb2pTGA"
              ]
            }
          },
          {
            "term": {
              "type": "node_stats"
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "now-5s"
              }
            }
          }
        ]
      }
    },
    "aggs": {
      "clusters": {
        "terms": {
          "field": "cluster_uuid",
          "size": 10000,
          "include": [
            "v9ecplXfT5aUQdCJb2pTGA"
          ]
        },
        "aggs": {
          "nodes": {
            "terms": {
              "field": "node_stats.node_id",
              "size": 10000
            },
            "aggs": {
              "index": {
                "terms": {
                  "field": "_index",
                  "size": 1
                }
              },
              "total_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.total_in_bytes"
                }
              },
              "available_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.available_in_bytes"
                }
              },
              "usage_ratio_percentile": {
                "bucket_script": {
                  "buckets_path": {
                    "available_in_bytes": "available_in_bytes",
                    "total_in_bytes": "total_in_bytes"
                  },
                  "script": "100 - Math.floor((params.available_in_bytes / params.total_in_bytes) * 100)"
                }
              },
              "name": {
                "terms": {
                  "field": "source_node.name",
                  "size": 1
                }
              }
            }
          }
        }
      }
    }
  }
}

disk usage alerts before (query above)

{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.node_stats-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.node_stats-*",
  "filter_path": [
    "aggregations"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "cluster_uuid": [
                "v9ecplXfT5aUQdCJb2pTGA"
              ]
            }
          },
          {
            "term": {
              "data_stream.dataset": "elasticsearch.node_stats"
            }
          },
          {
            "term": {
              "type": "node_stats"
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "now-5s"
              }
            }
          }
        ]
      }
    },
    "aggs": {
      "clusters": {
        "terms": {
          "field": "cluster_uuid",
          "size": 10000,
          "include": [
            "v9ecplXfT5aUQdCJb2pTGA"
          ]
        },
        "aggs": {
          "nodes": {
            "terms": {
              "field": "node_stats.node_id",
              "size": 10000
            },
            "aggs": {
              "index": {
                "terms": {
                  "field": "_index",
                  "size": 1
                }
              },
              "total_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.total_in_bytes"
                }
              },
              "available_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.available_in_bytes"
                }
              },
              "usage_ratio_percentile": {
                "bucket_script": {
                  "buckets_path": {
                    "available_in_bytes": "available_in_bytes",
                    "total_in_bytes": "total_in_bytes"
                  },
                  "script": "100 - Math.floor((params.available_in_bytes / params.total_in_bytes) * 100)"
                }
              },
              "name": {
                "terms": {
                  "field": "source_node.name",
                  "size": 1
                }
              }
            }
          }
        }
      }
    }
  }
}

disk usage alerts after (query above)
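
For reference, the `bucket_script` expression in the disk usage queries is plain arithmetic. The sketch below mirrors it in TypeScript; the function name and the sample byte counts are made up for illustration.

```typescript
// Mirrors the expression used in the bucket_script of the disk usage query:
// 100 - Math.floor((available / total) * 100).
// The function name and sample values are hypothetical.
function diskUsagePercent(availableInBytes: number, totalInBytes: number): number {
  return 100 - Math.floor((availableInBytes / totalInBytes) * 100);
}

console.log(diskUsagePercent(250, 1000)); // 75
console.log(diskUsagePercent(333, 1000)); // 67
```

Because the free percentage is floored before subtracting, the reported usage is rounded up rather than down.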

neptunian · 2021-11-18T22:32:30Z

x-pack/plugins/monitoring/common/ccs_utils.ts

@@ -66,28 +66,17 @@ export function prefixIndexPattern(
  }

  if (!ccsEnabled || !ccs) {
-    return appendMetricbeatIndex(


stop appending the metricbeat-* pattern

I had a thought here that there's a possibility people might be using monitoring.ui.metricbeat.index to append custom index patterns. Should we add a deprecation warning if that config key is specified?

I don't think we ever supported users relying on this, as it wasn't documented anywhere and we didn't technically support metricbeat-*. If that's the case, would deprecating it be confusing?

Ahh, okay if it's not in the docs I guess it's not bad to just let it go away silently.

#120384 has me wondering if we should re-think this approach. I get the feeling there's more than 1 customer problem we've solved by reaching for monitoring.ui.metricbeat.index

Even if we kept the config value, we aren't using it anymore because we aren't querying the metricbeat index anymore. If we recommended that kind of hack, hopefully we let the user know that. We probably should have discussed a fix, such as whether we should be setting ccs to default to *.

That's fair. I guess we could say merging this PR raises the importance of having a proper config (which is #120384)

neptunian · 2021-11-18T22:35:00Z

x-pack/plugins/monitoring/server/lib/cluster/get_index_patterns.ts

  return indexPatterns;
 }
+


New function similar to getIndexPatterns, but it returns a single pattern. There will be a case for each type (kibana, elasticsearch, logstash, etc). Since the data streams use elasticsearch instead of es (used in .monitoring-es), we have to map them.

neptunian · 2021-11-18T22:36:34Z

x-pack/plugins/monitoring/server/lib/cluster/get_index_patterns.ts

+  }
+  return `${type}-${datasetsPattern}-${namespace}`;
+}
+


gives you both datastream and legacy index patterns for a given moduleType (elasticsearch, kibana, etc)
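
A rough sketch of how the data stream half of such a pattern could be assembled is shown below. Only the final template string comes from the diff above. The function name `dataStreamPattern`, the `LegacyProduct` type, the mapping table, and the default values are assumptions made for illustration.

```typescript
// Hypothetical names throughout; only the returned template string
// `${type}-${datasetsPattern}-${namespace}` is taken from the diff above.
type LegacyProduct = 'es' | 'kibana' | 'logstash';

// Assumption: legacy index names use short product codes while data streams
// use the integration name, so "es" has to be mapped to "elasticsearch".
const INTEGRATION_NAME: Record<LegacyProduct, string> = {
  es: 'elasticsearch',
  kibana: 'kibana',
  logstash: 'logstash',
};

function dataStreamPattern(
  product: LegacyProduct,
  dataset?: string,
  type = 'metrics',
  namespace = '*'
): string {
  const integration = INTEGRATION_NAME[product];
  // With no dataset given, match every dataset of the integration.
  const datasetsPattern = `${integration}.${dataset ?? '*'}`;
  return `${type}-${datasetsPattern}-${namespace}`;
}

console.log(dataStreamPattern('es', 'cluster_stats')); // metrics-elasticsearch.cluster_stats-*
console.log(dataStreamPattern('es')); // metrics-elasticsearch.*-*
```

The two printed patterns correspond to the data stream entries in the example queries in the PR description.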

elasticmachine · 2021-12-01T17:21:46Z

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

neptunian · 2022-01-04T17:30:26Z

@matschaffer thanks for the heads up. We'll see what @richkuz says as @kovyrin is on holiday for a while. I think this PR should continue to be merged, since we don't officially support metricbeat-* and we can do a follow up PR if the solution involves changes to our code for the sake of enterprise search.

richkuz · 2022-01-04T19:56:18Z

> We'll see what @richkuz says as @kovyrin is on holiday for a while. I think this PR should continue to be merged, since we don't officially support metricbeat-* and we can do a follow up PR if the solution involves changes to our code for the sake of enterprise search.

I'm OK with merging this into 8.1.0 and breaking Ent Search temporarily. As long as this PR is only targeting 8.1.0, and not 8.0.0, we should have enough time to find and implement a resolution for Enterprise Search in a follow-up PR (tracking in #121975 ).

neptunian · 2022-01-20T13:37:47Z

@matschaffer @klacabane What do you think about trying to get this in so we can start testing?

klacabane · 2022-01-20T13:56:30Z

Sounds good to me :)

neptunian · 2022-01-20T18:42:52Z

@elasticmachine merge upstream

neptunian · 2022-01-20T19:35:05Z

@elasticmachine merge upstream

kibana-ci · 2022-01-20T21:56:14Z

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| `monitoring` | 445.7KB | 445.5KB | -207.0B |

History

💔 Build #18821 failed 648d582
💔 Build #18798 failed dfe3dd6
💚 Build #18149 succeeded dee9ac0
💔 Build #18133 failed 6878713
💚 Build #16208 succeeded 0996399
💔 Build #16197 failed 3213b43

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @neptunian

Evesy · 2023-06-16T08:14:19Z

👋 Hi, I'm wondering if these changes have broken stack monitoring on Elastic 8.x when using Metricbeat.

From what I can see with this change, Kibana is now expecting documents with these terms for Elastic cluster stats:
- data_stream.dataset: "elasticsearch.stack_monitoring.cluster_stats"
- metricset.name: "cluster_stats"
- type: "cluster_stats"

However the metricbeat elasticsearch module (when used for stack monitoring) does not include these fields in the documents it emits?

neptunian added the Team:Infra Monitoring UI - DEPRECATED label Nov 18, 2021

neptunian self-assigned this Nov 18, 2021

neptunian changed the title ~~[Stack Monitoring] update queries for elasticsearch integration~~ [Stack Monitoring] compatibility for elasticsearch integration Nov 18, 2021

neptunian commented Nov 18, 2021

x-pack/plugins/monitoring/server/lib/details/get_series.ts (outdated)

neptunian changed the title ~~[Stack Monitoring] compatibility for elasticsearch integration~~ [Stack Monitoring] compatibility for agent data streams Nov 21, 2021

neptunian force-pushed the 119109-es-integration-queries branch 3 times, most recently from 8511fe8 to 597be18 Compare November 25, 2021 01:02

neptunian force-pushed the 119109-es-integration-queries branch from 597be18 to 0f389fb Compare November 30, 2021 15:56

matschaffer mentioned this pull request Dec 1, 2021

Elasticsearch integration: missing index_stats alias on index mappings #119984

Closed

neptunian force-pushed the 119109-es-integration-queries branch from 1764d6d to 8f8d8b1 Compare December 1, 2021 16:42

neptunian marked this pull request as ready for review December 1, 2021 17:21

neptunian requested a review from a team as a code owner December 1, 2021 17:21

neptunian added release_note:enhancement v8.1.0 labels Dec 1, 2021

neptunian marked this pull request as draft December 1, 2021 17:51

klacabane reviewed Dec 2, 2021

x-pack/plugins/monitoring/server/lib/cluster/get_clusters_from_request.ts (outdated)

klacabane reviewed Dec 2, 2021

x-pack/plugins/monitoring/server/lib/cluster/get_clusters_stats.ts (outdated)

neptunian added 8 commits December 2, 2021 10:59

update queries for elasticsearch package

1a1e5fa

fix unit test

eeb2c60

add gitCcs helper function

9424af7

modify rest of es queries

94e05b2

update logstash and kibana queries to use new createQuery

effdbb7

change beats and apm to use new createQuery

ed4196a

update changeQuery and remove old one

2d03cb6

make getIndexPattern take request to check for ccs

09b8407

neptunian added 9 commits January 5, 2022 14:03

add metricset.name back to queries

5dc264b

comment tests back in

a88fd55

fix conflicts from logstash changes

596b4c0

remove enterprise search checking for standalone cluster to fix test

e5c539f

update es index metricset name from index_stats to index for mb data

3213b43

fix type

0996399

Merge branch 'main' into 119109-es-integration-queries

2b841e4

fetchClusters creates index pattern

6878713

fix type

dee9ac0

klacabane approved these changes Jan 20, 2022

Merge branch 'main' into 119109-es-integration-queries

dfe3dd6

kibanamachine and others added 4 commits January 20, 2022 14:35

Merge branch 'main' into 119109-es-integration-queries

17b6b2a

remove monitoring.ui.metricbeat.index from config and usage in getCollectionStatus

ca549e4

Merge branch '119109-es-integration-queries' of github.com:neptunian/kibana into 119109-es-integration-queries

648d582

fix type

2379e9e

neptunian added the backport:skip This commit does not require backporting label Jan 20, 2022

neptunian merged commit eb17b10 into elastic:main Jan 20, 2022

matschaffer mentioned this pull request Jan 25, 2022

Metricbeat enterprise search module: add xpack.enabled support elastic/beats#29871

Merged

6 tasks

KOTungseth added the Feature:Stack Monitoring label Mar 2, 2022

neptunian mentioned this pull request Mar 7, 2022

[Stack Monitoring] add back api integration tests and update esArchiver data #126998

Merged

klacabane mentioned this pull request Jun 28, 2022

[Stack Monitoring] Support for integrations #120415

Closed

33 tasks

neptunian mentioned this pull request Jul 28, 2022

[Stack Monitoring] Removing remaining usage of legacy getIndexPatterns() #123508

Closed

3 tasks
