[Stack Monitoring] compatibility for agent data streams #119112

Merged: 73 commits merged on Jan 20, 2022

neptunian
Contributor

@neptunian neptunian commented Nov 18, 2021

Closes #104271

Updates queries to include the metrics-{dataset}-{namespace} data stream pattern so they work with Elastic Agent and integrations. These changes don't require testing the actual packages; we mainly want to review the query changes and make sure existing functionality doesn't break, since nearly every query has been modified. If you run into issues loading the app while using Agent with packages, please add them to the separate per-package issues listed here: https://github.com/elastic/observability-dev/issues/1660

  • removes monitoring.ui.metricbeat.index and all references to it
  • adds data stream index patterns alongside the existing legacy pattern, and removes metricbeat-*, in a new function called getNewIndexPatterns (replacing getIndexPatterns)
  • adds the constant_keyword field data_stream.dataset to the filter when targeting a particular dataset
  • comments out all the "-mb" functional and API integration tests
  • creates the index pattern where the query is built/called, instead of passing it down from the API handler
  • does not modify the filebeat-* patterns to include logs-*. This will be done in a separate PR.
  • does not handle migration from legacy collection to Metricbeat monitoring. The UI will still ask you to use Metricbeat and check for it. I opened a separate issue for this: [Stack Monitoring] user stories for migration to Agent #120414
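The dataset filter described in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `createDatasetFilter` is a hypothetical helper, and it assumes (as the cluster stats example below suggests) that legacy documents are identified by `type` while Agent documents carry `data_stream.dataset`.

```typescript
// Hypothetical helper, for illustration only (not taken from the PR).
// Legacy `.monitoring-*` documents are identified by `type`, while Agent
// data stream documents carry the constant_keyword `data_stream.dataset`.
type TermClause = { term: Record<string, string> };
type ShouldFilter = { bool: { should: TermClause[] } };

function createDatasetFilter(legacyType: string, dataset: string): ShouldFilter {
  return {
    bool: {
      // A bool query with only `should` clauses requires at least one of them
      // to match, so a document qualifies if either field has the expected value.
      should: [
        { term: { 'data_stream.dataset': dataset } },
        { term: { type: legacyType } },
      ],
    },
  };
}

// Builds the same clause as the first filter in the "after" cluster stats query.
const clusterStatsFilter = createDatasetFilter('cluster_stats', 'elasticsearch.cluster_stats');
console.log(JSON.stringify(clusterStatsFilter, null, 2));
```

Wrapping the two terms in a nested `should` (rather than listing both under `filter`) is what lets a single query span both legacy and data stream documents.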

Test

  • Manual test with legacy internal collection (.monitoring): turn on internal monitoring and everything should work as normal. Ideally, set up all possible modules (ES, Logstash, APM, Beats, Kibana) along with CCR and ML jobs. @matschaffer used the Docker testing cluster with some tweaks to make it use internal collection: https://github.com/elastic/observability-dev/pull/1831/files
  • Manual test with standalone Metricbeat collection and xpack.enabled: true (the -mb variants of the .monitoring indices). This is the default behavior when spinning up the Docker testing cluster, but I don't think this is possible until we resolve the mapping issue.
  • Manually test that alerts are created correctly, although reviewing the query changes in the code would be most helpful
  • Functional tests should pass

Example query changes:

Cluster stats call before
{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,.monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*,*:metricbeat-*",
  "size": 10000,
  "ignore_unavailable": true,
  "filter_path": [
    "hits.hits._index",
    "hits.hits._source.cluster_uuid",
    "hits.hits._source.elasticsearch.cluster.id",
    "hits.hits._source.cluster_name",
    "hits.hits._source.elasticsearch.cluster.name",
    "hits.hits._source.version",
    "hits.hits._source.elasticsearch.version",
    "hits.hits._source.elasticsearch.cluster.node.version",
    "hits.hits._source.license.status",
    "hits.hits._source.elasticsearch.cluster.stats.license.status",
    "hits.hits._source.license.type",
    "hits.hits._source.elasticsearch.cluster.stats.license.type",
    "hits.hits._source.license.issue_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.issue_date",
    "hits.hits._source.license.expiry_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date",
    "hits.hits._source.license.expiry_date_in_millis",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date_in_millis",
    "hits.hits._source.cluster_stats",
    "hits.hits._source.elasticsearch.cluster.stats",
    "hits.hits._source.cluster_state",
    "hits.hits._source.elasticsearch.cluster.stats.state",
    "hits.hits._source.cluster_settings.cluster.metadata.display_name"
  ],
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "term": {
                    "type": "cluster_stats"
                  }
                },
                {
                  "term": {
                    "metricset.name": "cluster_stats"
                  }
                }
              ]
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638370491606,
                "lte": 1638371391606
              }
            }
          }
        ]
      }
    },
    "collapse": {
      "field": "cluster_uuid"
    },
    "sort": {
      "timestamp": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  }
}
Cluster stats call after
{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.cluster_stats-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.cluster_stats-*",
  "size": 10000,
  "ignore_unavailable": true,
  "filter_path": [
    "hits.hits._index",
    "hits.hits._source.cluster_uuid",
    "hits.hits._source.elasticsearch.cluster.id",
    "hits.hits._source.cluster_name",
    "hits.hits._source.elasticsearch.cluster.name",
    "hits.hits._source.version",
    "hits.hits._source.elasticsearch.version",
    "hits.hits._source.elasticsearch.cluster.node.version",
    "hits.hits._source.license.status",
    "hits.hits._source.elasticsearch.cluster.stats.license.status",
    "hits.hits._source.license.type",
    "hits.hits._source.elasticsearch.cluster.stats.license.type",
    "hits.hits._source.license.issue_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.issue_date",
    "hits.hits._source.license.expiry_date",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date",
    "hits.hits._source.license.expiry_date_in_millis",
    "hits.hits._source.elasticsearch.cluster.stats.license.expiry_date_in_millis",
    "hits.hits._source.cluster_stats",
    "hits.hits._source.elasticsearch.cluster.stats",
    "hits.hits._source.cluster_state",
    "hits.hits._source.elasticsearch.cluster.stats.state",
    "hits.hits._source.cluster_settings.cluster.metadata.display_name"
  ],
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "term": {
                    "data_stream.dataset": "elasticsearch.cluster_stats"
                  }
                },
                {
                  "term": {
                    "type": "cluster_stats"
                  }
                }
              ]
            }
          },
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638375953258,
                "lte": 1638376853258
              }
            }
          }
        ]
      }
    },
    "collapse": {
      "field": "cluster_uuid"
    },
    "sort": {
      "timestamp": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  }
}
Metrics aggregation query before
{
  "index": ".monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*",
  "size": 0,
  "ignore_unavailable": true,
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638371010418,
                "lte": 1638371910418
              }
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          }
        ]
      }
    },
    "aggs": {
      "check": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "10s"
        },
        "aggs": {
          "metric": {
            "max": {
              "field": "node_stats.indices.segments.count"
            }
          }
        }
      }
    }
  }
}
Metrics aggregation query after
{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.*-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.*-*",
  "size": 0,
  "ignore_unavailable": true,
  "body": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "cluster_uuid": "v9ecplXfT5aUQdCJb2pTGA"
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          },
          {
            "range": {
              "timestamp": {
                "format": "epoch_millis",
                "gte": 1638373535167,
                "lte": 1638374435167
              }
            }
          },
          {
            "term": {
              "source_node.uuid": "RRbf6gveSIGx5zhsuG9GGg"
            }
          }
        ]
      }
    },
    "aggs": {
      "check": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "10s"
        },
        "aggs": {
          "metric": {
            "max": {
              "field": "node_stats.indices.segments.count"
            }
          }
        }
      }
    }
  }
}
{
  "index": ".monitoring-es-6-*,.monitoring-es-7-*,metricbeat-*,*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metricbeat-*",
  "filter_path": [
    "aggregations"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "cluster_uuid": [
                "v9ecplXfT5aUQdCJb2pTGA"
              ]
            }
          },
          {
            "term": {
              "type": "node_stats"
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "now-5s"
              }
            }
          }
        ]
      }
    },
    "aggs": {
      "clusters": {
        "terms": {
          "field": "cluster_uuid",
          "size": 10000,
          "include": [
            "v9ecplXfT5aUQdCJb2pTGA"
          ]
        },
        "aggs": {
          "nodes": {
            "terms": {
              "field": "node_stats.node_id",
              "size": 10000
            },
            "aggs": {
              "index": {
                "terms": {
                  "field": "_index",
                  "size": 1
                }
              },
              "total_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.total_in_bytes"
                }
              },
              "available_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.available_in_bytes"
                }
              },
              "usage_ratio_percentile": {
                "bucket_script": {
                  "buckets_path": {
                    "available_in_bytes": "available_in_bytes",
                    "total_in_bytes": "total_in_bytes"
                  },
                  "script": "100 - Math.floor((params.available_in_bytes / params.total_in_bytes) * 100)"
                }
              },
              "name": {
                "terms": {
                  "field": "source_node.name",
                  "size": 1
                }
              }
            }
          }
        }
      }
    }
  }
}

Disk usage alerts after (the unlabeled query directly above is the "before" version)
{
  "index": "*:.monitoring-es-6-*,*:.monitoring-es-7-*,*:metrics-elasticsearch.node_stats-*,.monitoring-es-6-*,.monitoring-es-7-*,metrics-elasticsearch.node_stats-*",
  "filter_path": [
    "aggregations"
  ],
  "body": {
    "size": 0,
    "query": {
      "bool": {
        "filter": [
          {
            "terms": {
              "cluster_uuid": [
                "v9ecplXfT5aUQdCJb2pTGA"
              ]
            }
          },
          {
            "term": {
              "data_stream.dataset": "elasticsearch.node_stats"
            }
          },
          {
            "term": {
              "type": "node_stats"
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "now-5s"
              }
            }
          }
        ]
      }
    },
    "aggs": {
      "clusters": {
        "terms": {
          "field": "cluster_uuid",
          "size": 10000,
          "include": [
            "v9ecplXfT5aUQdCJb2pTGA"
          ]
        },
        "aggs": {
          "nodes": {
            "terms": {
              "field": "node_stats.node_id",
              "size": 10000
            },
            "aggs": {
              "index": {
                "terms": {
                  "field": "_index",
                  "size": 1
                }
              },
              "total_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.total_in_bytes"
                }
              },
              "available_in_bytes": {
                "max": {
                  "field": "node_stats.fs.total.available_in_bytes"
                }
              },
              "usage_ratio_percentile": {
                "bucket_script": {
                  "buckets_path": {
                    "available_in_bytes": "available_in_bytes",
                    "total_in_bytes": "total_in_bytes"
                  },
                  "script": "100 - Math.floor((params.available_in_bytes / params.total_in_bytes) * 100)"
                }
              },
              "name": {
                "terms": {
                  "field": "source_node.name",
                  "size": 1
                }
              }
            }
          }
        }
      }
    }
  }
}
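The `usage_ratio_percentile` bucket script in the disk usage queries computes a used-disk percentage from the per-node `max` values. A TypeScript sketch of the same arithmetic, for illustration only; `diskUsagePercent` is a hypothetical name, and the zero-total guard is an addition that the Painless script does not have:

```typescript
// Hypothetical TypeScript mirror of the alert's bucket_script:
//   100 - Math.floor((params.available_in_bytes / params.total_in_bytes) * 100)
function diskUsagePercent(availableInBytes: number, totalInBytes: number): number | null {
  // Guard added for this sketch only; the original script has no such check.
  if (totalInBytes <= 0) {
    return null;
  }
  // Flooring the *free* percentage means the *used* percentage is rounded up.
  return 100 - Math.floor((availableInBytes / totalInBytes) * 100);
}

console.log(diskUsagePercent(250, 1000)); // 75
console.log(diskUsagePercent(1, 3)); // 67
```

Because the free fraction is floored, a node that is about 66.7% full reports 67, so the threshold comparison errs slightly toward alerting.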
@neptunian neptunian added the Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services label Nov 18, 2021
@neptunian neptunian self-assigned this Nov 18, 2021
@neptunian neptunian changed the title [Stack Monitoring] update queries for elasticsearch integration [Stack Monitoring] compatibility for elasticsearch integration Nov 18, 2021
@@ -66,28 +66,17 @@ export function prefixIndexPattern(
}

if (!ccsEnabled || !ccs) {
return appendMetricbeatIndex(
Contributor Author

stop appending the metricbeat-* pattern

Contributor

I had a thought here: some people might be using monitoring.ui.metricbeat.index to append custom index patterns. Should we add a deprecation warning if that config key is specified?

Contributor Author

I don't think we ever supported users setting this, since it wasn't documented anywhere and we didn't technically support metricbeat-*. If that's the case, would deprecating it be confusing?

Contributor

Ah, okay. If it's not in the docs, I guess it's fine to just let it go away silently.

Contributor

#120384 has me wondering if we should rethink this approach. I get the feeling there's more than one customer problem we've solved by reaching for monitoring.ui.metricbeat.index.

Contributor Author

@neptunian neptunian Dec 6, 2021

Even if we kept the config value, we aren't using it anymore because we no longer query the metricbeat-* index. If we recommended that kind of hack, hopefully we told the user it was a workaround. We probably should have discussed whether the proper fix is to default ccs to *.

Contributor

That's fair. I guess we could say merging this PR raises the importance of having a proper config (which is #120384)

return indexPatterns;
}

Contributor Author

New function similar to getIndexPatterns, but it returns a single pattern. There will be a case for each type (kibana, elasticsearch, logstash, etc.). Since the data streams use elasticsearch instead of es (as in .monitoring-es), we have to map them.
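The module-name mapping described in this comment might look like the following minimal sketch. The function names are hypothetical, and the assumption that only `elasticsearch` is abbreviated (as `es`) while other modules keep their names comes from this comment rather than from the PR's code; the returned pattern follows the `${type}-${datasetsPattern}-${namespace}` shape visible in the diff.

```typescript
// Hypothetical sketch; names are illustrative, not the PR's actual functions.
type ModuleType = 'elasticsearch' | 'kibana' | 'logstash' | 'beats';

// Legacy indices abbreviate Elasticsearch (`.monitoring-es-*`); the other
// modules are assumed here to keep their full names.
function legacyModuleName(moduleType: ModuleType): string {
  return moduleType === 'elasticsearch' ? 'es' : moduleType;
}

// Data streams use the full module name as the dataset prefix, e.g.
// `metrics-elasticsearch.cluster_stats-*`. Omitting the dataset matches all of them.
function dataStreamPattern(moduleType: ModuleType, dataset = '*', namespace = '*'): string {
  const type = 'metrics';
  const datasetsPattern = `${moduleType}.${dataset}`;
  return `${type}-${datasetsPattern}-${namespace}`;
}

console.log(`.monitoring-${legacyModuleName('elasticsearch')}-7-*`); // .monitoring-es-7-*
console.log(dataStreamPattern('elasticsearch', 'cluster_stats')); // metrics-elasticsearch.cluster_stats-*
console.log(dataStreamPattern('elasticsearch')); // metrics-elasticsearch.*-*
```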

}
return `${type}-${datasetsPattern}-${namespace}`;
}

Contributor Author

@neptunian neptunian Nov 18, 2021

Gives you both the data stream and legacy index patterns for a given moduleType (elasticsearch, kibana, etc.).
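Combining both kinds of pattern with the cross-cluster (`*:`) prefix reproduces the `index` strings in the example queries in the PR description. A minimal sketch, assuming only legacy versions 6 and 7 are queried and the namespace is always `*`; `buildIndexPatterns` and `PatternOptions` are hypothetical names, not the signature of the PR's `getNewIndexPatterns`:

```typescript
// Hypothetical sketch of combining legacy and data stream patterns.
interface PatternOptions {
  moduleType: 'elasticsearch' | 'kibana' | 'logstash' | 'beats';
  dataset?: string; // e.g. 'cluster_stats'; omitted means all datasets
  ccs?: string; // e.g. '*' to also query remote clusters
}

function buildIndexPatterns({ moduleType, dataset = '*', ccs }: PatternOptions): string {
  // Legacy indices abbreviate Elasticsearch as `es`; others assumed unchanged.
  const legacyName = moduleType === 'elasticsearch' ? 'es' : moduleType;
  const local = [
    `.monitoring-${legacyName}-6-*`,
    `.monitoring-${legacyName}-7-*`,
    `metrics-${moduleType}.${dataset}-*`,
  ];
  // Remote patterns are listed first, matching the ordering in the "after" examples.
  const remote = ccs ? local.map((pattern) => `${ccs}:${pattern}`) : [];
  return [...remote, ...local].join(',');
}

console.log(buildIndexPatterns({ moduleType: 'elasticsearch', dataset: 'cluster_stats', ccs: '*' }));
```

With `dataset: 'cluster_stats'` and `ccs: '*'` this yields the same `index` value as the cluster stats query after the change; omitting `dataset` yields the `metrics-elasticsearch.*-*` form used by the metrics aggregation query.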

@neptunian neptunian changed the title [Stack Monitoring] compatibility for elasticsearch integration [Stack Monitoring] compatibility for agent data streams Nov 21, 2021
@neptunian neptunian force-pushed the 119109-es-integration-queries branch 3 times, most recently from 8511fe8 to 597be18 November 25, 2021 01:02
@neptunian neptunian force-pushed the 119109-es-integration-queries branch from 1764d6d to 8f8d8b1 December 1, 2021 16:42
@neptunian neptunian marked this pull request as ready for review December 1, 2021 17:21
@neptunian neptunian requested a review from a team as a code owner December 1, 2021 17:21
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@neptunian
Contributor Author

neptunian commented Jan 4, 2022

@matschaffer, thanks for the heads-up. We'll see what @richkuz says, since @kovyrin is on holiday for a while. I think we should still merge this PR: we don't officially support metricbeat-*, and we can open a follow-up PR if the solution requires changes to our code for Enterprise Search.

@richkuz
Contributor

richkuz commented Jan 4, 2022

We'll see what @richkuz says as @kovyrin is on holiday for a while. I think this PR should continue to be merged, since we don't officially support metricbeat-* and we can do a follow up PR if the solution involves changes to our code for the sake of enterprise search.

I'm OK with merging this into 8.1.0 and temporarily breaking Ent Search. As long as this PR only targets 8.1.0, and not 8.0.0, we should have enough time to find and implement a resolution for Enterprise Search in a follow-up PR (tracked in #121975).

@neptunian
Contributor Author

@matschaffer @klacabane What do you think about trying to get this in so we can start testing?

@klacabane
Contributor

Sounds good to me :)

@neptunian
Contributor Author

@elasticmachine merge upstream

@neptunian
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
|---|---|---|---|
| monitoring | 445.7KB | 445.5KB | -207.0B |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @neptunian

@Evesy

Evesy commented Jun 16, 2023

👋 Hi, I'm wondering if these changes have broken stack monitoring on Elastic 8.x when using Metricbeat.

From what I can see, with this change Kibana now expects Elasticsearch cluster stats documents to contain these terms:
data_stream.dataset: "elasticsearch.stack_monitoring.cluster_stats"
metricset.name: "cluster_stats"
type: "cluster_stats"

However the metricbeat elasticsearch module (when used for stack monitoring) does not include these fields in the documents it emits?

Labels
backport:skip This commit does not require backporting Feature:Stack Monitoring release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v8.1.0
Development

Successfully merging this pull request may close these issues.

[Stack monitoring] Add agent-compatible data stream patterns to data fetching patterns