Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[loki-operator] CRD: support podMonitor k8s CRD #7387

Open
liguozhong opened this issue Oct 11, 2022 · 10 comments
Open

[loki-operator] CRD: support podMonitor k8s CRD #7387

liguozhong opened this issue Oct 11, 2022 · 10 comments

Comments

@liguozhong
Copy link
Contributor

liguozhong commented Oct 11, 2022

Is your feature request related to a problem? Please describe.
The promtail service discovery module is implemented with reference to the service discovery of prometheus, which achieves great flexibility,
but the configuration is relatively complex and does not conform to the design concept of the k8s environment.

The CRD in k8s is more popular with operater personnel.

So we should implement a podMonitor CRD similar to the prometheus-operator project to simplify the service discovery feature of promtail. And make promtail easier to be integrated into the CICD process in the k8s environment.

server:
 ...
scrape_configs:
  - job_name: kubernetes-pods-name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_label_name
        target_label: __service__
      - source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: __host__
      - action: drop
        regex: ''
        source_labels:
          - __service__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        replacement: $1
        separator: /
        source_labels:
          - __meta_kubernetes_namespace
          - __service__
        target_label: job
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__

Describe the solution you'd like
CRD is simple to read 👍

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    any: true

Describe alternatives you've considered
#7247 introduce reload endoint / signal to promtail. This PR will be useful when the CRD applied.

logging-operator: https://banzaicloud.com/docs/one-eye/logging-operator/configuration/flow/
Logging operator for Kubernetes based on Fluentd and Fluent-bit. not promtail.
if Fluentd and Fluent-bit has some bug, I dont know how to fix it .

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: flow-sample
  namespace: default
spec:
  filters:
    - parser:
        remove_key_name_field: true
        parse:
          type: nginx
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name}
  localOutputRefs:
    - s3-output
  match:
    - select:
        labels:
          app: nginx

https://github.com/banzaicloud/logging-operator
image

Additional context
Add any other context or screenshots about the feature request here.
podmonitor design:
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/design.md#podmonitor
podmonitors link:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/crds/crd-podmonitors.yaml

#7387 (comment)

@liguozhong
Copy link
Contributor Author

help wanted @periklis , loki-operator module

@liguozhong
Copy link
Contributor Author

“Check out the Grafana Agent. PodLogs CRD is performing the functionality you are looking for.
https://grafana.com/docs/agent/latest/operator/custom-resource-quickstart/”

thanks Joel Verezhak.
slack

@liguozhong
Copy link
Contributor Author

I took a deep look at the grafana-agent code https://github.com/grafana/agent/blob/main/pkg/logs/logs.go, which is actually promtail. It's pretty cool.

so the question is, if a promtail user only collects logs, is it necessary to replace it with grafana-agent for the CRD feature.

@periklis
Copy link
Collaborator

@liguozhong as per design the Loki operator is not supporting/managing promtail or any log collector. We believe that the Loki Operator is better serving Loki-only installations. This is so because collector have so many config variants and forthcoming trends like OTEL make it hard to specialize on one.

Saying this, I totally see the value in a promtail CRD with variants like PodMonitor and ServiceMonitor. I can only see this happening in separate operator though

@liguozhong
Copy link
Contributor Author

@periklis hi , Thanks for the professional advice you shared.
so generally what we need is a promtail-operator, rather than continuing to expand on the basis of loki-operator, which will distract loki-operator.

So the question becomes, is it possible for us to develop another promtail-operator to support the PodMonitor CRD.
There are already so many log crds,Grafana Agent, logging-operator ...

Is it possible to have a promtail-operator SIG build this component independently first. This will not distract loki and loki-operator maintainers too much time.

@periklis
Copy link
Collaborator

From experience wring a couple operators till today writing a dedicated promtail-operator should be an easy task. My advice though is to spent some time designing and crafting the various CRDs you need for that purpose. AFAICS above you propose two to three CRDs, a promtail CRD for the deamonset, a PodMonitor to reconcile scrape configs, a ServiceMonitor to reconcile scrape configs. Even starting with two of them, e.g. promtail + servicemonitor/podmonitor, I would advise:

  1. Start with a draft of Promtail CRD and eliminate obsolete promtail config field you won't expose anyway.
  2. Work out the contradictory config fields, e.g. by make fields enums where possible
  3. Work out the status field for the Promtail CRD to inform users about error/degraded conditions
  4. Work out how you inject credentials, certificates, tokens, CAs via LocalObjectReference ConfigMap/Secret in your Promtail CRD.
  5. Think about how you express tenants and what this means about fields in your CRD.

I am glad to help you with reviews on design even on markdown first.

@liguozhong
Copy link
Contributor Author

liguozhong commented Oct 12, 2022

thanks! @periklis
I am very happy that the loki community intends to support the podMonitor CRD, which will make loki even more successful in the cloud-native era. So that more users of the original prometheus can easily monitor logs.

I'd be happy to implement a preliminary version of the promtail+ podMonitor CRD for the community to get an initial progress.

@periklis
Copy link
Collaborator

I recommend to add @cyriltovena @owen-d and @slim-bean to the discussion here. I agree that promtail needs a separate small operator but as said above the competition with so many other log collection offerings , e.g. OTEL, OpenShift's ClusterLoggingOperator, to name some is a thing to consider. I would not object having an operator for promtail/grafana-agent that takes care of the log collection/normalization parts.

@verejoel
Copy link
Contributor

@liguozhong I would be very happy to contribute to the Promtail operator if I can in any way! Have a big interest from my side in enabling this

@liguozhong
Copy link
Contributor Author

@liguozhong I would be very happy to contribute to the Promtail operator if I can in any way! Have a big interest from my side in enabling this

oh, that's great, looking forward to your contribution.
Before writing code, we need to wait for more loki maintainers to participate in issue discussions to decide whether we need promtail-operator. if not ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants