
Issues with TiDB Monitor #1949

Closed
DanielZhangQD opened this issue Mar 17, 2020 · 3 comments · Fixed by #1962
Labels: area/monitor, type/bug
Milestone: v1.1.0

DanielZhangQD commented Mar 17, 2020

Bug Report

What version of Kubernetes are you using?

1.12.8
What version of TiDB Operator are you using?

v1.1.0-beta.2
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

local-storage
What's the status of the TiDB cluster pods?

Running
What did you do?

Upgraded TiDB Operator from v1.0 to v1.1 and created a TidbMonitor CR.
What did you expect to see?

The monitor works as expected.
What did you see instead?

  • The existing monitor Deployment is rolling-updated after the CR is created.
  • The Service NodePorts change after the CR is created.
    Before CR creation:
    dan10-grafana            NodePort    10.233.22.65    <none>        3000:31558/TCP                   23h
    dan10-monitor-reloader   NodePort    10.233.6.191    <none>        9089:30400/TCP                   23h
    dan10-prometheus         NodePort    10.233.16.45    <none>        9090:32184/TCP                   23h
    
    After CR creation:
    dan10-grafana            NodePort    10.233.22.65    <none>        3000:32287/TCP                   24h
    dan10-monitor-reloader   NodePort    10.233.6.191    <none>        9089:30400/TCP                   24h
    dan10-prometheus         NodePort    10.233.16.45    <none>        9090:30273/TCP                   24h
    dan10-reloader           NodePort    10.233.13.22    <none>        9089:32448/TCP                   40m
    
  • The reloader Service name is changed from <cluster>-monitor-reloader to <cluster>-reloader. (See the details in the output above.)
  • The monitor Deployment rolling-updates continuously.
    dan10-monitor-765b58fcb-bwn5l             0/3     Pending                      0          0s
    dan10-monitor-765b58fcb-bwn5l             0/3     Pending                      0          0s
    dan10-monitor-765b58fcb-bwn5l             0/3     Init:0/1                     0          1s
    dan10-monitor-765b58fcb-bwn5l             0/3     PodInitializing              0          7s
    dan10-monitor-765b58fcb-bwn5l             3/3     Running                      0          9s
    dan10-monitor-765b58fcb-bwn5l             3/3     Terminating                  0          56s
    dan10-monitor-765b58fcb-bwn5l             0/3     Terminating                  0          59s
    dan10-monitor-765b58fcb-bwn5l             0/3     Terminating                  0          60s
    dan10-monitor-765b58fcb-bwn5l             0/3     Terminating                  0          60s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Pending                      0          0s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Pending                      0          0s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Init:0/1                     0          0s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Init:0/1                     0          6s
    dan10-monitor-746b7bdb84-mlkcw            0/3     PodInitializing              0          7s
    dan10-monitor-746b7bdb84-mlkcw            3/3     Running                      0          12s
    dan10-monitor-746b7bdb84-mlkcw            3/3     Terminating                  0          27s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Terminating                  0          29s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Terminating                  0          30s
    dan10-monitor-746b7bdb84-mlkcw            0/3     Terminating                  0          30s
    dan10-monitor-765b58fcb-r4l9p             0/3     Pending                      0          0s
    dan10-monitor-765b58fcb-r4l9p             0/3     Pending                      0          0s
    dan10-monitor-765b58fcb-r4l9p             0/3     Init:0/1                     0          0s
    dan10-monitor-765b58fcb-r4l9p             0/3     PodInitializing              0          6s
    dan10-monitor-765b58fcb-r4l9p             3/3     Running                      0          10s
    
  • If the service.portName of a Service is changed, its NodePort changes as well.
    Before update:
    dan10-grafana            NodePort    10.233.22.65    <none>        3000:32287/TCP                   26h
    
    After update:
    dan10-grafana            NodePort    10.233.22.65    <none>        3000:30562/TCP                   26h
    
DanielZhangQD added this to the v1.1.0 milestone on Mar 17, 2020
DanielZhangQD commented:

CR:

apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: dan10
spec:
  clusters:
  - name: dan10
  prometheus:
    baseImage: prom/prometheus
    version: v2.11.1
    resources:
      limits: {}
      #   cpu: 8000m
      #   memory: 8Gi
      requests: {}
      #   cpu: 4000m
      #   memory: 4Gi
    imagePullPolicy: IfNotPresent
    logLevel: info
    reserveDays: 12
    service:
      type: NodePort
      portName: http-prometheus 
  grafana:
    baseImage: grafana/grafana
    version: 6.0.1
    imagePullPolicy: IfNotPresent
    logLevel: info
    resources:
      limits: {}
      #   cpu: 8000m
      #   memory: 8Gi
      requests: {}
      #   cpu: 4000m
      #   memory: 4Gi
    username: admin
    password: admin
    envs:
      # Configure Grafana using environment variables except GF_PATHS_DATA, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD
      # Ref https://grafana.com/docs/installation/configuration/#using-environment-variables
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
      # if grafana is running behind a reverse proxy with subpath http://foo.bar/grafana
      # GF_SERVER_DOMAIN: foo.bar
      # GF_SERVER_ROOT_URL: "%(protocol)s://%(domain)s/grafana/"
    service:
      type: NodePort
      portName: http-grafana
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v3.0.9
    imagePullPolicy: Always
    resources: {}
    # limits:
    #  cpu: 50m
    #  memory: 64Mi
    # requests:
    #  cpu: 50m
    #  memory: 64Mi
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
    imagePullPolicy: IfNotPresent
    service:
      type: NodePort
      portName: tcp-reloader
    resources: {}
      # limits:
      #  cpu: 50m
      #  memory: 64Mi
      # requests:
      #  cpu: 50m
      #  memory: 64Mi
  imagePullPolicy: IfNotPresent
  persistent: true
  storageClassName: local-storage
  storage: 10Gi
  nodeSelector: {}
  annotations: {}
  tolerations: []
  kubePrometheusURL: http://prometheus-k8s.monitoring.svc:9090
  alertmanagerURL: ""

DanielZhangQD commented:

For the continuous rolling-update issue of the monitor Deployment, it is probably caused by the code at https://github.com/pingcap/tidb-operator/blob/master/pkg/monitor/monitor/util.go#L507, where, with multiple envs, the order of the envs may differ on each sync.
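
A minimal sketch of one possible fix, assuming the envs are currently built from a map (the `sortEnvVars` helper below is hypothetical, not the actual util.go code): sorting the env slice by name before building the pod template keeps the generated Deployment spec identical across syncs, so no spurious rolling update is triggered.

```go
package monitor

import (
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// sortEnvVars is a hypothetical helper: it puts the container envs into a
// stable, name-sorted order so that two syncs over the same set of envs
// produce an identical container spec.
func sortEnvVars(envs []corev1.EnvVar) []corev1.EnvVar {
	sort.SliceStable(envs, func(i, j int) bool {
		return envs[i].Name < envs[j].Name
	})
	return envs
}
```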

DanielZhangQD commented:

For the NodePort change issue, it is probably caused by the code at https://github.com/pingcap/tidb-operator/blob/master/pkg/controller/generic_control.go#L250, where only the clusterIP is preserved during the Service update.
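
A minimal sketch of one possible fix, assuming the update path only copies spec.clusterIP today (the `retainServiceFields` helper below is hypothetical, not the actual generic_control.go code): copying the already-allocated NodePorts from the existing Service into the desired Service before the update keeps the API server from assigning new ones on every sync. Ports are matched by port number here, so renaming service.portName alone would not lose the allocation.

```go
package controller

import corev1 "k8s.io/api/core/v1"

// retainServiceFields is a hypothetical helper: it carries allocated fields
// from the existing Service over to the desired one before calling Update.
func retainServiceFields(existing, desired *corev1.Service) {
	// Keep the allocated cluster IP (this part is already done today).
	desired.Spec.ClusterIP = existing.Spec.ClusterIP

	// Also keep the allocated NodePorts, matched by port number so that
	// changing only the port name does not force a re-allocation.
	allocated := make(map[int32]int32, len(existing.Spec.Ports))
	for _, p := range existing.Spec.Ports {
		allocated[p.Port] = p.NodePort
	}
	for i, p := range desired.Spec.Ports {
		if np, ok := allocated[p.Port]; ok && p.NodePort == 0 {
			desired.Spec.Ports[i].NodePort = np
		}
	}
}
```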
