
prometheus is not enabled% #121

Closed
zhangzheyu2simple opened this issue Nov 25, 2019 · 20 comments
Assignees
Labels
bug Something isn't working vault-server Area: operation and usage of vault server in k8s

Comments

@zhangzheyu2simple

After deploying HA Vault in a Kubernetes cluster, I tried to scrape Vault's Prometheus metrics following the regular guide, but I get this error when running:

curl -X GET "http://localhost:8236/v1/sys/metrics?format=prometheus" -H "X-Vault-Token: <root_token>"

prometheus is not enabled%

You can reproduce this error with the following steps:

# Available parameters and their default values for the Vault chart.

global:
  # enabled is the master enabled switch. Setting this to true or false
  # will enable or disable all the components within this chart by default.
  enabled: true

  # Image is the name (and tag) of the Vault Docker image.
  image: "vault:1.3.0"
  # Overrides the default Image Pull Policy
  imagePullPolicy: IfNotPresent
  # Image pull secret to use for registry authentication.
  imagePullSecrets: []
  # imagePullSecrets:
  #   - name: image-pull-secret
  # TLS for end-to-end encrypted transport
  tlsDisable: true

server:
  # Resource requests, limits, etc. for the server cluster placement. This
  # should map directly to the value of the resources field for a PodSpec.
  # By default no direct resource request is made.

  resources:
  # resources:
  #   requests:
  #     memory: 256Mi
  #     cpu: 250m
  #   limits:
  #     memory: 256Mi
  #     cpu: 250m

  # Ingress allows ingress services to be created to allow external access
  # from Kubernetes to access Vault pods.
  ingress:
    enabled: false
    labels:
      {}
      # traffic: external
    annotations:
      {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    hosts:
      - host: chart-example.local
        paths: []

    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  # authDelegator enables a cluster role binding to be attached to the service
  # account.  This cluster role binding can be used to setup Kubernetes auth
  # method.  https://www.vaultproject.io/docs/auth/kubernetes.html
  authDelegator:
    enabled: false

  # extraContainers is a list of sidecar containers. Specified as a raw YAML string.
  extraContainers: null

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    {}
    # GOOGLE_REGION: global
    # GOOGLE_PROJECT: myproject
    # GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/myproject/myproject-creds.json

  # extraSecretEnvironmentVars is a list of extra environment variables to set with the stateful set.
  # These variables take value from existing Secret objects.
  extraSecretEnvironmentVars:
    []
    # - envName: AWS_SECRET_ACCESS_KEY
    #   secretName: vault
    #   secretKey: AWS_SECRET_ACCESS_KEY

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path `/vault/userconfig/<name>/`. The value below is
  # an array of objects, examples are shown below.
  extraVolumes:
    []
    # - type: secret (or "configMap")
    #   name: my-secret
    #   path: null # default is `/vault/userconfig`

  # Affinity Settings
  # Commenting out or setting as empty the affinity variable, will allow
  # deployment to single node services such as Minikube
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname

  # Toleration Settings for server pods
  # This should be a multi-line string matching the Toleration array
  # in a PodSpec.
  tolerations: {}

  # nodeSelector labels for server pod assignment, formatted as a multi-line string.
  # ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  # Example:
  # nodeSelector: |
  #   beta.kubernetes.io/arch: amd64
  nodeSelector: {}

  # Extra labels to attach to the server pods
  # This should be a multi-line string mapping directly to a map of
  # the labels to apply to the server pods
  extraLabels: {}

  # Extra annotations to attach to the server pods
  # This should be a multi-line string mapping directly to a map of
  # the annotations to apply to the server pods
  annotations: {}

  # Enables a headless service to be used by the Vault Statefulset
  service:
    enabled: true
    # clusterIP controls whether a Cluster IP address is attached to the
    # Vault service within Kubernetes.  By default the Vault service will
    # be given a Cluster IP address, set to None to disable.  When disabled
    # Kubernetes will create a "headless" service.  Headless services can be
    # used to communicate with pods directly through DNS instead of a round robin
    # load balancer.
    # clusterIP: None

    # Port on which Vault server is listening
    port: 8200
    # Target port to which the service should be mapped to
    targetPort: 8200
    # Extra annotations for the service definition
    annotations: {}

  # This configures the Vault Statefulset to create a PVC for data
  # storage when using the file backend.
  # See https://www.vaultproject.io/docs/configuration/storage/index.html to know more
  dataStorage:
    enabled: false
    # Size of the PVC created
    size: 10Gi
    # Name of the storage class to use.  If null it will use the
    # configured default Storage Class.
    storageClass: null
    # Access Mode of the storage device being used for the PVC
    accessMode: ReadWriteOnce

  # This configures the Vault Statefulset to create a PVC for audit
  # logs.  Once Vault is deployed, initialized, and unsealed, Vault must
  # be configured to use this for audit logs.  This will be mounted to
  # /vault/audit
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: false
    # Size of the PVC created
    size: 10Gi
    # Name of the storage class to use.  If null it will use the
    # configured default Storage Class.
    storageClass: null
    # Access Mode of the storage device being used for the PVC
    accessMode: ReadWriteOnce

  # Run Vault in "dev" mode. This requires no further setup, no state management,
  # and no initialization. This is useful for experimenting with Vault without
  # needing to unseal, store keys, etc. All data is lost on restart - do not
  # use dev mode for anything other than experimenting.
  # See https://www.vaultproject.io/docs/concepts/dev-server.html to know more
  dev:
    enabled: false

  # Run Vault in "standalone" mode. This is the default mode that will deploy if
  # no arguments are given to helm. This requires a PVC for data storage to use
  # the "file" backend.  This mode is not highly available and should not be scaled
  # past a single replica.
  standalone:
    enabled: "-"

    # config is a raw string of default configuration when using a Stateful
    # deployment. Default is to use a PersistentVolumeClaim mounted at /vault/data
    # and store data there. This is only used when using a Replica count of 1, and
    # using a stateful set. This should be HCL.
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "file" {
        path = "/vault/data"
      }

      # Example configuration for using auto-unseal, using Google Cloud KMS. The
      # GKMS keys must already exist, and the cluster must have a service account
      # that is authorized to access GCP KMS.
      #seal "gcpckms" {
      #   project     = "vault-helm-dev"
      #   region      = "global"
      #   key_ring    = "vault-helm-unseal-kr"
      #   crypto_key  = "vault-helm-unseal-key"
      #}

  # Run Vault in "HA" mode. There are no storage requirements unless audit log
  # persistence is required.  In HA mode Vault will configure itself to use Consul
  # for its storage backend.  The default configuration provided will work with the Consul
  # Helm project by default.  It is possible to manually configure Vault to use a
  # different HA backend.
  ha:
    enabled: true
    replicas: 3

    # config is a raw string of default configuration when using a Stateful
    # deployment. Default is to use Consul for its HA storage backend.
    # This should be HCL.
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "mysql" {
        address = "mysql-gzfcm.mysql-gzfcm:3306"
        username = "admin"
        password = "admin"
        ha_enabled = "true"
        lock_table = "vault_lockrr"
      }
      telemetry {
        prometheus_retention_time = "24h"
        disable_hostname = true
      }
      # Example configuration for using auto-unseal, using Google Cloud KMS. The
      # GKMS keys must already exist, and the cluster must have a service account
      # that is authorized to access GCP KMS.
      #seal "gcpckms" {
      #   project     = "vault-helm-dev-246514"
      #   region      = "global"
      #   key_ring    = "vault-helm-unseal-kr"
      #   crypto_key  = "vault-helm-unseal-key"
      #}

    # A disruption budget limits the number of pods of a replicated application
    # that are down simultaneously from voluntary disruptions
    disruptionBudget:
      enabled: true

      # maxUnavailable will default to (n/2)-1 where n is the number of
      # replicas. If you'd like a custom value, you can specify an override here.
      maxUnavailable: null

  # Definition of the serviceaccount used to run Vault.
  serviceaccount:
    annotations: {}

# Vault UI
ui:
  # True if you want to create a Service entry for the Vault UI.
  #
  # serviceType can be used to control the type of service created. For
  # example, setting this to "LoadBalancer" will create an external load
  # balancer (for supported K8S installations) to access the UI.
  enabled: true
  serviceType: "ClusterIP"
  serviceNodePort: null
  externalPort: 8200

  # loadBalancerSourceRanges:
  #   - 10.0.0.0/16
  #   - 1.78.23.3/32

  # loadBalancerIP:

  # Extra annotations to attach to the ui service
  # This should be a multi-line string mapping directly to a map of
  # the annotations to apply to the ui service
  annotations: {}

helm install ./
then port-forward the service to localhost:8236, and unseal Vault in the web UI.

Then the metrics curl returns prometheus is not enabled%

@zhangzheyu2simple
Author

zhangzheyu2simple commented Nov 25, 2019

$ git branch
* master

@zhangzheyu2simple
Author

zhangzheyu2simple commented Nov 25, 2019

While I can get JSON-format metrics using curl -X GET "http://localhost:8236/v1/sys/metrics" -H "X-Vault-Token: <root_token>"

@tvoran tvoran added question A general question about usage vault-server Area: operation and usage of vault server in k8s labels Jan 23, 2020
@tvoran tvoran added bug Something isn't working and removed question A general question about usage labels Feb 5, 2020
@tvoran tvoran self-assigned this Feb 5, 2020
@tvoran
Member

tvoran commented Feb 7, 2020

Hi @zhangzheyu2simple, I wasn't able to reproduce this bug with your given values file; the telemetry stanza appears correct for enabling prometheus. Since the json format metrics were accessible, it sounds like the config from your helm values isn't making it into the ConfigMap.

I'd suggest double-checking which values are being used in the deployment (helm get values vault), and also checking the deployed config map. If you run kubectl describe configmap vault-config, that should contain the telemetry stanza with the prometheus_retention_time setting.

@tvoran tvoran added the waiting for response Waiting for a response from the author label Feb 7, 2020
@cablespaghetti
Contributor

#215 should help you get up and running with Prometheus a little more easily.

@tvoran
Member

tvoran commented Mar 10, 2020

Thanks @cablespaghetti, we'll take a look.

Closing this issue for now, let us know if you run into further issues @zhangzheyu2simple.

@tvoran tvoran closed this as completed Mar 10, 2020
@tvoran tvoran removed the waiting for response Waiting for a response from the author label Mar 10, 2020
@damianfedeczko

Hello,
I think your issue is connected to the fact that you are running Vault in HA mode with 3 instances.
According to the docs:
"The /v1/sys/metrics endpoint is only accessible on active nodes and automatically disabled on standby nodes. You can enable the /v1/sys/metrics endpoint on standby nodes by enabling unauthenticated metrics access."
https://www.vaultproject.io/docs/configuration/telemetry#prometheus

Your issue would probably be solved by adding unauthenticated_metrics_access = true to your telemetry stanza - it worked for me when deploying HA with 3 Vault instances. When running a single Vault instance, the directive is not needed.

Really hope this helps with your issue, cheers!

@one1zero1one

@tvoran We're seeing similar behaviour - I thought I'd not open a new issue since this one was closed quite recently.

chart: hashicorp/vault
version: 0.6.0

The relevant bits from the values.yaml:

  ha:
    enabled: true
    replicas: 3
    config: |
      ui = true
      log_format = "json"
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      telemetry {
        unauthenticated_metrics_access = true
        prometheus_retention_time = "24h"
        disable_hostname = true
      }
      storage "consul" {
        path = "vault"
        address = "vault-consul-server.vault.svc.cluster.local:8500"
      }

I've checked the ConfigMap and the container, and the configuration made it there OK:

/ $ cat /tmp/storageconfig.hcl
...
telemetry {
  unauthenticated_metrics_access = true
  prometheus_retention_time = "24h"
  disable_hostname = true
}
...
ps axf
...
    9 vault     0:02 vault server -config=/tmp/storageconfig.hcl
...

However, when trying to scrape with prometheus we get 400 Bad Request

curl http://10.4.80.34:8200/v1/sys/metrics?format=prometheus
prometheus is not enabled

@cablespaghetti
Contributor

The tricky bit which took me a while to work out is that unauthenticated_metrics_access needs to be within your listener config, e.g.:

listener "tcp" {
  telemetry {
    unauthenticated_metrics_access = "true"
  }
}
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = true
}
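
For reference, in the vault-helm chart that listener-level stanza lives inside server.ha.config; a minimal sketch, assuming HA mode (the storage stanza from the values file above goes alongside these and is omitted here):

```yaml
server:
  ha:
    enabled: true
    config: |
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"

        # Listener-embedded stanza: allows /v1/sys/metrics without a token
        telemetry {
          unauthenticated_metrics_access = "true"
        }
      }

      # Top-level stanza: controls metrics collection itself
      telemetry {
        prometheus_retention_time = "30s"
        disable_hostname = true
      }
```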

@damianfedeczko

damianfedeczko commented Jun 10, 2020

@one1zero1one can you try `curl -X GET "http://$YOUR_VAULT_INSTANCE/v1/sys/metrics?format=prometheus" -H "X-Vault-Token: $YOUR_TOKEN"`? Ditch the 8200 in curl.

@damianfedeczko

@cablespaghetti yeah, the docs are a little bit misleading in this case.

As described here, the unauthenticated_metrics_access telemetry directive has to be declared within the listener, just like you did:
https://www.vaultproject.io/docs/configuration/listener/tcp#configuring-unauthenticated-metrics-access
But when looking at the telemetry configuration docs for Prometheus, you are also instructed to use a telemetry stanza - just not embedded in the listener config. Confusing; it should be unified in my opinion.

@one1zero1one

@cablespaghetti thanks, that solved it.

/ # curl http://10.4.80.40:8200/v1/sys/metrics?format=prometheus
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection 
...

@damianfedeczko thanks - I didn't get a chance to try it, as my colleague rolled out the new config faster than I could check, but I assume using the service and token would have worked. The issue was that we aimed for unauthenticated scraping from the get-go.

+1 for ultimately having the telemetry stanza unified to avoid confusion.

@damianfedeczko

@one1zero1one cool, no worries - @cablespaghetti's answer nailed it

@cybercharly1988

The tricky bit which took me a while to work out is that the unauthenticated_metrics_access needs to be within your listener config [...]

Thanks, basically this is the answer: create a separate section for each telemetry stanza. I had wasted almost 5 hours fixing this issue, thanks!

@mshivanna

The above solution works. After adding the above configuration, you will have to restart the Vault pods for it to take effect.

@jsmickey

Since this post helped with the undocumented telemetry settings, I wanted to share this:

I deployed the Prometheus Helm chart and added the following to https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml

This automatically discovers the endpoints and sets them as targets in Prometheus.

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'vault'
        metrics_path: '/v1/sys/metrics'
        params:
          format: ['prometheus']
        scheme: https
        tls_config:
          ca_file: '/etc/prometheus/secrets/my-secret/ca.crt'
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels:
              [
                __meta_kubernetes_namespace,
                __meta_kubernetes_pod_container_port_number,
              ]
            action: keep
            regex: vault;8200

@june07

june07 commented Dec 24, 2021

Heads up for anyone else who comes across this... a restart is needed for the settings to take effect. A simple reload was not sufficient. Maybe someone more familiar with the code base can confirm as well.

I was trying to avoid needing to unseal Vault again. Ah well.

@tyriis

tyriis commented Feb 12, 2022

@june07 the restart is not related to Vault; it is default Kubernetes behaviour not to restart pods when a ConfigMap or Secret changes (the new data is loaded into the container, but most applications read their config only once at startup). To bypass this limitation, have a look at https://github.com/stakater/Reloader.
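
For instance, with Reloader installed, a single annotation on the Vault StatefulSet (this is Reloader's documented reloader.stakater.com/auto annotation) makes it roll the pods whenever a referenced ConfigMap or Secret changes; a sketch:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
  annotations:
    # Reloader watches the ConfigMaps/Secrets this workload mounts
    # and performs a rolling restart when any of them change.
    reloader.stakater.com/auto: "true"
```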

@tyriis

tyriis commented Feb 12, 2022

I've translated the Prometheus config to a PodMonitor:

---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vault
  namespace: secops
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/managed-by: fluxcd
    app.kubernetes.io/name: vault
spec:
  namespaceSelector:
    matchNames:
      - secops
  selector:
    matchLabels:
      app.kubernetes.io/instance: vault
      app.kubernetes.io/name: vault
      vault-active: "true"
  podMetricsEndpoints:
    - path: /v1/sys/metrics
      params:
        format: ["prometheus"]
      port: http
      relabelings:
        - action: keep
          sourceLabels: ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_port_number"]
          regex: secops;8200

Make sure your namespace matches when you copy it ;)

@ellipsis-me

The tricky bit which took me a while to work out is that the unauthenticated_metrics_access needs to be within your listener config [...]

That worked for me, tks!

@cheeyeelim

The tricky bit which took me a while to work out is that the unauthenticated_metrics_access needs to be within your listener config [...]

Also, if you are working with the Helm chart, there are 3 separate config sections (for the different modes/storage backends, i.e. standalone, ha, raft). Make sure you put the entire config under the mode/storage you are actually using.

In case you get confused by the comments in the values.yaml like me.
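
To illustrate, the chart's values.yaml carries one config block per mode, and only the block for the deployed mode is rendered into the ConfigMap; a sketch with the stanza bodies elided:

```yaml
server:
  standalone:
    config: |       # used only in standalone mode
      # listener + telemetry stanzas here
  ha:
    config: |       # used only when ha.enabled=true (consul/mysql/etc.)
      # listener + telemetry stanzas here
    raft:
      config: |     # used only when ha.raft.enabled=true
        # listener + telemetry stanzas here
```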
