Canary analysis metrics templating #418

Closed
stefanprodan opened this issue Jan 29, 2020 · 2 comments · Fixed by #419
Comments

stefanprodan commented Jan 29, 2020

Using custom metrics for canary analysis currently requires the query to be inlined in the Canary spec.
To allow these queries to be parameterised and shared across namespaces and Canary objects,
the Canary spec could contain a reference to a metric template that defines the metrics provider and the query.

Metric template specifications

Kubernetes custom resource:

type MetricTemplateSpec struct {
    // Provider of this metric
    Provider MetricTemplateProvider `json:"provider,omitempty"`

    // Query template of this metric
    Query string `json:"query,omitempty"`
}

type MetricTemplateProvider struct {
    // Type of provider
    Type string `json:"type,omitempty"`

    // Address of this provider API
    Address string `json:"address,omitempty"`

    // Secret reference containing the provider credentials
    // +optional
    SecretRef *corev1.LocalObjectReference `json:"secretRef,omitempty"`
}

The provider type could be prometheus, influxdb, datadog, wavefront, etc.
Depending on the provider, the secret could contain basic-auth credentials, org/token, an API key or a TLS cert.
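
A provider factory could receive the referenced secret's data as a map and pick out the keys it needs. A minimal sketch for the basic-auth case, assuming keys named username and password (the key names are illustrative, not a final contract):

// credentialsFromSecret is a hypothetical helper showing how a provider
// factory could extract basic-auth credentials from the referenced
// secret's data; the key names are assumptions.
func credentialsFromSecret(data map[string][]byte) (username, password string) {
	if v, ok := data["username"]; ok {
		username = string(v)
	}
	if v, ok := data["password"]; ok {
		password = string(v)
	}
	return username, password
}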

Prometheus example:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
spec:
  provider:
    type: prometheus
    address: http://linkerd-prometheus.linkerd:9090
  query: |
    histogram_quantile(
        0.99,
        sum(
            rate(
                response_latency_ms_bucket{
                    namespace="{{ namespace }}",
                    deployment=~"{{ target }}",
                    direction="inbound"
                    }[{{ interval }}]
                )
            ) by (le)
        )

The following variables are available in templates:

  • name (canary.metadata.name)
  • namespace (canary.metadata.namespace)
  • target (canary.spec.targetRef.name)
  • service (canary.spec.service.name)
  • ingress (canary.spec.ingressRef.name)
  • interval (canary.spec.canaryAnalysis.metrics[].interval)

The query is rendered with Go's text/template package.
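
A minimal rendering sketch, assuming the variables are exposed to text/template as zero-argument functions so the dot-less {{ namespace }} syntax resolves (the actual implementation may differ):

import (
	"bytes"
	"text/template"
)

// renderQuery renders a metric query template with the variables listed
// above; exposing them as zero-argument template functions is only one
// possible way to support the "{{ namespace }}" syntax.
func renderQuery(query string, vars map[string]string) (string, error) {
	funcs := template.FuncMap{}
	for k, v := range vars {
		v := v
		funcs[k] = func() string { return v }
	}

	t, err := template.New("query").Funcs(funcs).Parse(query)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	if err := t.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

For the Prometheus example above, vars would carry name, namespace, target, service, ingress and interval.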

InfluxDB example:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: max-db-con
  namespace: flagger
spec:
  provider:
    type: influxdb
    address: http://influx.db:8086
    # secret must contain two keys: org and token
    secretRef:
      name: influx-auth
  # influxdb v2 query (flux lang)
  query: |
    from(bucket: "connections")
      |> range(start: -{{ interval }})
      |> filter(fn: (r) => r._client =~ /{{ target }}-{{ namespace }}-.*/)
      |> count()

Canary analysis specifications

A canary analysis metric can reference a template with templateRef:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: dev
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  canaryAnalysis:
    metrics:
      - name: "max database connections"
        templateRef:
          name: max-db-con
          # namespace is optional
          # when not specified, the canary namespace will be used
          namespace: flagger
        threshold: 100
        interval: 1m

When a metric has a template reference, Flagger renders the query and executes it using the provider's client.
The query runner takes the first result and converts it to a float64.
If the result is below the threshold the metric check passes; otherwise the check fails and the rollout advancement is paused or stopped.
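
Illustrative only, using the provider interface specified in the next section and assuming the threshold acts as an upper bound:

// checkMetric sketches how a rendered query could be evaluated against the
// metric threshold; names and error handling are assumptions.
func checkMetric(provider Interface, query string, threshold float64) (bool, error) {
	value, err := provider.RunQuery(query)
	if err != nil {
		return false, err
	}
	// the check passes while the measured value stays below the threshold;
	// a failed check pauses or stops the rollout advancement
	return value < threshold, nil
}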

Metric provider specifications

To add a provider one should implement the provider interface:

type Interface interface {
	// RunQuery executes the query and converts the first result to float64
	RunQuery(query string) (float64, error)

	// IsOnline calls the provider endpoint and returns an error if the API is unreachable
	IsOnline() (bool, error)
}

An implementation will be available for Prometheus and it will replace the current metrics client.

Prometheus provider example:

// PrometheusProvider executes promQL queries against the Prometheus API
// using basic authentication if username and password are supplied 
type PrometheusProvider struct {
	timeout  time.Duration
	url      url.URL
	username string
	password string
}

// NewPrometheusProvider takes a provider spec and the credentials map,
// validates the HTTP(S) address, extracts the username and password values if provided and
// returns a Prometheus client ready to execute queries against the HTTP API
func NewPrometheusProvider(provider flaggerv1.MetricTemplateProvider, credentials map[string][]byte) (*PrometheusProvider, error) { }

// RunQuery executes the promQL query and returns the first result as float64
func (p *PrometheusProvider) RunQuery(query string) (float64, error) { }

// IsOnline calls the Prometheus status endpoint and returns an error if the API is unreachable
func (p *PrometheusProvider) IsOnline() (bool, error) { }

Ref: #241 #283 #284

@stefanprodan stefanprodan added the kind/feature Feature request label Jan 29, 2020
@stefanprodan stefanprodan self-assigned this Jan 29, 2020
stealthybox (Member) commented:

Very nice API design you have here.
It's succinct but still broken out well enough to be flexible.

gitirabassi commented:

This looks very good @stefanprodan. Thanks! I will be adding the InfluxDB implementation shortly after this gets merged.
