Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap the server with the Prometheus so we get metrics + add an e2e te… #267

Merged
merged 7 commits into from
Dec 9, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions .github/workflows/verify-k8s.yml
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,17 @@ jobs:
sed -i -e 's,--ca=googleca,--ca=ephemeralca,g' ${{ github.workspace }}/config/deployment.yaml
# Drop the ct-log flag's value to elide CT-log uploads.
sed -i -E 's,"--ct-log-url=[^"]+","--ct-log-url=",g' ${{ github.workspace }}/config/deployment.yaml
# Switch to one replica to make it easier to test the scraping of
# metrics since we know all the requests then go to the same server.
sed -i -E 's,replicas: 3,replicas: 1,g' ${{ github.workspace }}/config/deployment.yaml
# Expose the prometheus port as a service so tests can grab it
# without hitting the k8s API
cat <<EOF >> ${{ github.workspace }}/config/deployment.yaml
- name: prometheus
protocol: TCP
port: 2112
targetPort: 2112
EOF

# From: https://banzaicloud.com/blog/kubernetes-oidc/
# To be able to fetch the public keys and validate the JWT tokens against
Expand Down Expand Up @@ -225,6 +236,25 @@ jobs:

kubectl wait --for=condition=Complete --timeout=90s job/check-oidc

- name: Validate prometheus metrics exported and look correct
run: |
cat <<EOF | ko apply -f -
apiVersion: batch/v1
kind: Job
metadata:
name: check-prometheus-metrics
spec:
template:
spec:
restartPolicy: Never
automountServiceAccountToken: false
containers:
- name: check-metrics
image: ko://github.com/sigstore/fulcio/test/prometheus/
EOF

kubectl wait --for=condition=Complete --timeout=90s job/check-prometheus-metrics

- name: Collect logs
if: ${{ always() }}
run: |
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ There are other targets available in the [`Makefile`](Makefile), check it out.

## API

The API is defined via OpenAPI, defined [here](openapi.yaml).
The API is defined [here](./pkg/api/client.go).

## Transparency

Expand Down
30 changes: 17 additions & 13 deletions cmd/app/serve.go
Original file line number Diff line number Diff line change
Expand Up @@ -100,19 +100,23 @@ var serveCmd = &cobra.Command{
}

decorateHandler := func(h http.Handler) http.Handler {
return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
ctx := r.Context()

// For each request, infuse context with our snapshot of the FulcioConfig.
// TODO(mattmoor): Consider periodically (every minute?) refreshing the ConfigMap
// from disk, so that we don't need to cycle pods to pick up config updates.
// Alternately we could take advantage of Knative's configmap watcher.
ctx = config.With(ctx, cfg)
ctx = api.WithCA(ctx, baseca)
ctx = api.WithCTLogURL(ctx, viper.GetString("ct-log-url"))

h.ServeHTTP(rw, r.WithContext(ctx))
})
// Wrap the inner func with instrumentation to get latencies
// that get partitioned by 'code' and 'method'.
return promhttp.InstrumentHandlerDuration(
Copy link
Member

@lukehinds lukehinds Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the decorate handler mean fulcio would still function correctly without a running Prometheus instance present?What would happen to requests in that scenario?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice part of the prometheus model is that it's pull based - exposing metrics with no instance is safe and a noop. A running collector would be needed to actually scrape these and do something useful with them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, good for me.

api.MetricLatency,
http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
ctx := r.Context()

// For each request, infuse context with our snapshot of the FulcioConfig.
// TODO(mattmoor): Consider periodically (every minute?) refreshing the ConfigMap
// from disk, so that we don't need to cycle pods to pick up config updates.
// Alternately we could take advantage of Knative's configmap watcher.
ctx = config.With(ctx, cfg)
ctx = api.WithCA(ctx, baseca)
ctx = api.WithCTLogURL(ctx, viper.GetString("ct-log-url"))

h.ServeHTTP(rw, r.WithContext(ctx))
}))
}

prom := http.Server{
Expand Down
3 changes: 2 additions & 1 deletion config/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ spec:
app: fulcio-server
type: LoadBalancer
ports:
- protocol: TCP
- name: http
protocol: TCP
port: 80
targetPort: 5555
3 changes: 2 additions & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@ require (
github.com/mitchellh/mapstructure v1.4.3 // indirect
github.com/pkg/errors v0.9.1
github.com/prometheus/client_golang v1.11.0
github.com/prometheus/common v0.29.0 // indirect
github.com/prometheus/client_model v0.2.0
github.com/prometheus/common v0.29.0
github.com/prometheus/procfs v0.7.0 // indirect
github.com/sigstore/sigstore v1.0.1
github.com/spf13/cobra v1.2.1
Expand Down
2 changes: 1 addition & 1 deletion pkg/api/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ const SigstorePublicServerURL = "https://fulcio.sigstore.dev"

// Client is the interface for accessing the Fulcio API.
type Client interface {
// SigningCert sends the provided CertificateRequest to the /api/v1/singingCert
// SigningCert sends the provided CertificateRequest to the /api/v1/signingCert
// endpoint of a Fulcio API, authenticated with the provided bearer token.
SigningCert(cr CertificateRequest, token string) (*CertificateResponse, error)
}
Expand Down
2 changes: 1 addition & 1 deletion pkg/api/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,5 +29,5 @@ var (
MetricLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
Name: "fulcio_api_latency",
Help: "API Latency on calls",
}, []string{"path", "code"})
}, []string{"code", "method"})
)
116 changes: 116 additions & 0 deletions test/prometheus/main.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
// Copyright 2021 The Sigstore Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

package main

import (
"flag"
"fmt"
"log"
"net/http"

dto "github.com/prometheus/client_model/go"
"github.com/prometheus/common/expfmt"
)

const (
latencyMetric = "fulcio_api_latency"
certMetric = "fulcio_new_certs"
)

func parseMF(url string) (map[string]*dto.MetricFamily, error) {
resp, err := http.Get(url) // nolint
if err != nil {
return nil, err
}
defer resp.Body.Close()

var parser expfmt.TextParser
return parser.TextToMetricFamilies(resp.Body)
}

func main() {
f := flag.String("url", "http://fulcio-server.fulcio-dev.svc:2112/metrics", "set url to fetch metrics from")
flag.Parse()

mf, err := parseMF(*f)
if err != nil {
log.Fatalf("Failed to fetch/parse metrics: %v", err)
}

// Just grab the api_latency metric, make sure it's a histogram
// and just make sure there is at least one 200, and no errors there.
latency, ok := mf[latencyMetric]
if !ok || latency == nil {
log.Fatal("Did not get fulcio_api_latency metric")
}
if err := checkLatency(latency); err != nil {
log.Fatalf("fulcio_api_latency metric failed: %s", err)
}

// Then make sure the cert counter went up.
certCount, ok := mf[certMetric]
if !ok || certCount == nil {
log.Fatal("Did not get fulcio_new_certs metric")
}
if err := checkCertCount(certCount); err != nil {
log.Fatalf("fulcio_new_certs metric failed: %s", err)
}
}

// Make sure latency is a Histogram, and it has a POST with a 201.
func checkLatency(latency *dto.MetricFamily) error {
if *latency.Type != *dto.MetricType_HISTOGRAM.Enum() {
return fmt.Errorf("Wrong type, wanted %+v, got: %+v", dto.MetricType_HISTOGRAM.Enum(), latency.Type)
}
if len(latency.Metric) != 1 {
return fmt.Errorf("Got multiple entries, or none for metric, wanted one, got: %+v", latency.Metric)
}
// Make sure there's a 'post' and it's a 201.
var code string
var method string
for _, value := range latency.Metric[0].Label {
if *value.Name == "code" {
code = *value.Value
}
if *value.Name == "method" {
method = *value.Value
}
}
if code != "201" {
return fmt.Errorf("unexpected code, wanted 201, got %s", code)
}
if method != "post" {
return fmt.Errorf("unexpected method, wanted post, got %s", method)
}

if *latency.Metric[0].Histogram.SampleCount != 1 {
return fmt.Errorf("Unexpected samplecount, wanted 1, got %d", *latency.Metric[0].Histogram.SampleCount)
}
return nil
}

func checkCertCount(certCount *dto.MetricFamily) error {
if *certCount.Type != *dto.MetricType_COUNTER.Enum() {
return fmt.Errorf("Wrong type, wanted %+v, got: %+v", dto.MetricType_COUNTER.Enum(), certCount.Type)
}
if len(certCount.Metric) != 1 {
return fmt.Errorf("Got multiple entries, or none for metric, wanted one, got: %+v", certCount.Metric)
}
if *certCount.Metric[0].Counter.Value < 1 {
return fmt.Errorf("Got incorrect cert count, wanted one, got: %f", *certCount.Metric[0].Counter.Value)
}
return nil
}