
[chore] configure controller prometheus endpoint for collecting metrics #1520

Closed
wants to merge 10 commits into from

Conversation

ccfishk
Contributor

@ccfishk ccfishk commented Jul 8, 2021

What this PR does / why we need it:
Controller metrics (memory, CPU, threads, API response times) should be used to evaluate the controller at runtime.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Starts an HTTP server that integrates with the Prometheus Go client library to collect controller metrics at runtime.
#705
#1278

Special notes for your reviewer:

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR
  • [ ] able to provide runtime metrics via the /metrics, /build, and /debug/profile/ endpoints

@ccfishk ccfishk requested a review from a team as a code owner July 8, 2021 21:32
@ccfishk ccfishk temporarily deployed to Configure ci July 8, 2021 21:32 Inactive
@ccfishk ccfishk changed the title [kic-2.0] prometheus setup for controller [kic-2.0] configure prometheus for controller collecting cpu/memory/network Jul 8, 2021
@ccfishk ccfishk self-assigned this Jul 8, 2021
@ccfishk ccfishk added the area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. label Jul 8, 2021
@ccfishk ccfishk changed the title [kic-2.0] configure prometheus for controller collecting cpu/memory/network [kic-2.0] configure prometheus for controller collecting cpu/memory/network metrics Jul 8, 2021
@codecov

codecov bot commented Jul 8, 2021

Codecov Report

Merging #1520 (c8fdc40) into next (0a503e5) will increase coverage by 0.31%.
The diff coverage is 57.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##             next    #1520      +/-   ##
==========================================
+ Coverage   46.13%   46.45%   +0.31%     
==========================================
  Files          71       72       +1     
  Lines        6745     6822      +77     
==========================================
+ Hits         3112     3169      +57     
- Misses       3262     3282      +20     
  Partials      371      371              
Flag Coverage Δ
integration-test 46.45% <57.14%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
internal/ctrlutils/utils.go 93.75% <ø> (-2.09%) ⬇️
internal/mgrutils/reports.go 47.61% <0.00%> (-7.94%) ⬇️
internal/manager/prometheus.go 63.63% <63.63%> (ø)
internal/manager/run.go 49.29% <100.00%> (+1.46%) ⬆️
internal/proxy/clientgo_cached_proxy_resolver.go 61.58% <0.00%> (+1.32%) ⬆️
...trollers/configuration/zz_generated_controllers.go 34.63% <0.00%> (+2.03%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a503e5...c8fdc40. Read the comment docs.

Contributor

@mflendrich mflendrich left a comment


Railgun should not be concerned with installation of Prometheus.

If you need Prometheus for local development, have you considered using some Prometheus helm chart or the Prometheus Operator?

@ccfishk
Contributor Author

ccfishk commented Jul 8, 2021

Railgun should not be concerned with installation of Prometheus.

If you need Prometheus for local development, have you considered using some Prometheus helm chart or the Prometheus Operator?

Right; fundamentally they are the same tooling, Prometheus. The advantage of this approach is that it is easy to configure and access.

@ccfishk
Contributor Author

ccfishk commented Jul 9, 2021

This will be used in the next release; closing for now.

@shaneutt
Contributor

shaneutt commented Jul 11, 2021

I wanted to point out that KTF now has the concept of addons for clusters, which can be used for testing e.g.:

	metallb := metallbaddon.New()
	kong := kongaddon.New()
	builder := environment.NewBuilder().WithAddons(kong, metallb)
	env, err := builder.Build(ctx)

If you're looking to add prometheus to your testing environment, we could add an issue to KTF to add that as an available cluster addon and then maybe look into that after KIC 2.0 beta releases.

@ccfishk
Contributor Author

ccfishk commented Jul 12, 2021

I wanted to point out that KTF now has the concept of addons for clusters, which can be used for testing e.g.:

	metallb := metallbaddon.New()
	kong := kongaddon.New()
	builder := environment.NewBuilder().WithAddons(kong, metallb)
	env, err := builder.Build(ctx)

If you're looking to add prometheus to your testing environment, we could add an issue to KTF to add that as an available cluster addon and then maybe look into that after KIC 2.0 beta releases.

As part of KIC 2.0, we want to use Prometheus tooling to measure controller performance, as compared to the way the proxy collects data.

@shaneutt
Contributor

I wanted to point out that KTF now has the concept of addons for clusters, which can be used for testing e.g.:

	metallb := metallbaddon.New()
	kong := kongaddon.New()
	builder := environment.NewBuilder().WithAddons(kong, metallb)
	env, err := builder.Build(ctx)

If you're looking to add prometheus to your testing environment, we could add an issue to KTF to add that as an available cluster addon and then maybe look into that after KIC 2.0 beta releases.

As part of KIC 2.0, we want to use Prometheus tooling to measure controller performance, as compared to the way the proxy collects data.

Yes, that sounds good. Do you have any thoughts about adding a prometheus addon to the KTF?

@ccfishk ccfishk temporarily deployed to Configure ci July 14, 2021 18:52 Inactive
@ccfishk ccfishk added area/feature New feature or request priority/medium size/small and removed area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Jul 14, 2021
@ccfishk ccfishk changed the title [kic-2.0] configure prometheus for controller collecting cpu/memory/network metrics [kic-2.0] configure controller prometheus endpoint for collecting metrics Jul 14, 2021
@ccfishk ccfishk changed the title [kic-2.0] configure controller prometheus endpoint for collecting metrics [chore] configure controller prometheus endpoint for collecting metrics Jul 14, 2021
@ccfishk
Contributor Author

ccfishk commented Jul 14, 2021

I wanted to point out that KTF now has the concept of addons for clusters, which can be used for testing e.g.:

	metallb := metallbaddon.New()
	kong := kongaddon.New()
	builder := environment.NewBuilder().WithAddons(kong, metallb)
	env, err := builder.Build(ctx)

If you're looking to add prometheus to your testing environment, we could add an issue to KTF to add that as an available cluster addon and then maybe look into that after KIC 2.0 beta releases.

As part of KIC 2.0, we want to use Prometheus tooling to measure controller performance, as compared to the way the proxy collects data.

Yes, that sounds good. Do you have any thoughts about adding a prometheus addon to the KTF?

I will create an issue there for the Prometheus addon.

@mflendrich
Contributor

mflendrich commented Jul 21, 2021

We're putting this on hold for now until all other items blocking the KIC 2.0 beta release are resolved - because #705 has lower priority than finishing KIC 2.0.

@ccfishk
Contributor Author

ccfishk commented Jul 26, 2021

Closing temporarily.

@ccfishk ccfishk closed this Jul 26, 2021
@ccfishk ccfishk changed the title [chore] configure controller prometheus endpoint for collecting metrics [wip][chore] configure controller prometheus endpoint for collecting metrics Jul 26, 2021
@ccfishk ccfishk reopened this Jul 26, 2021
@ccfishk ccfishk changed the title [wip][chore] configure controller prometheus endpoint for collecting metrics [chore] configure controller prometheus endpoint for collecting metrics Jul 26, 2021
@ccfishk
Contributor Author

ccfishk commented Jul 26, 2021

Michal mentioned he wants to take a look at this, so I'm removing WIP for now.

@ccfishk ccfishk linked an issue Jul 27, 2021 that may be closed by this pull request
4 tasks
@shaneutt
Contributor

I was looking at this branch in terms of #1591 to see if I could proactively rebase it for you @ccfishk, but the changes here in the prometheus/ directory don't seem to fit with the root. Let me know how you want to proceed, and whether you're OK with me rebasing them into a different directory.

@rainest
Contributor

rainest commented Jul 28, 2021

I think we should be fine to just close this and open up new PRs for a revised solution to #705 and #1497. This one has a rather confusing history since it was originally installing Prometheus and then moved to creating a new listener for it, whereas what we want now for #705 is to add new metrics into the existing kubebuilder metrics server. That server is already working and contains built-in metrics for the controller, so for example, at 10c0b42:

$ kubectl port-forward trk-kong-c7d47997f-lx76q 10255:10255
15:53:55-0700 yagody $ http localhost:10255/metrics | grep 'controller="ingress"' | grep reconcile | grep result
controller_runtime_reconcile_total{controller="ingress",result="error"} 0
controller_runtime_reconcile_total{controller="ingress",result="requeue"} 0
controller_runtime_reconcile_total{controller="ingress",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="ingress",result="success"} 2
15:54:16-0700 yagody $ kubectl get ing --all-namespaces
NAMESPACE   NAME             CLASS    HOSTS                ADDRESS   PORTS   AGE
bar         demo             <none>   *                              80      4m46s
default     trk-kong-proxy   <none>   proxy.kong.example             80      4m31s
foo         demo             <none>   *                              80      5m13s

Class display is broken because we're still on v1beta1, but the two demo Ingresses have the kong class.
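
The metric lines shown above follow the Prometheus text exposition format. As an illustrative, stdlib-only sketch (not code from this PR or from controller-runtime), rendering one such sample looks like:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expositionLine renders one sample in the Prometheus text exposition
// format, e.g.:
//   controller_runtime_reconcile_total{controller="ingress",result="success"} 2
// The real lines above come from the kubebuilder/controller-runtime
// metrics server; this helper only illustrates the format.
func expositionLine(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(expositionLine(
		"controller_runtime_reconcile_total",
		map[string]string{"controller": "ingress", "result": "success"},
		2,
	))
	// prints: controller_runtime_reconcile_total{controller="ingress",result="success"} 2
}
```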

@ccfishk
Contributor Author

ccfishk commented Jul 28, 2021

I was looking at this branch in terms of #1591 to see if I could proactively rebase it for you @ccfishk, but the changes here in the prometheus/ directory don't seem to fit with the root. Let me know how you want to proceed, and whether you're OK with me rebasing them into a different directory.

@shaneutt I am OK with rebasing them into the internal directory.

Contributor

@mflendrich mflendrich left a comment


On the high level:

  • This PR adds a Prometheus /metrics endpoint but Railgun already has a /metrics endpoint
  • This PR adds /healthz and /readyz endpoints but Railgun already has both
  • This PR adds pprof endpoints but Railgun already has these
  • This PR adds a /build endpoint and I don't understand its purpose
  • This PR adds a /stop endpoint and I don't understand its purpose, plus I see it as a DoS security vulnerability

I think that the best way to add metrics (as described in the PR title and description) would be by publishing additional metrics on the /metrics endpoint already exposed by controller-runtime.
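
A minimal sketch of that suggestion, assuming controller-runtime's global metrics registry (`sigs.k8s.io/controller-runtime/pkg/metrics`) and the Prometheus Go client; the metric name and label here are hypothetical, not from this PR:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// configPushCount is a hypothetical custom metric for illustration only.
var configPushCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ingress_controller_config_pushes_total",
		Help: "Configuration pushes to the Kong proxy, by result.",
	},
	[]string{"result"},
)

func init() {
	// Registering on controller-runtime's global registry publishes the
	// metric on the /metrics endpoint the manager already serves, so no
	// additional HTTP server is needed.
	ctrlmetrics.Registry.MustRegister(configPushCount)
}

func main() {
	// A sync loop would then record outcomes like:
	configPushCount.WithLabelValues("success").Inc()
}
```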

logger logr.Logger,
wg *sync.WaitGroup) {
defer wg.Done()
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {

Railgun has a healthz check already - here is a test that verifies that:

func TestHealthEndpoint(t *testing.T) {

There's no need to add another one.

w.WriteHeader(http.StatusOK)
})

mux.Handle("/metrics", promhttp.Handler())

likewise, Railgun has a Prometheus metrics listener already, too:

func TestMetricsEndpoint(t *testing.T) {


mux.Handle("/metrics", promhttp.Handler())

mux.HandleFunc("/build", func(w http.ResponseWriter, r *http.Request) {

what is the use of this endpoint? why do we need it?

}
})

mux.HandleFunc("/stop", func(w http.ResponseWriter, r *http.Request) {

why do we need this? If there is no demonstrated need, I suggest dropping this:

  • because YAGNI
  • for security reasons: this allows everybody with raw HTTP access to KIC to perform a DoS attack by killing the controller.

}
})

if enableProfiling {

We already have a diagnostics server exposing pprof:

func installProfilingHandlers(mux *http.ServeMux) {

@@ -112,6 +112,7 @@ func Run(ctx context.Context, c *config.Config) error {
}

go FlipKnativeController(mgr, proxy, &c.KnativeIngressEnabled, c, setupLog)
go RunHTTP(setupLog)

Other additional HTTP servers we have (that are not part of controller-runtime) are started under rootcmd.Run and not under manager.Run:

func Run(ctx context.Context, c *manager.Config) error {

This means that, if we want to add more HTTP servers to Railgun, we should be doing that in rootcmd.Run. But given the other comments (all the endpoints offered by the newly added HTTP server are either duplicates of services Railgun already offers, or raised my thoughts about their usefulness) we may well end up without that additional HTTP server.

const classSpec = "IngressClassName"
const (
// classSpec indicates the fieldName for objects which support indicating their Ingress Class by spec
classSpec = "IngressClassName"

unused constant

@shaneutt
Contributor

I was looking at this branch in terms of #1591 to see if I could proactively rebase it for you @ccfishk, but the changes here in the prometheus/ directory don't seem to fit with the root. Let me know how you want to proceed, and whether you're OK with me rebasing them into a different directory.

@shaneutt I am OK with rebasing them into the internal directory.

OK! I've made the rebase for you, which includes the adjustments from next. Look it over and make sure everything seems 👍, or let me know if you need any help 🖖

@ccfishk
Contributor Author

ccfishk commented Jul 29, 2021


On the high level:

  • This PR adds a Prometheus /metrics endpoint but Railgun already has a /metrics endpoint
  • This PR adds /healthz and /readyz endpoints but Railgun already has both
  • This PR adds pprof endpoints but Railgun already has these
  • This PR adds a /build endpoint and I don't understand its purpose
  • This PR adds a /stop endpoint and I don't understand its purpose, plus I see it as a DoS security vulnerability

I think that the best way to add metrics (as described in the PR title and description) would be by publishing additional metrics on the /metrics endpoint already exposed by controller-runtime.

Agreed @mflendrich; publishing the needed metrics via controller-runtime is exactly the change I am going to work on.

@ccfishk
Contributor Author

ccfishk commented Jul 29, 2021

I think we should be fine to just close this and open up new PRs for a revised solution to #705 and #1497. This one has a rather confusing history since it was originally installing Prometheus and then moved to creating a new listener for it, whereas what we want now for #705 is to add new metrics into the existing kubebuilder metrics server. That server is already working and contains built-in metrics for the controller, so for example, at 10c0b42:

$ kubectl port-forward trk-kong-c7d47997f-lx76q 10255:10255
15:53:55-0700 yagody $ http localhost:10255/metrics | grep 'controller="ingress"' | grep reconcile | grep result
controller_runtime_reconcile_total{controller="ingress",result="error"} 0
controller_runtime_reconcile_total{controller="ingress",result="requeue"} 0
controller_runtime_reconcile_total{controller="ingress",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="ingress",result="success"} 2
15:54:16-0700 yagody $ kubectl get ing --all-namespaces
NAMESPACE   NAME             CLASS    HOSTS                ADDRESS   PORTS   AGE
bar         demo             <none>   *                              80      4m46s
default     trk-kong-proxy   <none>   proxy.kong.example             80      4m31s
foo         demo             <none>   *                              80      5m13s

Class display is broken because we're still on v1beta1, but the two demo Ingresses have the kong class.

@rainest Got your point: re-use the existing metrics server and return the data to the user from there.

Sorry about the churn in this PR, but the good thing is that each change based on review comments has been addressed accordingly.

@ccfishk
Contributor Author

ccfishk commented Jul 30, 2021

Per offline Slack discussion and the comments here, closing this PR; a new one will be opened.

@ccfishk ccfishk closed this Jul 30, 2021
@shaneutt shaneutt deleted the controller-prometheus branch October 27, 2021 21:05

Successfully merging this pull request may close these issues.

Prometheus metrics for the controller
5 participants