-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[chore] configure controller prometheus endpoint for collecting metrics #1520
Conversation
Codecov Report
@@ Coverage Diff @@
## next #1520 +/- ##
==========================================
+ Coverage 46.13% 46.45% +0.31%
==========================================
Files 71 72 +1
Lines 6745 6822 +77
==========================================
+ Hits 3112 3169 +57
- Misses 3262 3282 +20
Partials 371 371
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Railgun should not be concerned with installation of Prometheus.
If you need Prometheus for local development, have you considered using some Prometheus helm chart or the Prometheus Operator?
Right, basically, they are the same tooling - prometheus. The advantage of this one is easy to configure/access. |
This will be used in next release. close atm. |
…troller into controller-prometheus
I wanted to point out that KTF now has the concept of
If you're looking to add prometheus to your testing environment, we could add an issue to KTF to add that as an available cluster addon and then maybe look into that after KIC 2.0 beta releases. |
As part of KIC2.0, we want use prometheus tooling controller performance comparing to the way proxy collecting data. |
Yes, that sounds good. Do you have any thoughts about adding a |
I would create an issue addressing the prometheus addon there. |
We're putting this on hold for now until all other items blocking the KIC 2.0 beta release are resolved - because #705 has lower priority than finishing KIC 2.0. |
close temporarily. |
Michal mentioned he wants to take a look at this, removing WIP atm. |
I think we should be fine to just close this and open up new PRs for a revised solution to #705 and #1497. This one has a rather confusing history since it was originally installing Prometheus and then moved to creating a new listener for it, whereas what we want now for #705 is to add new metrics into the existing kubebuilder metrics server. That server is already working and contains built-in metrics for the controller, so for example, at 10c0b42:
Class display is borked because lol v1beta1 still, but the two |
@shaneutt I am OK to rebasing them into internal directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the high level:
- This PR adds a Prometheus
/metrics
endpoint but Railgun already has a/metrics
endpoint - This PR adds a
/healthz
andreadyz
endpoint but Railgun already has both - This PR adds
pprof
endpoints but Railgun already has these - This PR adds a
/build
endpoint and I don't understand its purpose - This PR adds a
/stop
endpoint and I don't understand its purpose, plus I see it as a DoS security vulnerability
I think that the best way to add metrics (as described in the PR title and description) would be by publishing additional metrics on the /metrics
endpoint already exposed by controller-runtime.
railgun/manager/prometheus.go
Outdated
logger logr.Logger, | ||
wg *sync.WaitGroup) { | ||
defer wg.Done() | ||
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Railgun has a healthz check already - here is a test that verifies that:
func TestHealthEndpoint(t *testing.T) { |
There's no need to add another one.
railgun/manager/prometheus.go
Outdated
w.WriteHeader(http.StatusOK) | ||
}) | ||
|
||
mux.Handle("/metrics", promhttp.Handler()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise, Railgun has a Prometheus metrics listener already, too:
func TestMetricsEndpoint(t *testing.T) { |
railgun/manager/prometheus.go
Outdated
|
||
mux.Handle("/metrics", promhttp.Handler()) | ||
|
||
mux.HandleFunc("/build", func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the use of this endpoint? why do we need it?
railgun/manager/prometheus.go
Outdated
} | ||
}) | ||
|
||
mux.HandleFunc("/stop", func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? If there is no demonstrated need, I suggest dropping this:
- because YAGNI
- for security reasons: this allows everybody with raw HTTP access to KIC to perform a DoS attack by killing the controller.
railgun/manager/prometheus.go
Outdated
} | ||
}) | ||
|
||
if enableProfiling { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a diagnostics server exposing pprof:
func installProfilingHandlers(mux *http.ServeMux) { |
railgun/manager/run.go
Outdated
@@ -112,6 +112,7 @@ func Run(ctx context.Context, c *config.Config) error { | |||
} | |||
|
|||
go FlipKnativeController(mgr, proxy, &c.KnativeIngressEnabled, c, setupLog) | |||
go RunHTTP(setupLog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other additional HTTP servers we have (that are not part of controller-runtime) are started under rootcmd.Run
and not under manager.Run
:
func Run(ctx context.Context, c *manager.Config) error { |
This means that, if we want to add more HTTP servers to Railgun, we should be doing that in rootcmd.Run
. But given the other comments (all the endpoints offered by the newly added HTTP server are either duplicates of services Railgun already offers, or raised my thoughts about their usefulness) we may well end up without that additional HTTP server.
railgun/internal/ctrlutils/utils.go
Outdated
const classSpec = "IngressClassName" | ||
const ( | ||
// classSpec indicates the fieldName for objects which support indicating their Ingress Class by spec | ||
classSpec = "IngressClassName" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused constant
c0d33f6
to
d424017
Compare
OK! I've made the rebase for you which includes the adjustments from |
Agree @mflendrich that publish needed metrics using controller-runtime would be the exact change I am going to work on. |
@rainest Got your point of re-use existing metrics and returns the data to the user from there. Sorry about the change of this PR, but I think the good thing is change based on review/comments is addressed accordingly. |
…troller into controller-prometheus
…netes-ingress-controller into controller-prometheus
Per offline slack and comments, close this PR and open a new one. |
What this PR does / why we need it:
Controller metrics - memory, cpu, threads, API response should be used to evaluate controller runtime.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Start an http server integrating with prometheus go library collecting controller metrics during runtime.
#705
#1278
Special notes for your reviewer:
PR Readiness Checklist:
Complete these before marking the PR as
ready to review
:CHANGELOG.md
release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR