[apicast] prometheus metrics policy #5
Conversation
Force-pushed from 3f66e85 to c100af5.
apicast/config/cloud_hosted.lua (outdated)

    @@ -2,12 +2,14 @@ local PolicyChain = require('apicast.policy_chain')
    local policy_chain = context.policy_chain

    if not arg then -- {arg} is defined only when executing the CLI
      policy_chain:insert(PolicyChain.load_policy('cloud_hosted.metrics', '0.1', { log_level = 'warn' }))
For now it records warnings too, but for production we probably want it to be error and above. Maybe an env var?
Yes, log_level should be managed by an ENV var. Would this be a policy-specific ENV var, e.g. PROMETHEUS_LOG_LEVEL, or system-wide?
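A minimal sketch of what that could look like, assuming a policy-specific variable; the name METRICS_LOG_LEVEL is only illustrative here, nothing in this thread settles it:

    -- sketch only: read the metrics policy log level from an env var instead of
    -- hard-coding 'warn'; METRICS_LOG_LEVEL and the 'error' default are assumptions
    policy_chain:insert(PolicyChain.load_policy('cloud_hosted.metrics', '0.1', {
      log_level = os.getenv('METRICS_LOG_LEVEL') or 'error'
    }))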
    @@ -0,0 +1 @@
    lua_capture_error_log 4k;
This is enough for about 10 log entries, so it will collect the last 10 entries at the desired log level. For example, if between Prometheus pulls there are 15 log entries, say 5 errors and 10 warnings, then the errors will not appear. It is always good to set the desired log level to only what we actually need.
So maybe this conf could be an env var? We will certainly have to play with the Prometheus scrape interval and the log capture size.
It could be, once it is part of the Liquid template in the main repo, but here it cannot be templated.
And disregard my comment about 10 entries; it fits more than that.
I think 4k is fine when properly configured. We should not capture log levels we don't care about. So let's say the top levels are emerg, alert, crit and error, and we configure capture for error and up. We would not really care if some higher-level entries went missing, because the error itself is enough to trigger a warning.
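For context, a rough sketch of the export side under the assumptions above; the function name is illustrative and this is not the actual policy code. ngx.errlog's get_logs() drains the buffer filled by lua_capture_error_log and returns a flat array of (level, time, message) triples:

    local errlog = require('ngx.errlog')

    -- illustrative: drain the capture buffer on each Prometheus pull and
    -- count entries per severity level
    local function collect_log_counts()
      local counts = {}
      local logs = errlog.get_logs()            -- reads at most 10 entries by default
      for i = 1, #logs, 3 do
        local level = logs[i]                   -- e.g. ngx.ERR, ngx.WARN
        counts[level] = (counts[level] or 0) + 1
      end
      return counts
    end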
And this is controlled by the "log_map" var? Can we expose this one as an ENV var?
I see, so set_filter_level will capture everything >= METRICS_LOG_LEVEL, right?
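As a minimal sketch of that behaviour, assuming an env var named METRICS_LOG_LEVEL and a log_map table along the lines discussed above (both names come from this thread, the implementation details do not):

    local errlog = require('ngx.errlog')

    -- assumed mapping from the env var string to nginx log level constants
    -- (the "log_map" mentioned above); the exact entries are illustrative
    local log_map = {
      emerg = ngx.EMERG,
      alert = ngx.ALERT,
      crit  = ngx.CRIT,
      error = ngx.ERR,
      warn  = ngx.WARN,
    }

    local level = log_map[os.getenv('METRICS_LOG_LEVEL') or 'error'] or ngx.ERR
    local ok, err = errlog.set_filter_level(level)  -- captures this level and more severe
    if not ok then
      ngx.log(ngx.ERR, 'failed to set errlog filter level: ', err)
    end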
Cool! :)
Force-pushed from 28eca89 to f45985c.
apicast/config/cloud_hosted.lua (outdated)

      policy_chain:insert(PolicyChain.load_policy('cloud_hosted.rate_limit', '0.1', {
        limit = os.getenv('RATE_LIMIT') or 5,
        burst = os.getenv('RATE_LIMIT_BURST') or 50 }), 1)
      policy_chain:insert(PolicyChain.load_policy('cloud_hosted.balancer_blacklist', '0.1'), 1)
    end

    return {
    -  policy_chain = policy_chain
    +  policy_chain = policy_chain,
    +  ports = { metrics = 9100 },
9421 ?
Yes. Thx!
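With that correction, the end of the config would presumably read as follows (sketch; the port number is taken from the comment above):

    return {
      policy_chain = policy_chain,
      ports = { metrics = 9421 },
    }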
Force-pushed from 147c051 to 09f65c9.
👍
* export log stats
Force-pushed from 9c392cc to 7472e92.
Force-pushed from 7472e92 to d2f09fc.
And it is customizable via the RATE_LIMIT_STATUS env variable.
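A hedged sketch of how that could plug into the rate-limit policy configuration shown earlier; only the RATE_LIMIT_STATUS variable name comes from this thread, while the option name and the 429 default are assumptions:

    policy_chain:insert(PolicyChain.load_policy('cloud_hosted.rate_limit', '0.1', {
      limit  = os.getenv('RATE_LIMIT') or 5,
      burst  = os.getenv('RATE_LIMIT_BURST') or 50,
      status = os.getenv('RATE_LIMIT_STATUS') or 429   -- assumed option name and default
    }), 1)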
Force-pushed from 7f5e47e to 908ea05.
* so it is evaluated before loading configurations
Here is the draft I have for alerts for APIcast-based apps. The thresholds are not defined yet, so input is welcome. The time range of the queries is 5 minutes but is totally flexible.

Nginx error logs
* Detect spikes in nginx error logs (error|crit|alert|emerg) over the last five minutes.

APIcast HTTP status
* Detect spikes in 5XX.
* Detect spikes in 4XX.

Dropped connections
* Calculated dropped connections.

Request processing time (duration)
* I don't have a query for it yet, but the idea is to extract the

What do you think, guys?
Looks good to me 👍
depends on 3scale/APIcast#629