promtail: ratelimiting by label #7597
Changes from all commits: 18cf4f9, 7507f8b
Any rationale behind not having a < MinReasonableMaxDistrinctLabels check?
max_distinct_labels is a safeguard to prevent OOM in case of misconfiguration. There is no preallocation happening, so setting it lower doesn't give you any benefit. The current limit "costs" ~1MiB in my non-scientific calculations when the limit is hit, and the user needs to increase it should they need more.
Just to clarify this bit: the user needs to increase the limit if the number of actively hit ratelimiters exceeds it, or they'll get inaccurate ratelimiting (more entries will go through than should have). Should some label values be transient and NOT in active use (let's say pod_name), then it is perfectly fine for the system to reach the max limit and have them GCed; there is no need to increase the limit to accommodate them.
I think it's worth mentioning what the difference between these two is. Like @liguozhong already mentioned, my first thought would also be to use the latter config instead of this new one. I see you explained that with by_label_name we can get a separate ratelimiter per label. Wondering what advantage that serves?
It allows you to have a ratelimit per label without knowing the label's values upfront. Let's say you can set a ratelimit per namespace or app, but don't need to reconfigure Loki every time a new app comes into the cluster.
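To make this concrete, a per-label limit stage might be configured roughly like the sketch below (by_label_name and max_distinct_labels come from this PR; the rate, burst, and drop fields and the exact YAML shape are assumptions based on the existing limit stage):

```yaml
pipeline_stages:
  - limit:
      rate: 10                    # entries per second, per distinct label value
      burst: 20
      drop: true                  # drop entries over the limit instead of blocking
      by_label_name: namespace    # one independent ratelimiter per namespace value
      max_distinct_labels: 10000  # OOM safeguard: cap on tracked label values
```

A new namespace appearing in the cluster automatically gets its own limiter, with no reconfiguration needed.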
This would be our first generic code in Loki, I believe. Will wait for a few more eyes from contributors.
Can we make it a bit more readable, maybe? I'm failing to understand what newV and gcCb mean here. Can we maybe have better naming and some descriptive comments on those? If we're going with this generic map, then having a few tests for it would be good.
Just started to think: why do we need this Map in the first place? I see it's only used in one place and could be a simple map[string]*rate.Limiter? Trying to understand the rationale here.
newV - new value (constructor)
gcCb - garbage collector callback
I am happy to use any names you suggest for these or any other identifiers.
The purpose of this map is to keep a maximum number of last-used entries and GC the rest. The difference to an LRU is that it doesn't pessimize the common case of NOT reaching the limit.
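Based on that description, the two-generation map might look roughly like the sketch below (only the newV and gcCb names come from the discussion; GenMap, GetOrCreate, and the rotation details are my assumptions, not the PR's actual code):

```go
package main

import "fmt"

// GenMap keeps at most maxSize recently used entries per generation.
// When the current generation fills up, the previous generation is
// dropped (and garbage collected) and the current one takes its place.
// Unlike an LRU, the common case of never reaching the limit pays no
// extra per-access bookkeeping cost.
type GenMap[V any] struct {
	curgen  map[string]V
	oldgen  map[string]V
	maxSize int
	newV    func() V // constructs a value for a missing key ("new value")
	gcCb    func()   // "garbage collector callback", invoked on each rotation
}

func NewGenMap[V any](maxSize int, newV func() V, gcCb func()) *GenMap[V] {
	return &GenMap[V]{curgen: make(map[string]V), maxSize: maxSize, newV: newV, gcCb: gcCb}
}

// GetOrCreate returns the value for key, promoting it from the old
// generation or constructing it with newV if it is not present anywhere.
func (m *GenMap[V]) GetOrCreate(key string) V {
	if v, ok := m.curgen[key]; ok {
		return v
	}
	v, ok := m.oldgen[key]
	if !ok {
		v = m.newV()
	}
	if len(m.curgen) >= m.maxSize {
		// Rotate: entries untouched since the last rotation become
		// unreachable and are collected by the runtime.
		m.oldgen = m.curgen
		m.curgen = make(map[string]V, m.maxSize)
		if m.gcCb != nil {
			m.gcCb()
		}
	}
	m.curgen[key] = v
	return v
}

func main() {
	rotations := 0
	m := NewGenMap[int](2, func() int { return 0 }, func() { rotations++ })
	for _, k := range []string{"a", "b", "c", "a"} {
		m.GetOrCreate(k)
	}
	fmt.Println("rotations:", rotations) // prints "rotations: 1"
}
```

Inserting a third key rotates the generations once; re-reading "a" afterwards promotes it from oldgen instead of rotating again, which is why a transient spike only costs one rotation rather than per-entry eviction work.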
Can we add some mechanism to clean up oldgen after some time if it's untouched? I think there might be a case where we have a spike of values, so we populate oldgen, and after that our values stabilise, but the rateLimiters will live in oldgen forever...
Potentially I could trigger GC on a timer, but it doesn't feel right to do, because overall it isn't worth it IMHO.
Q: what's the problem with using the existing prometheus.MustRegister helper here? I would let it panic in case of duplicate registration rather than handling it via prometheus.AlreadyRegisteredError. Am I missing any specific use case here?
This function mimics getDropcountMetric from match.go, which this stage used to use. Now that we need 2 metrics, I created this helper function, keeping the old behaviour exactly as is.