-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingester: fix wrong conversion of global into local limits #4238
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I left some initial comments, cause I think the logic is not correct in case of zone-aware replication enabled and shuffle sharding (shard size is set).
f5cdc41
to
1161274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic LGTM, thanks! I left few last nits 🙏 Since this logic is tricky and it's in the critical path, I would like another pair of 👀 : @pstibrany could you review it too, please?
The logic is indeed tricky, and I tried to cover all possible cases with a separate test. If @pstibrany finds a case that is not covered by the logic, we need to add the corresponding test as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I have a question about "healthiness", and suggestion for scale-up test cases.
pkg/ingester/limiter_test.go
Outdated
ringZonesCount: 3, | ||
healthyIngestersInZoneCount: 2, // in this zone ingesters are scaled down from 3 to 2 | ||
shardSize: 0, | ||
expectedValue: 450, // (900 / 3 zones / 2 healthy ingesters per zone) * 3 replication factor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this depend on whether distributors are allowed to extend writes or not? If one ingester is restarting (ie. it is unhealthy), do we want to temporarily increase limits in other two ingesters in the zone?
I am wondering if we should use "healthiness" concept at all when computing limits. Whether ingester is healthy or not depends on the operation, but we use two distinct write operations. (Update: "healthiness" doesn't change between Write
and WriteNoExtend
, but in general it depends on the operation.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I can answer this question. We would need @pracucci 's support. When I added the HealthyInstancesInZoneCount() function to Lifecycler, I initially counted all instances, but then @pracucci suggested that we should count only the healthy ones, so I implemented it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s consider an example. User has 1M series limit. We have 3 zones, each with 10 ingesters, and we use RF=3. So each ingester has 100k limit.
During rollout, we stop 5 ingesters at once in single zone. So there will be only 5 healthy ingesters left in the zone.
At this point limits at each remaining ingester in the same zone is recomputed to allow 200k series.
Mimir now always uses WriteNoExtend
, so if user wasn’t hitting the 100k limit yet, this is fine. But if user was hitting 100k limit before, now we temporarily allow twice as many series for the user. After rollout in the zone finishes, per-ingester limit will be set back to 100k, but series will be in memory already.
Situation is similar when shuffle sharding is in use for the user. Let’s say user has 24 ingesters in total (ie. 8 per zone). Limit per ingester if all ingesters are healhty will be 125k. During rollout we restart 5 ingesters in one zone, and only 5 will be healthy. Since 5 is less than 8 requested ingesters from the zone, temporary limit on each ingester will be 200k again.
Note that shuffle shard doesn’t take healthiness into consideration, so we’re mixing healthy with ALL ingesters when doing the check.
Given that we always use WriteNoExtend
now, I am not sure why we only consider healthy ingesters when computing the limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to Peter on Slack. I think the reason why we're not seeing it happening is because the ring client skips the entire zone if there are unhealthy instances, so we don’t write at all to ingesters in the zone we’re rollout out.
However, I'm 👍 with the change proposed by Peter, to count all instances not just healthy ones, given it looks more correct to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pstibrany @pracucci Does it mean we need to do the following things?
- update
dskit
by replacing its currentHealthyInstancesInZoneCount()
byInstancesInZoneCount()
and by updating its implementation - once it is merged, update this PR and use the new
InstancesInZoneCount()
method instead? Beside some syntactical changes (replacing "healthyIngesters" with "ingesters"), I don't see other things that should be changed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also need to add InstancesCount()
in dskit to use when the zone-aware replication is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pstibrany @pracucci Could you please review grafana/dskit#270?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This PR needs to be rebased before merging. |
@duricanikolic Look at the PR's diff. As you can see there are changes not related to this PR. I think you merged instead of rebase. You should rebase |
Ingester: fix wrong conversion of global into local limits Upgrade CHANGELOG.md and add TODO in go.mod as a reminder to set the right version of dskit Implementing review findings Implementing review findings Implementing review findings Implementing review findings Upgraded to the latest dskit
60c922c
to
dbfbf09
Compare
Hi @pracucci, |
pkg/ingester/limiter.go
Outdated
// Global limit is equally distributed among all the active zones. | ||
// The portion of global limit related to each zone is then equally distributed | ||
// among all the ingesters belonging to that zone. | ||
return ((globalLimit * l.replicationFactor) / zonesCount) / ingestersInZoneCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non credo sia corretto fare le operazioni di divisione in integer, perchè ad ogni operazione perdi precisione. Questo è il motivo per cui prima effettuavamo queste operazioni in floating point e poi convertivano in int il risultato.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for writing in italian. Let me translate it:
I don’t think it’s correct to do the division operations in integer, because with each operation you lose precision. This is why we previously performed these operations in floating point and then converted the result to int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was requested by @pstibrany. I will revert it so we ensure that we don't lose the precision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
translation:
I don't think it's correct to do the division operations in integer, because with each operation you lose precision. This is why we first performed these operations in floating point and then converted the result to int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested to either simplify that expression and remove floats, or add a test to show why floats are required. While I agree about loss of precision, I doubt it will have any meaningful effect here in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duricanikolic I think Peter feels stronger than me on this, so I would get stick to what Peter suggested previously (int division). I would just add a comment explaining why it's not an issue ("Max difference between using floats or not is 1 / ingestersInZoneCount").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question for both @pracucci and @pstibrany: If we have a global limit 1000, replication factor 2 and number of zones 3, and let's say 1 ingester in that zone, should we return 666 or 667 as the result (1000 * 2 / 3)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answer is: "doeasn't matter". There's no practical difference between the two. Which is Peter's point. I think int divisions are always floored (like casting float to int) so probably it's 666.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duricanikolic I think Peter feels stronger than me on this, so I would get stick to what Peter suggested previously (int division).
No need to revert back.
A question for both @pracucci and @pstibrany: If we have a global limit 1000, replication factor 2 and number of zones 3, and let's say 1 ingester in that zone, should we return 666 or 667 as the result (1000 * 2 / 3)?
Both are fine. Returning ceil would benefit user -- allow them to write one extra series per ingester.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so today I am going to test this PR on mimir-dev-09, and once it is done I will let you know, so we can merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (modulo the nit on precision, sorry for the back and forth on this)
What this PR does
This PR fixes a bug introduced by issue #4208: conversion of global to local limits didn't take into account the actual number of ingesters in the zone of interest, but assumed that ingesters were equally distributed in each zone. As a direct consequence, scaling down ingesters in a certain zone caused the limits of that zone to be too low. This PR fixes that problem.
For example, let's assume the following:
GL
) is 1M,RF
) is 3,ZC
) is 3, for examplezone-a
,zone-b
,zone-c
IA
=IB
=IC
) is 10With the previous approach local limit would be calculated, for all 3 zones, as
Similarly, the new approach would calculate the same result for each zone
Suppose that we now scale down ingesters in
zone-a
(IA
) to 5. The previous approach would, again, assign the same local limit to all 3 zones towhich is wrong. The right limits are calculated by the new approach like this:
zone-a
:GL * RF / ZC / IA = 1M * 3 / 3 / 5 = 200K
zone-b
andzone-c
:GL * RF / ZC / IB = GL * RF / ZC / IC = 1M * 3 / 3 / 10 = 100K
.Which issue(s) this PR fixes or relates to
Fixes #4208
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]