Ingester: fix wrong conversion of global into local limits #4238

Merged · 4 commits merged into main on Feb 22, 2023

Conversation

duricanikolic (Contributor)

What this PR does

This PR fixes the bug described in issue #4208: the conversion of global limits into local limits did not take into account the actual number of ingesters in the zone of interest, but instead assumed that ingesters were equally distributed across zones. As a direct consequence, scaling down the ingesters in a certain zone caused that zone's limits to be too low.
For example, let's assume the following:

  • global limit (GL) is 1M,
  • replication factor (RF) is 3,
  • zone count (ZC) is 3, for example zone-a, zone-b, zone-c
  • number of ingesters per zone (IA = IB = IC) is 10

With the previous approach, the local limit would be calculated, for all 3 zones, as

GL * RF / (IA + IB + IC) = 1M * 3 / 30 = 100K.

Similarly, the new approach would calculate the same result for each zone:

GL * RF / ZC / IA = GL * RF / ZC / IB = GL * RF / ZC / IC = 100K.

Suppose we now scale down the ingesters in zone-a (IA) to 5. The previous approach would again assign the same local limit to all 3 zones:

GL * RF / (IA + IB + IC) = 1M * 3 / (5 + 10 + 10) = 120K,

which is wrong. The correct limits are calculated by the new approach like this (see the code sketch after this list):

  • for zone-a: GL * RF / ZC / IA = 1M * 3 / 3 / 5 = 200K
  • for zone-b and zone-c: GL * RF / ZC / IB = GL * RF / ZC / IC = 1M * 3 / 3 / 10 = 100K.
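
To make the two calculations concrete, here is a minimal Go sketch; the function names and structure are illustrative and this is not the actual pkg/ingester/limiter.go code:

```go
package main

import "fmt"

// previousLocalLimit spreads the global limit over all ingesters in the ring,
// implicitly assuming every zone has the same number of ingesters.
func previousLocalLimit(globalLimit, replicationFactor, totalIngesters int) int {
	return globalLimit * replicationFactor / totalIngesters
}

// newLocalLimit first splits the global limit across zones, then across the
// ingesters that are actually present in the zone of interest.
func newLocalLimit(globalLimit, replicationFactor, zonesCount, ingestersInZone int) int {
	return globalLimit * replicationFactor / zonesCount / ingestersInZone
}

func main() {
	const gl, rf, zc = 1_000_000, 3, 3

	// zone-a scaled down from 10 to 5 ingesters; zone-b and zone-c keep 10.
	fmt.Println(previousLocalLimit(gl, rf, 5+10+10)) // 120000 for every zone (wrong)
	fmt.Println(newLocalLimit(gl, rf, zc, 5))        // 200000 for zone-a
	fmt.Println(newLocalLimit(gl, rf, zc, 10))       // 100000 for zone-b and zone-c
}
```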

Which issue(s) this PR fixes or relates to

Fixes #4208

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@duricanikolic duricanikolic requested a review from a team as a code owner February 13, 2023 22:30
pracucci (Collaborator) left a comment:

Thanks for working on this! I left some initial comments, because I think the logic is not correct when zone-aware replication is enabled and shuffle sharding is used (shard size is set).

pracucci (Collaborator) left a comment:

The logic LGTM, thanks! I left a few last nits 🙏 Since this logic is tricky and it's in the critical path, I would like another pair of 👀: @pstibrany could you review it too, please?

duricanikolic (Contributor, Author):

> The logic LGTM, thanks! I left a few last nits 🙏 Since this logic is tricky and it's in the critical path, I would like another pair of 👀: @pstibrany could you review it too, please?

The logic is indeed tricky, and I tried to cover all possible cases with a separate test. If @pstibrany finds a case that is not covered by the logic, we will need to add the corresponding test as well.

pstibrany (Member) left a comment:

Looks good, but I have a question about "healthiness" and a suggestion for scale-up test cases.

ringZonesCount: 3,
healthyIngestersInZoneCount: 2, // in this zone ingesters are scaled down from 3 to 2
shardSize: 0,
expectedValue: 450, // (900 / 3 zones / 2 healthy ingesters per zone) * 3 replication factor
pstibrany (Member), Feb 21, 2023:

Shouldn't this depend on whether distributors are allowed to extend writes or not? If one ingester is restarting (i.e. it is unhealthy), do we want to temporarily increase the limits on the other two ingesters in the zone?

I am wondering if we should use the "healthiness" concept at all when computing limits. Whether an ingester is healthy or not depends on the operation, and we use two distinct write operations. (Update: "healthiness" doesn't change between Write and WriteNoExtend, but in general it depends on the operation.)

duricanikolic (Contributor, Author):

I am not sure I can answer this question; we would need @pracucci's support. When I added the HealthyInstancesInZoneCount() function to Lifecycler, I initially counted all instances, but then @pracucci suggested that we count only the healthy ones, so I implemented it that way.

pstibrany (Member), Feb 21, 2023:

Let's consider an example. A user has a 1M series limit. We have 3 zones, each with 10 ingesters, and we use RF=3, so each ingester has a 100k limit.

During a rollout, we stop 5 ingesters at once in a single zone, so there will be only 5 healthy ingesters left in that zone. At this point, the limit on each remaining ingester in the same zone is recomputed to allow 200k series. Mimir now always uses WriteNoExtend, so if the user wasn't hitting the 100k limit yet, this is fine. But if the user was hitting the 100k limit before, we now temporarily allow twice as many series for the user. After the rollout in the zone finishes, the per-ingester limit will be set back to 100k, but the series will already be in memory.

The situation is similar when shuffle sharding is in use for the user. Let's say the user has 24 ingesters in total (i.e. 8 per zone). The limit per ingester, if all ingesters are healthy, will be 125k. During a rollout we restart 5 ingesters in one zone, so only 5 will be healthy. Since 5 is less than the 8 ingesters requested from the zone, the temporary limit on each ingester will be 200k again.

Note that the shuffle shard doesn't take healthiness into consideration, so we're mixing healthy with ALL ingesters when doing the check.

Given that we always use WriteNoExtend now, I am not sure why we only consider healthy ingesters when computing the limit.
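
A rough sketch of the shuffle-sharding arithmetic above; this is illustrative only, not the actual limiter code, and it assumes the divisor is the smaller of the per-zone shard size and the healthy ingesters in the zone, which is how the 125k and 200k figures arise:

```go
package main

import "fmt"

// localLimit mirrors the per-zone conversion discussed in this PR:
// split the global limit across zones, then across the divisor.
func localLimit(globalLimit, rf, zones, divisor int) int {
	return globalLimit * rf / zones / divisor
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	const gl, rf, zones = 1_000_000, 3, 3
	shardPerZone := 24 / zones // the user's shard has 8 ingesters in each zone

	// All ingesters healthy: the divisor is the 8 shard ingesters in the zone.
	fmt.Println(localLimit(gl, rf, zones, minInt(shardPerZone, 8))) // 125000

	// Rollout restarts 5 of the zone's 10 ingesters: only 5 are healthy,
	// which is fewer than the 8 requested from the zone, so the limit jumps.
	fmt.Println(localLimit(gl, rf, zones, minInt(shardPerZone, 5))) // 200000
}
```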

pracucci (Collaborator):

Talked to Peter on Slack. I think the reason we're not seeing this happen is that the ring client skips the entire zone if there are unhealthy instances, so we don't write at all to ingesters in the zone we're rolling out.

However, I'm 👍 with the change proposed by Peter, to count all instances and not just the healthy ones, given it looks more correct to me too.

duricanikolic (Contributor, Author), Feb 21, 2023:

@pstibrany @pracucci Does that mean we need to do the following?

  1. Update dskit by replacing its current HealthyInstancesInZoneCount() with InstancesInZoneCount() and updating its implementation.
  2. Once that is merged, update this PR to use the new InstancesInZoneCount() method instead? Besides some syntactic changes (replacing "healthyIngesters" with "ingesters"), I don't see anything else that needs to change here.

pstibrany (Member):

That looks correct to me.

pracucci (Collaborator):

I think you also need to add InstancesCount() to dskit, to use when zone-aware replication is disabled.

duricanikolic (Contributor, Author):

@pstibrany @pracucci Could you please review grafana/dskit#270?

pstibrany (Member) left a comment:

Thanks!

pstibrany (Member):

This PR needs to be rebased before merging.

@duricanikolic duricanikolic requested a review from a team as a code owner February 21, 2023 15:16
pracucci (Collaborator):

@duricanikolic Look at the PR's diff. As you can see, there are changes not related to this PR. I think you merged instead of rebasing. You should rebase onto main. Since you have a bunch of commits, the easiest way to rebase while reducing conflicts may be to squash all your commits into one commit, and then git rebase main.

Ingester: fix wrong conversion of global into local limits

Upgrade CHANGELOG.md and add TODO in go.mod as a reminder to set the right version of dskit

Implementing review findings

Implementing review findings

Implementing review findings

Implementing review findings

Upgraded to the latest dskit
duricanikolic (Contributor, Author) commented Feb 22, 2023:

> @duricanikolic Look at the PR's diff. As you can see, there are changes not related to this PR. I think you merged instead of rebasing. You should rebase onto main. Since you have a bunch of commits, the easiest way to rebase while reducing conflicts may be to squash all your commits into one commit, and then git rebase main.

Hi @pracucci,
I have updated my PR accordingly.

// Global limit is equally distributed among all the active zones.
// The portion of global limit related to each zone is then equally distributed
// among all the ingesters belonging to that zone.
return ((globalLimit * l.replicationFactor) / zonesCount) / ingestersInZoneCount
pracucci (Collaborator):

I don't think it's correct to do the division operations in integer, because with each operation you lose precision. This is why we previously performed these operations in floating point and then converted the result to int.


duricanikolic (Contributor, Author):

This was requested by @pstibrany. I will revert it so that we don't lose precision.


pstibrany (Member), Feb 22, 2023:

I suggested either simplifying that expression and removing the floats, or adding a test to show why floats are required. While I agree about the loss of precision, I doubt it will have any meaningful effect here in practice.

pracucci (Collaborator):

@duricanikolic I think Peter feels more strongly than me on this, so I would stick to what Peter suggested previously (int division). I would just add a comment explaining why it's not an issue ("the max difference between using floats or not is 1 / ingestersInZoneCount").

duricanikolic (Contributor, Author):

A question for both @pracucci and @pstibrany: if we have a global limit of 1000, a replication factor of 2, 3 zones, and, let's say, 1 ingester in the zone, should we return 666 or 667 as the result (1000 * 2 / 3)?

pracucci (Collaborator):

The answer is: "doesn't matter". There's no practical difference between the two, which is Peter's point. I think int divisions are always floored (like casting a float to int), so it's probably 666.

pstibrany (Member):

> @duricanikolic I think Peter feels more strongly than me on this, so I would stick to what Peter suggested previously (int division).

No need to revert.

> A question for both @pracucci and @pstibrany: if we have a global limit of 1000, a replication factor of 2, 3 zones, and, let's say, 1 ingester in the zone, should we return 666 or 667 as the result (1000 * 2 / 3)?

Both are fine. Returning the ceiling would benefit the user -- it would allow them to write one extra series per ingester.
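
For reference, a tiny Go sketch of the two options being discussed (illustrative only): Go integer division floors the result, while going through floating point and math.Ceil rounds it up.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const globalLimit, rf, zones = 1000, 2, 3

	// Go integer division truncates, i.e. floors for positive values.
	floored := globalLimit * rf / zones // 666

	// Doing the math in floating point and rounding up instead.
	ceiled := int(math.Ceil(float64(globalLimit) * rf / zones)) // 667

	fmt.Println(floored, ceiled)
}
```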

duricanikolic (Contributor, Author):

OK, so today I am going to test this PR on mimir-dev-09, and once it is done I will let you know, so we can merge this.

pracucci (Collaborator) left a comment:

LGTM! (modulo the nit on precision, sorry for the back and forth on this)

@pracucci pracucci merged commit bca6826 into main Feb 22, 2023
@pracucci pracucci deleted the yuri/issue-4208 branch February 22, 2023 20:08

Successfully merging this pull request may close these issues.

Scaling down ingesters in 1 zone at a time conflicts with "max series per user" limit