Worker IP change issue #857

Closed
rgaudin opened this issue Oct 27, 2023 · 3 comments · Fixed by #858

@rgaudin
Member

rgaudin commented Oct 27, 2023

michaelblob had tons of tasks fail very early because its IP was not in the Wasabi whitelist.

  • we should not let a failing worker (one whose tasks are all failing early) continue to get new tasks.
  • we should harden the IP change process.
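
For the first point, a minimal sketch of what such a gate could look like. Everything here is hypothetical (`TaskResult`, `should_pause_worker`, and both thresholds are illustrative names and values, not existing code):

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    succeeded: bool
    duration_s: float  # how long the task ran before it ended


def should_pause_worker(recent, min_tasks=5, early_s=120.0):
    """Return True when the last `min_tasks` tasks all failed early.

    `min_tasks` and `early_s` are assumed thresholds, not project settings.
    """
    if len(recent) < min_tasks:
        # Not enough history to judge the worker.
        return False
    window = recent[-min_tasks:]
    return all(not t.succeeded and t.duration_s < early_s for t in window)
```

A single success, or a failure that happened late in the task, is enough to keep the worker eligible, so tasks failing for reasons unrelated to the worker setup are less likely to pause it.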

On that second point:

  • We had an issue in the past, after the move to Postgres, where the new IP was not yet recorded in the DB when the policy was generated.
  • michaelblob's IP seems dynamic. 4 days ago, it was already failing due to its IP not being in the whitelist, with 173.73.128.55. Now it's failing for the same reason with IP 70.108.9.176.
  • The new IP was already recorded in the DB when I checked (17:30) but was not in the whitelist.
  • The whitelist had been modified on Oct 25, 2023, 2:09 AM. Don't know which worker triggered it. We should log that.
  • We limit each worker to 4 IP changes per day but I can't find any trace of this happening in Grafana.
  • The only call to the policy update was this one, so it succeeded.
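
To make the logging and the per-day limit easier to trace, something along these lines could help. This is a sketch under assumptions: `IpChangeLimiter` is a hypothetical in-memory helper, and "per day" is read here as a rolling 24-hour window, which may not match how the limit is actually enforced:

```python
import logging
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("ip_changes")  # hypothetical logger name

MAX_CHANGES_PER_DAY = 4  # the limit mentioned above


class IpChangeLimiter:
    def __init__(self, max_changes=MAX_CHANGES_PER_DAY, window=timedelta(days=1)):
        self.max_changes = max_changes
        self.window = window
        self._history = defaultdict(deque)  # worker name -> accepted change times

    def allow(self, worker, new_ip, now=None):
        """Return True if the change is accepted; log who triggered it either way."""
        now = now or datetime.now(timezone.utc)
        history = self._history[worker]
        # Forget changes that fell out of the rolling window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_changes:
            logger.warning("IP change refused for %s (%s): limit reached", worker, new_ip)
            return False
        history.append(now)
        logger.info("whitelist update triggered by %s with IP %s", worker, new_ip)
        return True
```

With both outcomes logged under the worker name, a refused change and the worker behind each whitelist update would show up in the logs instead of having to be inferred from check-in timestamps.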

From this initial information, it looks like a bug similar to the previous one (assuming michaelblob triggered that Oct 25 change): we saw the IP change and recorded it, but for some reason the IP is not in the updated list.
This is consistent with my call to record_ip_change("michaelblob") fixing it.

@benoit74
Collaborator

At 2023-10-25T02:09:45 (UTC), just before the policy update, we have in the logs:

"PUT /v1/workers/michaelblob/check-in HTTP/1.1" 204 0 "-" "python-requests/2.31.0" "70.108.9.176"

So it looks like the michaelblob worker triggered the policy update.

However, the new IP is NOT in the CreatePolicyVersion operation. The previous IP (I checked, it was still 173.73.128.55) is in the policy. So the previous fix was not correct, or at least not sufficient.

I'm working on reproducing this in a test case.
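
The property such a test needs to check can be sketched as follows. This is not the actual fix or the real code path: `InMemoryWorkerStore` and `whitelist_after_check_in` are hypothetical stand-ins for the DB and the policy generation, and the root cause is not established at this point in the thread.

```python
class InMemoryWorkerStore:
    """Hypothetical stand-in for the DB: one last-seen IP per worker."""

    def __init__(self):
        self.last_ip = {}

    def set_ip(self, worker, ip):
        self.last_ip[worker] = ip

    def all_ips(self):
        return sorted(set(self.last_ip.values()))


def whitelist_after_check_in(store, worker, new_ip):
    """Record the new IP first, then build the list from the stored state."""
    store.set_ip(worker, new_ip)
    whitelist = store.all_ips()
    # Guard: refuse to publish a list that lacks the IP that triggered the update.
    if new_ip not in whitelist:
        raise RuntimeError(f"{new_ip} missing from whitelist for {worker}")
    return whitelist
```

A test can then seed the store with 173.73.128.55, check in with 70.108.9.176, and assert that only the new IP appears in the returned list.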

@benoit74
Collaborator

Fix almost ready, PR coming soon.

The first part, "we should not let a failing worker (considering that all its tasks are failing early) to continue to get new tasks", should be moved to a specific issue from my PoV. It has been like this for a long time and is probably not a small change, since we currently have many tasks failing for reasons not linked to worker setup.

@rgaudin
Member Author

rgaudin commented Oct 30, 2023

Absolutely; I opened this as a reminder late on Friday evening.
