Remote server failed to handle the request, will retry in a moment needs to be better handled #6287

Open
belimawr opened this issue Dec 11, 2024 · 2 comments · May be fixed by #6477
Labels: good first issue (Good for newcomers), Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

Comments

@belimawr
Contributor

belimawr commented Dec 11, 2024

When trying to communicate with Fleet-Server (e.g., during enrollment), Elastic-Agent might log the error:

Remote server failed to handle the request, will retry in a moment

However, this message can result from any of three different HTTP status codes:

// temporaryServerErrorCodes defines status codes that allow clients to retry their request.
var temporaryServerErrorCodes = map[int]struct{}{
	http.StatusBadGateway:         {},
	http.StatusServiceUnavailable: {},
	http.StatusGatewayTimeout:     {},
}

Bad Gateway (502) and Gateway Timeout (504) are likely connectivity problems, whereas Service Unavailable (503) can indicate a real problem with Fleet-Server.

The logs should state clearly which error occurred, and we should probably handle each one individually. For example, Bad Gateway might be a configuration issue, while Gateway Timeout is a connectivity issue. It might not make sense to retry Bad Gateway, whereas Gateway Timeout should be retried with exponential backoff.

@jlind23 added the Team:Elastic-Agent-Control-Plane label Dec 11, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@cmacknz
Member

cmacknz commented Dec 11, 2024

We should just return the exact HTTP status and any error message instead of abstracting it.

This made diagnosing elastic/fleet-server#4200 much harder than it needed to be.

@ycombinator added the good first issue label Dec 12, 2024
@Rohit-code14 linked a pull request Jan 4, 2025 that will close this issue
4 tasks
5 participants