Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log ingestion error on 206 response #2065

Merged
merged 11 commits into from
Jan 25, 2022
Merged

Conversation

trask
Copy link
Member

@trask trask commented Jan 21, 2022

Related to #2064, which we never noticed since we didn't log it.

Now the error in #2064 would be logged as:

2022-01-21 15:30:10.108-08:00 WARN  c.m.a.a.i.telemetry.TelemetryChannel - Sending telemetry to the ingestion service (telemetry will be stored to disk on failure and retried later): 109: Field 'message' on type 'ExceptionDetails' is required but missing or empty. Expected: string, Actual: undefined (future warnings will be aggregated and logged once every 5 minutes)

@trask
Copy link
Member Author

trask commented Jan 21, 2022

if you hide whitespace in the diff it will help

case 408: // REQUEST TIMEOUT
case 500: // INTERNAL SERVER ERROR
case 503: // SERVICE UNAVAILABLE
case 429: // TOO MANY REQUESTS
// TODO (heya) should we write to disk on any of these response codes?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure... if we don't write to disk, we should consider tracking how many requests we drop. Let's think about it.. need to come up with new property in customdimension.. it can be part of network statsbeat..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the todo to track data drop via statsbeat?

@trask trask changed the title Fix ingestion error on missing exception message Log ingestion error on 206 response Jan 22, 2022
@trask trask requested a review from heyams January 24, 2022 23:07
Comment on lines +305 to +314
case 408: // REQUEST TIMEOUT
case 429: // TOO MANY REQUESTS
case 500: // INTERNAL SERVER ERROR
case 503: // SERVICE UNAVAILABLE
operationLogger.recordFailure(
"received response code "
+ statusCode
+ " (telemetry will be stored to disk and retried later)");
onFailure.accept(true);
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the list of response codes that 2.x used to retry on, I think it's a good list, not sure when we diverged from it

@trask trask merged commit fbb276e into main Jan 25, 2022
@trask trask deleted the log-206-partial-failure-messages branch January 25, 2022 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants