-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributor: make OTLP endpoint return marshalled proto bytes as response body for errors #8227
Conversation
2e0b692
to
b368560
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting small change other than that looks good to me.
I do think that this will solve the issue but we still have to work more on the errors. For example returning this:
the incoming push request has been rejected because its message size is larger than the allowed limit of 140 bytes (err-mimir-distributor-max-write-message-size). To adjust the related limit, configure -distributor.max-recv-msg-size, or contact your service administrator.
To the user explains the error in Mimir but it doesnt explain how to fix it in the collector. This is the bridge that we have to fix. But lets do that in future PRs not on this one 👍
Thanks @jesusvazquez for the review, I have this ticket to track all the remainings we needs in order to support otlp endpoint, attached another ticket with your description in it. |
b368560
to
48174f0
Compare
6536385
to
aeca033
Compare
aeca033
to
566819d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM left a small note but is no blocker
@@ -169,7 +173,8 @@ func handler( | |||
if resp, ok := httpgrpc.HTTPResponseFromError(err); ok { | |||
code, msg = int(resp.Code), string(resp.Body) | |||
} else { | |||
code, msg = toHTTPStatus(ctx, err, limits), err.Error() | |||
_, code = toGRPCHTTPStatus(ctx, err, limits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want the grpc error code for the logline below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not used by normal mimir handler, normally we shouldn't need the grpc status code, i would just leave it as right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I'll defer to @jesusvazquez 's expertise on the primary functionality here being good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK to merge but there are a few changes I'd like to see made or ticketed as follow up work for next week.
pkg/distributor/push_test.go
Outdated
@@ -1178,7 +1186,7 @@ func TestHandler_ToHTTPStatus(t *testing.T) { | |||
) | |||
require.NoError(t, err) | |||
|
|||
status := toHTTPStatus(ctx, tc.err, limits) | |||
_, status := toGRPCHTTPStatus(ctx, tc.err, limits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be asserting the gRPC code too if we're using the same method to generate them and HTTP codes. Either that or we should split the error translation methods.
// http.StatusInternalServerError is returned. | ||
func toHTTPStatus(ctx context.Context, pushErr error, limits *validation.Overrides) int { | ||
func toGRPCHTTPStatus(ctx context.Context, pushErr error, limits *validation.Overrides) (codes.Code, int) { | ||
if errors.Is(pushErr, context.DeadlineExceeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this conditional isn't really needed since this is the default return at the bottom of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me do it in a separate PR, since this is not related to the otlp change, don't want to break the code in push.
// See doc https://opentelemetry.io/docs/specs/otlp/#failures-1 | ||
func writeErrorToHTTPResponseBody(w http.ResponseWriter, httpCode int, grpcCode codes.Code, msg string, logger log.Logger) { | ||
// writeResponseFailedError would be returned when writeErrorToHTTPResponseBody fails to write the error to the response body. | ||
writeResponseFailedBody, _ := proto.Marshal(grpcstatus.New(codes.Internal, "write error to response failed").Proto()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always marshal this message even if it's not used. We could actually just marshal it once and reuse for every call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this in #8294.
if httpCode != 202 { | ||
level.Error(logger).Log("msg", "push error", "err", err) | ||
} | ||
addHeaders(w, err, r, httpCode, retryCfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding headers here is too late -- headers were already written to client by this time. It means responses will not have correct Retry-After
headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing this in #8294.
if resp, ok := httpgrpc.HTTPResponseFromError(err); ok { | ||
// here the error would always be nil, since it is already checked in httpgrpc.HTTPResponseFromError | ||
s, _ := grpcutil.ErrorToStatus(err) | ||
writeErrorToHTTPResponseBody(w, int(resp.Code), s.Code(), string(resp.Body), logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we're passing HTTP status code as grpc code. Can OTLP clients handle that? (Abusing http codes as grpc codes is something we do in our code, but I don't think we should send it to clients expecting grpc response)
…onse body for errors (grafana#8227) * OTLP: proto marshal response for otlp push request * add changelog * fix go.mod * Address comments
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #8185 , as discussed there are 2 possible ways to solve this:
This PR is using the 2nd solution since it allows us to have more different logics for otlp endpoint in the future.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.