Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributor: make OTLP endpoint return marshalled proto bytes as response body for errors #8227

Merged
merged 4 commits into from
May 31, 2024

Conversation

ying-jeanne
Copy link
Contributor

@ying-jeanne ying-jeanne commented May 30, 2024

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #8185 , as discussed there are 2 possible ways to solve this:

  • pass a error handling function to handler, have 2 different implementation for otlp and normal push request
  • replicate the handler function for otlp and normal push request.

This PR is using the 2nd solution since it allows us to have more different logics for otlp endpoint in the future.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@ying-jeanne ying-jeanne force-pushed the marshall-proto-otlp-response branch 3 times, most recently from 2e0b692 to b368560 Compare May 30, 2024 20:50
@aknuds1 aknuds1 self-requested a review May 31, 2024 04:50
Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting small change other than that looks good to me.

I do think that this will solve the issue but we still have to work more on the errors. For example returning this:

the incoming push request has been rejected because its message size is larger than the allowed limit of 140 bytes (err-mimir-distributor-max-write-message-size). To adjust the related limit, configure -distributor.max-recv-msg-size, or contact your service administrator.

To the user explains the error in Mimir but it doesnt explain how to fix it in the collector. This is the bridge that we have to fix. But lets do that in future PRs not on this one 👍

pkg/distributor/push.go Outdated Show resolved Hide resolved
@ying-jeanne
Copy link
Contributor Author

ying-jeanne commented May 31, 2024

Requesting small change other than that looks good to me.

I do think that this will solve the issue but we still have to work more on the errors. For example returning this:

the incoming push request has been rejected because its message size is larger than the allowed limit of 140 bytes (err-mimir-distributor-max-write-message-size). To adjust the related limit, configure -distributor.max-recv-msg-size, or contact your service administrator.

To the user explains the error in Mimir but it doesnt explain how to fix it in the collector. This is the bridge that we have to fix. But lets do that in future PRs not on this one 👍

Thanks @jesusvazquez for the review, I have this ticket to track all the remainings we needs in order to support otlp endpoint, attached another ticket with your description in it.

@ying-jeanne ying-jeanne force-pushed the marshall-proto-otlp-response branch 2 times, most recently from 6536385 to aeca033 Compare May 31, 2024 17:12
@ying-jeanne ying-jeanne force-pushed the marshall-proto-otlp-response branch from aeca033 to 566819d Compare May 31, 2024 17:29
@ying-jeanne ying-jeanne changed the title [WIP] OTLP: proto marshal response for otlp push request OTLP: proto marshal response for otlp push request May 31, 2024
@ying-jeanne ying-jeanne changed the title OTLP: proto marshal response for otlp push request Distributor: make OTLP endpoint return marshalled proto bytes as response body for 4xx/5xx errors May 31, 2024
@ying-jeanne ying-jeanne changed the title Distributor: make OTLP endpoint return marshalled proto bytes as response body for 4xx/5xx errors Distributor: make OTLP endpoint return marshalled proto bytes as response body for errors May 31, 2024
@ying-jeanne ying-jeanne marked this pull request as ready for review May 31, 2024 17:31
@ying-jeanne ying-jeanne requested review from grafanabot and a team as code owners May 31, 2024 17:31
Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM left a small note but is no blocker

@@ -169,7 +173,8 @@ func handler(
if resp, ok := httpgrpc.HTTPResponseFromError(err); ok {
code, msg = int(resp.Code), string(resp.Body)
} else {
code, msg = toHTTPStatus(ctx, err, limits), err.Error()
_, code = toGRPCHTTPStatus(ctx, err, limits)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want the grpc error code for the logline below.

Copy link
Contributor Author

@ying-jeanne ying-jeanne May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used by normal mimir handler, normally we shouldn't need the grpc status code, i would just leave it as right now.

Copy link
Contributor

@johannaratliff johannaratliff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I'll defer to @jesusvazquez 's expertise on the primary functionality here being good

Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge but there are a few changes I'd like to see made or ticketed as follow up work for next week.

pkg/distributor/push.go Outdated Show resolved Hide resolved
@@ -1178,7 +1186,7 @@ func TestHandler_ToHTTPStatus(t *testing.T) {
)
require.NoError(t, err)

status := toHTTPStatus(ctx, tc.err, limits)
_, status := toGRPCHTTPStatus(ctx, tc.err, limits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be asserting the gRPC code too if we're using the same method to generate them and HTTP codes. Either that or we should split the error translation methods.

// http.StatusInternalServerError is returned.
func toHTTPStatus(ctx context.Context, pushErr error, limits *validation.Overrides) int {
func toGRPCHTTPStatus(ctx context.Context, pushErr error, limits *validation.Overrides) (codes.Code, int) {
if errors.Is(pushErr, context.DeadlineExceeded) {
Copy link
Contributor

@56quarters 56quarters May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this conditional isn't really needed since this is the default return at the bottom of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me do it in a separate PR, since this is not related to the otlp change, don't want to break the code in push.

@ying-jeanne ying-jeanne enabled auto-merge (squash) May 31, 2024 23:49
@ying-jeanne ying-jeanne merged commit 8de824e into main May 31, 2024
29 checks passed
@ying-jeanne ying-jeanne deleted the marshall-proto-otlp-response branch May 31, 2024 23:58
// See doc https://opentelemetry.io/docs/specs/otlp/#failures-1
func writeErrorToHTTPResponseBody(w http.ResponseWriter, httpCode int, grpcCode codes.Code, msg string, logger log.Logger) {
// writeResponseFailedError would be returned when writeErrorToHTTPResponseBody fails to write the error to the response body.
writeResponseFailedBody, _ := proto.Marshal(grpcstatus.New(codes.Internal, "write error to response failed").Proto())
Copy link
Member

@pstibrany pstibrany Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always marshal this message even if it's not used. We could actually just marshal it once and reuse for every call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this in #8294.

if httpCode != 202 {
level.Error(logger).Log("msg", "push error", "err", err)
}
addHeaders(w, err, r, httpCode, retryCfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding headers here is too late -- headers were already written to client by this time. It means responses will not have correct Retry-After headers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this in #8294.

if resp, ok := httpgrpc.HTTPResponseFromError(err); ok {
// here the error would always be nil, since it is already checked in httpgrpc.HTTPResponseFromError
s, _ := grpcutil.ErrorToStatus(err)
writeErrorToHTTPResponseBody(w, int(resp.Code), s.Code(), string(resp.Body), logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we're passing HTTP status code as grpc code. Can OTLP clients handle that? (Abusing http codes as grpc codes is something we do in our code, but I don't think we should send it to clients expecting grpc response)

@pstibrany pstibrany mentioned this pull request Jun 6, 2024
3 tasks
narqo pushed a commit to narqo/grafana-mimir that referenced this pull request Jun 6, 2024
…onse body for errors (grafana#8227)

* OTLP: proto marshal response for otlp push request

* add changelog

* fix go.mod

* Address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

otlp: Mimir's OTLP endpoint to return marshalled proto bytes as response body
5 participants