[exporter/elasticsearch] Handle errors for 413 Entity Too Large #36163

Closed
mauri870 opened this issue Nov 4, 2024 · 4 comments · Fixed by #36396
Comments

mauri870 commented Nov 4, 2024

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

The exporter has experimental batching support with the synchronous bulk indexer, which can circumvent the request size limitation from Elasticsearch. Unfortunately, if the batch is within the configured range but the documents are quite large (for example, logs containing stack traces), we can still hit the `http.max_content_length` limit, which is 100mb by default.

The underlying go-docappender library used by the exporter does not seem to handle this case, leaving it to the client to handle. I have opened an upstream issue to see if the library can do anything to help here.

Another problem is that a 413 error is not returned as a permanent error, so the client will be instructed to retry it to no effect.
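
For illustration, a minimal sketch of what marking the 413 as permanent could look like, using the collector's `consumererror` helper; `handleBulkError` is a hypothetical name, not existing exporter code:

```go
package example

import (
	"fmt"
	"net/http"

	"go.opentelemetry.io/collector/consumer/consumererror"
)

// handleBulkError is a hypothetical helper: it wraps a 413 response in a
// permanent error so the exporterhelper retry logic will not retry a payload
// that can never fit within the server's size limit.
func handleBulkError(statusCode int, err error) error {
	if statusCode == http.StatusRequestEntityTooLarge {
		return consumererror.NewPermanent(fmt.Errorf("flush failed (413): %w", err))
	}
	return err
}
```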

Describe the solution you'd like

I believe the exporter should make a best effort to avoid the 413, maybe by splitting the batch again internally so it fits within the limit and then retrying, if possible.
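
Purely as an illustration of the splitting idea (not the current implementation; `splitDocs` and `maxBytes` are made-up names), the already-encoded documents could be chunked by a byte budget and each chunk flushed as its own bulk request:

```go
package example

// splitDocs is an illustrative helper, not existing exporter code: it splits
// already-encoded documents into chunks whose combined size stays under
// maxBytes, so each chunk can be sent as a separate bulk request.
// Note: a single document larger than maxBytes still yields an oversized
// chunk and would have to be rejected or dropped separately.
func splitDocs(docs [][]byte, maxBytes int) [][][]byte {
	var chunks [][][]byte
	var current [][]byte
	size := 0
	for _, doc := range docs {
		if size > 0 && size+len(doc) > maxBytes {
			chunks = append(chunks, current)
			current, size = nil, 0
		}
		current = append(current, doc)
		size += len(doc)
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}
```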

Describe alternatives you've considered

Extend the exporterhelper batcher to support batch splitting or a max_size config.

Another option is to provide a sentinel error for Entity Too Large, allowing the receiver/processor to split the data and retry. Currently, if a 413 happens, the returned error is:

flush failed (413): [413 Request Entity Too Large]

This error is specific to Elasticsearch; I believe we should have a more universal error that other exporters can return when facing a similar condition.
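
A rough sketch of the sentinel-error idea, with hypothetical names (no such error exists in exporterhelper today):

```go
package example

import "errors"

// ErrEntityTooLarge is a hypothetical, backend-agnostic sentinel error; an
// exporter would wrap its backend-specific 413 error with it, e.g.
// fmt.Errorf("flush failed (413): %w", ErrEntityTooLarge).
var ErrEntityTooLarge = errors.New("request entity too large")

// A receiver/processor could then detect the condition without knowing
// anything about Elasticsearch:
//
//	if errors.Is(err, ErrEntityTooLarge) {
//		// split the data into smaller requests and retry
//	}
```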

Additional context

Relates to https://github.com/elastic/ingest-dev/issues/3677

mauri870 added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Nov 4, 2024
github-actions bot commented Nov 4, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mauri870

It appears that there is a similar, more general proposal available at open-telemetry/opentelemetry-collector#8272. Allowing the batcher config to have a max size in bytes seems to be a more straightforward solution.

@carsonip

I have opened #36396 to reduce the chance of a 413 for the sync bulk indexer (used when the exporterhelper batcher is active), instead of handling the error, because there is no good way to handle such an error other than to send the data in appropriately sized chunks in the first place.

@carsonip

/label -needs-triage

github-actions bot removed the needs triage (New item requiring triage) label on Nov 21, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue on Dec 17, 2024: …Too Large (open-telemetry#36396)

#### Description

Limit the bulk request size to roughly `flush::bytes` for the sync bulk indexer.
The sync bulk indexer is used when `batcher::enabled` is either true or false; in other words, the sync bulk indexer is not used when the batcher config is undefined.
Change `flush::bytes` to always measure in uncompressed bytes.
Change the default `batcher::max_size_items` to `0`, as the bulk request size limit is now more effectively enforced by `flush::bytes`.
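
To make the approach concrete, a simplified sketch of flushing based on the uncompressed buffer size (illustrative names only, not the actual go-docappender code):

```go
package example

// syncIndexer sketches the idea behind open-telemetry#36396: flush whenever
// adding the next document would push the *uncompressed* buffer past the
// configured flush::bytes threshold, so a single bulk request stays roughly
// below that limit regardless of how well the payload compresses.
type syncIndexer struct {
	buf        []byte       // uncompressed bulk request body buffered so far
	flushBytes int          // configured flush::bytes threshold
	flush      func() error // sends the buffered bulk request
}

func (s *syncIndexer) add(doc []byte) error {
	if len(s.buf) > 0 && len(s.buf)+len(doc) > s.flushBytes {
		if err := s.flush(); err != nil {
			return err
		}
		s.buf = s.buf[:0]
	}
	s.buf = append(s.buf, doc...)
	return nil
}
```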

#### Link to tracking issue
Fixes open-telemetry#36163

#### Testing

Modified BenchmarkExporter to run with `{name: "xxlarge_batch", batchSize: 1000000},`, removed `batcher::max_size_items`, and added a log line for compressed and uncompressed buffer size to reproduce the error.
```
logger.go:146: 2024-11-19T17:16:40.060Z	ERROR	Flush	{"s.bi.Len": 10382932, "s.bi.UncompressedLen": 532777786}
    logger.go:146: 2024-11-19T17:16:40.312Z	ERROR	bulk indexer flush error	{"error": "flush failed (413): [413 Request Entity Too Large] "}
```

With this PR, every flush is logged and there is no error.
```
   logger.go:146: 2024-11-19T17:23:52.574Z	ERROR	Flush	{"s.bi.Len": 99148, "s.bi.UncompressedLen": 5000007}
```


---------

Co-authored-by: Christos Markou <chrismarkou92@gmail.com>