Make splunkhec exporter http idle timeout lower/configurable #20543

Closed
matthewmodestino opened this issue Apr 3, 2023 · 7 comments · Fixed by #20653
Labels
enhancement (New feature or request), exporter/splunkhec

Comments

@matthewmodestino commented Apr 3, 2023

Component(s)

exporter/splunkhec

Is your feature request related to a problem? Please describe.

The default HTTP connection idle timeout is apparently hard-coded at 30 seconds, which is higher than Splunk's server-side idle timeout, set by busyKeepAliveIdleTimeout and defaulting to 12 seconds.

Because Splunk closes the idle keep-alive connection first, the collector's next request goes out on a connection that is already dead and fails with an "EOF" error, which triggers retries on the collector side.

2022-04-05T12:55:27.607Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "splunk_hec/platform_logs", "error": "Post \"https://foo.bar.com:8088/services/collector\": EOF", "interval": "3.985412818s"}
2022-04-04T16:39:44.614Z    info    exporterhelper/queued_retry.go:218    Exporting failed. Will retry the request after interval.    {"kind": "exporter", "name": "splunk_hec/platform_logs", "error": "Post \"https://foo.bar.com:443/services/collector/event\": EOF", "interval": "4.623187485s"} 

This causes unnecessary retries and buffering on the OTel collector.

Describe the solution you'd like

I would like the HTTP idle connection timeout to be configurable, or set much lower by default, as we did in fluentd-hec, where the idle connection timeout is 5 seconds, specifically to avoid this situation.

https://github.com/splunk/fluent-plugin-splunk-hec#idle_timeout-integer

Describe alternatives you've considered

We have tried the workaround of raising Splunk's busyKeepAliveIdleTimeout to a value higher than the idle timeout of the OTel collector (or of any load balancer that may sit between the OTel agent and Splunk). While this does help reduce the EOF errors, it requires users to customize the default Splunk deployment, which in many cases is not as easy as tuning the collector.

Additional context

This has been tracked in Splunk's distro here

@matthewmodestino added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Apr 3, 2023
@github-actions bot (Contributor) commented Apr 3, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme (Contributor) commented Apr 3, 2023

This is not a proper response, but some context here:

Our default timeout is set to 10s: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/splunkhecexporter/factory.go#L69
The exporter takes advanced http settings as well: https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/confighttp/README.md
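
To tie those two points together, here is a minimal sketch of where such shared HTTP client settings sit in the exporter config (the token value is a placeholder; the endpoint mirrors the log lines above):

exporters:
  splunk_hec:
    token: "00000000-0000-0000-0000-000000000000"
    endpoint: "https://foo.bar.com:8088/services/collector"
    timeout: 10s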

@atoulme (Contributor) commented Apr 3, 2023

It looks like the fix is to change the default idle timeout to 10s (or at least make it configurable); we can look into that.

@matthewmodestino (Author) commented

Ah! Good find! In fact all these settings likely play a role here:

max_idle_conns
max_idle_conns_per_host
max_conns_per_host
idle_conn_timeout

Will read up on them. Can I override these settings as part of the "advanced configs"?

@atoulme (Contributor) commented Apr 3, 2023

Yes, here is how:

exporters:
  splunk_hec/foo:
    ...
    idle_conn_timeout: 10s

I have opened a PR to change the default from 30s to 10s.
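
For completeness, a fuller sketch combining the related confighttp settings listed above; the values are illustrative only, not recommendations from this thread, and the "..." elides the required token and endpoint as in the example above:

exporters:
  splunk_hec/foo:
    ...
    idle_conn_timeout: 10s
    max_idle_conns: 200
    max_idle_conns_per_host: 200
    max_conns_per_host: 200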

@matthewmodestino (Author) commented

Amazing, will try it with a customer.

@Dylan-M commented May 22, 2024

I don't see these documented anywhere outside of this issue and the PRs. Was that intentional?
