Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[configretry] max_elapsed_time cannot be set to 0 anymore because of new validation logic #9641

Closed
rnishtala-sumo opened this issue Feb 26, 2024 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@rnishtala-sumo
Copy link
Contributor

rnishtala-sumo commented Feb 26, 2024

Describe the bug
According to this change in v0.95.0, max_elapsed_time cannot be set to 0 which was a previously allowed value to attempt retry indefinitely.

Steps to reproduce
To prevent the sumologic exporter form ever dropping data that was successfully queued, set retry_on_failure.max_elapsed_time to 0.

What did you expect to see?
setting retry_on_failure.max_elapsed_time to 0 should be allowed to retry indefinitely

What did you see instead?
errors.New("'max_elapsed_time' must not be less than 'initial_interval'")

What version did you use?
v0.95.0

What config did you use?

exporters:
    sumologic:
        retry_on_failure:
            max_elapsed_time: 0
        sending_queue:
            enabled: true
            storage: file_storage
@rnishtala-sumo rnishtala-sumo added the bug Something isn't working label Feb 26, 2024
@rnishtala-sumo rnishtala-sumo changed the title [config/configretry] max_elapsed_time cannot be set to 0 anymore because of new validation logic [configretry] max_elapsed_time cannot be set to 0 anymore because of new validation logic Feb 26, 2024
@atoulme
Copy link
Contributor

atoulme commented Feb 26, 2024

That behavior is not defined in the docs: https://github.com/open-telemetry/opentelemetry-collector/blame/main/exporter/exporterhelper/README.md#L15
I do see in the godoc: https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configretry/backoff.go#L41

It'd be good to fix the README as part of this fix.

@TylerHelmuth
Copy link
Member

@rnishtala-sumo I am confused why the code you linked is the culprit. In your example config aren't MaxElapsedTime and InitialInterval set to 0? Also, for the config to take effect you need to set retry_on_failure.enabled = true.

@rnishtala-sumo
Copy link
Contributor Author

rnishtala-sumo commented Feb 26, 2024

@TylerHelmuth, all the other params are their default values. retry_on_failure.enabled is true by default and InitialInterval is 5s. Given the defaults, prior to the change in v0.95.0 one could set MaxElapsedTime to 0 to retry indefinitely.

@rnishtala-sumo
Copy link
Contributor Author

Below is the entire config, that worked in prior releases (< 0.95.0)

exporters:
    sumologic:
        retry_on_failure:
            max_elapsed_time: 0
        sending_queue:
            enabled: true
            storage: file_storage
processors:
    batch:
        send_batch_size: 1024
        timeout: 1s
    memory_limiter:
        check_interval: 5s
        limit_percentage: 75
        spike_limit_percentage: 20
    resource/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        attributes:
            - action: insert
              key: _source
              value: custom-test
            - action: insert
              key: _contentType
              value: OpenTelemetry
            - action: insert
              key: _sourceCategory
              value: otel/localfile
    resourcedetection/system:
        detectors:
            - system
        system:
            hostname_sources:
                - dns
                - os
receivers:
    filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        encoding: utf-8
        include:
            - /tmp/test.log
            - /tmp/test1.log
            - /tmp/test2.log
        include_file_name: false
        include_file_path: true
        operators:
            - from: attributes["log.file.path"]
              to: resource["log.file.path"]
              type: move
            - from: resource["log.file.path"]
              to: resource["_sourceName"]
              type: copy
        start_at: end
        storage: file_storage
service:
    pipelines:
        logs/localfilesource/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
            exporters:
                - sumologic
            processors:
                - memory_limiter
                - resourcedetection/system
                - batch
                - resource/e8c32e52-d626-4ea9-829a-c60d3c2a8800
            receivers:
                - filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800

In 0.95.0, we see the following error

Error: invalid configuration: exporters::sumologic: 'max_elapsed_time' must not be less than 'initial_interval'
2024/02/26 17:57:44 collector server run finished with error: invalid configuration: exporters::sumologic: 'max_elapsed_time' must not be less than 'initial_interval'

@atoulme
Copy link
Contributor

atoulme commented Feb 26, 2024

@rnishtala-sumo would you like to propose a patch? Should the issue be assigned to you? Please let us know.

@rnishtala-sumo
Copy link
Contributor Author

Yes, I can work on the issue and a patch for this would definitely help us upgrade to the next v0.95.x release.

@rnishtala-sumo
Copy link
Contributor Author

Raised the following PR for review

dmitryax pushed a commit that referenced this issue Feb 27, 2024
**Description:** [configretry] Allow max_elapsed_time to be set to 0

**Link to tracking Issue:**
#9641
@shashitessell
Copy link

@rnishtala-sumo in your filelog reciever config , you have specified start_at: end and you have given storage as well . Can you please elaborate me in detail . that if otel collector restarts from where it will start reading file again .

receivers:
    filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        encoding: utf-8
        include:
            - /tmp/test.log
            - /tmp/test1.log
            - /tmp/test2.log
        include_file_name: false
        include_file_path: true
        operators:
            - from: attributes["log.file.path"]
              to: resource["log.file.path"]
              type: move
            - from: resource["log.file.path"]
              to: resource["_sourceName"]
              type: copy
        start_at: end
        storage: file_storage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants