Rate limit Elasticsearch client indexing error logs #40448

Merged: 5 commits into elastic:main on Aug 14, 2024

Conversation

@AndersonQ (Member) commented Aug 7, 2024

Proposed commit message

rate limit Elasticsearch client indexing error logs

Use `elastic-agent-libs/periodic.Doer` to rate limit indexing error logs. This does not affect the logs sent to the event logger.
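
For context, the idea is that each per-event indexing failure only bumps a counter, and a single summary line is emitted per interval; the full details still reach the event logger. The sketch below is illustrative only: it does not use the real `periodic.Doer` API, and all type and function names in it are made up for this example.

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// droppedAggregator is a hypothetical stand-in for the rate limiting done in
// this PR: it counts dropped events and logs one summary per interval.
type droppedAggregator struct {
	interval time.Duration
	count    atomic.Uint64
	done     chan struct{}
}

func newDroppedAggregator(interval time.Duration) *droppedAggregator {
	a := &droppedAggregator{interval: interval, done: make(chan struct{})}
	go a.run()
	return a
}

// Add records one dropped event; the per-event code path only calls this
// instead of writing a log line of its own.
func (a *droppedAggregator) Add() { a.count.Add(1) }

func (a *droppedAggregator) run() {
	ticker := time.NewTicker(a.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if n := a.count.Swap(0); n > 0 {
				// One summary line per interval, mirroring the
				// "Failed to index N events in last 10s" message.
				fmt.Printf("Failed to index %d events in last %s: events were dropped!\n", n, a.interval)
			}
		case <-a.done:
			return
		}
	}
}

func (a *droppedAggregator) Stop() { close(a.done) }

func main() {
	agg := newDroppedAggregator(10 * time.Second)
	defer agg.Stop()
	for i := 0; i < 4; i++ {
		agg.Add() // e.g. four events rejected in the same bulk response
	}
	time.Sleep(11 * time.Second) // prints a single summary covering all four drops
}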

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None

How to test this PR locally

Start Filebeat with the following configuration:

filebeat.inputs:
  - type: filestream
    id: filestream-input-id
    enabled: true
    parsers:
      - ndjson:
          target: ""
          overwrite_keys: true
          expand_keys: true
          add_error_key: true
          ignore_decoding_error: false
    paths:
      - /tmp/flog.log

output:
  elasticsearch:
    hosts:
      - localhost:9200 # a valid ES is needed
    protocol: https
    username: elastic
    password: changeme
    allow_older_versions: true
    ssl.verification_mode: none

logging:
  level: info
  event_data:
    files:
      name: filebeat-events-data # that's the default, change it if you want another name.

Create the log file /tmp/flog.log with the following content:

{"message":"foo bar","int":10,"string":"str"}
{"message":"another message","int":20,"string":"str2"}
{"message":"index failure","int":"not a number","string":10}
{"message":"second index failure","int":"not a number","string":10}
A broken JSON
  • Check the Filebeat logs: instead of a `Cannot index event (status=STATUS): dropping event!` line for every dropped event, you should see a single `Failed to index N events in last 10s: events were dropped!` summary.
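
Optionally, a throwaway helper (not part of the PR; the path and sample lines are taken from the steps above) to append the test lines to /tmp/flog.log while Filebeat is running:

package main

import (
	"fmt"
	"os"
)

func main() {
	// Two documents whose "int" field cannot be indexed as a long, plus one
	// broken JSON line, as listed in the test steps above.
	lines := `{"message":"foo bar","int":10,"string":"str"}
{"message":"another message","int":20,"string":"str2"}
{"message":"index failure","int":"not a number","string":10}
{"message":"second index failure","int":"not a number","string":10}
A broken JSON
`
	f, err := os.OpenFile("/tmp/flog.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()
	if _, err := f.WriteString(lines); err != nil {
		fmt.Fprintln(os.Stderr, "write:", err)
		os.Exit(1)
	}
}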

Related issues

Logs

  • logs:
{"log.level":"warn","@timestamp":"2024-08-14T10:03:16.900+0200","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.NewClient.func3","file.name":"elasticsearch/client.go","file.line":175},"message":"Failed to index 2 events in last 10s: events were dropped! Look at the event log to view the event and cause.","service.name":"filebeat","ecs.version":"1.6.0"}
  • event log:
{"log.level":"warn","@timestamp":"2024-08-14T10:03:10.714+0200","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).applyItemStatus","file.name":"elasticsearch/client.go","file.line":517},"message":"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc00199d980)} (status=400): {\"type\":\"document_parsing_exception\",\"reason\":\"[1:368] failed to parse field [int] of type [long] in document with id 'Gt7pT5EBJuRc3c2Nhw20'. Preview of field's value: 'not a number'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"For input string: \\\"not a number\\\"\"}}, dropping event!","service.name":"filebeat","log.type":"event","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-08-14T10:03:10.714+0200","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).applyItemStatus","file.name":"elasticsearch/client.go","file.line":517},"message":"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc00199da00)} (status=400): {\"type\":\"document_parsing_exception\",\"reason\":\"[1:48] failed to parse field [int] of type [long] in document with id 'G97pT5EBJuRc3c2Nhw20'. Preview of field's value: 'not a number'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"For input string: \\\"not a number\\\"\"}}, dropping event!","service.name":"filebeat","log.type":"event","ecs.version":"1.6.0"}

@AndersonQ self-assigned this on Aug 7, 2024
@botelastic bot added the needs_team (Indicates that the issue/PR needs a Team:* label) label on Aug 7, 2024
@AndersonQ added the enhancement, backport-skip, and Team:Elastic-Agent-Data-Plane labels and removed the needs_team label on Aug 7, 2024
mergify bot (Contributor) commented Aug 7, 2024

This pull request is now in conflict. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 40157-rate-limit-cannot-index-logs upstream/40157-rate-limit-cannot-index-logs
git merge upstream/main
git push upstream 40157-rate-limit-cannot-index-logs

@AndersonQ force-pushed the 40157-rate-limit-cannot-index-logs branch from 2084e8c to 67949a1 on August 7, 2024 08:25
@AndersonQ changed the title from "wip: rate limit ES client index error logs" to "Rate limit Elasticsearch client indexing error logs" on Aug 7, 2024
mergify bot (Contributor) commented Aug 9, 2024

This pull request is now in conflict. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 40157-rate-limit-cannot-index-logs upstream/40157-rate-limit-cannot-index-logs
git merge upstream/main
git push upstream 40157-rate-limit-cannot-index-logs

@AndersonQ force-pushed the 40157-rate-limit-cannot-index-logs branch 3 times, most recently from 47acb51 to 155122c on August 9, 2024 15:38
Use `periodic.Doer` to rate limit indexing error logs. This does not affect the logs sent to the event logger.
@AndersonQ force-pushed the 40157-rate-limit-cannot-index-logs branch from 155122c to df532fc on August 13, 2024 09:01
@AndersonQ marked this pull request as ready for review on August 13, 2024 12:55
@AndersonQ requested review from a team as code owners on August 13, 2024 12:55
@elasticmachine (Collaborator) commented:
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert requested review from faec and removed the review request for mauri870 on August 13, 2024 14:22
@AndersonQ enabled auto-merge (squash) on August 14, 2024 07:41
@cmacknz (Member) left a comment:
LGTM, just a suggestion to document why we did this.

observer := s.observer
if observer == nil {
	observer = outputs.NewNilObserver()
}

log := logp.NewLogger("elasticsearch")

pLogDeadLetter := periodic.NewDoer(10*time.Second,
A reviewer (Member) commented on the snippet above:
I suggest adding a comment explaining why we are rate limiting these so future readers don't need to guess, and can avoid adding new per-event info logs.
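
For illustration, the kind of comment being asked for might read roughly like this (a suggested wording only, not the text that was actually merged):

// These indexing failures are rate limited on purpose: full per-event details
// go to the event logger, while the main log only receives a
// "Failed to index N events in last 10s" summary. Without this, a failing
// batch produces one "Cannot index event" line per dropped event and floods
// the main log. Please avoid adding new per-event info logs on this path.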

@AndersonQ merged commit f010829 into elastic:main on Aug 14, 2024
122 checks passed
@AndersonQ deleted the 40157-rate-limit-cannot-index-logs branch on August 14, 2024 14:35
Labels
  • backport-skip: Skip notification from the automated backport with mergify
  • enhancement
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rate limit "Cannot index event" log messages
4 participants