Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import-beats: copy ingest pipelines #270

Merged
merged 4 commits into from
Mar 18, 2020

Conversation

mtojek
Copy link
Contributor

@mtojek mtojek commented Mar 17, 2020

This PR adds support for ingest pipelines to "import-beats" script.

Issue: #221

@mtojek mtojek requested a review from ruflin March 17, 2020 13:10
@mtojek mtojek self-assigned this Mar 17, 2020
@@ -0,0 +1,32 @@
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the pipeline is called default.yml or default.json we automatically fall back to it without any additional config in the manifest. Perhaps worth renaming it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider these two cases:

  1. Logstash
../beats/filebeat/module/logstash/slowlog/ingest/pipeline-json.yml
../beats/filebeat/module/logstash/slowlog/ingest/pipeline-plain.yml

Which one should called default.json? Should I rename it only if there is a single pipeline?

  1. Elasticsearch
../beats/filebeat/module/elasticsearch/slowlog/ingest/pipeline-json.yml
../beats/filebeat/module/elasticsearch/slowlog/ingest/pipeline-plaintext.yml
../beats/filebeat/module/elasticsearch/slowlog/ingest/pipeline.yml

Which one should be called default.yml?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Elasticsearch, it is the pipeline.yml file as this is the one doing the routing. I had to dig into the LS one to see why it even works. It turns out this is an anti pattern from moving forward. It decides the ingest pipeline based on the config. Instead there should always be 1 pipeline. The LS module should be modified to use the ES pattern, otherwise it will not work with the packages.

@ph @ycombinator @exekias We should change the LS module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this has been on the cards for a while anyway: elastic/beats#9964.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Hey @mtojek, just to clarify, we still need elastic/beats#9964 to remain open, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Hey @mtojek, just to clarify, we still need elastic/beats#9964 to remain open, right?

Yes, would be great to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator Yes, it now even becomes a must to change to migrate to packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. @ruflin other than "ASAP" 😄 what's the timeline you need this to be fixed in?

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will you deal with the case where there is more then 1 ingest pipeline?

@mtojek
Copy link
Contributor Author

mtojek commented Mar 17, 2020

How will you deal with the case where there is more then 1 ingest pipeline?

In Elasticseach filebeat module, slowlog manifest.yml:

module_version: 1.0

var:
  - name: paths
    default:
      - /var/log/elasticsearch/*_index_search_slowlog.log
      - /var/log/elasticsearch/*_index_indexing_slowlog.log
      - /var/log/elasticsearch/*_index_search_slowlog.json
      - /var/log/elasticsearch/*_index_indexing_slowlog.json
    os.darwin:
      - /usr/local/var/lib/elasticsearch/*_index_search_slowlog.log
      - /usr/local/var/lib/elasticsearch/*_index_indexing_slowlog.log
      - /usr/local/var/lib/elasticsearch/*_index_search_slowlog.json
      - /usr/local/var/lib/elasticsearch/*_index_indexing_slowlog.json
    os.windows:
      - c:/ProgramData/Elastic/Elasticsearch/logs/*_index_search_slowlog.log
      - c:/ProgramData/Elastic/Elasticsearch/logs/*_index_indexing_slowlog.log
      - c:/ProgramData/Elastic/Elasticsearch/logs/*_index_search_slowlog.json
      - c:/ProgramData/Elastic/Elasticsearch/logs/*_index_indexing_slowlog.json

ingest_pipeline:
  - ingest/pipeline.yml
  - ingest/pipeline-plaintext.yml
  - ingest/pipeline-json.yml
input: config/slowlog.yml

Ingest pipeline is an array.

Let me compare with EPR's: packages/example/coredns/dataset/log/manifest.yml - ingest_pipeline - single value.

Do you have any preferences against selecting the default pipeline? Name? first in directory list?

@ruflin
Copy link
Member

ruflin commented Mar 17, 2020

EPM can have multiple pipelines, but just one is referenced in the manifest. The others are referenced inside the ingest pipelines itself.

@mtojek
Copy link
Contributor Author

mtojek commented Mar 17, 2020

EPM can have multiple pipelines, but just one is referenced in the manifest. The others are referenced inside the ingest pipelines itself.

I will implement the decision logic for selecting right ingest_pipeline while working on the next issue (input config). Hope that works for you.

Ignore this comment. I still need the logic to determine which file should be named default.json / default.yml.

@mtojek
Copy link
Contributor Author

mtojek commented Mar 17, 2020

Ignore this comment. I still need the logic to determine which file should be named default.json / default.yml.

Fixed.

@mtojek mtojek requested a review from ruflin March 17, 2020 16:27
@mtojek mtojek merged commit 7446a76 into elastic:master Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants