Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Rename prospector to input #5944

Closed
wants to merge 63 commits into from

Conversation

ph
Copy link
Contributor

@ph ph commented Dec 21, 2017

Rename usage of prospector to input inside the code base to align with the wording or other product like Logstash and also its better name for UDP and TCP.

This PR use type aliasing to make the old types works with the new names, this mean it should be backward compatible with the community beats without requiring them to changes their code. They will need golang 1.9 to leverage that feature, we have guards in place to fails the build in other version.

Also the usage of prospectors in the configuration and in the module definition will work until 7.0.

TODO

  • Rename prospector usage in python scripts.
  • Real world testing
  • Create a PR for the documentation. related to Beats doc updates in 6.x #5632
  • squash better commit message
  • split into multiple PR.

@ph ph added Filebeat Filebeat in progress Pull request is currently in progress. refactoring labels Dec 21, 2017
@ph ph force-pushed the refactor/rename-prospector-to-input branch from 0b923cd to 0af0e45 Compare December 21, 2017 21:26
@ph
Copy link
Contributor Author

ph commented Dec 22, 2017

@ruflin After changing the name from prospector to input, I believe we should also change of harvester, It sound weird in the context of TCP/UDP and also redis. I don't have a proposal yet for that.

harvester | ˈhärvəstər |
noun
a person or machine that gathers crops as a harvest: back-breaking labor as hired sugar beet harvesters | the world's first mechanical harvester of grapes.
• a person who catches or kills animals for human consumption: coastal fish harvesters are always the first to notice the decline of fish stocks.
• a person or organization that collects or obtains a resource for future use: one of the world's biggest harvesters of the sun's energy.

@ruflin
Copy link
Collaborator

ruflin commented Dec 26, 2017

Agree. I think harvesters together with prospectors made kind of sense but removing one the other one has to follow too. But I would do this in separate steps.

I had in the past discussions with @urso about this. There are at least 2 cases here. In same cases what we have as harvester is not needed anymore. For example for stdin there is only 1 prospector, 1 harvester and we have a 1-1 relation ship. Potentially harvester can become obsolete here. For the log case we have 1-n. In the log harvester big parts are abtracted as readers. The readers can be reused across different prospector/input types like multiline etc. They are chained together. We discussed in the past to rename harvester to reader but I think the way we have reader at the moment is pretty nice and it would conflict here.

An option could be to call it listeners. It only fits partially with the file case. The reason I liked harvester is it describes filebeat going "actively" to the file and check for new data.

One additional note for the prospector / input. For the file case (and probably some others) there is a specific scanner phase which we discussed to extract in the past which is responsible to scan the file system. It potentially could be exchanged with a different scanner (inotify). Just to add some more names to the equation.

Let's keep it "simple" for now and do only the prospector -> input renaming and I'm pretty sure if we add more inputs some of the names become more obvious.

@ph
Copy link
Contributor Author

ph commented Jan 8, 2018

Agree lets do it in multiple steps, naming things are hard :)

@urso
Copy link

urso commented Jan 9, 2018

about harvester: From my point of view the prospectors are the inputs. If an input/prospector can handle the input directly, it should not be required to create another go-routine doing the actual work. The harvesters to date are basically just the workers of the log-input type -> therefore I'd just rename them to worker and even hide the fact we have workers from the user (it's an implementation detail, one could use select on FDs as well). If you really need a name collector or reader might do.

Copy link
Contributor

@dedemorton dedemorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at this because I wanted to see where you are at with the renaming. I've added a few minor comments about stuff I noticed while reading through the message and config-related changes.

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.
# List of input to fetch data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say inputs (plural)

#
# Possible options are:
# * log: Reads every line of the log file (default)
# * stdin: Reads the standard in

#------------------------------ Log prospector --------------------------------
#------------------------------ Log input --------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this to say "Input" instead of "Log input" given the various types we now support?

@@ -980,13 +980,13 @@ def test_restart_state(self):
# Make sure all 4 states are persisted
self.wait_until(
lambda: self.log_contains(
"Prospector states cleaned up. Before: 4, After: 4", logfile="filebeat2.log"),
"input states cleaned up. Before: 4, After: 4", logfile="filebeat2.log"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be capitalized?

max_timeout=10)

# Wait until registry file is cleaned
self.wait_until(
lambda: self.log_contains(
"Prospector states cleaned up. Before: 0, After: 0", logfile="filebeat2.log"),
"input states cleaned up. Before: 0, After: 0", logfile="filebeat2.log"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment

@dedemorton dedemorton mentioned this pull request Jan 9, 2018
37 tasks
@ph
Copy link
Contributor Author

ph commented Jan 9, 2018

@dedemorton Thanks for the check :)

@ruflin
Copy link
Collaborator

ruflin commented Jan 12, 2018

@ph Ping me when you think this needs a review.

@ph
Copy link
Contributor Author

ph commented Jan 15, 2018

closing opening followup PRs.

@ph ph closed this Jan 15, 2018
@dedemorton dedemorton mentioned this pull request Mar 13, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat in progress Pull request is currently in progress. refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants