Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow plugins to register pre-configured tokenizers #24751

Merged
merged 8 commits into from
May 19, 2017

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 17, 2017

Allows plugins to register pre-configured tokenizers. Much of the decisions are the same as those in #24223, #24572, and #24223. This only migrates the lowercase tokenizer but I figure that is a good start because it proves out the features.

nik9000 added 5 commits May 17, 2017 07:38
one test wasn't really testing anything even though it looked like it was.
It tried to test caching but it failed at it. I'm not actually sure what
good the caching provides.

Adds unit test for registering tokenizers via plugins.
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/**
* Shared implementation for pre-configured analysis components.
*/
abstract class PreConfiguredAnalysisComponent<T> implements AnalysisModule.AnalysisProvider<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be public since they should not be required to be under the o.e.i.analysis package?

tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, () -> new TokenFilterFactory() {
@Override
public String name() {
return "lowercase";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if (as a followup) name() could be removed from TokenFilterFactory? It seems to only have 3 uses, one of which is a test. Then TokenFilterFactory could be a functional interface and this would look much cleaner:

tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, LowerCaseTokenFilter::new));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Let me have a look later on.

@nik9000 nik9000 removed the WIP label May 19, 2017
@nik9000 nik9000 merged commit b9ea579 into elastic:master May 19, 2017
@nik9000 nik9000 deleted the preconfigured_tokenizer branch June 7, 2017 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants