-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow plugins to register pre-configured tokenizers #24751
Conversation
one test wasn't really testing anything even though it looked like it was. It tried to test caching but it failed at it. I'm not actually sure what good the caching provides. Adds unit test for registering tokenizers via plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/** | ||
* Shared implementation for pre-configured analysis components. | ||
*/ | ||
abstract class PreConfiguredAnalysisComponent<T> implements AnalysisModule.AnalysisProvider<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be public since they should not be required to be under the o.e.i.analysis package?
tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, () -> new TokenFilterFactory() { | ||
@Override | ||
public String name() { | ||
return "lowercase"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if (as a followup) name()
could be removed from TokenFilterFactory? It seems to only have 3 uses, one of which is a test. Then TokenFilterFactory could be a functional interface and this would look much cleaner:
tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, LowerCaseTokenFilter::new));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Let me have a look later on.
We have an issue tracking it.
Allows plugins to register pre-configured tokenizers. Much of the decisions are the same as those in #24223, #24572, and #24223. This only migrates the
lowercase
tokenizer but I figure that is a good start because it proves out the features.