Add Support For Multiple Languages #25

Merged
merged 16 commits into from
Sep 13, 2022

@valencik (Collaborator) commented Sep 8, 2022

Adds support for multiple languages!
Just French and Spanish to start.

We do this with a collection of builders that are subtypes of AnalyzerBuilder, which is now an abstract class.
There's a common Config data type to enable some sharing of settings, as seen in the MultiLingualPipeline example:

val base = AnalyzerBuilder.default.withLowerCasing.withASCIIFolding

val english = base.english.withPorterStemmer.tokenizer[IO]
val french = base.french.withFrenchLightStemmer.tokenizer[IO]
val spanish = base.spanish.withSpanishLightStemmer.tokenizer[IO]

All of english, french, and spanish will apply lowercasing and ASCII folding.
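
To make the sharing concrete, here is a minimal, self-contained sketch of the pattern, not the code in this PR. It assumes a simplified `Config` with two flags, and every name other than `AnalyzerBuilder`, `Config`, `withLowerCasing`, `withASCIIFolding`, `english`, and `french` is hypothetical (`DefaultBuilder`, `EnglishBuilder`, `FrenchBuilder`, `withToyStemmer`, `languageStep`). The "stemmers" are toy suffix strippers standing in for the real ones, ASCII folding is approximated with `java.text.Normalizer`, and the tokenizer is a plain function instead of an effectful `tokenizer[IO]`.

```scala
import java.text.Normalizer
import java.util.Locale

// Shared settings (assumed shape; the PR's actual Config may differ).
final case class Config(lowerCase: Boolean = false, foldAscii: Boolean = false)

// Abstract base: applies the shared Config, then a language-specific step.
abstract class AnalyzerBuilder(val config: Config) {
  // Hook for per-language processing of a single token.
  protected def languageStep(token: String): String

  // Toy helper: drop one trailing character if the token is longer than
  // three characters and ends with one of the given suffixes.
  protected def dropPlural(token: String, suffixes: Set[String]): String =
    if (token.length > 3 && suffixes.exists(s => token.endsWith(s))) token.dropRight(1)
    else token

  def tokenizer: String => Vector[String] = { input =>
    val lowered = if (config.lowerCase) input.toLowerCase(Locale.ROOT) else input
    val folded =
      if (config.foldAscii)
        // Decompose accented characters, then remove the combining marks.
        Normalizer.normalize(lowered, Normalizer.Form.NFD).replaceAll("\\p{M}", "")
      else lowered
    folded.split("\\s+").filter(_.nonEmpty).map(languageStep).toVector
  }
}

object AnalyzerBuilder {
  val default: DefaultBuilder = new DefaultBuilder(Config())
}

// Language-agnostic builder: sets shared options, then hands them to a language.
final class DefaultBuilder(cfg: Config) extends AnalyzerBuilder(cfg) {
  protected def languageStep(token: String): String = token
  def withLowerCasing: DefaultBuilder = new DefaultBuilder(cfg.copy(lowerCase = true))
  def withASCIIFolding: DefaultBuilder = new DefaultBuilder(cfg.copy(foldAscii = true))
  def english: EnglishBuilder = new EnglishBuilder(cfg, stem = false)
  def french: FrenchBuilder = new FrenchBuilder(cfg, stem = false)
}

final class EnglishBuilder(cfg: Config, stem: Boolean) extends AnalyzerBuilder(cfg) {
  def withToyStemmer: EnglishBuilder = new EnglishBuilder(cfg, stem = true)
  protected def languageStep(token: String): String =
    if (stem) dropPlural(token, Set("s")) else token
}

final class FrenchBuilder(cfg: Config, stem: Boolean) extends AnalyzerBuilder(cfg) {
  def withToyStemmer: FrenchBuilder = new FrenchBuilder(cfg, stem = true)
  protected def languageStep(token: String): String =
    if (stem) dropPlural(token, Set("s", "x")) else token
}

// Usage (top-level definitions: Scala 3 source file or a Scala script).
val base = AnalyzerBuilder.default.withLowerCasing.withASCIIFolding
val english = base.english.withToyStemmer.tokenizer
val french = base.french.withToyStemmer.tokenizer

// english("Résumé Reviews") == Vector("resume", "review")
// french("Les Cafés")       == Vector("les", "cafe")
```

Because the shared options live in `Config` and are carried into each language builder, setting them once on `base` is enough for every derived tokenizer to pick them up.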

@valencik self-assigned this Sep 8, 2022
@valencik changed the title from "Add other languages" to "Add Support For Multiple Languages" Sep 13, 2022
@valencik marked this pull request as ready for review September 13, 2022 10:51
@valencik merged commit 0dccf13 into main Sep 13, 2022
@valencik deleted the other-langs branch September 13, 2022 23:57