Metaphone Traité vs Traités #28
Comments
Hello @oncletom. This is indeed the expected output of the metaphone algorithm. While this algorithm is said to work with a lot of European languages, it behaves quite badly with French silent letters, and we have an awful lot of them. Unfortunately, the double-metaphone won't help you either. There aren't many good phonetic algorithms for the French language. I found one here. I want to port it, but I am not sure I can do it within this library for licensing reasons (non-commercial CC versus MIT). Concerning your use case, I could port the French version of the Porter stemmer so you can match plurals and similar cases. Or else I can work on a French inflector, but can you describe what you want to achieve in the end so I know whether this makes any sense? |
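To illustrate why a stemmer would help with the plural case, here is a minimal sketch. `naiveFrenchKey` is a hypothetical helper written for this example only: it is neither the Porter stemmer nor part of talisman. It lowercases a word, strips diacritics and drops a single trailing "s" or "x", so it will also mangle words such as "prix".

```javascript
// Toy normalizer for illustration only (an assumption, not talisman's API
// and not the Porter stemmer).
function naiveFrenchKey(word) {
  const folded = word
    .toLowerCase()
    .normalize('NFD')                 // split letters from their accents
    .replace(/[\u0300-\u036f]/g, ''); // drop the combining accent marks
  // Drop one trailing plural marker, but leave very short words alone.
  return folded.length > 3 ? folded.replace(/[sx]$/, '') : folded;
}

// Singular and plural end up under the same key.
console.log(naiveFrenchKey('Traité') === naiveFrenchKey('Traités'));
```

A real stemmer handles far more suffixes than this, but the idea is the same: both forms are reduced to one key before they are compared or counted.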
Do you think something like this would work? |
Anyway, I found some French phonetic algorithms this afternoon. Will implement them soon along with the Porter stemmer. Should be ready next week. |
@oncletom, there is now a working stemmer for the French language in the master branch if you want to give it a try. |
I also added three phonetic algorithms for the French language (and will soon add two more). |
Just added |
Hello @oncletom. Any news about this? Can I close the issue? I've added a lot more tools since last time. Don't hesitate to ask :) |
@Yomguithereal hello there and sorry for this immense delay! I have started dedicating time again for writing so I am definitely going to have a better look at talisman improvements. Thank you so much for pinging me :-) I have opened an issue on my side – I guess this one can be closed. Will keep you posted anyway. Thanks again! |
Hello @oncletom. How dost thou fare? Just beware because the library evolves a lot lately and documentation is not really up to date. Don't hesitate to check the changelog or ask me if something goes awry. |
Hijacking the issue? Side note: as a nodejs (4.8.2) user, I could load it "standalone" (amazing!) |
Hello @drzraf.
Do you say that because you know how this algorithm is supposed to work and I did something wrong when implementing the paper, or do you say that because you'd like the algorithm to be more accurate? The latter is something I cannot change without altering the integrity of the algorithm, but we could design another stemming algorithm for the French language altogether by forking the carry one.
Why don't you load it likewise? require('talisman/stemmers/french/carry'); |
@Yomguithereal STEP3 lacks many substitutions. |
Hey!
I have been toying around with the metaphone (mostly because it was mentioned in the first examples related to the frequencies documentation page).
For some reason, I encounter this behaviour:
Is it the expected output? Is there a better phonetic algorithm for the French language?
I mainly use this to find popular keywords within a corpus of text, but maybe I should rather throw the docs into tfidf, or split the corpus into words and build a map of terms and frequencies from that.
Let me know :-)
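For reference, the "map of terms and frequencies" idea could look roughly like the sketch below. `termFrequencies` and the `lower` normalizer are hypothetical names made up for this example, not talisman functions; any stemmer or phonetic encoder could be passed in place of `lower`.

```javascript
// Hypothetical helper: counts how often each normalized term occurs.
function termFrequencies(text, normalize) {
  const counts = new Map();
  // Split on anything that is not a Unicode letter (needs a recent Node.js).
  const tokens = text.normalize('NFC').split(/[^\p{L}]+/u).filter(Boolean);
  for (const token of tokens) {
    const key = normalize(token);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const lower = (word) => word.toLowerCase();
const freqs = termFrequencies('Le traité, les traités et le commerce', lower);

// Rank terms by descending count to surface the most frequent ones.
const ranked = [...freqs.entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked);
```

With plain lowercasing, the singular and plural forms are still counted separately, which is exactly where a stemmer would make a difference.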