Metaphone Traité vs Traités #28

Closed
thom4parisot opened this issue Apr 29, 2016 · 12 comments

@thom4parisot

Hey!

I have been toying around with the metaphone (mostly because it was mentioned in the first examples related to the frequencies documentation page).

For some reason, I encounter this behaviour:

const metaphone = require('talisman/phonetics/metaphone').default;

> metaphone('Traité')
'TRT'
> metaphone('Traités')
'TRTS'

Is this the expected output? Is there a better phonetic algorithm for the French language?

I use that mainly to find popular keywords within a corpus of text but maybe I should rather throw the docs into tfidf, split the corpus into words and make a map of term and frequencies from that.
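For what it's worth, the "split the corpus into words and make a map of terms and frequencies" idea could look roughly like the sketch below. It is plain JavaScript and does not use talisman at all; the `termFrequencies` name and the tokenization rule (split on anything that is not a letter) are assumptions made for illustration only.

```javascript
// Hypothetical helper, not part of talisman: count how often each word occurs.
function termFrequencies(text) {
  const counts = new Map();
  // Normalize to NFC so that "é" is a single letter, lowercase, then split on
  // any run of non-letter characters (Unicode-aware, so accents are kept).
  const words = text
    .normalize('NFC')
    .toLowerCase()
    .split(/[^\p{L}]+/u)
    .filter(Boolean);

  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const freqs = termFrequencies('Le traité, les traités et le Traité.');
console.log(freqs.get('traité'));  // 2
console.log(freqs.get('traités')); // 1
```

Note that "traité" and "traités" still end up as two separate keys here, which is exactly the problem a stemmer or a phonetic key is meant to address.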

Let me know :-)

@Yomguithereal
Owner

Yomguithereal commented Apr 30, 2016

Hello @oncletom. This is indeed the expected output of the metaphone algorithm. While this algorithm is said to work with a lot of European languages, it behaves quite badly with French silent letters, and we have an awful lot of them. Unfortunately, the double metaphone won't help you either.

There aren't many good phonetic algorithms for the French language. I found one here. I want to port it, but I am not sure I can do it within this library for licensing reasons (non-commercial CC versus MIT).

Concerning your use case, I could port the French version of the Porter stemmer so you can match plurals and similar cases. Alternatively, I can work on a French inflector, but can you describe what you want to achieve in the end so I know whether this makes any sense?
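To make the plural-matching idea concrete, here is a deliberately naive sketch. It is not the French Porter stemmer and not anything shipped by talisman: `naiveFrenchKey` is a hypothetical helper that only folds case, strips accents, and drops a final "s" or "x" on words longer than three letters.

```javascript
// Hypothetical, deliberately naive normalizer: NOT the French Porter stemmer
// and not part of talisman. It folds case, accents and a trailing "s"/"x".
function naiveFrenchKey(word) {
  let key = word
    .toLowerCase()
    .normalize('NFD')                 // separate letters from their accents
    .replace(/[\u0300-\u036f]/g, ''); // drop the combining accent marks

  // Only strip the ending on longer words, to leave short words alone.
  if (key.length > 3) {
    key = key.replace(/[sx]$/, '');
  }
  return key;
}

console.log(naiveFrenchKey('Traité'));  // traite
console.log(naiveFrenchKey('Traités')); // traite
```

This makes both forms collide on the same key, but it also over-strips words whose final "s" is not a plural marker (for instance "bras" becomes "bra"), which is why a real stemmer is the better tool.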

@Yomguithereal
Owner

Do you think something like this would work?

@Yomguithereal
Owner

Anyway, I found some French phonetic algorithms this afternoon. Will implement them soon along with the Porter stemmer. Should be ready next week.

@Yomguithereal
Owner

@oncletom, there is now a working stemmer for the French language in the master branch if you want to give it a try.

@Yomguithereal
Owner

I also added three phonetic algorithms for the French language (and will soon add two more).

@Yomguithereal
Owner

Just added phonetics/french/sonnex for the French language (this one is quite good and, with a few more tweaks, it will handle most cases pretty well).

@Yomguithereal
Owner

Hello @oncletom. Any news about this? Can I close the issue? I've added a lot more tools since last time. Don't hesitate to ask :)

@thom4parisot
Author

@Yomguithereal hello there and sorry for this immense delay! I have started dedicating time to writing again, so I am definitely going to have a better look at talisman's improvements. Thank you so much for pinging me :-)

I have opened an issue on my side – I guess this one can be closed.

Will keep you posted anyway. Thanks again!

@Yomguithereal
Owner

Hello @oncletom. How dost thou fare? Just beware: the library has been evolving a lot lately and the documentation is not really up to date. Don't hesitate to check the changelog or ask me if something goes awry.

@drzraf

drzraf commented Jun 19, 2017

Hijacking the issue?
Is there some place where false radicals should be reported?
I encountered this one during my first uses:
carry("tristesse") -> "tris" although it should be triste (or in the worst case trist)

Side note: as a nodejs (4.8.2) user, I could load it "standalone" (amazing!)
nodejs --use_strict --harmony-destructuring -e 'var c = require("./carry.js"); console.log(c.carry("tendresse"));'
... at the price of changing export default function carry ... to module.exports = { carry: function carry ... } (the --harmony-modules flag didn't seem to do the trick)

@Yomguithereal
Owner

Hello @drzraf.

carry("tristesse") -> "tris" although it should be triste (or in the worst case trist)

Do you say that because you know how this algorithm is supposed to work and I did something wrong when implementing the paper, or because you'd like the algorithm to be more accurate? The latter is something I cannot change without altering the integrity of the algorithm, but we could design another stemming algorithm for the French language altogether by forking the carry one.

Side note: as a nodejs (4.8.2) user, I could load it "standalone" (amazing!)

Why don't you load it like this?

require('talisman/stemmers/french/carry');
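Since this thread mixes both styles (the first comment reads `.default` off the required module, while the standalone hack above switches to `module.exports`), a small interop helper can hide the difference. This is only a sketch: `interopDefault` is a hypothetical name, and whether a given talisman version needs `.default` is an assumption to check against the installed package.

```javascript
// Hypothetical helper: return the `default` export of a transpiled ES module,
// or the module itself when it is a plain CommonJS export.
function interopDefault(mod) {
  const hasDefault =
    mod !== null &&
    (typeof mod === 'object' || typeof mod === 'function') &&
    typeof mod.default !== 'undefined';
  return hasDefault ? mod.default : mod;
}

// Stand-ins for the two export styles (no real package is loaded here).
const transpiledStyle = { __esModule: true, default: (word) => 'es:' + word };
const commonJsStyle = (word) => 'cjs:' + word;

console.log(interopDefault(transpiledStyle)('tristesse')); // es:tristesse
console.log(interopDefault(commonJsStyle)('tristesse'));   // cjs:tristesse

// With talisman installed, usage would presumably be:
// const carry = interopDefault(require('talisman/stemmers/french/carry'));
```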

@drzraf

drzraf commented Oct 15, 2017

@Yomguithereal STEP3 lacks many substitutions.
In the PDF there are almost 250 substitutions in this step vs 9 in the JavaScript.
[edit] #137

3 participants