Add contractions for won't #952
Conversation
Thanks for giving this a go! I just had a look at this problem and it turned out that it was caused by a missing POS tag in the tokenizer exceptions. The exceptions for the verbs are already handled here, and they did have the correct lemma. But because they were missing a POS tag, the lemmatisation failed. I added the missing tags and it should work properly now!
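For readers following the thread, the mechanism being discussed can be illustrated with a plain-Python sketch. This is only illustrative, not spaCy's actual data structures (the real tokenizer exceptions use symbol constants like `ORTH`, `LEMMA`, and `TAG`): each contraction maps to a list of sub-token entries, and an entry missing its POS tag can cause the lemma lookup to fall back to the surface form.

```python
# Illustrative tokenizer-exception table for contractions.
# Plain string keys keep the example self-contained; spaCy's
# real exceptions use symbol constants (ORTH, LEMMA, TAG).
EXCEPTIONS = {
    "won't": [
        {"orth": "wo", "lemma": "will", "tag": "MD"},
        {"orth": "n't", "lemma": "not", "tag": "RB"},
    ],
    "Won't": [
        {"orth": "Wo", "lemma": "will", "tag": "MD"},
        {"orth": "n't", "lemma": "not", "tag": "RB"},
    ],
}

def lemmas(text):
    """Return the lemmas for a known contraction, or the text itself."""
    parts = EXCEPTIONS.get(text)
    if parts is None:
        return [text]
    # A sub-token with no tag falls back to its surface form, which is
    # how a missing POS tag can yield "wo"/"nt" instead of "will"/"not".
    return [p["lemma"] if p.get("tag") else p["orth"] for p in parts]

print(lemmas("won't"))  # ['will', 'not']
```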
Thanks @ines. I couldn't understand why it was not working, and started looking at the Tokenizer class. Thanks for pointing out where I was supposed to be looking :-) Will give it a try today.
Arrived home, checked out the latest version.
Then:
Got:
Then downloaded the models.
Am I missing anything? Maybe I should have executed something else, or is there something wrong with my code?
Regarding lemmatisation to
Thanks @f11r. I am working on an application that needs the values to be matched against a dictionary. When I have contractions, like "Let's", I need the values let and us to match against a list of words. For "Let's", the lemmas will have what I am looking for (i.e. [let, us]). But for "It", if I use the lemma, then I will get -PRON-, if I understand it correctly. In that case I would either have to think about another strategy, or maybe always use the lemma and, if I find any word surrounded by dashes, fall back to using the text.
Hi, I tried implementing the fix for this issue, but alas it did not work in my local environment.
The issue is that Won't and won't return the lemmas wo and nt, whereas I would expect to get will not.
I saw a similar issue for Let's -> Let us, but that fix also didn't work for me. I wonder if there is any documentation on how to test the tokenizer exceptions before submitting pull requests? Anyway, feel free to close this and implement it in a better way if necessary.
Cheers
Bruno