
Add contractions for won't #952

Closed
wants to merge 1 commit into from

Conversation

@kinow (Contributor) commented Apr 3, 2017

Hi, I tried implementing the fix for this issue, but alas it did not work in my local environment.

The issue is that Won't and won't return the lemmas wo and nt, whereas I would expect to get will not.

I saw a similar issue for Let's -> Let us, but that also didn't work for me. Anyway, feel free to close this and implement it in a better way if necessary. Is there any documentation on how to test tokenizer exceptions before submitting pull requests?
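
A self-contained check along those lines might look something like this (just a sketch against the spaCy 1.x spacy.en API, not part of this PR):

from spacy.en import English

def test_wont_contraction():
    # "Won't" should split into "Wo" + "n't" with lemmas "will" and "not"
    nlp = English()
    tokens = nlp("Won't you need it?")
    assert [t.text for t in tokens[:2]] == ["Wo", "n't"]
    assert [t.lemma_ for t in tokens[:2]] == ["will", "not"]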

Cheers
Bruno

Types of changes

  • [x] Bug fix (non-breaking change fixing an issue)
  • [ ] New feature (non-breaking change adding functionality to spaCy)
  • [ ] Breaking change (fix or feature causing change to spaCy's existing functionality)
  • [ ] Documentation (addition to documentation of spaCy)

Checklist:

  • [ ] My change requires a change to spaCy's documentation.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have added tests to cover my changes.
  • [x] All new and existing tests passed.

@ines (Member) commented Apr 3, 2017

Thanks for giving this a go!

I just had a look at this problem, and it turned out to be caused by a missing POS tag in the tokenizer exceptions. The exceptions for the verbs are already handled here, and they did have the correct lemma. But because they were missing a TAG, the lemma was later overwritten by the lemmatizer.

I added the missing tags and it should work properly now!
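
For illustration, such an exception entry with the TAG included has roughly the following shape (a sketch only; the exact layout and tag values in the English language data may differ):

# Sketch of tokenizer exceptions for "won't" that set TAG as well as LEMMA,
# so the lemmatizer no longer overwrites the special-cased lemma.
# Attribute names are the ones exposed by spacy.symbols.
from spacy.symbols import ORTH, LEMMA, TAG

TOKENIZER_EXCEPTIONS = {
    "won't": [
        {ORTH: "wo", LEMMA: "will", TAG: "MD"},
        {ORTH: "n't", LEMMA: "not", TAG: "RB"},
    ],
    "Won't": [
        {ORTH: "Wo", LEMMA: "will", TAG: "MD"},
        {ORTH: "n't", LEMMA: "not", TAG: "RB"},
    ],
}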

@ines closed this Apr 3, 2017

@kinow (Contributor, Author) commented Apr 3, 2017

Thanks @ines.

I couldn't understand why it was not working and started looking at the Tokenizer class. Thanks for pointing out where I should have been looking :-)

Will give it a try today.
Bruno

@kinow (Contributor, Author) commented Apr 4, 2017

Arrived home, checked out the latest version

git log -n 1
commit 808cd6cf7f184e20d9b8e42364f7e10f045028dc
Author: ines <ines@ines.io>
Date:   Mon Apr 3 18:12:52 2017 +0200

    Add missing tags to verbs (resolves #948)

Then I ran python setup.py build && python setup.py install and executed a test script.

from spacy.en import English

nlp = English()

text = "Don't you use NLP? Won't you need it? Let's use it !!!"

tokens = nlp(text)
for token in tokens:
    # print text, lemma and coarse-grained POS for each token
    print("%s - %s - %s" % (token.text, token.lemma_, token.pos_))

Got:

Do - do - 
n't - not - ADV
you -  - 
use -  - 
NLP -  - 
? -  - 
Wo - will - VERB
n't - not - ADV
you -  - 
need -  - 
it -  - 
? -  - 
Let - let - 
's - -PRON- - 
use -  - 
it -  - 
! -  - 
! -  - 
! -  -

Then I downloaded the models with python3 -m spacy.en.download --force all and re-executed the code:

Do - do - VERB
n't - not - ADV
you - -PRON- - PRON <---- ??? why is the lemma here not you?
use - use - VERB
NLP - nlp - PROPN
? - ? - PUNCT
Wo - will - VERB
n't - not - ADV
you - -PRON- - PRON <--- ditto
need - need - VERB
it - -PRON- - PRON <--- ditto
? - ? - PUNCT
Let - let - VERB
's - 's - PRON <----- why not us here?
use - use - VERB
it - -PRON- - PRON
! - ! - PUNCT
! - ! - PUNCT
! - ! - PUNCT

Am I missing anything? Should I have run something else, or is there something wrong with my code?

@f11r (Contributor) commented Apr 4, 2017

Regarding lemmatisation to -PRON- see: #906 and #898 (comment).

@kinow (Contributor, Author) commented Apr 7, 2017

Thanks @f11r

I am working on an application that needs token values to be matched against a dictionary. When I have contractions like "Let's", I need the values let and us to match against a list of words.

For "Let's", the lemmas will have what I am looking for (i.e. [let, us]). But for "it", if I use the lemma, then I will get -PRON-, if I understand it correctly.

In this case I would either have to think of another strategy, or maybe always use the lemma and, if I find a word surrounded by dashes, fall back to using the text.
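
A minimal sketch of that fallback, assuming a spaCy Doc and the -PRON- placeholder these models use for pronoun lemmas:

def lemma_or_text(token):
    # Prefer the lemma, but fall back to the lowercased text when the
    # lemma is the "-PRON-" placeholder.
    if token.lemma_ == "-PRON-":
        return token.lower_
    return token.lemma_

# e.g. [lemma_or_text(t) for t in nlp("Won't you need it?")]
# -> ["will", "not", "you", "need", "it", "?"]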

Labels
lang / en English language data and models

3 participants