doc.noun_chunks Sentence Length Bug #693

shiredude95 · 2016-12-20T08:41:41Z

Operating System: Ubuntu 15.04
Python Version Used: 3.5.1
spaCy Version Used: 1.4.0
Environment Information: 64-bit system, 4 GB RAM

doc.noun_chunks doesn't return every noun phrase in the sentence: when two proper-noun phrases are joined by "and", only the first is returned.

Test1:

    from spacy.en import English
    nlp = English()
    doc = nlp("the TopTown International Airport Board and Goodwill Space Exploration Partnership.")
    for chunk in doc.noun_chunks:
        print(chunk)

Produces the output:

the TopTown International Airport Board

But Test2:

    from spacy.en import English
    nlp = English()
    doc = nlp("the Goodwill Space Exploration Partnership and TopTown International Airport Board.")
    for chunk in doc.noun_chunks:
        print(chunk)

Produces the output:

the Goodwill Space Exploration Partnership

Both phrases are identified properly, but only when they come first in the sentence. Whichever phrase follows the "and" is ignored.


honnibal · 2016-12-23T13:53:37Z

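It looks like the noun chunk detection rules could be improved here. The issue comes from the combination of coordination and proper nouns: the branch that handles conjuncts only accepts words tagged NOUN, so a conjunct tagged PROPN is never yielded. Here is the current rule: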

    def english_noun_chunks(obj):
        '''Detect base noun phrases from a dependency parse.
        Works on both Doc and Span.'''
        labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
                  'attr', 'ROOT', 'root']
        doc = obj.doc # Ensure works on both Doc and Span.
        np_deps = [doc.vocab.strings[label] for label in labels]
        conj = doc.vocab.strings['conj']
        np_label = doc.vocab.strings['NP']
        for i, word in enumerate(obj):
            if word.pos in (NOUN, PROPN, PRON) and word.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label
            elif word.pos == NOUN and word.dep == conj:
                head = word.head
                while head.dep == conj and head.head.i < head.i:
                    head = head.head
                # If the head is an NP, and we're coordinated to it, we're an NP
                if head.dep in np_deps:
                    yield word.left_edge.i, word.i+1, np_label

I think the correction should be to accept PROPN as well as NOUN in the conj branch:

    elif word.pos in (NOUN, PROPN) and word.dep == conj:
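
To make the effect of that one-line change concrete, here is a minimal sketch that re-implements the rule in plain Python, without spaCy. Everything in it is an assumption for illustration: `Tok`, `left_edge`, `noun_chunks`, `chunk_texts` and the `conj_pos` parameter are hypothetical names, and the dependency parse in `TOKENS` is written by hand for a shortened sentence rather than taken from the real parser.

```python
from dataclasses import dataclass

# Dependency labels that mark the head of a base noun phrase (same list as the rule above).
NP_DEPS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "attr", "ROOT", "root"}


@dataclass
class Tok:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the syntactic head (its own index for the root)


def left_edge(tokens, i):
    """Index of the leftmost token in the subtree rooted at token i."""
    def descends_from_i(j):
        while True:
            if j == i:
                return True
            if tokens[j].head == j:  # reached the root without meeting i
                return False
            j = tokens[j].head
    return min(j for j in range(len(tokens)) if descends_from_i(j))


def noun_chunks(tokens, conj_pos):
    """Yield (start, end) spans; conj_pos lists the tags accepted for conjuncts."""
    for i, tok in enumerate(tokens):
        if tok.pos in ("NOUN", "PROPN", "PRON") and tok.dep in NP_DEPS:
            yield left_edge(tokens, i), i + 1
        elif tok.pos in conj_pos and tok.dep == "conj":
            h = tok.head
            # Walk up a chain of conjuncts to the first coordinated word.
            while tokens[h].dep == "conj" and tokens[h].head < h:
                h = tokens[h].head
            # Coordinated to an NP head, so this word heads an NP too.
            if tokens[h].dep in NP_DEPS:
                yield left_edge(tokens, i), i + 1


def chunk_texts(tokens, conj_pos):
    return [" ".join(t.text for t in tokens[start:end])
            for start, end in noun_chunks(tokens, conj_pos)]


# Hand-written (assumed) parse of "the Airport Board and Space Partnership".
TOKENS = [
    Tok("the", "DET", "det", 2),
    Tok("Airport", "PROPN", "compound", 2),
    Tok("Board", "PROPN", "ROOT", 2),
    Tok("and", "CCONJ", "cc", 2),
    Tok("Space", "PROPN", "compound", 5),
    Tok("Partnership", "PROPN", "conj", 2),
]

print(chunk_texts(TOKENS, ("NOUN",)))          # ['the Airport Board']
print(chunk_texts(TOKENS, ("NOUN", "PROPN")))  # ['the Airport Board', 'Space Partnership']
```

With only NOUN accepted in the conj branch, the second conjunct is dropped, which matches the behaviour reported above. Accepting PROPN as well recovers it, provided the parser attaches the second phrase as a conj of the first.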

lock · 2018-05-09T00:38:47Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the performance label Dec 23, 2016

ines added this to the Debug parser transition system milestone Feb 18, 2017

ines added a commit that referenced this issue Mar 18, 2017

Add regression test for #693

ad934a9

honnibal closed this as completed in cc36c30 Apr 7, 2017

lock bot locked as resolved and limited conversation to collaborators May 9, 2018
