
._.labels doesn't work for spans with length of one #91

Open
LawlAoux opened this issue Apr 26, 2022 · 5 comments

@LawlAoux

For some reason, when the span has a length of one, ._.labels returns an empty tuple. I would expect it to return the part of speech of the individual word (which can be obtained by taking the span's single token and reading its tag_).

Reproduction:

import spacy, benepar
nlp = spacy.load('en_core_web_md')
nlp.add_pipe("benepar", config={"model": "benepar_en3"})
doc = nlp("Tuesday morning")
sent = tuple(doc.sents)[0]
first_child = tuple(sent._.children)[0]
pos = first_child._.labels

With this code, pos will be an empty tuple, but I would expect it to equal first_child[0].tag_, which is "NNP".

@burak0006

I encountered the same problem. I couldn't even iterate through ._.parse_string, since it is a nested parenthesized structure.

@anmolagarwal999

@burak0006 @LawlAoux
You can work around this by using the simpler parse string at the leaf (span here is the span object in question):

all_tokens = span._.parse_string.split("(")
label = all_tokens[1].split(" ")[0]


Here, the parse strings at the leaves are:

  • (NN Stock)
  • (NNS prices)
  • (VBD soared)
  • ........
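As a sanity check, the split("(") trick can be exercised on a literal leaf string; the string below is a stand-in for what span._.parse_string returns at a leaf:

```python
# A leaf span's parse string looks like "(NN Stock)".
# Splitting on "(" yields ["", "NN Stock)"], so the label is the
# first space-separated word of the second element.
parse_string = "(NN Stock)"  # stand-in for span._.parse_string
all_tokens = parse_string.split("(")
label = all_tokens[1].split(" ")[0]
print(label)  # → NN
```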

@badvision

I also had the same problem. Is there some kind of conversion to CNF along the way that causes the API to go bonkers? The only working solution I could come up with is the one @anmolagarwal999 suggested, but it is unfortunate to have to re-parse a string built from a sentence that has already been parsed. :/ A better API is warranted, in my opinion.

If you pass the ._.parse_string value into this function, it will give you a proper tree structure.

# Adapted from https://stackoverflow.com/questions/54959875/recursive-parentheses-parser-for-expressions-of-strings
import re

def parse_tree(sentence):
    stack = []  # or a `collections.deque()` object, which is a little faster
    top = items = []
    for token in filter(None, re.compile(r'(?:([()])|\s+)').split(sentence)):
        if token == '(':
            stack.append(items)
            items.append([])
            items = items[-1]
        elif token == ')':
            if not stack:
                raise ValueError("Unbalanced parentheses")
            items = stack.pop()
        else:
            items.append(token)
    if stack:
        raise ValueError("Unbalanced parentheses")
    return top
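For illustration, here is what the function produces on a made-up bracketed sentence (reproduced self-contained so it can be run directly):

```python
import re

def parse_tree(sentence):
    # Stack-based parser: "(" opens a new nested list, ")" closes it.
    stack = []
    top = items = []
    for token in filter(None, re.compile(r'(?:([()])|\s+)').split(sentence)):
        if token == '(':
            stack.append(items)
            items.append([])
            items = items[-1]
        elif token == ')':
            if not stack:
                raise ValueError("Unbalanced parentheses")
            items = stack.pop()
        else:
            items.append(token)
    if stack:
        raise ValueError("Unbalanced parentheses")
    return top

tree = parse_tree("(S (NP (NN Stock) (NNS prices)) (VP (VBD soared)))")
print(tree)
# → [['S', ['NP', ['NN', 'Stock'], ['NNS', 'prices']], ['VP', ['VBD', 'soared']]]]
```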

The result is a nested list, so it isn't convenient to pull values out of directly. Here is an XPath-like function you can use to query the structure.

def find_pos(tree, pos):
    result = []
    if not isinstance(tree[0], str):
        result = [find_pos(subtree, pos) for subtree in tree]
    else:
        pos_parts = pos.split("/")
        if re.match(pos_parts[0], tree[0], flags=re.IGNORECASE):
            if len(pos_parts) == 1:
                return tree[1]
            else:
                result = [find_pos(subtree, "/".join(pos_parts[1:])) for subtree in tree[1:]]
    if len(result) == 0:
        return None
    result = [f for f in result if f is not None]
    if len(result) == 0:
        return None
    elif len(result) == 1:
        return result[0]
    else:
        return result

You provide the (re-)parsed tree and the desired part of speech (as a string, case-insensitive), but you have to specify the path from the root. For example, if your sentence is an S > VP kind of sentence, getting the verb(s) looks like find_pos(command, 'VP/VB'), and if there is a noun associated with that, find_pos(command, 'VP/NP/NN.*') should do. If you want prepositional nouns ("go to the store"), you can use find_pos(command, 'VP/PP/NP/NN.*'). Slashes separate the tree levels you want to step through, and the expressions between the slashes can be full regular expressions, which allows some cleverness if you're careful with it.

Since I use regular expressions you have to import re to use this code. Enjoy!
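For what it's worth, here is how the query function behaves on a small hand-built tree in the shape parse_tree produces; the sentence is made up, and in my testing I had to include the root label (here S) as the first path component:

```python
import re

def find_pos(tree, pos):
    # Walk the nested-list tree, matching each "/"-separated path
    # component as a case-insensitive regex against node labels.
    result = []
    if not isinstance(tree[0], str):
        result = [find_pos(subtree, pos) for subtree in tree]
    else:
        pos_parts = pos.split("/")
        if re.match(pos_parts[0], tree[0], flags=re.IGNORECASE):
            if len(pos_parts) == 1:
                return tree[1]
            else:
                result = [find_pos(subtree, "/".join(pos_parts[1:])) for subtree in tree[1:]]
    if len(result) == 0:
        return None
    result = [f for f in result if f is not None]
    if len(result) == 0:
        return None
    elif len(result) == 1:
        return result[0]
    else:
        return result

# Hand-built tree, as parse_tree would return for
# "(S (NP (NN Stock) (NNS prices)) (VP (VBD soared)))"
tree = [["S", ["NP", ["NN", "Stock"], ["NNS", "prices"]], ["VP", ["VBD", "soared"]]]]

print(find_pos(tree, "S/VP/VBD"))   # → soared
print(find_pos(tree, "S/NP/NN.*"))  # → ['Stock', 'prices']
```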

@Naman-ntc

Naman-ntc commented Jan 22, 2023

Given any span, you can use the following function to get its labels:

from typing import Tuple
from spacy.tokens import Span

def get_span_labels(span: Span) -> Tuple[str, ...]:
    labels = span._.labels
    if len(labels) == 0:
        # Fall back to the single token's tag for a length-one span.
        doc = span.doc
        start, end = span.start, span.end
        assert start + 1 == end
        labels = (doc[start].tag_,)
        # constituent_data = doc._._constituent_data
        # labels_index = (
        #     (constituent_data.starts == start) * (constituent_data.ends == end)
        # ).argmax()
        # labels = constituent_data.label_vocab[labels_index]
    return labels

@th-yoo

th-yoo commented Jul 4, 2024

Below is a portion of the parse_string() function.

        label = label_vocab[label_idx]
        if (i + 1) >= j:
            token = doc[i]
            s = (
                "("
                + u"{} {}".format(token.tag_, token.text)
                .replace("(", "-LRB-")
                .replace(")", "-RRB-")
                .replace("{", "-LCB-")
                .replace("}", "-RCB-")
                .replace("[", "-LSB-")
                .replace("]", "-RSB-")
                + ")"
            )

._.labels is an empty tuple here, but ._.parse_string still shows token.tag_ as the tag.

  • Workaround 1
    Instead of ._.labels, use the function below.
def get_labels(span):
    return span._.labels or (span[0].tag_,)
  • Workaround 2
    Override the installed extensions.
# Span.remove_extension returns the removed extension as a
# (default, method, getter, setter) tuple; index 2 is the getter.
org_span_labels = spacy.tokens.Span.remove_extension('labels')

def get_labels(span):
    return org_span_labels[2](span) or (span[0].tag_,)

spacy.tokens.Span.set_extension('labels', getter=get_labels)

spacy.tokens.Token.remove_extension('labels')
spacy.tokens.Token.set_extension(
    'labels',
    getter=lambda token: get_labels(token.doc[token.i: token.i+1])
)
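Both workarounds rely on the fact that an empty tuple is falsy in Python, so `or` falls through to the single token's tag. A minimal stand-alone illustration, with stand-in values for span._.labels and span[0].tag_:

```python
labels = ()          # what ._.labels currently returns for a one-token span
fallback = ("NNP",)  # (span[0].tag_,)

# Empty tuple is falsy, so `or` picks the fallback.
print(labels or fallback)  # → ('NNP',)

# A span with real labels is unaffected by the fallback.
print(("NP",) or fallback)  # → ('NP',)
```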
