🐛[BugFix]fix get enitity #38

Merged 2 commits on Sep 30, 2020

Conversation

AlongWY (Contributor) commented Jul 8, 2020

Fix how seqeval extracts entities from labels that have no type, such as 'B', and from labels whose type contains a hyphen, such as 'B-ARGM-LOC'.

Fix get entity
AlongWY changed the title from "Fix get entity" to "🐛[BugFix]fix get enitity" on Jul 8, 2020

AlongWY (Contributor, Author) commented Jul 8, 2020

examples:

input: ['B', 'I', 'I', 'O', 'B', 'I']
before: [('B', 0, 0), ('I', 1, 2), ('B', 4, 4), ('I', 5, 5)]
after: [('_', 0, 2), ('_', 4, 5)]

input: ['B-ARGM-LOC', 'I-ARGM-LOC', 'I-ARGM-LOC', 'O', 'B-ARGM-TIME', 'I-ARGM-TIME']
before: [('LOC', 0, 2), ('TIME', 4, 5)]
after: [('ARGM-LOC', 0, 2), ('ARGM-TIME', 4, 5)]
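
The difference in both examples comes down to how the type is split off each label. A minimal sketch of just that step for the default prefix scheme follows; the helper names `old_type` and `new_type` are hypothetical and exist only for illustration, while the two expressions are the ones used in the test code in this comment.

```python
def old_type(label):
    # Hypothetical helper: pre-fix expression, keeps only the text
    # after the LAST hyphen, so 'B-ARGM-LOC' loses its 'ARGM-' part.
    return label.split('-')[-1]


def new_type(label):
    # Hypothetical helper: post-fix expression. Drop the one-character
    # tag, split once on the FIRST hyphen, and fall back to '_' when
    # the label carries no type at all (e.g. a bare 'B').
    return label[1:].split('-', maxsplit=1)[-1] or '_'


for label in ['B', 'I', 'B-ARGM-LOC', 'I-ARGM-TIME']:
    print(label, old_type(label), new_type(label))
```

With the old expression a bare 'B' and a bare 'I' get different types ('B' and 'I'), which is why every token became its own chunk; with the new one both map to '_' and are grouped together.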

This is my test code:

from metrics.seqeval.seqeval import end_of_chunk, start_of_chunk


def before_get_entities(seq, suffix=False):
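    """Gets entities from a label sequence (behavior before this fix).

    Args:
        seq (list): sequence of labels, or a list of such sequences.
        suffix (bool): if True, the tag is the last character of each
            label (e.g. 'LOC-B') instead of the first (e.g. 'B-LOC').
    Returns:
        list: list of (chunk_type, chunk_start, chunk_end).
    """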
    if any(isinstance(s, list) for s in seq):
        seq = [item for sublist in seq for item in sublist + ['O']]

    prev_tag = 'O'
    prev_type = ''
    begin_offset = 0
    chunks = []
    for i, chunk in enumerate(seq + ['O']):
        if suffix:
            tag = chunk[-1]
            type_ = chunk.split('-')[0]
        else:
            tag = chunk[0]
            type_ = chunk.split('-')[-1]

        if end_of_chunk(prev_tag, tag, prev_type, type_):
            chunks.append((prev_type, begin_offset, i - 1))
        if start_of_chunk(prev_tag, tag, prev_type, type_):
            begin_offset = i
        prev_tag = tag
        prev_type = type_

    return chunks


def after_get_entities(seq, suffix=False):
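    """Gets entities from a label sequence (behavior after this fix).

    Labels without a type get the placeholder type '_', and hyphens
    inside the type (e.g. 'ARGM-LOC') are preserved.

    Args:
        seq (list): sequence of labels, or a list of such sequences.
        suffix (bool): if True, the tag is the last character of each
            label (e.g. 'LOC-B') instead of the first (e.g. 'B-LOC').
    Returns:
        list: list of (chunk_type, chunk_start, chunk_end).
    """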
    if any(isinstance(s, list) for s in seq):
        seq = [item for sublist in seq for item in sublist + ['O']]

    prev_tag = 'O'
    prev_type = ''
    begin_offset = 0
    chunks = []
    for i, chunk in enumerate(seq + ['O']):
        if suffix:
            tag = chunk[-1]
            type_ = chunk[:-1].rsplit('-', maxsplit=1)[0] or '_'
        else:
            tag = chunk[0]
            type_ = chunk[1:].split('-', maxsplit=1)[-1] or '_'

        if end_of_chunk(prev_tag, tag, prev_type, type_):
            chunks.append((prev_type, begin_offset, i - 1))
        if start_of_chunk(prev_tag, tag, prev_type, type_):
            begin_offset = i
        prev_tag = tag
        prev_type = type_

    return chunks


def main():
    examples_1 = ['B', 'I', 'I', 'O', 'B', 'I']
    print(before_get_entities(examples_1))
    print(after_get_entities(examples_1))
    examples_2 = ['B-ARGM-LOC', 'I-ARGM-LOC', 'I-ARGM-LOC', 'O', 'B-ARGM-TIME', 'I-ARGM-TIME']
    print(before_get_entities(examples_2))
    print(after_get_entities(examples_2))


if __name__ == '__main__':
    main()
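
The test code above only exercises the default prefix scheme (`suffix=False`). As a rough sketch of what the same change does when `suffix=True`, the two type expressions from the suffix branches can be compared directly. The helper names `old_suffix_type` and `new_suffix_type` and the sample labels are assumptions made for illustration; they are not part of seqeval.

```python
def old_suffix_type(label):
    # Hypothetical helper: pre-fix suffix expression, keeps only the
    # text before the FIRST hyphen.
    return label.split('-')[0]


def new_suffix_type(label):
    # Hypothetical helper: post-fix suffix expression. Drop the trailing
    # one-character tag, split once from the right, and fall back to
    # '_' when nothing is left.
    return label[:-1].rsplit('-', maxsplit=1)[0] or '_'


# Illustrative suffix-style labels (assumed, not taken from the PR).
for label in ['B', 'PER-B', 'ARGM-LOC-B', 'ARGM-LOC-I']:
    print(label, old_suffix_type(label), new_suffix_type(label))
```

Under these assumptions the old expression would truncate 'ARGM-LOC-B' to 'ARGM', while the new one keeps 'ARGM-LOC', mirroring the prefix-scheme fix.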
