IndexError: list index out of range #12

Open
UntotaufUrlaub opened this issue Jul 29, 2023 · 0 comments


UntotaufUrlaub commented Jul 29, 2023

Hi,

I encountered an error:

File "/add_score.py", line 53, in add_score
    res = function(["? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."], ["",])
  File "/add_score_summac.py", line 28, in <lambda>
    "my_summacZS_batched": lambda summs, docs: modelZS.score(docs, summs)['scores'],
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 351, in score
    score = self.score_one(source, gen)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 322, in score_one
    image = self.imager.build_image(original, generated)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 113, in build_image
    generated_chunks = self.split_text(generated, granularity=gran_sum)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 94, in split_text
    return self.split_sentences(text)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 71, in split_sentences
    sentences = nltk.tokenize.sent_tokenize(text)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
    return tokenizer.tokenize(text)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1276, in tokenize
    return list(self.sentences_from_text(text, realign_boundaries))
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in sentences_from_text
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in <listcomp>
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1322, in span_tokenize
    for sentence in slices:
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1421, in _realign_boundaries
    for sentence1, sentence2 in _pair_iter(slices):
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 318, in _pair_iter
    prev = next(iterator)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1395, in _slices_from_text
    for match, context in self._match_potential_end_contexts(text):
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1382, in _match_potential_end_contexts
    before_words[match] = split[-1]
IndexError: list index out of range

I think it is caused by the leading "? ", which might produce an empty leading sentence inside the metric.
Is this expected and documented somewhere, or is it a bug?
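
For reference, here is a minimal sketch of how I think the failure can be reproduced directly with nltk (this is an assumption on my side; whether it actually raises the IndexError likely depends on the installed nltk/punkt version):

import nltk

nltk.download("punkt")

# The summary starts with a bare "? ", so the text in front of the first
# potential sentence boundary is empty; in the traceback above this is
# where split[-1] raises the IndexError.
text = "? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."
print(nltk.tokenize.sent_tokenize(text))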

kind regards

Edit:
I circumvented (not fixed) this issue for now with the following code:

import re

# Strip an empty leading "sentence" such as "? " or "! " before scoring.
match = re.match(r"(\s*[.?!]+\s)", summaries[i])
if match:
    summaries[i] = summaries[i][len(match.group(1)):]

because empty leading sentences with symbols other than "?" also caused this issue.
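
Applied to the summary from the traceback, the regex just drops the leading "? " (a standalone sketch of the same workaround, using a hypothetical summary variable):

import re

summary = "? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."
match = re.match(r"(\s*[.?!]+\s)", summary)
if match:
    summary = summary[len(match.group(1)):]
print(summary)  # the same text without the leading "? "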
