Hi,
I encountered an error:

File "/add_score.py", line 53, in add_score
res = function(["? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."], ["",])
File "/add_score_summac.py", line 28, in <lambda>
"my_summacZS_batched": lambda summs, docs: modelZS.score(docs, summs)['scores'],
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 351, in score
score = self.score_one(source, gen)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 322, in score_one
image = self.imager.build_image(original, generated)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 113, in build_image
generated_chunks = self.split_text(generated, granularity=gran_sum)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 94, in split_text
return self.split_sentences(text)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 71, in split_sentences
sentences = nltk.tokenize.sent_tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
return tokenizer.tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1276, in tokenize
return list(self.sentences_from_text(text, realign_boundaries))
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in sentences_from_text
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in <listcomp>
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1322, in span_tokenize
for sentence in slices:
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1421, in _realign_boundaries
for sentence1, sentence2 in _pair_iter(slices):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 318, in _pair_iter
prev = next(iterator)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1395, in _slices_from_text
for match, context in self._match_potential_end_contexts(text):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1382, in _match_potential_end_contexts
before_words[match] = split[-1]
IndexError: list index out of range
I think it is caused by the leading "? ", which likely results in an empty sentence inside the metric.
Is this expected and documented somewhere, or is it a bug?
Kind regards
Edit:
I worked around (not fixed) this issue for now with the following code:

import re

match = re.match(r"(\s*[.?!]+\s)", summaries[i])
if match:
    summaries[i] = summaries[i][len(match.group(1)):]

because empty leading sentences starting with symbols other than "?" also triggered this error.
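For reference, the workaround above can be wrapped into a small self-contained helper. The function name is mine and not part of SummaC; it simply strips a leading run of sentence terminators (such as "? ", "! ", or "... ") that the Punkt tokenizer would otherwise split off as an empty sentence:

```python
import re

def strip_leading_terminators(text: str) -> str:
    """Hypothetical helper (not part of SummaC): remove a leading
    punctuation-only fragment like "? " before sentence splitting."""
    match = re.match(r"(\s*[.?!]+\s)", text)
    if match:
        return text[len(match.group(1)):]
    return text

summaries = ["? I haven't had a birthday since 2007."]
summaries = [strip_leading_terminators(s) for s in summaries]
# summaries[0] is now "I haven't had a birthday since 2007."
```

Applying this to every summary (and document) before calling modelZS.score avoids feeding the tokenizer a text that begins with a bare terminator.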