This library offers a set of supplementary tools that complement the traditional ROUGE score for assessing generated summaries. The package includes four distinct metrics for evaluating models in terms of Length, Novelty, Focus, and GPT-4 score.
Run the following command to install the package.
pip install -q git+https://github.com/AlaFalaki/SumEvaluator.git
You can see a usage example in the demo.ipynb notebook.
This metric uses the GPT-4 model as a human-like evaluator and assigns a score to the generated summary with respect to the article (instead of the target summary). The approach returns four scores: Relevance, Consistency, Fluency, and Coherence. It evaluates the generated summary on different aspects, such as whether it includes the important information from the article and whether it is grammatically correct. Refer to the prompt defined in the gpt.py file for a full definition of each metric.
import SumEvaluator
SumEvaluator.gpt.calculate([ARTICLE],
                           [SUMMARY],
                           api_key="<OpenAI_API_KEY>")
# {'relevance': 9.0, 'consistency': 9.0, 'fluency': 10.0, 'coherence': 9.0}
You can either pass the API key directly to the .calculate() method or set the OPENAI_API_KEY variable in your Python environment.
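For example, a minimal sketch of the environment-variable option (assuming the package picks up OPENAI_API_KEY from the environment, as described above):

import os
import SumEvaluator

# Make the key available to the library instead of passing api_key explicitly.
os.environ["OPENAI_API_KEY"] = "<OpenAI_API_KEY>"

SumEvaluator.gpt.calculate([ARTICLE], [SUMMARY])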
This metric calculates the fraction of n-grams in the summary that are absent from the article. It is useful for assessing the extent of the model's replication behaviour.
SumEvaluator.novelty.calculate(articles=[ARTICLE],
                               summaries=[SUMMARY],
                               ngrams=[1, 2])
# {'unigram': 0.23809523809523808, 'bigram': 0.5833333333333334}
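The idea behind the metric can be sketched roughly as follows (an illustration using nltk, not the package's actual implementation):

from nltk import ngrams, word_tokenize

def novel_ngram_ratio(article: str, summary: str, n: int) -> float:
    # Fraction of the summary's n-grams that never occur in the article.
    # (word_tokenize requires the nltk "punkt" tokenizer data.)
    article_ngrams = set(ngrams(word_tokenize(article.lower()), n))
    summary_ngrams = list(ngrams(word_tokenize(summary.lower()), n))
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for g in summary_ngrams if g not in article_ngrams)
    return novel / len(summary_ngrams)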
This visualization method employs an embedding model to compute the cosine similarity score between the generated summary and each sentence of the article. The illustrate() method then visualizes where the model concentrated its attention during summary generation. You can see the score assigned to each sentence by hovering your cursor over the sentence number for a few seconds.
SumEvaluator.focus.prepare([ARTICLE], [SUMMARY], "test_proj")
SumEvaluator.focus.illustrate("test_proj")
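The underlying similarity computation can be sketched as follows (an illustration using sentence_transformers with a hypothetical model name; not the package's internal code):

from nltk import sent_tokenize
from sentence_transformers import SentenceTransformer, util

# Hypothetical embedding model, chosen only for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = sent_tokenize(ARTICLE)
sentence_embeddings = model.encode(sentences)
summary_embedding = model.encode(SUMMARY)

# One cosine-similarity score per article sentence.
scores = util.cos_sim(summary_embedding, sentence_embeddings)[0]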
A simple metric that measures the average length of the generated summaries and their average length ratio with respect to the articles.
SumEvaluator.length.calculate([ARTICLE], [SUMMARY])
# {'average_summary_length': 49.0, 'average_article_summary_ratio': 7.73469387755102}
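The reported numbers can be reproduced roughly as follows (a sketch assuming simple whitespace tokenization; the package's exact tokenization may differ):

def length_stats(articles, summaries):
    # Average number of tokens in the summaries.
    summary_lengths = [len(s.split()) for s in summaries]
    avg_summary_length = sum(summary_lengths) / len(summary_lengths)

    # Average article-length / summary-length ratio.
    ratios = [len(a.split()) / len(s.split()) for a, s in zip(articles, summaries)]
    avg_ratio = sum(ratios) / len(ratios)

    return {"average_summary_length": avg_summary_length,
            "average_article_summary_ratio": avg_ratio}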
- Python 3.8.10
- torch 1.10.0
- nltk 3.8.1
- sentence_transformers 2.2.2
- pandas 1.5.3
- openai 0.28
- six 1.16.0
If you wish to cite the paper, you may use the following:
BibTeX entry coming soon.
GL!