Our project introduces a metric designed to evaluate the quality of textual summaries. This metric is pivotal in fields like finance, where precise information synthesis is critical.
- Quality Discrimination: Distinguishes effectively between superior and inferior summaries, ensuring clear differentiation in factual accuracy.
- Factual Accuracy Measurement: Detects and quantifies any factual deviations, assigning lower scores to less accurate summaries.
- Detail-Oriented Assessment: Provides comprehensive evaluations, focusing on how well the summary captures the essence and details of the original text.
This metric is not merely a tool for evaluation; it's a step towards enhancing the integrity of information processing in sectors where factual accuracy is non-negotiable.
Named Entity Comparison: Extracts and compares finance-related named entities in the original text and the summary, then analyzes and visualizes entity accuracy and presence in the summary versus the original text.
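A minimal sketch of this idea, using spaCy (which is pinned in requirements.txt); the entity label set and the overlap score below are illustrative assumptions, not the project's exact implementation in src/:

```python
# Illustrative sketch of named-entity comparison -- the label set and the
# precision-style overlap score are assumptions, not the project's exact logic.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
FIN_LABELS = {"ORG", "MONEY", "PERCENT", "DATE", "CARDINAL"}

def extract_entities(text: str) -> set:
    """Return the set of finance-related entity strings found in the text."""
    return {ent.text.lower() for ent in nlp(text).ents if ent.label_ in FIN_LABELS}

def entity_overlap(original: str, summary: str) -> float:
    """Fraction of the summary's entities that also appear in the original text."""
    source_ents = extract_entities(original)
    summary_ents = extract_entities(summary)
    if not summary_ents:
        return 1.0  # no entities in the summary, so nothing to contradict
    return len(summary_ents & source_ents) / len(summary_ents)
```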
Sentence-Level Summary Checking: Applies LLMs to check the consistency between the summary and the original text sentence by sentence, highlighting and identifying inconsistencies for in-depth analysis.
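A minimal sketch of this approach, assuming an OpenAI chat model is used as the checker; the prompt wording and model name are illustrative, not the project's exact setup (see src/pipeline.py for the actual logic):

```python
# Illustrative sketch of sentence-level consistency checking with an LLM.
# The prompt and model choice are assumptions, not the project's exact setup.
import nltk
from openai import OpenAI

nltk.download("punkt", quiet=True)  # sentence tokenizer model
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_summary(original: str, summary: str) -> list:
    """Ask the LLM whether each summary sentence is supported by the original."""
    verdicts = []
    for sentence in nltk.sent_tokenize(summary):
        prompt = (
            f"Original text:\n{original}\n\n"
            f"Summary sentence:\n{sentence}\n\n"
            "Is the summary sentence factually consistent with the original text? "
            "Answer 'consistent' or 'inconsistent' and briefly explain."
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        verdicts.append((sentence, response.choices[0].message.content.strip()))
    return verdicts
```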
│ .gitignore
│ LICENSE.txt
│ README.md
│
├───config
│ config.sh
│ requirements.txt
│
├───data
│ 10summary_with_result.csv
│ falsified_summary.csv
│ falsified_summary_level.csv
│ final_version_cropped_first1000.csv
│ final_version_withouttext.csv
│
├───doc
│ ├───About_Us
│ │ Team's Bio.pdf
│ │
│ ├───Academic Paper
│ │ 5054_factuality_enhanced_language_m.pdf
│ │ Evaluating Factuality.pdf
│ │ Evaluating the Factual Consistency.pdf
│ │
│ ├───Project Description
│ │ Benchmarking LLM .pdf
│ │ CAPSTONE PROJECT PROPOSAL Fidelity Summarization Metrics.pdf
│ │
│ └───Report
│ Capstone Project Initial Due Diligence Report.pdf
│ F23_Fidelity_Benchmarking LLM_1st_report.pdf
│ F23_Fidelity_Benchmarking LLM_final_report.pdf
│ F23_Fidelity_BenchmarkLLM_poster.pdf
│ Project Proposal.pdf
│
├───res
│ │ 10levels.svg
│ │ good_to_bad.svg
│ │ LLM_Assisted_Framework.jpg
│ │ NER_Framework.jpg
│ │
│ └───Baseline
│ Boxplot_for_Scores.png
│
├───samples
│ documents_extraction.ipynb
│ presentation.ipynb
│ summary_level_with_result.csv
│
├───src
│ │ Bart.py
│ │ PaLM.py
│ │ pipeline.py
│ │ summary_generation.py
│
└───test
├───Data_Pipeline
└───Summary_Generation
bart.ipynb
llama2.ipynb
PaLM2.ipynb
test.py
python==3.10.0
ipython==8.15.0
nltk==3.8.1
numpy==1.24.3
openai==1.3.7
pandas==1.5.3
python-dotenv==1.0.0
rouge_score==0.1.2
scikit_learn==1.2.2
sentence_transformers==2.2.2
spacy==3.7.2
stanza==1.6.1
bash ./config/config.sh
conda install --file ./config/requirements.txt
import os
import sys
sys.path.append('../src/')
import pipeline
os.environ['OPENAI_API_KEY'] = 'Your OpenAI API Key'
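Since python-dotenv is pinned in requirements.txt, the key can presumably also be loaded from a local .env file rather than hard-coded; a minimal sketch of that alternative (the .env filename and variable follow standard dotenv conventions and are not project-specific instructions):

```python
# Optional: load OPENAI_API_KEY from a local .env file instead of hard-coding it.
# Assumes a .env file in the working directory containing: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv

load_dotenv()  # populates os.environ from .env, so the pipeline can read the key
```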
The data extraction process is in documents_extraction.ipynb.
You can also find the demo and a comparison of results against baseline metrics in presentation.ipynb.
- Initial Due Diligence Report: Initial Due Diligence Report
- Project Proposal: Project Proposal
- 1st Milestone Report: 1st Milestone Report
- Final Report: Final Report
- Poster: Poster
Lilli Ann Rowan, Indraneel Biswas, Michael Threlfall, and Diana Kulmizev
Adam Kelleher