
DS 549 Machine Learning Practicum MAPLE Bill Summarization and Tagging Project



BU-Spark/ml-maple-bill-summarization


MAPLE (Bill Summarization, Tagging, Explanation)

In this project, we generate summaries and category tags for Massachusetts bills for the MAPLE platform. The goal is to simplify the legal language and content so that it is comprehensible to a broader audience (a 9th-grade reading level), by exploring different ML and LLM services.

This repository contains a pipeline that takes bills from the Massachusetts Legislature, generates summaries and category tags using relevant sections of the Massachusetts General Laws as context, displays and saves the generated text in a dashboard, and deploys and integrates the results into the MAPLE platform.
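The core generation step of this pipeline can be sketched as: combine a bill's text with the related General Law sections into one prompt, ask for a plain-language summary plus tags, and keep only tags that belong to the fixed category list. This is a minimal illustration, not the repository's code. The `Bill` record, `build_prompt`, `parse_tags`, the prompt wording, and the `MAX_TAGS = 3` limit are all hypothetical, and the call to the LLM service itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_TAGS = 3  # hypothetical limit on tags per bill


@dataclass
class Bill:
    # Hypothetical fields; these are not the MAPLE API schema.
    bill_id: str
    title: str
    text: str
    mgl_context: list[str] = field(default_factory=list)  # text of related General Law sections


def build_prompt(bill: Bill, categories: list[str], max_tags: int = MAX_TAGS) -> str:
    """Assemble one prompt asking for a 9th-grade-level summary and tags."""
    context = "\n\n".join(bill.mgl_context) or "(no related General Law sections found)"
    return (
        "Summarize the following Massachusetts bill so that a 9th-grade reader "
        "can understand it.\n"
        f"Then choose up to {max_tags} tags from this list only: {', '.join(categories)}.\n\n"
        f"Bill {bill.bill_id}: {bill.title}\n{bill.text}\n\n"
        f"Relevant Massachusetts General Law sections:\n{context}\n"
    )


def parse_tags(model_output: str, categories: list[str], max_tags: int = MAX_TAGS) -> list[str]:
    """Keep only comma-separated tags found in the allowed list (case-insensitive, no repeats)."""
    allowed = {c.lower(): c for c in categories}
    tags: list[str] = []
    for raw in model_output.split(","):
        canonical = allowed.get(raw.strip().lower())
        if canonical is not None and canonical not in tags:
            tags.append(canonical)
        if len(tags) == max_tags:
            break
    return tags
```

Filtering the model's tags against the category list guards against the LLM inventing labels that the platform does not recognize. For example, `parse_tags("Transportation, space travel, Education", ["Education", "Transportation"])` keeps `["Transportation", "Education"]` and drops the unknown tag.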

Roadmap of Repository Directories

  • Documentation:
    Research.md: our research on large language models and the evaluation methods we planned to use for this project.
    Documentation MAPLE.pdf: a detailed description of how our model operates, for future use and improvement.

  • EDA: the notebook eda.ipynb covers scraping bills from the MAPLE Swagger API, building a dataframe to clean and process the data, and creating visualizations to analyze the dataset and explore its characteristics.

  • demoapp:
    app.py: contains the code for the LLM service we used and the web app we built with Streamlit. The web app allows users to search all bills.
    app2.py: tests our approach on the top 12 bills from the MAPLE website, extracting information from the Massachusetts General Laws to add context to the summaries of these bills.
    Other files: helper modules imported by the two Python app files above.

  • Prompts Engineering: prompts.md stores all prompts that we tested.

  • Tagging: contains the list of categories and tags.

  • Deployment: contains the link to our deployed Streamlit web app.
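One step in app2.py is pulling in Massachusetts General Laws text as context for a bill. Before any law text can be fetched, the bill has to be scanned for the chapters and sections it cites. The sketch below shows one way to do that with regular expressions. It assumes bills use the common phrasing "Section 12 of chapter 94C of the General Laws" and "Chapter 90 of the General Laws". The function name `find_mgl_citations` is hypothetical, and the repository's actual extraction method may differ.

```python
import re
from typing import List, Optional, Tuple

# Assumed phrasing: "Section 12 of chapter 94C of the General Laws"
SECTION_RE = re.compile(
    r"\bsection\s+(\d+[A-Z]*)\s+of\s+chapter\s+(\d+[A-Z]*)\s+of\s+the\s+General\s+Laws",
    re.IGNORECASE,
)
# Assumed phrasing: "Chapter 90 of the General Laws" (whole-chapter reference)
CHAPTER_RE = re.compile(r"\bchapter\s+(\d+[A-Z]*)\s+of\s+the\s+General\s+Laws", re.IGNORECASE)


def find_mgl_citations(bill_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return unique (chapter, section) pairs cited in a bill; section is None for whole chapters.

    Section-level citations are listed first, then chapter-only citations.
    """
    citations: List[Tuple[str, Optional[str]]] = []
    covered = []  # spans already matched as section citations
    for m in SECTION_RE.finditer(bill_text):
        cite = (m.group(2).upper(), m.group(1).upper())
        if cite not in citations:
            citations.append(cite)
        covered.append(m.span())
    for m in CHAPTER_RE.finditer(bill_text):
        # Skip the "chapter X of the General Laws" tail that belongs to a section citation.
        if any(start <= m.start() < end for start, end in covered):
            continue
        cite = (m.group(1).upper(), None)
        if cite not in citations:
            citations.append(cite)
    return citations
```

Each resulting pair could then be used to look up the corresponding law text, which is passed to the summarization prompt as context.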

Ethical Implications

The dataset used for this project is fully open source and can be accessed through the Mass General Laws API.

Our team and MAPLE agreed to display a disclaimer stating that the generated text is AI-generated.

Although we use open-source transformer models from Vectara to evaluate hallucination, expert and human evaluation is still needed to maintain a trustworthy LLM system.

Resources and Citation

Team Members

Vy Nguyen - Email: nptv1207@bu.edu
Andy Yang - Email: ayang903@bu.edu
Gauri Bhandarwar - Email: gaurib3@bu.edu
Weining Mai - Email: weimai@bu.edu