QuestionAnswering

This repository contains the code for the final project of the Natural Language Processing course in the Artificial Intelligence master's degree at UniBo.

The objective of the project is to build an NLP system that performs Question Answering on the SQuAD dataset.

Quick start

Clone the repository, create a virtual environment, and install the requirements listed in requirements.txt.

python3 -m venv .env # or conda create -n NLP python=3

Then, once the environment is active:

python3 -m pip install -r requirements.txt

Our normal model's weights can be downloaded from here. They must be placed in src/checkpoints.

You also need to download spaCy's English language model:

python3 -m spacy download en_core_web_sm

Then, the model can be evaluated on a test dataset using python3 compute_answers.py *PATH_TO_TEST_JSON_FILE*.
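Before running the evaluation, it can help to check that the test file parses as expected. The following is a minimal sketch, assuming the test JSON follows the standard SQuAD v1.1 layout (`data` → `paragraphs` → `qas`); the helper names `iter_questions` and `load_questions` are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def iter_questions(squad):
    """Yield (question_id, question, context) triples.

    Assumes a SQuAD v1.1-style dict: data -> paragraphs -> qas.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield qa["id"], qa["question"], context


def load_questions(path):
    """Read a JSON file and return all of its questions as a list."""
    with Path(path).open(encoding="utf-8") as f:
        return list(iter_questions(json.load(f)))


# Example usage (the path is a placeholder):
# questions = load_questions("PATH_TO_TEST_JSON_FILE")
# print(len(questions), "questions found")
```

If the file does not follow this layout, the sketch raises a `KeyError`, which is a quick signal that the file may not be in the assumed format.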

Organization of the repository

  • TaskExplanation.pdf contains the explanation of the task
  • data contains the JSON files of the training (training_set.json), validation (validation_set.json), and test (dev_set.json) sets
  • src contains the code of our tests and experiments.
    • config.py and utils.py contain utility code that is used throughout all other files
    • checkpoints should contain the weights of the model
    • baselines.ipynb is a notebook containing the implementation of the baselines described in the report
    • data_analysis.ipynb contains an analysis of the dataset
    • error_analysis.ipynb contains an analysis of the mistakes that the model makes with respect to the ground truth
    • train.ipynb is a notebook containing all of the training experiments we conducted
    • evaluation_tests.ipynb contains the evaluations whose results we presented in the report
  • Finally, report.pdf contains the report for the project.