SIMMC2.0

Disambiguation Baseline

  1. Preprocess the datasets into the GPT-2 input format.
python format_disambiguation_data.py \
	--simmc_train_json="../../data/simmc2_dials_dstc10_train.json" \
	--simmc_dev_json="../../data/simmc2_dials_dstc10_dev.json" \
	--simmc_devtest_json="../../data/simmc2_dials_dstc10_devtest.json" \
	--disambiguate_save_path="../../data/"
  2. Train the baseline model and test it in the same run.
./evaluate_mm_disambiguation_model.sh 0 5 4
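
The preprocessing command above reads three JSON files by relative path, so it depends on the directory it is run from. As a minimal sketch (not part of the repository), the check below reports which of those files are missing before preprocessing starts. The helper name `find_missing` and the constant `SIMMC_JSON_FILES` are hypothetical; only the paths themselves are taken from the command above.

```python
from pathlib import Path

# Paths copied from the preprocessing command above. They are relative,
# so they are resolved against the directory the check is run from.
SIMMC_JSON_FILES = [
    "../../data/simmc2_dials_dstc10_train.json",
    "../../data/simmc2_dials_dstc10_dev.json",
    "../../data/simmc2_dials_dstc10_devtest.json",
]


def find_missing(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under `base_dir`.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = find_missing(SIMMC_JSON_FILES)
    if missing:
        print("Missing input files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All input files found.")
```

Run it from the same directory as `format_disambiguation_data.py` so the relative paths resolve the same way.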

Coreference Baseline

  1. Preprocess the datasets into the GPT-2 input format.
cd model/mm_dst
./run_preprocess_gpt2.sh
  2. Train the baseline model.
./run_train_gpt2.sh
  3. Generate predictions for the devtest data.
./run_generate_gpt2.sh

The generation results are saved in the /mm_dst/results folder. To save them elsewhere, change path_output to the desired path.

  4. Evaluate the predictions for the devtest data.
./run_evaluate_gpt2.sh

Best result so far: object F1 = 0.308 (epochs = 1, batch size = 1, gradient accumulation = 4, eval batch size = 4).
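
For readers unfamiliar with the object F1 figure above, the sketch below shows one common way such a score is computed: a micro-averaged F1 over the object IDs predicted for each turn. This is an assumption for illustration only. The evaluation script run by `run_evaluate_gpt2.sh` may count matches differently, and the function names `count_objects` and `object_f1` are hypothetical.

```python
def count_objects(true_ids_per_turn, pred_ids_per_turn):
    """Accumulate true, predicted, and correctly predicted object counts.

    Each argument is a list with one collection of object IDs per turn.
    Duplicate IDs within a turn are counted once.
    """
    n_true = n_pred = n_correct = 0
    for true_ids, pred_ids in zip(true_ids_per_turn, pred_ids_per_turn):
        true_set, pred_set = set(true_ids), set(pred_ids)
        n_true += len(true_set)
        n_pred += len(pred_set)
        n_correct += len(true_set & pred_set)
    return n_true, n_pred, n_correct


def object_f1(n_true, n_pred, n_correct):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, not taken from the dataset.
    true_ids = [[1, 2], [3]]
    pred_ids = [[1], [3, 4]]
    counts = count_objects(true_ids, pred_ids)
    print(counts)
    print(round(object_f1(*counts), 3))
```

In the toy example, two of the three predicted objects are correct and two of the three true objects are found, so precision and recall are both 2/3.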