Error while creating predictions on heldout dataset #31

iamsimha · 2022-07-31T10:37:17Z

Steps to reproduce:

1. Create a new dataset using the create_hf_dataset.py script.
2. In the config, point to your fine-tuned model and the new dataset. We are using the XLMR model.

Running
torchrun --nproc_per_node=1 scripts/predict.py -c examples/xlmr_base_test_20220411.yml

throws the error below:

Traceback (most recent call last):
  File "/local/home/desktop/Experiments/massive/scripts/predict.py", line 112, in <module>
    main()
  File "/local/home/desktop/Experiments/massive/scripts/predict.py", line 102, in main
    outputs = trainer.predict(test_ds, tokenizer=tokenizer)
  File "/home/desktop/Experiments/massive/src/massive/utils/trainer.py", line 188, in predict
    output = self.evaluate(
  File "/home/desktop/Experiments/massive/src/massive/utils/trainer.py", line 142, in evaluate
    output = eval_loop(
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/transformers/trainer.py", line 2314, in evaluation_loop
    for step, inputs in enumerate(dataloader):
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/desktop/Experiments/massive/src/massive/loaders/collator_ic_sf.py", line 64, in __call__
    label = entry['slots_num']
KeyError: 'slots_num'
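The last frame shows a bare dictionary lookup, `entry['slots_num']`, so the collator fails as soon as a single record in a batch lacks that key. A minimal sketch for checking records before running predict.py, assuming the records can be iterated as plain dicts (iterating a Hugging Face `datasets.Dataset` yields one dict per row). `find_missing_keys` is a hypothetical helper, not part of the MASSIVE codebase, and the toy records and the `id` field are made up for illustration:

```python
from typing import Dict, Iterable, Mapping, Sequence


def find_missing_keys(
    records: Iterable[Mapping[str, object]],
    required: Sequence[str],
) -> Dict[str, int]:
    """Count how many records lack each required key.

    Hypothetical helper: `required` should list the fields the collator
    actually reads (e.g. 'slots_num' in collator_ic_sf.py).
    """
    missing = {key: 0 for key in required}
    for record in records:
        for key in required:
            if key not in record:
                missing[key] += 1
    # Report only keys that are missing from at least one record.
    return {key: count for key, count in missing.items() if count}


# Toy records: the second mimics a heldout example with no slot labels.
records = [
    {"id": "0", "slots_num": [0, 0, 1]},
    {"id": "1"},
]
print(find_missing_keys(records, ["slots_num"]))  # {'slots_num': 1}
```

An empty result means every record has the listed keys, so a `KeyError` from this lookup would have to come from somewhere else.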

jgmf-amazon · 2022-08-04T17:34:28Z

~~Hi @iamsimha, greetings. To resolve this error, you must point to the numerical mapping for your slots. For example:~~

massive/examples/mt5_base_t2t_mmnlu_20220720.yml, line 34 (commit 0d474f3):

slot_labels: /PATH/TO/hf-mmnlu-eval/hf-mmnlu-eval.slots

jgmf-amazon · 2022-08-04T17:43:44Z

~~Please let us know if that works. Thanks.~~

jgmf-amazon · 2022-08-04T17:44:39Z

Ah, wait, maybe I read your traceback too quickly. Let me check into this a little further.

jgmf-amazon · 2022-08-04T18:01:44Z

So, in my local version of the Hugging Face-formatted evaluation data, created using scripts/create_hf_dataset.py, each record has a slots_str key with an empty value. This key must be absent from your version of the evaluation data, right? There are two options: (A) add the key to your data, or (B) change the code so that the collator, etc., works without it. Option B is the better long-term solution, but I'm not sure we'll have the bandwidth on our side in the near term. Please let us know if Option A is workable. Thanks!
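Both options can be sketched in plain Python on per-record dicts. This is a sketch under stated assumptions, not the repository's code. For Option A, it assumes an empty list is an acceptable "empty value" for `slots_str`; the thread does not state the type, so compare with what scripts/create_hf_dataset.py writes in your checkout. For Option B, it swaps in a common PyTorch convention that the thread does not specify: when slot labels are missing, every position gets `-100`, the default `ignore_index` of `torch.nn.CrossEntropyLoss`. Both `add_empty_slots_str` and `slot_labels_or_ignore` are hypothetical names, and the `length` argument must match whatever the collator expects `slots_num` to contain:

```python
from typing import Any, Callable, Dict, List

# Default ignore_index of torch.nn.CrossEntropyLoss; positions carrying
# this label contribute nothing to the loss.
IGNORE_INDEX = -100


def add_empty_slots_str(
    record: Dict[str, Any],
    make_empty: Callable[[], Any] = list,
) -> Dict[str, Any]:
    """Option A sketch: return a copy of `record` that has a `slots_str` key.

    Assumption: an empty list stands in for the "empty value"; pass a
    different `make_empty` factory if your data uses another type.
    Existing values are left untouched.
    """
    updated = dict(record)
    updated.setdefault("slots_str", make_empty())
    return updated


def slot_labels_or_ignore(entry: Dict[str, Any], length: int) -> List[int]:
    """Option B sketch: a tolerant replacement for entry['slots_num'].

    Hypothetical helper: if slot labels are absent (e.g. unlabeled
    heldout data), fall back to IGNORE_INDEX for each of `length`
    positions instead of raising KeyError.
    """
    labels = entry.get("slots_num")
    if labels is None:
        return [IGNORE_INDEX] * length
    return list(labels)


record = {"id": "0"}
print(add_empty_slots_str(record))  # {'id': '0', 'slots_str': []}
print(slot_labels_or_ignore(record, 3))  # [-100, -100, -100]
print(slot_labels_or_ignore({"slots_num": [4, 0]}, 2))  # [4, 0]
```

`datasets.Dataset.map` accepts a function that returns a dict of columns, so the Option A function could in principle be applied row by row. An all-empty list column may need an explicit feature type before later processing steps accept it.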
