Error while creating predictions on heldout dataset #31

Open
iamsimha opened this issue Jul 31, 2022 · 4 comments

Comments

@iamsimha

Steps to reproduce:

  1. Create new dataset using create_hf_dataset.py script
  2. In the config, point to your finetuned model and the new dataset. We are using an XLMR model.

Running
torchrun --nproc_per_node=1 scripts/predict.py -c examples/xlmr_base_test_20220411.yml

throws the below error.

Traceback (most recent call last):
  File "/local/home/desktop/Experiments/massive/scripts/predict.py", line 112, in <module>
    main()
  File "/local/home/desktop/Experiments/massive/scripts/predict.py", line 102, in main
    outputs = trainer.predict(test_ds, tokenizer=tokenizer)
  File "/home/desktop/Experiments/massive/src/massive/utils/trainer.py", line 188, in predict
    output = self.evaluate(
  File "/home/desktop/Experiments/massive/src/massive/utils/trainer.py", line 142, in evaluate
    output = eval_loop(
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/transformers/trainer.py", line 2314, in evaluation_loop
    for step, inputs in enumerate(dataloader):
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/desktop/anaconda3/envs/massive/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/desktop/Experiments/massive/src/massive/loaders/collator_ic_sf.py", line 64, in __call__
    label = entry['slots_num']
KeyError: 'slots_num'
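
The KeyError means at least one record reaching the collator has no 'slots_num' field. A quick way to confirm this before running predict.py is to count which expected keys are absent. The snippet below is only a sketch: it treats records as plain dicts, and the helper name count_missing_keys and the sample records are hypothetical, not part of the massive repo.

```python
from collections import Counter


def count_missing_keys(records, required_keys):
    """Return a Counter mapping each required key to the number of records lacking it."""
    missing = Counter()
    for record in records:
        for key in required_keys:
            if key not in record:
                missing[key] += 1
    return missing


# Hypothetical records standing in for the held-out dataset.
records = [
    {"id": "0", "utt": "wake me up at nine"},
    {"id": "1", "utt": "play some jazz", "slots_num": [0, 0, 1]},
]

missing = count_missing_keys(records, ["utt", "slots_num"])
print(dict(missing))
```

If the data is loaded as a Hugging Face datasets.Dataset, inspecting its column_names attribute should give the same information at the column level.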

@jgmf-amazon
Contributor

jgmf-amazon commented Aug 4, 2022

Hi @iamsimha, greetings. To resolve this error, you must point the config to the numerical mapping for your slots. Example:

slot_labels: /PATH/TO/hf-mmnlu-eval/hf-mmnlu-eval.slots

@jgmf-amazon
Contributor

jgmf-amazon commented Aug 4, 2022

Please let us know if that works. Thanks.

@jgmf-amazon
Contributor

Ah, wait, maybe I read your traceback too quickly. Let me check into this a little further.

@jgmf-amazon
Contributor

So in my local version of the huggingface-ified evaluation data, created using scripts/create_hf_dataset.py, each record has a slots_str key with an empty value. This must be absent in your version of the evaluation data, right? The options are to either (A) add it to yours or (B) make a code change so that the collator, etc., works without it. Option B is the better long-term solution, but I'm not sure we'll have bandwidth on our side in the near term. Please let us know if Option A is workable. Thanks!
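
For Option A, one possible approach is to fill in placeholder label fields on every record that lacks them before the data reaches the collator. This is a minimal sketch, not the repo's own code: add_missing_fields and PLACEHOLDERS are hypothetical names, records are treated as plain dicts, and the placeholder values (an empty string for slots_str, an empty list for slots_num) are assumptions. The right placeholder depends on what collator_ic_sf.py expects, so check against a record produced by scripts/create_hf_dataset.py.

```python
import copy


def add_missing_fields(record, defaults):
    """Return a copy of record with any absent keys filled from defaults."""
    filled = dict(record)
    for key, value in defaults.items():
        if key not in filled:
            # Deep-copy so records never share one mutable placeholder object.
            filled[key] = copy.deepcopy(value)
    return filled


# Assumed placeholder values; adjust to match what the collator expects.
PLACEHOLDERS = {"slots_str": "", "slots_num": []}

# Hypothetical held-out record with no slot annotations.
heldout = [{"id": "0", "utt": "wake me up at nine"}]
patched = [add_missing_fields(r, PLACEHOLDERS) for r in heldout]
print(patched[0])
```

With a Hugging Face datasets.Dataset, the same per-record function could likely be applied through Dataset.map. Option B would instead change the collator to tolerate a missing key, for example by reading it with entry.get('slots_num') and skipping label handling when it is absent.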
