Hi, I noticed that CamRest676 is used in the pre-training phase. However, 675 of the CamRest676 dialogs are already included in the MultiWOZ training set, and CamRest676 is also processed and trained on in a multi-task way.
In your low-resource experiments (Sec 4.1.4), you "train our model on MultiWOZ 2.0 by varying the percentage of training data, ranging from 1% (∼80 samples) to 20% (∼1600 samples)". Although the model did not use the MultiWOZ dataset itself during pre-training, it has potentially already seen many MultiWOZ dialogs through CamRest676. In other words, information may leak from the pre-training phase into the low-resource evaluation, so the low-resource results may not be fully reliable.
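For anyone who wants to check the overlap directly, here is a minimal sketch of how it could be measured. It assumes each dataset has already been loaded as a list of dialogs, where a dialog is a list of utterance strings. That format and the helper names (`normalize`, `fingerprint`, `overlapping_dialogs`) are hypothetical and not taken from this repository. It only catches dialogs that match exactly after light normalization, so it may undercount if the two corpora were preprocessed differently.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed format (hypothetical): a dialog is an ordered list of utterance strings.
Dialog = List[str]


def normalize(utterance: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", utterance.lower())
    return " ".join(text.split())


def fingerprint(dialog: Dialog) -> Tuple[str, ...]:
    """Hashable key for a dialog: the tuple of its normalized utterances."""
    return tuple(normalize(u) for u in dialog)


def overlapping_dialogs(
    pretrain: Iterable[Dialog], finetune: Iterable[Dialog]
) -> Set[Tuple[str, ...]]:
    """Return fingerprints of pre-training dialogs that also occur in the fine-tuning set."""
    finetune_prints = {fingerprint(d) for d in finetune}
    return {fp for fp in map(fingerprint, pretrain) if fp in finetune_prints}


# Toy data standing in for CamRest676 (pre-training) and MultiWOZ (fine-tuning).
camrest = [
    ["Hi, I need a cheap restaurant.", "Sure, which area?"],
    ["Book a table for two."],
]
multiwoz = [
    ["hi i need a cheap restaurant", "sure which area"],
    ["I need a train to Cambridge."],
]

shared = overlapping_dialogs(camrest, multiwoz)
print(f"{len(shared)} of {len(camrest)} pre-training dialogs also appear in the fine-tuning set")
```

If the overlap is confirmed, one option would be to drop the shared dialogs from the pre-training data before rerunning the 1% to 20% experiments.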
jianguoz changed the title from "Incorrect results on low-resource settings due to leaking information from Pre-training Phase" to "Question about low-resource settings due to leaking information from Pre-training Phase" on Sep 2, 2022
jianguoz changed the title from "Question about low-resource settings due to leaking information from Pre-training Phase" to "Question about low-resource settings due to potential leaking information from Pre-training Phase" on Sep 2, 2022