

Pretraining Bart on Single corpus #31

Closed
1029694141 opened this issue Jul 28, 2022 · 2 comments

Comments

@1029694141

Hi,

First, thanks for the work on this repo!

I'm continuing to pretrain BART on my own English corpus, "train_fineshed.txt", but the Python arguments don't seem to work. I get this error: "file not found error: ***/train_fineshed.txt.01"

My Python command is as follows:

python pretrain_nmt.py -n 1 -nr 0 -g 2 --pretrained_model facebook/bart-base --use_official_pretrained --tokenizer_name_or_path facebook/bart-base --is_summarization --warmup_steps 500 --save_intermediate_checkpoints --mono_src /home/WwhStuGrp/yyfwwhstu16/yanmtt/dataset/pubmed/pubmed-dataset/train_fineshed.txt --monolingual_domains 1 --train_domains 1

Could you point out what I'm doing wrong with your toolkit?

Thank you for your kind help!

@prajdabre
Owner

Thanks for using this toolkit.

You are missing the --shard_files argument since you are running the script for the first time.
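For reference, here is the command from the question with the missing flag added. This is a sketch that assumes --shard_files is a boolean switch taking no value and that every other argument stays unchanged. The error mentions train_fineshed.txt.01, which suggests the script looks for sharded copies of the corpus; presumably --shard_files is what creates them.

```bash
# Same command as in the question, plus --shard_files (assumed to be a
# boolean switch) so the corpus is split into shards on this first run.
python pretrain_nmt.py -n 1 -nr 0 -g 2 \
  --pretrained_model facebook/bart-base --use_official_pretrained \
  --tokenizer_name_or_path facebook/bart-base --is_summarization \
  --warmup_steps 500 --save_intermediate_checkpoints \
  --mono_src /home/WwhStuGrp/yyfwwhstu16/yanmtt/dataset/pubmed/pubmed-dataset/train_fineshed.txt \
  --monolingual_domains 1 --train_domains 1 \
  --shard_files
```

Because the flag is tied to the first run, later runs over the same corpus can presumably drop it once the shard files exist.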

@1029694141
Author

It's useful, thanks!!!
