Is training from scratch possible now? #1283

Do the models support training from scratch, together with original (paper) parameters?

Comments
You can just instantiate the models without the pretrained weights:

```python
config = BertConfig()  # optionally pass your favorite parameters
model = BertForPreTraining(config)
```

I added a flag to check whether we should instead initialize freshly:
```python
if args.do_fresh_init:
    config = config_class()
    tokenizer = tokenizer_class()
    if args.block_size <= 0:
        # Our input block size will be the max possible for the model
        args.block_size = tokenizer.max_len
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class(config=config)
else:
    config = config_class.from_pretrained(args.config_name if args.config_name else args.model_name_or_path)
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name if args.tokenizer_name else args.model_name_or_path)
    if args.block_size <= 0:
        # Our input block size will be the max possible for the model
        args.block_size = tokenizer.max_len
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class.from_pretrained(
        args.model_name_or_path,
        from_tf=bool('.ckpt' in args.model_name_or_path),
        config=config,
    )
model.to(args.device)
```
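For concreteness, here is a minimal sketch of what the `config_class` / `model_class` / `tokenizer_class` placeholders could resolve to; the BERT classes are an assumption (the example scripts pick them from a `MODEL_CLASSES` table keyed by `--model_type`):

```python
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

config_class, model_class, tokenizer_class = BertConfig, BertForMaskedLM, BertTokenizer

config = config_class()             # defaults match BERT-base hyperparameters
model = model_class(config=config)  # weights are randomly initialized
# Note: tokenizer_class() with no arguments fails for most tokenizers,
# since they need a vocabulary file -- see the constructor discussion below.
tokenizer = tokenizer_class.from_pretrained('bert-base-uncased')
```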
Hi, thanks for the quick response. I am more interested in the XLNet and TransformerXL models. Would they have the same interface?

I don't know firsthand, but I suppose so, and it is fundamentally an easy problem to reinitialize weights randomly before any kind of training in PyTorch :)
Good luck,
Zacharias
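For what it's worth, a minimal sketch of that re-initialization (this leans on the library's `PreTrainedModel` internals; `_init_weights` is a private method, so treat it as an assumption rather than a stable API):

```python
from transformers import BertConfig, BertForMaskedLM

# Option 1: construct from a config; weights are random from the start.
model = BertForMaskedLM(BertConfig())

# Option 2: re-randomize an existing model in place. `apply` visits every
# submodule; `_init_weights` is the library's per-module initializer.
model.apply(model._init_weights)
```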
I think XLNet requires a very specific training procedure, see #943: "For XLNet, the implementation in this repo is missing some key functionality (the permutation generation function and an analogue of the dataset record generator) which you'd have to implement yourself."
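To make the missing piece concrete, here is a toy sketch of one permutation mask in the convention the PyTorch XLNet model accepts (`perm_mask[i, j] = 1` means token `i` may not attend to token `j`); this is an illustration only, not the `_local_perm` function from the original TF code, which also handles special tokens and partial prediction:

```python
import torch

def toy_perm_mask(seq_len: int) -> torch.Tensor:
    """Sample one random factorization order and build a (seq_len, seq_len)
    mask where entry [i, j] = 1.0 means token i may NOT attend to token j."""
    perm = torch.randperm(seq_len)           # a random factorization order
    order = torch.empty(seq_len, dtype=torch.long)
    order[perm] = torch.arange(seq_len)      # order[t] = rank of token t in the order
    # Token i may only attend to tokens strictly earlier in the order.
    cannot_attend = order.unsqueeze(0) >= order.unsqueeze(1)
    return cannot_attend.float()

# Add a batch dimension before passing it to the model, e.g.
# perm_mask = toy_perm_mask(128).unsqueeze(0)
```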
Hmm, tokenizers' constructors require a vocab file, so `tokenizer_class()` with no arguments fails.
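For BERT-style tokenizers that means supplying a WordPiece vocabulary yourself, e.g.:

```python
from transformers import BertTokenizer

# The constructor's first argument is a vocab file (one token per line);
# 'my_corpus_vocab.txt' is a hypothetical path to a vocabulary you built
# for your own corpus with external WordPiece tooling.
tokenizer = BertTokenizer(vocab_file='my_corpus_vocab.txt')
```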
@Stamenov Did you figure out how to pretrain XLNet? I'm interested in that as well.
No, I haven't. According to a recent tweet, Hugging Face may prioritize putting more effort into providing interfaces for pre-training models from scratch.
You can now leave `model_name_or_path` unset in `run_language_modeling.py` to train a model from scratch.
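A hedged sketch of such an invocation; the exact flags are assumptions based on the example script around that time, so check `--help` against your version:

```bash
python run_language_modeling.py \
    --output_dir ./bert-from-scratch \
    --model_type bert \
    --mlm \
    --config_name ./my_config \
    --tokenizer_name ./my_tokenizer \
    --do_train \
    --train_data_file ./corpus.txt
```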