Commit
Idea: raise the learning rate for instruction tuning and sandwich that phase between pre-training datasets. For example, train on 100,000 English pre-training tokens, then 100,000 Chinese (zhongwen) tokens, then 10,000 translation-dataset tokens at a 10x or higher learning rate. This might even out learning or emphasize the target task, while augmenting it with monolingual pre-training tokens, which are easier to find.
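A minimal sketch of how such a sandwiched schedule could be expressed, assuming the three phases repeat as a cycle keyed on the number of tokens seen. The token budgets and the 10x multiplier come from the note above; the phase names, `base_lr`, and the helpers `phase_at` and `learning_rate` are hypothetical and not taken from this repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Phase:
    name: str        # hypothetical label for the dataset used in this phase
    tokens: int      # number of tokens to train on before moving on
    lr_scale: float  # multiplier applied to the base learning rate


# Token budgets and the 10x factor follow the note; the names are assumptions.
CYCLE = (
    Phase("english_pretrain", 100_000, 1.0),
    Phase("chinese_pretrain", 100_000, 1.0),
    Phase("translation", 10_000, 10.0),
)


def phase_at(tokens_seen: int, cycle: Sequence[Phase] = CYCLE) -> Phase:
    """Return the phase that is active after `tokens_seen` training tokens."""
    cycle_len = sum(p.tokens for p in cycle)
    offset = tokens_seen % cycle_len  # position inside the current cycle
    for phase in cycle:
        if offset < phase.tokens:
            return phase
        offset -= phase.tokens
    raise AssertionError("unreachable: offset is always inside the cycle")


def learning_rate(tokens_seen: int, base_lr: float = 1e-4,
                  cycle: Sequence[Phase] = CYCLE) -> float:
    """Base learning rate scaled by the active phase's multiplier."""
    return base_lr * phase_at(tokens_seen, cycle).lr_scale


if __name__ == "__main__":
    for t in (0, 100_000, 200_000, 210_000):
        p = phase_at(t)
        print(t, p.name, learning_rate(t))
```

In a training loop, one could call `phase_at(tokens_seen)` to pick which dataset to draw the next batch from and `learning_rate(tokens_seen)` to set the optimizer's rate before each step. Whether the cycle should repeat or run only once is left open by the note; the modulo here is one possible reading.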